[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 12: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1218/ -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5923: Print binary ID as hex in ChildQuery::Cancel()
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/8050 Change subject: IMPALA-5923: Print binary ID as hex in ChildQuery::Cancel() .. IMPALA-5923: Print binary ID as hex in ChildQuery::Cancel() ChildQuery::Cancel() prints a binary ID into the log which can show up as random characters. One fix is to print it as a hex string. I tested this by running test_cancellation::test_cancel_insert and making sure the ID is printed as hex. Change-Id: Ie1a9516d5c03524e2585255700bb84e8a301a7ee --- M be/src/service/child-query.cc M be/src/util/debug-util.cc 2 files changed, 3 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/8050/1 -- To view, visit http://gerrit.cloudera.org:8080/8050 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie1a9516d5c03524e2585255700bb84e8a301a7ee Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker
[Impala-ASF-CR] IMPALA-5856: Fix outer join predicate assignment.
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5856: Fix outer join predicate assignment. .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/8039/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS1, Line 1381: // Right-most full-outer join table ref that is equal to or to the left of the : // table ref materializing 'tids'. Trying to visualize this hurts my brain :) Not sure if it helps to understand the logic in this function. The comment in L1385 should suffice. PS1, Line 1385: this What is "this" refer to? After reading this more carefully, I think I understand what it means. I think it is better to move the comment below L1388. PS1, Line 1398: globalState_.ojClauseByConjunct.get(e.getId()); getOjRef(e)? http://gerrit.cloudera.org:8080/#/c/8039/1/testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test File testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test: PS1, Line 1020: from the On-clause of a left outer join Out of curiosity, did you check if we have the same issue if there is a left outer join followed by a full outer join? If we don't have a test like that, maybe add it for completion. Line 1028: and t1.bigint_col > 10 and t2.bigint_col > 30 To make it more interesting, add a where clause with one predicate that references t1 and t2 and one that references t3. -- To view, visit http://gerrit.cloudera.org:8080/8039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex BehmGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5211: Simplifying ifnull conditional.
Alex Behm has posted comments on this change. Change subject: IMPALA-5211: Simplifying ifnull conditional. .. Patch Set 6: (16 comments) http://gerrit.cloudera.org:8080/#/c/7829/6//COMMIT_MSG Commit Message: Line 9: Re-writes ifnull(x, y) into if(x IS DISTINCT FROM y, x, NULL), Comment somewhat hard to read, maybe create bullet points foe each change, should be clear which one is a "conversion" and which one adds a new rewrite rule I prefer "convert" instead of re-write because the latter has a very specific meaning to us (done via ExprRewriteRule) Line 18: The error messages are slightly altered by this change: Not just error messages, might be easier to just state that the original ifnull() is printed as the converted if() everywhere including error messages, explain plans, column labels, etc. Line 20: Before: Not sure we need to flesh this out so much. Like you said below, this is what you already get with similar conversions, e.g., NVL2() or decode() http://gerrit.cloudera.org:8080/#/c/7829/6/be/src/exprs/conditional-functions-ir.cc File be/src/exprs/conditional-functions-ir.cc: Line 46 I like red! http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 85: public static Expr createExpr(FunctionName fnName, FunctionParams params) I think we should add tests along the lines of what we have for NVL2() in AnalyzeExprsTest#TestFunctionCallExpr Line 86: throws AnalysisException { This isn't right - createExpr() is and should only be called from the parser from which we should not throw an AnalysisException. We should strive to keep parsing and analysis distinct phases. I think the right way to fix the dilemma below is to only transform the NULLIF() into IF() if there are exactly two arguments - otherwise we can just create a FunctionCallExpr with the original NULLIF function name which will later fail gracefully during analysis. Line 108: NullLiteral.create(Type.NULL) // NULL Use new NullLiteral(). The create() returns an *analyzed* NullLiteral which is not what you want because we have not started analysis yet (still in parsing). create() is typically used for creating an analyzed NullLiteral of a specific (non-NULL) type http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java: Line 152: // Unary functions like isfalse, isnotfalse, istrue, isnottrue, nullvalue, Move to class comment Line 155: // nullif and nvl2 are re-written into an if. re-written -> converted Line 204:* Note that FunctionalCallExpr.createExpr() re-writes "nvl2" into "if", re-writes -> converts http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/main/java/org/apache/impala/rewrite/SimplifyDistinctFromRule.java File fe/src/main/java/org/apache/impala/rewrite/SimplifyDistinctFromRule.java: Line 39: public Expr apply(Expr expr, Analyzer analyzer) throws AnalysisException { Doesn't throw. I just noticed we declare throws unnecessarily in a bunch of other rules. Mind cleaning those up, too? http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: Line 387: RewritesOk("nvl2(null, 1, 2)", rules, "2"); Let's try not to mix these tests. Constant folding should be tested in the constant folding test. The fact that this is a conditional function is secondary. The tests are grouped by rewrite rule - not by type of expr. Line 424: // nullif: handled by folding move to constant folding test Line 583: // nullif: handled by rewrite to if and simplification converted to if and simplified Line 584: RewritesOk("nullif(bool_col, bool_col)", rules, "NULL"); Move into SimplifyConditionalsTest http://gerrit.cloudera.org:8080/#/c/7829/6/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: Line 90: public void ParserAnalysisError(String stmt, String expectedErrorString) { Parser does analyze. Also I don't see this used. -- To view, visit http://gerrit.cloudera.org:8080/7829 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id91ca968a0c0be44e1ec54ad8602f91a5cb2e0e5 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Philip Zeyliger Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Dan Hecht has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5927: Fix enable distcc for zsh
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/8049 Change subject: IMPALA-5927: Fix enable_distcc for zsh .. IMPALA-5927: Fix enable_distcc for zsh enable_distcc didn't work on zsh anymore since it relies on automatic variable splitting, which only works in bash. This change makes zsh temporarily emulate sh to make enable_distcc work again. I tested it on zsh and bash. Change-Id: I88284e4f68c309bb46ce4b5a842ccc576cd487ed --- M bin/distcc/distcc_env.sh 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/8049/1 -- To view, visit http://gerrit.cloudera.org:8080/8049 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I88284e4f68c309bb46ce4b5a842ccc576cd487ed Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker
[Impala-ASF-CR] IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion
Alex Behm has posted comments on this change. Change subject: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7916/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: Line 1331: Type.DOUBLE, ScalarType.createDecimalType(5, 1)); > Why is the result not DECIMAL(2,1)? I mean DECIMAL(3,1) of course. -- To view, visit http://gerrit.cloudera.org:8080/7916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion
Alex Behm has posted comments on this change. Change subject: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion .. Patch Set 5: (11 comments) http://gerrit.cloudera.org:8080/#/c/7916/5/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: Line 513:* Converts numeric literal in the expr tree rooted at this expr to return floating literals Line 516:* Decimal has a higher processing cost than floating point and we should not pay Please add subsections "DECIMAL_V1" and "DECIMAL_V2" to explain the rationale for their casting behavior. In particular, does decimal still have a higher processing cost? This way it becomes clear that the old behavior was intended to maximize performance and the new behavior is intended to provide accurate/consistent behavior (but does it take a perf hit for that?) Line 529: protected void convertNumericLiteralsFromDecimal(Analyzer analyzer) I can't wait for this function to be deleted. http://gerrit.cloudera.org:8080/#/c/7916/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: Line 1306:* Test expressions that resolve to different types depend on the DECIMAL_V2 setting. Test expressions that return different decimal types depending on the DECIMAL_V2 setting. Line 1319:* Test expressions that resolve to the same type with either DECIMAL_V2 value. resolve to -> return (in the FE 'resolve' has a specific meaning for SlotRefs and TableRefs which we should not confuse here) Line 1331: Type.DOUBLE, ScalarType.createDecimalType(5, 1)); Why is the result not DECIMAL(2,1)? Decimal type seems weird, but not the focus of this patch. Line 1342: // DECIMAL_V2: floating point + decimal expr = decimal This does not seem to explain what happens for "floating point + numeric literal" like in the DECIMAL_V1 case. What happens in cases where the numeric literal is a float because it does not fit into our max decimal? Needs test. Might be clearer if you list: DECIMAL_V2: float + decimal literal = decimal DECIMAL_V2: float + float literal = float Line 1386: // Multiplying a floating point type with any other type always results in a double. Why? Seems inconsistent. Line 1409: checkDecimalReturnType("select round(1.2345, 2) * pow(10, 10)", Type.DOUBLE); add same test with "+" Line 1426: // Test behavior of compound expressions with a single column and many literals. column -> slot ref (and elsewhere) Line 1465: checkDecimalReturnType("select 1.123e-2", ScalarType.createDecimalType(5, 5)); What about literals that don't fit into a decimal? -- To view, visit http://gerrit.cloudera.org:8080/7916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job
Jim Apple has posted comments on this change. Change subject: IMPALA-5905: add script for all-build-options job .. Patch Set 2: Code-Review+2 (3 comments) Can you add a note about ccache and ninja? http://gerrit.cloudera.org:8080/#/c/8043/1/bin/all-build-options.sh File bin/all-build-options.sh: Line 1 > Yeah the jenkins/ subdirectory makes sense to me, there's a lot of junk in I named the Jenkins job "all-build-options". I'm OK with a name change of any sort, or leaving it the same. Maybe "build-with-all-flag-combinations.sh"? Line 21 Required the ninja build system and ccache to be installed, which are not strictly build requirements. PS1, Line 35: : : : : : : > I just preserved this logic from the original Jenkins job script. I didn't One difference: if clean.sh fails in the call to buildall, that call fails but we continue to test more build options. This way, if we can't clean, we just dies since all the remaining calls to buildall.sh should also fail but will provide no additional information. -- To view, visit http://gerrit.cloudera.org:8080/8043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan
Alex Behm has posted comments on this change. Change subject: IMPALA-5597: Try casting targetExpr when building runtime filter plan .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5597: Try casting targetExpr when building runtime filter plan .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1217/ -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3516: Avoid writing to /tmp in testing .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1216/ -- To view, visit http://gerrit.cloudera.org:8080/8047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing
Alex Behm has posted comments on this change. Change subject: IMPALA-3516: Avoid writing to /tmp in testing .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 10: (2 comments) Changed to INTERNAL_ERROR for the possible error in GetNextBuffer(). http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS10, Line 200: DISK_IO_ERROR > Seems to me that INTERNAL_ERROR should be in the list of unrecoverable erro Done PS10, Line 200: DISK_IO_ERROR > Makes sense to me. Done -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8011 to look at the new patch set (#12). Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. IMPALA-5890: Abort queries if scanner hits IO errors Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer() could leave the stream in an inconsistent state: - The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues a buffer with eosr set. - The ScannerContext::Stream tries to read more bytes, but since it has hit eosr, it tries to read beyond the end of the scan range using DiskIoMgr::Read(). - The previous read error resulted in a new file handle being opened. The now truncated, smaller file causes the seek to fail. - Then during error handling, the BaseSequenceScanner calls SkipToSync() and trips over the NULL pointer in in the IO buffer. In my reproduction this only happens with the file handle cache enabled, which causes Impala to see two different sized handles: the one from the cache when the query starts, and the one after reopening the file. To fix this, we change the I/O manager to always return DISK_IO_ERROR for errors and we abort a query if we receive such an error in the scanner. This change also fixes GetBytesInternal() to maintain the invariant that the output buffer points to the boundary buffer whenever the latter contains some data. I tested this by running the repro from the JIRA and impalad did not crash but aborted the queries. I also ran the repro with abort_on_error=1, and with the file handle cache disabled. Text files are not affected by this problem, since the text scanner doesn't try to recover from errors during ProcessRange() but wraps it in RETURN_IF_ERROR instead. With this change queries abort with the same error. Parquet files are also not affected since they have the metadata at the end. Truncated files immediately fail with this error: WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq' has an invalid version number: Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 --- M be/src/common/status.h M be/src/exec/base-sequence-scanner.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/runtime-state.cc M common/thrift/generate_error_codes.py 9 files changed, 104 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/12 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan
Tianyi Wang has posted comments on this change. Change subject: IMPALA-5597: Try casting targetExpr when building runtime filter plan .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/7949/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: Line 587: } catch (IllegalStateException e) { > catch (Exception e) Done Line 592: // Types of targetExpr and srcExpr should be exactly the same since runtime filter is > Types of targetExpr and srcExpr must be exactly the same since runtime filt Done Line 597: } catch (AnalysisException | IllegalArgumentException e) { > catch (Exception e) Done -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan
Tianyi Wang has uploaded a new patch set (#4). Change subject: IMPALA-5597: Try casting targetExpr when building runtime filter plan .. IMPALA-5597: Try casting targetExpr when building runtime filter plan This patch fixes a bug that fails a precondition check when generating runtime filter plans. The lhs and rhs or join predicate might have different types when the eq predicate function accepts wildcard-typed parameters. In this case in existing code the types of source and target expr will be found mismatch and an exception will be thrown when generating runtime filters. This patch tries to cast target expr to be of the same type as source expr. A testcase is added to joins.test Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test 6 files changed, 53 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/7949/4 -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing
Tianyi Wang has posted comments on this change. Change subject: IMPALA-3516: Avoid writing to /tmp in testing .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8047/1/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 18: > what happened here? Sorry. Didn't notice it in diff. http://gerrit.cloudera.org:8080/#/c/8047/1/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java: Line 90: private final String outDir_ = System.getenv("IMPALA_FE_TEST_LOGS_DIR") > I suggest setting this in setUp() and checking whether the env var is set. Done -- To view, visit http://gerrit.cloudera.org:8080/8047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing
Tianyi Wang has uploaded a new patch set (#2). Change subject: IMPALA-3516: Avoid writing to /tmp in testing .. IMPALA-3516: Avoid writing to /tmp in testing Currently some parts of the tests write to /tmp: 1. PlannerTest result files are written to /tmp/PlannerTest 2. FE tests load libfesupport, which writes logs to /tmp 3. Updated results in EE tests (run-tests.py --update_results) is written to /tmp This patch changes them into writing to $IMPALA_HOME/logs. Specifically: 1. PlannerTest result files are written to $IMPALA_FE_TEST_LOGS_DIR/PlannerTest 2. libfesupport logs are written to $IMPALA_FE_TEST_LOGS_DIR 3. Updated EE test results are written to $IMPALA_EE_TEST_LOGS_DIR Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9 --- M be/src/service/fe-support.cc M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M tests/common/impala_test_suite.py 3 files changed, 12 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8047/2 -- To view, visit http://gerrit.cloudera.org:8080/8047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Vuk Ercegovac has posted comments on this change. Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. Patch Set 7: (21 comments) http://gerrit.cloudera.org:8080/#/c/8014/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS1, Line 2876: > don't think you need that Done. removed in latest patch. PS1, Line 2878: > same here Done. removed in latest patch. Line 2891: | UNMATCHED_STRING_LITERAL:l expr:e > nit: extra spaces (see marked as red). Done. removed in latest patch. http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 273: KW_UNBOUNDED, KW_UNCACHED, KW_UNION, KW_UPDATE, KW_UPDATE_FN, KW_UPSERT, KW_USE, > nit: long line (see vertical separator, that's ok for .test files but for e Done. reverted this change when I moved away from using a keyword. http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS6, Line 337: rules.add(BetweenToCompoundRule.INSTANCE); > Maybe add a brief comment about this rule as well? Both rules treated the same way, so updated this comment. PS6, Line 1093: ss > nit: them Done PS6, Line 1098: : // TODO: add a method to Expr to take care of this. > remove. Similar comment above (L1082). Done http://gerrit.cloudera.org:8080/#/c/8014/2/fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java File fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java: PS2, Line 26: > "a boolean test predicate."? I think you can also add the grammar spec here Done PS2, Line 33: This predicate needs to be rewritten into a Compo > For literals we usually make them static as well; also, remove "_" from the Done PS2, Line 38: ivate > No need for "this". We use "_" in the name to indicate private fields. Done PS2, Line 41: blic BoolTestPredicate(Expr e, LiteralExpr v, boolean isNegated) { : super(); > You can use the addChild() function here, same thing. Done PS2, Line 88: analyzeImpl(analyzer); > Will that work with something like "select 1 is true"? For instance, "selec since "select 1 = true" is supported, it makes sense to inject an implicit cast here as well. I changed this so that I don't type-check here since it would duplicate the type-checking that "eq" already does. so, to be consistent with "eq", I let the lhs through and assume that the analyzed rewritten expr will handle type-checking (and subqueries, etc). the main downside I see with this is that the error message from "eq" is exposed, which could be surprising to the user since its at a lower level of abstraction. PS2, Line 94: LHS is type-checked follow > Similar comment as above. e.g. select 1 is 1; I've specified the parser rule to require that the rhs is true, false, or unknown (that's per the standard). I'd prefer to check as-is; let me know if you think it makes more sense to relax the parser and move more flexibility here. http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java File fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java: PS6, Line 29: > "a compound predicate." Done Line 64: result = new CompoundPredicate(CompoundPredicate.Operator.NOT, result, null); > Preconditions.checkNotNull(result); Done http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java: PS6, Line 578: > nit: extra space Done PS6, Line 580: TestBoolTestPredicate > A few more test cases to consider: added select list tests here added the others to the rewrites and end-to-end since the analysis pass does not do much now. if there is a sure-fire way to be consistent with the casting rules used by "=" (and without code duplication), it would make more sense to add the checks to analysis (and therefore more tests here). PS6, Line 586: > Also use "unknown" whoops, those nulls should not have been here. PS6, Line 591: > This doesn't seem to be consistent with how we handle for instance expr lik see comment above. PS6, Line 591: > Add a few more negative cases. Example: added negative tests and coverage to rewrites and end-to-end. the analysis for this expr does not do as much. http://gerrit.cloudera.org:8080/#/c/8014/6/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java: PS6, Line 168: RewritesOk("50.0 between null and 5000", rule, > Did you figure that out? If so, remove the TODO. the rewritten query does not come with parens, as shown, but I verified that the sql string returned from the shell (e.g., via a show create table )
[Impala-ASF-CR] IMPALA-1767 Adds predicate to test boolean values true, false, unknown.
Vuk Ercegovac has uploaded a new patch set (#7). Change subject: IMPALA-1767 Adds predicate to test boolean values true, false, unknown. .. IMPALA-1767 Adds predicate to test boolean values true, false, unknown. Adds a new expression to represent the following boolean predicate: IS [NOT] (TRUE | FALSE | UNKNOWN) The expression is rewritten into compound expressions using negation, is null, and equality. As a result, there are no BE operators to eval this predicate. Testing: - Frontend: parser, analyzer, rewriter, tosql - EndToEnd query expressions Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/Analyzer.java A fe/src/main/java/org/apache/impala/analysis/BoolTestPredicate.java A fe/src/main/java/org/apache/impala/rewrite/BoolTestToCompoundRule.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M testdata/workloads/functional-query/queries/QueryTest/exprs.test 9 files changed, 467 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/14/8014/7 -- To view, visit http://gerrit.cloudera.org:8080/8014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5bd0aa82a0316ac236b7ecf8612ef4b30c016a00 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Vuk ErcegovacGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. IMPALA-4987: Fix flaky test test_row_availability.py This patch keeps test_row_availbility from randomly failing. In this test the time interval between the 'Rows available' timeline event and the previous event in the runtime profile is measured in order to make sure that the rows become available after a specific amount of time. This measurement is not correct since the previous event is that the coordinator finished sending the query fragments to the backends, which means the execution on some backends might have already started. This patch tracks another event "Ready to start" as the beginning of the time interval instead. The coordinator begins to send the query fragments to the backends after this event so the time check should always pass. Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Reviewed-on: http://gerrit.cloudera.org:8080/8036 Reviewed-by: Alex BehmTested-by: Impala Public Jenkins --- M tests/query_test/test_rows_availability.py 1 file changed, 30 insertions(+), 21 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 11: Code-Review+1 The stricter invariant makes sense to me. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan
Alex Behm has posted comments on this change. Change subject: IMPALA-5597: Try casting targetExpr when building runtime filter plan .. Patch Set 3: (4 comments) lgtm after this round http://gerrit.cloudera.org:8080/#/c/7949/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: Line 587: } catch (IllegalStateException e) { catch (Exception e) Line 592: // Types of targetExpr and srcExpr should be exactly the same since runtime filter is Types of targetExpr and srcExpr must be exactly the same since runtime filters are based on hashing. Line 597: } catch (AnalysisException | IllegalArgumentException e) { catch (Exception e) http://gerrit.cloudera.org:8080/#/c/7949/2/testdata/workloads/functional-query/queries/QueryTest/joins.test File testdata/workloads/functional-query/queries/QueryTest/joins.test: Line 774 > Done. What does propagation mean here? I agree "propagation" may be confusing. I think it refers to the fact we are "sprinkling" runtime filters throughout the plan tree. I think it's more confusing than not (might be better to rename in a separate patch). -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing
Alex Behm has posted comments on this change. Change subject: IMPALA-3516: Avoid writing to /tmp in testing .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8047/1/be/src/service/fe-support.cc File be/src/service/fe-support.cc: Line 18: what happened here? http://gerrit.cloudera.org:8080/#/c/8047/1/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java: Line 90: private final String outDir_ = System.getenv("IMPALA_FE_TEST_LOGS_DIR") I suggest setting this in setUp() and checking whether the env var is set. Otherwise, might be hard to debug what went wrong. -- To view, visit http://gerrit.cloudera.org:8080/8047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Bharath Vissapragada has abandoned this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Abandoned While this change works as expected, I think we can be more uniform in our approach so that the system is easy to reason about. With this patch, its possible that the Catalog can support large topic size while Impalads can't deserialize it (for example, a single table larger than 2GB). While we explore other options, I'm abandoning this change. -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5920: Remove admission control dependency on YARN RM jar
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5920: Remove admission control dependency on YARN RM jar .. Patch Set 2: (3 comments) It's a bit unfortunate that we have to pull in this much code to get it to work, but it seems like you've already pruned out the unnecessary stuff. http://gerrit.cloudera.org:8080/#/c/8035/2//COMMIT_MSG Commit Message: Line 23: 'common/yarn-extras', taken from Hadoop 2.6.0. Is the original source code that you cribbed this from available publicly somewhere? Would be handy to diff. http://gerrit.cloudera.org:8080/#/c/8035/2/common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java File common/yarn-extras/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/AllocationConfiguration.java: Line 19: //YARNUTIL: MODIFIED Is there an easy way I can see the diff? If you point me to the hadoop source this was derived from I can do it myself. http://gerrit.cloudera.org:8080/#/c/8035/2/fe/pom.xml File fe/pom.xml: Line 111: system Is systemPath necessary? We seem to be able to resolve data-source-api above without specifying this. -- To view, visit http://gerrit.cloudera.org:8080/8035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7efdd8ebea298836ca2a82c0a4ae037ac9285bcf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS10, Line 200: DISK_IO_ERROR > Seems to me that INTERNAL_ERROR should be in the list of unrecoverable erro Makes sense to me. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Dan Hecht has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS10, Line 200: DISK_IO_ERROR > I can change that, but then we also have to abort the query in GetNextInter Seems to me that INTERNAL_ERROR should be in the list of unrecoverable errors. Who knows what state things are in when hitting one of them (which signifies a bug). Tim, what do you think? If you think it's too risky to make that change now, we can do it later. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3516: Avoid writing to /tmp in testing
Tianyi Wang has uploaded a new change for review. http://gerrit.cloudera.org:8080/8047 Change subject: IMPALA-3516: Avoid writing to /tmp in testing .. IMPALA-3516: Avoid writing to /tmp in testing Currently some parts of the tests write to /tmp: 1. PlannerTest result files are written to /tmp/PlannerTest 2. FE tests load libfesupport, which writes logs to /tmp 3. Updated results in EE tests (run-tests.py --update_results) is written to /tmp This patch changes them into writing to $IMPALA_HOME/logs. Specifically: 1. PlannerTest result files are written to $IMPALA_FE_TEST_LOGS_DIR/PlannerTest 2. libfesupport logs are written to $IMPALA_FE_TEST_LOGS_DIR 3. Updated EE test results are written to $IMPALA_EE_TEST_LOGS_DIR Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9 --- M be/src/service/fe-support.cc M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java M tests/common/impala_test_suite.py 3 files changed, 8 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/8047/1 -- To view, visit http://gerrit.cloudera.org:8080/8047 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9e503eb7d333c1b89dc8aea87cf30504838c44f9 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Alex Behm has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-5905: add script for all-build-options job .. IMPALA-5905: add script for all-build-options job This checks in a modified version of the job script for https://jenkins.impala.io/view/Experimental/job/all-build-options which adds UBSAN and TSAN. The script is also modified to not reference any jenkins environment variables, e.g. WORKSPACE, since the Jenkins job script intermingled references to those with the script logic. Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34 --- A bin/jenkins/all-build-options.sh 1 file changed, 57 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/8043/2 -- To view, visit http://gerrit.cloudera.org:8080/8043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS10, Line 154: if the scan range has returned eosr before > if that was the case, why can we have !eosr at line 152? I think this was an old comment and I couldn't convince myself anymore that it was correct. I also discovered that the DCHECK should read (!status.ok || ...) because if the status is OK, that implies the buffer must not be NULL). PS10, Line 155: must > why "must"? or do you mean "can't"? Yes, can't is what I meant to say. Removed the comment though. PS10, Line 155: this is the first time the function : // is called > why is that significant? and why is it okay to have io_buffer_==NULL in tha Removed the comment. PS10, Line 200: DISK_IO_ERROR > why not INTERNAL_ERROR, since that's the code that indicates an internal er I can change that, but then we also have to abort the query in GetNextInternal() on INTERNAL_ERROR. I think we re-used DISK_IO_ERROR here to keep the set of aborting errors small. Would you prefer to switch to INTERNAL_ERROR? PS10, Line 351: if (boundary_buffer_bytes_left_ > 0 && : (output_buffer_pos_ != _buffer_pos_ || : output_buffer_bytes_left_ != _buffer_bytes_left_)) { : return false; > this is effectively a double negative statement, which makes it harder to r Done. I think in the else case (boundary_buffer_bytes_left_ == 0) the buffer pointers could point to either buffer. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8011 to look at the new patch set (#11). Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. IMPALA-5890: Abort queries if scanner hits IO errors Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer() could leave the stream in an inconsistent state: - The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues a buffer with eosr set. - The ScannerContext::Stream tries to read more bytes, but since it has hit eosr, it tries to read beyond the end of the scan range using DiskIoMgr::Read(). - The previous read error resulted in a new file handle being opened. The now truncated, smaller file causes the seek to fail. - Then during error handling, the BaseSequenceScanner calls SkipToSync() and trips over the NULL pointer in in the IO buffer. In my reproduction this only happens with the file handle cache enabled, which causes Impala to see two different sized handles: the one from the cache when the query starts, and the one after reopening the file. To fix this, we change the I/O manager to always return DISK_IO_ERROR for errors and we abort a query if we receive such an error in the scanner. This change also fixes GetBytesInternal() to maintain the invariant that the output buffer points to the boundary buffer whenever the latter contains some data. I tested this by running the repro from the JIRA and impalad did not crash but aborted the queries. I also ran the repro with abort_on_error=1, and with the file handle cache disabled. Text files are not affected by this problem, since the text scanner doesn't try to recover from errors during ProcessRange() but wraps it in RETURN_IF_ERROR instead. With this change queries abort with the same error. Parquet files are also not affected since they have the metadata at the end. Truncated files immediately fail with this error: WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq' has an invalid version number: Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 --- M be/src/common/status.h M be/src/exec/base-sequence-scanner.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/runtime-state.cc M common/thrift/generate_error_codes.py 9 files changed, 97 insertions(+), 55 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/11 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5905: add script for all-build-options job .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8043/1/bin/all-build-options.sh File bin/all-build-options.sh: Line 1: #!/usr/bin/env bash > Nit: I don't like the file name. all-build-options.sh sounds like it's some Yeah the jenkins/ subdirectory makes sense to me, there's a lot of junk in bin/ PS1, Line 35: do : OPTIONS[4]=$BUILD_SHARED_LIBS : if ! ./bin/clean.sh : then : echo "Clean failed" : exit 1 : fi > You could remove noclean in line 26 and remove this. It's obviously not exa I just preserved this logic from the original Jenkins job script. I didn't add this so I don't know if there was some deeper motivation or it. I could probably try to clean this and other things up but I'd prefer to check this in with the minimal changes required to make it work then worry about cleanup later. -- To view, visit http://gerrit.cloudera.org:8080/8043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion .. Patch Set 5: Updated with some additional tests to get more exhaustive coverage of the different possible permutations. -- To view, visit http://gerrit.cloudera.org:8080/7916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion
Hello Greg Rahn, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7916 to look at the new patch set (#5). Change subject: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion .. IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion This changes the behaviour of applying an arithmetic operator to constant DECIMAL and non-DECIMAL arguments. In DECIMAL_V1, this caused an implicit conversion to floating point, which caused users a lot of confusion in some cases. In DECIMAL_V2 the typing rules are simplified: constant decimals are treated the same as any other decimals. Testing: Added some expression tests for different arithmetic operators and binary predicates (the two Expr subclasses that call convertNumericLiteralsFromDecimal()). Extended analyzer tests to test DECIMAL_V2 behaviour. Added many additional test for various combinations of literals and non-literals to get better coverage of existing and new behaviour. Ran core tests. Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da --- M be/src/exprs/expr-test.cc M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java 3 files changed, 200 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/7916/5 -- To view, visit http://gerrit.cloudera.org:8080/7916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn
[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan
Tianyi Wang has posted comments on this change. Change subject: IMPALA-5597: Try casting targetExpr when building runtime filter plan .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/7949/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: Line 585: targetExpr = targetExpr.substitute(smap, analyzer,false); > Better to use try/catch and return null in case the re-analysis (part of su Done Line 587: Type srcType = filter.getSrcExpr().getType(); > Add a comment explaining why the types must be identical Done Line 591: } catch(AnalysisException | IllegalArgumentException e){ > catch Exception Done Line 594: Preconditions.checkState(targetExpr.getType().equals(srcType)); > remove (obviously true from code above) Done http://gerrit.cloudera.org:8080/#/c/7949/2/testdata/workloads/functional-query/queries/QueryTest/joins.test File testdata/workloads/functional-query/queries/QueryTest/joins.test: Line 774: QUERY > Also add a planner test in runtime-filter-propagation.test Done. What does propagation mean here? Line 775: # IMPALA-5597: Tests that a join on clause with lhs and rhs having different types will > Comment is wrong. Done Line 777: select * > EE test should go into runtime_row_filters.test Done -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan
Tianyi Wang has uploaded a new patch set (#3). Change subject: IMPALA-5597: Try casting targetExpr when building runtime filter plan .. IMPALA-5597: Try casting targetExpr when building runtime filter plan This patch fixes a bug that fails a precondition check when generating runtime filter plans. The lhs and rhs or join predicate might have different types when the eq predicate function accepts wildcard-typed parameters. In this case in existing code the types of source and target expr will be found mismatch and an exception will be thrown when generating runtime filters. This patch tries to cast target expr to be of the same type as source expr. A testcase is added to joins.test Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test 6 files changed, 54 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/7949/3 -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has uploaded a new patch set (#3). Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. IMPALA-5538: Use explicit catalog versions for deleted objects This commit changes the way deletions are handled in the catalog and disseminated to the impalad nodes through the statestore. Previously, deletions of catalog objects were not explicitly annotated with the catalog version in which the deletion occured and the impalads were using the max catalog version in a catalog update in order to decide whether a deletion should be applied to the local catalog cache or not. This works correctly under the assumption that all the changes that occurred in the catalog between an update's min and max catalog version are included in that update, i.e. no gaps or missing updates. With the upcoming fix for IMPALA-5058, that constraint will be relaxed, thus allowing for gaps in the catalog updates. To avoid breaking the existing behavior, this patch introduced the following changes: * Deletions in the catalog are explicitly recorded in a log with the catalog version in which they occurred. As before, added and deleted catalog objects are sent to the statestore. * Topic entries associated with deleted catalog objects have non-empty values (besided keys) that contain minimal object metadata including the catalog version. * Statestore is no longer using the existence or not of topic entry values in order to identify deleted topic entries. Deleted topic entries should be explicitly marked as such by the statestore subscribers that produce them. * Statestore subscribers now use the 'deleted' flag to determine if a topic entry corresponds to a deleted item. * Impalads use the deleted objects' catalog versions when updating the local catalog cache from a catalog update and not the update's maximum catalog version. Testing: - No new tests were added as these paths are already exercised by existing tests. - Run all core tests. Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c --- M be/src/catalog/catalog-server.cc M be/src/catalog/catalog-server.h M be/src/scheduling/admission-controller.cc M be/src/scheduling/admission-controller.h M be/src/scheduling/scheduler-test-util.cc M be/src/scheduling/scheduler.cc M be/src/service/impala-server.cc M be/src/statestore/statestore.cc M be/src/statestore/statestore.h M common/thrift/CatalogInternalService.thrift M common/thrift/StatestoreService.thrift M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/statestore/test_statestore.py 16 files changed, 421 insertions(+), 336 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/7731/3 -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-5538: Use explicit catalog versions for deleted objects
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5538: Use explicit catalog versions for deleted objects .. Patch Set 2: (16 comments) http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: PS2, Line 233: 0, if they have then skip this step and use that data > nit: perhaps easier to read? "...version 0. If so, then skip ..." Done Line 305: BuildTopicUpdatesHelper(catalog_objects.updated_objects, false); > I like the previous version more where this is inlined into L292. A quick c Done Line 317: if (catalog_object.catalog_version <= last_sent_catalog_version_) continue; > Move this above L312? Done PS2, Line 336: if (topic_deletions) { : VLOG(1) << "Publishing deletion: " << entry_key << "@" : << catalog_object.catalog_version; : } else { : VLOG(1) << "Publishing update: " << entry_key << "@" : << catalog_object.catalog_version; : } > perhaps remove some redundancy this way: Done http://gerrit.cloudera.org:8080/#/c/7731/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: Line 1340: LOG(INFO) << "Received large catalog update(>100mb): " > Received a large catalog item? Done PS2, Line 1366: if (item.deleted) { : VLOG(3) << "Deleted item: " << item.key << " version: " : << catalog_object.catalog_version; : } else { : VLOG(3) << "Added item: " << item.key << " version: " : << catalog_object.catalog_version; : } > same suggestion as in catalog_server to reduce redundancy. Done http://gerrit.cloudera.org:8080/#/c/7731/2/common/thrift/StatestoreService.thrift File common/thrift/StatestoreService.thrift: Line 86: 3: required bool deleted = false; > Isn't it better to not have a default value to make sure this is always set Done http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java File fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java: Line 56: * determine which catalog objects were deleted since the last catalog update topic. > catalog topic update Done Line 57: * Once the catalog update topic is constructed, the old deleted catalog objects are > catalog topic update Done Line 74: // Retrieve all the removed catalog objects with version > 'fromVersion'. > /** style comment Done http://gerrit.cloudera.org:8080/#/c/7731/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: PS2, Line 152: . > the word "delta" implies that we're dealing with inserts, deletes, and poss Done Line 347: // Identify the catalog objects that were updated (added/dropped) in the catalog with > /** style comment Done Line 351: Set addedCatalogObjectKeys = Sets.newHashSet(); > updatedCatalogObjects? Done Line 363: TCatalogObject catalogTbl = new TCatalogObject(TCatalogObjectType.TABLE, > Why not just iterate over db.getTables()? Done Line 441: addedCatalogObjectKeys.add(CatalogDeltaLog.toCatalogObjectKey(privilege)); > Instead of having a duplicate add() in all places, how about creating the a Done Line 448: for (TCatalogObject removedObject: deltaLog_.retrieveObjects(fromVersion)) { > Nice! Much clearer. Done -- To view, visit http://gerrit.cloudera.org:8080/7731 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I93cb7a033dc8f0d3e0339394b36affe14523274c Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris TsirogiannisGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Vuk Ercegovac Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Dan Hecht has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/8011/10/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: PS10, Line 154: if the scan range has returned eosr before if that was the case, why can we have !eosr at line 152? PS10, Line 155: must why "must"? or do you mean "can't"? PS10, Line 155: this is the first time the function : // is called why is that significant? and why is it okay to have io_buffer_==NULL in that case? PS10, Line 200: DISK_IO_ERROR why not INTERNAL_ERROR, since that's the code that indicates an internal error (i.e. bug). PS10, Line 351: if (boundary_buffer_bytes_left_ > 0 && : (output_buffer_pos_ != _buffer_pos_ || : output_buffer_bytes_left_ != _buffer_bytes_left_)) { : return false; this is effectively a double negative statement, which makes it harder to read. How about applying deMorgan's to remove the double negation: return boundary_buffer_bytes_left_ == 0 || (output_buffer_pos_ == _buffer_pos_ && output_buffer_bytes_left == _buffer_bytes_left_); Alternatively, it may be even easier to read if you just put the DCHECK in there: if (boundary_buffer_bytes_left_ > 0) { DCHECK_EQ(output_buffer_pos_, _buffer_pos_); DCHECK_EQ(output_buffer_bytes_left_, _buffer_bytes_left_); } (which you could also use at line 228-229). Also, is there an invariant for the "else" case w.r.t. io_buffer* that makes sense to check? -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job
Philip Zeyliger has posted comments on this change. Change subject: IMPALA-5905: add script for all-build-options job .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/8043/1/bin/all-build-options.sh File bin/all-build-options.sh: Line 1: #!/usr/bin/env bash Nit: I don't like the file name. all-build-options.sh sounds like it's something that spits out the build options, rather than something that actually then builds using all those options. "build-with-all-options.sh" is closer to what this does. On the other hand, if we have a directory for Jenkins jobs, and the script names are one to one with job names, I have no problem with this name. As a new developer, though, figuring out what's what in bin/ is already a challenge. PS1, Line 35: do : OPTIONS[4]=$BUILD_SHARED_LIBS : if ! ./bin/clean.sh : then : echo "Clean failed" : exit 1 : fi You could remove noclean in line 26 and remove this. It's obviously not exactly the same, but it's supposed to be! -- To view, visit http://gerrit.cloudera.org:8080/8043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Jim Apple Gerrit-Reviewer: Philip Zeyliger Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 19: Code-Review+2 Thanks for the reviews. Carrying +2. -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Hello Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7955 to look at the new patch set (#19). Change subject: IMPALA-5881: Use native allocation while building catalog updates .. IMPALA-5881: Use native allocation while building catalog updates This patch moves the allocation of thrift structures for serializing the output of GetAllCatalogObjects() call into the native side. This is done to prevent the Catalog from hitting JVM array limitations (2GB maximum size) at scale. Additionally this patch also extends the --compact_catalog_top=true to apply TCompactProtocol for the above serialization to reduce the in-memory footprint. This patch also caps the native allocations to not go beyond 4GB due to Thrift library limitations. (Thrift internally uses uint32_t datatype to represent the message size which limits the size to ~4.2GB). Testing: Passed ASAN + HDFS core and DEBUG + HDFS exhaustive. Deployed the patch on a 16 node cluster and tested it on a Catalog-update topic of 3.5GB (uncompressed) or 780MB (compressed). Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 --- M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/rpc/jni-thrift-util.h M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/service/JniCatalog.java A fe/src/main/java/org/apache/impala/thrift/NativeAllocator.java A fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java A fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java A fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java A fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java 12 files changed, 731 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7955/19 -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 18: (8 comments) http://gerrit.cloudera.org:8080/#/c/7955/18/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS18, Line 117: DCHECK_LE(len, numeric_limits::max()); > This DCHECK needs to be run against nbuffer.buffer_len directly. I'm not totally sure about this part, but looks like the static_cast() can silently fail. Either way I think its good to make this check on buffer_len http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: PS18, Line 155: Catalog > nit: catalog Done http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java File fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java: Line 61: nbaos.write(b, offset, len); > try catch and assert if we catch an exception or use the exceptionCaught pa As discussed, not really needed since we expect the write to throw if an error occurs and then it is caught by the test. Line 67: // Makes sure that the memory is freed by the end of nboas.write(). > Expects a write() to fail and checks that the memory is freed after the uns Done Line 71: nbaos.write(b, offset, len); > Add an assert that fails if the write succeeded. Done Line 95: public void nbaosTest() { > NbaosTest Done Line 99: testAllocator = new NativeTestAllocator(); > move into L96 Done Line 100: // Check that initial allocation failure in NBAOS c'tor > remove "that" Done -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5905: add script for all-build-options job
Tim Armstrong has uploaded a new change for review. http://gerrit.cloudera.org:8080/8043 Change subject: IMPALA-5905: add script for all-build-options job .. IMPALA-5905: add script for all-build-options job This checks in a modified version of the job script for https://jenkins.impala.io/view/Experimental/job/all-build-options which adds UBSAN and TSAN. The script is also modified to not reference any jenkins environment variables, e.g. WORKSPACE, since the Jenkins job script intermingled references to those with the script logic. Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34 --- A bin/all-build-options.sh 1 file changed, 57 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/8043/1 -- To view, visit http://gerrit.cloudera.org:8080/8043 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6e78f05c41e3ccd59af599b00e453e7f88b2bb34 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 18: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/7955/18/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS18, Line 117: DCHECK_LE(len, numeric_limits::max()); Is this actually needed? Isn't the cast in the line above going to fail if the condition in the DCHECK is false? http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/main/java/org/apache/impala/service/JniCatalog.java File fe/src/main/java/org/apache/impala/service/JniCatalog.java: PS18, Line 155: Catalog nit: catalog -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump Kudu version to 3f49724
Impala Public Jenkins has posted comments on this change. Change subject: Bump Kudu version to 3f49724 .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] Bump Kudu version to 3f49724
Impala Public Jenkins has submitted this change and it was merged. Change subject: Bump Kudu version to 3f49724 .. Bump Kudu version to 3f49724 Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1 Reviewed-on: http://gerrit.cloudera.org:8080/8040 Reviewed-by: Matthew JacobsTested-by: Impala Public Jenkins --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/8040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 10: I fixed more test errors and will run an exhaustive build in parallel now. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8011 to look at the new patch set (#10). Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. IMPALA-5890: Abort queries if scanner hits IO errors Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer() could leave the stream in an inconsistent state: - The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues a buffer with eosr set. - The ScannerContext::Stream tries to read more bytes, but since it has hit eosr, it tries to read beyond the end of the scan range using DiskIoMgr::Read(). - The previous read error resulted in a new file handle being opened. The now truncated, smaller file causes the seek to fail. - Then during error handling, the BaseSequenceScanner calls SkipToSync() and trips over the NULL pointer in in the IO buffer. In my reproduction this only happens with the file handle cache enabled, which causes Impala to see two different sized handles: the one from the cache when the query starts, and the one after reopening the file. To fix this, we change the I/O manager to always return DISK_IO_ERROR for errors and we abort a query if we receive such an error in the scanner. This change also fixes GetBytesInternal() to maintain the invariant that the output buffer points to the boundary buffer whenever the latter contains some data. I tested this by running the repro from the JIRA and impalad did not crash but aborted the queries. I also ran the repro with abort_on_error=1, and with the file handle cache disabled. Text files are not affected by this problem, since the text scanner doesn't try to recover from errors during ProcessRange() but wraps it in RETURN_IF_ERROR instead. With this change queries abort with the same error. Parquet files are also not affected since they have the metadata at the end. Truncated files immediately fail with this error: WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq' has an invalid version number: Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 --- M be/src/common/status.h M be/src/exec/base-sequence-scanner.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr-test.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/runtime-state.cc M common/thrift/generate_error_codes.py 9 files changed, 106 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/10 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5250: Unify decompressor output length semantics
Alex Behm has posted comments on this change. Change subject: IMPALA-5250: Unify decompressor output_length semantics .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8030/1//COMMIT_MSG Commit Message: Line 26: instead of the "fast" version, for the latter is insecure with corrupted > Is there an individual benchmark? Not sure how useful compression-test.cc is. I was thinking more along the lines of end-to-end tests that have heavy data exchanges. http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress-test.cc File be/src/util/decompress-test.cc: Line 196: } > I'm not sure if my interpretation of JIRA is correct. In my understanding: Your interpretation is correct. I'm just referring to a specific broken file I've seen where I believe the output_len should be 0. Can you please make sure that your change fixes the bug for that specific broken file and check whether these tests cover that scenario? http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress.cc File be/src/util/decompress.cc: Line 426: // If size_only is false, output buffer size must be at least output_len. *output_len is > Done. It is set to 0 in this case, while other decompressor may leave it un Agree this in/out pattern is not great and we should consider your proposal (but not in this patch). I'm ok with setting to 0 everywhere, but it might be easier to just leave output_len untouched. I saw in the code that in some places it is set to 0, but not in others. My main ask is for consistency and documenting the behavior in a comment. -- To view, visit http://gerrit.cloudera.org:8080/8030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 18: Taking a look now. -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1215/ -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan
Alex Behm has posted comments on this change. Change subject: IMPALA-5597: Try casting targetExpr when building runtime filter plan .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/7949/2/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: PS2, Line 866: Thanks for removing http://gerrit.cloudera.org:8080/#/c/7949/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java: Line 585: targetExpr = targetExpr.substitute(smap, analyzer,false); Better to use try/catch and return null in case the re-analysis (part of substitute()) fails for some reason. Space before "false" Line 587: Type srcType = filter.getSrcExpr().getType(); Add a comment explaining why the types must be identical Line 591: } catch(AnalysisException | IllegalArgumentException e){ catch Exception formatting (space after catch, space before "{") Line 594: Preconditions.checkState(targetExpr.getType().equals(srcType)); remove (obviously true from code above) http://gerrit.cloudera.org:8080/#/c/7949/2/testdata/workloads/functional-query/queries/QueryTest/joins.test File testdata/workloads/functional-query/queries/QueryTest/joins.test: Line 774: QUERY Also add a planner test in runtime-filter-propagation.test Line 775: # IMPALA-5597: Tests that a join on clause with lhs and rhs having different types will Comment is wrong. Line 777: select * EE test should go into runtime_row_filters.test -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Alex Behm has posted comments on this change. Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Tianyi Wang has posted comments on this change. Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/8036/2//COMMIT_MSG Commit Message: Line 7: IMPALA-4987: Fix flaky test test_row_availability.py > Sorry to be adamant about grammar nits, but let's get them fixed. Thanks for the careful review. I should definitely improve on this. Line 9: This patch keeps test_row_availbility from random failing. In this test > from randomly failing Done Line 14: coordinator finished sending query to the backends, which means the > sending the query fragments Done http://gerrit.cloudera.org:8080/#/c/8036/2/tests/query_test/test_rows_availability.py File tests/query_test/test_rows_availability.py: Line 1: # Licensed to the Apache Software Foundation (ASF) under one > space Done -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Tianyi Wang has uploaded a new patch set (#3). Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. IMPALA-4987: Fix flaky test test_row_availability.py This patch keeps test_row_availbility from randomly failing. In this test the time interval between the 'Rows available' timeline event and the previous event in the runtime profile is measured in order to make sure that the rows become available after a specific amount of time. This measurement is not correct since the previous event is that the coordinator finished sending the query fragments to the backends, which means the execution on some backends might have already started. This patch tracks another event "Ready to start" as the beginning of the time interval instead. The coordinator begins to send the query fragments to the backends after this event so the time check should always pass. Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 --- M tests/query_test/test_rows_availability.py 1 file changed, 30 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8036/3 -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-5860: upgrade to LLVM 3.9.1
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-5860: upgrade to LLVM 3.9.1 .. IMPALA-5860: upgrade to LLVM 3.9.1 LLVM made a few API changes: * Misc minor changes to function and type signatures * The CloneFunction() API changed semantics (http://reviews.llvm.org/D18628) TODO: need to rebase and update toolchain to a later version. It has gotten behind with the frequent Kudu version bumps. Testing: Ran core and ASAN tests. Perf: Ran single node TPC-H and targeted perf with scale factor 60. Both improved on average. +--+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--+---+-++++ | TPCH(60) | parquet / none / none | 17.82 | -5.01% | 11.64 | -4.23% | +--+---+-++++ +--+--+---++-++++-+---+ | Workload | Query| File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters | +--+--+---++-++++-+---+ | TPCH(60) | TPCH-Q1 | parquet / none / none | 27.97 | 27.59 | +1.36% | 0.39%| 0.41%| 1 | 5 | | TPCH(60) | TPCH-Q20 | parquet / none / none | 5.81 | 5.78| +0.44% | 0.73%| 0.21%| 1 | 5 | | TPCH(60) | TPCH-Q21 | parquet / none / none | 62.98 | 62.98 | +0.01% | 5.56%| 1.07%| 1 | 5 | | TPCH(60) | TPCH-Q15 | parquet / none / none | 8.45 | 8.46| -0.20% | 0.40%| 0.38%| 1 | 5 | | TPCH(60) | TPCH-Q4 | parquet / none / none | 5.57 | 5.59| -0.41% | 0.43%| 0.80%| 1 | 5 | | TPCH(60) | TPCH-Q6 | parquet / none / none | 3.16 | 3.17| -0.45% | 0.78%| 1.70%| 1 | 5 | | TPCH(60) | TPCH-Q5 | parquet / none / none | 7.41 | 7.47| -0.92% | 0.71%| 1.06%| 1 | 5 | | TPCH(60) | TPCH-Q9 | parquet / none / none | 33.45 | 33.78 | -0.99% | 1.15%| 0.85%| 1 | 5 | | TPCH(60) | TPCH-Q11 | parquet / none / none | 2.00 | 2.03| -1.34% | 1.71%| 2.24%| 1 | 5 | | TPCH(60) | TPCH-Q2 | parquet / none / none | 4.71 | 4.79| -1.60% | 1.49%| 1.95%| 1 | 5 | | TPCH(60) | TPCH-Q18 | parquet / none / none | 46.48 | 47.71 | -2.58% | 1.04%| 0.38%| 1 | 5 | | TPCH(60) | TPCH-Q14 | parquet / none / none | 5.85 | 6.02| -2.84% | 0.44%| 0.70%| 1 | 5 | | TPCH(60) | TPCH-Q22 | parquet / none / none | 6.51 | 6.76| -3.71% | 2.29%| 2.42%| 1 | 5 | | TPCH(60) | TPCH-Q19 | parquet / none / none | 7.27 | 7.63| -4.69% | 1.33%| 0.78%| 1 | 5 | | TPCH(60) | TPCH-Q10 | parquet / none / none | 13.19 | 13.84 | -4.73% | 0.42%| 1.44%| 1 | 5 | | TPCH(60) | TPCH-Q13 | parquet / none / none | 21.95 | 23.12 | -5.03% | 0.25%| 1.19%| 1 | 5 | | TPCH(60) | TPCH-Q16 | parquet / none / none | 5.29 | 5.57| -5.04% | 0.85%| 0.78%| 1 | 5 | | TPCH(60) | TPCH-Q7 | parquet / none / none | 42.05 | 44.33 | -5.16% | 2.07%| 2.28%| 1 | 5 | | TPCH(60) | TPCH-Q12 | parquet / none / none | 19.77 | 21.00 | -5.87% | 8.14%| 5.09%| 1 | 5 | | TPCH(60) | TPCH-Q3 | parquet / none / none | 11.46 | 12.32 | -6.94% | 0.76%| 0.53%| 1 | 5 | | TPCH(60) | TPCH-Q17 | parquet / none / none | 40.09 | 49.28 | -18.64% | 2.09%| 0.67%| 1 | 5 | | TPCH(60) | TPCH-Q8 | parquet / none / none | 10.63 | 13.47 | I -21.08% | * 12.34% * | * 21.09% * | 1 | 5 | +--+--+---++-++++-+---+ +---+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---+---+-++++ | TARGETED-PERF(60) | parquet / none / none | 22.38 | -1.24%
[Impala-ASF-CR] IMPALA-3877: support unpatched LLVM
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-3877: support unpatched LLVM .. IMPALA-3877: support unpatched LLVM The p1 patch we use for LLVM avoided merging of structurally identical Struct types in unpredictable ways when linking in IR UDF modules. This avoided hitting type assertions when generating calls to IR UDfs. This implements an alternative solution, which is to bitcast the arguments when calling IR UDFs. This means we do not need to carry the patch when we upgrade LLVM. Testing: Ran core tests with unpatched LLVM 3.8, including the IR UDF test that originally required the patch to pass. Change-Id: I3dfbe44ed8a1464b9b0991fd54e72b194ad6155d --- M be/src/codegen/CMakeLists.txt M be/src/codegen/codegen-anyval.cc A be/src/codegen/codegen-util.cc A be/src/codegen/codegen-util.h M be/src/codegen/llvm-codegen.h M be/src/exprs/expr-codegen-test.cc 6 files changed, 154 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/7973/2 -- To view, visit http://gerrit.cloudera.org:8080/7973 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3dfbe44ed8a1464b9b0991fd54e72b194ad6155d Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
[Impala-ASF-CR] IMPALA-5250: Unify decompressor output length semantics
Tianyi Wang has posted comments on this change. Change subject: IMPALA-5250: Unify decompressor output_length semantics .. Patch Set 1: (18 comments) http://gerrit.cloudera.org:8080/#/c/8030/1//COMMIT_MSG Commit Message: Line 9: This patch makes the semantics output_length parameter in > the semantics of the output_length Done Line 10: Codec::ProcessBlock to be the same. In existing code different > Codec::ProcessBlock() to be the same across all codecs. In the existing cod Done Line 13:to actual decompressed length, but it does not set it to actual > to the actual decompressed length Done Line 16:be exactly the same as actual decompressed length, otherwise > as the actual decompressed length Done Line 19:actual decompressed length and will set it to actual decompressed > the actual decompressed length Done Line 21: This inconsistency leads to a bug where the error message is > This inconsistency lead to a bug where Do you mean changing it to plural? Line 24: decompressors can handle oversized output buffer correctly. > can handle an oversized output buffer correctly. Done Line 25: Lz4Decompressor will use the "safe" version of decompression function > will use the "safe" instead of the "fast" decompression function Done Line 26: instead of the "fast" version, for the latter is insecure with corrupted > We use LZ4 for compressing exchange data, so we check the perf impact of th Is there an individual benchmark? There is one for compression in be/src/experiments/compression-test.cc. Line 27: data and requires decompressed length to be known. > and requires the decompressed length to be known Done http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress-test.cc File be/src/util/decompress-test.cc: Line 153: // Test the behavior when the decompressor is given too little / too much space > "." at end of sentence Done Line 155: // correct output size when the space is enough, and does not write beyond the output > the correct output size Done Line 157: void DecompressOverUnderSizedOutputBuffer(Codec *compressor, Codec *decompressor, > Out style is "Codec* compressor" Done Line 196: } > I think we should test more directly the scenario mentioned in the JIRA whe I'm not sure if my interpretation of JIRA is correct. In my understanding: 1. The parquet file is corrupted so either current_page_header_.uncompressed_page_size is greater or current_page_header_.compressed_page_size is less than what it should be. 2. Either way we allocated a larger buffer and the decompressor decompressed the (probably incomplete) data successfully. 3. The part of the buffer not written by the decompressor is uninitilized. Because the decompressor returned OK, somehow we read this uninitilized part later and trigger undeterministic errors. The problem here is in 2: The decompressor should report how much data it decompressed even if the decompression is successful. Then ReadDataPage() can check whether it is the same as current_page_header_.uncompressed_page_size. So I think we should test whether output_len is set properly with an successful decompression. I don't understand "set to 0" part. When should it be set to 0? http://gerrit.cloudera.org:8080/#/c/8030/1/be/src/util/decompress.cc File be/src/util/decompress.cc: Line 426: // If size_only is false, output buffer size must be at least output_len. *output_len is > Mention what output_len is set to if a non-OK status is returned. Done. It is set to 0 in this case, while other decompressor may leave it unmodified or set it to an arbitrary intermediate value. I think the complication here is caused by the design that 'output_len' and 'output' are output parameters. When a parameter is both input and output, and could be passed as reference further into callees of callee, there could be all kinds of bugs forgetting to set the correct value here and there. I think if output and output_len is in the return value, creating such bugs is much more difficult, and the code will be much easier to follow since the data flow is clear and explicit. I think we should have a Status type for return values. It represents either a value or an error status, and then we can avoid output parameters in new codes. Line 427: // also updated with actual output size. > the actual output size Done Line 575: // LZ4_decompress_fast will cause a segmentation fault if passed a nullptr output. > Comment still relevant? Removed. Line 579: if (ret < 0) { > single line Done -- To view, visit http://gerrit.cloudera.org:8080/8030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-5250: Unify decompressor output length semantics
Tianyi Wang has uploaded a new patch set (#2). Change subject: IMPALA-5250: Unify decompressor output_length semantics .. IMPALA-5250: Unify decompressor output_length semantics This patch makes the semantics of the output_length parameter in Codec::ProcessBlock to be the same across all codecs. In existing code different decompressor treats output_length differently: 1. SnappyDecompressor needs output_length to be greater than or equal to the actual decompressed length, but it does not set it to the actual decompressed length after decompression. 2. SnappyBlockDecompressor and Lz4Decompressor require output_length to be exactly the same as the actual decompressed length, otherwise decompression fails. 3. Other decompressors need output_length to be greater than or equal to the actual decompressed length and will set it to actual decompressed length if oversized. This inconsistency leads to a bug where the error message is undeterministic when the compressed block is corrupted. This patch makes all decompressor behave as 3 and adds a testcase checking that decompressors can handle an oversized output buffer correctly. Lz4Decompressor will use the "safe" instead of the "fast" decompression function, for the latter is insecure with corrupted data and requires the decompressed length to be known. Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993 --- M be/src/util/decompress-test.cc M be/src/util/decompress.cc 2 files changed, 31 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/30/8030/2 -- To view, visit http://gerrit.cloudera.org:8080/8030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd42942b169921a7eb53940c3762bc45bb82a993 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Alex Behm has posted comments on this change. Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/8036/2//COMMIT_MSG Commit Message: Line 7: IMPALA-4987: Fix flaky test test_row_availability.py Sorry to be adamant about grammar nits, but let's get them fixed. Line 9: This patch keeps test_row_availbility from random failing. In this test from randomly failing Line 14: coordinator finished sending query to the backends, which means the sending the query fragments http://gerrit.cloudera.org:8080/#/c/8036/2/tests/query_test/test_rows_availability.py File tests/query_test/test_rows_availability.py: Line 1: # Licensed to the Apache Software Foundation (ASF) under one space -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Alex Behm has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 18: Code-Review+1 (6 comments) Dimitris, any more comments? http://gerrit.cloudera.org:8080/#/c/7955/18/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java File fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java: Line 61: nbaos.write(b, offset, len); try catch and assert if we catch an exception or use the exceptionCaught pattern you use for the c'tor test below Line 67: // Makes sure that the memory is freed by the end of nboas.write(). Expects a write() to fail and checks that the memory is freed after the unsuccessful write(). Line 71: nbaos.write(b, offset, len); Add an assert that fails if the write succeeded. Line 95: public void nbaosTest() { NbaosTest Line 99: testAllocator = new NativeTestAllocator(); move into L96 Line 100: // Check that initial allocation failure in NBAOS c'tor remove "that" -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] PREVIEW: IMPALA-3437: DECIMAL V2: avoid implicit decimal->double conversion
Greg Rahn has posted comments on this change. Change subject: PREVIEW: IMPALA-3437: DECIMAL_V2: avoid implicit decimal->double conversion .. Patch Set 4: Code-Review+1 LGTM -- To view, visit http://gerrit.cloudera.org:8080/7916 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie419a75784eec2294947103e6e1465dfadfc29da Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim ArmstrongGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Greg Rahn Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps
Tim Armstrong has posted comments on this change. Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps .. Patch Set 2: Code-Review-1 I think that the documented solution is not the correct solution to the problem. It may work by coincidence but relies on some very brittle assumptions about Impala exporting boost functions that have binary compatibility with the version of boost used by the UDF. In future we will likely remove, update or modify the version of boost used in Impala. -- To view, visit http://gerrit.cloudera.org:8080/7983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 9: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Bharath Vissapragada has uploaded a new patch set (#18). Change subject: IMPALA-5881: Use native allocation while building catalog updates .. IMPALA-5881: Use native allocation while building catalog updates This patch moves the allocation of thrift structures for serializing the output of GetAllCatalogObjects() call into the native side. This is done to prevent the Catalog from hitting JVM array limitations (2GB maximum size) at scale. Additionally this patch also extends the --compact_catalog_top=true to apply TCompactProtocol for the above serialization to reduce the in-memory footprint. This patch also caps the native allocations to not go beyond 4GB due to Thrift library limitations. (Thrift internally uses uint32_t datatype to represent the message size which limits the size to ~4.2GB). Testing: Passed ASAN + HDFS core and DEBUG + HDFS exhaustive. Deployed the patch on a 16 node cluster and tested it on a Catalog-update topic of 3.5GB (uncompressed) or 780MB (compressed). Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 --- M be/src/catalog/catalog.cc M be/src/catalog/catalog.h M be/src/rpc/jni-thrift-util.h M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/JniCatalog.thrift M fe/src/main/java/org/apache/impala/service/JniCatalog.java A fe/src/main/java/org/apache/impala/thrift/NativeAllocator.java A fe/src/main/java/org/apache/impala/thrift/NativeByteArrayOutputStream.java A fe/src/main/java/org/apache/impala/thrift/TNativeSerializer.java A fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java A fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java 12 files changed, 731 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/7955/18 -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5881: Use native allocation while building catalog updates
Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-5881: Use native allocation while building catalog updates .. Patch Set 17: (7 comments) Fixed a subtle bug in the test allocator. http://gerrit.cloudera.org:8080/#/c/7955/17/fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java File fe/src/test/java/org/apache/impala/thrift/NativeByteArrayOutputStreamTest.java: Line 40: || size >= NativeByteArrayOutputStream.BUFFER_MAX_SIZE_LIMIT) { > Our real allocator would not OOM if the size is big, at least not in this p Done Line 60: private void writeAndCheck(NativeByteArrayOutputStream nboas, > I'd prefer to split this up into writeOk() and writeNotOk(). Otherwise, tes Done Line 78: NativeByteArrayOutputStream nboas = > The first allocation happens here in the c'tor. We need to test that as wel Done Line 90: nboas = new NativeByteArrayOutputStream(testAllocator); > nit: nbaos Done Line 108: > remove extraneous newines Done Line 114: testAllocator = new NativeTestAllocator(); > Add a helper function for these smaller tests that creates a new allocator, Done http://gerrit.cloudera.org:8080/#/c/7955/17/fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java File fe/src/test/java/org/apache/impala/thrift/TNativeSerializerTest.java: Line 81: public void testBasicSerialization() > We usually call these TestBasicSerialization (first letter is uppercase) Done -- To view, visit http://gerrit.cloudera.org:8080/7955 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I383684effa9524734ce3c6c0fb7ed37de0e15782 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath VissapragadaGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Tianyi Wang has posted comments on this change. Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. Patch Set 1: (13 comments) http://gerrit.cloudera.org:8080/#/c/8036/1//COMMIT_MSG Commit Message: Line 9: This patch keeps test_row_availbility from random failure. In this test > Please take a pass to correct wording/grammar. For example, it should be: Done Line 10: the time interval between 'Rows available' event and the previous event > the 'Rows available' timeline event Done Line 11: in runtime profile is measured in order to make sure that rows become > in the runtime profile Done Line 12: available after a specific amount of time. This is not correct since > Instead of 'This' try to rephrase/repeat what 'This' refers to for clarity, Done Line 13: the previous event is that the coordinator finishes sending query to > finished sending the query to the backends Done Line 14: backends, which means the execution on backend might have already > on some backends might have already started Done Line 15: started. This patch tracks another event "ready to start" as the > "Read to start" Done Line 17: query to backends after this event so the time check should always pass. > the query to backends Done http://gerrit.cloudera.org:8080/#/c/8036/1/tests/query_test/test_rows_availability.py File tests/query_test/test_rows_availability.py: Line 81: row_avail_time_ms = self.__parse_time_ms(self.__find_time(line)) > rows_avail_time_ms (not just one row is available) Done Line 92: "'ready to start' event.\nExpected the event to be marked no earlier than "\ > 'Ready to start' Done Line 93: "%sms after the 'ready to start' event.\nQuery: %s"\ > 'Ready to start' Done Line 99: """Find event time point in a line from runtime timeline.""" > from the runtime profile timeline Done Line 100: # Example line: "- Rows available: 3s311ms (2s300ms)" > Example should include what this function returns Done -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options
Philip Zeyliger has posted comments on this change. Change subject: IMPALA-5736: Add impala-shell argument to set default query options .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8038/1//COMMIT_MSG Commit Message: Line 12: Examples: I think you can add a test in tests/shell/test_shell_interactive.py. test_var_substitution() has an example where options are provided. I didn't see an ~/.impalarc test. @pytest.mark.execute_serially def test_var_substitution(self): cmds = open(os.path.join(QUERY_FILE_PATH, 'test_var_substitution.sql')).read() args = '''--var=foo=123 --var=BAR=456 --delimited "--output_delimiter= " ''' result = run_impala_shell_interactive(cmds, shell_args=args) assert_var_substitution(result) PS1, Line 16: .impalarc: : [impala] : query_options=EXPLAIN_LEVEL=2,MT_DOP=2 This might be hard to do, but I think [impala] [[query_options]] EXPLAIN_LEVEL=2 MT_DOP=2 may be better. http://gerrit.cloudera.org:8080/#/c/8038/1/shell/option_parser.py File shell/option_parser.py: PS1, Line 160: parser.add_option("--var", dest="keyval", action="append", : help="Define variable(s) to be used within the Impala session." : " It must follow the pattern \"KEY=VALUE\"," : " KEY starts with an alphabetic character and" : " contains alphanumeric characters or underscores.") : parser.add_option("--query_options", dest="query_options", : help="Sets default query options. The format is comma separated" : A little bit of bike-shedding: Impala-shell seems to support --var a=b --var c=d, but for query options it's "--query_options=a=b,c=d". I think the former approach is preferable, because it means we don't have to worry about commas in settings. It's also consistent with --var. -- To view, visit http://gerrit.cloudera.org:8080/8038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4987: Fix flaky test test row availability.py
Tianyi Wang has uploaded a new patch set (#2). Change subject: IMPALA-4987: Fix flaky test test_row_availability.py .. IMPALA-4987: Fix flaky test test_row_availability.py This patch keeps test_row_availbility from random failing. In this test the time interval between the 'Rows available' timeline event and the previous event in the runtime profile is measured in order to make sure that the rows become available after a specific amount of time. This measurement is not correct since the previous event is that the coordinator finished sending query to the backends, which means the execution on some backends might have already started. This patch tracks another event "Ready to start" as the beginning of time interval instead. The coordinator begins to send the query to the backends after this event so the time check should always pass. Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 --- M tests/query_test/test_rows_availability.py 1 file changed, 31 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8036/2 -- To view, visit http://gerrit.cloudera.org:8080/8036 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I96142f1868a26426cbc34aa9a0e0a56979df66c3 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-5597: Try casting targetExpr when building runtime filter plan
Tianyi Wang has uploaded a new patch set (#2). Change subject: IMPALA-5597: Try casting targetExpr when building runtime filter plan .. IMPALA-5597: Try casting targetExpr when building runtime filter plan This patch fixes a bug that fails a precondition check when generating runtime filter plans. The lhs and rhs or join predicate might have different types when the eq predicate function accepts wildcard-typed parameters. In this case in existing code the types of source and target expr will be found mismatch and an exception will be thrown when generating runtime filters. This patch tries to cast target expr to be of the same type as source expr. A testcase is added to joins.test Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java M testdata/workloads/functional-query/queries/QueryTest/joins.test 5 files changed, 20 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/7949/2 -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-5597: Check predicate children types when building runtime filter plan
Tianyi Wang has restored this change. Change subject: IMPALA-5597: Check predicate children types when building runtime filter plan .. Restored A new fix is implemented. -- To view, visit http://gerrit.cloudera.org:8080/7949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: restore Gerrit-Change-Id: I0d66673e280ce13cd3a514236c0c2ecbc50475a6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
Re: [Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps
I've been talking to John out of band about this. I had some more meta concerns about documenting this and whether it will encourage customers to do dangerous things. Need to organise my thoughts and enumerate concerns. On 12 Sep. 2017 9:42 am, "Matthew Jacobs (Code Review)"wrote: > Matthew Jacobs has posted comments on this change. > > Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps > .. > > > Patch Set 2: > > (3 comments) > > http://gerrit.cloudera.org:8080/#/c/7983/2/docs/shared/impala_common.xml > File docs/shared/impala_common.xml: > > PS2, Line 848: our > who is 'our' here? > > > PS2, Line 850: , > : which matches the representation for > TIMESTAMP in Impala > , and is required in order to use TIMESTAMP. > > > Line 855: > it might be good to just say that you should probably just define this > variable always, unless you have a really good reason not to > > > -- > To view, visit http://gerrit.cloudera.org:8080/7983 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83 > Gerrit-PatchSet: 2 > Gerrit-Project: Impala-ASF > Gerrit-Branch: master > Gerrit-Owner: John Russell > Gerrit-Reviewer: John Russell > Gerrit-Reviewer: Matthew Jacobs > Gerrit-Reviewer: Michael Ho > Gerrit-Reviewer: Taras Bobrovytsky > Gerrit-Reviewer: Tim Armstrong > Gerrit-HasComments: Yes >
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has uploaded a new patch set (#9). Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. IMPALA-5890: Abort queries if scanner hits IO errors Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer() could leave the stream in an inconsistent state: - The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues a buffer with eosr set. - The ScannerContext::Stream tries to read more bytes, but since it has hit eosr, it tries to read beyond the end of the scan range using DiskIoMgr::Read(). - The previous read error resulted in a new file handle being opened. The now truncated, smaller file causes the seek to fail. - Then during error handling, the BaseSequenceScanner calls SkipToSync() and trips over the NULL pointer in in the IO buffer. In my reproduction this only happens with the file handle cache enabled, which causes Impala to see two different sized handles: the one from the cache when the query starts, and the one after reopening the file. To fix this, we change the I/O manager to always return DISK_IO_ERROR for errors and we abort a query if we receive such an error in the scanner. This change also fixes GetBytesInternal() to maintain the invariant that the output buffer points to the boundary buffer whenever the latter contains some data. I tested this by running the repro from the JIRA and impalad did not crash but aborted the queries. I also ran the repro with abort_on_error=1, and with the file handle cache disabled. Text files are not affected by this problem, since the text scanner doesn't try to recover from errors during ProcessRange() but wraps it in RETURN_IF_ERROR instead. With this change queries abort with the same error. Parquet files are also not affected since they have the metadata at the end. Truncated files immediately fail with this error: WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq' has an invalid version number: Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 --- M be/src/common/status.h M be/src/exec/base-sequence-scanner.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/runtime-state.cc M common/thrift/generate_error_codes.py 8 files changed, 100 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/9 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/8011/8/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: Line 180: state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR, > Unfortunately it looks like when abort_on_error=1 we depend on this status Done. I opened IMPALA-5922 to track this and left a TODO in the code. -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Dan Hecht has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/8011/8/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: Line 180: state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR, > Unfortunately it looks like when abort_on_error=1 we depend on this status Hm, that's unfortunate. I wonder how many of these are already reporting filename/offset, but yeah, until we fix them all we'll have to keep the old structure. Let's be sure to continue to chip away at these kind of things over time, though. And let's leave a TODO explaining where we'd like to get to (use LogOrReturnError()). -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump Kudu version to 3f49724
Impala Public Jenkins has posted comments on this change. Change subject: Bump Kudu version to 3f49724 .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1214/ -- To view, visit http://gerrit.cloudera.org:8080/8040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5664: Unix time to timestamp conversions may crash Impala
Lars Volker has posted comments on this change. Change subject: IMPALA-5664: Unix time to timestamp conversions may crash Impala .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7954/2/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: PS2, Line 81: ptime Have you considered catching the error when the invalid ptime object is created? What is the benefit of doing it here? -- To view, visit http://gerrit.cloudera.org:8080/7954 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77b2f6284d3a597f57e61c17a67c959eff9e38ff Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options
Lars Volker has posted comments on this change. Change subject: IMPALA-5736: Add impala-shell argument to set default query options .. Patch Set 1: (1 comment) Looks good to me, only a minor comment. http://gerrit.cloudera.org:8080/#/c/8038/1/shell/impala_shell.py File shell/impala_shell.py: PS1, Line 1308: keyvals Can you inline this? -- To view, visit http://gerrit.cloudera.org:8080/8038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba RinghoferGerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Pranay Singh has uploaded a new patch set (#2). Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder Currently DictDecoder class and DictEncoder class uses std::vector to store the tables mapping codeword to value and vice-versa. It is hard to detect the memory usage by these tables when they becomes very large, since this memory is not accounted by Impala's memory mangement infrastructure. This patch uses existing dictionary_pool_ to track the memory used by std::vector in DictDecoder class, whereas an existing memory pool for DictEncoder class is used when allocating memory for std::vector in DictEncoder class. Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 --- M be/src/exec/parquet-column-readers.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc 3 files changed, 42 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8034/2 -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnellGerrit-Reviewer: Pranay Singh
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Pranay Singh has posted comments on this change. Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/exec-node.h File be/src/exec/exec-node.h: PS1, Line 215: MemTracker* decoder_mem_tracker() { return decoder_mem_tracker_.get(); } > The dictionary is only used for Parquet, so I think the MemPool should not I also think using existing memory pool, dictionary_pool_ is a better option, I missed it. I'll use dictionary_pool_ instead http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS1, Line 213: dict_decoder_(new MemPool(parent->scan_node_->decoder_mem_tracker())), > It is important not to create objects on a per-column basis unless truly ne I'll use dictionary_pool_ instead of creating a new memory pool for every column reads http://gerrit.cloudera.org:8080/#/c/8034/1/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: PS1, Line 234: *val_ptr = *dict_[index]; > dict_ needs to return T's directly rather than T*'s. This is a performance I'll use memcpy to copy the contents of dictionary to the buffer passed. -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnellGerrit-Reviewer: Pranay Singh Gerrit-HasComments: Yes
[Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps
Matthew Jacobs has posted comments on this change. Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/7983/2/docs/shared/impala_common.xml File docs/shared/impala_common.xml: PS2, Line 848: our who is 'our' here? PS2, Line 850: , : which matches the representation for TIMESTAMP in Impala , and is required in order to use TIMESTAMP. Line 855: it might be good to just say that you should probably just define this variable always, unless you have a really good reason not to -- To view, visit http://gerrit.cloudera.org:8080/7983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: John Russell Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Taras Bobrovytsky Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] Bump Kudu version to 3f49724
Matthew Jacobs has posted comments on this change. Change subject: Bump Kudu version to 3f49724 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/8040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder
Pranay Singh has uploaded a new patch set (#2). Change subject: IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder .. IMPALA-5522:Use tracked memory for DictDecoder and DictEncoder Currently DictDecoder class and DictEncoder class uses std::vector to store the tables mapping codeword to value and vice-versa. It is hard to detect the memory usage by these tables when they becomes very large, since this memory is not accounted by Impala's memory mangement infrastructure. This patch uses existing dictionary_pool_ to track the memory used by std::vector in DictDecoder class, whereas an existing memory pool for DictEncoder class is used when allocating memory for std::vector in DictEncoder class. Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 --- M be/src/exec/parquet-column-readers.cc M be/src/util/dict-encoding.h M be/src/util/dict-test.cc 3 files changed, 42 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/8034/2 -- To view, visit http://gerrit.cloudera.org:8080/8034 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I02a3b54f6c107d19b62ad9e1c49df94175964299 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Joe McDonnell
[Impala-ASF-CR] Bump Kudu version to 3f49724
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/8040 Change subject: Bump Kudu version to 3f49724 .. Bump Kudu version to 3f49724 Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1 --- M bin/impala-config.sh 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/8040/1 -- To view, visit http://gerrit.cloudera.org:8080/8040 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I21aff562e2bca90436a8d0206ffca44b712bc1f1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Tim Armstrong has posted comments on this change. Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/8011/8/be/src/exec/base-sequence-scanner.cc File be/src/exec/base-sequence-scanner.cc: Line 180: state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR, Unfortunately it looks like when abort_on_error=1 we depend on this status wrapping to add a filename and offset for some parse errors, e.g. "Table schema is not a record". I think the filename is pretty essential to understand and fix any errors so I think we should be careful not to drop it. Ideally I think all parse errors from the scanners would just include that context when original constructed. Not sure how much of a project it would be to go through all the sequence scanners and fix that. It's much cleaner using LogOrReturnError(). Maybe in the meantime we should just do: if (!status.IsCancelled() && !status.IsMemLimitExceeded() && !status.IsDiskIoError()) { state_->LogError(ErrorMsg(TErrorCode::SEQUENCE_SCANNER_PARSE_ERROR, ...) } http://gerrit.cloudera.org:8080/#/c/8011/6/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 544: return Status(TErrorCode::DISK_IO_ERROR, > Should we re-use DISK_IO_ERROR for those, even though they're technically n I think it makes sense to use DISK_IO_ERROR for now if we interpret its meaning broadly as "an error that prevented the disk I/O manager completing a request range". -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5856: Fix outer join predicate assignment.
Alex Behm has uploaded a new change for review. http://gerrit.cloudera.org:8080/8039 Change subject: IMPALA-5856: Fix outer join predicate assignment. .. IMPALA-5856: Fix outer join predicate assignment. Fixes incorrect assignment of join predicates with the following properties: - from the On-clause of a left outer join - only references the left-hand side tuples (not the right hand side tuple) - references full-outer joined tuples; the full outer join appears on the left Testing: - a core/hdfs run passed - added new regression test Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test 2 files changed, 43 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/8039/1 -- To view, visit http://gerrit.cloudera.org:8080/8039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I93db34d988cb66e00aa05d7dc161e0ca47042acb Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm
[Impala-ASF-CR] IMPALA-5736: Add impala-shell argument to set default query options
Csaba Ringhofer has uploaded a new change for review. http://gerrit.cloudera.org:8080/8038 Change subject: IMPALA-5736: Add impala-shell argument to set default query options .. IMPALA-5736: Add impala-shell argument to set default query options Query options can be set from command line and impala rc as comma separated key=value pairs. Examples: command line: impala-shell.sh --query_options=EXPLAIN_LEVEL=2,MT_DOP=2 .impalarc: [impala] query_options=EXPLAIN_LEVEL=2,MT_DOP=2 Note that the option set in command line will overwrite the one in impalarc instead of updating it as a set. For example, if impalarc contains "query_options=EXPLAIN_LEVEL=2", and the shell command contains --query_options="MT_DOP=2", only MT_DOP will be set, and EXPLAIN_LEVEL will use its default value. Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 --- M shell/impala_shell.py M shell/impala_shell_config_defaults.py M shell/option_parser.py 3 files changed, 13 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8038/1 -- To view, visit http://gerrit.cloudera.org:8080/8038 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I26a3b67230c80a99bd246b6af205d558fec9a986 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Csaba Ringhofer
[Impala-ASF-CR] IMPALA-5890: Abort queries if scanner hits IO errors
Lars Volker has uploaded a new patch set (#8). Change subject: IMPALA-5890: Abort queries if scanner hits IO errors .. IMPALA-5890: Abort queries if scanner hits IO errors Prior to this fix, an error in ScannerContext::Stream::GetNextBuffer() could leave the stream in an inconsistent state: - The DiskIoMgr hits EOF unexpected, cancels the scan range and enqueues a buffer with eosr set. - The ScannerContext::Stream tries to read more bytes, but since it has hit eosr, it tries to read beyond the end of the scan range using DiskIoMgr::Read(). - The previous read error resulted in a new file handle being opened. The now truncated, smaller file causes the seek to fail. - Then during error handling, the BaseSequenceScanner calls SkipToSync() and trips over the NULL pointer in in the IO buffer. In my reproduction this only happens with the file handle cache enabled, which causes Impala to see two different sized handles: the one from the cache when the query starts, and the one after reopening the file. To fix this, we change the I/O manager to always return DISK_IO_ERROR for errors and we abort a query if we receive such an error in the scanner. This change also fixes GetBytesInternal() to maintain the invariant that the output buffer points to the boundary buffer whenever the latter contains some data. I tested this by running the repro from the JIRA and impalad did not crash but aborted the queries. I also ran the repro with abort_on_error=1, and with the file handle cache disabled. Text files are not affected by this problem, since the text scanner doesn't try to recover from errors during ProcessRange() but wraps it in RETURN_IF_ERROR instead. With this change queries abort with the same error. Parquet files are also not affected since they have the metadata at the end. Truncated files immediately fail with this error: WARNINGS: File 'hdfs://localhost:20500/test-warehouse/tpch.partsupp_parquet/foo.0.parq' has an invalid version number: Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/scanner-context.cc M be/src/exec/scanner-context.h M be/src/runtime/disk-io-mgr-scan-range.cc M be/src/runtime/disk-io-mgr.cc M be/src/runtime/runtime-state.cc M common/thrift/generate_error_codes.py 7 files changed, 86 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/8011/8 -- To view, visit http://gerrit.cloudera.org:8080/8011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I44dc95184c241fbcdbdbebad54339530680d3509 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars VolkerGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4082: Remove todo item in getRegionsInRange
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4082: Remove todo item in getRegionsInRange .. IMPALA-4082: Remove todo item in getRegionsInRange HbaseTable.GetRegionsInRange is a function copied and modified from HTable in Hbase 0.95.1 or ealier. The HTable function uses cached region location and the modified version gets more up-to-date information. There is a todo item for the removal of this modified function if Hbase provides the same functionality itself. In Hbase 0.95.2, the HTable function is renamed to getKeysAndRegionsInRange and in Hbase 0.99 it became a private function. Thus this todo item is no longer needed and is removed by this patch. Change-Id: I35f0a3cd9363b55b3cde237beaf3037a7967582a Reviewed-on: http://gerrit.cloudera.org:8080/8018 Reviewed-by: Lars VolkerReviewed-by: Alex Behm Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java 1 file changed, 8 insertions(+), 8 deletions(-) Approvals: Impala Public Jenkins: Verified Lars Volker: Looks good to me, but someone else must approve Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/8018 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I35f0a3cd9363b55b3cde237beaf3037a7967582a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4082: Remove todo item in getRegionsInRange
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4082: Remove todo item in getRegionsInRange .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8018 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I35f0a3cd9363b55b3cde237beaf3037a7967582a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5894: [DOCS] Clarify placement of STRAIGHT JOIN hint
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint .. IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint "immediately after the SELECT keyword" was mentioned in a few places for STRAIGHT_JOIN. I reworded all instances to mention that [DISTINCT | ALL] can also come before the hint name. Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a Reviewed-on: http://gerrit.cloudera.org:8080/8031 Reviewed-by: Alex BehmTested-by: Impala Public Jenkins --- M docs/shared/impala_common.xml M docs/topics/impala_hints.xml M docs/topics/impala_perf_joins.xml 3 files changed, 12 insertions(+), 5 deletions(-) Approvals: Impala Public Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/8031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John Russell Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-5894: [DOCS] Clarify placement of STRAIGHT JOIN hint
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No
[Impala-ASF-CR] [DOCS] Explain Boost setting needed for 96-bit timestamps
John Russell has posted comments on this change. Change subject: [DOCS] Explain Boost setting needed for 96-bit timestamps .. Patch Set 2: Tim and Michael, I added you as reviewers based on your frequent activity in be/src/exprs/. If someone else is more familiar with the compilation aspect w.r.t. TIMESTAMP, please suggest any other reviewer. -- To view, visit http://gerrit.cloudera.org:8080/7983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4b67cd7762f682c3a054e0d9641080aa51801c83 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: John Russell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5894: [DOCS] Clarify placement of STRAIGHT JOIN hint
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-5894: [DOCS] Clarify placement of STRAIGHT_JOIN hint .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-docs-submit/153/ -- To view, visit http://gerrit.cloudera.org:8080/8031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3cac1afccc132f389b2017ad217fdf7e7b04513a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: John RussellGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-HasComments: No