[Impala-ASF-CR] IMPALA-9882: Import KLL functionality from Apache DataSketches
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16196 to look at the new patch set (#6). Change subject: IMPALA-9882: Import KLL functionality from Apache DataSketches .. IMPALA-9882: Import KLL functionality from Apache DataSketches First, I updated our existing snapshot of DataSketches to the following commit: c67d92faad3827932ca3b5d864222e64977f2c20 "Merge pull request #166 from gaborkaszab/const_cast" This affects files originating from the kll/ and common/ directories of the DataSketches repo. Then I copied all the files needed for KLL into our snapshot directory. You can find the original Apache DataSketches files here: https://github.com/apache/incubator-datasketches-cpp This new snapshot, however, broke the interface we used for serializing hll_union objects by dropping serialize_compact(). As a solution, I had to change the serialization and merging phases of the union operator so that they serialize the underlying hll_sketch instead of hll_union itself. 
Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943 --- M be/src/exprs/CMakeLists.txt M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/datasketches-test.cc M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp D be/src/thirdparty/datasketches/CommonUtil.hpp M be/src/thirdparty/datasketches/CompositeInterpolationXTable-internal.hpp M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp M be/src/thirdparty/datasketches/CouponList-internal.hpp M be/src/thirdparty/datasketches/Hll4Array-internal.hpp M be/src/thirdparty/datasketches/HllArray-internal.hpp M be/src/thirdparty/datasketches/HllSketch-internal.hpp M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp M be/src/thirdparty/datasketches/HllUnion-internal.hpp M be/src/thirdparty/datasketches/HllUtil.hpp M be/src/thirdparty/datasketches/MurmurHash3.h M be/src/thirdparty/datasketches/README.md A be/src/thirdparty/datasketches/bounds_binomial_proportions.hpp A be/src/thirdparty/datasketches/common_defs.hpp A be/src/thirdparty/datasketches/count_zeros.hpp M be/src/thirdparty/datasketches/hll.hpp A be/src/thirdparty/datasketches/kll_helper.hpp A be/src/thirdparty/datasketches/kll_helper_impl.hpp A be/src/thirdparty/datasketches/kll_quantile_calculator.hpp A be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp A be/src/thirdparty/datasketches/kll_sketch.hpp A be/src/thirdparty/datasketches/kll_sketch_impl.hpp A be/src/thirdparty/datasketches/memory_operations.hpp A be/src/thirdparty/datasketches/serde.hpp 29 files changed, 3,280 insertions(+), 347 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/16196/6 -- To view, visit http://gerrit.cloudera.org:8080/16196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943 Gerrit-Change-Number: 16196 
Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/16123/9//COMMIT_MSG Commit Message: PS9: note to self: need to focus on tests http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: | hash predicates: bigint_col IS NOT DISTINCT FROM functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM functional.alltypestiny.float_col, id IS NOT DISTINCT FROM functional.alltypestiny.id, int_col IS NOT DISTINCT FROM functional.alltypestiny.int_col, month IS NOT DISTINCT FROM functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM functional.alltypestiny.year, string_col IS NOT DISTINCT FROM functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM functional.alltypestiny.date_string_col > Actually, I was not referring to planning time but the execution time. I h Yeah it does add overhead - with the regular equality predicates, we don't insert or probe with rows with null join keys, so the null check is omitted. 
In general it would be helpful to have more nullability info since there are a lot of null checks in the compiled code (basically every SlotRef expr) -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 8 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 06:27:51 + Gerrit-HasComments: Yes
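The difference between the two predicate kinds discussed above can be modelled in a few lines. This is only an illustrative sketch in Python, not Impala's codegen'd comparison; the helper names `sql_equals`, `is_not_distinct_from` and `keys_match` are hypothetical, and SQL NULL is represented by `None`.

```python
from typing import Optional, Sequence

Key = Sequence[Optional[int]]


def sql_equals(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """Model of SQL '=': the result is NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def is_not_distinct_from(a: Optional[int], b: Optional[int]) -> bool:
    """Model of IS NOT DISTINCT FROM: two NULLs compare as equal."""
    if a is None or b is None:
        # This is the extra per-column null check mentioned in the thread.
        return a is None and b is None
    return a == b


def keys_match(left: Key, right: Key, null_safe: bool) -> bool:
    """Compare multi-column join keys column by column."""
    compare = is_not_distinct_from if null_safe else sql_equals
    return all(compare(l, r) is True for l, r in zip(left, right))


if __name__ == "__main__":
    print(keys_match((1, None), (1, None), null_safe=True))
    print(keys_match((1, None), (1, None), null_safe=False))
```

With plain equality, a key containing NULL can never match, which is why such rows need not be inserted or probed at all; the null-safe variant has to keep them and check every column.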
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN] > From what I've seen the biggest killer in these situations is with plan tim Actually, I was not referring to planning time but the execution time. I haven't done a measurement but I would imagine the cpu cost of IS NOT DISTINCT to be a bit more than the equality comparison because of the null == null check for each row and potentially many columns. Something to evaluate in the future. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 05:41:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16212 ) Change subject: IMPALA-9929: Subquery error should throw AnalysisException .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16212 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338 Gerrit-Change-Number: 16212 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 05:37:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16212 ) Change subject: IMPALA-9929: Subquery error should throw AnalysisException .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6170/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16212 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338 Gerrit-Change-Number: 16212 Gerrit-PatchSet: 3 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 05:37:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16212 ) Change subject: IMPALA-9929: Subquery error should throw AnalysisException .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16212 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338 Gerrit-Change-Number: 16212 Gerrit-PatchSet: 3 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 05:37:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16212 ) Change subject: IMPALA-9929: Subquery error should throw AnalysisException .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6698/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16212 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338 Gerrit-Change-Number: 16212 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 05:05:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 9: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6697/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 05:03:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 9: (15 comments) Thanks for all the test suggestions guys! http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2544 PS8, Line 2544: // nonterminal making this issue unresolvable. We rely on the left precedence of > Not your change, but maybe drop a reference to IMPALA-4741 in here. Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2546 PS8, Line 2546: // select_stmt (i.e., ORDER BY and LIMIT bind to the select_stmt by default, and not the > Some of the wordings in this comment needs to be updated to remove referenc Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@348 PS8, Line 348: public boolean setOperationNeedsRewrite = false; > It is confusing that this is specifically intended to be set for Except, In Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3014 PS8, Line 3014: AnalyzesOk("select rank() over (order by int_col) from functional.alltypes " + > line too long (92 > 90) Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3024 PS8, Line 3024: > line has trailing whitespace Done http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File 
fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@60 PS8, Line 60: > Uncomment Hah how'd that sneak in http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@361 PS8, Line 361: # nested except, shouldn't be unnested, if it had been the results would be incorrect > I didn't quite see what this comment was getting at. Hah who knows what my state of mind was at that point. I tried to clean up the comment a bit. The intent was to contrast this plan with the one above, to emphasize that except can't be unnested and that the plan shape differs as a result. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN] > That's good that the codegen does some optimization for the hashing+equalit From what I've seen, the biggest killer in these situations is plan time spent dealing with ExprSubstitutionMaps, whose lookups are linear-time searches. Combined with the way rewrites and analysis are done, we end up in super-quadratic behavior and JVM GC issues that could easily be avoided with a hash table for exprs. In general, though, I agree; I had thought it would be better to address this issue and DISTINCT placement in general as another rewrite phase. 
http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test File testdata/workloads/functional-query/queries/QueryTest/except.test: PS8: > Can we add a token query or two that use the MINUS and EXCEPT DISTINCT alte Done http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@153 PS8, Line 153: (select 10 except select 11) union all select 10 > This is a repeat of the one just above. Done http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@166 PS8, Line 166: select 10 union all select 11 union all select 11 except select 10 > Would be good to have something like Done http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@356 PS8, Line 356: b the > absorb? Done http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test File testdata/workloads/functional-query/queries/QueryTest/intersect.test: PS8: > Can we add a token query or two that use the INTERSECT DISTINCT alternative Done http://gerrit.cloudera.org:8080/#/c/16123/8/
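The point about linear-time substitution lookups can be illustrated with a toy model. This is a sketch under stated assumptions, not Impala's ExprSubstitutionMap: the classes `LinearSubstitutionMap` and `HashedSubstitutionMap` and the helper `build` are hypothetical, and expressions are stood in for by hashable tuples.

```python
class LinearSubstitutionMap:
    """Toy map that scans parallel lists on every lookup (hypothetical)."""

    def __init__(self):
        self._lhs = []
        self._rhs = []
        self.comparisons = 0  # counts equality checks performed by get()

    def put(self, lhs, rhs):
        self._lhs.append(lhs)
        self._rhs.append(rhs)

    def get(self, expr):
        for i, candidate in enumerate(self._lhs):
            self.comparisons += 1
            if candidate == expr:
                return self._rhs[i]
        return None


class HashedSubstitutionMap:
    """Same interface, backed by a dict keyed on the expression (hypothetical)."""

    def __init__(self):
        self._map = {}

    def put(self, lhs, rhs):
        # Keep the first mapping, matching the first-match rule of the scan.
        self._map.setdefault(lhs, rhs)

    def get(self, expr):
        return self._map.get(expr)


def build(map_cls, n):
    """Fill a map with n slot-to-slot substitutions."""
    smap = map_cls()
    for i in range(n):
        smap.put(("slot", i), ("slot", i + 1000))
    return smap


if __name__ == "__main__":
    linear = build(LinearSubstitutionMap, 100)
    for i in range(100):
        linear.get(("slot", i))
    # Looking up each of n entries once costs n*(n+1)/2 comparisons.
    print(linear.comparisons)
```

Looking up every entry once already costs a quadratic number of comparisons in the list-backed version; repeating that across several rewrite and analysis passes is one way the behavior described above could arise.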
[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16212 ) Change subject: IMPALA-9929: Subquery error should throw AnalysisException .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/16212/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java: http://gerrit.cloudera.org:8080/#/c/16212/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1392 PS1, Line 1392: Only subqueries that > I think we should remove the bit about the invariant Done -- To view, visit http://gerrit.cloudera.org:8080/16212 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338 Gerrit-Change-Number: 16212 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 04:37:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8125: Add query option to limit number of hdfs writer instances
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16204 ) Change subject: IMPALA-8125: Add query option to limit number of hdfs writer instances .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16204/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/16204/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@236 PS3, Line 236: to nit: "to" seems misplaced. -- To view, visit http://gerrit.cloudera.org:8080/16204 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5 Gerrit-Change-Number: 16204 Gerrit-PatchSet: 3 Gerrit-Owner: Bikramjeet Vig Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 04:37:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException
Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16212 to look at the new patch set (#2). Change subject: IMPALA-9929: Subquery error should throw AnalysisException .. IMPALA-9929: Subquery error should throw AnalysisException Unsupported subquery in the select list should throw an AnalysisException. Testing: * Analyzer test to catch this case. Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338 --- M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java 2 files changed, 8 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16212/2 -- To view, visit http://gerrit.cloudera.org:8080/16212 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338 Gerrit-Change-Number: 16212 Gerrit-PatchSet: 2 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16123 to look at the new patch set (#9). Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] INTERSECT and EXCEPT set operations are implemented as rewrites to joins. Currently only the DISTINCT-qualified operators are implemented, not the ALL-qualified ones. The operator MINUS is supported as an alias for EXCEPT. We mimic Oracle and Hive's non-standard implementation, which treats all operators with the same precedence, as opposed to the SQL Standard, which gives INTERSECT higher precedence. A new class SetOperationStmt was created to encompass the previous UnionStmt behavior. UnionStmt is preserved as a special case with union-only operands to ensure compatibility with previous union planning behavior. Tests: * Added parser and analyzer tests. * Ensured no test failures or plan changes for union tests. * Added TPC-DS queries 14,38,87 to functional and planner tests. 
* Added functional tests test_intersect test_except * New planner testSetOperationStmt Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-query/queries/QueryTest/except.test A testdata/workloads/functional-query/queries/QueryTest/intersect.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q38.test A testdata/workloads/tpcds/queries/tpcds-q87.test M tests/query_test/test_queries.py M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 29 files changed, 5,038 insertions(+), 796 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/9 -- To view, visit 
http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 9 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
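The semantics described in the commit message can be sketched on in-memory rows. This is an illustration only, not the planner rewrite: it assumes INTERSECT DISTINCT behaves like a null-safe semi join followed by duplicate removal (the reviewed plan shows a LEFT SEMI JOIN) and, as an assumption, that EXCEPT DISTINCT behaves like the corresponding anti join. The function names are hypothetical; rows are tuples and NULL is `None`, so tuple equality treats two NULLs as equal.

```python
def distinct(rows):
    """Remove duplicate rows while preserving first-seen order."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def intersect_distinct(left, right):
    """Semi-join style: keep distinct left rows that have a match on the right."""
    right_keys = set(right)
    return [row for row in distinct(left) if row in right_keys]


def except_distinct(left, right):
    """Anti-join style: keep distinct left rows that have no match on the right."""
    right_keys = set(right)
    return [row for row in distinct(left) if row not in right_keys]


def union_distinct(left, right):
    return distinct(list(left) + list(right))


if __name__ == "__main__":
    a, b, c = [(1,)], [(2,)], [(2,)]
    # Equal precedence, evaluated left to right: (A UNION B) INTERSECT C
    print(intersect_distinct(union_distinct(a, b), c))
    # SQL Standard precedence: A UNION (B INTERSECT C)
    print(union_distinct(a, intersect_distinct(b, c)))
```

The two groupings at the bottom give different results for the same operands, which is why the choice of precedence rule is called out in the commit message.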
[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6696/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 04:12:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
Hello David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16219 to look at the new patch set (#5). Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator .. IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator This patch pushes the LIMIT from a top-level Sort down to the Sort below an Analytic operator when it is safe to do so. Several qualifying checks are performed first. The optimization is done at the time of creating the top-level Sort in the single node planner. Doing this pushdown can substantially improve performance by applying the limit early. Fixed a couple of additional related issues uncovered as a result of limit pushdown: - Changed the analytic sort's partition-by expr sort semantic from NULLS FIRST to NULLS LAST to ensure correctness in the presence of limit. - The LIMIT on the analytic sort node was causing it to be treated as a merging point in the distributed planner. Fixed it by introducing an API allowPartitioned() in the PlanNode. Testing: - Ran PlannerTest and updated several EXPLAIN plans. - Added Planner tests for both positive and negative cases of limit pushdown. - Ran end-to-end TPC-DS queries. Specifically tested TPC-DS q67 for limit pushdown and result correctness. 
- TODO: Add targeted end-to-end tests Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test 24 files changed, 1,055 insertions(+), 269 deletions(-) git pull 
ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/5 -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16230 ) Change subject: IMPALA-9987: Improve logging around HTTP connections .. Patch Set 1: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6169/ -- To view, visit http://gerrit.cloudera.org:8080/16230 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f Gerrit-Change-Number: 16230 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 03:31:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16223 ) Change subject: IMPALA-9979: part 1: factor out Top-N heap. .. IMPALA-9979: part 1: factor out Top-N heap. This extracts the implementation of the actual priority queue from the rest of TopNNode's state, so that we can, in the next patch, have multiple heaps per node. The codegen'd InsertBatch() function is unfortunately a little sensitive to minor changes in code, because of the weird way that it does an indirect call via TupleRowComparator - see IMPALA-4065. I had to tweak the code a little to find a variant that performed similarly to the previous version - other variants had small regressions. Perf: Single node TPC-H showed no perf change. The time for the TOP-N node in this targeted query was within the margin of error: use tpch30_parquet; set mt_dop=1; select l_extendedprice from lineitem order by 1 limit 100 Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31 Reviewed-on: http://gerrit.cloudera.org:8080/16223 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/codegen/impala-ir.h M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M be/src/util/tuple-row-compare.h 5 files changed, 163 insertions(+), 73 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31 Gerrit-Change-Number: 16223 Gerrit-PatchSet: 5 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
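The idea of separating the priority queue from the rest of the node's state can be sketched as a small standalone class. This is a hypothetical Python model, not the C++ code in topn-node.h: the class name `TopNHeap` and its methods are assumptions for illustration, and it handles only an ascending ORDER BY on a single numeric column.

```python
import heapq


class TopNHeap:
    """Bounded heap that keeps the `limit` smallest values seen so far."""

    def __init__(self, limit):
        self.limit = limit
        # heapq is a min-heap, so values are stored negated: the root is then
        # the largest value currently kept, i.e. the first one to evict.
        self._heap = []

    def insert(self, value):
        if self.limit <= 0:
            return
        if len(self._heap) < self.limit:
            heapq.heappush(self._heap, -value)
        elif value < -self._heap[0]:
            # New value beats the current worst kept value: replace the root.
            heapq.heapreplace(self._heap, -value)

    def sorted_rows(self):
        """Return the kept values in ascending order."""
        return sorted(-v for v in self._heap)


if __name__ == "__main__":
    heap = TopNHeap(limit=3)
    for price in [5, 1, 9, 3, 7, 2]:
        heap.insert(price)
    print(heap.sorted_rows())
```

Because the heap owns nothing but its own rows and limit, a caller could hold several instances at once, for example one per group in a dictionary, which is the kind of use the commit message anticipates for the next patch.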
[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16223 ) Change subject: IMPALA-9979: part 1: factor out Top-N heap. .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31 Gerrit-Change-Number: 16223 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 03:30:29 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 20: Patch set 19 failed the same test, test_multiple_sort_run_bytes_limits. It looks like the admission controller does not respect buffer_pool_limit as much as mem_limit. Patch set 20 changes the test cases to use mem_limit instead of buffer_pool_limit, just as Tim initially suggested. Some of the sort_run_bytes_limit parameters were also adjusted to keep the assertions true. Fang-Yu helped me verify that Patch set 20 can pass ubuntu-16.04-dockerised-tests by rerunning it in this jenkins job: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2814/ -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 02:47:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6695/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 01:11:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 23 Jul 2020 00:55:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java: http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@418 PS4, Line 418: if (!(analyticWindow_.getLeftBoundary().getType() == AnalyticWindow.BoundaryType.UNBOUNDED_PRECEDING line too long (104 > 90) http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@419 PS4, Line 419: && analyticWindow_.getRightBoundary().getType() == AnalyticWindow.BoundaryType.CURRENT_ROW)) { line too long (106 > 90) http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@417 PS4, Line 417: private PlanNode findDescendantAnalyticNode(PlanNode root, List intermediateNodes) { line too long (96 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 23 Jul 2020 00:51:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
Hello David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16219 to look at the new patch set (#4). Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator .. IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator This patch pushes the LIMIT from a top level Sort down to the Sort below an Analytic operator when it is safe to do so. Several qualifying checks are done first. The optimization is applied when the top level Sort is created in the single node planner. This pushdown can substantially improve performance by applying the limit early. Fixed a couple of additional related issues uncovered as a result of limit pushdown: - Changed the analytic sort's partition-by expr sort semantic from NULLS FIRST to NULLS LAST to ensure correctness in the presence of limit. - The LIMIT on the analytic sort node was causing it to be treated as a merging point in the distributed planner. Fixed it by introducing an API allowPartitioned() in the PlanNode. Testing: - Ran PlannerTest and updated several EXPLAIN plans. - Added Planner tests for both positive and negative cases of limit pushdown. - Ran end-to-end TPC-DS queries. Specifically tested TPC-DS q67 for limit pushdown and result correctness. 
- TODO: Add targeted end-to-end tests Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test 24 files changed, 1,047 insertions(+), 269 deletions(-) git pull 
ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/4 -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 4 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9977: Remove duplicate Ranger audit log entries for ALTER events
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16231 ) Change subject: IMPALA-9977: Remove duplicate Ranger audit log entries for ALTER events .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6694/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16231 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iab9b664ad5ee9722182007ee67d14bf47bd03d8a Gerrit-Change-Number: 16231 Gerrit-PatchSet: 2 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 22 Jul 2020 23:51:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9977: Remove duplicate Ranger audit log entries for ALTER events
Fang-Yu Rao has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16231 ) Change subject: IMPALA-9977: Remove duplicate Ranger audit log entries for ALTER events .. IMPALA-9977: Remove duplicate Ranger audit log entries for ALTER events This JIRA could be considered as a follow-up to IMPALA-9625, where we converted the name of a TAccessEvent to lowercase to avoid duplicate audits in the Set used to maintain the collected TAccessEvent's so that there will not be duplicate TAccessEvent's in the file specified by the flag of "-audit_event_log_dir" when Impala is started. However, the patch for IMPALA-9625 only considered the audits that are exported to the specific file mentioned above but not the PrivilegeRequest's that will be processed by Ranger which in turn would produce the corresponding audit log entries. Therefore, the fully-qualified table name that is provided when Analyzer#registerPrivReq() is called in Analyzer#getTable() is not necessarily in lowercase, resulting in duplicate AuthzAuditEvent's stored in the corresponding RangerBufferAuditHandler because the full table names returned from registerAuthAndAuditEvent() and getTable() differ. Refer to IMPALA-9625 for more details. To resolve the inconsistencies, this patch converts the arguments of database and table names to lowercase when PrivilegeRequestBuilder#onTable() is building the corresponding PrivilegeRequest, which will later be added to the Set of PrivilegeRequest's for Ranger to process. Testing: - Added an FE test in RangerAuditLogTest.java to make sure no duplicate Ranger audit log entries are produced. - Verified that the patch passes the exhaustive tests in the DEBUG build. 
Change-Id: Iab9b664ad5ee9722182007ee67d14bf47bd03d8a --- M fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java 2 files changed, 18 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/16231/2 -- To view, visit http://gerrit.cloudera.org:8080/16231 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iab9b664ad5ee9722182007ee67d14bf47bd03d8a Gerrit-Change-Number: 16231 Gerrit-PatchSet: 2 Gerrit-Owner: Fang-Yu Rao Gerrit-Reviewer: Fang-Yu Rao Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Norbert Luksa Gerrit-Reviewer: Quanlong Huang
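The effect of lowercasing names before they enter a Set can be sketched in a few lines of Python. This is only an illustration of the deduplication idea; make_privilege_request and the tuple layout are hypothetical and do not mirror PrivilegeRequestBuilder or PrivilegeRequest.

```python
def make_privilege_request(db_name, table_name, privilege):
    """Build a hashable request key with db and table names lowercased.

    Hypothetical stand-in for a privilege request; a plain tuple is used
    so that set membership depends only on the normalized names.
    """
    return (db_name.lower(), table_name.lower(), privilege)


def collect_requests(raw_requests):
    """Deduplicate (db, table, privilege) triples after normalization."""
    return {make_privilege_request(*raw) for raw in raw_requests}
```

With this normalization, ("Functional", "AllTypes", "ALTER") and ("functional", "alltypes", "ALTER") collapse into a single entry, so only one audit event would be derived from them.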
[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16218 ) Change subject: IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls .. IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls The calls to get_num_in_flight_queries in TestFetchFirst are flaky because they expect the number of in flight queries to drop to 0 immediately. This might not always be true, especially in ASAN builds where Impala is generally slower. This patch wraps the call to get_num_in_flight_queries in ImpalaTestSuite.assert_eventually, which adds retries to the calls to get_num_in_flight_queries. Testing: * Ran tests/hs2/test_fetch_first.py locally Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46 Reviewed-on: http://gerrit.cloudera.org:8080/16218 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M tests/hs2/test_fetch_first.py 1 file changed, 6 insertions(+), 2 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46 Gerrit-Change-Number: 16218 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar
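The retry pattern described above can be sketched as a small polling helper. This is a standalone stand-in written for illustration: the function below only borrows the name assert_eventually, and its parameters (timeout_s, period_s, condition) are assumptions rather than the confirmed signature of ImpalaTestSuite.assert_eventually.

```python
import time


def assert_eventually(timeout_s, period_s, condition):
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Hypothetical helper: raises AssertionError if the condition never
    becomes true within the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise AssertionError(
                "condition not met within %.2f seconds" % timeout_s)
        time.sleep(period_s)
```

A test would then pass a callable such as `lambda: get_num_in_flight_queries() == 0` (where the getter is whatever the test already uses) instead of asserting on the first reading.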
[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16218 ) Change subject: IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46 Gerrit-Change-Number: 16218 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Wed, 22 Jul 2020 23:28:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16222 ) Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows are returned .. IMPALA-9953: Shell should continue fetching even when 0 rows are returned The Impala shell stops fetching rows if it receives a batch that contains 0 rows. This is incorrect because a batch with 0 rows can be returned if the fetch request hits a timeout. Instead, the shell should rely on the value of has_rows / hasMoreRows to determine when to stop issuing fetch requests. Tests: * Added a regression test to test_shell_commandline.py * Ran all shell tests Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88 Reviewed-on: http://gerrit.cloudera.org:8080/16222 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M shell/impala_shell.py M tests/shell/test_shell_commandline.py 2 files changed, 17 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88 Gerrit-Change-Number: 16222 Gerrit-PatchSet: 4 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
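The corrected loop condition can be sketched as follows. This is not the code from shell/impala_shell.py; fetch_all and the fetch callable returning a (rows, has_more_rows) pair are hypothetical names chosen for the illustration.

```python
def fetch_all(fetch):
    """Drain a result set, stopping only when the server says it is done.

    `fetch` is a hypothetical callable returning (rows, has_more_rows).
    An empty batch does not end the loop, since it may only mean that a
    fetch request timed out before any rows were ready.
    """
    rows = []
    while True:
        batch, has_more_rows = fetch()
        rows.extend(batch)
        if not has_more_rows:
            return rows
```

A loop that instead stopped on `len(batch) == 0` would silently drop every row that arrives after a timed-out fetch.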
[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16222 ) Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows are returned .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88 Gerrit-Change-Number: 16222 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 23:28:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16159 ) Change subject: IMPALA-3127: Support incremental metadata updates in partition level .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6693/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870 Gerrit-Change-Number: 16159 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 22 Jul 2020 23:09:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 20: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6692/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 23:01:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/16159 ) Change subject: IMPALA-3127: Support incremental metadata updates in partition level .. Patch Set 4: (13 comments) Thanks for the review! Uploaded the new patch set after it passed the exhaustive test. > I think it would be useful if we could have an exhaustive test (may be in a > separate jira) to make sure that we are not leaking partitions in statestore. > The test could add/drop partitions along with multiple add/invalidate/drop > table commands and make sure that the number of partition keys in the > statestore is as per our expectation. Yeah, created IMPALA-9994 for this. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@712 PS3, Line 712: if (!FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, v2Key, > Its unclear to me that when we generate the minimalObject when delete flag Sorry, this line was added in PS1 and should have been removed in PS2... I added a test for this in PS4. Added these nice comments in the class comment of HdfsTable. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@742 PS3, Line 742: partObject.setId(obj.hdfs_partition.id); : } else if (obj.hdfs_partition.isSetPrev_id()) { : Preconditions.checkState( : obj.hdfs_partition.prev_id != HdfsPartition.INITIAL_PARTITION_ID - 1, : "Invalid partition id"); : > This looks a bit hacky to me. Do you think it would be more readable by add I think this way satisfies the meaning of invalidations better. LocalCatalog coordinators don't need to distinguish whether an invalidation is an "update" invalidation or a "delete" invalidation. 
On the other hand, catalogd sends minimal objects as invalidations because it knows the implementation of coordinators. I think it's OK to add the awareness of how coordinators use the partition ids. BTW, the prev_id field is added in THdfsPartition but is only used to pass the previous partition id through here. I'll set its default value to -1 in the thrift definition. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1295 PS3, Line 1295: topicUpdateEntry.getLastSentVersion(), > wouldn't this line be called for both fullUpdate and a incremental update? Sorry, I think I use "incremental updates" in many places and it introduces confusion. toThriftWithPartitionIds() is used when catalogd wants to send partition updates individually instead of carrying them inside the thrift table. I call this "incremental updates" but I think I should avoid the conflict with incremental catalog topic updates. Will update the javadoc. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1321 PS3, Line 1321: // statestored restarts). : if (ctx.isFullUpdate()) hdfsTable.resetMaxSentPartitionId(); : > nit, perhaps this is more readable? Done http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1329 PS3, Line 1329: > Can you clarify why this is needed only in case of incremental updates? Wha Sorry, I thought no one could make use of these in a full topic update. But that's only true for the statestore and v1 coordinators. When the statestore restarts, its catalog topic map is empty. It will fetch a full topic update (fromVersion=0) from catalogd. But there are no old values to be reset in its catalog topic map. When the statestore restarts, v1 coordinators will receive a full topic update which will trigger them to reset the whole local cache. They don't need deletions in the new empty cache. 
Actually, partition deletions are always ignored by v1 coordinators since partition deletions are detected by the absence of the id in the table's latest partition list. However, v2 coordinators won't reset the cache so they can still use them to invalidate obsolete partition cache entries. Will remove this check. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@90 PS3, Line 90: ly instead of > I think it is worth documenting that even though this extends CatalogObject Sure. Done. http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catal
[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level
Hello Anurag Mantripragada, Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16159 to look at the new patch set (#4). Change subject: IMPALA-3127: Support incremental metadata updates in partition level .. IMPALA-3127: Support incremental metadata updates in partition level Currently, partitions are tightly integrated into the HdfsTable objects. Catalogd has to transmit the entire table metadata even when only a few partitions change. This is a waste of resources and can lead to OOM in transmitting large tables due to the 2GB JVM array limit. This patch makes HdfsPartition extend CatalogObject so the catalogd can send partitions as individual catalog objects. Consequently, table objects in the catalog topic update can have minimal partition maps that only contain the partition ids, which reduces the thrift object size for large tables. The catalog object key of HdfsPartition consists of db name, table name and partition name. In "full" topic mode (catalog_topic_mode=full), catalogd only sends changed partitions with their latest table states. The latest table states are table objects with the minimal partition map. Legacy coordinators use the partition list to pick up existing (unchanged) partitions from the existing table object and new partitions in the catalog update. Currently, partition instances are immutable - all partition modifications are implemented by deleting the old instance and adding a new one with a new partition id. Since partition ids are generated by a global counter, newer partition instances have larger partition ids. So catalogd maintains a watermark for each table as the max sent partition id. Partition instances with ids larger than this are new partitions that should be sent in the next catalog update. Deleted partition instances are kept in a set for each table until the next catalog update. 
If there are no updates on the same partition name, catalogd will send a deletion for the partition. For dropped or invalidated tables, catalogd will still send deletions for their partitions. Although they are not used in coordinators (coordinators delete the partitions when they delete the table instances), they help avoid topic entry leaks in the statestore catalog topic. In "minimal" topic mode (catalog_topic_mode=minimal), catalogd only sends invalidations on tables and stale partition instances. Each partition instance is identified by its partition id. LocalCatalog coordinators use the partition invalidations to evict stale partitions in time. For instance, let's say partition(year=2010) is updated in catalogd. This is done by deleting the old partition instance partition(id=0, year=2010) and adding a new partition instance partition(id=1, year=2010). Catalogd will send invalidations on the table and partition instance with id=0, but not the one with id=1. A LocalCatalog coordinator will invalidate the partition instance(id=0) if it's in the cache. If the partition instance(id=1) is cached, it's already the latest version since partition instances are immutable. So we don't need to invalidate it. Tests - Run exhaustive tests. - Run exhaustive test_ddl.py in LocalCatalog mode. - Add test in test_local_catalog.py to verify stale partitions are invalidated in LocalCatalog when partitions are updated. 
Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870 --- M be/src/catalog/catalog-util.cc M common/thrift/CatalogObjects.thrift M fe/src/main/java/org/apache/impala/catalog/Catalog.java M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M tests/custom_cluster/test_local_catalog.py 13 files changed, 615 insertions(+), 64 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/16159/4 -- To view, visit http://gerrit.cloudera.org:8080/16159 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870 Gerrit-Change-Number: 16159 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anurag Mantripragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
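The watermark scheme from the commit message ("full" topic mode) can be sketched in Python. This is a simplified model, not catalogd's Java code: TableState, add_or_replace, drop and collect_update are hypothetical names, and partitions are reduced to a name and an id.

```python
import itertools

# Global id counter: newer partition instances always get larger ids.
_partition_ids = itertools.count()


class TableState:
    """Hypothetical per-table bookkeeping for partition-level updates."""

    def __init__(self):
        self.partitions = {}        # partition name -> current instance id
        self.max_sent_id = -1       # watermark: largest id already sent
        self.dropped_names = set()  # names with an instance removed since
                                    # the last update

    def add_or_replace(self, name):
        # Instances are immutable: an update is a delete plus a new
        # instance with a fresh, larger id.
        if name in self.partitions:
            self.dropped_names.add(name)
        self.partitions[name] = next(_partition_ids)

    def drop(self, name):
        del self.partitions[name]
        self.dropped_names.add(name)

    def collect_update(self):
        """Return (names to send, names to delete) for the next update."""
        new_names = sorted(n for n, pid in self.partitions.items()
                           if pid > self.max_sent_id)
        # A deletion is sent only if no newer instance reuses the name.
        deleted = sorted(n for n in self.dropped_names
                         if n not in self.partitions)
        if self.partitions:
            self.max_sent_id = max(self.max_sent_id,
                                   max(self.partitions.values()))
        self.dropped_names.clear()
        return new_names, deleted
```

The sketch ignores the "minimal" topic mode, where invalidations are keyed by partition id rather than by name.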
[Impala-ASF-CR] IMPALA-8547: get_json_object fails to get value for numeric key
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14905 ) Change subject: IMPALA-8547: get_json_object fails to get value for numeric key .. Patch Set 3: > This patch LGTM. > > Hive supports more general keys because it just split the json path > by '.' > https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFJson.java#L147 > So this is also workable in Hive: > > select get_json_object('{"hello world": 5}', '$.hello world'); > > It can't work in Impala because "hello world" is not a legal > variable name. > I think if we want the compatibility with Hive we can create a JIRA > to refactor the json patch parsing logics. I filed IMPALA-9993 as a follow up. I think this requires some more thought. That SQL statement is valid in Postgres, but not MySQL. It seems all databases have a slightly different way of handling JSON. The Hive / Impala syntax seems to be some combination of Postgres / MySQL behavior, which is a bit odd. -- To view, visit http://gerrit.cloudera.org:8080/14905 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7df037ccf2c79da0ba86a46df1dd28ab0e9a45f4 Gerrit-Change-Number: 14905 Gerrit-PatchSet: 3 Gerrit-Owner: Eugene Zimichev Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 22:47:27 + Gerrit-HasComments: No
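The quoted remark about Hive splitting the JSON path on '.' can be illustrated with a short Python sketch. It is a loose model of that behavior, not the UDFJson implementation: lookup_by_dotted_path is a hypothetical name, and array indexing and wildcards are deliberately left out.

```python
import json


def lookup_by_dotted_path(json_text, path):
    """Resolve a '$.a.b' style path by splitting on '.' only.

    Hypothetical sketch: because keys are never validated as
    identifiers, names with spaces or digits resolve like any other.
    """
    if not path.startswith("$"):
        return None
    value = json.loads(json_text)
    for key in path[1:].split("."):
        if key == "":
            continue  # skip the empty piece before the first '.'
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value
```

Under this model both `'$.hello world'` and a numeric key such as `'$.1'` resolve, whereas a parser that requires each key to be a legal variable name rejects them.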
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15963 to look at the new patch set (#20). Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. IMPALA-6692: Trigger sort node run before hitting memory limit. The sorter node works by adding row batches to a sort run. After all batches are added to the current unsorted run or the memory limit is hit, the sorter will immediately start the run. In the latter case, the sorter will spill the sorted run to disk after the sort completes, create a new unsorted run object, and continue to add the next row batches, and so on. This algorithm tries to fit as many rows into memory as possible before it starts sorting. However, in the case of a partitioned sort with a large number of row batches, fitting too many rows into memory will cause the sort to be slow and block the sorter node for a long time before it can release some memory and continue accepting the next row batch from the exchange node. One slow sorter node can block the exchange node from sending row batches to another sorter node that is free. This patch speeds up the decision to start the sort, without waiting for it to hit the memory limit first, by capping the intermediary quicksort run to a lower memory limit, determined by the query option 'sort_run_bytes_limit'. If the total used reservation of quicksort has exceeded sort_run_bytes_limit, the current unsorted_run_ will be wrapped up, sorted, and then spilled. This overlaps the next sort run with the spill from the previous sort run. To reduce regression for cases where the total input size of the sort node might fit fully into available memory, sort_run_bytes_limit will not be enforced for the first sort run. However, it will stay limited by sort_run_bytes_limit if planner estimates hint that a spill will inevitably happen. We also add a new summary counter 'AddBatchTime' to get a summary of how much time is spent in Sorter::AddBatch. 
Max of 'AddBatchTime' indicate the longest time spent in Sorter::AddBatch, presumably busy doing intermediary sort. Testing: - Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits - Run core tests - Run data loading of 3 largest TPC-DS facts table of 300GB scale into real cluster using 5 backends, and 4GB mem_limit. sort_run_bytes_limit is varied between unspecified (not limited) vs 512 MB. The performance result is summarized in the following table. +---+-+--+---+-+ | Insert table | #Rows | Avg | no limit| 512 MB limit | | | | SortDataSize ++--+-+---+ | | | per Node | Query | Max | Query | Max | | | | | Time | AddBatchTime | Time | AddBatchTime | +---+-+--++--+-+---+ | store_sales | 864.00M | 15.29 GB | 30m18s | 53s311ms | 20m | 5s634ms | +---+-+--++--+-+---+ | catalog_sales | 431.97M | 11.34 GB | 23m24s | 31s212ms | 15m27s | 3s603ms | +---+-+--++--+-+---+ | web_sales | 216.01M | 5.67 GB | 8m16s | 29s250ms | 6m41s | 3s856ms | +---+-+--++--+-+---+ Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 --- M be/src/exec/sort-node.cc M be/src/exec/sort-node.h M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M be/src/service/query-options-test.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/planner/SortNode.java M tests/query_test/test_sort.py 15 files changed, 224 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/20 -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 
20 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong
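The run-closing rule described in the IMPALA-6692 commit message can be sketched roughly as follows. This is only an illustration of the rule as stated there, not the code in sorter.cc: the struct, its field names, and the function ShouldCloseRunEarly are hypothetical, and treating a non-positive limit as "not limited" is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inputs for the decision; none of these names come from the patch.
struct SortRunState {
  int64_t used_reservation_bytes;  // memory currently held by the unsorted run
  int64_t sort_run_bytes_limit;    // query option; <= 0 is assumed to mean "not limited"
  bool is_first_run;               // no run has been sorted or spilled yet
  bool planner_expects_spill;      // planner estimates hint that a spill is inevitable
};

// Returns true if the current unsorted run should be wrapped up, sorted and
// spilled now instead of waiting for the memory limit to be hit.
inline bool ShouldCloseRunEarly(const SortRunState& s) {
  // No limit configured: keep the old behaviour.
  if (s.sort_run_bytes_limit <= 0) return false;
  // The first run is exempt unless the planner already expects a spill.
  if (s.is_first_run && !s.planner_expects_spill) return false;
  return s.used_reservation_bytes > s.sort_run_bytes_limit;
}
```

With a 512 MB limit, for example, a first run holding 1 GB would be left alone unless the planner hinted at a spill, while any later run holding more than 512 MB would be closed early.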
[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16230 ) Change subject: IMPALA-9987: Improve logging around HTTP connections .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6169/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16230 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f Gerrit-Change-Number: 16230 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 22:26:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/16230 ) Change subject: IMPALA-9987: Improve logging around HTTP connections .. Patch Set 1: verify failed due to IMPALA-9923 -- To view, visit http://gerrit.cloudera.org:8080/16230 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f Gerrit-Change-Number: 16230 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 22:25:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16223 ) Change subject: IMPALA-9979: part 1: factor out Top-N heap. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/16223/3/be/src/exec/topn-node-ir.cc File be/src/exec/topn-node-ir.cc: http://gerrit.cloudera.org:8080/#/c/16223/3/be/src/exec/topn-node-ir.cc@37 PS3, Line 37: priority_queue_.size() < heap_capacity() > just thinking out loud, do you think generally the else part will be more c The branch should be predictable at least - you're right that we'd want to optimise for the case when there are many rows. Probably not worth investing too much into tuning until we do codegen of the comparator, cause that will completely change the performance profile anyway. -- To view, visit http://gerrit.cloudera.org:8080/16223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31 Gerrit-Change-Number: 16223 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 22:18:46 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16223 ) Change subject: IMPALA-9979: part 1: factor out Top-N heap. .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31 Gerrit-Change-Number: 16223 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 22:18:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16223 ) Change subject: IMPALA-9979: part 1: factor out Top-N heap. .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6168/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31 Gerrit-Change-Number: 16223 Gerrit-PatchSet: 4 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 22:18:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/16223 ) Change subject: IMPALA-9979: part 1: factor out Top-N heap. .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/16223/3/be/src/exec/topn-node-ir.cc File be/src/exec/topn-node-ir.cc: http://gerrit.cloudera.org:8080/#/c/16223/3/be/src/exec/topn-node-ir.cc@37 PS3, Line 37: priority_queue_.size() < heap_capacity() just thinking out loud, do you think generally the else part will be more common? Like I would assume the limit to be a smallish value and the top-N node going through 1000s of rows. If yes, then do you think adding a IR_LIKELY for the else case will help performance even if in a small way? -- To view, visit http://gerrit.cloudera.org:8080/16223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31 Gerrit-Change-Number: 16223 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Jul 2020 21:51:13 + Gerrit-HasComments: Yes
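To make the branch-hint question concrete, here is a minimal sketch of a bounded top-N heap where the "heap is already full" path is marked as the likely one. It is not the code in topn-node-ir.cc: TopNHeap and SKETCH_LIKELY are hypothetical names, the macro is a stand-in built on the GCC/Clang builtin __builtin_expect rather than Impala's IR_LIKELY, and the sketch keeps plain ints instead of tuples with a comparator.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Stand-in for a branch-prediction hint; falls back to a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define SKETCH_LIKELY(x) (x)
#endif

// Keeps the N smallest values seen so far.
class TopNHeap {
 public:
  explicit TopNHeap(std::size_t capacity) : capacity_(capacity) {}

  void Insert(int value) {
    if (capacity_ == 0) return;
    // With a small limit and many input rows, the heap is full for almost
    // every call, so this is the branch hinted as likely.
    if (SKETCH_LIKELY(heap_.size() >= capacity_)) {
      if (value < heap_.top()) {
        heap_.pop();
        heap_.push(value);
      }
    } else {
      heap_.push(value);
    }
  }

  // Returns the kept values in ascending order without modifying the heap.
  std::vector<int> SortedAscending() const {
    std::priority_queue<int> copy = heap_;
    std::vector<int> out(copy.size());
    for (std::size_t i = out.size(); i > 0; --i) {
      out[i - 1] = copy.top();
      copy.pop();
    }
    return out;
  }

 private:
  std::size_t capacity_;
  std::priority_queue<int> heap_;  // max-heap: top() is the largest kept value
};
```

Whether such a hint pays off would have to be measured; as noted in the reply on this thread, the branch should already be predictable once the heap fills up.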
[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16230 ) Change subject: IMPALA-9987: Improve logging around HTTP connections .. Patch Set 1: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6164/ -- To view, visit http://gerrit.cloudera.org:8080/16230 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f Gerrit-Change-Number: 16230 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 20:59:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6167/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 22 Jul 2020 19:41:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6691/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 22 Jul 2020 19:40:46 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 19: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6163/ -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 19:34:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Hello Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16120 to look at the new patch set (#4). Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. IMPALA-9903: Reduce Kudu openTable calls per query This patch reduces the number of Kudu openTable calls for the lifetime of a query by storing the KuduTable object in the Analyzer GlobalState and using it in the KuduScanNode. It does not cache the KuduTable object longer than a single query, does not impact DDL statements, and does not introduce the need to invalidate metadata when interacting with Kudu tables. Reducing the number of openTable calls is important because each call results in a GetTableSchema RPC to the remote leader Kudu master. With very high rates of queries against Kudu tables this can overload the master leading to degraded query performance. Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java 3 files changed, 34 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16120/4 -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
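The idea of opening each table at most once per query can be sketched as a small query-scoped cache. The actual change is in Java (Analyzer GlobalState and KuduScanNode); the C++ below is only an illustration of the pattern, and PerQueryTableCache, TableHandle, and the opener callback are hypothetical names standing in for the per-query state, the KuduTable object, and the openTable call.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an opened table object.
struct TableHandle {
  std::string name;
};

// Lives exactly as long as one query; nothing is shared across queries,
// so no invalidation is needed.
class PerQueryTableCache {
 public:
  using Opener = std::function<std::shared_ptr<TableHandle>(const std::string&)>;

  explicit PerQueryTableCache(Opener opener) : opener_(std::move(opener)) {}

  // Returns the cached handle, calling the (expensive) opener only on a miss.
  std::shared_ptr<TableHandle> GetOrOpen(const std::string& table_name) {
    auto it = tables_.find(table_name);
    if (it != tables_.end()) return it->second;
    std::shared_ptr<TableHandle> handle = opener_(table_name);
    tables_.emplace(table_name, handle);
    return handle;
  }

 private:
  Opener opener_;
  std::unordered_map<std::string, std::shared_ptr<TableHandle>> tables_;
};
```

Because the cache is created and destroyed with the query, a second query sees a fresh open, which matches the commit message's point that the object is not cached longer than a single query.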
[Impala-ASF-CR] IMPALA-5746: Cancel all queries scheduled by failed coordinators
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16215 ) Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6690/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd Gerrit-Change-Number: 16215 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 22 Jul 2020 19:15:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 3: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/6689/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 22 Jul 2020 18:56:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5746: Cancel all queries scheduled by failed coordinators
Wenzhe Zhou has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16215 ) Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators .. IMPALA-5746: Cancel all queries scheduled by failed coordinators Executors register to receive cluster membership updates. When a coordinator is absent from the active cluster membership list, the executor cancels all running fragments of the queries scheduled by that inactive coordinator, since the executor cannot send results back to an inactive or failed coordinator. This lets executors quickly release the resources allocated to the canceled fragments. Testing: - Added new test case TestProcessFailures::test_kill_coordinator and ran it with the following command: ./bin/impala-py.test tests/custom_cluster/test_process_failures.py\ ::TestProcessFailures::test_kill_coordinator \ --exploration_strategy=exhaustive. - Passed the core tests. Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/exec-env.cc M be/src/runtime/query-exec-mgr.cc M be/src/runtime/query-exec-mgr.h M be/src/runtime/query-state.cc M be/src/runtime/query-state.h M be/src/runtime/test-env.cc M common/protobuf/control_service.proto M tests/custom_cluster/test_process_failures.py 9 files changed, 183 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/16215/4 -- To view, visit http://gerrit.cloudera.org:8080/16215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd Gerrit-Change-Number: 16215 Gerrit-PatchSet: 4 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou
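The selection step in the IMPALA-5746 commit message, finding the queries whose coordinator has dropped out of the active membership, might look roughly like this. It is a simplified sketch, not the code in query-exec-mgr.cc: QueriesToCancel is a hypothetical name, and query ids and backend addresses are represented as plain strings for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Returns the ids of queries whose coordinator is not in the active membership.
// 'coordinator_by_query' maps query id -> coordinator address (both hypothetical
// string representations); 'active_backends' is the current membership snapshot.
std::vector<std::string> QueriesToCancel(
    const std::unordered_map<std::string, std::string>& coordinator_by_query,
    const std::unordered_set<std::string>& active_backends) {
  std::vector<std::string> to_cancel;
  for (const auto& entry : coordinator_by_query) {
    if (active_backends.count(entry.second) == 0) {
      to_cancel.push_back(entry.first);
    }
  }
  // Sorted only so the result is deterministic in this sketch.
  std::sort(to_cancel.begin(), to_cancel.end());
  return to_cancel;
}
```

An executor would run something like this on each membership update and then cancel the running fragments of every query returned.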
[Impala-ASF-CR] IMPALA-5746: Cancel all queries scheduled by failed coordinators
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16215 ) Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/exec-env.cc File be/src/runtime/exec-env.cc: http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/exec-env.cc@554 PS3, Line 554: server->CancelQueriesOnFailedBackends(current_backend_set); > I was thinking to reuse the backend set and save a loop with one callback f Yeah +1 to what Thomas said. http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/query-exec-mgr.cc File be/src/runtime/query-exec-mgr.cc: http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/query-exec-mgr.cc@222 PS3, Line 222: // TODO: create cancellation task queue and working thread to run cancellation tasks : // on a separate thread. If the queue is full, ignore the cancellations since we'll : // be able to process them on the next heartbeat instead. : : for (auto& qs : to_cancel) { : VLOG(1) << "CancelQueriesForFailedCoordinators(): cancel query " << qs->query_id(); : qs->Cancel(); : qs->is_coord_active_.Store(false); : ReleaseQueryState(qs); : } > Will define a new thread pool owned by QueryExecMgr. Yeah separate thread pool seems fine. Yeah, I'm fine with keeping this out of ImpalaServer. -- To view, visit http://gerrit.cloudera.org:8080/16215 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd Gerrit-Change-Number: 16215 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 22 Jul 2020 18:38:37 + Gerrit-HasComments: Yes
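The TODO quoted in this review (a cancellation task queue that drops requests when full, relying on the next heartbeat to retry) could be sketched as a small bounded queue. This is an assumption-laden illustration, not the planned thread pool: BoundedCancelQueue and its methods are hypothetical, and the worker thread that would call TryDequeue is left out to keep the example deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

class BoundedCancelQueue {
 public:
  explicit BoundedCancelQueue(std::size_t capacity) : capacity_(capacity) {}

  // Returns false and drops the request when the queue is full; the caller
  // is expected to see the same query again on the next membership update.
  bool TryEnqueue(const std::string& query_id) {
    std::lock_guard<std::mutex> l(lock_);
    if (pending_.size() >= capacity_) return false;
    pending_.push_back(query_id);
    return true;
  }

  // Pops the oldest pending id into *query_id; returns false if none is pending.
  bool TryDequeue(std::string* query_id) {
    std::lock_guard<std::mutex> l(lock_);
    if (pending_.empty()) return false;
    *query_id = pending_.front();
    pending_.pop_front();
    return true;
  }

 private:
  const std::size_t capacity_;
  std::mutex lock_;
  std::deque<std::string> pending_;
};
```

Dropping on overflow keeps the heartbeat path from blocking, at the cost of delaying some cancellations by one update interval.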
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/16120/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16120/2//COMMIT_MSG@9 PS2, Line 9: This patch reduces the number of Kudu openTable calls for the : lifetime of a query by storing the KuduTable object in the : Analyzer GlobalState and using it in the > I think it would be good to be more specific here. Looks like currently we Done http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java File fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java: http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java@166 PS2, Line 166: result.setSchema(resultSchema); > These are methods that implement the show partitions DDL, so we don't need Done http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@185 PS2, Line 185: @Override : public List getPrimaryKeyColumnNames() { : return ImmutableList.copyOf(primaryKeyColumnNames_); : } : > This would mean that once kuduTable_ is initialized, it never gets refreshe Done http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@298 PS2, Line 298: partitionBy_ = Utils.loadPartitionByParams(kuduTable); > This probably should be kept as is otherwise we won't see a updated Kudu sc Done http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java: 
http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java@56 PS2, Line 56: /** > Caching it in LocalTable makes sense since it's per-query anyway. So this p If we are going the analyzer route I don't think this is needed right? http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@135 PS2, Line 135: // Get the KuduTable from the analyzer to retrieve the cached KuduTable > I think this invocation should go via 'analyzer' to retrieve the per-query Done -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Wed, 22 Jul 2020 18:39:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Hello Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16120 to look at the new patch set (#3). Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. IMPALA-9903: Reduce Kudu openTable calls per query This patch reduces the number of Kudu openTable calls for the lifetime of a query by storing the KuduTable object in the Analyzer GlobalState and using it in the KuduScanNode. It does not cache the KuduTable object longer than a single query, does not impact DDL statements, and does not introduce the need to invalidate metadata when interacting with Kudu tables. Reducing the number of openTable calls is important because each call results in a GetTableSchema RPC to the remote leader Kudu master. With very high rates of queries against Kudu tables this can overload the master leading to degraded query performance. Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 --- M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java 3 files changed, 34 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16120/3 -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16218 ) Change subject: IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls .. Patch Set 2: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6166/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46 Gerrit-Change-Number: 16218 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Wed, 22 Jul 2020 18:20:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls
Sahil Takiar has removed a vote on this change. Change subject: IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls .. Removed Verified-1 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/16218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46 Gerrit-Change-Number: 16218 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16218 ) Change subject: IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls .. Patch Set 2: Failed due to IMPALA-9991. -- To view, visit http://gerrit.cloudera.org:8080/16218 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46 Gerrit-Change-Number: 16218 Gerrit-PatchSet: 2 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Wed, 22 Jul 2020 18:19:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16222 ) Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows are returned .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6165/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88 Gerrit-Change-Number: 16222 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 18:14:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16222 ) Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows are returned .. Patch Set 3: A bunch of HBase tests failed due to connection timeouts to the region servers. -- To view, visit http://gerrit.cloudera.org:8080/16222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88 Gerrit-Change-Number: 16222 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 18:14:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned
Sahil Takiar has removed a vote on this change. Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows are returned .. Removed Verified-1 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/16222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88 Gerrit-Change-Number: 16222 Gerrit-PatchSet: 3 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16228 ) Change subject: IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types) .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/16228/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16228/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1508 PS3, Line 1508:* SELECT item FROM complextypestbl $a$1, $a$1.int_array; I need to understand the current complex types support (independent of ACID) a little more but my initial thought here is that this could potentially introduce a lot of cross-joins depending on the query that would make the ACID reads slower than the regular reads. -- To view, visit http://gerrit.cloudera.org:8080/16228 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f Gerrit-Change-Number: 16228 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 16:47:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16230 ) Change subject: IMPALA-9987: Improve logging around HTTP connections .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6164/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/16230 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f Gerrit-Change-Number: 16230 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 16:44:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16228 )
Change subject: IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)
..
Patch Set 2: Code-Review+1
(5 comments)
Nice Work! I read through the code part but haven't checked the tests. Looks fine to me, but someone with more frontend knowledge should also take a look.
http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:
http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@385
PS2, Line 385: reqires
nit: typo
http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:
http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1516
PS2, Line 1516: for (int i = 0; i < stmt.fromClause_.size(); ++i) {
: TableRef tblRef = stmt.fromClause_.get(i);
nit: you can iterate over fromClause_.getTableRefs(); then you can use a foreach and get rid of L1517.
http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1541
PS2, Line 1541: int tableRefIdx
Instead of the index you can use the CollectionTableRef itself as a param. Update: I see you use 'tableRefIdx' for other purposes below, so I guess my comment here doesn't make sense :)
http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1556
PS2, Line 1556: newCollPath.remove(0);
Could you add a comment on what is at position '0' here? (I guess in L1553 it's the DB name, but we removed it)
http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1576
PS2, Line 1576: private TableRef newTableRef(Analyzer analyzer, List rawPath, String alias)
Shouldn't this function belong to TableRef as a static member function?
--
To view, visit http://gerrit.cloudera.org:8080/16228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
Gerrit-Change-Number: 16228
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Wed, 22 Jul 2020 15:14:04 +
Gerrit-HasComments: Yes
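The nit on StmtRewriter.java line 1516 can be illustrated with a small, self-contained sketch. This is not the Impala code: `TableRefSketch`, `FromClauseSketch` and `ForEachExample` are hypothetical stand-ins, and the assumption that `getTableRefs()` returns a list of table refs is taken only from the review comment. As the follow-up on line 1541 notes, the for-each form only fits where the index itself is not needed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a table reference; only the alias matters here.
class TableRefSketch {
  final String alias;
  TableRefSketch(String alias) { this.alias = alias; }
}

// Hypothetical stand-in for a FROM clause that exposes both access styles.
class FromClauseSketch {
  private final List<TableRefSketch> tableRefs = new ArrayList<>();
  void add(TableRefSketch ref) { tableRefs.add(ref); }
  int size() { return tableRefs.size(); }
  TableRefSketch get(int i) { return tableRefs.get(i); }
  List<TableRefSketch> getTableRefs() { return tableRefs; }
}

class ForEachExample {
  // Index-based loop, in the shape quoted in the review.
  static List<String> aliasesIndexed(FromClauseSketch from) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < from.size(); ++i) {
      TableRefSketch tblRef = from.get(i);
      out.add(tblRef.alias);
    }
    return out;
  }

  // For-each loop over getTableRefs(); the separate get(i) line disappears.
  static List<String> aliasesForEach(FromClauseSketch from) {
    List<String> out = new ArrayList<>();
    for (TableRefSketch tblRef : from.getTableRefs()) {
      out.add(tblRef.alias);
    }
    return out;
  }
}
```

Both methods visit the same elements in the same order, so the change is purely about readability.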
[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15963 ) Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit. .. Patch Set 19: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6163/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/15963 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240 Gerrit-Change-Number: 15963 Gerrit-PatchSet: 19 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 14:29:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9882: Import KLL functionality from Apache DataSketches
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16196 ) Change subject: IMPALA-9882: Import KLL functionality from Apache DataSketches .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6688/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16196 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943 Gerrit-Change-Number: 16196 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 22 Jul 2020 13:44:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9882: Import KLL functionality from Apache DataSketches
Hello Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16196 to look at the new patch set (#5).
Change subject: IMPALA-9882: Import KLL functionality from Apache DataSketches
..
IMPALA-9882: Import KLL functionality from Apache DataSketches
First, I updated our existing snapshot of DataSketches to the following commit: dddc149209902f72b71109f1a098e58d6d4761ee "Merge pull request #159 from apache/workflow_update". This affects files originating from the hll/ and common/ directories of the DataSketches repo.
Then I copied all the files needed for KLL into our snapshot directory. You can find the original Apache DataSketches files here: https://github.com/apache/incubator-datasketches-cpp
This new snapshot, however, broke the interface we used for serializing hll_union objects by dropping serialize_compact(). As a solution, I changed the serialization and merging phases of the union operator to serialize the underlying hll_sketch rather than hll_union itself.
Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp
D be/src/thirdparty/datasketches/CommonUtil.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable-internal.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp
M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp
M be/src/thirdparty/datasketches/CouponList-internal.hpp
M be/src/thirdparty/datasketches/Hll4Array-internal.hpp
M be/src/thirdparty/datasketches/HllArray-internal.hpp
M be/src/thirdparty/datasketches/HllSketch-internal.hpp
M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/HllUtil.hpp
M be/src/thirdparty/datasketches/MurmurHash3.h
M be/src/thirdparty/datasketches/README.md
A be/src/thirdparty/datasketches/bounds_binomial_proportions.hpp
A be/src/thirdparty/datasketches/common_defs.hpp
A be/src/thirdparty/datasketches/count_zeros.hpp
M be/src/thirdparty/datasketches/hll.hpp
A be/src/thirdparty/datasketches/kll_helper.hpp
A be/src/thirdparty/datasketches/kll_helper_impl.hpp
A be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
A be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
A be/src/thirdparty/datasketches/kll_sketch.hpp
A be/src/thirdparty/datasketches/kll_sketch_impl.hpp
A be/src/thirdparty/datasketches/memory_operations.hpp
A be/src/thirdparty/datasketches/serde.hpp
29 files changed, 3,280 insertions(+), 347 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/16196/5
--
To view, visit http://gerrit.cloudera.org:8080/16196
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943
Gerrit-Change-Number: 16196
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Gabor Kaszab
Gerrit-Reviewer: Impala Public Jenkins
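The workaround described in the IMPALA-9882 commit message (serialize the sketch held by the union, not the union object) can be pictured with a toy model. Everything below is an assumption made for illustration: the real change is C++ in aggregate-functions-ir.cc, and `ToySketch`, `ToyUnion`, `getResult()` and the byte layout are hypothetical and do not reproduce the DataSketches API. The toy "sketch" is an exact set of integers, not an HLL sketch.

```java
import java.nio.ByteBuffer;
import java.util.TreeSet;

// Toy stand-in for a sketch: an exact, sorted set of ints that can be serialized.
class ToySketch {
  final TreeSet<Integer> values = new TreeSet<>();

  void update(int v) { values.add(v); }

  // Layout (hypothetical): 4-byte count followed by the values.
  byte[] serialize() {
    ByteBuffer buf = ByteBuffer.allocate(4 + 4 * values.size());
    buf.putInt(values.size());
    for (int v : values) buf.putInt(v);
    return buf.array();
  }

  static ToySketch deserialize(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    ToySketch s = new ToySketch();
    int n = buf.getInt();
    for (int i = 0; i < n; ++i) s.update(buf.getInt());
    return s;
  }
}

// Toy union: it can absorb sketches but has no serialize method of its own.
class ToyUnion {
  private final ToySketch inner = new ToySketch();
  void update(ToySketch other) { inner.values.addAll(other.values); }
  ToySketch getResult() { return inner; }
}

class UnionSerializationExample {
  // Serialize phase: ship the union's underlying sketch instead of the union.
  static byte[] serializeUnion(ToyUnion u) {
    return u.getResult().serialize();
  }

  // Merge phase: rebuild a sketch from the bytes and feed it into the destination union.
  static void mergeInto(ToyUnion dst, byte[] srcBytes) {
    dst.update(ToySketch.deserialize(srcBytes));
  }
}
```

The point of the design is that the intermediate state crossing the serialize/merge boundary is always a sketch, so the union type never needs a serialization interface.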
[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6687/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 2 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 22 Jul 2020 07:42:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 )
Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..
Patch Set 2:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:
http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@413
PS2, Line 413: if (!(analyticWindow_.getLeftBoundary().getType() == AnalyticWindow.BoundaryType.UNBOUNDED_PRECEDING
line too long (104 > 90)
http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@414
PS2, Line 414: && analyticWindow_.getRightBoundary().getType() == AnalyticWindow.BoundaryType.CURRENT_ROW)) {
line too long (106 > 90)
http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:
http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@414
PS2, Line 414: private PlanNode findDescendantAnalyticNode(PlanNode root, List intermediateNodes) {
line too long (96 > 90)
--
To view, visit http://gerrit.cloudera.org:8080/16219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: David Rorke
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Wed, 22 Jul 2020 07:14:10 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
Hello David Rorke, Tim Armstrong,
I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16219 to look at the new patch set (#2).
Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..
IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
This patch pushes the LIMIT from a top-level Sort down to the Sort below an Analytic operator when it is safe to do so. Several qualifying checks are performed first. The optimization is applied when the top-level Sort is created in the single-node planner. This pushdown can substantially improve performance by applying the limit early.
Fixed a couple of additional related issues uncovered as a result of limit pushdown:
- Changed the analytic sort's partition-by expr sort semantics from NULLS FIRST to NULLS LAST to ensure correctness in the presence of a limit.
- The LIMIT on the analytic sort node was causing it to be treated as a merging point in the distributed planner. Fixed it by introducing an API, allowPartitioned(), in the PlanNode.
Testing:
- Ran PlannerTest and updated several EXPLAIN plans
- Ran end-to-end TPC-DS queries
- Specifically tested TPC-DS q67 for limit pushdown and result correctness
- Manually tested several negative cases where the pushdown should not be applied
- TODO: Run more end-to-end tests
- TODO: Add unit tests
Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
21 files changed, 445 insertions(+), 265 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/2
--
To view, visit http://gerrit.cloudera.org:8080/16219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: David Rorke
Gerrit-Reviewer: Tim Armstrong
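Why a limit can sometimes be applied below an analytic operator can be shown with a toy model. The sketch below is an illustration under stated assumptions, not the planner logic from the patch: it assumes a single partition, a running sum whose window is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, and an outer ORDER BY on the same key as the analytic sort. The class and method names are hypothetical, and the patch's actual qualifying checks are not listed in the commit message.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LimitPushdownSketch {
  // Running sum over already-sorted input; each output depends only on
  // the current row and the rows before it.
  static List<Long> runningSum(List<Integer> sorted) {
    List<Long> out = new ArrayList<>();
    long acc = 0;
    for (int v : sorted) {
      acc += v;
      out.add(acc);
    }
    return out;
  }

  // Baseline plan: sort everything, evaluate the analytic function, then apply the limit.
  static List<Long> limitAfterAnalytic(List<Integer> input, int limit) {
    List<Integer> sorted = new ArrayList<>(input);
    Collections.sort(sorted);
    List<Long> all = runningSum(sorted);
    return new ArrayList<>(all.subList(0, Math.min(limit, all.size())));
  }

  // Pushed-down plan: keep only the first `limit` sorted rows, then evaluate
  // the analytic function on that prefix. (A full sort is used here for
  // simplicity; a real top-N sort would avoid sorting the discarded rows.)
  static List<Long> limitPushedDown(List<Integer> input, int limit) {
    List<Integer> sorted = new ArrayList<>(input);
    Collections.sort(sorted);
    List<Integer> topN = sorted.subList(0, Math.min(limit, sorted.size()));
    return runningSum(topN);
  }
}
```

Because no output row looks at rows that sort after it, truncating the sorted input to the first N rows leaves the first N results unchanged. A window that reads following rows would break this equivalence, which is one reason such a pushdown needs qualifying checks.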