[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. IMPALA-6822: Add a query option to control shuffling by distinct exprs IMPALA-4794 changed the distinct aggregation behavior to shuffling by both grouping exprs and the distinct expr. It's slower in queries where the NDVs of grouping exprs are high and data are uniformly distributed among groups. This patch adds a query option controlling this behavior, letting users switch to the old plan. Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Reviewed-on: http://gerrit.cloudera.org:8080/9949 Reviewed-by: Tianyi WangTested-by: Impala Public Jenkins --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test M testdata/workloads/functional-query/queries/QueryTest/distinct.test M tests/query_test/test_aggregation.py 9 files changed, 454 insertions(+), 24 deletions(-) Approvals: Tianyi Wang: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 8 Gerrit-Owner: Tianyi Wang Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 7 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 12 Apr 2018 22:01:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2298/ -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 7 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 12 Apr 2018 18:15:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Tianyi Wang has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. IMPALA-6822: Add a query option to control shuffling by distinct exprs IMPALA-4794 changed the distinct aggregation behavior to shuffling by both grouping exprs and the distinct expr. It's slower in queries where the NDVs of grouping exprs are high and data are uniformly distributed among groups. This patch adds a query option controlling this behavior, letting users switch to the old plan. Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test M testdata/workloads/functional-query/queries/QueryTest/distinct.test M tests/query_test/test_aggregation.py 9 files changed, 454 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/7 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 7 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 7 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Thu, 12 Apr 2018 18:14:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. Patch Set 6: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2287/ -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 6 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Thu, 12 Apr 2018 01:39:41 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2287/ -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 6 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Wed, 11 Apr 2018 21:13:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 6 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Wed, 11 Apr 2018 20:54:17 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Tianyi Wang has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. IMPALA-6822: Add a query option to control shuffling by distinct exprs IMPALA-4794 changed the distinct aggregation behavior to shuffling by both grouping exprs and the distinct expr. It's slower in queries where the NDVs of grouping exprs are high and data are uniformly distributed among groups. This patch adds a query option controlling this behavior, letting users switch to the old plan. Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test M testdata/workloads/functional-query/queries/QueryTest/distinct.test M tests/query_test/test_aggregation.py 9 files changed, 454 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/6 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 6 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/9949/5/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/9949/5/be/src/service/query-options.h@132 PS5, Line 132: QUERY_OPT_FN(shuffle_distinct_exprs, SHUFFLE_DISTINCT_EXPRS, TQueryOptionLevel::REGULAR)\ This seems similarly ADVANCED as, e.g., DEFAULT_JOIN_DISTRIBUTION_MODE so let's move to ADVANCED. http://gerrit.cloudera.org:8080/#/c/9949/5/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/9949/5/be/src/service/query-options.cc@635 PS5, Line 635: case TImpalaQueryOptions::SHUFFLE_DISTINCT_EXPRS: { I just realized we should also update the query-options-test.cc http://gerrit.cloudera.org:8080/#/c/9949/5/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: http://gerrit.cloudera.org:8080/#/c/9949/5/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@78 PS5, Line 78: public void testNoShuffleOnDistinct() { testShuffleByDistinctExprs() http://gerrit.cloudera.org:8080/#/c/9949/5/testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test File testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test: http://gerrit.cloudera.org:8080/#/c/9949/5/testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test@2 PS5, Line 2: QUERY Is the formatting of these tests correct? I believe we don't recognize " QUERY in planner tests" Or perhaps this works by chance? Other tests don't have this better to be consistent. http://gerrit.cloudera.org:8080/#/c/9949/5/testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test@217 PS5, Line 217: select count(distinct a.int_col) from functional.alltypes a inner join [shuffle] Add one more test where the input is partitioned by year, int_col (compatible with the desired partitioning with distinct exprs) http://gerrit.cloudera.org:8080/#/c/9949/5/tests/query_test/test_aggregation.py File tests/query_test/test_aggregation.py: http://gerrit.cloudera.org:8080/#/c/9949/5/tests/query_test/test_aggregation.py@336 PS5, Line 336: class TestDistinctAggregationQueries(ImpalaTestSuite): TestDistinctAggregation http://gerrit.cloudera.org:8080/#/c/9949/5/tests/query_test/test_aggregation.py@353 PS5, Line 353: if cls.exploration_strategy() == 'core': Aren't we adding all formats in L345? I think we need to add a constraint here and not a new dimension. -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 5 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Tue, 10 Apr 2018 03:49:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Tianyi Wang has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. IMPALA-6822: Add a query option to control shuffling by distinct exprs IMPALA-4794 changed the distinct aggregation behavior to shuffling by both grouping exprs and the distinct expr. It's slower in queries where the NDVs of grouping exprs are high and data are uniformly distributed among groups. This patch adds a query option controlling this behavior, letting users switch to the old plan. Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/shuffle-by-distinct-exprs.test M testdata/workloads/functional-query/queries/QueryTest/distinct.test M tests/query_test/test_aggregation.py 9 files changed, 371 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/5 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 5 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. Patch Set 4: (3 comments) Looking closely into the tests now. http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@889 PS3, Line 889: // grouping exprs in the second phase which is not required when omitting the distinct > What do you mean by separating them? I suppose we always shuffle if there a Nvm, current code is ok. http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@931 PS3, Line 931: // phase-1 merge step. > L929 adds phase-2 agg. Partition being compatible && no shuffle by distinct Got it, for some reason I missed L929. http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@952 PS3, Line 952: // step (which is where it should be) > We don't need "fragments.add(firstMergeFragment)" if the partition is compa Makes sense. Let's the two paths more uniform with: if (!shuffleDistinctExpr) return firstMergeFragment; fragments.add(firstMergeFragment); -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tianyi Wang Gerrit-Comment-Date: Mon, 09 Apr 2018 22:37:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Tianyi Wang has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. IMPALA-6822: Add a query option to control shuffling by distinct exprs IMPALA-4794 changed the distinct aggregation behavior to shuffling by both grouping exprs and the distinct expr. It's slower in queries where the NDVs of grouping exprs are high and data are uniformly distributed among groups. This patch adds a query option controlling this behavior, letting users switch to the old plan. Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test 7 files changed, 173 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/4 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 4 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. Patch Set 3: (7 comments) http://gerrit.cloudera.org:8080/#/c/9949/3/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/9949/3/be/src/service/query-options.h@44 PS3, Line 44: TImpalaQueryOptions::SHUFFLE_DISTINCT_EXPR + 1);\ Let's call this SHUFFLE_DISTINCT_EXPRS because there could be several such exprs http://gerrit.cloudera.org:8080/#/c/9949/3/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: http://gerrit.cloudera.org:8080/#/c/9949/3/common/thrift/ImpalaInternalService.thrift@276 PS3, Line 276: // When a query has both grouping exprs and a distinct expr, impala can shuffle by * both grouping and distinct exprs * Impala can optionally include the distinct exprs in the hash exchange of the first aggregation phase to spread the data among more nodes. However, this plan requires another hash exchange on the grouping exprs in the second phase which is not required when when omitting the distinct exprs in the first phase. Shuffling by both is better if the grouping exprs have a low NDVs. http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@886 PS3, Line 886: // When a query has both grouping exprs and a distinct expr, impala can shuffle by same comments as in .thrift http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@889 PS3, Line 889: boolean shuffleDistinctExpr = ctx_.getQueryOptions().shuffle_distinct_expr || shuffleDistinctExprs I think it's cleaner to separate the value of the query option from hasGroupingExprs http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@931 PS3, Line 931: if (!shuffleDistinctExpr) return firstMergeFragment; I don't understand why this is correct. If the first phase fragment is suitably partitioned then we still need to do the second phase. Where do we add the second phase aggregation if we return here? Is this because shuffleDistinctExpr conflates whether there are grouping exprs or not and what the query option is set to? http://gerrit.cloudera.org:8080/#/c/9949/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@952 PS3, Line 952: if (shuffleDistinctExpr) { Move this before L959 for better readability. Also you can just: if (!shuffleDistinctExpr) return firstFragment; http://gerrit.cloudera.org:8080/#/c/9949/3/testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test File testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test: http://gerrit.cloudera.org:8080/#/c/9949/3/testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test@1 PS3, Line 1: # distinct agg with grouping over a union Let's try to distill minimal tests that are needed here. * distinct with and without grouping * first phase fragment is already suitably partitioned or not Looks like we have a bunch of tests here with unnecessary stuff like unions, etc. I'll spend some time looking over these and the existing tests, so we can come up with a lean and effective test suite. -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi WangGerrit-Reviewer: Alex Behm Gerrit-Comment-Date: Mon, 09 Apr 2018 19:16:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Tianyi Wang has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. IMPALA-6822: Add a query option to control shuffling by distinct exprs IMPALA-4794 changed the distinct aggregation behavior to shuffling by both grouping exprs and the distinct expr. It's slower in queries where the NDVs of grouping exprs are high and data are uniformly distributed among groups. This patch adds a query option controlling this behavior, letting users switch to the old plan. Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test 7 files changed, 856 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/3 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 3 Gerrit-Owner: Tianyi Wang
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Tianyi Wang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/9949 ) Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. IMPALA-6822: Add a query option to control shuffling by distinct exprs IMPALA-4794 changed the distinct aggregation behavior to shuffling by both grouping exprs and the distinct expr. It's slower in queries where the NDVs of grouping exprs are high and data are uniformly distributed among groups. This patch adds a query option controlling this behavior, letting users switch to the old plan. Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-by-distinct.test 7 files changed, 856 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/2 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 2 Gerrit-Owner: Tianyi Wang
[Impala-ASF-CR] IMPALA-6822: Add a query option to control shuffling by distinct exprs
Tianyi Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9949 Change subject: IMPALA-6822: Add a query option to control shuffling by distinct exprs .. IMPALA-6822: Add a query option to control shuffling by distinct exprs IMPALA-4794 changed the distinct aggregation behavior to shuffling by both grouping exprs and the distinct expr. It's slower in queries where the NDVs of grouping exprs are high and data are uniformly distributed among groups. This patch adds a query option controlling this behavior, letting users switch to the old plan. Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc --- M be/src/service/query-options.cc M be/src/service/query-options.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java A testdata/workloads/functional-planner/queries/PlannerTest/no-shuffle-on-distinct.test 7 files changed, 856 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/9949/1 -- To view, visit http://gerrit.cloudera.org:8080/9949 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icb4b4576fb29edd62cf4b4ba0719c0e0a2a5a8dc Gerrit-Change-Number: 9949 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang