Tianyi Wang has uploaded a new patch set (#2).

Change subject: IMPALA-4794: Grouping distinct agg plan robust to data skew
......................................................................

IMPALA-4794: Grouping distinct agg plan robust to data skew

This patch changes the query plan for grouping distinct aggregations to be more robust to data skew in the grouping expressions. The existing plan partitions data between phase-1 and phase-2 by the grouping expr, so skew in the grouping expr directly impacts performance. The new plan partitions data by both the grouping expr and the distinct aggregation expr, then adds one more aggregation and exchange node. It is expected to be faster with data skew but slower otherwise.

Testing: Modified existing planner tests, which already provide sufficient coverage. The pattern is that the distinct aggregation expr is added to the exchange node, followed by an additional merge aggregation and exchange node.

Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9
---
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
7 files changed, 213 insertions(+), 128 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/7643/2
--
To view, visit http://gerrit.cloudera.org:8080/7643
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
