Tianyi Wang has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7643

Change subject: IMPALA-4794: Partition distinct expr for skew data
......................................................................

IMPALA-4794: Partition distinct expr for skew data

Currently, an aggregation with a grouping expr and a distinct expr
has 2 aggregation phases. The first phase aggregates the data and
partitions it by the grouping expr, then the first-phase merge and the
second phase run. Performance therefore depends on how evenly the
grouping expr partitions the data, which can be skewed. This patch
partitions the data in the first phase by (grouping expr, distinct
expr) instead. This introduces an additional exchange node and a
merging node, but the finer-grained partitioning should balance the
data better and make performance more stable.
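
The effect of the finer-grained partitioning can be illustrated with a
small standalone sketch. It is not Impala code: the row layout, the
CRC-based partition function, and all names below are assumptions made
for illustration only. It compares per-partition row counts when
partitioning by the grouping key alone versus by (grouping key,
distinct value), and checks that deduplicating per partition and then
merging by group still yields the correct distinct counts.

```python
import zlib
from collections import Counter

def partition_of(key, num_partitions):
    # Hypothetical, deterministic stand-in for hash partitioning in an exchange.
    return zlib.crc32(repr(key).encode()) % num_partitions

def partition_loads(rows, key_fn, num_partitions):
    # Count how many rows each partition receives for a given partition key.
    loads = [0] * num_partitions
    for row in rows:
        loads[partition_of(key_fn(row), num_partitions)] += 1
    return loads

def count_distinct_two_stage(rows, num_partitions):
    # Stage 1: partition by (group, distinct value) and deduplicate per partition.
    stage1 = [set() for _ in range(num_partitions)]
    for group, value in rows:
        stage1[partition_of((group, value), num_partitions)].add((group, value))
    # Stage 2: merge the deduplicated pairs by group and count them.
    # Each pair lives in exactly one partition, so nothing is counted twice.
    counts = Counter()
    for part in stage1:
        for group, _ in part:
            counts[group] += 1
    return dict(counts)

# Skewed input: group 0 holds most rows; a few rows are duplicated.
rows = [(0, v) for v in range(9000)]
rows += [(g, v) for g in range(1, 11) for v in range(100)]
rows += rows[:500]

NUM_PARTITIONS = 4
by_group = partition_loads(rows, lambda r: r[0], NUM_PARTITIONS)
by_group_and_distinct = partition_loads(rows, lambda r: r, NUM_PARTITIONS)
distinct_counts = count_distinct_two_stage(rows, NUM_PARTITIONS)

print("rows per partition, by group:            ", by_group)
print("rows per partition, by (group, distinct):", by_group_and_distinct)
```

With the grouping key alone, every row of the heavy group lands in the
same partition. With the composite key, the heavy group's rows are
spread across partitions, at the cost of the extra merge step.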

Testing: In the planner tests, some plans with distinct aggregation
change. The pattern is that the distinct expr is added to the exchange
node, which is followed by an additional aggregation node and an
additional exchange node.

Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9
---
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
7 files changed, 192 insertions(+), 104 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/7643/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7643
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I7bdada0e328b555900c7b7ff8aabc8eb15ae8fa9
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tianyi Wang <tw...@cloudera.com>
