Hello Aman Sinha, Abhishek Rawat, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/20009
to look at the new patch set (#2).
Change subject: IMPALA-12183: Fix cardinality clamping across aggregation phases
......................................................................
IMPALA-12183: Fix cardinality clamping across aggregation phases
In the Impala planner, an aggregation node's cardinality is the sum of
the cardinalities of all its aggregation classes. An aggregation class's
cardinality is simply the product of the NDVs of its contributing
grouping columns. Since this product can be greater than the aggregation
node's input cardinality, each aggregation class cardinality is further
clamped at the aggregation node's input cardinality.
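The estimate described above can be sketched in Java as follows. This is
a minimal illustration, not Impala's AggregationNode code: the class
AggCardinalitySketch and its methods are hypothetical, and it assumes all
NDVs are known and non-negative.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Impala's AggregationNode code): an aggregation
 * node's cardinality is the sum, over its aggregation classes, of the
 * product of each class's grouping-column NDVs, clamped at the node's input
 * cardinality. Assumes all NDVs are known and non-negative.
 */
final class AggCardinalitySketch {

  /** Product of one class's grouping-column NDVs, clamped at inputCard. */
  static long classCardinality(List<Long> groupingNdvs, long inputCard) {
    long product = 1; // empty product, e.g. a class with no grouping columns
    for (long ndv : groupingNdvs) {
      try {
        product = Math.multiplyExact(product, ndv);
      } catch (ArithmeticException overflow) {
        // Saturate instead of wrapping around on very large NDV products.
        product = Long.MAX_VALUE;
      }
    }
    return Math.min(product, inputCard);
  }

  /** Node cardinality: sum of the clamped cardinalities of all classes. */
  static long nodeCardinality(List<List<Long>> aggClasses, long inputCard) {
    long total = 0;
    for (List<Long> groupingNdvs : aggClasses) {
      total += classCardinality(groupingNdvs, inputCard);
    }
    return total;
  }

  public static void main(String[] args) {
    // Two aggregation classes over 1,000 input rows:
    //   class A: NDVs 50 x 40 = 2,000, clamped to 1,000
    //   class B: NDV 30, below the clamp
    long card = nodeCardinality(List.of(List.of(50L, 40L), List.of(30L)), 1_000L);
    System.out.println(card); // prints 1030
  }
}
```

Note that the clamp applies per class, so the node's total can still
exceed its input cardinality when there are several aggregation classes.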
An aggregation operator can translate into a chain of multi-phase
aggregation plan nodes. The longest possible chain of aggregation phases
is as follows, from bottom to top:
1. FIRST
2. FIRST_MERGE
3. SECOND
4. SECOND_MERGE
5. TRANSPOSE
The FIRST_MERGE aggregation clamps each aggregation class cardinality at
the input cardinality of its corresponding FIRST aggregation (SECOND_MERGE
relates to SECOND in the same way). However, the SECOND aggregation was
clamped at the FIRST_MERGE output cardinality instead of the FIRST input
cardinality. This cardinality mispropagation can cause a cardinality
explosion in the later aggregation phases and in the operators above them.
This patch fixes the clamping of multi-phase aggregation so that it
always uses the input cardinality of the FIRST aggregation node. An
exception is made for the TRANSPOSE phase of a grouping set aggregation
(such as ROLLUP). In that case, cardinality clamping uses the output
cardinality of the child node directly below it (either FIRST_MERGE or
SECOND_MERGE), because the output cardinality of the whole aggregation
chain can be higher than the input cardinality of the FIRST phase.
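The clamping rule after this patch can be sketched in Java as follows.
This is a hypothetical illustration, not the code in AggregationNode.java:
the names AggClampSketch, AggPhase, clampReference and
clampedClassCardinality are invented here, and the row counts in main()
are made-up numbers. Under the previous behavior described above, the
SECOND phase in this example would have been clamped at the FIRST_MERGE
output (1,500,000) rather than at the FIRST input (1,000,000).

```java
/**
 * Hypothetical sketch (not Impala's AggregationNode code) of the clamp
 * reference each aggregation phase uses after this patch.
 */
final class AggClampSketch {

  enum AggPhase { FIRST, FIRST_MERGE, SECOND, SECOND_MERGE, TRANSPOSE }

  /**
   * Cardinality that every aggregation class of {@code phase} is clamped at.
   *
   * @param firstInputCard  input cardinality of the FIRST aggregation node
   * @param childOutputCard output cardinality of the node directly below
   *                        {@code phase} (FIRST_MERGE or SECOND_MERGE when
   *                        the phase is TRANSPOSE)
   */
  static long clampReference(AggPhase phase, long firstInputCard, long childOutputCard) {
    switch (phase) {
      case TRANSPOSE:
        // Grouping sets (e.g. ROLLUP) can emit more rows than the FIRST
        // phase read, so clamp at the child's output instead.
        return childOutputCard;
      default:
        // FIRST, FIRST_MERGE, SECOND and SECOND_MERGE all clamp at the
        // FIRST phase's input cardinality.
        return firstInputCard;
    }
  }

  /** Clamps the raw NDV product of one aggregation class of {@code phase}. */
  static long clampedClassCardinality(
      long ndvProduct, AggPhase phase, long firstInputCard, long childOutputCard) {
    return Math.min(ndvProduct, clampReference(phase, firstInputCard, childOutputCard));
  }

  public static void main(String[] args) {
    long firstInput = 1_000_000L;        // rows read by FIRST (hypothetical)
    long firstMergeOutput = 1_500_000L;  // summed FIRST_MERGE classes (hypothetical)
    long secondMergeOutput = 2_200_000L; // ROLLUP SECOND_MERGE output (hypothetical)
    long ndvProduct = 5_000_000L;        // raw NDV product of one class

    // SECOND: clamped at the FIRST input, not at the FIRST_MERGE output.
    System.out.println(clampedClassCardinality(
        ndvProduct, AggPhase.SECOND, firstInput, firstMergeOutput)); // prints 1000000
    // TRANSPOSE: clamped at the output of its SECOND_MERGE child.
    System.out.println(clampedClassCardinality(
        ndvProduct, AggPhase.TRANSPOSE, firstInput, secondMergeOutput)); // prints 2200000
  }
}
```

Keeping the TRANSPOSE case separate reflects the exception above: using
the FIRST input there would under-estimate a grouping set aggregation
whose output legitimately exceeds the rows it read.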
Testing:
- Add test in card-agg.test
- Pass core tests.
Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-agg.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
10 files changed, 214 insertions(+), 72 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/20009/2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95
Gerrit-Change-Number: 20009
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Abhishek Rawat <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>