Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20009 )
Change subject: IMPALA-12183: Fix cardinality clamping across aggregation phases
......................................................................

IMPALA-12183: Fix cardinality clamping across aggregation phases

In the Impala planner, an aggregation node's cardinality is the sum of the cardinalities of all its aggregation classes. An aggregation class's cardinality is simply the product of the NDVs of its grouping columns. Since this product can be greater than the aggregation node's input cardinality, each aggregation class cardinality is further clamped at the aggregation node's input cardinality.

An aggregation operator can translate into a chain of multi-phase aggregation plan nodes. The longest possible chain of aggregation phases is as follows, from bottom to top:
1. FIRST
2. FIRST_MERGE
3. SECOND
4. SECOND_MERGE
5. TRANSPOSE

The FIRST_MERGE aggregation clamps its aggregation class cardinalities at the input cardinality of its corresponding FIRST aggregation (the same relationship holds between SECOND_MERGE and SECOND). However, the SECOND aggregation was clamped at the output cardinality of FIRST_MERGE instead of the input cardinality of FIRST. This mispropagated cardinality can cause a cardinality explosion in the later aggregation phases and in the operators above them.

This patch fixes the clamping of multi-phase aggregation so that it always uses the input cardinality of the FIRST aggregation node. An exception is made for the TRANSPOSE phase of grouping-set aggregations (such as ROLLUP). In that case, cardinality clamping uses the output cardinality of the child node directly below it (either FIRST_MERGE or SECOND_MERGE), because the output cardinality of the whole aggregation chain can be higher than the input cardinality of the FIRST phase.

Testing:
- Add test in card-agg.test
- Pass core tests.
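The clamping rules described above can be sketched as follows. This is a minimal, hypothetical Java illustration, not the actual AggregationNode.java change: the names CardinalityClampSketch, AggPhase, classCardinality, nodeCardinality and clampBound are invented for this sketch, all cardinalities are assumed known and non-negative, and the NDVs and row counts are made up. It shows how clamping SECOND at the FIRST_MERGE output (old behavior) differs from clamping at the FIRST input (fixed behavior) when, for example, FIRST has more than one aggregation class, so that the summed FIRST_MERGE output exceeds the FIRST input.

```java
import java.util.List;

/**
 * Hypothetical sketch of cardinality clamping across aggregation phases.
 * Not Impala's AggregationNode code; all names here are illustrative.
 * Assumes every cardinality and NDV is known and non-negative (Impala
 * uses -1 for an unknown cardinality, which this sketch ignores).
 */
final class CardinalityClampSketch {

  /** Aggregation phases, listed from bottom to top of the chain. */
  enum AggPhase { FIRST, FIRST_MERGE, SECOND, SECOND_MERGE, TRANSPOSE }

  /** Multiplies two non-negative longs, saturating at Long.MAX_VALUE on overflow. */
  static long saturatingMultiply(long a, long b) {
    try {
      return Math.multiplyExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  /** Adds two non-negative longs, saturating at Long.MAX_VALUE on overflow. */
  static long saturatingAdd(long a, long b) {
    try {
      return Math.addExact(a, b);
    } catch (ArithmeticException e) {
      return Long.MAX_VALUE;
    }
  }

  /** One aggregation class: product of its grouping-column NDVs, clamped at clampAt. */
  static long classCardinality(long[] groupingNdvs, long clampAt) {
    long product = 1;
    for (long ndv : groupingNdvs) {
      product = saturatingMultiply(product, ndv);
    }
    return Math.min(product, clampAt);
  }

  /** Aggregation node: sum of its clamped aggregation class cardinalities. */
  static long nodeCardinality(List<long[]> aggClasses, long clampAt) {
    long total = 0;
    for (long[] groupingNdvs : aggClasses) {
      total = saturatingAdd(total, classCardinality(groupingNdvs, clampAt));
    }
    return total;
  }

  /**
   * Clamping bound after the fix: every phase uses the FIRST phase's input
   * cardinality, except TRANSPOSE, which uses its child's output cardinality.
   */
  static long clampBound(AggPhase phase, long firstInputCard, long childOutputCard) {
    return phase == AggPhase.TRANSPOSE ? childOutputCard : firstInputCard;
  }

  public static void main(String[] args) {
    long firstInput = 1_000; // made-up row count entering FIRST
    // Two aggregation classes (NDV products 3000 and 4000), each clamped at 1000.
    List<long[]> firstClasses = List.of(new long[] {100, 30}, new long[] {100, 40});
    long firstMergeOutput = nodeCardinality(firstClasses, firstInput);

    // One aggregation class in SECOND with an NDV product of 3000.
    List<long[]> secondClasses = List.of(new long[] {100, 30});
    // Old behavior: SECOND clamped at the FIRST_MERGE output.
    long oldSecond = nodeCardinality(secondClasses, firstMergeOutput);
    // Fixed behavior: SECOND clamped at the FIRST input.
    long newSecond = nodeCardinality(
        secondClasses, clampBound(AggPhase.SECOND, firstInput, firstMergeOutput));

    System.out.println("FIRST_MERGE output: " + firstMergeOutput);
    System.out.println("SECOND (old clamp): " + oldSecond);
    System.out.println("SECOND (new clamp): " + newSecond);
  }
}
```

Running main prints a FIRST_MERGE output of 2000, then 2000 for SECOND under the old clamp and 1000 under the fixed one. For TRANSPOSE, clampBound returns the child's output cardinality instead, so a grouping-set chain whose output legitimately exceeds the FIRST input is clamped at its child's output rather than at the smaller FIRST input.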
Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/card-agg.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans-default.test
M testdata/workloads/functional-planner/queries/PlannerTest/partition-key-scans.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
10 files changed, 198 insertions(+), 67 deletions(-)


git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/20009/1
--
To view, visit http://gerrit.cloudera.org:8080/20009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1d414fe56b027f887c7f901d8a6799a388b16b95
Gerrit-Change-Number: 20009
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <[email protected]>
