Hello Aman Sinha, Yida Wu, Michael Smith, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21860
to look at the new patch set (#7).
Change subject: IMPALA-13405: Do tuple analysis to lower AggregationNode
cardinality
......................................................................
IMPALA-13405: Do tuple analysis to lower AggregationNode cardinality
If an aggregation node has multiple grouping expressions originating
from a common tuple, their combined NDV must not exceed the output
cardinality of PlanNode producing that tuple. This patch enhances
AggragationNode.estimateNumGroups() by grouping such expressions and
capping their combined NDV at the output cardinality of producer
PlanNode. This patch only consider simple SlotRef expression from scalar
type column for cardinality inspection.
This patch also changes multi-class merging-phase aggregation to use the
same per-class estimate from the multi-class preaggregation node right
before it.
Testing:
- Pass core tests.
- Manually verify the changed plan under tpcds_cpu_cost/ with query
profiles from the real run of the same 3TB scale. Confirmed that most
new estimates are still above the actual cardinality.
Change-Id: Icd589ab5f7ba9566a0d35784f61f5ffaef5696e7
---
M fe/src/main/java/org/apache/impala/analysis/AggregateInfo.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-high-mem-estimate.test
M
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-low-mem-estimate.test
M
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M
testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
M
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-resources.test
M
testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q12.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14a.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q14b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q19.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q20.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q22.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q37.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q43.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q45.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q50.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q55.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q67.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q82.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q98.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q19.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q37.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q39a.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q39b.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43-verbose.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q43.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q45.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q47.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q50.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q52.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q55.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q57.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q66.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q67.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q82.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q89.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q91.test
M
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q98.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
55 files changed, 5,730 insertions(+), 5,929 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/21860/7
--
To view, visit http://gerrit.cloudera.org:8080/21860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icd589ab5f7ba9566a0d35784f61f5ffaef5696e7
Gerrit-Change-Number: 21860
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>