Riza Suminto has uploaded this change for review. ( http://gerrit.cloudera.org:8080/22032 )

Change subject: IMPALA-13086: Lower AggregationNode estimate using stats predicate
......................................................................

IMPALA-13086: Lower AggregationNode estimate using stats predicate

NDV of a grouping column can be reduced if there is a predicate over
that column. If the predicate is a constant equality predicate or
is-null predicate, then the NDV must be equal to 1. If the predicate is
a simple in-list predicate, the NDV must be the number of items in the
list.

This patch adds such consideration by leveraging existing analysis in
HdfsScanNode.computeStatsTupleAndConjuncts(). It memorizes the first
ScanNode/UnionNode that produces a TupleId in Analyzer, registered
during Init()/computeStats() of the PlanNode. At AggregationNode, it
looks up the PlanNode that produces a TupleId. If the origin PlanNode
is an HdfsScanNode, it analyzes whether any grouping expression is
listed in statsOriginalConjuncts_ and reduces its NDV accordingly. If
HdfsScanNode.computeStatsTupleAndConjuncts() can be made generic for
all ScanNode implementations in the future, we can apply this same
analysis to all kinds of ScanNode and achieve the same reduction.

In terms of tracking the producer PlanNode, this patch makes an
exception for Iceberg PlanNodes that handle positional or equality
deletion. In that scenario, it is possible to have two ScanNodes
sharing the same TupleId to force UnionNode passthrough. Therefore, the
UnionNode will be acknowledged as the first producer of that TupleId.

Testing:
- Add new test cases in aggregation.test
- Pass core tests.
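
The NDV rules described in the commit message can be illustrated with a
small standalone sketch. This is not the patch's code: the class
GroupingNdvSketch, the Kind enum, and the methods reduceNdv() and
estimateGroups() are hypothetical names made up for illustration. The
sketch also assumes that the in-list size is capped by the column's
stats NDV and that the group estimate is the product of per-column NDVs
capped by the input cardinality, which is a simplification.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Impala. */
final class GroupingNdvSketch {
  /** The predicate shapes named in the commit message. */
  enum Kind { EQ_CONSTANT, IS_NULL, IN_LIST }

  /** A simplified predicate over a single grouping column. */
  static final class ColumnPredicate {
    final Kind kind;
    final int inListSize; // only meaningful when kind == IN_LIST

    ColumnPredicate(Kind kind, int inListSize) {
      this.kind = kind;
      this.inListSize = inListSize;
    }
  }

  /**
   * Returns the NDV to use for a grouping column. statsNdv is assumed to
   * be a known value >= 1; pred may be null when no predicate applies.
   */
  static long reduceNdv(long statsNdv, ColumnPredicate pred) {
    if (pred == null) return statsNdv;
    switch (pred.kind) {
      case EQ_CONSTANT:
      case IS_NULL:
        // Only one distinct value can survive the predicate.
        return 1;
      case IN_LIST:
        // Assumption of this sketch: never exceed the stats NDV.
        return Math.min(statsNdv, pred.inListSize);
      default:
        return statsNdv;
    }
  }

  /**
   * Simplified group-count estimate: product of the (possibly reduced)
   * per-column NDVs, saturating on overflow, capped by input cardinality.
   */
  static long estimateGroups(Map<String, Long> ndvByColumn,
      Map<String, ColumnPredicate> predByColumn,
      List<String> groupingColumns, long inputCardinality) {
    long groups = 1;
    for (String col : groupingColumns) {
      long ndv = reduceNdv(ndvByColumn.get(col), predByColumn.get(col));
      groups = (groups > Long.MAX_VALUE / ndv) ? Long.MAX_VALUE : groups * ndv;
    }
    return Math.min(groups, inputCardinality);
  }
}
```

For example, grouping by three columns with stats NDVs of 100, 12, and
31 over 1,000,000 input rows gives 37,200 groups under this sketch. With
a constant equality predicate on the first column and a three-item
in-list on the second, the estimate drops to 1 * 3 * 31 = 93.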

Change-Id: Ia840d68f1c4f126d4e928461ec5c44545dbf25f8
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-resources.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q74.test
33 files changed, 8,468 insertions(+), 8,215 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/22032/1
--
To view, visit http://gerrit.cloudera.org:8080/22032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia840d68f1c4f126d4e928461ec5c44545dbf25f8
Gerrit-Change-Number: 22032
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto <[email protected]>
