Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/22032 )
Change subject: IMPALA-13086: Lower AggregationNode estimate using stats predicate
......................................................................

IMPALA-13086: Lower AggregationNode estimate using stats predicate

The NDV of a grouping column can be reduced if there is a predicate over
that column. If the predicate is a constant equality predicate or an
IS NULL predicate, the NDV must be equal to 1. If the predicate is a
simple IN-list predicate, the NDV must be the number of items in the list.

This patch adds this consideration by leveraging the existing analysis in
HdfsScanNode.computeStatsTupleAndConjuncts(). The Analyzer now records the
first ScanNode/UnionNode that produces each TupleId, registered during
Init()/computeStats() of that PlanNode. AggregationNode looks up the
PlanNode that produces a TupleId. If that origin PlanNode is an
HdfsScanNode, AggregationNode checks whether any grouping expression is
listed in statsOriginalConjuncts_ and reduces its NDV accordingly. If
HdfsScanNode.computeStatsTupleAndConjuncts() can be made generic for all
ScanNode implementations in the future, the same analysis can be applied
to every kind of ScanNode to achieve the same reduction.

For producer tracking, this patch makes an exception for the Iceberg
PlanNodes that handle positional or equality deletes. In that scenario,
two ScanNodes can share the same TupleId to force UnionNode passthrough,
so the UnionNode is acknowledged as the first producer of that TupleId.

This patch also removes some redundant operations in HdfsScanNode and
fixes a typo in the method name MathUtil.saturatingMultiplyCardinalities().

Testing:
- Add new test cases in aggregation.test
- Pass core tests.
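A minimal Java sketch of the NDV rule described above, assuming known (non-negative) column stats and at most one predicate per grouping column. The names GroupingNdvSketch, PredKind, GroupingCol, reducedNdv(), saturatingMultiply(), and estimateGroups() are hypothetical and are not Impala's AggregationNode or MathUtil code. The sketch also assumes a simplified estimate model: per-column NDVs are multiplied with saturation at Long.MAX_VALUE and the product is capped by the input cardinality. For IN-list predicates it takes the smaller of the stats NDV and the list size, which is an assumption of this sketch.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Impala code): caps a grouping column's NDV using a
 * predicate on that column, then multiplies the per-column NDVs with
 * saturation to estimate the number of groups.
 */
public class GroupingNdvSketch {
  /** Kinds of single-column predicates considered in this sketch. */
  enum PredKind { NONE, CONSTANT_EQUALITY, IS_NULL, IN_LIST }

  /** A grouping column with its stats NDV and an optional predicate. */
  record GroupingCol(long statsNdv, PredKind pred, int inListSize) {}

  /** Returns the NDV of a column after applying its predicate, if any. */
  static long reducedNdv(GroupingCol col) {
    switch (col.pred()) {
      case CONSTANT_EQUALITY:
      case IS_NULL:
        // col = <constant> or col IS NULL leaves exactly one distinct value.
        return 1;
      case IN_LIST:
        // col IN (v1, ..., vk) leaves at most k distinct values (assumed cap).
        return Math.min(col.statsNdv(), col.inListSize());
      default:
        return col.statsNdv();
    }
  }

  /** Multiplies two non-negative values, saturating at Long.MAX_VALUE. */
  static long saturatingMultiply(long a, long b) {
    if (a == 0 || b == 0) return 0;
    if (a > Long.MAX_VALUE / b) return Long.MAX_VALUE;
    return a * b;
  }

  /** Estimated number of groups, capped by the input cardinality. */
  static long estimateGroups(List<GroupingCol> cols, long inputCardinality) {
    long product = 1;
    for (GroupingCol col : cols) {
      product = saturatingMultiply(product, reducedNdv(col));
    }
    return Math.min(product, inputCardinality);
  }

  public static void main(String[] args) {
    List<GroupingCol> cols = List.of(
        new GroupingCol(365, PredKind.CONSTANT_EQUALITY, 0), // d_year = 2000
        new GroupingCol(1000, PredKind.IN_LIST, 3),          // item_id IN (a, b, c)
        new GroupingCol(50, PredKind.NONE, 0));              // no predicate
    System.out.println(estimateGroups(cols, 1_000_000L)); // 1 * 3 * 50 = 150
  }
}
```

Without the predicates, the same three columns would give 365 * 1000 * 50 groups (capped at the input cardinality), which shows why reducing the per-column NDV lowers the AggregationNode estimate. Records require Java 16 or later.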
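A minimal sketch of first-producer tracking, assuming a plain map keyed by tuple id in which the first registration wins. TupleProducerSketch, TupleId, PlanNode, registerProducer(), getProducer(), and canReduceNdv() are hypothetical stand-ins, not Impala's Analyzer API. The Iceberg exception is modeled here by registering the UnionNode before the two scans that share its TupleId; how Impala actually arranges this is not shown.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Impala code) of "first producer wins" tracking of
 * tuple ids. TupleId and PlanNode are simplified stand-ins.
 */
public class TupleProducerSketch {
  record TupleId(int id) {}

  /** Minimal plan node: a kind label and the tuple it materializes. */
  record PlanNode(String kind, TupleId tupleId) {}

  private final Map<TupleId, PlanNode> producers = new HashMap<>();

  /** Registers node as producer of its tuple unless one is already known. */
  void registerProducer(PlanNode node) {
    producers.putIfAbsent(node.tupleId(), node);
  }

  /** Returns the first registered producer, or null if none. */
  PlanNode getProducer(TupleId tid) {
    return producers.get(tid);
  }

  /** Only an HDFS scan producer enables the predicate-based NDV reduction. */
  boolean canReduceNdv(TupleId tid) {
    PlanNode producer = getProducer(tid);
    return producer != null && producer.kind().equals("HdfsScanNode");
  }

  public static void main(String[] args) {
    TupleProducerSketch analyzer = new TupleProducerSketch();

    // Plain scan: the scan is the first (and only) producer of tuple 0.
    TupleId t0 = new TupleId(0);
    analyzer.registerProducer(new PlanNode("HdfsScanNode", t0));

    // Iceberg delete case (modeled): two scans share tuple 1 and feed a
    // UnionNode. Registering the UnionNode first makes it the producer.
    TupleId t1 = new TupleId(1);
    analyzer.registerProducer(new PlanNode("UnionNode", t1));
    analyzer.registerProducer(new PlanNode("HdfsScanNode", t1));
    analyzer.registerProducer(new PlanNode("HdfsScanNode", t1));

    System.out.println(analyzer.canReduceNdv(t0)); // true
    System.out.println(analyzer.canReduceNdv(t1)); // false
  }
}
```

Using putIfAbsent keeps the earliest registration, so later scans that reuse the same TupleId cannot override the UnionNode, and the reduction is skipped for that tuple.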
Change-Id: Ia840d68f1c4f126d4e928461ec5c44545dbf25f8
Reviewed-on: http://gerrit.cloudera.org:8080/22032
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/MathUtil.java
M fe/src/test/java/org/apache/impala/util/MathUtilTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-resources.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q74.test
32 files changed, 7,613 insertions(+), 7,391 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/22032

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia840d68f1c4f126d4e928461ec5c44545dbf25f8
Gerrit-Change-Number: 22032
Gerrit-PatchSet: 18
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
