Hello Quanlong Huang, Aman Sinha, Zoltan Borok-Nagy, Michael Smith, Impala
Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/22032
to look at the new patch set (#3).
Change subject: IMPALA-13086: Lower AggregationNode estimate using stats
predicate
......................................................................
IMPALA-13086: Lower AggregationNode estimate using stats predicate
NDV of a grouping column can be reduced if there is a predicate over
that column. If the predicate is a constant equality predicate or
is-null predicate, then the NDV must be equal to 1. If the predicate is
a simple in-list predicate, the NDV must be the number of items in the
list.
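The rule above can be illustrated with a minimal sketch. It is not the
patch's code: NdvReductionSketch, ColumnPredicate, Kind and reduceNdv are
hypothetical names, and the sketch assumes the predicate-derived value acts
as an upper bound on the NDV taken from column stats. It does not handle an
unknown NDV.

```java
import java.util.List;

// Hypothetical sketch: these types are illustrative, not Impala classes.
final class NdvReductionSketch {
  enum Kind { CONSTANT_EQUALITY, IS_NULL, IN_LIST, OTHER }

  // A predicate over a single grouping column.
  static final class ColumnPredicate {
    final Kind kind;
    final int inListSize;  // only meaningful when kind == IN_LIST

    ColumnPredicate(Kind kind, int inListSize) {
      this.kind = kind;
      this.inListSize = inListSize;
    }
  }

  // Returns the NDV of a grouping column after applying its predicates.
  // Assumption: each predicate can only lower the stats-based NDV.
  static long reduceNdv(long statsNdv, List<ColumnPredicate> predicates) {
    long ndv = statsNdv;
    for (ColumnPredicate p : predicates) {
      switch (p.kind) {
        case CONSTANT_EQUALITY:
        case IS_NULL:
          // A single constant value or NULL leaves one distinct value.
          ndv = Math.min(ndv, 1L);
          break;
        case IN_LIST:
          // A simple in-list leaves at most one value per list item.
          ndv = Math.min(ndv, (long) p.inListSize);
          break;
        default:
          // Any other predicate leaves the estimate unchanged.
          break;
      }
    }
    return ndv;
  }
}
```

For example, a column with a stats NDV of 1000 and a three-item in-list
predicate would be estimated at 3 distinct values in this sketch.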
This patch adds such consideration by leveraging the existing analysis in
HdfsScanNode.computeStatsTupleAndConjuncts(). The Analyzer records the
first ScanNode/UnionNode that produces each TupleId; registration happens
during Init()/computeStats() of the PlanNode. AggregationNode then looks
up the PlanNode that produces a TupleId. If the origin PlanNode is an
HdfsScanNode, it checks whether any grouping expression is listed in
statsOriginalConjuncts_ and reduces that expression's NDV accordingly.
If HdfsScanNode.computeStatsTupleAndConjuncts() can be made generic for
all ScanNode implementations in the future, the same analysis can be
applied to every kind of ScanNode to achieve the same reduction.
For tracking the producer PlanNode, this patch makes an exception for
Iceberg PlanNodes that handle positional or equality deletes. In that
scenario, two ScanNodes can share the same TupleId to force UnionNode
passthrough, so the UnionNode is treated as the first producer of that
TupleId.
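The producer tracking described above could be sketched roughly as follows.
This is not the Analyzer API: TupleProducerRegistry, Producer, NodeKind and
all method names are hypothetical, and letting a UnionNode replace an
already registered ScanNode is only one assumed way to obtain the Iceberg
behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: not the actual Analyzer or PlanNode classes.
final class TupleProducerRegistry {
  enum NodeKind { HDFS_SCAN, OTHER_SCAN, UNION }

  static final class Producer {
    final int nodeId;
    final NodeKind kind;

    Producer(int nodeId, NodeKind kind) {
      this.nodeId = nodeId;
      this.kind = kind;
    }
  }

  // Maps a tuple id to the node considered its first producer.
  private final Map<Integer, Producer> producers = new HashMap<>();

  // Called by a node while it initializes or computes stats.
  // The first registration wins, except that a UNION replaces scan nodes
  // that share its tuple id (assumed handling of the Iceberg delete case).
  void register(int tupleId, Producer p) {
    Producer existing = producers.get(tupleId);
    boolean unionOverridesScan = existing != null
        && p.kind == NodeKind.UNION && existing.kind != NodeKind.UNION;
    if (existing == null || unionOverridesScan) {
      producers.put(tupleId, p);
    }
  }

  // Returns the recorded producer, or null if none was registered.
  Producer lookup(int tupleId) {
    return producers.get(tupleId);
  }

  // The aggregation side only applies the reduction for HDFS scans.
  boolean canReduceNdv(int tupleId) {
    Producer p = lookup(tupleId);
    return p != null && p.kind == NodeKind.HDFS_SCAN;
  }
}
```

In this sketch an aggregation over a tuple produced by a plain HDFS scan is
eligible for the NDV reduction, while a tuple whose recorded producer is a
UnionNode is left alone.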
This patch also removes some redundant operations in HdfsScanNode.
Testing:
- Add new test cases in aggregation.test
- Pass core tests.
Change-Id: Ia840d68f1c4f126d4e928461ec5c44545dbf25f8
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-resources.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-cardinality-reduction.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q06.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q49.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q96.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q18.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q27.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q31.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q39a.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q39b.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q42.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q44.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q52.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q66.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q74.test
37 files changed, 8,499 insertions(+), 8,260 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/22032/3
--
To view, visit http://gerrit.cloudera.org:8080/22032
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia840d68f1c4f126d4e928461ec5c44545dbf25f8
Gerrit-Change-Number: 22032
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Smith
Gerrit-Reviewer: Quanlong Huang
Gerrit-Reviewer: Riza Suminto
Gerrit-Reviewer: Zoltan Borok-Nagy