Quanlong Huang has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/24297 )

Change subject: IMPALA-14600: Support HBO for AggregationNode cardinality
......................................................................

IMPALA-14600: Support HBO for AggregationNode cardinality

This extends HBO to support tracking and using the cardinality of
AggregationNodes. The HBO key string of an AggregationNode consists of:
 - Logical AggPhase: FIRST, SECOND, TRANSPOSE.
 - isPreagg, isGroupingSet: true/false.
 - For each AggClass: canonicalized grouping exprs.
   - AggClasses are sorted by their key strings.
 - Conjuncts in HAVING clause.
 - The HBO key string of the real child (explained below).

Some fields are ignored since they decide how the output should be
processed and are unrelated to the output cardinality of the current
node, e.g. isDistinctAgg, needsFinalize.
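
As a rough illustration of the key layout listed above, here is a
minimal sketch. The class name, method names, separators and overall
string format are hypothetical and are not taken from the patch; only
the set of fields and the sorting of AggClass keys follow the
description. Grouping exprs are assumed to be canonicalized already.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not the code in AggregationNode.java.
class AggHboKeySketch {
  // Logical aggregation phases named in the commit message.
  enum AggPhase { FIRST, SECOND, TRANSPOSE }

  // One key per AggClass, built from its canonicalized grouping exprs.
  static String aggClassKey(List<String> canonicalGroupingExprs) {
    return "[" + String.join(",", canonicalGroupingExprs) + "]";
  }

  // Fields such as isDistinctAgg and needsFinalize are deliberately not
  // parameters: they do not affect the output cardinality.
  static String buildKey(AggPhase phase, boolean isPreagg, boolean isGroupingSet,
      List<List<String>> aggClasses, List<String> havingConjuncts,
      String realChildKey) {
    // Sort the per-class keys so the order of AggClasses does not matter.
    List<String> classKeys = aggClasses.stream()
        .map(AggHboKeySketch::aggClassKey)
        .sorted()
        .collect(Collectors.toList());
    return "AGG(" + phase
        + "|preagg=" + isPreagg
        + "|groupingSet=" + isGroupingSet
        + "|classes=" + String.join(";", classKeys)
        + "|having=" + String.join(" AND ", havingConjuncts)
        + "|child=" + realChildKey + ")";
  }
}
```

With this layout, two plans that list the same AggClasses in a different
order map to the same key string.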

Also adds the concept of cardinality-preserving nodes: nodes whose
output cardinality always equals their input cardinality, e.g. SortNode,
ExchangeNode, AnalyticEvalNode. These nodes are ignored when generating
HBO key strings: each one uses the key string of its child, as if the
node did not exist.
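
A minimal sketch of the pass-through behaviour, assuming a simplified
node type. The Node class, its fields and the key format are
hypothetical; the patch itself changes PlanNode and its subclasses.

```java
// Hypothetical sketch of key delegation for cardinality-preserving nodes.
class PassThroughKeySketch {
  static class Node {
    final String name;
    final boolean cardinalityPreserving;
    final Node child;  // null for a leaf such as a scan

    Node(String name, boolean cardinalityPreserving, Node child) {
      this.name = name;
      this.cardinalityPreserving = cardinalityPreserving;
      this.child = child;
    }

    String hboKey() {
      // A cardinality-preserving node contributes nothing of its own:
      // it simply reuses the key string of its child.
      if (cardinalityPreserving && child != null) return child.hboKey();
      return child == null ? name : name + "(" + child.hboKey() + ")";
    }
  }
}
```

In this sketch, an exchange on top of a sort on top of a scan yields
exactly the scan's key string.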

When adding the child key of an AggregationNode, intermediate agg nodes
that belong to the same logical aggregation and cardinality-preserving
nodes are ignored. We use the "real" child of the aggregation, which
currently can only be a scan node or another aggregation (from a
different multiAggInfo_ instance). Note that HBO doesn't support other
node types like JoinNode yet.
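
The lookup of the "real" child can be sketched as a walk down the plan.
Everything here is an assumption made for illustration: the Node class,
the realChild() helper, and the integer aggGroupId that stands in for
"belongs to the same multiAggInfo_ instance".

```java
// Hypothetical sketch of resolving the "real" child of an aggregation.
class RealChildSketch {
  static class Node {
    final String name;
    final boolean cardinalityPreserving;
    final int aggGroupId;  // id of the logical aggregation, -1 if not an agg
    final Node child;      // null for a leaf such as a scan

    Node(String name, boolean cardinalityPreserving, int aggGroupId, Node child) {
      this.name = name;
      this.cardinalityPreserving = cardinalityPreserving;
      this.aggGroupId = aggGroupId;
      this.child = child;
    }
  }

  // Skips cardinality-preserving nodes and agg nodes of the same logical
  // aggregation; returns the first node below 'agg' that is neither.
  static Node realChild(Node agg) {
    Node n = agg.child;
    while (n != null) {
      boolean sameAgg = n.aggGroupId != -1 && n.aggGroupId == agg.aggGroupId;
      if (!n.cardinalityPreserving && !sameAgg) break;
      n = n.child;
    }
    return n;
  }
}
```

Under these assumptions, a merge aggregation sitting on an exchange and
a pre-aggregation resolves to the same scan as a single-node plan where
the aggregation sits directly on the scan.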

This is not just an optimization to simplify (and shorten) the HBO key
string, but also a correctness requirement, since intermediate agg nodes
and ExchangeNodes can be added after the cardinality is computed.
Ignoring them keeps the HBO key string consistent between the
SingleNodePlanner and the DistributedPlanner.

Testing
 - Added e2e tests in test_hbo.py
 - Updated golden test files of scan cardinality because some agg nodes
   now also have HBO stats.

Assisted-by: Claude Code 4.6
Assisted-by: Composer 2
Change-Id: Ie0fafaf9d827f3bf533b1af7e62fdb2303c126ce
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java
M testdata/workloads/functional-query/queries/QueryTest/hbo-collection-scan.test
A testdata/workloads/functional-query/queries/QueryTest/hbo-distinct-agg.test
A testdata/workloads/functional-query/queries/QueryTest/hbo-grouping-set.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-iceberg-scan.test
A testdata/workloads/functional-query/queries/QueryTest/hbo-single-agg.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-nonpartitioned-no-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-nonpartitioned-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-partitioned-no-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-partitioned-stats.test
M tests/query_test/test_hbo.py
17 files changed, 719 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/24297/2
--
To view, visit http://gerrit.cloudera.org:8080/24297
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie0fafaf9d827f3bf533b1af7e62fdb2303c126ce
Gerrit-Change-Number: 24297
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
