Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/24297

to look at the new patch set (#3).

Change subject: IMPALA-14600: Support HBO for AggregationNode cardinality
......................................................................

IMPALA-14600: Support HBO for AggregationNode cardinality

This extends HBO to track and use the cardinality of
AggregationNodes.

Frontend Changes
----------------
The HBO key string of AggregationNode consists of
 - Logical AggPhase: FIRST, SECOND, TRANSPOSE.
 - isPreagg, isGroupingSet: true/false.
 - For each AggClass: canonicalized grouping exprs.
   - AggClasses are sorted by their key strings.
 - Conjuncts in HAVING clause.
 - The HBO key string of the real child (explained below).
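A minimal sketch of how the fields listed above could be combined into one key string. The class, enum, and output format are hypothetical and not Impala's actual AggregationNode code. The sketch assumes each AggClass has already been reduced to a canonicalized grouping-expr string. Only the AggClass keys are sorted, so reordering AggClasses does not change the key. Fields such as isDistinctAgg or needsFinalize are deliberately not inputs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the logical aggregation phase named above.
enum SketchAggPhase { FIRST, SECOND, TRANSPOSE }

// Hypothetical helper; Impala's AggregationNode builds its key its own way.
final class AggHboKeySketch {
  private AggHboKeySketch() {}

  // aggClassKeys: one already-canonicalized grouping-expr string per AggClass.
  // Output-processing fields (isDistinctAgg, needsFinalize) are not inputs.
  static String build(SketchAggPhase phase, boolean isPreagg, boolean isGroupingSet,
      List<String> aggClassKeys, List<String> havingConjuncts, String childKey) {
    // Sort a copy so the key does not depend on AggClass creation order.
    List<String> sortedClasses = new ArrayList<>(aggClassKeys);
    Collections.sort(sortedClasses);
    return "Agg(phase=" + phase
        + ",preagg=" + isPreagg
        + ",groupingSet=" + isGroupingSet
        + ",classes=" + sortedClasses
        + ",having=" + havingConjuncts
        + ",child=" + childKey + ")";
  }
}
```

Calling `build` with AggClass keys `["b", "a"]` or `["a", "b"]` yields the same string.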

Some fields are ignored since they only determine how the output is
processed and do not affect the output cardinality of the current node,
e.g. isDistinctAgg, needsFinalize.

This also adds the concept of cardinality-preserving nodes, i.e. nodes
whose inputCardinality always equals their outputCardinality, e.g.
SortNode, ExchangeNode, AnalyticEvalNode. These nodes are ignored when
generating HBO key strings. "Ignored" here means they reuse the key
string of their child, as if they didn't exist.
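A minimal sketch of the "ignored" behavior described above. A cardinality-preserving node returns its child's key unchanged, so the key is the same whether or not such nodes are present. The class names are hypothetical and the model is heavily simplified. It assumes a preserving node has exactly one child and treats the aggregation as a plain non-preserving node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified plan-node model.
abstract class SketchNode {
  final List<SketchNode> children = new ArrayList<>();

  SketchNode addChild(SketchNode c) {
    children.add(c);
    return this;
  }

  // True if the node always outputs exactly as many rows as it reads.
  boolean isCardinalityPreserving() { return false; }

  // The node's own contribution to the key (unused for preserving nodes).
  abstract String localKey();

  final String hboKey() {
    // Preserving nodes are transparent: reuse the single child's key.
    if (isCardinalityPreserving()) return children.get(0).hboKey();
    StringBuilder sb = new StringBuilder(localKey());
    for (SketchNode c : children) sb.append('(').append(c.hboKey()).append(')');
    return sb.toString();
  }
}

final class SketchScan extends SketchNode {
  private final String table;
  SketchScan(String table) { this.table = table; }
  @Override String localKey() { return "Scan:" + table; }
}

final class SketchAgg extends SketchNode {
  @Override String localKey() { return "Agg"; }
}

final class SketchExchange extends SketchNode {
  @Override boolean isCardinalityPreserving() { return true; }
  @Override String localKey() { return "Exchange"; }
}

final class SketchSort extends SketchNode {
  @Override boolean isCardinalityPreserving() { return true; }
  @Override String localKey() { return "Sort"; }
}
```

With this model, `Agg -> Exchange -> Scan(t)` and `Agg -> Scan(t)` both produce `Agg(Scan:t)`.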

When adding the child key of an AggregationNode, intermediate agg nodes
that belong to the same logical aggregation, as well as
cardinality-preserving nodes, are skipped. We use the "real" child of
the aggregation, which currently can only be a scan node or another
aggregation (from a different multiAggInfo_ instance). Note that HBO
doesn't support other node types like JoinNode yet.

This is not just an optimization to simplify (and shorten) the HBO key
string, but also a correctness requirement, since intermediate agg nodes
and ExchangeNodes could be added after the cardinality is computed.
Skipping them ensures the HBO key string is consistent between the
SingleNodePlanner and the DistributedPlanner.
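A minimal sketch of the "real" child lookup described above. Starting below an aggregation, it skips cardinality-preserving nodes and agg nodes of the same logical aggregation, and stops at the first node that is neither. All names are hypothetical. The `logicalAggId` field is an assumed stand-in for the multiAggInfo_ instance an aggregation belongs to. Each node has a single child to keep the sketch small.

```java
// Hypothetical node kinds for this sketch only.
enum RcKind { SCAN, AGG, EXCHANGE, SORT, ANALYTIC }

// Hypothetical single-child plan node. logicalAggId stands in for the
// multiAggInfo_ instance an aggregation node belongs to (-1 otherwise).
final class RcNode {
  final RcKind kind;
  final int logicalAggId;
  final RcNode child;

  RcNode(RcKind kind, int logicalAggId, RcNode child) {
    this.kind = kind;
    this.logicalAggId = logicalAggId;
    this.child = child;
  }

  boolean isCardinalityPreserving() {
    return kind == RcKind.EXCHANGE || kind == RcKind.SORT || kind == RcKind.ANALYTIC;
  }
}

final class RealChildFinder {
  private RealChildFinder() {}

  // Skips cardinality-preserving nodes and agg nodes of the same logical
  // aggregation; returns the first node that is neither (or null).
  static RcNode realChild(RcNode agg) {
    RcNode cur = agg.child;
    while (cur != null
        && (cur.isCardinalityPreserving()
            || (cur.kind == RcKind.AGG && cur.logicalAggId == agg.logicalAggId))) {
      cur = cur.child;
    }
    return cur;
  }
}
```

Consider the distributed plan `agg(1) -> exchange -> pre-agg(1) -> scan` and the single-node plan `agg(1) -> scan`. In both, `realChild` returns the same scan node, so both planners see the same child key. An outer aggregation from a different logical aggregation stops at the inner `agg(1)`.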

Backend Changes
---------------
Most of the backend logic remains unchanged, except for checking nodes
that have effective external runtime filters in their subtree.
"External" here means the runtime filter is generated outside the
subtree.

If a scan node has effective runtime filters, we walk up through its
ancestors until reaching the source node of the runtime filter. The
visited nodes, except the source node, are marked as having effective
external runtime filters. Their cardinalities are unstable and won't be
stored in the HBO stats.

To support this traversal, the FE passes the list of parent ids of all
plan nodes to the backend.
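A minimal sketch of the upward walk described above, driven only by a parent-id array. The real logic lives in the C++ backend (impala-server.cc). This sketch is written in Java only to match the other examples. The class and method names are hypothetical, and the `-1` root marker is an assumed encoding. The message does not say how the scan node itself is treated, so the sketch marks only the ancestors visited above it. Nodes at or above the source are left unmarked by this filter, because for their subtrees the filter is produced internally.

```java
import java.util.Set;

final class ExternalFilterMarker {
  private ExternalFilterMarker() {}

  // parentIds[i] is the parent of plan node i, or -1 for the root (assumed
  // encoding). Starts at the scan's parent and marks each visited node until
  // the filter's source node (or the root) is reached; the source is not marked.
  static void markAncestors(int[] parentIds, int scanId, int sourceId, Set<Integer> marked) {
    int cur = parentIds[scanId];
    while (cur != -1 && cur != sourceId) {
      marked.add(cur);
      cur = parentIds[cur];
    }
  }
}
```

For the plan `0: root agg`, `1: join (filter source)`, `2: probe-side agg`, `3: probe scan`, `4: build scan`, the parent ids are `{-1, 0, 1, 2, 1}`. Walking up from scan 3 marks only node 2. The root agg 0 keeps stable cardinality because the filter originates inside its subtree.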

Testing
-------
 - Added e2e tests in test_hbo.py
 - Updated golden test files of scan cardinality since some agg nodes
   now also have HBO stats.

Assisted-by: Claude Code 4.6
Assisted-by: Composer 2
Change-Id: Ie0fafaf9d827f3bf533b1af7e62fdb2303c126ce
---
M be/src/service/impala-server.cc
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M testdata/workloads/functional-query/queries/QueryTest/hbo-collection-scan.test
A testdata/workloads/functional-query/queries/QueryTest/hbo-distinct-agg.test
A testdata/workloads/functional-query/queries/QueryTest/hbo-grouping-set.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-iceberg-scan.test
A testdata/workloads/functional-query/queries/QueryTest/hbo-single-agg.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-nonpartitioned-no-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-nonpartitioned-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-partitioned-no-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-partitioned-stats.test
M tests/query_test/test_hbo.py
20 files changed, 803 insertions(+), 91 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/24297/3
--
To view, visit http://gerrit.cloudera.org:8080/24297
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie0fafaf9d827f3bf533b1af7e62fdb2303c126ce
Gerrit-Change-Number: 24297
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
