Quanlong Huang has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/24297 )
Change subject: IMPALA-14600: Support HBO for AggregationNode cardinality
......................................................................

IMPALA-14600: Support HBO for AggregationNode cardinality

This extends HBO to support tracking and using the cardinality of
AggregationNodes.

The HBO key string of an AggregationNode consists of:
 - Logical AggPhase: FIRST, SECOND, TRANSPOSE.
 - isPreagg, isGroupingSet: true/false.
 - For each AggClass: canonicalized grouping exprs.
 - AggClasses are sorted by their key strings.
 - Conjuncts in the HAVING clause.
 - The HBO key string of the real child (explained below).

Some fields are ignored because they decide how the output is
processed and are unrelated to the output cardinality of the current
node, e.g. isDistinctAgg, needsFinalize.

Also adds the concept of cardinality-preserving nodes: nodes that
always have inputCardinality == outputCardinality, e.g. SortNode,
ExchangeNode, AnalyticEvalNode. These nodes can be ignored when
generating HBO key strings. "Ignored" here means using the key string
of their child, as if they did not exist.

When adding the child key of an AggregationNode, intermediate agg
nodes that belong to the same logical aggregation, as well as
cardinality-preserving nodes, are ignored. We use the "real" child of
the aggregation, which currently can only be a scan node or another
aggregation (from a different multiAggInfo_ instance). Note that HBO
doesn't support other node types like JoinNode yet.

This is not just an optimization to simplify (and shorten) the HBO
key string, but also a correctness requirement, since intermediate agg
nodes and ExchangeNodes can be added after the cardinality is
computed. This ensures the HBO key string is consistent between the
SingleNodePlanner and the DistributedPlanner.

Testing
 - Added e2e tests in test_hbo.py
 - Updated golden test files of scan cardinality because some agg
   nodes now also have HBO stats.
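The key-string rules above can be illustrated with a small Java sketch. This is a hypothetical, simplified model written under stated assumptions: HboKeySketch, ScanNode, PreservingNode, AggNode, logicalAggId and the exact key format are illustrative inventions, not Impala's real PlanNode/AggregationNode API. It shows the per-AggClass keys being sorted, cardinality-preserving nodes reusing their child's key, and the "real child" lookup that skips intermediate agg nodes of the same logical aggregation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical, simplified model of HBO key strings; not Impala's real planner classes. */
public class HboKeySketch {
  enum AggPhase { FIRST, SECOND, TRANSPOSE }

  abstract static class Node {
    final List<Node> children = new ArrayList<>();

    /** True if output cardinality always equals input cardinality. */
    boolean isCardinalityPreserving() { return false; }

    abstract String hboKey();
  }

  /** Leaf node; a real scan key would also cover partitions, predicates, etc. */
  static class ScanNode extends Node {
    private final String table;
    ScanNode(String table) { this.table = table; }
    @Override String hboKey() { return "SCAN(" + table + ")"; }
  }

  /** Stands in for SortNode, ExchangeNode or AnalyticEvalNode. */
  static class PreservingNode extends Node {
    PreservingNode(Node child) { children.add(child); }
    @Override boolean isCardinalityPreserving() { return true; }
    // "Ignored": reuse the child's key as if this node did not exist.
    @Override String hboKey() { return children.get(0).hboKey(); }
  }

  static class AggNode extends Node {
    private final int logicalAggId; // hypothetical id of one logical aggregation
    private final AggPhase phase;
    private final boolean isPreagg;
    private final boolean isGroupingSet;
    private final List<List<String>> aggClasses; // canonicalized grouping exprs per AggClass
    private final List<String> havingConjuncts;

    AggNode(int logicalAggId, AggPhase phase, boolean isPreagg, boolean isGroupingSet,
        List<List<String>> aggClasses, List<String> havingConjuncts, Node child) {
      this.logicalAggId = logicalAggId;
      this.phase = phase;
      this.isPreagg = isPreagg;
      this.isGroupingSet = isGroupingSet;
      this.aggClasses = aggClasses;
      this.havingConjuncts = havingConjuncts;
      children.add(child);
    }

    /** Skips cardinality-preserving nodes and agg nodes of the same logical aggregation. */
    Node realChild() {
      Node n = children.get(0);
      while (n.isCardinalityPreserving()
          || (n instanceof AggNode && ((AggNode) n).logicalAggId == logicalAggId)) {
        n = n.children.get(0);
      }
      return n;
    }

    // Fields like isDistinctAgg or needsFinalize are deliberately not part of the key.
    @Override String hboKey() {
      // Sort the per-AggClass keys so AggClass order does not change the key.
      String classes = aggClasses.stream()
          .map(exprs -> "[" + String.join(",", exprs) + "]")
          .sorted()
          .collect(Collectors.joining());
      return "AGG(" + phase + ",preagg=" + isPreagg + ",groupingSet=" + isGroupingSet
          + ",classes=" + classes + ",having=" + havingConjuncts
          + ",child=" + realChild().hboKey() + ")";
    }
  }

  public static void main(String[] args) {
    List<List<String>> ab = Arrays.asList(Arrays.asList("a"), Arrays.asList("b"));
    List<List<String>> ba = Arrays.asList(Arrays.asList("b"), Arrays.asList("a"));
    List<String> having = Arrays.asList("count(*) > 1");

    // Single-node plan: AGG -> SCAN
    Node single = new AggNode(1, AggPhase.FIRST, false, false, ab, having, new ScanNode("t"));

    // Distributed plan: merge AGG -> EXCHANGE -> pre-AGG -> SCAN (same logical agg)
    Node preagg = new AggNode(1, AggPhase.FIRST, true, false, ba,
        Collections.emptyList(), new ScanNode("t"));
    Node merge = new AggNode(1, AggPhase.FIRST, false, false, ba, having,
        new PreservingNode(preagg));

    System.out.println(single.hboKey());
    System.out.println(single.hboKey().equals(merge.hboKey()));
  }
}
```

Running main prints the key and then true: the single-node and distributed plans yield the same key because the exchange and the pre-aggregation are skipped when resolving the real child, mirroring the consistency argument above. Tagging nodes with a shared logicalAggId is only one possible way to express "belongs to the same logical aggregation"; the actual patch may determine this differently.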
Assisted-by: Claude Code 4.6
Assisted-by: Composer 2
Change-Id: Ie0fafaf9d827f3bf533b1af7e62fdb2303c126ce
---
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/ExchangeNode.java
M fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java
M testdata/workloads/functional-query/queries/QueryTest/hbo-collection-scan.test
A testdata/workloads/functional-query/queries/QueryTest/hbo-distinct-agg.test
A testdata/workloads/functional-query/queries/QueryTest/hbo-grouping-set.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-iceberg-scan.test
A testdata/workloads/functional-query/queries/QueryTest/hbo-single-agg.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-nonpartitioned-no-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-nonpartitioned-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-partitioned-no-stats.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-single-scan-partitioned-stats.test
M tests/query_test/test_hbo.py
17 files changed, 719 insertions(+), 78 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/24297/2
--
To view, visit http://gerrit.cloudera.org:8080/24297

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie0fafaf9d827f3bf533b1af7e62fdb2303c126ce
Gerrit-Change-Number: 24297
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang
