Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/24426
to look at the new patch set (#2).
Change subject: IMPALA-14601: Support HBO for JoinNode cardinality
......................................................................
IMPALA-14601: Support HBO for JoinNode cardinality
This extends HBO to support tracking and using the cardinality of JoinNodes.
Only frontend changes are needed.
HBO Key String
--------------
The HBO key string for a JoinNode consists of the following info:
- JoinOp, e.g. LEFT_OUTER_JOIN, LEFT_SEMI_JOIN, LEFT_ANTI_JOIN, etc.
- Join conjuncts.
- HBO key strings of the left and right children.
- Optional WHERE conjuncts.
Right-handed joins like RIGHT_OUTER_JOIN are inverted in the HBO key
string so a plan and its inverted counterpart hash identically. For
instance, these two JoinNodes have the same HBO key string:
    LeftOuterJoin      RightOuterJoin
       /     \            /     \
      A       B          B       A
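
A minimal sketch of the inversion described above, not the code in this
patch. The function name join_key, the key layout, and the mapping table
are assumptions made for illustration; only the JoinOp names come from
Impala.

```python
# Hypothetical mapping from right-handed JoinOps to their left-handed form.
# The set of ops actually inverted by the patch may differ.
INVERTED_OP = {
    "RIGHT_OUTER_JOIN": "LEFT_OUTER_JOIN",
    "RIGHT_SEMI_JOIN": "LEFT_SEMI_JOIN",
    "RIGHT_ANTI_JOIN": "LEFT_ANTI_JOIN",
}


def join_key(op, left_key, right_key, conjuncts=()):
    """Build an illustrative key string for a join.

    Right-handed ops are rewritten to the left-handed op and the children
    are swapped, so a plan and its inverted counterpart get the same key.
    """
    if op in INVERTED_OP:
        op = INVERTED_OP[op]
        left_key, right_key = right_key, left_key
    return "{}({},{})[{}]".format(op, left_key, right_key, ";".join(conjuncts))


left_plan = join_key("LEFT_OUTER_JOIN", "A", "B")
right_plan = join_key("RIGHT_OUTER_JOIN", "B", "A")
```

With this sketch, left_plan and right_plan are the same string, which
mirrors the LeftOuterJoin(A, B) / RightOuterJoin(B, A) example.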
For INNER/CROSS joins, where the join order doesn't affect the final
cardinality, the HBO key string is made independent of the join order.
For each such JoinNode, the maximal contiguous group of such joins is
collected and flattened into sorted operand strings in the HBO key
string, and then all the conjuncts are added. For instance, the
following two plans have the same HBO key string in the top-level
JoinNode:
         InnerJoin            InnerJoin
          /     \              /     \
    InnerJoin    C       InnerJoin    B
     /     \              /     \
    A       B            A       C
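
The flattening can be sketched as follows. This is an illustration under
stated assumptions, not the patch's implementation: the Scan and Join
classes, the JOIN_GROUP(...) layout, and the helper names are
hypothetical, and conjuncts are left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Union

# Join ops whose order is assumed not to affect the final cardinality.
ORDER_INDEPENDENT_OPS = {"INNER_JOIN", "CROSS_JOIN"}


@dataclass
class Scan:
    key: str  # stand-in for the scan node's HBO key string


@dataclass
class Join:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Scan, Join]


def flatten_operands(node):
    """Collect operand keys of a maximal contiguous INNER/CROSS join group."""
    if isinstance(node, Join) and node.op in ORDER_INDEPENDENT_OPS:
        return flatten_operands(node.left) + flatten_operands(node.right)
    # Any other node ends the group and becomes a single operand.
    return [node_key(node)]


def node_key(node):
    """Return an illustrative key; order-independent groups sort operands."""
    if isinstance(node, Scan):
        return node.key
    if node.op in ORDER_INDEPENDENT_OPS:
        return "JOIN_GROUP(" + ",".join(sorted(flatten_operands(node))) + ")"
    return "{}({},{})".format(node.op, node_key(node.left), node_key(node.right))


a, b, c = Scan("A"), Scan("B"), Scan("C")
plan1 = Join("INNER_JOIN", Join("INNER_JOIN", a, b), c)
plan2 = Join("INNER_JOIN", Join("INNER_JOIN", a, c), b)
```

Both plan1 and plan2 map to JOIN_GROUP(A,B,C) in this sketch, matching
the two plans drawn above. A LEFT_OUTER_JOIN below an INNER_JOIN is not
flattened; it is kept as one operand of the group.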
Conjuncts are canonicalized in the same way as in HdfsScanNode and
AggregationNode, except that column names are qualified using a
canonicalized alias "op<idx>", where idx is the index of the operand in
the sorted operand list. E.g. the following two queries have the same
original conjunct string (a.id = b.int_col):
select count(*) from alltypes a join alltypestiny b on a.id = b.int_col;
select count(*) from alltypestiny a join alltypes b on a.id = b.int_col;
But their canonicalized conjunct strings are different:
1) op1.id = op0.int_col
2) op0.id = op1.int_col
Note that alltypestiny is operand 0 and alltypes is operand 1.
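
A rough sketch of the alias rewrite, for illustration only. The real
canonicalization presumably works on expression trees in
ExprCanonicalizer rather than on text; the regex approach, the function
name, and its parameters are assumptions. The operand order is taken
from the note above (alltypestiny is operand 0) and is not recomputed
here.

```python
import re


def canonicalize_conjunct(conjunct, alias_to_operand, sorted_operands):
    """Rewrite each "<alias>.<column>" reference as "op<idx>.<column>".

    alias_to_operand maps a query alias to the operand it refers to, and
    sorted_operands is the already sorted operand list of the join group.
    """
    index = {operand: i for i, operand in enumerate(sorted_operands)}

    def rewrite(match):
        alias, column = match.group(1), match.group(2)
        return "op{}.{}".format(index[alias_to_operand[alias]], column)

    return re.sub(r"\b([A-Za-z_]\w*)\.([A-Za-z_]\w*)\b", rewrite, conjunct)


# Operand order as stated in the commit message.
operands = ["alltypestiny", "alltypes"]
q1 = canonicalize_conjunct(
    "a.id = b.int_col", {"a": "alltypes", "b": "alltypestiny"}, operands)
q2 = canonicalize_conjunct(
    "a.id = b.int_col", {"a": "alltypestiny", "b": "alltypes"}, operands)
```

Here q1 is "op1.id = op0.int_col" and q2 is "op0.id = op1.int_col", the
same two strings listed above.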
Historical Runs Matching
------------------------
Each JoinNode collects all the leaf scan input stats inside its subtree.
For order-independent JoinNodes, these scan input stats are sorted in
the same order as the sorted operands. Otherwise, these scan input stats
are appended in order, left child first and then right child.
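
One way to picture the collection order, as a sketch under assumptions:
the Scan and Join classes, the input_rows field standing in for the scan
input stats, and the helper names are all hypothetical, and the
inversion of right-handed joins is ignored.

```python
from dataclasses import dataclass

ORDER_INDEPENDENT_OPS = {"INNER_JOIN", "CROSS_JOIN"}


@dataclass
class Scan:
    key: str         # stand-in for the scan's HBO key string
    input_rows: int  # stand-in for the scan input stats


@dataclass
class Join:
    op: str
    left: object
    right: object


def operands(node):
    """Flatten a contiguous order-independent group into (key, stats) pairs."""
    if isinstance(node, Join) and node.op in ORDER_INDEPENDENT_OPS:
        return operands(node.left) + operands(node.right)
    return [(key_of(node), scan_stats(node))]


def key_of(node):
    """Illustrative key string used only to order the operands."""
    if isinstance(node, Scan):
        return node.key
    if node.op in ORDER_INDEPENDENT_OPS:
        return "GROUP(" + ",".join(sorted(k for k, _ in operands(node))) + ")"
    return "{}({},{})".format(node.op, key_of(node.left), key_of(node.right))


def scan_stats(node):
    """Collect leaf scan stats in the order described above."""
    if isinstance(node, Scan):
        return [node.input_rows]
    if node.op in ORDER_INDEPENDENT_OPS:
        # Same order as the sorted operands of the group.
        ordered = sorted(operands(node), key=lambda pair: pair[0])
        return [s for _, stats in ordered for s in stats]
    # Order-dependent join: left child first, then right child.
    return scan_stats(node.left) + scan_stats(node.right)


a, b, c = Scan("A", 10), Scan("B", 20), Scan("C", 30)
inner = scan_stats(Join("INNER_JOIN", Join("INNER_JOIN", a, c), b))
outer = scan_stats(Join("LEFT_OUTER_JOIN", c, a))
```

In this sketch inner is [10, 20, 30] regardless of the join order inside
the INNER group, while outer keeps the plan order and is [30, 10].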
When matching a historical run, all scan_input_stats entries are
compared element-wise, and each pair of corresponding entries must be
similar. Take the following JoinNode runs as an example.
    JoinNode        JoinNode
     /    \          /    \
    A      B        A'     B'
A match requires similar(A, A') && similar(B, B'). The similarity
check for a pair of scan nodes is the same as in previous patches.
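
The element-wise comparison can be sketched like this. The similarity
rule itself comes from earlier patches and is not shown in this message,
so the similar() function below, its 10% tolerance, and the fields of
ScanInputStats are placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanInputStats:
    table: str       # hypothetical field
    input_rows: int  # hypothetical field


def similar(a, b, tolerance=0.1):
    """Placeholder similarity check: same table, row counts within tolerance."""
    if a.table != b.table:
        return False
    larger = max(a.input_rows, b.input_rows)
    if larger == 0:
        return True
    return abs(a.input_rows - b.input_rows) / larger <= tolerance


def matches_historical_run(current, historical):
    """Require similar(current[i], historical[i]) for every position i."""
    if len(current) != len(historical):
        return False
    return all(similar(c, h) for c, h in zip(current, historical))


current = [ScanInputStats("A", 1000), ScanInputStats("B", 50)]
close_run = [ScanInputStats("A", 1050), ScanInputStats("B", 50)]
far_run = [ScanInputStats("A", 1000), ScanInputStats("B", 500)]
```

Under these placeholder rules, close_run matches current, while far_run
does not because the second pair of entries differs too much.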
Testing
-------
- Added FE tests on the HBO key strings.
- Added e2e tests.
Assisted-by: Opus 4.8 (Claude Code)
Change-Id: I70b655ae7027d0d9eb8e9fae9ba2e1b7ad9876b4
---
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergDeleteJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/service/HistoricalStats.java
M fe/src/test/java/org/apache/impala/planner/HboKeyStringTest.java
A testdata/workloads/functional-query/queries/QueryTest/hbo-join.test
M testdata/workloads/functional-query/queries/QueryTest/hbo-multiple-scans.test
M tests/query_test/test_hbo.py
12 files changed, 914 insertions(+), 43 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/24426/2
--
To view, visit http://gerrit.cloudera.org:8080/24426
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I70b655ae7027d0d9eb8e9fae9ba2e1b7ad9876b4
Gerrit-Change-Number: 24426
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>