Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/24153 )
Change subject: IMPALA-14597: Initial HBO support ...................................................................... Patch Set 18: (3 comments) Thanks for the review! Addressed the comments. Also updated the HBO annotaion in the query plan to not overwrite the filteredInputCardinality_. http://gerrit.cloudera.org:8080/#/c/24153/17/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java File fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java: http://gerrit.cloudera.org:8080/#/c/24153/17/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@56 PS17, Line 56: public static List<String> canonicalizeScanConjuncts(List<Expr> exprs, FeTable table, > For sufficiently complex expressions, there could be a not trivial cost of Yeah, the cost should be considered. Currently we just deal with some simple expressions without recursion. The cost is mostly converting them to strings (toSql calls) and sorting them. I think the toSql() calls is the dominant cost. We already pay this cost in generating the query plan (in the predicates lines) and the string of "Analyzed query". So probably this cost is acceptable so far. Filed IMPALA-15098 as a follow-up JIRA. To avoid introducing this cost when storing HBO stats is disabled (store_hbo_stats=false), patch set 18 add codes to skip generating the HBO hash strings in such case. http://gerrit.cloudera.org:8080/#/c/24153/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/24153/17/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1947 PS17, Line 1947: // Register this scan node as an input for tuple caching. Agree that not mutating parameters make the code more maintainable. Initially I just extracted these logic into a method so later patches can reuse it in AggregationNode and JoinNode. I didn't think about preference like this. As discussed offline, we will refactor these and probably the toThrift() method in a separate patch. > But a small nit with the current code: populateHboThriftFields returns a > boolean. This is the only caller and it doesn't do anything with the return > value. Did you mean to do something? Or should it be a void? The aggregation support patch uses the return value to log warnings: https://gerrit.cloudera.org/c/24297/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#879 But I think it's unnecessary. I'll move the log to the place that causes HBO not supported for the node. Changed this return type to void. http://gerrit.cloudera.org:8080/#/c/24153/17/testdata/workloads/functional-query/queries/QueryTest/hbo-collection-scan.test File testdata/workloads/functional-query/queries/QueryTest/hbo-collection-scan.test: http://gerrit.cloudera.org:8080/#/c/24153/17/testdata/workloads/functional-query/queries/QueryTest/hbo-collection-scan.test@1 PS17, Line 1: ==== > These tests are great. One suggestion is to add one with sufficient number That's a good point. Added hbo-disjunct-selectivity.test for this. -- To view, visit http://gerrit.cloudera.org:8080/24153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ff60a8bd22c13c0ecad1198934cc96249b1015e Gerrit-Change-Number: 24153 Gerrit-PatchSet: 18 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Reviewer: Surya Hebbar <[email protected]> Gerrit-Comment-Date: Fri, 12 Jun 2026 15:19:25 +0000 Gerrit-HasComments: Yes
