Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/24153 )
Change subject: IMPALA-14597: Initial HBO support ...................................................................... Patch Set 14: (11 comments) > Patch Set 13: > > (11 comments) > > This is only a half code review at this point. I'd like to dive in a little > deeper, but figured I'd post what I have for now. Thanks for the review! Addressed these comments first. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/Expr.java@234 PS13, Line 234: // TODO: support CompoundPredicate like "p = 1 or p = 2". > Reuse IS_NOT_EQ_BINARY_PREDICATE? Done http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/Expr.java@239 PS13, Line 239: } > nit optional: return arg instanceof InPredicate condenses 4 lines into 1. Done. That looks better. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@319 PS13, Line 319: if (rawPath_ == null) return label_; > Optional for cutesiness: Iterable.getLast(rawPath_) Done http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/catalog/FeTable.java File fe/src/main/java/org/apache/impala/catalog/FeTable.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/catalog/FeTable.java@175 PS13, Line 175: return slotRefs.stream() > Optional for cutesiness: Done. Slightly modified this since the left-hand side of the lambda function should be a variable. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/CanonicalizationStrategy.java File fe/src/main/java/org/apache/impala/planner/CanonicalizationStrategy.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/CanonicalizationStrategy.java@54 PS13, Line 54: private final int errorLevel_; > nit: errorLevel_ Done http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java File fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@75 PS13, Line 75: // is not EXPR_REWRITE (i.e. more aggressive than EXPR_REWRITE). > If I'm understanding this code correctly, this is only called for the two s Yeah, that's better. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@97 PS13, Line 97: List<String> values = new ArrayList<>(); > There are 2 cases here. One that works for BinaryPredicate and one that wo These are something like ExprRewriteRules but return strings and probably need the table metadata for complex cases. I think it's OK to keep them outside Predicate classes. So far we just support two kinds of simple predicates. I'll split this into two methods to make it cleaner. In the future when we support more complex expressions, perhaps making this a visitor is better? http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@128 PS13, Line 128: if (!Expr.IS_NOT_EQ_BINARY_PREDICATE.apply(binPred) > We have this if check in at least 2 places (that I've noticed so far on the Expr.IS_EQ_BINARY_PREDICATE is the closest method but it doesn't allow NOT_DISTINCT. I changed this to use !Expr.IS_NOT_EQ_BINARY_PREDICATE.apply() here. It's a bit awkward since the usage is double negative. I'll check if IS_EQ_BINARY_PREDICATE should allow NOT_DISTINCT. Otherwise we can add a new function for this. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1745 PS13, Line 1745: inputCardinality_ = totalFiles; > nit: return immediate if use_hbo_stats is false. Done http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1031 PS13, Line 1031: if (hashString == null) break; > I think it's better to have a test outside the for loop to do this. It flow Eventually we will replace this with a Preconditions check when HBO supports all PlanNodes, all stats types and all canonicalization strategies. In the current patch, if a PlanNode doesn't support HBO, generateHboHashString() returns null for all stats type and all canonicalization strategies. So we use "break" here. Maybe adding a supportsHBO() method returning a boolean value is more clear? http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1034 PS13, Line 1034: if (hashStrings.contains(hashString)) { > I know there are only 2 elements, but this usage is better with a Set versu The returned list will be iterated using the enum of CanonicalizationStrategy (corresponding to the index), so we do need it to be a list. This deduplication is a small optimization to reduce HBO stats read/write requests. I think so far it's OK since the list is either empty or has only one element here. -- To view, visit http://gerrit.cloudera.org:8080/24153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ff60a8bd22c13c0ecad1198934cc96249b1015e Gerrit-Change-Number: 24153 Gerrit-PatchSet: 14 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Reviewer: Surya Hebbar <[email protected]> Gerrit-Comment-Date: Mon, 01 Jun 2026 14:20:03 +0000 Gerrit-HasComments: Yes
