Steve Carlin has posted comments on this change. ( http://gerrit.cloudera.org:8080/24153 )
Change subject: IMPALA-14597: Initial HBO support ...................................................................... Patch Set 13: (11 comments) This is only a half code review at this point. I'd like to dive in a little deeper, but figured I'd post what I have for now. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/Expr.java@234 PS13, Line 234: if (arg instanceof BinaryPredicate) { Reuse IS_NOT_EQ_BINARY_PREDICATE? http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/Expr.java@239 PS13, Line 239: if (arg instanceof InPredicate) { nit optional: return arg instanceof InPredicate condenses 4 lines into 1. But I can see keeping it expanded to put the explicit cases in "if" commands. Your call. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/SlotRef.java File fe/src/main/java/org/apache/impala/analysis/SlotRef.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@319 PS13, Line 319: return ToSqlUtils.getIdentSql(rawPath_.get(rawPath_.size() - 1)); Optional for cutesiness: Iterable.getLast(rawPath_) http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/catalog/FeTable.java File fe/src/main/java/org/apache/impala/catalog/FeTable.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/catalog/FeTable.java@175 PS13, Line 175: for (SlotRef slotRef : slotRefs) { Optional for cutesiness: return slotRefs.stream().anyMatch( r.getDesc().getColumn() -> r != null && r.getPosition() < numPartitionCols); http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/CanonicalizationStrategy.java File fe/src/main/java/org/apache/impala/planner/CanonicalizationStrategy.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/CanonicalizationStrategy.java@54 PS13, Line 54: private final int errorLevel; nit: errorLevel_ http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java File fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@75 PS13, Line 75: if (strategy == CanonicalizationStrategy.EXPR_REWRITE) { If I'm understanding this code correctly, this is only called for the two strategies right now. I'm not sure of the future usage, but in both cases, we call exprToSqlWithSortedInValues. These can be combined to be a little simpler? So something like: boolean shouldRemoveConstants = false; if (strategy != CanonicalizationStrategy.EXPR_REWRITE) { if (Expr.IS_EQUALITY_PREDICATE.apply(expr)) { ... } } return exprToSqlWithSortedInValues(expr, shouldRemoveConstants); http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@97 PS13, Line 97: private static String exprToSqlWithSortedInValues( There are 2 cases here. One that works for BinaryPredicate and one that works for InPredicate. In a way, to me it make more sense for these to be methods off of Predicate and the code to be in their respective classes. I suppose it's nice to have the Hbo code isolated in one place, but if other predicates are added, this will get a little messy. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@128 PS13, Line 128: if ((binPred.getOp() == BinaryPredicate.Operator.EQ || We have this if check in at least 2 places (that I've noticed so far on the code review) Should we have a function checking for this? I think I use this in Calcite, so perhaps I can benefit from that as well :) http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1745 PS13, Line 1745: if (analyzer.getQueryOptions().use_hbo_stats) { nit: return immediate if use_hbo_stats is false. Everything else then has less indentation. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1031 PS13, Line 1031: if (hashString == null) break; I think it's better to have a test outside the for loop to do this. It flows a little better since you're not testing it for each iteration. Then you can have either a Preconditions check that it is not null or you can skip an iteration that contains null and proceed with others. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1034 PS13, Line 1034: if (hashStrings.contains(hashString)) { I know there are only 2 elements, but this usage is better with a Set versus a List. If ordering matters, then perhaps a LinkedHashSet -- To view, visit http://gerrit.cloudera.org:8080/24153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ff60a8bd22c13c0ecad1198934cc96249b1015e Gerrit-Change-Number: 24153 Gerrit-PatchSet: 13 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Reviewer: Surya Hebbar <[email protected]> Gerrit-Comment-Date: Sun, 31 May 2026 01:01:25 +0000 Gerrit-HasComments: Yes
