Steve Carlin has posted comments on this change. ( http://gerrit.cloudera.org:8080/24153 )
Change subject: IMPALA-14597: Initial HBO support ...................................................................... Patch Set 13: (10 comments) I started doing a second pass code review and I saw you responded to a first pass. So this post is a bit of a mix between responding to what was there and some new comments. Still need to look at this more today to understand the complete flow. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java File fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@97 PS13, Line 97: private static String exprToSqlWithSortedInValues( > These are something like ExprRewriteRules but return strings and probably n I was thinking about this a little more and I'll give you one more argument to put it in Expr... A createCanonicalizedSql makes sense as a generic thing for Expr objects because there should be a canonicalized form form for everything. Having said that... I really don't have a problem with this either, so let's leave it as/is. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@128 PS13, Line 128: if ((binPred.getOp() == BinaryPredicate.Operator.EQ || > Expr.IS_EQ_BINARY_PREDICATE is the closest method but it doesn't allow NOT_ I hear ya. Double negative is weird. Almost makes me want to get rid of the original, make it a positive, and put the negative on the outside of the other ones. Of course, that would involve changing code outside the scope of this commit and adds a slight risk. Leaving as/is is fine. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1944 PS13, Line 1944: public void appendInputStats(TPlanNodeRun execStats) { Can we do the instantiation of TPlanNodeRun here and have this method return a TPlanNodeRun? http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1031 PS13, Line 1031: if (hashString == null) break; > Eventually we will replace this with a Preconditions check when HBO support Hmmm...now that I think about it more...why not just put in the Preconditions now? I still need to do a more complete code review to understand this, but a couple of layers back, the only way this gets called is through: msg.setHbo_hash_keys(generateAllHboHashKeys()); ...which only exists in HdfsScanNode. Would this ever be called by a node that doesn't support this? You'll need to add the caller in JoinNode when that gets supported. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1034 PS13, Line 1034: if (hashStrings.contains(hashString)) { > The returned list will be iterated using the enum of CanonicalizationStrate Ack http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java File fe/src/main/java/org/apache/impala/service/HistoricalStats.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@40 PS13, Line 40: int concurrencyLevel = BackendConfig.INSTANCE != null optional nit: BackendConfig.INSTANCE is checked inline here, but the same calls are in a method for getSimilarityThreshold() and getMaxRunsPerKey(), so this is inconsistent. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@65 PS13, Line 65: writeScanStats(runWithKeys.run, runWithKeys.hash_keys, runWithKeys.stats_type); Curious as to why you are passing all the individual members of the structure instead of the whole structure. Is this because you anticipate TPlanNodeRunWithKeys will be dealt with for other PlanNodes? If we're gonna do that though, maybe there should be a specific thrift level for the scan node and one for common items? You prolly have thought this through better than I have now. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@96 PS13, Line 96: if (sa.isSetInput_file_size() && sb.isSetInput_file_size()) { nit: I think this would look better if you did this like the lines above. That is: immediately return false if this condition is false, and then the rest of the logic would not be indented. http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@190 PS13, Line 190: Preconditions.checkArgument(!currRun.getScan_input_stats().isEmpty()); We can do a stronger assert here that size == 1? http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStatsValue.java File fe/src/main/java/org/apache/impala/service/HistoricalStatsValue.java: http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStatsValue.java@39 PS13, Line 39: public HistoricalStatsValue(T initialRun) { If we're doing a lot of removes, maybe a LinkedList would be better to prevent O(n) removals? -- To view, visit http://gerrit.cloudera.org:8080/24153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ff60a8bd22c13c0ecad1198934cc96249b1015e Gerrit-Change-Number: 24153 Gerrit-PatchSet: 13 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Steve Carlin <[email protected]> Gerrit-Reviewer: Surya Hebbar <[email protected]> Gerrit-Comment-Date: Mon, 01 Jun 2026 15:38:10 +0000 Gerrit-HasComments: Yes
