Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24153 )

Change subject: IMPALA-14597: Initial HBO support
......................................................................


Patch Set 13:

(10 comments)

I started doing a second pass code review and I saw you responded to a first 
pass.

So this post is a bit of a mix between responding to what was there and some 
new comments.

Still need to look at this more today to understand the complete flow.

http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java
File fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java:

http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@97
PS13, Line 97:   private static String exprToSqlWithSortedInValues(
> These are something like ExprRewriteRules but return strings and probably n
I was thinking about this a little more and I'll give you one more argument to 
put it in Expr...

A createCanonicalizedSql makes sense as a generic thing for Expr objects 
because there should be a canonicalized form form for everything.

Having said that...  I really don't have a problem with this either, so let's 
leave it as/is.


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/ExprCanonicalizer.java@128
PS13, Line 128:       if ((binPred.getOp() == BinaryPredicate.Operator.EQ ||
> Expr.IS_EQ_BINARY_PREDICATE is the closest method but it doesn't allow NOT_
I hear ya.  Double negative is weird.

Almost makes me want to get rid of the original, make it a positive, and put 
the negative on the outside of the other ones.  Of course, that would involve 
changing code outside the scope of this commit and adds a slight risk.

Leaving as/is is fine.


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1944
PS13, Line 1944:   public void appendInputStats(TPlanNodeRun execStats) {
Can we do the instantiation of TPlanNodeRun here and have this method return a 
TPlanNodeRun?


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1031
PS13, Line 1031:       if (hashString == null) break;
> Eventually we will replace this with a Preconditions check when HBO support
Hmmm...now that I think about it more...why not just put in the Preconditions 
now?

I still need to do a more complete code review to understand this, but a couple 
of layers back, the only way this gets called is through:

msg.setHbo_hash_keys(generateAllHboHashKeys());

...which only exists in HdfsScanNode.  Would this ever be called by a node that 
doesn't support this?  You'll need to add the caller in JoinNode when that gets 
supported.


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1034
PS13, Line 1034:       if (hashStrings.contains(hashString)) {
> The returned list will be iterated using the enum of CanonicalizationStrate
Ack


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java
File fe/src/main/java/org/apache/impala/service/HistoricalStats.java:

http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@40
PS13, Line 40:     int concurrencyLevel = BackendConfig.INSTANCE != null
optional nit: BackendConfig.INSTANCE is checked inline here, but the same calls 
are in a method for getSimilarityThreshold() and getMaxRunsPerKey(), so this is 
inconsistent.


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@65
PS13, Line 65:       writeScanStats(runWithKeys.run, runWithKeys.hash_keys, 
runWithKeys.stats_type);
Curious as to why you are passing all the individual members of the structure 
instead of the whole structure.  Is this because you anticipate 
TPlanNodeRunWithKeys will be dealt with for other PlanNodes?

If we're gonna do that though, maybe there should be a specific thrift level 
for the scan node and one for common items?  You prolly have thought this 
through better than I have now.


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@96
PS13, Line 96:     if (sa.isSetInput_file_size() && sb.isSetInput_file_size()) {
nit: I think this would look better if you did this like the lines above.  That 
is: immediately return false if this condition is false, and then the rest of 
the logic would not be indented.


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@190
PS13, Line 190:     
Preconditions.checkArgument(!currRun.getScan_input_stats().isEmpty());
We can do a stronger assert here that size == 1?


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStatsValue.java
File fe/src/main/java/org/apache/impala/service/HistoricalStatsValue.java:

http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStatsValue.java@39
PS13, Line 39:   public HistoricalStatsValue(T initialRun) {
If we're doing a lot of removes, maybe a LinkedList would be better to prevent 
O(n) removals?



--
To view, visit http://gerrit.cloudera.org:8080/24153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ff60a8bd22c13c0ecad1198934cc96249b1015e
Gerrit-Change-Number: 24153
Gerrit-PatchSet: 13
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Mon, 01 Jun 2026 15:38:10 +0000
Gerrit-HasComments: Yes

Reply via email to