Steve Carlin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24153 )

Change subject: IMPALA-14597: Initial HBO support
......................................................................


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/24153/13/common/thrift/HBO.thrift
File common/thrift/HBO.thrift:

http://gerrit.cloudera.org:8080/#/c/24153/13/common/thrift/HBO.thrift@47
PS13, Line 47:   1: optional list<TScanInputStats> scan_input_stats
Maybe not for this release, but something for the future:

If TScanInputStats is on the Scan level, then this information is gonna be 
duplicated across a lot of different runs.  Maybe this should be handled with 
an extra pointer structure to save on some space if that becomes an issue.

Of course, that might make it more difficult to maintain and clean up, so I 
think it's fine for now.


http://gerrit.cloudera.org:8080/#/c/24153/13/common/thrift/HBO.thrift@58
PS13, Line 58:   2: required list<string> hash_keys
I just noticed some code in HistoricalStats.getNumOutputScan that specifically 
relies on EXPR_REWRITE being at position 0:

if (firstInputRows < 0 && i > 0) break;

This association is a little precarious.  Theoretically, someone might decide 
to alphabetize the enums.  Or perhaps there will be some plan nodes where 
certain hash keys apply and some don't, and then we will have holes in the list.

I guess what I'm getting at here:  Would it be better to have this as a map and 
bring the enum values into thrift?  That would clean up that above "if" 
statement and make it cleaner.


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java
File fe/src/main/java/org/apache/impala/service/HistoricalStats.java:

http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@60
PS13, Line 60:         : 100;
I may still be misunderstanding this...

Why do we need to track so many runs?  Does the catalog_version change often? 
(I'm not too familiar with that part of the code).  And if it changes, why do 
we need to keep track of the previous version (or at least more than two)?

For scan input rows:  Also once it changes, does it even make sense to keep a 
historical value?

Or am I not thinking outside the box here and this will matter more with a 
later commit?


http://gerrit.cloudera.org:8080/#/c/24153/13/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@126
PS13, Line 126:   private int foundSimilarRunWithNumRows(List<TPlanNodeRun> 
runs, TPlanNodeRun currRun) {
Nit:  I think I like the name getSimilarRunWithNumRowsIndex better?  What do 
you think?

Same for the other "found" method.



--
To view, visit http://gerrit.cloudera.org:8080/24153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ff60a8bd22c13c0ecad1198934cc96249b1015e
Gerrit-Change-Number: 24153
Gerrit-PatchSet: 13
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Steve Carlin <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Mon, 01 Jun 2026 16:38:22 +0000
Gerrit-HasComments: Yes

Reply via email to