Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24153 )

Change subject: IMPALA-14597: Initial HBO support
......................................................................


Patch Set 3:

(4 comments)

Tried to get the big picture, only went to details about the storage part.

http://gerrit.cloudera.org:8080/#/c/24153/3/common/thrift/HBO.thrift
File common/thrift/HBO.thrift:

http://gerrit.cloudera.org:8080/#/c/24153/3/common/thrift/HBO.thrift@34
PS3, Line 34: TPlanNodeRun
If this is cached in deserialized format, then it seems better to me to use 
required fields with some default value to avoid the extra "set" members.
Also, if this is not written to files, why not use simple java objects?


http://gerrit.cloudera.org:8080/#/c/24153/3/fe/src/main/java/org/apache/impala/service/HistoricalStats.java
File fe/src/main/java/org/apache/impala/service/HistoricalStats.java:

http://gerrit.cloudera.org:8080/#/c/24153/3/fe/src/main/java/org/apache/impala/service/HistoricalStats.java@124
PS3, Line 124:     // TODO: handle races from concurrent writers.
This doesn't look hard using asMap().compute() on the map.
As the serialization is fast, we don't need to worry much about blocking.


http://gerrit.cloudera.org:8080/#/c/24153/3/fe/src/main/java/org/apache/impala/service/InMemoryCacheBackend.java
File fe/src/main/java/org/apache/impala/service/InMemoryCacheBackend.java:

http://gerrit.cloudera.org:8080/#/c/24153/3/fe/src/main/java/org/apache/impala/service/InMemoryCacheBackend.java@73
PS3, Line 73: weighHistoricalStatsValue
It seems simpler to store the stats in serialized format, which would make 
weighing trivial. Or that would add significant overhead?


http://gerrit.cloudera.org:8080/#/c/24153/3/fe/src/main/java/org/apache/impala/service/InMemoryCacheBackend.java@97
PS3, Line 97:     this(4, 1024L * 1024 * 1024);
This default seems too large to me. The danger I see is that the JVM limit is 
not large enough, then this will very slowly fill up and lead to hard to 
diagnose issues.
My preference would be to have a unified size limited cache in the coordinator 
(it doesn't seem hard to move the cache implementation itself out of 
CatalogMetaProvider). Another way would be to use soft references here.



--
To view, visit http://gerrit.cloudera.org:8080/24153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ff60a8bd22c13c0ecad1198934cc96249b1015e
Gerrit-Change-Number: 24153
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 31 Mar 2026 15:05:45 +0000
Gerrit-HasComments: Yes

Reply via email to