Mihaly Szjatinya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24041 )

Change subject: IMPALA-14583: Support partial RPC dispatch for Iceberg tables
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@153
PS1, Line 153: ir<Map<THash128, THdfsFileDesc>, Long> toT
> nit: This if stmt could be moved to the beginning.
Ack


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@155
PS1, Line 155:       Map<THash128, THdfsFileDesc> ret = new HashMap<>();
> If the whole container is needed, a fast path could copy it without constru
Good point! Implemented


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@159
PS1, Line 159:
> These maps are not modified after construction of IcebergContentFileStore,
Both valid points. But given that Csaba's fast past optimization cuts off the 
worst case and the code will proceed to reverseMap only for split containers, I 
would not add complexity with re-calculating the hash, IYDM.


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@480
PS1, Line 480:     if (startOffset < state.currentOffset + containerSize &&
             :         state.filesCollected < maxFiles) {
             :       long localOffset = Math.max(0, startOffset - 
state.currentOffset);
             :       Pair<Map<THash128, THdfsFileDesc>, Long> result =
             :           container.toThriftPartial(localOffset, maxFiles - 
state.filesCollected);
             :       if (!result.first.isEmpty()) {
> Isn't it enough to set these in first request? partitions_ can be pretty la
Yeah, another great point!


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@869
PS1, Line 869:       // iceberg_file_offset. This ensures callers that don't 
support pagination
             :       // always receive the full file set.
             :       boolean paginationRequested = 
req.table_info_selector.isSetIceberg_file_o
> Not sure if cross compatibility between old coordinator and new catalogd is
Agree. Maybe not as much for compatibility, but rather as a way to allow a full 
request along with partial, the consistency check won't hurt.


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@888
PS1, Line 888: able
> I am not sure about the severity of this - with current default of 1M it ma
This is the same as: 
https://github.com/apache/Impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2424
Maybe we could address this more generally on logger side? For the furture: 
newer log4j2 seems to have the DuplicateMessageFilter.


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1237
PS1, Line 1237:     checkResponse(resp.table_info != null &&
              :         resp.table_info.iceberg_table != null, req,
> Can't we deduce the total number of files from req? It would be better to a
Agree, an extra RPC is an overkill. We can return the total files within the 
first response though.


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1277
PS1, Line 1277: Count > 1) {
> I would prefer to move this and mergeIcebergContentFiles to IcebergContentF
Makes sense. Also the same count function can be reused in both catalog and 
coordinator from there.



--
To view, visit http://gerrit.cloudera.org:8080/24041
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb
Gerrit-Change-Number: 24041
Gerrit-PatchSet: 2
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Sun, 08 Mar 2026 15:47:07 +0000
Gerrit-HasComments: Yes

Reply via email to