Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24041 )

Change subject: IMPALA-14583: Support partial RPC dispatch for Iceberg tables
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@155
PS1, Line 155:       }
If the whole container is needed, a fast path could copy it without 
constructing the reverseMap.


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@480
PS1, Line 480:     // Always include metadata flags and partitions
             :     ret.setHas_avro(hasAvro_);
             :     ret.setHas_orc(hasOrc_);
             :     ret.setHas_parquet(hasParquet_);
             :     ret.setMissing_files(new ArrayList<>(missingFiles_));
             :     ret.setPartitions(convertPartitionMapToList(partitions_));
Isn't it enough to set these in first request? partitions_ can be pretty large.


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@869
PS1, Line 869:       long fileOffset = 
req.table_info_selector.isSetIceberg_file_offset()
             :           ? req.table_info_selector.iceberg_file_offset : 0;
             :       long fileLimit = 
BackendConfig.INSTANCE.getCatalogPartialFetchMaxFiles();
Not sure if cross compatibility between old coordinator and new catalogd is a 
goal here, but if yes, then we shouldn't enforce the fileLimit in case 
fileOffset is not set to avoid returning partial results.


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@888
PS1, Line 888: warn
I am not sure about the severity of this - with current default of 1M it makes 
sense, but someone may decrease it, leading to warning spew.


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1237
PS1, Line 1237:     // Keep requesting until we get 0 files back (reached the 
end)
              :     // The catalogd enforces the file limit per response
Can't we deduce the total number of files from req? It would be better to avoid 
the extra RPC when possible. Besides the perf cost this also increased the 
chance of InconsistentMetadataFetchException


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1277
PS1, Line 1277: countIcebergContentFiles
I would prefer to move this and mergeIcebergContentFiles to 
IcebergContentFileStore as static files. The reason is that 
CatalogdMetaProvider.java is already very large and complicated.



--
To view, visit http://gerrit.cloudera.org:8080/24041
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb
Gerrit-Change-Number: 24041
Gerrit-PatchSet: 1
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 02 Mar 2026 18:16:43 +0000
Gerrit-HasComments: Yes

Reply via email to