Mihaly Szjatinya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24041 )

Change subject: IMPALA-14583: Support partial RPC dispatch for Iceberg tables
......................................................................


Patch Set 4:

(11 comments)

Thanks for reivew!
Also fixed testLoadIcebergFileDescriptors in this patch.

http://gerrit.cloudera.org:8080/#/c/24041/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/24041/3//COMMIT_MSG@16
PS3, Line 16: This patch implements
            : offset-based pagination to allow catalogd to truncate the 
response at
            : any point in the file list, not just at partition boundaries.
> Do we also change the corresponding implementation for HdfsTable?
No, it's only for Iceberg.


http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@154
PS3, Line 154:
> nit: using Collections.emptyMap() to avoid creating lots of empty maps. I t
Ack


http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@162
PS3, Line 162:       if (startOffset == 0 && endOffset == fileDescList_.size()) 
return toThrift();
             :
             :       if (reverseMap_ == null) {
             :
> nit: can be simplified as
'reverseMap' has got some backlash already. What about this: change it to a 
field with lazy instantiation. This way it's populated only for container with 
boundary (thanks to above's fast path), and only once.

Applied stream nit.


http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@586
PS3, Line 586:
             :   /**
             :    * Count total files in a TIcebergContentFileStore.
             :    */
             :   public static long countContentFiles(TIcebergContentFileStore 
contentFiles) {
             :     if (contentFiles == null) return 0;
             :
             :     return 
contentFiles.getPath_hash_to_data_file_without_deletesSize()
             :         + 
contentFiles.getPath_hash_to_data_file_with_deletesSize()
             :         + contentFiles.getPath_hash_to_position_delete_fileSize()
             :         + 
contentFiles.getPath_hash_to_equality_delete_fileSize();
             :   }
             :
             :   /**
> nit: this can be simplified as
Ack


http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@892
PS3, Line 892:       if (fileOffset == 0) {
             :         partialContentFiles.setTotal_file_count(totalFiles);
             :       }
> nit: To be consistent, this can be moved into IcebergContentFileStore.toThr
Pls check 
https://gerrit.cloudera.org/c/24041/3..4/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#1282

In latest patch, I had to unset it separately at coordinator's site to be 
consistent with toThrift (fixing the regressed testLoadIcebergFileDescriptors). 
Maybe in that light we want to keep it separate?


http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@104
PS3, Line 104: import org.apache.impala.thrift.TNetworkAddress;
> nit: unused import
Ack


http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1241
PS3, Line 1241:     long filesReceived = 
IcebergContentFileStore.countContentFiles(
> nit: this comment corresponds to L1246. Maybe move it down.
Hmm, it seems accurate. Patch mismatch?


http://gerrit.cloudera.org:8080/#/c/24041/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1286
PS3, Line 1286:   }
              :
              :   @Ov
> nit: stale comment?
Ack


http://gerrit.cloudera.org:8080/#/c/24041/3/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/24041/3/tests/custom_cluster/test_local_catalog.py@726
PS3, Line 726:
> nit: please fix the idention, e.g. moving all parameters to the next line a
Ack


http://gerrit.cloudera.org:8080/#/c/24041/3/tests/custom_cluster/test_local_catalog.py@745
PS3, Line 745:     self.assert_impalad_log_contains(
> nit: add a blank line after this
Ack


http://gerrit.cloudera.org:8080/#/c/24041/3/tests/custom_cluster/test_local_catalog.py@821
PS3, Line 821:
> nit: we can use parse_query_id() directly
Thanks



--
To view, visit http://gerrit.cloudera.org:8080/24041
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb
Gerrit-Change-Number: 24041
Gerrit-PatchSet: 4
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 11 Mar 2026 17:21:07 +0000
Gerrit-HasComments: Yes

Reply via email to