Mihaly Szjatinya has posted comments on this change. ( http://gerrit.cloudera.org:8080/24041 )
Change subject: IMPALA-14583: Support partial RPC dispatch for Iceberg tables ...................................................................... Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@153 PS1, Line 153: ir<Map<THash128, THdfsFileDesc>, Long> toT > nit: This if stmt could be moved to the beginning. Ack http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@155 PS1, Line 155: Map<THash128, THdfsFileDesc> ret = new HashMap<>(); > If the whole container is needed, a fast path could copy it without constru Good point! Implemented http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@159 PS1, Line 159: > These maps are not modified after construction of IcebergContentFileStore, Both valid points. But given that Csaba's fast past optimization cuts off the worst case and the code will proceed to reverseMap only for split containers, I would not add complexity with re-calculating the hash, IYDM. http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@480 PS1, Line 480: if (startOffset < state.currentOffset + containerSize && : state.filesCollected < maxFiles) { : long localOffset = Math.max(0, startOffset - state.currentOffset); : Pair<Map<THash128, THdfsFileDesc>, Long> result = : container.toThriftPartial(localOffset, maxFiles - state.filesCollected); : if (!result.first.isEmpty()) { > Isn't it enough to set these in first request? partitions_ can be pretty la Yeah, another great point! http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java: http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@869 PS1, Line 869: // iceberg_file_offset. This ensures callers that don't support pagination : // always receive the full file set. : boolean paginationRequested = req.table_info_selector.isSetIceberg_file_o > Not sure if cross compatibility between old coordinator and new catalogd is Agree. Maybe not as much for compatibility, but rather as a way to allow a full request along with partial, the consistency check won't hurt. http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@888 PS1, Line 888: able > I am not sure about the severity of this - with current default of 1M it ma This is the same as: https://github.com/apache/Impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2424 Maybe we could address this more generally on logger side? For the furture: newer log4j2 seems to have the DuplicateMessageFilter. http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1237 PS1, Line 1237: checkResponse(resp.table_info != null && : resp.table_info.iceberg_table != null, req, > Can't we deduce the total number of files from req? It would be better to a Agree, an extra RPC is an overkill. We can return the total files within the first response though. http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1277 PS1, Line 1277: Count > 1) { > I would prefer to move this and mergeIcebergContentFiles to IcebergContentF Makes sense. Also the same count function can be reused in both catalog and coordinator from there. -- To view, visit http://gerrit.cloudera.org:8080/24041 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb Gerrit-Change-Number: 24041 Gerrit-PatchSet: 2 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Sun, 08 Mar 2026 15:47:07 +0000 Gerrit-HasComments: Yes
