Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/24041 )
Change subject: IMPALA-14583: Support partial RPC dispatch for Iceberg tables ...................................................................... Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java: http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@149 PS1, Line 149: * @param startOffset Index to start from (0-based) nit: create ret only after fast return path http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@159 PS1, Line 159: > Both valid points. But given that Csaba's fast past optimization cuts off t Maybe this could be thought through in a follow up patch. Having a LinkedHashMap for fileDescMap_ instead of a separate list and map could make this a lot simpler, but my understanding is that would lead to and extra reference per file. http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@175 PS1, Line 175: // Iterate through the list range Isn't ret.size() always endOffset - startOffset? if yes, it would be enough to return the map http://gerrit.cloudera.org:8080/#/c/24041/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/24041/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1230 PS2, Line 1230: ithRetry With pagination? http://gerrit.cloudera.org:8080/#/c/24041/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1248 PS2, Line 1248: resp.table_info.iceberg_table.content_files.isSetTotal_file_count() It could be treated as an exception if total_file_count is not returned on the first call. http://gerrit.cloudera.org:8080/#/c/24041/2/tests/custom_cluster/test_local_catalog.py File tests/custom_cluster/test_local_catalog.py: http://gerrit.cloudera.org:8080/#/c/24041/2/tests/custom_cluster/test_local_catalog.py@791 PS2, Line 791: for i in range(150): : self.execute_query("insert into {0} values ({1}, 'data{1}')".format(tbl, i)) How much time does this take? My guess is 150 DMLs would be pretty slow - the test could use less files and smaller catalog_partial_fetch_max_files http://gerrit.cloudera.org:8080/#/c/24041/2/tests/custom_cluster/test_local_catalog.py@798 PS2, Line 798: # Extract query_id from SHOW FILES to precisely check only its log messages : query_id = re.search(r"Query \(id=(.*)\):", res.runtime_profile).group(1) the query id is not really needed as the database itself will be unique http://gerrit.cloudera.org:8080/#/c/24041/2/tests/custom_cluster/test_local_catalog.py@801 PS2, Line 801: # Check catalogd logged the truncation warning for final SHOW FILES request : # (limit=100, so returns 100/150 on first request with offset 0) : self.assert_catalogd_log_contains( : "WARNING", r"{0}\] Returning 100/150 files for Iceberg table {1} \(offset 0\). " : "Coordinator will fetch remaining files in follow-up requests.".format( : query_id, tbl)) : : # Check coordinator logged the intermediate fetch : # (request 1 - the initial request returns 100) : self.assert_impalad_log_contains( : "INFO", : r"{0}\] Fetched 100 files for Iceberg table TableMetaRef {1}@\d+ " : r"in request 1. Sending follow-up request for more files.".format( : query_id, tbl)) : : # Check coordinator logged fetching all files in multiple requests : # TableMetaRef format: "TableMetaRef db.table@version" : self.assert_impalad_log_contains( : "INFO", : r"{0}\] Fetched 150 total files for Iceberg table TableMetaRef " : r"{1}@\d+ across 2 requests".format(query_id, tbl)) Could have a utility function for this to use in both tests -- To view, visit http://gerrit.cloudera.org:8080/24041 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb Gerrit-Change-Number: 24041 Gerrit-PatchSet: 2 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 09 Mar 2026 07:59:53 +0000 Gerrit-HasComments: Yes
