Mihaly Szjatinya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24041 )

Change subject: IMPALA-14583: Support partial RPC dispatch for Iceberg tables
......................................................................


Patch Set 3:

(8 comments)

Thanks! See the answers in threads.
Also fixed some regression tests.

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java
File fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java:

http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@149
PS1, Line 149:      * @param startOffset Index to start from (0-based)
> nit: create ret only after fast return path
Ack


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@159
PS1, Line 159: ()) return toThrift();
> Maybe this could be thought through in a follow up patch.
Yeah it's a doubly linked list.


http://gerrit.cloudera.org:8080/#/c/24041/1/fe/src/main/java/org/apache/impala/catalog/IcebergContentFileStore.java@175
PS1, Line 175:     }
> Isn't ret.size() always endOffset - startOffset? if yes, it would be enough
You're right, it's unnecessary.


http://gerrit.cloudera.org:8080/#/c/24041/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/24041/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1230
PS2, Line 1230: ithPagin
> With pagination?
Ack


http://gerrit.cloudera.org:8080/#/c/24041/2/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1248
PS2, Line 1248: 
p.table_info.iceberg_table.content_files.isSetTotal_file_count(),
> It could be treated as an exception if total_file_count is not returned on
Done


http://gerrit.cloudera.org:8080/#/c/24041/2/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/24041/2/tests/custom_cluster/test_local_catalog.py@791
PS2, Line 791:                        "functional.alltypestiny".format(tbl))
             :     exception = self.execute_query_expect_failure(
> How much time does this take? My guess is 150 DMLs would be pretty slow - t
Was almost a minute, and now less than 25s. Is that ok ?


http://gerrit.cloudera.org:8080/#/c/24041/2/tests/custom_cluster/test_local_catalog.py@798
PS2, Line 798:     assert err in str(exception)
             :     self.assert_catalogd_log_contains("ERROR", err)
> the query id is not really needed as the database itself will be unique
It's needed becase the latest insert statements themselves trigger metadata 
fetch with the same logs.


http://gerrit.cloudera.org:8080/#/c/24041/2/tests/custom_cluster/test_local_catalog.py@801
PS2, Line 801:   @pytest.mark.execute_serially
             :   @CustomClusterTestSuite.with_args(
             :     impalad_args="--use_local_catalog=true",
             :     catalogd_args="--catalog_topic_mode=minimal 
--catalog_partial_fetch_max_files=20")
             :   def test_incomplete_iceberg_file_list(self, unique_database):
             :     """Test that coordinator can fetch missing Iceberg files 
when catalogd truncates
             :     the file list in the response"""
             :     # Create an Iceberg table with many data files
             :     tbl = unique_database + ".iceberg_tbl"
             :     self.execute_query(
             :         "create table {0} (id int, `data` string) stored as 
iceberg".format(tbl))
             :     # Insert multiple times to create multiple data files
             :     for i in range(30):
             :       self.execute_query("insert into {0} values ({1}, 
'data{1}')".format(tbl, i))
             :
             :     res = self.execute_query_expect_success(self.client, "show 
files in " + tbl)
             :     # Should have 30 files
             :     assert len(res.data) == 30
             :
             :     # Extract query_id from SHOW FILES to precisely check only 
its log messages
             :     query_id = re.search(r"Query \(id=(.*)\):", res.runtime
> Could have a utility function for this to use in both tests
Done



--
To view, visit http://gerrit.cloudera.org:8080/24041
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f2c058b7cc8efc15bac9fe0e91baadbb7b92cbb
Gerrit-Change-Number: 24041
Gerrit-PatchSet: 3
Gerrit-Owner: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Mihaly Szjatinya <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 09 Mar 2026 18:02:06 +0000
Gerrit-HasComments: Yes

Reply via email to