Surya Hebbar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/24274 )

Change subject: IMPALA-14798: Fix IllegalStateException on COMPUTE STATS on 
Iceberg tables
......................................................................


Patch Set 3:

(2 comments)

Thanks for the review, Noemi! I have left a few replies and a question inline.

http://gerrit.cloudera.org:8080/#/c/24274/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/24274/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2135
PS3, Line 2135: if (msTbl.getParameters().containsKey(CATALOG_SERVICE_ID)) {
              :       props.put(CATALOG_SERVICE_ID, 
msTbl.getParameters().get(CATALOG_SERVICE_ID));
              :     }
              :     if (msTbl.getParameters().containsKey(CATALOG_VERSION)) {
              :       props.put(CATALOG_VERSION, 
msTbl.getParameters().get(CATALOG_VERSION));
              :     }
> These parameters are not independent. They need to be present if event proc
I agree with you conceptually. >From my understanding, they do act as a pair 
for self-event detection.

However, my concern with enforcing that pairing here (or gating it behind 
isEventProcessingEnabled()) is that COMPUTE STATS shouldn't be responsible for 
altering or cleaning up unrelated table properties.

If a table somehow ends up in an inconsistent state where only one of these 
properties exists (e.g., due to an external engine modification, a previous 
bug, or event processing being toggled), enforcing an if (A && B) check would 
cause COMPUTE STATS to silently drop the orphaned property. Similarly, if event 
processing is currently disabled but the properties already exist on the table, 
we shouldn't wipe them out.

Because the goal of this specific code path is just to preserve the existing 
HMS state during an Iceberg commit, I think the safest approach is a strict 
pass-through: if a property exists, we keep it.

This guarantees zero unintended side effects to the table's properties during a 
stats computation. Let me know if you think this makes sense, or if there is an 
edge case I am missing!


http://gerrit.cloudera.org:8080/#/c/24274/3/tests/custom_cluster/test_metadata_no_events_processing.py
File tests/custom_cluster/test_metadata_no_events_processing.py:

http://gerrit.cloudera.org:8080/#/c/24274/3/tests/custom_cluster/test_metadata_no_events_processing.py@342
PS3, Line 342: self.run_stmt_in_hive("create table %s (id int) stored by 
iceberg;" % tbl)
> Can you make sure that CATALOG_SERVICE_ID is missing?
Within the end-to-end tests, I looked for a way to verify this but could not 
find a clean approach.

The only option I could think of was to fetch the JSON of TCatalogObject, 
something like:

CATALOG_TEST_PORT = "25020"
obj_url = "http://localhost:{0}/catalog_object".format(CATALOG_TEST_PORT) + 
"?json&object_type=DATABASE&object_name={0}".format(tbl)
response = requests.get(obj_url)

But in both the JSON and text versions of TCatalogObject, I could only find 
catalog_version, not the catalog_id (CATALOG_SERVICE_ID).

Could you let me know how you had in mind to test this?



--
To view, visit http://gerrit.cloudera.org:8080/24274
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I71c36872bf84c44af8283bab11ce97e60a3e598c
Gerrit-Change-Number: 24274
Gerrit-PatchSet: 3
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 10 Jun 2026 10:03:38 +0000
Gerrit-HasComments: Yes

Reply via email to