Surya Hebbar has posted comments on this change. ( http://gerrit.cloudera.org:8080/24274 )
Change subject: IMPALA-14798: Fix IllegalStateException on COMPUTE STATS on Iceberg tables ...................................................................... Patch Set 3: (2 comments) Thanks for the review, Noemi! I have left a few replies and a question inline. http://gerrit.cloudera.org:8080/#/c/24274/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/24274/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2135 PS3, Line 2135: if (msTbl.getParameters().containsKey(CATALOG_SERVICE_ID)) { : props.put(CATALOG_SERVICE_ID, msTbl.getParameters().get(CATALOG_SERVICE_ID)); : } : if (msTbl.getParameters().containsKey(CATALOG_VERSION)) { : props.put(CATALOG_VERSION, msTbl.getParameters().get(CATALOG_VERSION)); : } > These parameters are not independent. They need to be present if event proc I agree with you conceptually. >From my understanding, they do act as a pair for self-event detection. However, my concern with enforcing that pairing here (or gating it behind isEventProcessingEnabled()) is that COMPUTE STATS shouldn't be responsible for altering or cleaning up unrelated table properties. If a table somehow ends up in an inconsistent state where only one of these properties exists (e.g., due to an external engine modification, a previous bug, or event processing being toggled), enforcing an if (A && B) check would cause COMPUTE STATS to silently drop the orphaned property. Similarly, if event processing is currently disabled but the properties already exist on the table, we shouldn't wipe them out. Because the goal of this specific code path is just to preserve the existing HMS state during an Iceberg commit, I think the safest approach is a strict pass-through: if a property exists, we keep it. This guarantees zero unintended side effects to the table's properties during a stats computation. Let me know if you think this makes sense, or if there is an edge case I am missing! http://gerrit.cloudera.org:8080/#/c/24274/3/tests/custom_cluster/test_metadata_no_events_processing.py File tests/custom_cluster/test_metadata_no_events_processing.py: http://gerrit.cloudera.org:8080/#/c/24274/3/tests/custom_cluster/test_metadata_no_events_processing.py@342 PS3, Line 342: self.run_stmt_in_hive("create table %s (id int) stored by iceberg;" % tbl) > Can you make sure that CATALOG_SERVICE_ID is missing? Within the end-to-end tests, I looked for a way to verify this but could not find a clean approach. The only option I could think of was to fetch the JSON of TCatalogObject, something like: CATALOG_TEST_PORT = "25020" obj_url = "http://localhost:{0}/catalog_object".format(CATALOG_TEST_PORT) + "?json&object_type=DATABASE&object_name={0}".format(tbl) response = requests.get(obj_url) But in both the JSON and text versions of TCatalogObject, I could only find catalog_version, not the catalog_id (CATALOG_SERVICE_ID). Could you let me know how you had in mind to test this? -- To view, visit http://gerrit.cloudera.org:8080/24274 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I71c36872bf84c44af8283bab11ce97e60a3e598c Gerrit-Change-Number: 24274 Gerrit-PatchSet: 3 Gerrit-Owner: Surya Hebbar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]> Gerrit-Reviewer: Surya Hebbar <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 10 Jun 2026 10:03:38 +0000 Gerrit-HasComments: Yes
