Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/21029 )
Change subject: IMPALA-12543: Detect self-events before finishing DDL ...................................................................... Patch Set 14: (12 comments) Took a second round of review and found there are several usages of addVersionsForInflightEvents() that are unchanged. I think some need changes and some can be dealt with in future JIRAs to limit the scope of this patch. Also found we have some cases that adding versions to the list without firing HMS RPCs, which leads to dangling items in the list. Filed IMPALA-12884 to add GC to the list. http://gerrit.cloudera.org:8080/#/c/21029/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21029/14//COMMIT_MSG@48 PS14, Line 48: Testing: Currently, we just have flaky tests like test_iceberg_self_events that can cover this fix. It'd be nice if we can add a test that always fails without this fix, probably by injecting a sleep before the above step-7. http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1203 PS14, Line 1203: LOG.info("Added catalog version {} in database's {} in-flight events", : versionNumber, db.getName()); Let's fix this as well, i.e. only log this when the version is added for the db. http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1054 PS14, Line 1054: isInsertEvent_ = isInsertEvent; 'isInsertEvent' is always used as false. I think we can't track the inflight INSERT events before we firing them since we need the HMS RPC to return the event ids. Probably we can remove this field and replace the usages with 'false'. http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1080 PS14, Line 1080: Preconditions.checkState(inflightEventRegistered_); nit: please add an error message. http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1221 PS14, Line 1221: Get the new table object with an updated catalog : // version. nit: "Update the table object with the new catalog version" http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1248 PS14, Line 1248: Get the table object : // with an updated catalog version. If the partition does not exist and : // "IfExists" is true, null is returned. nit: this comment is stale http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1376 PS14, Line 1376: modification.hasInProgressModification() nit: maybe 'modification.isInProgress()' sounds better http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1592 PS14, Line 1592: catalog_.addVersionsForInflightEvents(false, tbl, newCatalogVersion); Can we remove this if the Iceberg operations won't update HMS? It also seems buggy that we don't do this when 'needsToUpdateHms' is true. We can investigate this in a separate JIRA if you think it's a new area. http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2650 PS14, Line 2650: catalog_.addVersionsForInflightEvents(false, table, newCatalogVersion); This is unchanged yet. http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3239 PS14, Line 3239: catalog_.addVersionsForInflightEvents(false, table, newCatalogVersion); This is also unchanged but it's a bit complex - we are not always triggering HMS RPCs. It depends on whether the table is being replicated by Hive and whether params.isDelete_stats() is true. It's an exising issue that we might add a catalog version without a corresponding event. So the version won't be cleared in the inFlight event list anymore. Filed IMPALA-12884 http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@5512 PS14, Line 5512: catalog_.addVersionsForInflightEvents(false, result.second, newCatalogVersion); This is also unchanged for RENAME but I think we can't do it before the HMS RPC since the new table is not created in catalog cache at that time. There is a chance (though rare) that the RENAME event is processed in parallel when the current thread is still running before L5494. Then 'result' become a Pair of nulls and catalogd returns the operation with the ImpalaRuntimeException at L5506. This is also complex and I think we can deal with it in a separate JIRA. http://gerrit.cloudera.org:8080/#/c/21029/14/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@7242 PS14, Line 7242: catalog_.addVersionsForInflightEvents(false, table, newCatalogVersion); This is also unchanged for inserting Iceberg tables. -- To view, visit http://gerrit.cloudera.org:8080/21029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8365c934349ad21a4d9327fc11594d2fc3445f79 Gerrit-Change-Number: 21029 Gerrit-PatchSet: 14 Gerrit-Owner: Riza Suminto <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Sai Hemanth Gantasala <[email protected]> Gerrit-Comment-Date: Fri, 08 Mar 2024 08:28:00 +0000 Gerrit-HasComments: Yes
