Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16602 )
Change subject: IMPALA-10243: ConcurrentModificationException during parallel INSERTs ...................................................................... Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16602/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16602/1//COMMIT_MSG@9 PS1, Line 9: ConcurrentModificationException Can you add the full callstack somewhere,e.g. in the Jira? It would be good to see where it was thrown exactly. http://gerrit.cloudera.org:8080/#/c/16602/1//COMMIT_MSG@15 PS1, Line 15: This can potentially happen in CatalogOpExecutor.updateCatalog() which : updates the catalog version and unsets table column statistics. tldr: I would prefer to threat msTable as immutable and deepcopy it where we modify it, not where we just read it This happens in unsetTableColStats(), right? IMO the real bug is not in toThrift(), but in updateCatalog(), as it calls unsetTableColStats with the "original" msTable instead of a copy. The general pattern in CatalogOpExecutor is that if the msTable is modified, then it is deep copied first, and the change will be "applied" later when loadTableMetadata() reloads the schema from HMS. This is also problematic because the table in the catalog will change even if updateCatalog() will fail somewhere later and will deviate from the HMS version. My idea for the fix is to do a deepCopy of msTable early in updateCatalog() and apply all changes to that copy. loadTableMetadata() will be called later and will reload the table at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L4619 Note that it is possible that the whole unsetTableColStats stuff is no longer necessary, as Hive know uses the "engine" field in stats and lives in a separate world than Impala, so we shouldn't be able to cause correctness issues by inserting rows without invalidating stats. -- To view, visit http://gerrit.cloudera.org:8080/16602 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie656925d764d5eb26c318703ca425529ecf7a3a3 Gerrit-Change-Number: 16602 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Thu, 15 Oct 2020 10:23:48 +0000 Gerrit-HasComments: Yes
