Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14547 )
Change subject: WIP IMPALA-9045: Filter base directories of open/aborted compactions
......................................................................

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@581
PS1, Line 581: ValidTxnList validTxnList = writeIds != null ? loadValidTxns(client) : null;
> Is there a reason for treating this differently than validWriteIds_? It is

Write ids are local to a table, so it was reasonable to store them in the Table objects. Transaction ids are global, and I didn't want to store them for every table. However, AFAICT we could simply clear the valid write ids and transaction ids after table loading, since we don't use them later.

The case you mention is interesting, but I think it cannot realistically happen, because HMS only compacts up to a write id that doesn't have any preceding open write ids. That means the following sequence of events is required:

* Impala asks for valid write ids: 2 is valid; 1, 3, 4 are open
* write id 4 gets committed
* write id 1 gets committed
* HMS starts compaction up to write id 2
* HMS finishes compaction
* Impala asks for valid txn list

In our experience, a compaction is unlikely to run that fast.

On the other hand, I agree that using the other API would be more robust, and maybe it's better to load the valid txn list and the write id list together, then clear them after table loading. However, that wouldn't provide, e.g., cross-table consistency.

I think the ultimate solution would be to open a transaction and load all the tables in the context of that transaction. But then we would also need to introduce transactional management of cached metadata, which might not be trivial.
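For reference, the interleaving listed above can be replayed as a small self-contained simulation. This is only a sketch under simplifying assumptions: `FakeMetastore`, `compactionLimit` and `leakedWriteIds` are hypothetical names made up for illustration, not Impala or HMS APIs, and aborted write ids are ignored. It shows which write ids a base directory would expose even though the earlier write id snapshot still considered them open.

```java
import java.util.Set;
import java.util.TreeSet;

public class SnapshotRaceSketch {
  // Hypothetical stand-in for the metastore state of a single table.
  static final class FakeMetastore {
    final Set<Long> committedWriteIds = new TreeSet<>();
    // Highest write id covered by a finished base directory (0 = no base yet).
    long compactedUpTo = 0;

    void commit(long writeId) { committedWriteIds.add(writeId); }

    // Assumed rule from the comment: compaction only goes up to a write id
    // that has no preceding open write ids.
    long compactionLimit(long highestAllocated) {
      long limit = 0;
      for (long id = 1; id <= highestAllocated; id++) {
        if (!committedWriteIds.contains(id)) break;
        limit = id;
      }
      return limit;
    }

    void compact(long highestAllocated) {
      compactedUpTo = compactionLimit(highestAllocated);
    }
  }

  // Write ids contained in the visible base directory that the earlier
  // write id snapshot did not consider valid.
  static Set<Long> leakedWriteIds(Set<Long> snapshotValidWriteIds, long visibleBaseUpTo) {
    Set<Long> leaked = new TreeSet<>();
    for (long id = 1; id <= visibleBaseUpTo; id++) {
      if (!snapshotValidWriteIds.contains(id)) leaked.add(id);
    }
    return leaked;
  }

  public static void main(String[] args) {
    FakeMetastore hms = new FakeMetastore();
    hms.commit(2);                        // 2 is valid; 1, 3, 4 are open
    Set<Long> writeIdSnapshot = new TreeSet<>(hms.committedWriteIds);
    hms.commit(4);                        // write id 4 gets committed
    hms.commit(1);                        // write id 1 gets committed
    hms.compact(4);                       // compaction runs up to write id 2
    long visibleBase = hms.compactedUpTo; // txn list loaded only now
    System.out.println(leakedWriteIds(writeIdSnapshot, visibleBase)); // prints [1]
  }
}
```

If the write id snapshot were taken after the compaction finished (or together with the txn list), `leakedWriteIds` would return an empty set in this model.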
But with zero-touch metadata, Impala is currently at least eventually cross-table consistent, and that's probably enough for us.

-- 
To view, visit http://gerrit.cloudera.org:8080/14547
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb895df38bc075e4767e44a6887dbe3000a19ea6
Gerrit-Change-Number: 14547
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 28 Oct 2019 19:42:30 +0000
Gerrit-HasComments: Yes
