Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14547 )
Change subject: WIP IMPALA-9045: Filter base directories of open/aborted compactions ...................................................................... Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java: http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@79 PS1, Line 79: * TODO(todd) we also likely need to pass an open transaction list here to deal : * with ignoring in-progress (not-yet-visible) compactions. Replace TODO with the new argument. http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@581 PS1, Line 581: ValidTxnList validTxnList = writeIds != null ? loadValidTxns(client) : null; Is there a reason for treating this differently than validWriteIds_? It is filled at https://github.com/apache/impala/blob/e05a5323785ecb09e45bdb5dfc96533e68256175/fe/src/main/java/org/apache/impala/catalog/Table.java#L325 Actually handling these two together may be needed for correct behavior. To get writeIds we currently use this api: https://github.com/apache/hive/blob/bd432c9203584c531f51559d8c97c398202f0794/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java#L3066 There is another overload that also takes a validTxnList argument and probably we should use that after fetching the validTxnList. As these APIs do not contain txnId, I guess they return the latest state of HMS. Getting validWriteIdList first and validTxnIdList later can lead to inconsistency in the following case: - there are 4 writeIds, 1, 2, 3, 4 - when we get the validWriteIdList, 2 is committed, but 1, 3 and 4 are not - between getting validWriteIdList and validTxnList, first 4, then 1 is committed, then a compaction is ran on id 2 (not 3/4, as 3 is not committed yet) - Impala will load the files and accept the compaction's base dir (contains data for writeId 1,2), but not 3 and 4. The problem with this is that 4 was committed before 1, so it is not valid to return results with 1 but without 4. -- To view, visit http://gerrit.cloudera.org:8080/14547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb895df38bc075e4767e44a6887dbe3000a19ea6 Gerrit-Change-Number: 14547 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Mon, 28 Oct 2019 17:06:59 +0000 Gerrit-HasComments: Yes
