Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14547 )

Change subject: WIP IMPALA-9045: Filter base directories of open/aborted 
compactions
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
File fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java:

http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java@79
PS1, Line 79:    *   TODO(todd) we also likely need to pass an open transaction 
list here to deal
            :    *   with ignoring in-progress (not-yet-visible) compactions.
Replace TODO with the new argument.


http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@581
PS1, Line 581:     ValidTxnList validTxnList = writeIds != null ? 
loadValidTxns(client) : null;
Is there a reason for treating this differently than validWriteIds_? It is 
filled at 
https://github.com/apache/impala/blob/e05a5323785ecb09e45bdb5dfc96533e68256175/fe/src/main/java/org/apache/impala/catalog/Table.java#L325

Actually handling these two together may be needed for correct behavior. To get 
writeIds we currently use this api: 
https://github.com/apache/hive/blob/bd432c9203584c531f51559d8c97c398202f0794/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java#L3066

There is another overload that also takes a validTxnList argument and probably 
we should use that after fetching the validTxnList.

As these APIs do not contain txnId, I guess they return the latest state of 
HMS. Getting validWriteIdList first and validTxnIdList later can lead to 
inconsistency in the following case:
- there are 4 writeIds, 1, 2, 3, 4
- when we get the validWriteIdList, 2 is committed, but 1, 3 and 4 are not
- between getting validWriteIdList and validTxnList, first 4, then 1 is 
committed, then a compaction is ran on id 2 (not 3/4, as 3 is not committed yet)
- Impala will load the files and accept the compaction's base dir (contains 
data for writeId 1,2), but not 3 and 4. The problem with this is that 4 was 
committed before 1, so it is not valid to return results with 1 but without 4.



--
To view, visit http://gerrit.cloudera.org:8080/14547
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb895df38bc075e4767e44a6887dbe3000a19ea6
Gerrit-Change-Number: 14547
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Mon, 28 Oct 2019 17:06:59 +0000
Gerrit-HasComments: Yes

Reply via email to