Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14547 )

Change subject: WIP IMPALA-9045: Filter base directories of open/aborted compactions
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/14547/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@581
PS1, Line 581:     ValidTxnList validTxnList = writeIds != null ? loadValidTxns(client) : null;
> Is there a reason for treating this differently than validWriteIds_? It is
Write ids are local to each table, so it was reasonable to store them in the 
Table objects. Transaction ids are global, and I didn't want to store them for 
every table. However, AFAICT we could simply clear the valid write ids and 
transaction ids after table loading, since we don't use them later.

The case you mention is interesting, but I don't think it can realistically 
happen, because HMS only compacts up to a write id that has no preceding open 
write ids.
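The compaction limit described above can be modeled in a few lines. This is a minimal sketch, assuming a table's write id state is just a high watermark plus the set of open write ids; `CompactionLimit` and `maxCompactableWriteId` are hypothetical names, not the HMS API.

```java
import java.util.Set;

/** Hypothetical model of the HMS compaction limit; not the real HMS code. */
final class CompactionLimit {
  private CompactionLimit() {}

  /**
   * Returns the highest write id a compaction may cover: everything below the
   * smallest open write id, capped at the high watermark.
   */
  static long maxCompactableWriteId(long highWatermark, Set<Long> openWriteIds) {
    long minOpen = openWriteIds.stream()
        .mapToLong(Long::longValue)
        .min()
        .orElse(Long.MAX_VALUE);
    // Write ids at or above the first open one must stay uncompacted.
    return Math.min(highWatermark, minOpen - 1);
  }
}
```

For the write ids in the sequence that follows: with 1, 3 and 4 open the limit is 0, so nothing can be compacted; once 1 and 4 commit, only 3 is open and the limit becomes 2.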

That means the following sequence of events would be required:

* Impala asks for valid write ids, 2 is valid, 1, 3, 4 are open
* write id 4 gets committed
* write id 1 gets committed
* HMS starts compaction up to write id 2
* HMS finishes compaction
* Impala asks for valid txn list
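To make the interplay of the two snapshots concrete, here is a minimal sketch of base-directory filtering. It assumes Hive 3 style names (`base_<writeId>` or `base_<writeId>_v<compactionTxnId>`) and models each snapshot as a high watermark plus a set of ids; `BaseDirFilter`, `WriteIdSnapshot`, `TxnSnapshot` and `isUsableBase` are hypothetical and do not mirror Impala's or Hive's classes.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of base-directory filtering; not Impala's actual code. */
final class BaseDirFilter {
  // Assumed name layout: base_<writeId> or base_<writeId>_v<compactionTxnId>.
  private static final Pattern BASE = Pattern.compile("base_(\\d+)(?:_v(\\d+))?");

  /** A table's write id snapshot: high watermark plus ids still open. */
  record WriteIdSnapshot(long highWatermark, Set<Long> openWriteIds) {
    /** A base is usable only if every write id it covers is committed. */
    boolean isValidBase(long baseWriteId) {
      long minOpen = openWriteIds.stream().mapToLong(Long::longValue)
          .min().orElse(Long.MAX_VALUE);
      return baseWriteId <= highWatermark && baseWriteId < minOpen;
    }
  }

  /** A global txn snapshot: high watermark plus open or aborted txn ids. */
  record TxnSnapshot(long highWatermark, Set<Long> openOrAbortedTxnIds) {
    boolean isTxnCommitted(long txnId) {
      return txnId <= highWatermark && !openOrAbortedTxnIds.contains(txnId);
    }
  }

  static boolean isUsableBase(String dirName, WriteIdSnapshot writeIds, TxnSnapshot txns) {
    Matcher m = BASE.matcher(dirName);
    if (!m.matches()) return false;  // not a base directory
    long writeId = Long.parseLong(m.group(1));
    if (!writeIds.isValidBase(writeId)) return false;
    // No "_v" suffix: this sketch assumes no compaction txn needs checking.
    if (m.group(2) == null) return true;
    // Skip bases written by a compaction that is still open or was aborted.
    return txns.isTxnCommitted(Long.parseLong(m.group(2)));
  }
}
```

With the early write id snapshot from the sequence above (1, 3 and 4 open), `base_0000002_v…` is rejected because write id 1 is open in that snapshot; the txn check separately rejects a base whose compaction txn is open or aborted in the txn snapshot.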

In our experience, a compaction is unlikely to run that fast.

On the other hand, I agree that using the other API would be more robust, and 
it may be better to load the valid txn list and the valid write id list 
together, then clear them after table loading.
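The "load together, clear afterwards" idea might look roughly like the sketch below. It is a minimal sketch under stated assumptions: `SnapshotSource` is a hypothetical stand-in for the metastore client (not the real `IMetaStoreClient` API), snapshots are represented as opaque strings, and `Table` is a stripped-down hypothetical class rather than Impala's `HdfsTable`.

```java
import java.util.function.Consumer;

/** Hypothetical sketch: derive both snapshots together, drop them after loading. */
final class TableLoadSketch {
  /** Stand-in for the metastore client; method names are hypothetical. */
  interface SnapshotSource {
    String loadValidTxnList();
    String loadValidWriteIdList(String tableName, String validTxnList);
  }

  static final class Table {
    final String name;
    String validTxnList;      // held only while the table is loading
    String validWriteIdList;  // held only while the table is loading

    Table(String name) { this.name = name; }
  }

  static void load(Table table, SnapshotSource source, Consumer<Table> loadFileMetadata) {
    table.validTxnList = source.loadValidTxnList();
    // Derive the write id list from the same txn snapshot so the two agree.
    table.validWriteIdList = source.loadValidWriteIdList(table.name, table.validTxnList);
    try {
      loadFileMetadata.accept(table);
    } finally {
      // Neither snapshot is needed once file metadata has been loaded.
      table.validTxnList = null;
      table.validWriteIdList = null;
    }
  }
}
```

Deriving the write id list from the txn snapshot removes the window between the two calls that the sequence above depends on, and the `finally` block keeps the global txn list from being retained per table.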

However, it still wouldn't provide, e.g., cross-table consistency. I think the 
ultimate solution would be to open a transaction and load all the tables in the 
context of that transaction. But then we would also need to introduce 
transactional management of cached metadata, which might not be trivial. 
Still, with zero-touch metadata Impala is currently at least eventually 
consistent across tables, and that's probably enough for us.



--
To view, visit http://gerrit.cloudera.org:8080/14547

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idb895df38bc075e4767e44a6887dbe3000a19ea6
Gerrit-Change-Number: 14547
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 28 Oct 2019 19:42:30 +0000
Gerrit-HasComments: Yes
