Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15818 )

Change subject: IMPALA-9512: Full ACID Milestone 2: Validate each row against 
the valid write id list
......................................................................


Patch Set 1:

(9 comments)

I went though tha FE and test parts, I need more time to dive in to the cpp 
parts.

http://gerrit.cloudera.org:8080/#/c/15818/1/be/src/exec/acid-metadata-utils.cc
File be/src/exec/acid-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15818/1/be/src/exec/acid-metadata-utils.cc@113
PS1, Line 113:   for (int64_t i = min_write_id; i <= max_write_id; ++i) {
Aren't we too pessimistic here?

Let's say that a delta dir has write ids 5 to 10, and write id 6 is not in the 
validWriteId list

a. if writeId 6 was aborted, then we shouldn't see it in the delta dir at all

b. if writeId 6 was open when Impala got its metadata from HMS, then I think 
that we shouldn't see a minor compacted delta dir like 5..10 - such a 
compaction shouldn't even have started while 6 was open. This should be ensured 
by visibilityTxnId - if appears in validTxnList, then we also see 6 in the 
validWriteIdList as finished or aborted.

It's possible that I miss some important point though.


http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/main/java/org/apache/impala/util/AcidUtils.java@300
PS1, Line 300:         if (loadStats != null)
nit: use braces in multi line blocks


http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/main/java/org/apache/impala/util/AcidUtils.java@349
PS1, Line 349:             loadStats.filesSupercededByNewerBase++;
nit: use braces in multi line blocks


http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/main/java/org/apache/impala/util/AcidUtils.java@374
PS1, Line 374: if (parsedDelta.minWriteId <= baseWriteId)
It is not possible for baseWriteId to be between minWriteId and maxWriteId, 
right? Maybe add an exception in this case?


http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
File fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java:

http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java@277
PS1, Line 277: transaction 10
Don't you mean writeId 10? I don't see any txn id in the paths.


http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java@356
PS1, Line 356:           "delta_0000006_0000007/00000"
How do we know that this minor compaction was actually finished, and we are not 
reading a work-in-progress directory? Generally I don't understand  how this 
works in directories without visibilityTxnId. Can we actually see such 
directories with current Hive?


http://gerrit.cloudera.org:8080/#/c/15818/1/tests/common/acid_txn.py
File tests/common/acid_txn.py:

http://gerrit.cloudera.org:8080/#/c/15818/1/tests/common/acid_txn.py@41
PS1, Line 41: # >>> at.get_open_txns()
optional: for me it would be more logical to put this in util than to common

btw thanks for uploading this tool


http://gerrit.cloudera.org:8080/#/c/15818/1/tests/query_test/test_acid_row_validation.py
File tests/query_test/test_acid_row_validation.py:

http://gerrit.cloudera.org:8080/#/c/15818/1/tests/query_test/test_acid_row_validation.py@73
PS1, Line 73:     Creates a table like 'base_table, synthetically creates a 
minor compacted
nit: missing '


http://gerrit.cloudera.org:8080/#/c/15818/1/tests/query_test/test_acid_row_validation.py@85
PS1, Line 85:     check_call(['hdfs', 'dfs', '-mkdir', '-p', 
compacted_delta_dir])
            :     check_call(['hdfs', 'dfs', '-cp'] + files + 
[compacted_delta_dir])
using https://github.com/apache/impala/blob/master/tests/util/hdfs_util.py 
functions via self.filesystem_client is the preferred way to do these



--
To view, visit http://gerrit.cloudera.org:8080/15818
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5ed74585a2d73ebbcee763b0545be4412926299d
Gerrit-Change-Number: 15818
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 28 Apr 2020 17:49:09 +0000
Gerrit-HasComments: Yes

Reply via email to