Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15818 )
Change subject: IMPALA-9512: Full ACID Milestone 2: Validate each row against the valid write id list ...................................................................... Patch Set 1: (9 comments) I went though tha FE and test parts, I need more time to dive in to the cpp parts. http://gerrit.cloudera.org:8080/#/c/15818/1/be/src/exec/acid-metadata-utils.cc File be/src/exec/acid-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15818/1/be/src/exec/acid-metadata-utils.cc@113 PS1, Line 113: for (int64_t i = min_write_id; i <= max_write_id; ++i) { Aren't we too pessimistic here? Let's say that a delta dir has write ids 5 to 10, and write id 6 is not in the validWriteId list a. if writeId 6 was aborted, then we shouldn't see it in the delta dir at all b. if writeId 6 was open when Impala got its metadata from HMS, then I think that we shouldn't see a minor compacted delta dir like 5..10 - such a compaction shouldn't even have started while 6 was open. This should be ensured by visibilityTxnId - if appears in validTxnList, then we also see 6 in the validWriteIdList as finished or aborted. It's possible that I miss some important point though. http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/main/java/org/apache/impala/util/AcidUtils.java File fe/src/main/java/org/apache/impala/util/AcidUtils.java: http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/main/java/org/apache/impala/util/AcidUtils.java@300 PS1, Line 300: if (loadStats != null) nit: use braces in multi line blocks http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/main/java/org/apache/impala/util/AcidUtils.java@349 PS1, Line 349: loadStats.filesSupercededByNewerBase++; nit: use braces in multi line blocks http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/main/java/org/apache/impala/util/AcidUtils.java@374 PS1, Line 374: if (parsedDelta.minWriteId <= baseWriteId) It is not possible for baseWriteId to be between minWriteId and maxWriteId, right? Maybe add an exception in this case? http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java File fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java: http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java@277 PS1, Line 277: transaction 10 Don't you mean writeId 10? I don't see any txn id in the paths. http://gerrit.cloudera.org:8080/#/c/15818/1/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java@356 PS1, Line 356: "delta_0000006_0000007/00000" How do we know that this minor compaction was actually finished, and we are not reading a work-in-progress directory? Generally I don't understand how this works in directories without visibilityTxnId. Can we actually see such directories with current Hive? http://gerrit.cloudera.org:8080/#/c/15818/1/tests/common/acid_txn.py File tests/common/acid_txn.py: http://gerrit.cloudera.org:8080/#/c/15818/1/tests/common/acid_txn.py@41 PS1, Line 41: # >>> at.get_open_txns() optional: for me it would be more logical to put this in util than to common btw thanks for uploading this tool http://gerrit.cloudera.org:8080/#/c/15818/1/tests/query_test/test_acid_row_validation.py File tests/query_test/test_acid_row_validation.py: http://gerrit.cloudera.org:8080/#/c/15818/1/tests/query_test/test_acid_row_validation.py@73 PS1, Line 73: Creates a table like 'base_table, synthetically creates a minor compacted nit: missing ' http://gerrit.cloudera.org:8080/#/c/15818/1/tests/query_test/test_acid_row_validation.py@85 PS1, Line 85: check_call(['hdfs', 'dfs', '-mkdir', '-p', compacted_delta_dir]) : check_call(['hdfs', 'dfs', '-cp'] + files + [compacted_delta_dir]) using https://github.com/apache/impala/blob/master/tests/util/hdfs_util.py functions via self.filesystem_client is the preferred way to do these -- To view, visit http://gerrit.cloudera.org:8080/15818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5ed74585a2d73ebbcee763b0545be4412926299d Gerrit-Change-Number: 15818 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 28 Apr 2020 17:49:09 +0000 Gerrit-HasComments: Yes
