Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15818 )
Change subject: IMPALA-9512: Full ACID Milestone 2: Validate rows against the valid write id list ...................................................................... Patch Set 16: (6 comments) Sorry for the delayed review comments. Feel free to ignore or follow-up in a separate as you see appropriate. Most questions are basically are for my understanding as I am not very familiar with ACID. http://gerrit.cloudera.org:8080/#/c/15818/13/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/15818/13/fe/src/main/java/org/apache/impala/catalog/Table.java@580 PS13, Line 580: if (getMetaStoreTable() != null && Is it intentional that we always send the ValidWriteIdList irrespective of whether we requested the hms table or not? If yes, would be good to a comment with the reason. http://gerrit.cloudera.org:8080/#/c/15818/13/fe/src/main/java/org/apache/impala/util/AcidUtils.java File fe/src/main/java/org/apache/impala/util/AcidUtils.java: http://gerrit.cloudera.org:8080/#/c/15818/13/fe/src/main/java/org/apache/impala/util/AcidUtils.java@387 PS13, Line 387: if (parsedDelta.minWriteId <= baseWriteId) { : Preconditions.checkState(parsedDelta.maxWriteId <= baseWriteId); : it.remove(); : continue; : } we could use some comments here. Is the code removing all the delta directories which are superceded by the base directory? Just for my understand when would that happen? http://gerrit.cloudera.org:8080/#/c/15818/13/fe/src/main/java/org/apache/impala/util/AcidUtils.java@398 PS13, Line 398: MetaException Change to CatalogException? http://gerrit.cloudera.org:8080/#/c/15818/15/fe/src/main/java/org/apache/impala/util/AcidUtils.java File fe/src/main/java/org/apache/impala/util/AcidUtils.java: http://gerrit.cloudera.org:8080/#/c/15818/15/fe/src/main/java/org/apache/impala/util/AcidUtils.java@191 PS15, Line 191: if (!isTxnValid(pd.visibilityTxnId)) return false; I am curious to understand this. The writeIdList that we load is pertaining to the current transaction state of the system. But during the load we fetch the ValidTxnList again. IIUC it seems like there are some transactions which do not update the write id and hence we need to look at the transaction ids specifically to determine if we need to skip certain files. I see that in IMPALA-9045 we added this check for base directories to ignore files when a major compaction is in open or aborted state. Does the same logic apply to delta files too? Are there more cases where we need to rely on validTxnIdList? http://gerrit.cloudera.org:8080/#/c/15818/15/fe/src/main/java/org/apache/impala/util/AcidUtils.java@246 PS15, Line 246: visibilityTxnId Just for sake my understanding, is this needed to support minor compactions similar to what IMPALA-9045 did for base directories? http://gerrit.cloudera.org:8080/#/c/15818/15/fe/src/main/java/org/apache/impala/util/AcidUtils.java@380 PS15, Line 380: baseWriteId Is it possible that baseWriteId is equal to SENTINEL_BASE_WRITE_ID here? -- To view, visit http://gerrit.cloudera.org:8080/15818 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5ed74585a2d73ebbcee763b0545be4412926299d Gerrit-Change-Number: 15818 Gerrit-PatchSet: 16 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 21 May 2020 19:39:31 +0000 Gerrit-HasComments: Yes
