Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16008 )
Change subject: IMPALA-9791: Support validWriteIdList in getPartialCatalogObject API ...................................................................... Patch Set 2: (9 comments) Compactions can cause some troubles I think, see my comment for details. http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3026 PS2, Line 3026: if (req.table_info_selector.valid_write_ids != null) { > This code block is for both TABLE and VIEW catalog objects. I would expect Maybe it can be a materialized view? http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3027 PS2, Line 3027: "default" nit: Catalog.DEFAULT_DB? http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3031 PS2, Line 3031: , nit: indentation should be +4 spaces instead of 2, also this standalone comma looks a bit weird http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1538 PS2, Line 1538: nit: indent with 4 spaces http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1582 PS2, Line 1582: nit: indent with 4 spaces http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java File fe/src/main/java/org/apache/impala/util/AcidUtils.java: http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@227 PS2, Line 227: delta_0001_0002 maybe "delta_0001_0002_v01234", and mention that "1234" is the visibilityTxnId. Otherwise the sentence "TransactionID derived from the directory name" could be confusing. http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@324 PS2, Line 324: filterFDsforAcidState This might not work well with compactions: CREATE TABLE test (i int) .....; INSERT INTO TABLE test values (1), (2), (3); Now delta_001_001 contains this: 1 2 3 Then run DELETE FROM test WHERE i = 2; Now we have delete_delta_002_002 with the delete event of element 2. ALTER TABLE test COMPACT 'major'; Compaction creates base_002_v1234 with the following content: 1 3 (Value 2 is not stored anymore). Now if we load the table with both 1 and 2 write ids being valid, also transaction 1234 is valid, then we'll only have base_002_v1234/0000_0 in file descriptors. And if we try to filter the file descriptors with a writeIdList where only 1 is valid, we won't see any file descriptors. The workaround could be to fallback to load fds from the file system. Minor compactions are also problematic because the backend assumes that it should only validate rows in files written by Hive Streaming (when it is enough to validate per ORC stripe, or per ORC row batch). E.g. we might have the following file descriptor: delta_001_002_v01234/0000_0 But if based on the validWriteIdList we can only see rows coming from write id 1 then we'd need to add some (quite complex) logic to the backend to filter out rows based on write ids. Probably the workaround for that could be to add another stricter test to WriteListBasedPredicate. E.g. if 'rr == ValidWriteIdList.RangeResponse.ALL' is not satisfied then we should fall back to loading from the file system. http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@325 PS2, Line 325: nit: indent with 4 spaces http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@587 PS2, Line 587: same nit: the same -- To view, visit http://gerrit.cloudera.org:8080/16008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a Gerrit-Change-Number: 16008 Gerrit-PatchSet: 2 Gerrit-Owner: Vihang Karajgaonkar <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Anurag Mantripragada <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Tue, 02 Jun 2020 15:59:08 +0000 Gerrit-HasComments: Yes
