Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16008 )

Change subject: IMPALA-9791: Support validWriteIdList in 
getPartialCatalogObject API
......................................................................


Patch Set 2:

(9 comments)

Compactions can cause some troubles I think, see my comment for details.

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3026
PS2, Line 3026:         if (req.table_info_selector.valid_write_ids != null) {
> This code block is for both TABLE and VIEW catalog objects. I would expect
Maybe it can be a materialized view?


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3027
PS2, Line 3027: "default"
nit: Catalog.DEFAULT_DB?


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3031
PS2, Line 3031:   ,
nit: indentation should be +4 spaces instead of 2, also this standalone comma 
looks a bit weird


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1538
PS2, Line 1538:
nit: indent with 4 spaces


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1582
PS2, Line 1582:
nit: indent with 4 spaces


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java
File fe/src/main/java/org/apache/impala/util/AcidUtils.java:

http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@227
PS2, Line 227: delta_0001_0002
maybe "delta_0001_0002_v01234", and mention that "1234" is the visibilityTxnId.

Otherwise the sentence "TransactionID derived from the directory name" could be 
confusing.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@324
PS2, Line 324: filterFDsforAcidState
This might not work well with compactions:

 CREATE TABLE test (i int) .....;

 INSERT INTO TABLE test values (1), (2), (3);

Now delta_001_001 contains this:
 1
 2
 3

Then run

 DELETE FROM test WHERE i = 2;

Now we have delete_delta_002_002 with the delete event of element 2.

 ALTER TABLE test COMPACT 'major';

Compaction creates base_002_v1234 with the following content:
 1
 3

(Value 2 is not stored anymore).

Now if we load the table with both 1 and 2 write ids being valid, also 
transaction 1234 is valid, then we'll only have base_002_v1234/0000_0 in file 
descriptors.

And if we try to filter the file descriptors with a writeIdList where only 1 is 
valid, we won't see any file descriptors. The workaround could be to fallback 
to load fds from the file system.

Minor compactions are also problematic because the backend assumes that it 
should only validate rows in files written by Hive Streaming (when it is enough 
to validate per ORC stripe, or per ORC row batch).

E.g. we might have the following file descriptor:
 delta_001_002_v01234/0000_0

But if based on the validWriteIdList we can only see rows coming from write id 
1 then we'd need to add some (quite complex) logic to the backend to filter out 
rows based on write ids.

Probably the workaround for that could be to add another stricter test to 
WriteListBasedPredicate. E.g. if 'rr == ValidWriteIdList.RangeResponse.ALL' is 
not satisfied then we should fall back to loading from the file system.


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@325
PS2, Line 325:
nit: indent with 4 spaces


http://gerrit.cloudera.org:8080/#/c/16008/2/fe/src/main/java/org/apache/impala/util/AcidUtils.java@587
PS2, Line 587: same
nit: the same



--
To view, visit http://gerrit.cloudera.org:8080/16008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied2c7c3cb2009c407e8fbc3af4722b0d34f57c4a
Gerrit-Change-Number: 16008
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Anurag Mantripragada <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 02 Jun 2020 15:59:08 +0000
Gerrit-HasComments: Yes

Reply via email to