Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15395 )

Change subject: WIP IMPALA-9042: Milestone 1: properly scan files that has full 
ACID schema
......................................................................


Patch Set 2:

(7 comments)

The solution looks good to me. Thanks for putting this together so quickly!

http://gerrit.cloudera.org:8080/#/c/15395/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15395/2//COMMIT_MSG@63
PS2, Line 63:  * make all tests green in exhaustive
Also need tests on column masking since we also have some hacks in path 
resolution when the table contains nested column and column masking policies on 
other primitive columns (IMPALA-9330).


http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/hdfs-orc-scanner.cc
File be/src/exec/hdfs-orc-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/hdfs-orc-scanner.cc@185
PS2, Line 185:   if (scan_node_->hdfs_table()->IsFullAcid() != 
schema_resolver_->IsFullAcid()) {
I think this is too strict. The test on file schema can be false positive. 
Users that play around with acid orc files may use CREATE TABLE xxx LIKE ORC 
file to create a non-acid table. But with this check they won't be able to read 
it.


http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@101
PS2, Line 101:         if (table_idx >= numPartCols) {
I think this can be a DCHECK since partition columns should be skipped by 
https://github.com/apache/impala/blob/4a8221877cc1782b861184f0ccf86238d002af13/be/src/exec/hdfs-orc-scanner.cc#L364


http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@106
PS2, Line 106: table_idx - numPartCols
Could you add a DCHECK that the resulted index won't overflow?


http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@234
PS2, Line 234: (root_->getFieldName(0) != "operation" ||
             :        root_->getFieldName(5) != "row")
I think we should also check other fields and their types.


http://gerrit.cloudera.org:8080/#/c/15395/2/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/15395/2/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@847
PS2, Line 847: HIVEFULLACIDWRITE
We don't have write support yet. Is this required somewhere?


http://gerrit.cloudera.org:8080/#/c/15395/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/15395/2/tests/query_test/test_scanners.py@1297
PS2, Line 1297:     self.client.execute("alter table %s.%s set 
tblproperties('transactional'='false')" %
Do we need this? Doesn't the table being translated to EXTERNAL table?



--
To view, visit http://gerrit.cloudera.org:8080/15395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb
Gerrit-Change-Number: 15395
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Comment-Date: Tue, 17 Mar 2020 11:41:31 +0000
Gerrit-HasComments: Yes

Reply via email to