Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15395 )
Change subject: WIP IMPALA-9042: Milestone 1: properly scan files that has full ACID schema ...................................................................... Patch Set 2: (7 comments) The solution looks good to me. Thanks for putting this together so quickly! http://gerrit.cloudera.org:8080/#/c/15395/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15395/2//COMMIT_MSG@63 PS2, Line 63: * make all tests green in exhaustive Also need tests on column masking since we also have some hacks in path resolution when the table contains nested column and column masking policies on other primitive columns (IMPALA-9330). http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/hdfs-orc-scanner.cc@185 PS2, Line 185: if (scan_node_->hdfs_table()->IsFullAcid() != schema_resolver_->IsFullAcid()) { I think this is too strict. The test on file schema can be false positive. Users that play around with acid orc files may use CREATE TABLE xxx LIKE ORC file to create a non-acid table. But with this check they won't be able to read it. http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@101 PS2, Line 101: if (table_idx >= numPartCols) { I think this can be a DCHECK since partition columns should be skipped by https://github.com/apache/impala/blob/4a8221877cc1782b861184f0ccf86238d002af13/be/src/exec/hdfs-orc-scanner.cc#L364 http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@106 PS2, Line 106: table_idx - numPartCols Could you add a DCHECK that the resulted index won't overflow? http://gerrit.cloudera.org:8080/#/c/15395/2/be/src/exec/orc-metadata-utils.cc@234 PS2, Line 234: (root_->getFieldName(0) != "operation" || : root_->getFieldName(5) != "row") I think we should also check other fields and their types. http://gerrit.cloudera.org:8080/#/c/15395/2/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/15395/2/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@847 PS2, Line 847: HIVEFULLACIDWRITE We don't have write support yet. Is this required somewhere? http://gerrit.cloudera.org:8080/#/c/15395/2/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/15395/2/tests/query_test/test_scanners.py@1297 PS2, Line 1297: self.client.execute("alter table %s.%s set tblproperties('transactional'='false')" % Do we need this? Doesn't the table being translated to EXTERNAL table? -- To view, visit http://gerrit.cloudera.org:8080/15395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic2e2afec00c9a5cf87f1d61b5fe52b0085844bcb Gerrit-Change-Number: 15395 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Tue, 17 Mar 2020 11:41:31 +0000 Gerrit-HasComments: Yes
