Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated 
Iceberg tables
......................................................................


Patch Set 12:

(6 comments)

Left a few minor comments, but the change looks great!

http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@310
PS12, Line 310:     if (!child->hasAttributeKey(ICEBERG_FIELD_ID)) {
nit: I think you could add UNLIKELY here


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@311
PS12, Line 311:       // Already traversed through the file during init, find 
is guaranteed to hit
During init we only checked whether the root type's first child has field id. 
What happens if some types have field ids and some don't? We should probably 
raise an error / return null.


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@311
PS12, Line 311:       // Already traversed through the file during init, find 
is guaranteed to hit
              :       DCHECK(orc_type_to_field_id_.find(child) != 
orc_type_to_field_id_.end());
              :
              :       child_field_id = 
orc_type_to_field_id_.find(child)->second;
nit: For readability, you could move it to a function e.g. "GetGeneratedFieldID"


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc@835
PS12, Line 835:     if (child->element->__isset.field_id) {
nit: you could add LIKELY here


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc@838
PS12, Line 838:       // Traversed through the file during init, find is 
guaranteed to hit
              :       DCHECK(schema_node_to_field_id_.find(child) != 
schema_node_to_field_id_.end());
              :
              :       child_field_id = 
schema_node_to_field_id_.find(child)->second;
nit: For readability, you could move it to a function e.g. "GetGeneratedFieldID"


http://gerrit.cloudera.org:8080/#/c/18639/12/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18639/12/tests/query_test/test_iceberg.py@106
PS12, Line 106:   def test_migrated_table_field_id_resolution(self, vector, 
unique_database):
you should add annotation "@SkipIf.not_hdfs"



--
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 12
Gerrit-Owner: Gergely Fürnstáhl <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tamas Mate <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 08 Jul 2022 14:31:04 +0000
Gerrit-HasComments: Yes

Reply via email to