Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )
Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables ...................................................................... Patch Set 12: (6 comments) Left a few minor comments, but the change looks great! http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc File be/src/exec/orc-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@310 PS12, Line 310: if (!child->hasAttributeKey(ICEBERG_FIELD_ID)) { nit: I think you could add UNLIKELY here http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@311 PS12, Line 311: // Already traversed through the file during init, find is guaranteed to hit During init we only checked whether the root type's first child has field id. What happens if some types have field ids and some don't? We should probably raise an error / return null. http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@311 PS12, Line 311: // Already traversed through the file during init, find is guaranteed to hit : DCHECK(orc_type_to_field_id_.find(child) != orc_type_to_field_id_.end()); : : child_field_id = orc_type_to_field_id_.find(child)->second; nit: For readability, you could move it to a function e.g. "GetGeneratedFieldID" http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc File be/src/exec/parquet/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc@835 PS12, Line 835: if (child->element->__isset.field_id) { nit: you could add LIKELY here http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc@838 PS12, Line 838: // Traversed through the file during init, find is guaranteed to hit : DCHECK(schema_node_to_field_id_.find(child) != schema_node_to_field_id_.end()); : : child_field_id = schema_node_to_field_id_.find(child)->second; nit: For readability, you could move it to a function e.g. "GetGeneratedFieldID" http://gerrit.cloudera.org:8080/#/c/18639/12/tests/query_test/test_iceberg.py File tests/query_test/test_iceberg.py: http://gerrit.cloudera.org:8080/#/c/18639/12/tests/query_test/test_iceberg.py@106 PS12, Line 106: def test_migrated_table_field_id_resolution(self, vector, unique_database): you should add annotation "@SkipIf.not_hdfs" -- To view, visit http://gerrit.cloudera.org:8080/18639 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc Gerrit-Change-Number: 18639 Gerrit-PatchSet: 12 Gerrit-Owner: Gergely Fürnstáhl <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tamas Mate <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 08 Jul 2022 14:31:04 +0000 Gerrit-HasComments: Yes
