[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution Currently ORC schema resolution is position-based only. Then the positional information is reused during ORC column reader creation. This prevents adding other column resolution strategies, e.g. column resolution by name or Iceberg field id. It also prevents schema evolution, as table metadata and file metadata should be in sync. This patch makes column reader creation independent of schema resolution. It does this by creating a mapping between slot/tuple descriptors and ORC type ids during schema resolution and use this mapping during column reader creation. Now further patches just need to add support for other column resolution strategies, the column readers will be created accordingly. Testing: * no additional tests as it is neither a bug fix nor a new feature Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Reviewed-on: http://gerrit.cloudera.org:8080/17039 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M tests/query_test/test_scanners.py 7 files changed, 80 insertions(+), 138 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Feb 2021 23:07:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6904/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Feb 2021 17:26:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Feb 2021 17:26:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 4: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@321 PS3, Line 321: node > We can have a nullptr here in case of MAPs. It happens when we only query o Ah yes, thanks for the explanation! -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Feb 2021 16:53:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8172/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Feb 2021 16:48:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/hdfs-orc-scanner.h@212 PS3, Line 212: from slot/ > If I saw correctly, we always know exactly whether it is a slot or a tuple Done http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@308 PS3, Line 308: tiple children, i. > I think we should mention that LIST also has one child type but we won't us Re-worded a bit to emphasize it's for complex types only that might have multiple children. http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@321 PS3, Line 321: node > nit: should we use DCHECK(node != nullptr) instead? We can have a nullptr here in case of MAPs. It happens when we only query one child of the map. E.g. when we query the "value" of the map, the ORC lib still creates two children of the MAP type, one child being nullptr. I think it's because this way 0 always mean KEY and 1 always mean VALUE. http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@728 PS3, Line 728: 0) { > nit: For code readability, I think we should use 0 here, because 'field' is Done -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Feb 2021 16:18:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Hello Quanlong Huang, Gabor Kaszab, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17039 to look at the new patch set (#4). Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution Currently ORC schema resolution is position-based only. Then the positional information is reused during ORC column reader creation. This prevents adding other column resolution strategies, e.g. column resolution by name or Iceberg field id. It also prevents schema evolution, as table metadata and file metadata should be in sync. This patch makes column reader creation independent of schema resolution. It does this by creating a mapping between slot/tuple descriptors and ORC type ids during schema resolution and use this mapping during column reader creation. Now further patches just need to add support for other column resolution strategies, the column readers will be created accordingly. Testing: * no additional tests as it is neither a bug fix nor a new feature Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M tests/query_test/test_scanners.py 7 files changed, 80 insertions(+), 138 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/17039/4 -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 3: Code-Review+1 (3 comments) LGTM. We can bump it to +2 after resolving the discussions. http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@308 PS3, Line 308: i.e. STRUCT or MAP I think we should mention that LIST also has one child type but we won't use this function for it. http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@321 PS3, Line 321: node nit: should we use DCHECK(node != nullptr) instead? http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/orc-column-readers.cc@728 PS3, Line 728: SchemaPathConstants::MAP_KEY nit: For code readability, I think we should use 0 here, because 'field' is now a subcolumn index of an ORC MAP type instead of a field in the SchemaPath. It's unrelative to SchemaPathConstants anymore. -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Fri, 19 Feb 2021 13:39:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 3: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/17039/3/be/src/exec/hdfs-orc-scanner.h@212 PS3, Line 212: slot/tuple If I saw correctly, we always know exactly whether it is a slot or a tuple that we get/put. We could use 2 separate maps instead for slots and tuples - this would be a few extra lines of code but give better type safety. -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 16 Feb 2021 17:04:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8118/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Feb 2021 15:43:09 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8116/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Feb 2021 15:35:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@291 PS1, Line 291: inline uint64_t OrcComplexColumnReader::GetTargetColId( : const SlotDescriptor* slot_desc) const { : return slot_desc->type().IsCollectionType() ? : > This is no longer used. Done http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@296 PS1, Line 296: } > This is no longer used too. Done http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@311 PS1, Line 311: * Return the result in '*child' and its index inside the children. Returns false for > nit: inline? Done http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@316 PS1, Line 316: DCHECK(parent.getKind() == orc::TypeKind::STRUCT || > Could you move the original comments of FindChild here? Added a bit more verbose comment. -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 10 Feb 2021 15:23:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Hello Quanlong Huang, Gabor Kaszab, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17039 to look at the new patch set (#3). Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution Currently ORC schema resolution is position-based only. Then the positional information is reused during ORC column reader creation. This prevents adding other column resolution strategies, e.g. column resolution by name or Iceberg field id. It also prevents schema evolution, as table metadata and file metadata should be in sync. This patch makes column reader creation independent of schema resolution. It does this by creating a mapping between slot/tuple descriptors and ORC type ids during schema resolution and use this mapping during column reader creation. Now further patches just need to add support for other column resolution strategies, the column readers will be created accordingly. Testing: * no additional tests as it is neither a bug fix nor a new feature Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M tests/query_test/test_scanners.py 7 files changed, 71 insertions(+), 137 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/17039/3 -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Hello Quanlong Huang, Gabor Kaszab, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17039 to look at the new patch set (#2). Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution Currently ORC schema resolution is position-based only. Then the positional information is reused during ORC column reader creation. This prevents adding other column resolution strategies, e.g. column resolution by name or Iceberg field id. It also prevents schema evolution, as table metadata and file metadata should be in sync. This patch makes column reader creation independent of schema resolution. It does this by creating a mapping between slot/tuple descriptors and ORC type ids during schema resolution and use this mapping during column reader creation. Now further patches just need to add support for other column resolution strategies, the column readers will be created accordingly. Testing: * no additional tests as it is neither a bug fix nor a new feature Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h M tests/query_test/test_scanners.py 7 files changed, 69 insertions(+), 137 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/17039/2 -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 1: (5 comments) Good to see a negative size patch! Thanks for the simplification! I'm going to have a detailed look. Left some minor comments first. http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc File be/src/exec/orc-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@291 PS1, Line 291: inline bool PathContains(const SchemaPath& path, const SchemaPath& sub_path) { : return path.size() >= sub_path.size() && : std::equal(sub_path.begin(), sub_path.end(), path.begin()); : } This is no longer used. http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@296 PS1, Line 296: inline const SchemaPath& GetTargetColPath(const SlotDescriptor* slot_desc) { This is no longer used too. http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@311 PS1, Line 311: bool IsDescendant(const orc::Type& node, uint64_t candidate_col_id) { nit: inline? http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@316 PS1, Line 316: Could you move the original comments of FindChild here? http://gerrit.cloudera.org:8080/#/c/17039/1/be/src/exec/orc-column-readers.cc@606 PS1, Line 606: // We have a position slot descriptor if it refers to this LIST ORC type, but it isn't : // a collection slot. It may be useful to mention "See how position slots are resolved in HdfsOrcScanner::ResolveColumns()." -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 10 Feb 2021 09:57:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17039 ) Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/8090/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 08 Feb 2021 14:01:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17039 Change subject: IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution .. IMPALA-10485: part(1): make ORC column reader creation independent of schema resolution Currently ORC schema resolution is position-based only. Then the positional information is reused during ORC column reader creation. This prevents adding other column resolution strategies, e.g. column resolution by name or Iceberg field id. It also prevents schema evolution, as table metadata and file metadata should be in sync. This patch makes column reader creation independent of schema resolution. It does this by creating a mapping between slot/tuple descriptors and ORC type ids during schema resolution and use this mapping during column reader creation. Now further patches just need to add support for other column resolution strategies, the column readers will be created accordingly. Testing: * no additional tests as it is neither a bug fix nor a new feature Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e --- M be/src/exec/hdfs-orc-scanner.cc M be/src/exec/hdfs-orc-scanner.h M be/src/exec/orc-column-readers.cc M be/src/exec/orc-column-readers.h M be/src/exec/orc-metadata-utils.cc M be/src/exec/orc-metadata-utils.h 6 files changed, 64 insertions(+), 129 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/17039/1 -- To view, visit http://gerrit.cloudera.org:8080/17039 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I0f7d521f9397c5188fadc7996cee0bd1650d363e Gerrit-Change-Number: 17039 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy