Alex Behm has uploaded a new patch set (#4).

Change subject: IMPALA-4675: Case-insensitive matching of Parquet fields.
......................................................................

IMPALA-4675: Case-insensitive matching of Parquet fields.

The query option PARQUET_FALLBACK_SCHEMA_RESOLUTION allows matching
of Parquet fields by name instead of by index (the default).
Parquet column names are case-sensitive, but Impala treats
db/table/column/field names as case-insensitive. Today, there is no
way to select Parquet columns with mixed casing via SQL when using
the name-based field resolution policy.

This patch changes the matching of Parquet fields to be
case-insensitive.

Testing:
- Modified the data files backing complextypestbl to contain
  fields with mixed casing.
- Several existing tests run against this table, including the
  test for name-based resolution.
- I confirmed that without this fix, the existing name-based
  resolution tests fail on the modified data files.
- I locally ran test_scanners.py and test_nested_types.py
  on exhaustive with this fix.

Change-Id: I87395f84ba29b4c3d8e41be1ea4e89e500b8a9f4
---
M be/src/exec/parquet-metadata-utils.cc
M be/src/exec/parquet-metadata-utils.h
M testdata/ComplexTypesTbl/nonnullable.avsc
M testdata/ComplexTypesTbl/nonnullable.json
M testdata/ComplexTypesTbl/nonnullable.parq
M testdata/ComplexTypesTbl/nullable.avsc
M testdata/ComplexTypesTbl/nullable.json
M testdata/ComplexTypesTbl/nullable.parq
M testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M tests/query_test/test_scanners.py
10 files changed, 71 insertions(+), 76 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/5891/4
--
To view, visit http://gerrit.cloudera.org:8080/5891
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I87395f84ba29b4c3d8e41be1ea4e89e500b8a9f4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Nathan Salmon <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Nathan Salmon <[email protected]>
