Hello Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/16383
to look at the new patch set (#4).
Change subject: IMPALA-10115: Impala should check file schema as well to check
full ACIDv2 files
......................................................................
IMPALA-10115: Impala should check file schema as well to check full ACIDv2 files
Currently Impala checks the file metadata field 'hive.acid.version' to
decide whether a file uses the full ACID schema. In some cases Hive does
not set this value for full ACID files, e.g. in query-based compactions.
So it's more robust to check the schema elements instead of the metadata
field. Also, Hive sometimes writes the column names with different letter
case, e.g. originalTransaction vs. originaltransaction, so we should
compare the column names case-insensitively.
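A minimal sketch of the idea, assuming the check compares only the names and order of the top-level fields against the standard Hive ACIDv2 row layout (operation, originalTransaction, bucket, rowId, currentTransaction, row). The helpers EqualsIgnoreCase and IsFullAcidSchema are hypothetical illustrations, not the functions in orc-metadata-utils.cc, and a real check may also validate field types.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: ASCII case-insensitive string equality.
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
           return std::tolower(static_cast<unsigned char>(x)) ==
                  std::tolower(static_cast<unsigned char>(y));
         });
}

// Hypothetical check: decides "full ACID" from the schema itself rather than
// from the 'hive.acid.version' metadata. Assumes the caller passes the names
// of the file's top-level struct fields in file order.
bool IsFullAcidSchema(const std::vector<std::string>& top_level_fields) {
  static const std::array<const char*, 6> kAcidFields = {
      "operation", "originalTransaction", "bucket",
      "rowId", "currentTransaction", "row"};
  if (top_level_fields.size() != kAcidFields.size()) return false;
  for (std::size_t i = 0; i < kAcidFields.size(); ++i) {
    // Case-insensitive, so "originaltransaction" also matches.
    if (!EqualsIgnoreCase(top_level_fields[i], kAcidFields[i])) return false;
  }
  return true;
}
```

With this approach a file written by a query-based compaction is still recognized when the metadata tag is missing, as long as its top-level fields follow the ACID layout, whatever their letter case.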
Testing:
* added a test for full ACID compaction
* added test_full_acid_schema_without_file_metadata_tag to test a full
  ACID file without the 'hive.acid.version' metadata
Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M testdata/data/README
A testdata/data/full_acid_schema_but_no_acid_version.orc
M testdata/workloads/functional-query/queries/QueryTest/acid-compaction.test
M tests/query_test/test_acid.py
7 files changed, 88 insertions(+), 27 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/16383/4
--
To view, visit http://gerrit.cloudera.org:8080/16383
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I52642c1755599efd28fa2c90f13396cfe0f5fa14
Gerrit-Change-Number: 16383
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>