Hello Zoltan Borok-Nagy, Attila Jeges, Tim Armstrong, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11057
to look at the new patch set (#17).
Change subject: IMPALA-5050: Add support to read TIMESTAMP_MILLIS and
TIMESTAMP_MICROS from Parquet
......................................................................
IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from
Parquet
Changes:
- parquet.thrift is updated to a newer version which contains the
timestamp logical type.
- INT64 columns with converted types TIMESTAMP_MILLIS and
TIMESTAMP_MICROS can be read as TIMESTAMP.
- If the logical type is timestamp, the type itself records whether
UTC->local conversion is necessary. This is supported only for the
new timestamp types, so INT96 timestamps must still use the
convert_legacy_hive_parquet_utc_timestamps flag.
- Min/max stat filtering is enabled again for columns that need
UTC->local conversion. This was disabled in IMPALA-7559 because
it could incorrectly drop column chunks.
- CREATE TABLE LIKE PARQUET converts these columns to
TIMESTAMP; before this change, an error was returned instead.
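For reviewers less familiar with the two converted types, here is a minimal
Python sketch of how an INT64 value maps to a point in time. It is not the
code in this patch (the backend is C++); the helper name
decode_int64_timestamp and the lookup table are hypothetical and exist only
for illustration. It assumes the Parquet convention that TIMESTAMP_MILLIS and
TIMESTAMP_MICROS count milliseconds and microseconds since the Unix epoch,
and it leaves out the UTC->local step.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch, the zero point for both converted types.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical lookup table: stored units per second for each converted type.
UNITS_PER_SECOND = {
    "TIMESTAMP_MILLIS": 1_000,
    "TIMESTAMP_MICROS": 1_000_000,
}


def decode_int64_timestamp(raw, converted_type):
    """Illustrative helper (not Impala's API): INT64 value -> UTC datetime."""
    units = UNITS_PER_SECOND[converted_type]
    # Floor division keeps values before 1970 correct: the remainder is
    # always non-negative, so it can be added as a sub-second part.
    seconds, remainder = divmod(raw, units)
    micros = remainder * (1_000_000 // units)
    return EPOCH + timedelta(seconds=seconds, microseconds=micros)


if __name__ == "__main__":
    print(decode_int64_timestamp(1_000, "TIMESTAMP_MILLIS"))
    print(decode_int64_timestamp(-1, "TIMESTAMP_MICROS"))
```

Floor division is used here so that negative (pre-1970) values land on the
correct side of the second boundary; whether a reader then shifts the result
to local time depends on the column's type information, as described above.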
Testing:
- Added unit tests for timezone conversion (this required a new public
function in timezone_db.h and adding CET to tzdb_tiny).
- Added parquet files (created with parquet-mr) with int64 timestamp
columns.
Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.cc
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/util/dict-encoding.h
M common/thrift/parquet.thrift
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M testdata/data/README
A testdata/data/int64_timestamps_at_dst_changes.parquet
A testdata/data/int64_timestamps_dict.parquet
A testdata/data/int64_timestamps_plain.parquet
A testdata/tzdb_tiny/CET
A testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test
M tests/query_test/test_scanners.py
24 files changed, 891 insertions(+), 169 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/11057/17
--
To view, visit http://gerrit.cloudera.org:8080/11057
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3
Gerrit-Change-Number: 11057
Gerrit-PatchSet: 17
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>