Hello Zoltan Borok-Nagy, Attila Jeges, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11057

to look at the new patch set (#16).

Change subject: IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet
......................................................................

IMPALA-5050: Add support to read TIMESTAMP_MILLIS and TIMESTAMP_MICROS from Parquet

Changes:
- parquet.thrift is updated to a newer version which contains the
  timestamp logical type.
- INT64 columns with converted types TIMESTAMP_MILLIS and
  TIMESTAMP_MICROS can be read as TIMESTAMP.
- If the logical type is timestamp, then the type will contain the
  information whether the UTC->local conversion is necessary.
  This feature is only supported for the new timestamp types, so
  INT96 timestamps must still use flag
  convert_legacy_hive_parquet_utc_timestamps.
- Min/max stat filtering is enabled again for columns that need
  UTC->local conversion. This was disabled in IMPALA-7559 because
  it could incorrectly drop column chunks.

Note that CREATE TABLE LIKE PARQUET still converts these columns to
BIGINTS. Converting to TIMESTAMP could be a breaking change in some
scenarios. IMPALA-7723 is created to track this possible change.

Testing:
- Added unit tests for timezone conversion (this needed a new public
  function in timezone_db.h and adding CET to tzdb_tiny).
- Added parquet files (created with parquet-mr) with int64 timestamp
  columns.

Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M be/src/exec/parquet-common.cc
M be/src/exec/parquet-common.h
M be/src/exec/parquet-metadata-utils.cc
M be/src/exprs/timezone_db.h
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M be/src/util/dict-encoding.h
M common/thrift/parquet.thrift
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M testdata/data/README
A testdata/data/int64_timestamps_at_dst_changes.parquet
A testdata/data/int64_timestamps_dict.parq
A testdata/data/int64_timestamps_plain.parq
A testdata/tzdb_tiny/CET
A testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test
M tests/query_test/test_scanners.py
24 files changed, 908 insertions(+), 169 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/57/11057/16
--
To view, visit http://gerrit.cloudera.org:8080/11057
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c7c01fffa31b3d2ca3480adf6ff851137dadac3
Gerrit-Change-Number: 11057
Gerrit-PatchSet: 16
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
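
For reviewers who want a quick mental model of the read path described in the commit message, here is a minimal Python sketch of decoding an INT64 TIMESTAMP_MILLIS or TIMESTAMP_MICROS value, with an optional UTC->local shift. It is not the code in this patch (the patch is C++ in be/src/exec and be/src/runtime). The function name, the unit table, and the parameters are hypothetical; is_adjusted_to_utc is assumed to play the role of the per-type information on whether UTC->local conversion is needed.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

# Naive epoch; stored values are offsets from this point.
EPOCH = datetime(1970, 1, 1)

# Hypothetical unit table for this sketch: microseconds per stored unit.
MICROS_PER_UNIT = {"MILLIS": 1_000, "MICROS": 1}


def int64_to_timestamp(value: int, unit: str,
                       is_adjusted_to_utc: bool = False,
                       local_tz: Optional[tzinfo] = None) -> datetime:
    """Decode an INT64 epoch offset into a naive (zone-less) timestamp.

    If is_adjusted_to_utc is set and a local zone is given, the value is
    treated as a UTC instant and shifted to local wall-clock time.
    Otherwise it is returned unchanged as a wall-clock value.
    """
    micros = value * MICROS_PER_UNIT[unit]
    ts = EPOCH + timedelta(microseconds=micros)
    if is_adjusted_to_utc and local_tz is not None:
        # Attach UTC, convert to the local zone, then drop the zone again.
        ts = ts.replace(tzinfo=timezone.utc).astimezone(local_tz)
        ts = ts.replace(tzinfo=None)
    return ts


# A fixed +01:00 offset stands in for a real zone here; a real zone
# with DST rules would give different offsets at different dates.
plus_one = timezone(timedelta(hours=1))
as_stored = int64_to_timestamp(1_000, "MILLIS")
as_local = int64_to_timestamp(1_000, "MILLIS", True, plus_one)
```

The sketch uses a fixed offset only to stay self-contained. The DST-change test file added in this patch set exists because real zones such as CET do not behave like a constant offset.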