Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11984 )
Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet
......................................................................

IMPALA-7853: Add support to read int64 NANO timestamps from Parquet

PARQUET-1387 added int64 timestamps with nanosecond precision, which
store timestamps as nanoseconds since the Unix epoch. As 64 bits are
not enough to represent the whole 1400..9999 range of Impala
timestamps, this new type works with a limited range:
1677-09-21 00:12:43.145224192 .. 2262-04-11 23:47:16.854775807 UTC

The benefit of the reduced range is that no validation is necessary
during scanning, as every possible 64 bit value represents a valid
timestamp in Impala. This may mean that NANO has the potential to be
the fastest way to store timestamps in Impala + Parquet.

Another way NANO differs from MICRO and MILLI is that NANO can only be
described with the new logical types in Parquet; it has no converted
type equivalent. This made implementing CREATE TABLE LIKE PARQUET less
trivial than it was for MICRO/MILLI: the type conversion logic in
ParquetHelper.java had to be rewritten to use LogicalTypeAnnotation
instead of ConvertedType.

The changes on the Java side also made bumping CDH_BUILD_NUMBER
necessary.
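
The range quoted above follows directly from the limits of a signed 64-bit integer counting nanoseconds from the Unix epoch. The following is a minimal Python sketch for checking the arithmetic, not code from this change; the helper name format_nanos_since_epoch is hypothetical and introduced only for illustration.

```python
from datetime import datetime, timedelta

# Limits of a signed 64-bit integer, i.e. the physical int64 storage type.
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NANOS_PER_SECOND = 10**9
UNIX_EPOCH = datetime(1970, 1, 1)  # interpreted as UTC


def format_nanos_since_epoch(nanos: int) -> str:
    """Render an int64 nanosecond offset from the Unix epoch as a UTC string.

    Hypothetical helper for illustration only. Python's datetime has only
    microsecond precision, so the sub-second part is kept as an integer.
    """
    if not INT64_MIN <= nanos <= INT64_MAX:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # divmod floors toward negative infinity, so nano_of_second is always
    # in [0, 10**9), also for timestamps before 1970.
    seconds, nano_of_second = divmod(nanos, NANOS_PER_SECOND)
    whole_second = UNIX_EPOCH + timedelta(seconds=seconds)
    return f"{whole_second.isoformat(sep=' ')}.{nano_of_second:09d}"


print(format_nanos_since_epoch(INT64_MIN))  # 1677-09-21 00:12:43.145224192
print(format_nanos_since_epoch(INT64_MAX))  # 2262-04-11 23:47:16.854775807
```

Both ends fall well inside the 1400..9999 range, which is consistent with the statement that every 64 bit value maps to a valid Impala timestamp.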
Testing:
- added a new testfile with int64 nano timestamps
- ran core tests

Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Reviewed-on: http://gerrit.cloudera.org:8080/11984
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/exec/parquet/parquet-common.cc
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/runtime/timestamp-test.cc
M be/src/runtime/timestamp-value.h
M be/src/runtime/timestamp-value.inline.h
M bin/impala-config.sh
M common/thrift/parquet.thrift
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M testdata/data/README
A testdata/data/int64_timestamps_nano.parquet
M testdata/workloads/functional-query/queries/QueryTest/parquet-int64-timestamps.test
M tests/query_test/test_scanners.py
13 files changed, 174 insertions(+), 69 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Gerrit-Change-Number: 11984
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
