Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11521 )
Change subject: IMPALA-7595: Check the validity of the time part of Parquet timestamps ...................................................................... IMPALA-7595: Check the validity of the time part of Parquet timestamps Before this fix Impala did not check whether a timestamp's time part is out of the valid [0, 24 hour) range when reading Parquet files, so these timestamps were memcopied as they were to slots, leading to results like: 1970-01-01 -00:00:00.000000001 1970-01-01 24:00:00 Different parts of Impala treat these timestamp differently: - string conversion leads to invalid representation that cannot be converted back to timestamp - timezone conversions handle the overflowing time part and give a valid timestamp result (at least since CCTZ, I did not check older versions of Impala) - Parquet writing inserts these timestamp as they are, so the resulting Parquet file will also contain corrupt timestamps The fix adds a check that converts these corrupt timestamps to NULL, similarly to the handling of timestamp outside the [1400..10000) range. A new error code is added for this case. If both the date and the time part is corrupt, then error about corrupt time is returned. Testing: - added a new scanner test that reads a corrupted Parquet file with edge values Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058 Reviewed-on: http://gerrit.cloudera.org:8080/11521 Reviewed-by: Csaba Ringhofer <[email protected]> Tested-by: Impala Public Jenkins <[email protected]> --- M be/src/exec/parquet-column-readers.cc M be/src/runtime/timestamp-value.h M common/thrift/generate_error_codes.py M testdata/data/README A testdata/data/out_of_range_time_of_day.parquet M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test M tests/query_test/test_scanners.py 8 files changed, 57 insertions(+), 4 deletions(-) Approvals: Csaba Ringhofer: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/11521 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058 Gerrit-Change-Number: 11521 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
