Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11521 )
Change subject: IMPALA-7595: Check the validity of the time part of Parquet timestamps
......................................................................

IMPALA-7595: Check the validity of the time part of Parquet timestamps

Before this fix, Impala did not check whether a timestamp's time part
is outside the valid [0, 24 hour) range when reading Parquet files, so
these timestamps were memcopied as they were to slots, leading to
results like:
1970-01-01 -00:00:00.000000001
1970-01-01 24:00:00

Different parts of Impala treat these timestamps differently:
- string conversion leads to an invalid representation that cannot be
  converted back to a timestamp
- timezone conversions handle the overflowing time part and give a
  valid timestamp result (at least since CCTZ, I did not check older
  versions of Impala)
- Parquet writing inserts these timestamps as they are, so the
  resulting Parquet file will also contain corrupt timestamps

The fix adds a check that converts these corrupt timestamps to NULL,
similarly to the handling of timestamps outside the [1400..10000)
range. The same error is returned in both cases - it may make sense
to add a new error message for this kind of timestamp, but as this
error did not occur in production (as far as I know), I thought that
a separate error message is not necessary.

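For reviewers who want the check spelled out, a minimal sketch of the
range test described above follows. It is not the code in this patch:
RawTimestamp, kNanosPerDay, IsValidTimeOfDay and CopyOrNull are
hypothetical names, and the sketch assumes the time part is held as a
signed count of nanoseconds since midnight next to a Julian day number.

```cpp
#include <cassert>
#include <cstdint>

// Nanoseconds in one day; the valid time-of-day range is [0, kNanosPerDay).
constexpr int64_t kNanosPerDay = 24LL * 60 * 60 * 1000 * 1000 * 1000;

// Hypothetical stand-in for a timestamp decoded from a Parquet file.
// The field layout is an assumption made for this sketch.
struct RawTimestamp {
  int64_t nanos_of_day;  // time part: nanoseconds since midnight
  int32_t julian_day;    // date part: Julian day number
};

// Returns true only if the time part lies in [0, 24 hours).
constexpr bool IsValidTimeOfDay(int64_t nanos_of_day) {
  return nanos_of_day >= 0 && nanos_of_day < kNanosPerDay;
}

// Copies 'in' to '*out' and returns true when the time part is valid.
// Returns false and leaves '*out' untouched otherwise; a caller would
// then set the slot to NULL instead of copying the corrupt value.
bool CopyOrNull(const RawTimestamp& in, RawTimestamp* out) {
  assert(out != nullptr);
  if (!IsValidTimeOfDay(in.nanos_of_day)) return false;
  *out = in;
  return true;
}
```

Under these assumptions, the two corrupt results quoted above correspond
to nanos_of_day == -1 and nanos_of_day == kNanosPerDay, and both are
rejected by IsValidTimeOfDay.
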
Testing:
- added a new scanner test that reads a corrupted Parquet file
  with edge values

Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058
---
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/timestamp-value.h
M testdata/data/README
A testdata/data/out_of_range_time_of_day.parquet
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M tests/query_test/test_scanners.py
7 files changed, 49 insertions(+), 2 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/11521/1
--
To view, visit http://gerrit.cloudera.org:8080/11521
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058
Gerrit-Change-Number: 11521
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <[email protected]>
