Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11521
to look at the new patch set (#3).
Change subject: IMPALA-7595: Check the validity of the time part of Parquet
timestamps
......................................................................
IMPALA-7595: Check the validity of the time part of Parquet timestamps
Before this fix, Impala did not check whether a timestamp's time part
was within the valid [0, 24 hours) range when reading Parquet files,
so such timestamps were memcpy'd unchanged into slots, leading to
results like:
1970-01-01 -00:00:00.000000001
1970-01-01 24:00:00
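To illustrate how such values can reach a slot, here is a minimal
sketch and not the actual reader code. It assumes the commonly used
INT96 layout (8 bytes of nanoseconds-of-day followed by a 4-byte
Julian day number) and a little-endian host; RawTimestamp and
DecodeUnchecked are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a decoded 12-byte timestamp value.
struct RawTimestamp {
  int64_t nanos_of_day;  // expected to be in [0, 24 hours), but not enforced here
  int32_t julian_day;    // Julian day number, e.g. 2440588 for 1970-01-01
};

// Copies both fields out of a 12-byte buffer without any validation,
// so a time part of -1 ns or exactly 24 hours passes straight through.
// Assumes a little-endian host, matching the assumed on-disk layout.
RawTimestamp DecodeUnchecked(const uint8_t* buf) {
  assert(buf != nullptr);
  RawTimestamp ts;
  std::memcpy(&ts.nanos_of_day, buf, sizeof(ts.nanos_of_day));
  std::memcpy(&ts.julian_day, buf + sizeof(ts.nanos_of_day), sizeof(ts.julian_day));
  return ts;
}
```

With a buffer holding a time part of -1 and day 2440588, the decoded
value corresponds to the first corrupt result shown above.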
Different parts of Impala treat these timestamps differently:
- string conversion leads to an invalid representation that cannot be
converted back to a timestamp
- timezone conversions handle the overflowing time part and give
a valid timestamp result (at least since CCTZ; I did not check
older versions of Impala)
- Parquet writing inserts these timestamps as they are, so the
resulting Parquet file will also contain corrupt timestamps
The fix adds a check that converts these corrupt timestamps to NULL,
similarly to the handling of timestamps outside the [1400..10000)
range. The same error is returned in both cases. It may make sense
to add a new error message for this kind of timestamp, but as this
error has not occurred in production (as far as I know), I thought
that a separate error message was not necessary.
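A rough sketch of the kind of check described above, under stated
assumptions: the names (kNanosPerDay, IsValidTimeOfDay, ValidateOrNull,
RawTimestamp) are hypothetical and not taken from the patch, NULL is
modeled with std::optional (C++17), and the separate date-range check
is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Number of nanoseconds in 24 hours.
constexpr int64_t kNanosPerDay = 24LL * 60 * 60 * 1000 * 1000 * 1000;

// Hypothetical stand-in for a decoded timestamp.
struct RawTimestamp {
  int64_t nanos_of_day;
  int32_t julian_day;
};

// True if the time part lies in the half-open range [0, 24 hours).
inline bool IsValidTimeOfDay(int64_t nanos) {
  return nanos >= 0 && nanos < kNanosPerDay;
}

// Returns the timestamp unchanged if its time part is valid,
// otherwise an empty optional standing in for SQL NULL.
std::optional<RawTimestamp> ValidateOrNull(const RawTimestamp& ts) {
  if (!IsValidTimeOfDay(ts.nanos_of_day)) return std::nullopt;
  return ts;
}

// Usage example with the edge values from the description above.
void DemoEdgeValues() {
  const int32_t kEpochDay = 2440588;  // Julian day number of 1970-01-01
  assert(!ValidateOrNull({-1, kEpochDay}).has_value());               // -00:00:00.000000001
  assert(!ValidateOrNull({kNanosPerDay, kEpochDay}).has_value());     // 24:00:00
  assert(ValidateOrNull({kNanosPerDay - 1, kEpochDay}).has_value());  // 23:59:59.999999999
}
```

The half-open upper bound matters: exactly 24:00:00 is rejected, while
the last nanosecond of the day is accepted.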
Testing:
- added a new scanner test that reads a corrupted Parquet file
with edge values
Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058
---
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/timestamp-value.h
M testdata/data/README
A testdata/data/out_of_range_time_of_day.parquet
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-abort-on-error.test
M testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-continue-on-error.test
M tests/query_test/test_scanners.py
7 files changed, 45 insertions(+), 2 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/11521/3
--
To view, visit http://gerrit.cloudera.org:8080/11521
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibc0ae651b6a0a028c61a15fd069ef9e904231058
Gerrit-Change-Number: 11521
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>