Taras Bobrovytsky has uploaded a new patch set (#4).

Change subject: IMPALA-4363: Add Parquet timestamp validation
......................................................................

IMPALA-4363: Add Parquet timestamp validation

Before this patch, we would simply read the INT96 Parquet timestamp
representation and assume that it's valid. However, not all bit
patterns represent a valid timestamp. One of the boost functions
raised an exception (that we didn't catch) when passed an invalid
boost date object, which resulted in a crash.

This patch fixes the problem by validating that the timestamp falls
into the 1400..9999 date range as we are scanning Parquet.

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/bad_parquet_data/out-of-range-timestamp.parq
M testdata/bin/create-load-data.sh
M testdata/workloads/functional-query/queries/QueryTest/parquet-abort-on-error.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
M tests/query_test/test_scanners.py
10 files changed, 120 insertions(+), 14 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/4
--
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
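
For readers unfamiliar with the INT96 layout, the range check described in the commit message could look roughly like the sketch below. This is not the code from the patch: every name here (`Int96Timestamp`, `DecodeInt96`, `IsValidInt96Timestamp`, the `k...` constants) is hypothetical. It assumes the usual INT96 encoding (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and a little-endian host. The time-of-day bound is an extra assumption; the commit message only mentions the date range.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical constants, not taken from the patch: Julian day numbers
// of the first and last dates in the 1400..9999 range (proleptic
// Gregorian calendar).
constexpr int32_t kMinJulianDay = 2232400;  // 1400-01-01
constexpr int32_t kMaxJulianDay = 5373484;  // 9999-12-31
constexpr int64_t kNanosPerDay = 86400LL * 1000 * 1000 * 1000;

// Decoded form of a 12-byte Parquet INT96 timestamp.
struct Int96Timestamp {
  int64_t nanos_of_day;  // bytes 0..7
  int32_t julian_day;    // bytes 8..11
};

// Copies the raw bytes into the struct fields. Assumes a little-endian
// host; memcpy avoids unaligned reads from the page buffer.
inline Int96Timestamp DecodeInt96(const uint8_t* buf) {
  Int96Timestamp ts;
  std::memcpy(&ts.nanos_of_day, buf, sizeof(ts.nanos_of_day));
  std::memcpy(&ts.julian_day, buf + 8, sizeof(ts.julian_day));
  return ts;
}

// Returns true only if the day falls in 1400..9999 and the time of day
// is within a single day. The check runs on the raw integers, before
// any date object is constructed from them.
inline bool IsValidInt96Timestamp(const Int96Timestamp& ts) {
  return ts.julian_day >= kMinJulianDay && ts.julian_day <= kMaxJulianDay &&
         ts.nanos_of_day >= 0 && ts.nanos_of_day < kNanosPerDay;
}
```

A scanner using a check like this would decode each value, call the validator, and report an error or skip the row on failure rather than passing the value on to date-handling code that may throw.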
