Taras Bobrovytsky has posted comments on this change. Change subject: IMPALA-4363: Add Parquet timestamp validation ......................................................................
Patch Set 4: (15 comments) http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: Line 77: > I'll remove the empty lines I accidentally added here and in other files. Done Line 233: // TODO: consider adding validation for other types, such as DECIMAL. > Let's file a JIRA instead. You don't need to identify whether there is a bu Done Line 388: ReadSlot<IS_DICT_ENCODED>(tuple, pool); > single line if it fits Done Line 475: if (NeedsValidation() && !ValidateSlot(reinterpret_cast<T*>(slot))) { > The API seems a little clunky because we call a generic ValidateSlot() but Added ErrorMsg* to ValidateSlot. Line 503: DCHECK(!needs_conversion_); > DCHECK(false)? This DCHECK should be removed completely. Line 513: /// Sets the slot to NULL if the slot value is invalid, e.g., due to being > Update comment, doesn't set anything Done Line 515: bool ValidateSlot(T* src) { > const function Done Line 520: /// Pull out slow-path Status construction code from ReadRepetitionLevel()/ > remove the last part "from ReadRepetitionLevel..." I don't think that's acc Done Line 1084: void* slot = tuple->GetSlot(tuple_offset_); > just inline into L1087 Done http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: Line 511: /// Recursively reads from children_ to assemble a single CollectionValue into > Update comment to reflect API change Done http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: Line 22: > remove blank lines Done Line 150: inline bool IsValidDate() const { > nice :) http://gerrit.cloudera.org:8080/#/c/4968/4/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: Line 312: "The valid date range is 1400..9999."), > mention the full range with year-month-day to make it clearer Done http://gerrit.cloudera.org:8080/#/c/4968/4/testdata/bin/create-load-data.sh File testdata/bin/create-load-data.sh: Line 382: hadoop fs -put -f ${IMPALA_HOME}/testdata/bad_parquet_data/out-of-range-timestamp.parq \ > Let's avoid doing this and instead create a test table on-the-fly in the te Done http://gerrit.cloudera.org:8080/#/c/4968/4/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: Line 248: def test_zero_rows(self, vector, unique_database): > it would be nice to have a new test test_timestamp_out_of_range that looks Done -- To view, visit http://gerrit.cloudera.org:8080/4968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
