Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4363: Add Parquet timestamp validation
......................................................................


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 77: 
> I'll remove the empty lines I accidentally added here and in other files.
Done


Line 233:     // TODO: consider adding validation for other types, such as 
DECIMAL.
> Let's file a JIRA instead. You don't need to identify whether there is a bu
Done


Line 388:               ReadSlot<IS_DICT_ENCODED>(tuple, pool);
> single line if it fits
Done


Line 475:     if (NeedsValidation() && 
!ValidateSlot(reinterpret_cast<T*>(slot))) {
> The API seems a little clunky because we call a generic ValidateSlot() but 
Added ErrorMsg* to ValidateSlot.


Line 503:     DCHECK(!needs_conversion_);
> DCHECK(false)?
This DCHECK should be removed completely.


Line 513:   /// Sets the slot to NULL if the slot value is invalid, e.g., due 
to being
> Update comment, doesn't set anything
Done


Line 515:   bool ValidateSlot(T* src) {
> const function
Done


Line 520:   /// Pull out slow-path Status construction code from 
ReadRepetitionLevel()/
> remove the last part "from ReadRepetitionLevel..." I don't think that's acc
Done


Line 1084:   void* slot = tuple->GetSlot(tuple_offset_);
> just inline into L1087
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

Line 511:   /// Recursively reads from children_ to assemble a single 
CollectionValue into
> Update comment to reflect API change
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 22: 
> remove blank lines
Done


Line 150:   inline bool IsValidDate() const {
> nice
:)


http://gerrit.cloudera.org:8080/#/c/4968/4/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 312:    "The valid date range is 1400..9999."),
> mention the full range with year-month-day to make it clearer
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

Line 382:   hadoop fs -put -f 
${IMPALA_HOME}/testdata/bad_parquet_data/out-of-range-timestamp.parq \
> Let's avoid doing this and instead create a test table on-the-fly in the te
Done


http://gerrit.cloudera.org:8080/#/c/4968/4/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 248:   def test_zero_rows(self, vector, unique_database):
> it would be nice to have a new test test_timestamp_out_of_range that looks 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Taras Bobrovytsky <[email protected]>
Gerrit-HasComments: Yes

Reply via email to