[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-10 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 473: if (NeedsValidation() && !ValidateSlot(val_ptr).ok()) {
> I don't think this will behave quite right with abort_on_error=false. Retur
I rewrote this section, and added a comment.


Line 492:   /// Similar to NeedsCoversion, most column readers never require 
validation, so
> NeedsConversion()
Done


Line 507:   /// Ensures the data is valid. If it is discovered that data is not 
valid,
> Sets the slot to NULL if the slot value is invalid, e.g., due to being out 
Done


Line 510: DCHECK(false);
> If we ever hit this by accident in a release build we might not want to ret
I changed the interface, but False is returned instead of true now.


Line 594:   // Conversion should only happen when this flag is enabled.
> I'll delete this comment.
Done


Line 599:   return Status::OK();
> Setting a slot to NULL means flipping the null bit in the containing tuple.
Fixed. Null is being set properly now.


http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 146:   bool Validate();
> Can you change this to something like:
Done


http://gerrit.cloudera.org:8080/#/c/4968/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 307:   ("PARQUET_TIMESTAMP_INVALID", 100,
> PARQUET_TIMESTAMP_OUT_OF_RANGE?
Done


Line 308:"File '$0' column '$1' contains invalid timestamp."),
> "Invalid" should be qualified a little more. The value is actually out of r
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-10 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#3).

Change subject: Preview: IMPALA-4363: Add timestamp validation
..

Preview: IMPALA-4363: Add timestamp validation

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
M testdata/bad_parquet_data/README
A testdata/bad_parquet_data/out-of-range-timestamp.parq
M testdata/bin/create-load-data.sh
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-abort-on-error.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
M tests/query_test/test_scanners.py
11 files changed, 113 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 308:"File '$0' column '$1' contains invalid timestamp."),
> We generally avoid dumping data for security reasons. It's hard to redact a
Thanks for the explanation, makes a lot of sense.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-08 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 308:"File '$0' column '$1' contains invalid timestamp."),
> I think if anyhow possible it would be nice to output the invalid value its
We generally avoid dumping data for security reasons. It's hard to redact and 
can lead to very uncomfortable conversations with users.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 510: DCHECK(false);
If we ever hit this by accident in a release build we might not want to return 
OK() but some error making it clear what went wrong.


http://gerrit.cloudera.org:8080/#/c/4968/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 308:"File '$0' column '$1' contains invalid timestamp."),
> "Invalid" should be qualified a little more. The value is actually out of r
I think if anyhow possible it would be nice to output the invalid value itself, 
too, to make it easier to locate.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 473: if (NeedsValidation() && !ValidateSlot(val_ptr).ok()) {
> single line if it fits
I don't think this will behave quite right with abort_on_error=false. Returning 
false here will terminate the scan immediately.


Line 599:   return Status::OK();
> It's set to NULL in the implementation of 
Setting a slot to NULL means flipping the null bit in the containing tuple. I 
don't think you're doing that from within TimestampValue::Validate(), and it 
wouldn't make sense to do it there.

Looking a little closer at the code, I think we may have to move this check 
into ReadValue() instead of ReadSlot(). 
We need to do the following if validation fails:
tuple->SetNull(null_indicator_offset_);


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 599:   return Status::OK();
> Where is the slot set to NULL?
It's set to NULL in the implementation of 
bool TimestampValue::Validate()

Do you think it would be better to set it to NULL here?


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 473: if (NeedsValidation() && !ValidateSlot(val_ptr).ok()) {
single line if it fits

should this be applied before or after the timestamp conversion? add comment


Line 492:   /// Similar to NeedsCoversion, most column readers never require 
validation, so
NeedsConversion()


Line 507:   /// Ensures the data is valid. If it is discovered that data is not 
valid,
Sets the slot to NULL if the slot value is invalid, e.g., due to being out of 
the valid value range.


Line 599:   return Status::OK();
Where is the slot set to NULL?


http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/runtime/timestamp-value.h
File be/src/runtime/timestamp-value.h:

Line 146:   bool Validate();
Can you change this to something like:

inline bool IsValidDate() const {
  // Implementation goes here, so it can be inlined.
}

Needs brief comment, ideally, containing text describing the min/max values.

Also, better not change *this, i.e.,a void having side effects.


http://gerrit.cloudera.org:8080/#/c/4968/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

Line 307:   ("PARQUET_TIMESTAMP_INVALID", 100,
PARQUET_TIMESTAMP_OUT_OF_RANGE?


Line 308:"File '$0' column '$1' contains invalid timestamp."),
"Invalid" should be qualified a little more. The value is actually out of 
range. Might be good to include the min/max range in the error message.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4968/2/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 594:   // Conversion should only happen when this flag is enabled.
I'll delete this comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4968/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 44: 
> ?
Done


Line 491:   inline bool NeedsValidation() const {
> Can you add a brief comment?
Done


Line 502:   /// Verifies that data is acceptable
> Can you elaborate what "acceptable" means?
Rewrote the comment


http://gerrit.cloudera.org:8080/#/c/4968/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 101: void TimestampValue::Validate() {
> Can you move the exception handling into DebugString()? This way it would b
I rewrote this so that it does not rely on catching an exception as Alex 
suggested.


PS1, Line 103:
> Nit: tab. git-clang-format should fix this.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: Preview: IMPALA-4363: Add timestamp validation
..

Preview: IMPALA-4363: Add timestamp validation

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
M common/thrift/generate_error_codes.py
4 files changed, 60 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 1:

Overall structure looks good ok to me, minus Lars' comments.

Ideally, we'd have a cheaper way to validate other than doing DebugString() 
with a try-catch. It should be possible for us to determine the acceptable 
integer value range and check that.

-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-06 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: Preview: IMPALA-4363: Add timestamp validation
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4968/1/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

Line 44: 
?


Line 491:   inline bool NeedsValidation() const {
Can you add a brief comment?


Line 502:   /// Verifies that data is acceptable
Can you elaborate what "acceptable" means?


http://gerrit.cloudera.org:8080/#/c/4968/1/be/src/runtime/timestamp-value.cc
File be/src/runtime/timestamp-value.cc:

Line 101: void TimestampValue::Validate() {
Can you move the exception handling into DebugString()? This way it would be 
more obvious to see there, that we handle the exception. And we would be less 
prone to adding new calls to DebugString() that are not protected by Validate.


PS1, Line 103:
Nit: tab. git-clang-format should fix this.


-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] Preview: IMPALA-4363: Add timestamp validation

2016-11-05 Thread Taras Bobrovytsky (Code Review)
Taras Bobrovytsky has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4968

Change subject: Preview: IMPALA-4363: Add timestamp validation
..

Preview: IMPALA-4363: Add timestamp validation

Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
---
M be/src/exec/parquet-column-readers.cc
M be/src/runtime/timestamp-value.cc
M be/src/runtime/timestamp-value.h
3 files changed, 36 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/4968/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I9988449aa0dc0f39fabb91ce6cce0dd8a06e8bcf
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky