[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Made changes to fuzz test for RCFile/Sequence file, ran fuzz test in a loop
  with 200 iteration without failure.

  b) Ran exhaustive test on the changes without failure.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Reviewed-on: http://gerrit.cloudera.org:8080/8936
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
9 files changed, 259 insertions(+), 72 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 21
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 20: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 20
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 18 May 2018 05:05:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2503/


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 20
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 18 May 2018 01:55:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-17 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 20: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 20
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 18 May 2018 01:55:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-09 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 19: Code-Review+2

Will wait for the other prerequisite to be merged.


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 19
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Thu, 10 May 2018 00:48:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-08 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#19).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Made changes to fuzz test for RCFile/Sequence file, ran fuzz test in a loop
  with 200 iteration without failure.

  b) Ran exhaustive test on the changes without failure.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
9 files changed, 259 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/19
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 19
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-08 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8936/18/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8936/18/be/src/runtime/timestamp-parse-util.cc@370
PS18, Line 370:   DCHECK_GT(len, 0);
> Please remove this, it's not valid. E.g. cast('' as timestamp).
Sure was an oversight should have removed it, thanks for pointing out.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 18
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 08 May 2018 22:31:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-08 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#18).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Made changes to fuzz test for RCFile/Sequence file, ran fuzz test in a loop
  with 200 iteration without failure.

  b) Ran exhaustive test on the changes without failure.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
10 files changed, 260 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/18
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 18
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-03 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#17).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Made changes to fuzz test for RCFile/Sequence file, ran fuzz test in a loop
  with 200 iteration without failure.

  b) Ran exhaustive test on the changes without failure.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
10 files changed, 258 insertions(+), 72 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/17
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 17
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-03 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@588
PS15, Line 588: }
> I think we should catch the invalid column metadata when we read it in GetC
Since the end of column is calculated here I am checking it here.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 16
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Thu, 03 May 2018 18:55:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-03 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#16).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Made changes to fuzz test for RCFile/Sequence file, ran fuzz test in a loop
  with 200 iteration without failure.

  b) Ran exhaustive test on the changes without failure.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
10 files changed, 259 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/16
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 16
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-05-03 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 15:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@91
PS15, Line 91: excceeded
> typo: exceeded
Done


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@545
PS15, Line 545: if (max_tuples < 0) {
> It seems like this is catching the problem too late. How does it get into t
Done


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc@370
PS15, Line 370:   if (UNLIKELY(str == NULL || len <= 0)) {
> We're already checking len here but the problem is that we're trimming whit
Introduced a DCHECK_GT(len, 0) and a check in the caller.


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc@477
PS15, Line 477: if (lazy_len > 0 && ParseFormatTokensByStr(_ctx)) {
> Looks like you hit upon a different bug here - I can trigger that DCHECK wi
I have removed the check



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 15
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Thu, 03 May 2018 18:44:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-04-27 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 15:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@91
PS15, Line 91: excceeded
typo: exceeded


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@91
PS15, Line 91:  8 million limit"
Let's use the constant instead of hardcoding it in the error message so that 
they can't get out of sync.


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@545
PS15, Line 545: if (max_tuples < 0) {
It seems like this is catching the problem too late. How does it get into this 
invalid state? The problem is in our own code so we shoudl fix it at the source 
rather than trying to catch it here.


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/exec/hdfs-rcfile-scanner.cc@588
PS15, Line 588: if (col_end > row_group_end || column.start_offset < 0 
|| column.buffer_pos < 0
I think we should catch the invalid column metadata when we read it in 
GetCurrentKeyBuffer(), rather than letting things get into an invalid state and 
trying to catch it later.


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc
File be/src/runtime/timestamp-parse-util.cc:

http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc@370
PS15, Line 370:   if (UNLIKELY(str == NULL || len <= 0)) {
We're already checking len here but the problem is that we're trimming 
whitespace below. I think the right fix is to check if (len <= 0) immediately 
after stripping leading and trailing whitespace. That will be less error-prone 
than trying to catch it later


http://gerrit.cloudera.org:8080/#/c/8936/15/be/src/runtime/timestamp-parse-util.cc@477
PS15, Line 477: if (lazy_len > 0 && ParseFormatTokensByStr(_ctx)) {
Looks like you hit upon a different bug here - I can trigger that DCHECK with 
another query -

  select cast(' ' as timestamp);

I don't think it should affect release builds, but can you file a separate bug 
to fix that? I think we should fix that separately and add a specific 
regression text.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 15
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 27 Apr 2018 23:27:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-04-25 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#15).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Made changes to fuzz test for RCFile/Sequence file, ran fuzz test in a loop
  with 200 iteration without failure.

  b) Ran exhaustive test on the changes without failure.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
10 files changed, 264 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/15
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 15
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-04-25 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#14).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Made changes to fuzz test for RCFile/Sequence file, ran fuzz test in a loop
  with 200 iteration without failure.

  b) Ran exhaustive test on the changes without failure.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/runtime/timestamp-parse-util.cc
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
10 files changed, 258 insertions(+), 73 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/14
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 14
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-04-25 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 13:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8936/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8936/13//COMMIT_MSG@14
PS13, Line 14:   with 100 iteration without failure.
> We should run it for more than 100 iterations before committing,  I think t
I ran the the test for around  200 iterations to see if it fails.


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@86
PS13, Line 86:   const int max_ncols = 800;
> Let's make this consistent with other constant - should be all caps and def
Done


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@90
PS13, Line 90: ss << stream_->filename() << " Invalid file-header-metadata, 
number of columns "
> I think this error message is confusing. The metadata isn't invalid, the pr
Done


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@363
PS13, Line 363:   ss << stream_->filename() << " Bad rowcolumn index: " << 
col_idx;
> Is there meant to be a space between row and column? It seems like we shoul
Done


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-sequence-scanner.cc@56
PS13, Line 56:   if (codegen == nullptr) {
> This DCHECK should be valid - why did this need to be changed? If we're hit
Done


http://gerrit.cloudera.org:8080/#/c/8936/13/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8936/13/tests/query_test/test_scanners_fuzz.py@198
PS13, Line 198: elif 'corrupt' in str(e).lower():
> In theory we should just be skipping over corrupt files with abort_on_error
Done



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 13
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 25 Apr 2018 16:59:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-04-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 13:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8936/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8936/13//COMMIT_MSG@14
PS13, Line 14:   with 100 iteration without failure.
We should run it for more than 100 iterations before committing,  I think 
there's still some chance of a rare failure slipping through otherwise.


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@86
PS13, Line 86:   const int max_ncols = 800;
Let's make this consistent with other constant - should be all caps and defined 
up the top of the file.


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@90
PS13, Line 90: ss << stream_->filename() << " Invalid file-header-metadata, 
number of columns "
I think this error message is confusing. The metadata isn't invalid, the 
problem is that we've imposed a limit on the maximum number of columns that 
Impala can handle. I think the error message needs to be rewritten to explain 
the limit. It should also include the value of the limit.


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@363
PS13, Line 363:   ss << stream_->filename() << " Bad rowcolumn index: " << 
col_idx;
Is there meant to be a space between row and column? It seems like we should 
also say something relating to corruption so that it's less cryptic. E.g. 
"Corrupt column at index: "


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-sequence-scanner.cc@56
PS13, Line 56:   if (codegen == nullptr) {
This DCHECK should be valid - why did this need to be changed? If we're hitting 
the DCHECK it indicates a bug with how we set up codegen and we need to fix the 
bug rather than remove the DCHECK.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/read-write-util.h@198
PS11, Line 198:   // Buffer access out of bounds.
> Shouldn't this be >=? Since buf[size] is one past the end of the buffer?
Looks like you missed this comment.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc
File be/src/util/decompress.cc:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc@452
PS11, Line 452:   if (input_len < 0) {
> Shouldn't we be checking if it's <= sizeof(uint32_t), since that's the amou
Looks like you missed this comment.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc@454
PS11, Line 454: ss << " Corruption snappy decomp input_len " << 
input_len;
> Can we make this consistent with the other error handling in this function
Looks like you missed this comment.


http://gerrit.cloudera.org:8080/#/c/8936/13/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8936/13/tests/query_test/test_scanners_fuzz.py@198
PS13, Line 198: elif 'corrupt' in str(e).lower():
In theory we should just be skipping over corrupt files with abort_on_error=1. 
This doesn't always happen for a lot of file formats. We should just add rc and 
sequence file to the list of formats below on l207 that don't implement this 
skipping behaviour, rather than trying to guess based on the error message.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 13
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 11 Apr 2018 23:27:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-04-01 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#13).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Made changes to fuzz test for RCFile/Sequence file, ran fuzz test in a loop
  with 100 iteration without failure.

  b) Ran exhaustive test on the changes without failure.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
9 files changed, 249 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/13
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 13
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-04-01 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 12:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG@13
PS11, Line 13:   a) Ran the fuzz test on a private build without failures/crash.
> It still fails sometimes - let's be clear in the commit message that this d
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@176
PS11, Line 176: 
> extra indentation
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@176
PS11, Line 176:
> extra indentation
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@218
PS11, Line 218:
> formatting is a little off (leftmost << should be aligned with each other).
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@331
PS11, Line 331: ss << stream_->filename() << " RCFile corrupt," << "out of 
bounds error num_rows_: "
> Error message will not have a space after the comma. (here and at least one
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@345
PS11, Line 345: key_buf_ptr
> I don't think it's necessarily safe to print uint8_t* pointers with C++ str
no more printing uint8_t* pointers, and removed  Impala specific message.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@433
PS11, Line 433: col_info.uncompressed_buffer_len: 1;
> Why is it safe for this to be 1 if the length of the buffer is <= 0? Won't
replaced this with col_info.buffer_len which sets an upper bound that  should 
be checked against.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@581
PS11, Line 581: row_group_sz
> row_group_end? _sz makes me think this is a numeric size, which it isn't.
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@583
PS11, Line 583: col_sz
> col_end?
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@587
PS11, Line 587: << col_sz << "  greater than row_group_sz "<< 
row_group_sz;
> Printing col_sz and row_group_sz is really dangerous here, since it's going
removed printing of char* and replaced the Impala specific error, the error is 
more generic now w.r.t RCFIle


http://gerrit.cloudera.org:8080/#/c/8936/12/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/12/be/src/exec/hdfs-rcfile-scanner.cc@362
PS12, Line 362:   if (remain_len <= 0) {
  : stringstream ss;
  : ss << stream_->filename() << " RCFile corrupt, out of 
bounds error"
  :   " GetCurrentKeyBuffer col_idx: " << col_idx << " 
remaining length: " << remain_len
  :   << " col_info.buffer_len: " << col_info.buffer_len;
  : return Status(ss.str());
  :   }
  :
  :   DCHECK_GT(remain_len, 0);
  :   int bytes_read = ReadWriteUtil::GetVInt(*key_buf_ptr, 
_info.buffer_len, remain_len);
  :   if (bytes_read == -1) {
  : stringstream ss;
  : ss << stream_->filename() << " RCFile corrupt, out of 
bounds error"
  :   " GetCurrentKeyBuffer col_idx: " << col_idx << " 
remaining length: " << remain_len
  :   << " col_info.buffer_len: " << col_info.buffer_len;
  : return Status(ss.str());
  :   }
  :   *key_buf_ptr += bytes_read;
  :   remain_len -= bytes_read;
  :   DCHECK_GT(remain_len, 0);
  :
  :   bytes_read = ReadWriteUtil::GetVInt(*key_buf_ptr, 
_info.uncompressed_buffer_len,
  :   remain_len);
  :   if (bytes_read == -1) {
  : stringstream ss;
  : ss << stream_->filename() << " RCFile corrupt, out of 
bounds error key buffer "
  :   "col_idx: " << col_idx << " remaining length: "<< 
remain_len;
  : return Status(ss.str());
  :   }
  :   *key_buf_ptr += bytes_read;
  :   remain_len -= bytes_read;
  :   DCHECK_GT(remain_len, 0);
  :
  :   int col_key_buf_len;
  :   bytes_read = ReadWriteUtil::GetVInt(*key_buf_ptr , 
_key_buf_len, remain_len);
  :   if (bytes_read == -1) {
  : stringstream ss;
  : ss << stream_->filename() << " RCFile corrupt, out of 
bounds error key "
  :   "buffer col_idx: " << 

[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-13 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 12:

> Uploaded patch set 12.

Tim please ignore these changes.


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 12
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 13 Mar 2018 23:35:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-13 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#12).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Enabled the fuzz test for Sequence and RCFiles and added new checks to return
when failure is encountered while reading an RC file or sequence file.

Testing:
  a) Ran the fuzz test on a private build without failures/crash.
  b) Ran end to end test, back end tests with new changes.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
9 files changed, 242 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/12
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 12
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 11:

I also tried running exhaustive tests on the patch. I'm seeing at least one 
failure: test_hdfs_rcfile_scan_node_errors


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 11
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 13 Mar 2018 20:49:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-13 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 11:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG@13
PS11, Line 13:   a) Ran the fuzz test on a private build without failures/crash.
It still fails sometimes - let's be clear in the commit message that this 
doesn't fix all the problems.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@176
PS11, Line 176:
extra indentation


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@218
PS11, Line 218:
formatting is a little off (leftmost << should be aligned with each other).

There are a bunch of minor formatting issues. I'm not going to go through a 
point them all out individually. It's probably easiest to run the patch through 
clang-format-diff (see 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536) and 
visually check that it didn't do anything weird.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@331
PS11, Line 331: ss << stream_->filename() << " RCFile corrupt," << "out of 
bounds error num_rows_: "
Error message will not have a space after the comma. (here and at least one 
more place below).


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@345
PS11, Line 345: key_buf_ptr
I don't think it's necessarily safe to print uint8_t* pointers with C++ stream 
formatting - I believe in some cases it's treated as a null-terminated string.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@433
PS11, Line 433: col_info.uncompressed_buffer_len: 1;
Why is it safe for this to be 1 if the length of the buffer is <= 0? Won't that 
still overrun the buffer by 1 byte?


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@581
PS11, Line 581: row_group_sz
row_group_end? _sz makes me think this is a numeric size, which it isn't.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@583
PS11, Line 583: col_sz
col_end?


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@587
PS11, Line 587: << col_sz << "  greater than row_group_sz "<< 
row_group_sz;
Printing col_sz and row_group_sz is really dangerous here, since it's going to 
be interpreted as a null-terminated string. Maybe instead print (col_end - 
row_group_buffer_) and row_group_length_.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/read-write-util.h@198
PS11, Line 198:   if (offset > size) return -1;
Shouldn't this be >=? Since buf[size] is one past the end of the buffer?


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc
File be/src/util/decompress.cc:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc@452
PS11, Line 452:   if (input_len < 0) {
Shouldn't we be checking if it's <= sizeof(uint32_t), since that's the amount 
we read unconditionally.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc@454
PS11, Line 454: ss << " Corruption snappy decomp input_len "<< 
input_len;
Can we make this consistent with the other error handling in this function by 
using a TErrorCode. Maybe something like SNAPPY_DECOMPRESS_TRUNCATED?



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 11
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 13 Mar 2018 20:33:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-13 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#11).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Ran the fuzz test on a private build without failures/crash.
  b) Ran end to end test, back end tests with new changes.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
8 files changed, 233 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/11
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 11
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-13 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8936/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8936/10//COMMIT_MSG@9
PS10, Line 9: Enabled the fuzz test for Sequence and RCFiles and added new 
checks to return
> Commit message is stale.
Fixed it.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 10
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 13 Mar 2018 15:38:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 10:

I need to take another pass over this just to convince myself I didn't miss any 
bugs first time over.


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 10
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 13 Mar 2018 01:02:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-12 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8936/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8936/10//COMMIT_MSG@9
PS10, Line 9: Enabled the fuzz test for Sequence and RCFiles and added new 
checks to return
Commit message is stale.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 10
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 13 Mar 2018 01:02:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-08 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#10).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Enabled the fuzz test for Sequence and RCFiles and added new checks to return
when failure is encountered while reading an RC file or sequence file.

Testing:
  a) Ran the fuzz test on a private build without failures/crash.
  b) Ran end to end test, back end tests with new changes.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
8 files changed, 233 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/10
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 10
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-08 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#9).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Enabled the fuzz test for Sequence and RCFiles and added new checks to return
when failure is encountered while reading an RC file or sequence file.

Testing:
  a) Ran the fuzz test on a private build without failures/crash.
  b) Ran end to end test, back end tests with new changes.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
8 files changed, 234 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/9
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 9
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-08 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#8).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Enabled the fuzz test for Sequence and RCFiles and added new checks to return
when failure is encountered while reading an RC file or sequence file.

Testing:
  a) Ran the fuzz test on a private build without failures/crash.
  b) Ran end to end test, back end tests with new changes.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
9 files changed, 234 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/8
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 8
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-07 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 7:

> I'm still seeing crashes when I loop this locally:
 > F0306 16:34:04.306205 29911 mem-pool.h:245] Check failed: size >= 0
 > (-9 vs. 0)
 > *** Check failure stack trace: ***
 > @  0x3c32cbd  google::LogMessage::Fail()
 > @  0x3c34562  google::LogMessage::SendToLog()
 > @  0x3c32697  google::LogMessage::Flush()
 > @  0x3c35c5e  google::LogMessageFatal::~LogMessageFatal()
 > @  0x184eeb4  impala::MemPool::TryAllocate()
 > @  0x1be5d08  impala::HdfsRCFileScanner::StartRowGroup()
 > @  0x1be8032  impala::HdfsRCFileScanner::ProcessRange()
 > @  0x296931a  impala::BaseSequenceScanner::GetNextInternal()
 > @  0x1bc71b2  impala::HdfsScanner::ProcessSplit()
 > @  0x1b9f48c  impala::HdfsScanNode::ProcessSplit()
 > @  0x1b9e8b7  impala::HdfsScanNode::ScannerThread()
 > @  0x1b9dd3f  
 > _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_17ThreadResourceMgr12ResourcePoolEENKUlvE_clEv
 > @  0x1b9fcdd  
 > _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_17ThreadResourceMgr12ResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
 > @  0x17f4ce0  boost::function0<>::operator()()
 > @  0x1af63ff  impala::Thread::SuperviseThread()
 > @  0x1aff3cf  boost::_bi::list5<>::operator()<>()
 > @  0x1aff2f3  boost::_bi::bind_t<>::operator()()
 > @  0x1aff2b6  boost::detail::thread_data<>::run()
 > @  0x2dbc1aa  thread_proxy
 > @ 0x7f5af8f926ba  start_thread
 > @ 0x7f5af8cc841d  clone
 > Picked up JAVA_TOOL_OPTIONS: 
 > -agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n
 > Wrote minidump to 
 > /home/tarmstrong/Impala/incubator-impala/logs/cluster/minidumps/impalad/31a8ae22-6a2a-4118-4ff4e1b6-be18ef7c.dmp
 >
 > How about we leave the test disabled and move forward with the
 > improvements, then we can finish up the remaining fixes needed
 > later?

Shall I revert my changes to tests/query_test/test_scanners_fuzz.py ?


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 7
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 07 Mar 2018 18:56:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 7:

I'm still seeing crashes when I loop this locally:
 F0306 16:34:04.306205 29911 mem-pool.h:245] Check failed: size >= 0 (-9 vs. 0)
*** Check failure stack trace: ***
@  0x3c32cbd  google::LogMessage::Fail()
@  0x3c34562  google::LogMessage::SendToLog()
@  0x3c32697  google::LogMessage::Flush()
@  0x3c35c5e  google::LogMessageFatal::~LogMessageFatal()
@  0x184eeb4  impala::MemPool::TryAllocate()
@  0x1be5d08  impala::HdfsRCFileScanner::StartRowGroup()
@  0x1be8032  impala::HdfsRCFileScanner::ProcessRange()
@  0x296931a  impala::BaseSequenceScanner::GetNextInternal()
@  0x1bc71b2  impala::HdfsScanner::ProcessSplit()
@  0x1b9f48c  impala::HdfsScanNode::ProcessSplit()
@  0x1b9e8b7  impala::HdfsScanNode::ScannerThread()
@  0x1b9dd3f  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_17ThreadResourceMgr12ResourcePoolEENKUlvE_clEv
@  0x1b9fcdd  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_17ThreadResourceMgr12ResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x17f4ce0  boost::function0<>::operator()()
@  0x1af63ff  impala::Thread::SuperviseThread()
@  0x1aff3cf  boost::_bi::list5<>::operator()<>()
@  0x1aff2f3  boost::_bi::bind_t<>::operator()()
@  0x1aff2b6  boost::detail::thread_data<>::run()
@  0x2dbc1aa  thread_proxy
@ 0x7f5af8f926ba  start_thread
@ 0x7f5af8cc841d  clone
Picked up JAVA_TOOL_OPTIONS: 
-agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n
Wrote minidump to 
/home/tarmstrong/Impala/incubator-impala/logs/cluster/minidumps/impalad/31a8ae22-6a2a-4118-4ff4e1b6-be18ef7c.dmp

How about we leave the test disabled and move forward with the improvements, 
then we can finish up the remaining fixes needed later?


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 7
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Wed, 07 Mar 2018 01:24:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-02-09 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8936/5/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/5/be/src/exec/hdfs-rcfile-scanner.cc@119
PS5, Line 119: ss << stream_->filename() << " Invalid 
RCFILE_VERSION_HEADER: '"
> I meant the filename of the file being decoded, i.e. stream_->filename().
Done


http://gerrit.cloudera.org:8080/#/c/8936/5/be/src/exec/hdfs-rcfile-scanner.cc@119
PS5, Line 119: ss << stream_->filename() << " Invalid 
RCFILE_VERSION_HEADER: '"
> It seems like the message text should be enough to distinguish, right?
Done


http://gerrit.cloudera.org:8080/#/c/8936/5/be/src/exec/scanner-context.inline.h
File be/src/exec/scanner-context.inline.h:

http://gerrit.cloudera.org:8080/#/c/8936/5/be/src/exec/scanner-context.inline.h@96
PS5, Line 96:   }
> Is this possible? The intent of the above code is that it shouldn't go nega
Done


http://gerrit.cloudera.org:8080/#/c/8936/5/be/src/exec/scanner-context.inline.h@97
PS5, Line 97:
> This will crash since the template for the error message in common/thrift/g
This is not needed as you've suggested above I have introduced the DCHECK in 
the beginning to check that length is never negative



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 6
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Sat, 10 Feb 2018 01:01:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-02-09 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#6).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Enabled the fuzz test for Sequence and RCFiles and added new checks to return
when failure is encountered while reading an RC file or sequence file.

Testing:
  a) Ran the fuzz test on a private build without failures/crash.
  b) Ran end to end test, back end tests with new changes.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
9 files changed, 242 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/6
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 6
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-24 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#5).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Enabled the fuzz test for Sequence and RCFiles and added new checks to return
when failure is encountered while reading an RC file or sequence file.

Testing:
  a) Ran the fuzz test on a private build without failures/crash.
  b) Ran end to end test, back end tests with new changes.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
9 files changed, 221 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/5
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 5
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-19 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#4).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Enabled the fuzz test for Sequence and RCFiles and added new checks to return
when failure is encountered while reading an RC file or sequence file.

Testing:
  Ran the fuzz test on a private build without failures/crash.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.inline.h
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
9 files changed, 218 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/4
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 3:

I hit a DCHECK after running this in a loop for a couple minutes.

  F0118 11:53:48.126408 11019 scanner-context.inline.h:97] Check failed: 
bytes_left > 0 (-197 vs. 0)
*** Check failure stack trace: ***
@  0x3c0541d  google::LogMessage::Fail()
@  0x3c06cc2  google::LogMessage::SendToLog()
@  0x3c04df7  google::LogMessage::Flush()
@  0x3c083be  google::LogMessageFatal::~LogMessageFatal()
@  0x1bc5c8b  impala::ScannerContext::Stream::SkipBytes()
@  0x1bc320a  impala::HdfsRCFileScanner::ReadColumnBuffers()
@  0x1bc1a71  impala::HdfsRCFileScanner::StartRowGroup()
@  0x1bc38ba  impala::HdfsRCFileScanner::ProcessRange()
@  0x293ecaa  impala::BaseSequenceScanner::GetNextInternal()
@  0x1ba2ede  impala::HdfsScanner::ProcessSplit()
@  0x1b7b1b4  impala::HdfsScanNode::ProcessSplit()
@  0x1b7a5df  impala::HdfsScanNode::ScannerThread()
@  0x1b79a62  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_17ThreadResourceMgr12ResourcePoolEENKUlvE_clEv
@  0x1b7ba05  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_17ThreadResourceMgr12ResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x17d6bda  boost::function0<>::operator()()
@  0x1ad0bf7  impala::Thread::SuperviseThread()
@  0x1ad990c  boost::_bi::list4<>::operator()<>()
@  0x1ad984f  boost::_bi::bind_t<>::operator()()
@  0x1ad9812  boost::detail::thread_data<>::run()
@  0x2d8edba  thread_proxy
@ 0x7f4c344426ba  start_thread
@ 0x7f4c341783dd  clone
Picked up JAVA_TOOL_OPTIONS: 
-agentlib:jdwp=transport=dt_socket,address=3,server=y,suspend=n
Wrote minidump to 
/home/tarmstrong/Impala/incubator-impala/logs/cluster/minidumps/impalad/aadeb6a5-be5b-4a86-c5991d87-fa93553b.dmp


-- 
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Thu, 18 Jan 2018 19:55:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-18 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@329
PS3, Line 329:   << " key_length_: " << key_length_;
Maybe also include the file name in the messages? So that if this happens in 
the wild we can figure out which file is corrupt.


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@334
PS3, Line 334:   int col_idx = 0;
It doesn't look like moving the col_idx declaration out of the loop is needed, 
right?


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@337
PS3, Line 337: key_length_
Passing key_length_ seems wrong, since it measures the number of bytes in 
key_buffer, not the bytes remaining after key_buf_ptr.


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@464
PS3, Line 464: if (column.uncompressed_buffer_len + column.start_offset > 
row_group_length_) {
Can you add a brief comment explaining what this is checking for? E.g. "Ensure 
there is enough space in 'row_group_buffer_' for the uncompressed data".


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.h
File be/src/exec/hdfs-sequence-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.h@257
PS3, Line 257: next_record_in_compressed_block_len
Needs an underscore in the end of the variable name.


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.cc@235
PS3, Line 235:   ss << "SeqFile corrupted, record_locations_.size() " << 
record_locations_.size()
Would be good to include file names so it's easier to root cause in the field.


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util-test.cc
File be/src/exec/read-write-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util-test.cc@115
PS3, Line 115: }
How about we add a few unit tests to make sure that passing in a too-short 
buffer results in an error?


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h@58
PS3, Line 58:   /// If the size byte is corrupted then return -1;
Can you document the size argument?


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h@72
PS3, Line 72:   static int GetVLong(uint8_t* buf, int64_t offset, int64_t* 
vlong, int32_t size);
Can you document the size argument?


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h@198
PS3, Line 198:   if (len > MAX_VINT_LEN) return -1;
I think we can check len vs size here and save doing it in the loop below. 
Would be clearer to have the two length checks in the same place.


http://gerrit.cloudera.org:8080/#/c/8936/3/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8936/3/tests/query_test/test_scanners_fuzz.py@203
PS3, Line 203: elif 'corrupt' in str(e).lower():
The previous two aren't needed, right, since 'corrupt' is a substring of those?



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Thu, 18 Jan 2018 19:52:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-16 Thread Pranay Singh (Code Review)
Hello anujphadke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8936

to look at the new patch set (#3).

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Enabled the fuzz test for Sequence and RCFiles and added new checks to return
when failure is encountered while reading an RC file or sequence file.

Testing:
  Ran the fuzz test on a private build without failures/crash.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-rcfile-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/read-write-util-test.cc
M be/src/exec/read-write-util.h
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
8 files changed, 160 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/3
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-16 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@177
PS2, Line 177: ss << "Codec bad, corrupted ";
> Can you include a bit more detail, i.e. mention that it's RCFile and includ
Done


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@337
PS2, Line 337: ss << "Invalid bytes read col_idx: " << col_idx;
> This could also do with a bit more detail.
Added more detail to describe the error


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@344
PS2, Line 344: void HdfsRCFileScanner::GetCurrentKeyBuffer(int col_idx, bool 
skip_col_data,
> How does this avoid buffer overflows if we don't pass in the length of the
Added length of the buffer and DCHECK to prevent reading beyond the buffer size.


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@348
PS2, Line 348: GetVInt
> These GetVInt() and GetVLong() interfaces seems fundamentally unsafe - they
Changed all callers of GetVInt/GetVLong to pass length of the buffer which can 
be used to check out of bound access.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 16 Jan 2018 21:10:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@177
PS2, Line 177: ss << "Codec bad, corrupted ";
Can you include a bit more detail, i.e. mention that it's RCFile and include 
the name of the codec?


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@337
PS2, Line 337: ss << "Invalid bytes read col_idx: " << col_idx;
This could also do with a bit more detail.


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@344
PS2, Line 344: void HdfsRCFileScanner::GetCurrentKeyBuffer(int col_idx, bool 
skip_col_data,
How does this avoid buffer overflows if we don't pass in the length of the 
'key_buf_ptr' buffer? I think this should be defensively checking if it reads 
past the end of the buffer.


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@348
PS2, Line 348: GetVInt
These GetVInt() and GetVLong() interfaces seems fundamentally unsafe - they 
don't take the length of the buffer as input! I think we might need to change 
them so that the length of the buffer is passed in and they check the bounds.



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 09 Jan 2018 01:47:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 2:

It's ok if we don't have a reproducible test case for bugs caught by the 
fuzzer, unless the fuzzing revealed a really big hole in test coverage. It's 
good if we have a test case but for cases where we weren't handling corrupt 
data correctly there are so many ways files could be corrupted it wasn't 
feasible to have a standalone test for each case.


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 17:41:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-08 Thread Pranay Singh (Code Review)
Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 2:

> Can we test all the cases fixed this with 100% reproducible cases?
 > For Ex:
 > I commented out the change in  decompress.cc and ran the fuzz test
 > and it passed.

 > Can we test all the cases fixed this with 100% reproducible cases?
 > For Ex:
 > I commented out the change in  decompress.cc and ran the fuzz test
 > and it passed.

If you run a couple of times without the fix the assertion will be hit. I'll 
see if I can change the test so that we hit the assert every time


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 16:59:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-05 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..


Patch Set 2:

Can we test all the cases fixed this with 100% reproducible cases?
For Ex:
I commented out the change in  decompress.cc and ran the fuzz test and it 
passed.


--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 05 Jan 2018 20:45:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

2018-01-03 Thread Pranay Singh (Code Review)
Pranay Singh has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8936


Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
..

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Enabled the fuzz test for Sequence and RCFiles and added new checks to return
when failure is encountered while reading an RC file or sequence file.

Testing:
  Ran the fuzz test on a private build without failures/crash.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
---
M be/src/exec/hdfs-rcfile-scanner.cc
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/util/decompress.cc
M tests/query_test/test_scanners_fuzz.py
4 files changed, 25 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/36/8936/1
--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 1
Gerrit-Owner: Pranay Singh