Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
......................................................................


Patch Set 11:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG@13
PS11, Line 13:   a) Ran the fuzz test on a private build without failures/crash.
It still fails sometimes - let's be clear in the commit message that this 
doesn't fix all the problems.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@176
PS11, Line 176:
extra indentation


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@218
PS11, Line 218:
formatting is a little off (leftmost << should be aligned with each other).

There are a bunch of minor formatting issues. I'm not going to go through a 
point them all out individually. It's probably easiest to run the patch through 
clang-format-diff (see 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536) and 
visually check that it didn't do anything weird.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@331
PS11, Line 331:     ss << stream_->filename() << " RCFile corrupt," << "out of 
bounds error num_rows_: "
Error message will not have a space after the comma. (here and at least one 
more place below).


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@345
PS11, Line 345: key_buf_ptr
I don't think it's necessarily safe to print uint8_t* pointers with C++ stream 
formatting - I believe in some cases it's treated as a null-terminated string.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@433
PS11, Line 433:         col_info.uncompressed_buffer_len: 1;
Why is it safe for this to be 1 if the length of the buffer is <= 0? Won't that 
still overrun the buffer by 1 byte?


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@581
PS11, Line 581: row_group_sz
row_group_end? _sz makes me think this is a numeric size, which it isn't.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@583
PS11, Line 583: col_sz
col_end?


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@587
PS11, Line 587:             << col_sz << "  greater than row_group_sz "<< 
row_group_sz;
Printing col_sz and row_group_sz is really dangerous here, since it's going to 
be interpreted as a null-terminated string. Maybe instead print (col_end - 
row_group_buffer_) and row_group_length_.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/read-write-util.h@198
PS11, Line 198:   if (offset > size) return -1;
Shouldn't this be >=? Since buf[size] is one past the end of the buffer?


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc
File be/src/util/decompress.cc:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc@452
PS11, Line 452:       if (input_len < 0) {
Shouldn't we be checking if it's <= sizeof(uint32_t), since that's the amount 
we read unconditionally.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc@454
PS11, Line 454:         ss << " Corruption snappy decomp input_len "<< 
input_len;
Can we make this consistent with the other error handling in this function by 
using a TErrorCode. Maybe something like SNAPPY_DECOMPRESS_TRUNCATED?



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 11
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: anujphadke <apha...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Mar 2018 20:33:55 +0000
Gerrit-HasComments: Yes

Reply via email to