Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8936 )
Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners ...................................................................... Patch Set 11: (12 comments) http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG@13 PS11, Line 13: a) Ran the fuzz test on a private build without failures/crash. It still fails sometimes - let's be clear in the commit message that this doesn't fix all the problems. http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@176 PS11, Line 176: extra indentation http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@218 PS11, Line 218: formatting is a little off (leftmost << should be aligned with each other). There are a bunch of minor formatting issues. I'm not going to go through a point them all out individually. It's probably easiest to run the patch through clang-format-diff (see https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536) and visually check that it didn't do anything weird. http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@331 PS11, Line 331: ss << stream_->filename() << " RCFile corrupt," << "out of bounds error num_rows_: " Error message will not have a space after the comma. (here and at least one more place below). http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@345 PS11, Line 345: key_buf_ptr I don't think it's necessarily safe to print uint8_t* pointers with C++ stream formatting - I believe in some cases it's treated as a null-terminated string. http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@433 PS11, Line 433: col_info.uncompressed_buffer_len: 1; Why is it safe for this to be 1 if the length of the buffer is <= 0? Won't that still overrun the buffer by 1 byte? http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@581 PS11, Line 581: row_group_sz row_group_end? _sz makes me think this is a numeric size, which it isn't. http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@583 PS11, Line 583: col_sz col_end? http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@587 PS11, Line 587: << col_sz << " greater than row_group_sz "<< row_group_sz; Printing col_sz and row_group_sz is really dangerous here, since it's going to be interpreted as a null-terminated string. Maybe instead print (col_end - row_group_buffer_) and row_group_length_. http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/read-write-util.h File be/src/exec/read-write-util.h: http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/read-write-util.h@198 PS11, Line 198: if (offset > size) return -1; Shouldn't this be >=? Since buf[size] is one past the end of the buffer? http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc File be/src/util/decompress.cc: http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc@452 PS11, Line 452: if (input_len < 0) { Shouldn't we be checking if it's <= sizeof(uint32_t), since that's the amount we read unconditionally. http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/util/decompress.cc@454 PS11, Line 454: ss << " Corruption snappy decomp input_len "<< input_len; Can we make this consistent with the other error handling in this function by using a TErrorCode. Maybe something like SNAPPY_DECOMPRESS_TRUNCATED? -- To view, visit http://gerrit.cloudera.org:8080/8936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 8936 Gerrit-PatchSet: 11 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: anujphadke <[email protected]> Gerrit-Comment-Date: Tue, 13 Mar 2018 20:33:55 +0000 Gerrit-HasComments: Yes
