Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8936 )
Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@177
PS2, Line 177:     ss << "Codec bad, corrupted ";
Can you include a bit more detail, i.e. mention that it's RCFile and include the name of the codec?


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@337
PS2, Line 337:     ss << "Invalid bytes read col_idx: " << col_idx;
This could also do with a bit more detail.


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@344
PS2, Line 344: void HdfsRCFileScanner::GetCurrentKeyBuffer(int col_idx, bool skip_col_data,
How does this avoid buffer overflows if we don't pass in the length of the 'key_buf_ptr' buffer? I think this should defensively check whether it reads past the end of the buffer.


http://gerrit.cloudera.org:8080/#/c/8936/2/be/src/exec/hdfs-rcfile-scanner.cc@348
PS2, Line 348: GetVInt
These GetVInt() and GetVLong() interfaces seem fundamentally unsafe: they don't take the length of the buffer as input! I think we might need to change them so that the length of the buffer is passed in and they check the bounds.


-- 
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 2
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: anujphadke <[email protected]>
Gerrit-Comment-Date: Tue, 09 Jan 2018 01:47:27 +0000
Gerrit-HasComments: Yes
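
Illustration for the GetVInt()/GetVLong() comment on line 348: a minimal sketch of what a bounds-checked decoder could look like. It assumes the Hadoop-style variable-length integer encoding (one header byte, followed by up to eight big-endian data bytes). The name GetVLongChecked, its signature, and the -1 error return are hypothetical and are not Impala's existing interface; the point is only that the caller passes the remaining buffer length and the decoder refuses to read past it.

```cpp
#include <cstdint>

// Hypothetical bounds-checked decoder for a Hadoop-style variable-length long.
// 'buf_len' is the number of readable bytes starting at 'buf'.
// Returns the number of bytes consumed, or -1 if the encoded value would
// extend past the end of the buffer. On failure '*out' is left untouched.
inline int GetVLongChecked(const uint8_t* buf, int64_t buf_len, int64_t* out) {
  if (buf_len < 1) return -1;
  const int8_t first = static_cast<int8_t>(buf[0]);

  // Values in [-112, 127] are stored directly in the single header byte.
  if (first >= -112) {
    *out = first;
    return 1;
  }

  // Otherwise the header byte encodes the sign and the total encoded length
  // (header byte plus 1 to 8 data bytes).
  const bool negative = first < -120;
  const int len = negative ? (-119 - first) : (-111 - first);

  // The defensive check: never read beyond the caller-supplied length.
  if (len > buf_len) return -1;

  // Data bytes are big-endian.
  uint64_t value = 0;
  for (int i = 1; i < len; ++i) value = (value << 8) | buf[i];

  // Negative values are stored as their one's complement.
  *out = negative ? static_cast<int64_t>(~value) : static_cast<int64_t>(value);
  return len;
}
```

With an interface along these lines, a caller such as GetCurrentKeyBuffer() could track the bytes remaining in 'key_buf_ptr' and turn a -1 return into a "corrupt RCFile" status instead of reading out of bounds.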
