Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/8936 )
Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners ...................................................................... Patch Set 4: (11 comments) http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@329 PS3, Line 329: int bytes_read = ReadWriteUtil::GetVInt(key_buf_ptr, &num_rows_, key_length_); > Maybe also include the file name in the messages? So that if this happens i Done http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@334 PS3, Line 334: << " key_length_: " << key_length_; > It doesn't look like moving the col_idx declaration out of the loop is need Yes it was left over from a previous change http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@337 PS3, Line 337: > Passing key_length_ seems wrong, since it measures the number of bytes in k Corrected it to have remaining length which gets adjusted as the bytes are read in to the buffer. http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@464 PS3, Line 464: } > Can you add a brief comment explaining what this is checking for? E.g. "Ens Done http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.h File be/src/exec/hdfs-sequence-scanner.h: http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.h@257 PS3, Line 257: next_record_in_compressed_block_len > Needs an underscore in the end of the variable name. Done http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.cc File be/src/exec/hdfs-sequence-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.cc@235 PS3, Line 235: ss << __FILE__ << ":" << __LINE__ << "SeqFile corrupted, record_locations_.size() " > Would be good to include file names so it's easier to root cause in the fie Done http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util-test.cc File be/src/exec/read-write-util-test.cc: http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util-test.cc@115 PS3, Line 115: TestPutGetZeroCompressedLong(val); > How about we add a few unit tests to make sure that passing in a too-short Done http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h File be/src/exec/read-write-util.h: http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h@58 PS3, Line 58: /// Returns the length of the long/int > Can you document the size argument? Done http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h@72 PS3, Line 72: /// byte offset and the buffer passed is of length size, accessing beyond the > Can you document the size argument? Done http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h@198 PS3, Line 198: > I think we can check len vs size here and save doing it in the loop below. Done http://gerrit.cloudera.org:8080/#/c/8936/3/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/8936/3/tests/query_test/test_scanners_fuzz.py@203 PS3, Line 203: # E.g. corrupt Parquet footer (IMPALA-3773) or a corrupt LZO index file > The previous two aren't needed, right, since 'corrupt' is a substring of th Agreed removed the above two conditional statements -- To view, visit http://gerrit.cloudera.org:8080/8936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 8936 Gerrit-PatchSet: 4 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: anujphadke <[email protected]> Gerrit-Comment-Date: Sat, 20 Jan 2018 06:26:25 +0000 Gerrit-HasComments: Yes
