Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
......................................................................


Patch Set 4:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@329
PS3, Line 329:   int bytes_read = ReadWriteUtil::GetVInt(key_buf_ptr, 
&num_rows_, key_length_);
> Maybe also include the file name in the messages? So that if this happens i
Done


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@334
PS3, Line 334:       << " key_length_: " << key_length_;
> It doesn't look like moving the col_idx declaration out of the loop is need
Yes it was left over from a previous change


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@337
PS3, Line 337:
> Passing key_length_ seems wrong, since it measures the number of bytes in k
Corrected it to have remaining length which gets adjusted as the bytes are read 
in to the buffer.


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-rcfile-scanner.cc@464
PS3, Line 464:   }
> Can you add a brief comment explaining what this is checking for? E.g. "Ens
Done


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.h
File be/src/exec/hdfs-sequence-scanner.h:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.h@257
PS3, Line 257: next_record_in_compressed_block_len
> Needs an underscore in the end of the variable name.
Done


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/hdfs-sequence-scanner.cc@235
PS3, Line 235:       ss << __FILE__ << ":" << __LINE__ << "SeqFile corrupted, 
record_locations_.size() "
> Would be good to include file names so it's easier to root cause in the fie
Done


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util-test.cc
File be/src/exec/read-write-util-test.cc:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util-test.cc@115
PS3, Line 115:     TestPutGetZeroCompressedLong(val);
> How about we add a few unit tests to make sure that passing in a too-short
Done


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h@58
PS3, Line 58:   /// Returns the length of the long/int
> Can you document the size argument?
Done


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h@72
PS3, Line 72:   /// byte offset and the buffer passed is of length size, 
accessing beyond the
> Can you document the size argument?
Done


http://gerrit.cloudera.org:8080/#/c/8936/3/be/src/exec/read-write-util.h@198
PS3, Line 198:
> I think we can check len vs size here and save doing it in the loop below.
Done


http://gerrit.cloudera.org:8080/#/c/8936/3/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8936/3/tests/query_test/test_scanners_fuzz.py@203
PS3, Line 203:         # E.g. corrupt Parquet footer (IMPALA-3773) or a corrupt 
LZO index file
> The previous two aren't needed, right, since 'corrupt' is a substring of th
Agreed removed the above two conditional statements



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: anujphadke <[email protected]>
Gerrit-Comment-Date: Sat, 20 Jan 2018 06:26:25 +0000
Gerrit-HasComments: Yes

Reply via email to