Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
......................................................................


Patch Set 12:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG@13
PS11, Line 13:   a) Ran the fuzz test on a private build without failures/crash.
> It still fails sometimes - let's be clear in the commit message that this d
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@176
PS11, Line 176:     
> extra indentation
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@176
PS11, Line 176:
> extra indentation
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@218
PS11, Line 218:
> formatting is a little off (leftmost << should be aligned with each other).
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@331
PS11, Line 331:     ss << stream_->filename() << " RCFile corrupt," << "out of 
bounds error num_rows_: "
> Error message will not have a space after the comma. (here and at least one
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@345
PS11, Line 345: key_buf_ptr
> I don't think it's necessarily safe to print uint8_t* pointers with C++ str
no more printing uint8_t* pointers, and removed  Impala specific message.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@433
PS11, Line 433:         col_info.uncompressed_buffer_len: 1;
> Why is it safe for this to be 1 if the length of the buffer is <= 0? Won't
replaced this with col_info.buffer_len which sets an upper bound that  should 
be checked against.


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@581
PS11, Line 581: row_group_sz
> row_group_end? _sz makes me think this is a numeric size, which it isn't.
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@583
PS11, Line 583: col_sz
> col_end?
Done


http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@587
PS11, Line 587:             << col_sz << "  greater than row_group_sz "<< 
row_group_sz;
> Printing col_sz and row_group_sz is really dangerous here, since it's going
removed printing of char* and replaced the Impala specific error, the error is 
more generic now w.r.t RCFIle


http://gerrit.cloudera.org:8080/#/c/8936/12/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/12/be/src/exec/hdfs-rcfile-scanner.cc@362
PS12, Line 362:   if (remain_len <= 0) {
              :     stringstream ss;
              :     ss << stream_->filename() << " RCFile corrupt, out of 
bounds error"
              :       " GetCurrentKeyBuffer col_idx: " << col_idx << " 
remaining length: " << remain_len
              :       << " col_info.buffer_len: " << col_info.buffer_len;
              :     return Status(ss.str());
              :   }
              :
              :   DCHECK_GT(remain_len, 0);
              :   int bytes_read = ReadWriteUtil::GetVInt(*key_buf_ptr, 
&col_info.buffer_len, remain_len);
              :   if (bytes_read == -1) {
              :     stringstream ss;
              :     ss << stream_->filename() << " RCFile corrupt, out of 
bounds error"
              :       " GetCurrentKeyBuffer col_idx: " << col_idx << " 
remaining length: " << remain_len
              :       << " col_info.buffer_len: " << col_info.buffer_len;
              :     return Status(ss.str());
              :   }
              :   *key_buf_ptr += bytes_read;
              :   remain_len -= bytes_read;
              :   DCHECK_GT(remain_len, 0);
              :
              :   bytes_read = ReadWriteUtil::GetVInt(*key_buf_ptr, 
&col_info.uncompressed_buffer_len,
              :       remain_len);
              :   if (bytes_read == -1) {
              :     stringstream ss;
              :     ss << stream_->filename() << " RCFile corrupt, out of 
bounds error key buffer "
              :       "col_idx: " << col_idx << " remaining length: "<< 
remain_len;
              :     return Status(ss.str());
              :   }
              :   *key_buf_ptr += bytes_read;
              :   remain_len -= bytes_read;
              :   DCHECK_GT(remain_len, 0);
              :
              :   int col_key_buf_len;
              :   bytes_read = ReadWriteUtil::GetVInt(*key_buf_ptr , 
&col_key_buf_len, remain_len);
              :   if (bytes_read == -1) {
              :     stringstream ss;
              :     ss << stream_->filename() << " RCFile corrupt, out of 
bounds error key "
              :       "buffer col_idx: " << col_idx << " remaining length: "<< 
remain_len;
              :     return Status(ss.str());
              :   }
> these error messages seem very tailored to Impala's internal bookkeeping. S
I have changed the messages to describe issues with respect to RCFILE 
terminology like  row group column buffer



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 12
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: anujphadke <[email protected]>
Gerrit-Comment-Date: Sun, 01 Apr 2018 06:11:17 +0000
Gerrit-HasComments: Yes

Reply via email to