Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/8936 )
Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners ...................................................................... Patch Set 12: (11 comments) http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8936/11//COMMIT_MSG@13 PS11, Line 13: a) Ran the fuzz test on a private build without failures/crash. > It still fails sometimes - let's be clear in the commit message that this d Done http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@176 PS11, Line 176: > extra indentation Done http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@176 PS11, Line 176: > extra indentation Done http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@218 PS11, Line 218: > formatting is a little off (leftmost << should be aligned with each other). Done http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@331 PS11, Line 331: ss << stream_->filename() << " RCFile corrupt," << "out of bounds error num_rows_: " > Error message will not have a space after the comma. (here and at least one Done http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@345 PS11, Line 345: key_buf_ptr > I don't think it's necessarily safe to print uint8_t* pointers with C++ str no more printing uint8_t* pointers, and removed Impala specific message. http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@433 PS11, Line 433: col_info.uncompressed_buffer_len: 1; > Why is it safe for this to be 1 if the length of the buffer is <= 0? Won't replaced this with col_info.buffer_len which sets an upper bound that should be checked against. http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@581 PS11, Line 581: row_group_sz > row_group_end? _sz makes me think this is a numeric size, which it isn't. Done http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@583 PS11, Line 583: col_sz > col_end? Done http://gerrit.cloudera.org:8080/#/c/8936/11/be/src/exec/hdfs-rcfile-scanner.cc@587 PS11, Line 587: << col_sz << " greater than row_group_sz "<< row_group_sz; > Printing col_sz and row_group_sz is really dangerous here, since it's going removed printing of char* and replaced the Impala specific error, the error is more generic now w.r.t RCFIle http://gerrit.cloudera.org:8080/#/c/8936/12/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8936/12/be/src/exec/hdfs-rcfile-scanner.cc@362 PS12, Line 362: if (remain_len <= 0) { : stringstream ss; : ss << stream_->filename() << " RCFile corrupt, out of bounds error" : " GetCurrentKeyBuffer col_idx: " << col_idx << " remaining length: " << remain_len : << " col_info.buffer_len: " << col_info.buffer_len; : return Status(ss.str()); : } : : DCHECK_GT(remain_len, 0); : int bytes_read = ReadWriteUtil::GetVInt(*key_buf_ptr, &col_info.buffer_len, remain_len); : if (bytes_read == -1) { : stringstream ss; : ss << stream_->filename() << " RCFile corrupt, out of bounds error" : " GetCurrentKeyBuffer col_idx: " << col_idx << " remaining length: " << remain_len : << " col_info.buffer_len: " << col_info.buffer_len; : return Status(ss.str()); : } : *key_buf_ptr += bytes_read; : remain_len -= bytes_read; : DCHECK_GT(remain_len, 0); : : bytes_read = ReadWriteUtil::GetVInt(*key_buf_ptr, &col_info.uncompressed_buffer_len, : remain_len); : if (bytes_read == -1) { : stringstream ss; : ss << stream_->filename() << " RCFile corrupt, out of bounds error key buffer " : "col_idx: " << col_idx << " remaining length: "<< remain_len; : return Status(ss.str()); : } : *key_buf_ptr += bytes_read; : remain_len -= bytes_read; : DCHECK_GT(remain_len, 0); : : int col_key_buf_len; : bytes_read = ReadWriteUtil::GetVInt(*key_buf_ptr , &col_key_buf_len, remain_len); : if (bytes_read == -1) { : stringstream ss; : ss << stream_->filename() << " RCFile corrupt, out of bounds error key " : "buffer col_idx: " << col_idx << " remaining length: "<< remain_len; : return Status(ss.str()); : } > these error messages seem very tailored to Impala's internal bookkeeping. S I have changed the messages to describe issues with respect to RCFILE terminology like row group column buffer -- To view, visit http://gerrit.cloudera.org:8080/8936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 8936 Gerrit-PatchSet: 12 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: anujphadke <[email protected]> Gerrit-Comment-Date: Sun, 01 Apr 2018 06:11:17 +0000 Gerrit-HasComments: Yes
