Pranay Singh has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8936 )

Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile 
scanners
......................................................................


Patch Set 13:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8936/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8936/13//COMMIT_MSG@14
PS13, Line 14:       with 100 iteration without failure.
> We should run it for more than 100 iterations before committing,  I think t
I ran the the test for around  200 iterations to see if it fails.


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc
File be/src/exec/hdfs-rcfile-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@86
PS13, Line 86:   const int max_ncols = 8000000;
> Let's make this consistent with other constant - should be all caps and def
Done


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@90
PS13, Line 90:     ss << stream_->filename() << " Invalid file-header-metadata, 
number of columns "
> I think this error message is confusing. The metadata isn't invalid, the pr
Done


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@363
PS13, Line 363:   ss << stream_->filename() << " Bad rowcolumn index: " << 
col_idx;
> Is there meant to be a space between row and column? It seems like we shoul
Done


http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-sequence-scanner.cc
File be/src/exec/hdfs-sequence-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-sequence-scanner.cc@56
PS13, Line 56:   if (codegen == nullptr) {
> This DCHECK should be valid - why did this need to be changed? If we're hit
Done


http://gerrit.cloudera.org:8080/#/c/8936/13/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8936/13/tests/query_test/test_scanners_fuzz.py@198
PS13, Line 198:         elif 'corrupt' in str(e).lower():
> In theory we should just be skipping over corrupt files with abort_on_error
Done



--
To view, visit http://gerrit.cloudera.org:8080/8936
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Gerrit-Change-Number: 8936
Gerrit-PatchSet: 13
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: anujphadke <apha...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Apr 2018 16:59:22 +0000
Gerrit-HasComments: Yes

Reply via email to