Pranay Singh has posted comments on this change. ( http://gerrit.cloudera.org:8080/8936 )
Change subject: IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners ...................................................................... Patch Set 13: (6 comments) http://gerrit.cloudera.org:8080/#/c/8936/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8936/13//COMMIT_MSG@14 PS13, Line 14: with 100 iteration without failure. > We should run it for more than 100 iterations before committing, I think t I ran the the test for around 200 iterations to see if it fails. http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc File be/src/exec/hdfs-rcfile-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@86 PS13, Line 86: const int max_ncols = 8000000; > Let's make this consistent with other constant - should be all caps and def Done http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@90 PS13, Line 90: ss << stream_->filename() << " Invalid file-header-metadata, number of columns " > I think this error message is confusing. The metadata isn't invalid, the pr Done http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-rcfile-scanner.cc@363 PS13, Line 363: ss << stream_->filename() << " Bad rowcolumn index: " << col_idx; > Is there meant to be a space between row and column? It seems like we shoul Done http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-sequence-scanner.cc File be/src/exec/hdfs-sequence-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8936/13/be/src/exec/hdfs-sequence-scanner.cc@56 PS13, Line 56: if (codegen == nullptr) { > This DCHECK should be valid - why did this need to be changed? If we're hit Done http://gerrit.cloudera.org:8080/#/c/8936/13/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: http://gerrit.cloudera.org:8080/#/c/8936/13/tests/query_test/test_scanners_fuzz.py@198 PS13, Line 198: elif 'corrupt' in str(e).lower(): > In theory we should just be skipping over corrupt files with abort_on_error Done -- To view, visit http://gerrit.cloudera.org:8080/8936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a Gerrit-Change-Number: 8936 Gerrit-PatchSet: 13 Gerrit-Owner: Pranay Singh Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Pranay Singh Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: anujphadke <apha...@cloudera.com> Gerrit-Comment-Date: Wed, 25 Apr 2018 16:59:22 +0000 Gerrit-HasComments: Yes