Michael Ho has posted comments on this change. Change subject: IMPALA-5197: Erroneous corrupted Parquet file message ......................................................................
Patch Set 1: (6 comments) It's a trade off between coverage and test complexity. I can look into integrating the test into test_cancellation.py but I will probably need to use the new phase GETNEXT_SCANNER to avoid the debug action being triggered at the scan node all the time. If I cannot reproduce the problem with a reasonable rate after modifying test_cancellation.py, I will keep the test as is. http://gerrit.cloudera.org:8080/#/c/6787/1/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS1, Line 56: every other call > why? The same function is called for every column so if there are multiple columns being materialized. Triggering every other call increases the chance of not always failing in the first column reader when materializing multiple columns. Comments added. PS1, Line 56: Read*ValueBatch(). long line. http://gerrit.cloudera.org:8080/#/c/6787/1/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: PS1, Line 293: 'val_count > what is val_count? It's the counter used by column reader to tracker number of tuples produced so far. Comment added. PS1, Line 294: TriggerDebugAction > it's a bit tricky to follow what's happening since we use the same name "Tr Renamed to ColReaderDebugAction(), ScannerDebugAction() and ScanNodeDebugAction() respectively. A bit verbose but hopefully clear things up. http://gerrit.cloudera.org:8080/#/c/6787/1/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: Line 323: table_format=vector.get_value('table_format')) > Do we want to do anything with 'result' here? Otherwise omit saving return Removed. Line 323: table_format=vector.get_value('table_format')) > Why pass the table format? Seems confusing/unnecessary. Removed. -- To view, visit http://gerrit.cloudera.org:8080/6787 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9138039ec60fbe9deff250b8772036e40e42e1f6 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Wood <[email protected]> Gerrit-HasComments: Yes
