[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet

Joe McDonnell (Code Review) Mon, 25 Sep 2017 16:25:53 -0700

Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8056 )


Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed 
parquet
......................................................................


Patch Set 4:

(4 comments)

I like the overall approach. I have a few small naming/style issues, but I 
think this is getting closer.

http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@101
PS4, Line 101:     """ Clone an existing parquet table with codec as none in the
             :         unique database. This cloned table is passed to 
run_fuzz_test
             :         which clones the table and corrupts the table. The test 
later
             :         checks that there is no crash while performing SQL 
queries on
             :         a corrupt table.
             :     """
I think this comment should focus on why this test is different from the 
others. For example, it should explain that the parquet tables in the default 
schema are always compressed. So, in order to test uncompressed parquet, we 
need to create a new source table. I think you can skip the last two sentences.


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@111
PS4, Line 111:     db_name = unique_database
I would prefer to emphasize that the source and destination are the 
unique_database. To make that clearer, I think I would get rid of this variable 
and just use unique_database directly in each location.


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@117
PS4, Line 117: functional_parquet.alltypes
Can we extend this to do fuzzing on decimal_tbl as well? I was thinking this 
could be a loop that runs fuzzing over a list of tables (that happens to have 
two entries).


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@118
PS4, Line 118: .format(fq_tbl_name))
This indentation is a bit awkward. I don't think .format should be on its own 
line. One way to get around this is to use only 4 space indentation for the 
second line (" select"...) and then put the .format on that line.



--
To view, visit http://gerrit.cloudera.org:8080/8056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc
Gerrit-Change-Number: 8056
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Mon, 25 Sep 2017 23:25:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet

Reply via email to