Dan Hecht has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................
Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5400/8/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

PS8, Line 315: is poorly formatted
This seems a bit strong and could be misinterpreted to mean that the file results in an error. The Parquet file is valid; it's just not optimally aligned with HDFS blocks for performance. Something like:

"Parquet file '$0': Row group size doesn't align with HDFS block size, potentially resulting in decreased scan performance."

PS8, Line 319: fs.s3a.block.size
This is a global option, though, so it might not be possible to match the size of all the files (they may have mismatched row groups). Also, the person executing the query may not be the administrator of the system. Instead, maybe we can just hint strongly enough at the solution:

Parquet file '$0': Row group size doesn't match the S3A blocksize (fs.s3a.block.size) potentially resulting in decreased scan performance.

or similar. It may also help to include the actual value of fs.s3a.block.size (and similarly the HDFS block size) in the error to help diagnose.

http://gerrit.cloudera.org:8080/#/c/5400/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 314: @SkipIfS3.hdfs_block_size
Why not make this test work for S3?

PS8, Line 327: hdfs://localhost:20500
This (and other places) won't work for S3 (and other non-HDFS) test setups. Use filesystem_prefix().
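To illustrate the last two points (reporting the actual block size, and deriving paths from a filesystem prefix instead of hard-coding hdfs://localhost:20500), here is a rough sketch. Everything in it is an assumption made for illustration: the filesystem_prefix() below is a stand-in for the test framework's helper, not its real implementation, and the FILESYSTEM_PREFIX environment variable, the skew_warning() name, the message wording, and the example paths are hypothetical.

```python
import os


def filesystem_prefix(default="hdfs://localhost:20500"):
    # Hypothetical stand-in for the test helper mentioned in the review.
    # Reading the prefix from an environment variable is an assumption.
    return os.environ.get("FILESYSTEM_PREFIX", default)


def table_location(relative_path, prefix=None):
    # Build a full URI from the configured prefix so the same test can
    # run against HDFS, S3, or another filesystem.
    if prefix is None:
        prefix = filesystem_prefix()
    return prefix.rstrip("/") + "/" + relative_path.lstrip("/")


def skew_warning(filename, fs_name, block_size_key, block_size_bytes):
    # Hypothetical wording that names the relevant setting and includes
    # its actual value to help diagnose the mismatch.
    return ("Parquet file '{0}': Row group size doesn't match the {1} block "
            "size ({2} = {3} bytes), potentially resulting in decreased "
            "scan performance.").format(
                filename, fs_name, block_size_key, block_size_bytes)


if __name__ == "__main__":
    # Example paths and bucket name are made up for this sketch.
    path = table_location("/test-warehouse/example/file.parquet",
                          prefix="s3a://example-bucket")
    print(path)
    print(skew_warning(path, "S3A", "fs.s3a.block.size", 32 * 1024 * 1024))
```

With a prefix-based location, the expected warning text in the test can be built from the same path rather than from a literal HDFS URI.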
--
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes
