Sailesh Mukil has posted comments on this change. Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files ......................................................................
Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/5400/6/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: PS6, Line 333: (split_start <= row_group_start && split_end >= row_group_end) Why is this an invalid case? This can be two valid possibilities: 1. split_start == row_group_start && split_end == row_group_end 2. Multiple row groups within a split and are aligned with the split boundaries. Both are correct cases and we shouldn't throw a warning for them. http://gerrit.cloudera.org:8080/#/c/5400/6/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: PS6, Line 315: Row group size doesn't match HDFS block " : "size. This would be clearer if you said: "Row group size doesn't align with HDFS block size" Because it doesn't need to match the HDFS block size, it just needs to make sure that it doesn't exceed the HDFS block size and is proportianal. Eg: 2 Row groups can fit in one HDFS block and it would be fine if the start of the first row group and the end of the second row group align with the HDFS block size. PS6, Line 319: Parquet file '$0' is poorly formatted. Please change fs.s3a.block.size to match the " : "row group size of your file Here, it is a stretch to say that the Parquet file is poorly formatted. The fact is we don't know if it is or isn't for S3 files. It may be well formatted but the user's choice of the "fs.s3a.block.size" may be sub-optimal. You could rather say: "Parquet file '$0' may be poorly formatted. Please change fs.s3a.block.size to match the row group size of your file." -- To view, visit http://gerrit.cloudera.org:8080/5400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]> Gerrit-HasComments: Yes
