Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet 
files
......................................................................


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5400/6/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

PS6, Line 333: (split_start <= row_group_start && split_end >= row_group_end)
Why is this an invalid case?

This can be two valid possibilities:
1. split_start == row_group_start && split_end == row_group_end
2. Multiple row groups within a split and are aligned with the split boundaries.

Both are correct cases and we shouldn't throw a warning for them.


http://gerrit.cloudera.org:8080/#/c/5400/6/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

PS6, Line 315: Row group size doesn't match HDFS block "
             :    "size.
This would be clearer if you said:
"Row group size doesn't align with HDFS block size"

Because it doesn't need to match the HDFS block size, it just needs to make 
sure that it doesn't exceed the HDFS block size and is proportianal.

Eg: 2 Row groups can fit in one HDFS block and it would be fine if the start of 
the first row group and the end of the second row group align with the HDFS 
block size.


PS6, Line 319: Parquet file '$0' is poorly formatted. Please change 
fs.s3a.block.size to match the "
             :    "row group size of your file
Here, it is a stretch to say that the Parquet file is poorly formatted. The 
fact is we don't know if it is or isn't for S3 files.

It may be well formatted but the user's choice of the "fs.s3a.block.size" may 
be sub-optimal.

You could rather say:
"Parquet file '$0' may be poorly formatted. Please change fs.s3a.block.size to 
match the row group size of your file."


-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-HasComments: Yes

Reply via email to