Dan Hecht has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet 
files
......................................................................


Patch Set 9:

> (1 comment)
 > 
 > > (2 comments)
 > >
 > > If you want to add the config values to the message, I can take
 > > another look. Otherwise, this looks okay. Did you run the S3
 > > tests to make sure it works?
 > 
 > I've tested with S3 today and the 'test_misaligned_parquet_row_groups()'
 > test does not work. This is probably expected.
 > 
 > The parquet files are copied to the destination file system with
 > the following command (create-load-data.sh):
 > hadoop fs -Ddfs.block.size=1048576 -put -f <localsrc> <dst>
 > 
 > It sets dfs.block.size to 1MB to make sure that some row groups in
 > the parquet files span across block boundaries and thus the files
 > are "poorly formatted". This doesn't seem to be working with S3. I
 > tried using -Dfs.s3a.block.size=1048576 but it didn't work either.
 > 
 > So, probably we should skip the test when the file system is not
 > HDFS. What do you think?
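To make "poorly formatted" concrete: a row group is misaligned when its byte range crosses a multiple of the block size, so it cannot be read from a single block. The following is a minimal Python sketch of that check, assuming each row group is described by a hypothetical `(start_offset, length)` pair in bytes. The function names are illustrative only and are not Impala's implementation.

```python
# Illustrative model only: each row group is a hypothetical (start, length)
# byte range within the file. This is not Impala's code.

BLOCK_SIZE = 1048576  # 1 MiB, the value passed via -Ddfs.block.size


def spans_block_boundary(start, length, block_size=BLOCK_SIZE):
    """Return True if bytes [start, start + length) fall in more than one block."""
    if length <= 0:
        return False
    first_block = start // block_size
    last_block = (start + length - 1) // block_size
    return first_block != last_block


def misaligned_row_groups(row_groups, block_size=BLOCK_SIZE):
    """Return the indices of row groups that straddle a block boundary."""
    return [i for i, (start, length) in enumerate(row_groups)
            if spans_block_boundary(start, length, block_size)]


if __name__ == "__main__":
    # Three row groups laid out back to back after a 4-byte file header.
    groups = [(4, 900_000), (900_004, 300_000), (1_200_004, 500_000)]
    print(misaligned_row_groups(groups))  # prints [1]
```

With a 1 MiB block size, only the second row group crosses the 1048576-byte boundary, which is the situation the 1 MB `dfs.block.size` setting is meant to provoke in the test data.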

Hmm, yeah I guess we'd have to run this as a custom cluster test so that we can 
set the fs.s3a.block.size hadoop config value for the s3a connector to pick up. 
I'm a bit worried about checking this in without any kind of testing on S3.  Is 
there some easy manual testing you could at least do (or try doing it as a 
custom cluster test)?

This is also why I'm a bit worried about making this a warning rather than just 
a profile message -- the person running queries may not be able to do anything
to "fix" the warning. In the case of S3, they really need help from the cluster 
administrator.  For that (and other reasons), the message is not always 
actionable, and it seems like warnings should always be actionable.  What do 
you think?

-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges
Gerrit-Reviewer: Attila Jeges
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-HasComments: No
