Dan Hecht has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................
Patch Set 9:

> (1 comment)
>
> > (2 comments)
> >
> > If you want to add the config values to the message, I can take
> > another look. Otherwise, this looks okay. Did you run the S3 tests
> > to make sure it works?
>
> I've tested with S3 today and the 'test_misaligned_parquet_row_groups()'
> test does not work. This is probably expected.
>
> The Parquet files are copied to the destination file system with
> the following command (create-load-data.sh):
> hadoop fs -Ddfs.block.size=1048576 -put -f <localsrc> <dst>
>
> It sets dfs.block.size to 1MB to make sure that some row groups in
> the Parquet files span block boundaries and thus the files are
> "poorly formatted". This doesn't seem to work with S3. I also tried
> -Dfs.s3a.block.size=1048576, but that didn't work either.
>
> So we should probably skip the test when the file system is not
> HDFS. What do you think?

Hmm, yeah, I guess we'd have to run this as a custom cluster test so that we can set the fs.s3a.block.size Hadoop config value for the s3a connector to pick up. I'm a bit worried about checking this in without any kind of testing on S3. Is there some easy manual testing you could at least do (or could you try doing it as a custom cluster test)?

This is also why I'm a bit worried about making this a warning rather than just a profile message: the person running queries may not be able to do anything to "fix" the warning. In the case of S3, they really need help from the cluster administrator. For that reason (and others), the message is not always actionable, and it seems like warnings should always be actionable. What do you think?
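To make "span block boundaries" concrete: a row group is misaligned when its byte range touches more than one block, so whether a file counts as poorly formatted depends entirely on the block size the file system reports. A minimal Python sketch, assuming each row group's extent is known as a (start offset, length) pair; the helpers `spans_block_boundary` and `misaligned_row_groups` and the sample offsets are hypothetical and are not Impala's scanner code.

```python
# Hypothetical helpers (not Impala's implementation): decide whether a Parquet
# row group occupying bytes [start, start + length) crosses a block boundary.

BLOCK_SIZE = 1048576  # 1 MB, the value passed via -Ddfs.block.size in create-load-data.sh


def spans_block_boundary(start: int, length: int, block_size: int = BLOCK_SIZE) -> bool:
    """Return True if the byte range covers bytes in more than one block."""
    if length <= 0:
        return False
    first_block = start // block_size
    last_block = (start + length - 1) // block_size  # block holding the last byte
    return first_block != last_block


def misaligned_row_groups(row_groups, block_size: int = BLOCK_SIZE):
    """Return indices of (start, length) row groups that cross a block boundary."""
    return [i for i, (start, length) in enumerate(row_groups)
            if spans_block_boundary(start, length, block_size)]


if __name__ == "__main__":
    # Made-up layout: the first two row groups fit in block 0,
    # the third straddles blocks 0 and 1.
    groups = [(4, 500_000), (500_004, 400_000), (900_004, 300_000)]
    print(misaligned_row_groups(groups))  # [2]
```

With a larger reported block size the same file would have no misaligned row groups, which is consistent with the test only behaving as intended when the 1MB block size actually takes effect.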
--
To view, visit http://gerrit.cloudera.org:8080/5400

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges
Gerrit-Reviewer: Attila Jeges
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-HasComments: No
