Attila Jeges has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................


Patch Set 8:

(1 comment)

> (2 comments)
>
> If you want to add the config values to the message, I can take
> another look. Otherwise, this looks okay. Did you run the S3 tests
> to make sure it works?

I tested with S3 today, and the 'test_misaligned_parquet_row_groups()' test
fails. This is probably expected.

The Parquet files are copied to the destination file system by the following
command in create-load-data.sh:
hadoop fs -Ddfs.block.size=1048576 -put -f <localsrc> <dst>

It sets dfs.block.size to 1MB so that some row groups in the Parquet files
span block boundaries, which is what makes the files "poorly formatted".
This doesn't seem to work with S3. I also tried
-Dfs.s3a.block.size=1048576, but that didn't work either.
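To illustrate why a 1MB block size produces misaligned files: a row group occupying the byte range [start, start + size) crosses a block boundary exactly when its first and last bytes fall into different blocks. The sketch below assumes row groups are described by hypothetical (start offset, size) pairs; it only illustrates the arithmetic and is not Impala's actual skew check.

```python
def spans_block_boundary(start, size, block_size):
    """Return True if bytes [start, start + size) touch more than one block."""
    if size <= 0:
        return False
    first_block = start // block_size
    last_block = (start + size - 1) // block_size
    return first_block != last_block


def misaligned_row_groups(row_groups, block_size):
    """Return indexes of (start, size) row groups that cross a block boundary."""
    return [i for i, (start, size) in enumerate(row_groups)
            if spans_block_boundary(start, size, block_size)]


# Hypothetical layout: three row groups written with a 1 MiB block size.
ONE_MB = 1048576
layout = [(4, 600000), (600004, 600000), (1200004, 500000)]
print(misaligned_row_groups(layout, ONE_MB))  # → [1]
```

Only the second row group starts in block 0 and ends in block 1, so it is the one a reader would have to fetch partly from another block.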

So we should probably skip the test when the file system is not HDFS. What do
you think?
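A minimal sketch of skipping the test on non-HDFS file systems, assuming the target file system is exposed through a TARGET_FILESYSTEM environment variable (an assumption here; if Impala's test framework already provides skip markers for this, those would be preferable). It uses only the standard unittest module, and the class name TestParquetSkew is hypothetical.

```python
import os
import unittest


def is_hdfs(env=os.environ):
    """Return True if the (assumed) TARGET_FILESYSTEM variable names HDFS.

    Treats an unset variable as "hdfs".
    """
    return env.get("TARGET_FILESYSTEM", "hdfs").strip().lower() == "hdfs"


class TestParquetSkew(unittest.TestCase):  # hypothetical test class name
    # The misaligned data set relies on -Ddfs.block.size at upload time,
    # which did not take effect on S3, so only run the test on HDFS.
    @unittest.skipIf(not is_hdfs(),
                     "requires HDFS: misaligned row groups depend on dfs.block.size")
    def test_misaligned_parquet_row_groups(self):
        pass  # Placeholder for the real test body.
```

The condition is evaluated once, when the class is defined, which matches how the target file system is fixed for a whole test run.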

http://gerrit.cloudera.org:8080/#/c/5400/8/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

PS8, Line 319: fs.s3a.block.size
> Grep for GetHadoopConfig().  If you think it's overkill to add this info, i
Thanks, I'll leave it like this.


-- 
To view, visit http://gerrit.cloudera.org:8080/5400

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges
Gerrit-Reviewer: Attila Jeges
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Thomas Tauber-Marshall
Gerrit-HasComments: Yes
