Hmm, yeah, that is weird, but since it's only happening on some files it might mean those didn't get fully uploaded.
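If you want to check, a rough (untested) spark-shell sketch like the one below should flag any object whose gzip stream can't be read all the way to the end — the bucket path and the credentials in the URI are just placeholders here, substitute your own:

import java.util.zip.GZIPInputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Placeholder path/credentials -- use the same s3n URI you pass to sc.textFile.
val dir = new Path("s3n://ACCESS_KEY:SECRET_KEY@odesk-bucket/subbucket/2014/01/")
val fs = dir.getFileSystem(new Configuration())

// Stream each .gz object to EOF; a partially uploaded file will usually fail
// before (or at) the gzip trailer check instead of printing OK.
for (status <- fs.listStatus(dir) if status.getPath.getName.endsWith(".gz")) {
  val in = new GZIPInputStream(fs.open(status.getPath))
  val buf = new Array[Byte](1 << 16)
  try {
    while (in.read(buf) != -1) {}
    println("OK       " + status.getPath)
  } catch {
    case e: Exception => println("CORRUPT  " + status.getPath + ": " + e.getMessage)
  } finally {
    in.close()
  }
}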
Matei

On Jul 2, 2014, at 4:50 PM, Brian Gawalt <bgaw...@gmail.com> wrote:

> HUH; not-scrubbing the slashes fixed it. I would have sworn I tried it, got a
> 403 Forbidden, then remembered the slash prescription. Can confirm I was
> never scrubbing the actual URIs. It looks like it'd all be working now
> except it's smacking its head against:
>
> 14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-01.gz:0+661974299
> 14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-03.gz:0+1207089239
> 14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-06.gz:0+1155725077
> 14/07/02 23:38:57 ERROR executor.Executor: Exception in task ID 0
> java.io.IOException: stored gzip size doesn't match decompressed size
>         at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeTrailerState(BuiltInGzipDecompressor.java:389)
>         at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:224)
>         at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
>         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
>
> but maybe that's just something we need to deal with internally.
>
> Thanks,
> --Brian
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/AWS-Credentials-for-private-S3-reads-tp8689p8692.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.