Hmm, yeah, that is weird, but since it only happens on some files it might mean 
those didn't get fully uploaded. 
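One quick way to check is something like this from the spark-shell (just a sketch; it assumes sc is your SparkContext and uses the bucket prefix from your logs):

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    // List every object under the prefix and print the length S3 reports for it;
    // a truncated upload should show up as a shorter-than-expected file here.
    val prefix = "s3n://odesk-bucket/subbucket/2014/01/"
    val fs = FileSystem.get(new URI(prefix), sc.hadoopConfiguration)
    fs.listStatus(new Path(prefix)).foreach { status =>
      println(status.getPath + " -> " + status.getLen + " bytes")
    }

If those lengths don't match the sizes of the original files, re-uploading the affected ones should clear the gzip error.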

Matei

On Jul 2, 2014, at 4:50 PM, Brian Gawalt <bgaw...@gmail.com> wrote:

> HUH; not-scrubbing the slashes fixed it. I would have sworn I tried it, got a
> 403 Forbidden, then remembered the slash prescription. Can confirm I was
> never scrubbing the actual URIs. It looks like it'd all be working now
> except it's smacking its head against:
> 
> 14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-01.gz:0+661974299
> 14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-03.gz:0+1207089239
> 14/07/02 23:37:38 INFO rdd.HadoopRDD: Input split: s3n://odesk-bucket/subbucket/2014/01/datafile-06.gz:0+1155725077
> 14/07/02 23:38:57 ERROR executor.Executor: Exception in task ID 0
> java.io.IOException: stored gzip size doesn't match decompressed size
>        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeTrailerState(BuiltInGzipDecompressor.java:389)
>        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:224)
>        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:82)
>        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:76)
> 
> but maybe that's just something we need to deal with internally.
> 
> Thanks,
> --Brian
> 
