If you can use the Hadoop 2.6.0 binary, you can use s3a. s3a is being polished in the upcoming 2.7.0 release: https://issues.apache.org/jira/browse/HADOOP-11571
Cheers

On Tue, Mar 3, 2015 at 9:44 AM, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:
> Hi,
>
> We recently upgraded to the Spark 1.2.1 - Hadoop 2.4 binary. We do not have
> any other dependency on Hadoop jars, except for reading our source files
> from S3.
>
> Since we upgraded to the latest version, our reads from S3 have
> slowed down considerably. For some jobs the read from S3 stalls
> for a long time before it starts.
>
> Is there a known issue with S3, or do we need to change any settings? The
> only settings we are using are:
>
> sc.hadoopConfiguration().set("fs.s3n.impl",
>     "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
> sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", someKey);
> sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", someSecret);
>
> Thanks for help!!
>
> - Ankur
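
For reference, switching the quoted s3n settings over to s3a would look roughly like the sketch below (assuming Hadoop 2.6.0+ on the classpath, where the s3a connector and its `fs.s3a.*` keys live; `someKey`/`someSecret` are the same placeholder credentials as in the original mail):

```java
// Hypothetical s3a equivalent of the s3n configuration above.
// Requires the hadoop-aws jar (Hadoop 2.6.0+) and the AWS SDK on the classpath.
sc.hadoopConfiguration().set("fs.s3a.impl",
    "org.apache.hadoop.fs.s3a.S3AFileSystem");
sc.hadoopConfiguration().set("fs.s3a.access.key", someKey);
sc.hadoopConfiguration().set("fs.s3a.secret.key", someSecret);

// Then read with the s3a:// scheme instead of s3n://, e.g.:
// JavaRDD<String> lines = sc.textFile("s3a://my-bucket/path/to/input");
```

The bucket path above is illustrative; the key point is that s3a uses the `fs.s3a.*` property names and the `s3a://` URI scheme rather than the `fs.s3n.*` ones.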