If you can use the Hadoop 2.6.0 binary, you can use s3a. s3a is being polished in the upcoming 2.7.0 release: https://issues.apache.org/jira/browse/HADOOP-11571
Cheers

On Tue, Mar 3, 2015 at 9:44 AM, Ankur Srivastava <ankur.srivast...@gmail.com> wrote:
> Hi,
>
> We recently upgraded to the Spark 1.2.1 - Hadoop 2.4 binary. We do not have
> any other dependency on Hadoop jars, except for reading our source files
> from S3.
>
> Since we upgraded to the latest version, our reads from S3 have
> slowed down considerably. For some jobs the read from S3 stalls
> for a long time before it starts.
>
> Is there a known issue with S3, or do we need to change any settings? The
> only settings we are using are:
>
> sc.hadoopConfiguration().set("fs.s3n.impl",
>     "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
> sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", someKey);
> sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", someSecret);
>
> Thanks for help!!
>
> - Ankur
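
For reference, switching the quoted s3n settings over to s3a would look roughly like the sketch below (assuming Hadoop 2.6.0+ on the classpath, where the s3a connector and its `fs.s3a.*` keys live; `someKey`/`someSecret` are the same placeholder credentials as in the original mail):

```java
// Hypothetical s3a equivalent of the s3n configuration above.
// Requires the hadoop-aws jar (Hadoop 2.6.0+) and the AWS SDK on the classpath.
sc.hadoopConfiguration().set("fs.s3a.impl",
    "org.apache.hadoop.fs.s3a.S3AFileSystem");
sc.hadoopConfiguration().set("fs.s3a.access.key", someKey);
sc.hadoopConfiguration().set("fs.s3a.secret.key", someSecret);

// Then read with the s3a:// scheme instead of s3n://, e.g.:
// JavaRDD<String> lines = sc.textFile("s3a://my-bucket/path/to/input");
```

The bucket path above is illustrative; the key point is that s3a uses the `fs.s3a.*` property names and the `s3a://` URI scheme rather than the `fs.s3n.*` ones.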