Hi,

And on another note, is it required to use s3a? Why not use s3:// only? I prefer to use s3:// only while writing files to S3 from EMR.
Regards,
Gourav Sengupta

On Tue, May 31, 2016 at 12:04 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Hi,
>
> Is your Spark cluster running in EMR, on a self-created Spark cluster on EC2, or on a local cluster behind a firewall? Which Spark version are you using?
>
> Regards,
> Gourav Sengupta
>
> On Sun, May 29, 2016 at 10:55 PM, Mayuresh Kunjir <mayur...@cs.duke.edu> wrote:
>
>> I'm running into permission issues while accessing data in an S3 bucket via the s3a file system from a local Spark cluster. Has anyone had success with this?
>>
>> My setup is:
>> - Spark 1.6.1 compiled against Hadoop 2.7.2
>> - aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.2.jar on the classpath
>> - Spark's Hadoop configuration is as follows:
>>
>> sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>> sc.hadoopConfiguration.set("fs.s3a.access.key", <access>)
>> sc.hadoopConfiguration.set("fs.s3a.secret.key", <secret>)
>>
>> (The secret key does not contain any '/' characters, which others have reported to cause problems.)
>>
>> I have configured my S3 bucket to grant the necessary permissions
>> (https://sparkour.urizone.net/recipes/configuring-s3/).
>>
>> What works: listing, reading from, and writing to s3a using the hadoop command, e.g. hadoop dfs -ls s3a://<bucket name>/<file path>
>>
>> What doesn't work: reading from s3a using Spark's textFile API. Each task throws a *Forbidden Access (403)* exception.
>>
>> Some online documents suggest using IAM roles to grant permissions for an AWS cluster, but I would like a solution for my local standalone cluster.
>>
>> Any help would be appreciated.
>>
>> Regards,
>> ~Mayuresh
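
For reference, a minimal, self-contained sketch of the setup described above (Spark 1.6.1 against Hadoop 2.7.2, with aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.2.jar on the classpath). The bucket path is a placeholder, and reading the credentials from environment variables is an assumption made here; the thread sets them as literal strings:

    import org.apache.spark.{SparkConf, SparkContext}

    object S3AReadSketch {
      def main(args: Array[String]): Unit = {
        // Local context, standing in for the poster's local standalone cluster.
        val conf = new SparkConf().setAppName("s3a-read-sketch").setMaster("local[*]")
        val sc = new SparkContext(conf)

        // Map the s3a:// scheme to the S3AFileSystem shipped in hadoop-aws.
        sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

        // Credentials from the environment (assumption; the thread hard-codes them).
        sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
        sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

        // The read that reportedly fails with Forbidden Access (403).
        val lines = sc.textFile("s3a://my-bucket/path/to/file.txt")
        println(lines.count())

        sc.stop()
      }
    }

The same properties can also be supplied as spark.hadoop.fs.s3a.* entries in spark-defaults.conf, which keeps credentials out of application code.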