Thanks Rishi. That is exactly what I am trying to do now :)
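Roughly along these lines, using the SDK to pick up the role's temporary
credentials directly (an untested sketch; the bucket and key names are
placeholders):

    import com.amazonaws.auth.InstanceProfileCredentialsProvider
    import com.amazonaws.services.s3.AmazonS3Client
    import scala.io.Source

    // The SDK reads the IAM-role credentials from the EC2 instance
    // metadata service and refreshes them automatically as they expire.
    val s3 = new AmazonS3Client(new InstanceProfileCredentialsProvider())
    val obj = s3.getObject("my-bucket", "path/to/file.txt")
    val text = Source.fromInputStream(obj.getObjectContent).mkString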
On Tue, Oct 14, 2014 at 2:41 PM, Rishi Pidva <rpi...@pivotal.io> wrote:

> As per the EMR documentation:
> http://docs.amazonaws.cn/en_us/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles.html
>
> Access AWS Resources Using IAM Roles
>
> If you've launched your cluster with an IAM role, applications running on
> the EC2 instances of that cluster can use the IAM role to obtain temporary
> account credentials to use when calling services in AWS.
>
> The version of Hadoop available on AMI 2.3.0 and later has already been
> updated to make use of IAM roles. If your application runs strictly on top
> of the Hadoop architecture, and does not directly call any service in AWS,
> it should work with IAM roles with no modification.
>
> If your application calls services in AWS directly, you'll need to update
> it to take advantage of IAM roles. This means that instead of obtaining
> account credentials from /home/hadoop/conf/core-site.xml on the EC2
> instances in the cluster, your application will now either use an SDK to
> access the resources using IAM roles, or call the EC2 instance metadata
> to obtain the temporary credentials.
>
> Maybe you can use the AWS SDK in your application to provide AWS
> credentials?
>
> https://github.com/seratch/AWScala
>
> On Oct 14, 2014, at 11:10 AM, Ranga <sra...@gmail.com> wrote:
>
>> One related question: could I specify the
>> "com.amazonaws.services.s3.AmazonS3Client" implementation for the
>> "fs.s3.impl" parameter? Let me try that and update this thread with my
>> findings.
>>
>> On Tue, Oct 14, 2014 at 10:48 AM, Ranga <sra...@gmail.com> wrote:
>>
>>> Thanks for the input.
>>> Yes, I did use the "temporary" access credentials provided by the IAM
>>> role (also detailed in the link you provided). The session token needs
>>> to be specified as well, and I was looking for a way to set that in
>>> the header, which doesn't seem possible.
>>> It looks like a static key/secret is the only option.
>>>
>>> On Tue, Oct 14, 2014 at 10:32 AM, Gen <gen.tan...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> If I remember correctly, Spark cannot use IAM-role credentials to
>>>> access S3. It first uses the key id/secret from the environment; if
>>>> those are not set, it uses the values in core-site.xml. So an IAM
>>>> role is not useful for Spark. The same problem occurs if you want to
>>>> use the distcp command in Hadoop.
>>>>
>>>> Are you using curl http://169.254.169.254/latest/meta-data/iam/... to
>>>> get the "temporary" credentials? If so, those cannot be used directly
>>>> by Spark. For more information, take a look at:
>>>> http://docs.aws.amazon.com/STS/latest/UsingSTS/using-temp-creds.html
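>>>>
>>>> For example, something like this shows what the metadata service
>>>> returns (a rough, untested sketch; the security-credentials path is
>>>> the standard EC2 metadata endpoint):
>>>>
>>>>   import scala.io.Source
>>>>
>>>>   val base =
>>>>     "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
>>>>   // the endpoint lists the attached role name(s), one per line
>>>>   val role = Source.fromURL(base).mkString.trim
>>>>   // the role document is JSON with "AccessKeyId", "SecretAccessKey",
>>>>   // "Token" and an "Expiration" (typically about an hour away); the
>>>>   // s3/s3n connectors have no property that accepts the "Token",
>>>>   // which is why these credentials cannot be used directly.
>>>>   println(Source.fromURL(base + role).mkString)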
>>>>
>>>> sranga wrote:
>>>>
>>>>> Thanks for the pointers.
>>>>> I verified that the access key-id/secret used are valid. However,
>>>>> the secret may contain a "/" at times. The issues I am facing are as
>>>>> follows:
>>>>>
>>>>> - The EC2 instances are set up with an IAM role and don't have a
>>>>>   static key-id/secret.
>>>>> - All of the EC2 instances have access to S3 based on this role (I
>>>>>   used the s3ls and s3cp commands to verify this).
>>>>> - I can get a "temporary" access key-id/secret based on the IAM
>>>>>   role, but they generally expire in an hour.
>>>>> - If Spark is not able to use the IAM-role credentials, I may have
>>>>>   to generate a static key-id/secret. This may or may not be
>>>>>   possible in the environment I am in (from a policy perspective).
>>>>>
>>>>> - Ranga
>>>>>
>>>>> On Tue, Oct 14, 2014 at 4:21 AM, Rafal Kwasny <mag@...> wrote:
>>>>>
>>>>>> Hi,
>>>>>> Keep in mind that you're going to have a bad time if your secret
>>>>>> key contains a "/". This is due to an old Hadoop bug:
>>>>>> https://issues.apache.org/jira/browse/HADOOP-3733
>>>>>>
>>>>>> The best way is to regenerate the key so it does not include a "/".
>>>>>>
>>>>>> /Raf
>>>>>>
>>>>>> Akhil Das wrote:
>>>>>>
>>>>>>> Try the following:
>>>>>>>
>>>>>>> 1. Set the access key and secret key in the SparkContext's Hadoop
>>>>>>> configuration:
>>>>>>>
>>>>>>>   sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", yourAccessKey)
>>>>>>>   sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", yourSecretKey)
>>>>>>>
>>>>>>> 2. Set the access key and secret key in the environment before
>>>>>>> starting your application:
>>>>>>>
>>>>>>>   export AWS_ACCESS_KEY_ID=<your access>
>>>>>>>   export AWS_SECRET_ACCESS_KEY=<your secret>
>>>>>>>
>>>>>>> 3. Set the access key and secret key inside the Hadoop
>>>>>>> configuration:
>>>>>>>
>>>>>>>   val hadoopConf = sparkContext.hadoopConfiguration
>>>>>>>   hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>>>>>>   hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
>>>>>>>   hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
>>>>>>>
>>>>>>> 4. You can also embed the credentials in the URL itself:
>>>>>>>
>>>>>>>   val lines = sparkContext.textFile("s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")
>>>>>>>
>>>>>>> Thanks
>>>>>>> Best Regards
>>>>>>>
>>>>>>> On Mon, Oct 13, 2014 at 11:33 PM, Ranga <sranga@...> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I am trying to access files/buckets in S3 and am encountering a
>>>>>>>> permissions issue. The buckets are configured to authenticate
>>>>>>>> using an IAMRole provider.
>>>>>>>> I have set the KeyId and Secret using environment variables
>>>>>>>> (AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID). However, I am
>>>>>>>> still unable to access the S3 buckets.
>>>>>>>>
>>>>>>>> Before setting the access key and secret, the error was:
>>>>>>>> "java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>>>> Access Key must be specified as the username or password
>>>>>>>> (respectively) of a s3n URL, or by setting the
>>>>>>>> fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties
>>>>>>>> (respectively)."
>>>>>>>>
>>>>>>>> After setting the access key and secret, the error is: "The AWS
>>>>>>>> Access Key Id you provided does not exist in our records."
>>>>>>>>
>>>>>>>> The id/secret being set are the right values. This makes me
>>>>>>>> believe that something else (a "token", etc.) needs to be set as
>>>>>>>> well.
>>>>>>>> Any help is appreciated.
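>>>>>>>>
>>>>>>>> For reference, this is roughly how I am reading the data (a
>>>>>>>> minimal sketch; the bucket name and path are placeholders):
>>>>>>>>
>>>>>>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>>>>>>
>>>>>>>>   val sc = new SparkContext(new SparkConf().setAppName("s3-test"))
>>>>>>>>   // AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are exported in the
>>>>>>>>   // environment; this is the call that raises the errors above:
>>>>>>>>   sc.textFile("s3n://my-bucket/some/path/").count()
>>>>>>>>
>>>>>>>> - Ranga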