Thanks Rishi. That is exactly what I am trying to do now :)
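Roughly along these lines, using the SDK to pick up the role's temporary
credentials directly (an untested sketch; the bucket and key names are
placeholders):

    import com.amazonaws.auth.InstanceProfileCredentialsProvider
    import com.amazonaws.services.s3.AmazonS3Client
    import scala.io.Source

    // The SDK reads the IAM-role credentials from the EC2 instance
    // metadata service and refreshes them automatically as they expire.
    val s3 = new AmazonS3Client(new InstanceProfileCredentialsProvider())
    val obj = s3.getObject("my-bucket", "path/to/file.txt")
    val text = Source.fromInputStream(obj.getObjectContent).mkString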
On Tue, Oct 14, 2014 at 2:41 PM, Rishi Pidva <rpi...@pivotal.io> wrote:

> As per the EMR documentation:
> http://docs.amazonaws.cn/en_us/ElasticMapReduce/latest/DeveloperGuide/emr-iam-roles.html
>
> Access AWS Resources Using IAM Roles
>
> If you've launched your cluster with an IAM role, applications running on
> the EC2 instances of that cluster can use the IAM role to obtain temporary
> account credentials to use when calling services in AWS.
>
> The version of Hadoop available on AMI 2.3.0 and later has already been
> updated to make use of IAM roles. If your application runs strictly on top
> of the Hadoop architecture, and does not directly call any service in AWS,
> it should work with IAM roles with no modification.
>
> If your application calls services in AWS directly, you'll need to update
> it to take advantage of IAM roles. This means that instead of obtaining
> account credentials from /home/hadoop/conf/core-site.xml on the EC2
> instances in the cluster, your application will now either use an SDK to
> access the resources using IAM roles, or call the EC2 instance metadata
> to obtain the temporary credentials.
>
> Maybe you can use the AWS SDK in your application to provide AWS
> credentials?
>
> https://github.com/seratch/AWScala
>
> On Oct 14, 2014, at 11:10 AM, Ranga <sra...@gmail.com> wrote:
>
>> One related question: could I specify the
>> "com.amazonaws.services.s3.AmazonS3Client" implementation for the
>> "fs.s3.impl" parameter? Let me try that and update this thread with my
>> findings.
>>
>> On Tue, Oct 14, 2014 at 10:48 AM, Ranga <sra...@gmail.com> wrote:
>>
>>> Thanks for the input.
>>> Yes, I did use the "temporary" access credentials provided by the IAM
>>> role (also detailed in the link you provided). The session token needs
>>> to be specified as well, and I was looking for a way to set that in
>>> the header, which doesn't seem possible.
>>> It looks like a static key/secret is the only option.
>>>
>>> On Tue, Oct 14, 2014 at 10:32 AM, Gen <gen.tan...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> If I remember correctly, Spark cannot use IAM-role credentials to
>>>> access S3. It first uses the key id/secret from the environment; if
>>>> those are not set, it uses the values in core-site.xml. So an IAM
>>>> role is not useful for Spark. The same problem occurs if you want to
>>>> use the distcp command in Hadoop.
>>>>
>>>> Are you using curl http://169.254.169.254/latest/meta-data/iam/... to
>>>> get the "temporary" credentials? If so, those cannot be used directly
>>>> by Spark. For more information, take a look at:
>>>> http://docs.aws.amazon.com/STS/latest/UsingSTS/using-temp-creds.html
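>>>>
>>>> For example, something like this shows what the metadata service
>>>> returns (a rough, untested sketch; the security-credentials path is
>>>> the standard EC2 metadata endpoint):
>>>>
>>>>   import scala.io.Source
>>>>
>>>>   val base =
>>>>     "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
>>>>   // the endpoint lists the attached role name(s), one per line
>>>>   val role = Source.fromURL(base).mkString.trim
>>>>   // the role document is JSON with "AccessKeyId", "SecretAccessKey",
>>>>   // "Token" and an "Expiration" (typically about an hour away); the
>>>>   // s3/s3n connectors have no property that accepts the "Token",
>>>>   // which is why these credentials cannot be used directly.
>>>>   println(Source.fromURL(base + role).mkString)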
>>>>
>>>> sranga wrote:
>>>>
>>>>> Thanks for the pointers.
>>>>> I verified that the access key-id/secret used are valid. However,
>>>>> the secret may contain a "/" at times. The issues I am facing are as
>>>>> follows:
>>>>>
>>>>> - The EC2 instances are set up with an IAM role and don't have a
>>>>>   static key-id/secret.
>>>>> - All of the EC2 instances have access to S3 based on this role (I
>>>>>   used the s3ls and s3cp commands to verify this).
>>>>> - I can get a "temporary" access key-id/secret based on the IAM
>>>>>   role, but they generally expire in an hour.
>>>>> - If Spark is not able to use the IAM-role credentials, I may have
>>>>>   to generate a static key-id/secret. This may or may not be
>>>>>   possible in the environment I am in (from a policy perspective).
>>>>>
>>>>> - Ranga
>>>>>
>>>>> On Tue, Oct 14, 2014 at 4:21 AM, Rafal Kwasny <mag@...> wrote:
>>>>>
>>>>>> Hi,
>>>>>> Keep in mind that you're going to have a bad time if your secret
>>>>>> key contains a "/". This is due to an old Hadoop bug:
>>>>>> https://issues.apache.org/jira/browse/HADOOP-3733
>>>>>>
>>>>>> The best way is to regenerate the key so it does not include a "/".
>>>>>>
>>>>>> /Raf
>>>>>>
>>>>>> Akhil Das wrote:
>>>>>>
>>>>>>> Try the following:
>>>>>>>
>>>>>>> 1. Set the access key and secret key in the SparkContext's Hadoop
>>>>>>> configuration:
>>>>>>>
>>>>>>>   sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", yourAccessKey)
>>>>>>>   sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", yourSecretKey)
>>>>>>>
>>>>>>> 2. Set the access key and secret key in the environment before
>>>>>>> starting your application:
>>>>>>>
>>>>>>>   export AWS_ACCESS_KEY_ID=<your access>
>>>>>>>   export AWS_SECRET_ACCESS_KEY=<your secret>
>>>>>>>
>>>>>>> 3. Set the access key and secret key inside the Hadoop
>>>>>>> configuration:
>>>>>>>
>>>>>>>   val hadoopConf = sparkContext.hadoopConfiguration
>>>>>>>   hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
>>>>>>>   hadoopConf.set("fs.s3.awsAccessKeyId", yourAccessKey)
>>>>>>>   hadoopConf.set("fs.s3.awsSecretAccessKey", yourSecretKey)
>>>>>>>
>>>>>>> 4. You can also embed the credentials in the URL itself:
>>>>>>>
>>>>>>>   val lines = sparkContext.textFile("s3n://yourAccessKey:yourSecretKey@<yourBucket>/path/")
>>>>>>>
>>>>>>> Thanks
>>>>>>> Best Regards
>>>>>>>
>>>>>>> On Mon, Oct 13, 2014 at 11:33 PM, Ranga <sranga@...> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I am trying to access files/buckets in S3 and am encountering a
>>>>>>>> permissions issue. The buckets are configured to authenticate
>>>>>>>> using an IAMRole provider.
>>>>>>>> I have set the KeyId and Secret using environment variables
>>>>>>>> (AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID). However, I am
>>>>>>>> still unable to access the S3 buckets.
>>>>>>>>
>>>>>>>> Before setting the access key and secret, the error was:
>>>>>>>> "java.lang.IllegalArgumentException: AWS Access Key ID and Secret
>>>>>>>> Access Key must be specified as the username or password
>>>>>>>> (respectively) of a s3n URL, or by setting the
>>>>>>>> fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties
>>>>>>>> (respectively)."
>>>>>>>>
>>>>>>>> After setting the access key and secret, the error is: "The AWS
>>>>>>>> Access Key Id you provided does not exist in our records."
>>>>>>>>
>>>>>>>> The id/secret being set are the right values. This makes me
>>>>>>>> believe that something else (a "token", etc.) needs to be set as
>>>>>>>> well.
>>>>>>>> Any help is appreciated.
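>>>>>>>>
>>>>>>>> For reference, this is roughly how I am reading the data (a
>>>>>>>> minimal sketch; the bucket name and path are placeholders):
>>>>>>>>
>>>>>>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>>>>>>
>>>>>>>>   val sc = new SparkContext(new SparkConf().setAppName("s3-test"))
>>>>>>>>   // AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY are exported in the
>>>>>>>>   // environment; this is the call that raises the errors above:
>>>>>>>>   sc.textFile("s3n://my-bucket/some/path/").count()
>>>>>>>>
>>>>>>>> - Ranga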