On 27 Sep 2016, at 15:53, Daniel Siegmann 
<dsiegm...@securityscorecard.io> wrote:

I am running Spark on Amazon EMR and writing data to an S3 bucket. However, the 
data is read from an S3 bucket in a separate AWS account. Setting the 
fs.s3a.access.key and fs.s3a.secret.key values is sufficient to get access to 
the other account (using the s3a protocol), however I then won't have access to 
the S3 bucket in the EMR cluster's AWS account.

Is there any way for Spark to access S3 buckets in multiple accounts? If not, 
is there any best practice for how to work around this?



There are two ways to do this without changing permissions:

1. Different implementations: use s3a for one bucket and s3n for the other, 
giving each connector its own secrets.
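A minimal sketch of that split, passing the standard Hadoop property names for each connector through Spark (the account labels and key placeholders are made up; substitute your own credentials and add your application arguments):

```shell
# s3a carries the other account's credentials, s3n the EMR account's.
# Property names are the standard Hadoop ones for each filesystem.
spark-submit \
  --conf spark.hadoop.fs.s3a.access.key=OTHER_ACCOUNT_ACCESS_KEY \
  --conf spark.hadoop.fs.s3a.secret.key=OTHER_ACCOUNT_SECRET_KEY \
  --conf spark.hadoop.fs.s3n.awsAccessKeyId=EMR_ACCOUNT_ACCESS_KEY \
  --conf spark.hadoop.fs.s3n.awsSecretAccessKey=EMR_ACCOUNT_SECRET_KEY \
  your-app.jar
```

Then read with `s3a://` URIs from the other account's bucket and write with `s3n://` URIs to your own, or vice versa.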

2. Insecure: put the secrets in the URI: s3a://AWSID:escaped-secret@bucket/path
This leaks your secrets throughout the logs, and has problems with "/" in the 
secret key... if there is one, you'll probably need to regenerate the key.
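If you do go the in-URI route, the secret has to be percent-escaped first. A small sketch (the secret shown is a made-up example):

```python
# URL-escape an AWS secret key before embedding it in an s3a:// URI.
from urllib.parse import quote

secret = "abc/def+ghi"            # a "/" in the raw secret would break the URI
escaped = quote(secret, safe="")  # "/" -> "%2F", "+" -> "%2B"
uri = f"s3a://AWSID:{escaped}@bucket/path"
print(uri)                        # s3a://AWSID:abc%2Fdef%2Bghi@bucket/path
```

Even escaped, the URI (secret included) still ends up in logs and the Spark UI, which is why this option is marked insecure.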

This is going to have to be fixed in the s3a implementation at some point, as 
it's needed for more than cross-account auth: once you switch to v4 AWS auth, 
you have to specify the appropriate S3 endpoint for your region. You can't just 
use S3 central; you need to choose S3 Frankfurt, S3 Seoul, etc., so a single 
set of s3a settings won't be able to work with data across regions.
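For reference, the per-region endpoint is set through the `fs.s3a.endpoint` property; a sketch for a v4-auth-only region such as Frankfurt (eu-central-1):

```shell
# Point s3a at the regional endpoint; required for v4-signature-only
# regions like eu-central-1. Only one endpoint can be set per s3a config,
# hence the cross-region limitation described above.
spark-submit \
  --conf spark.hadoop.fs.s3a.endpoint=s3.eu-central-1.amazonaws.com \
  your-app.jar
```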
