On 27 Sep 2016, at 15:53, Daniel Siegmann <dsiegm...@securityscorecard.io> wrote:
> I am running Spark on Amazon EMR and writing data to an S3 bucket.
> However, the data is read from an S3 bucket in a separate AWS account.
> Setting the fs.s3a.access.key and fs.s3a.secret.key values is sufficient
> to get access to the other account (using the s3a protocol), however I
> then won't have access to the S3 bucket in the EMR cluster's AWS account.
> Is there any way for Spark to access S3 buckets in multiple accounts?
> If not, is there any best practice for how to work around this?

There are two ways to do this without changing permissions:

1. Different implementations: use s3a for one bucket and s3n for the other, and give each its own secrets (first sketch below).

2. Insecure: put the secrets in the URI, s3a://AWSID:escaped-secret@bucket/path (second sketch below). This leaks your secrets throughout the logs, and has problems with "/" in the password; if there is one, you'll probably need to regenerate the password.

This is going to have to be fixed in the s3a implementation at some point, as it's not only needed for cross-user auth: once you switch to v4 AWS auth, you have to specify the appropriate s3 endpoint for your region (third sketch below). You can't just use s3 central, but need to choose s3 frankfurt, s3 seoul, etc., so you won't be able to work with data across regions.
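Sketch of option 1 (untested; the bucket names and env var names are made up, but fs.s3a.access.key / fs.s3a.secret.key and fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey are the actual Hadoop property names for the two connectors):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("cross-account-s3").getOrCreate()
  val conf = spark.sparkContext.hadoopConfiguration

  // s3a gets the other account's keys...
  conf.set("fs.s3a.access.key", sys.env("OTHER_ACCOUNT_ACCESS_KEY"))
  conf.set("fs.s3a.secret.key", sys.env("OTHER_ACCOUNT_SECRET_KEY"))

  // ...while s3n keeps the EMR cluster account's keys
  conf.set("fs.s3n.awsAccessKeyId", sys.env("EMR_ACCOUNT_ACCESS_KEY"))
  conf.set("fs.s3n.awsSecretAccessKey", sys.env("EMR_ACCOUNT_SECRET_KEY"))

  // read cross-account via s3a, write to your own bucket via s3n
  val df = spark.read.parquet("s3a://other-account-bucket/input/")
  df.write.parquet("s3n://emr-cluster-bucket/output/")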
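Sketch of option 2, reusing the SparkSession from the first sketch. URLEncoder takes care of escaping "/" and "+" in the secret, though as I said, it can still go wrong, and the secret will show up anywhere the path gets logged:

  import java.net.URLEncoder

  // inline credentials: insecure, only for when nothing else works
  val id = sys.env("OTHER_ACCOUNT_ACCESS_KEY")
  val escapedSecret =
    URLEncoder.encode(sys.env("OTHER_ACCOUNT_SECRET_KEY"), "UTF-8")
  val df = spark.read.parquet(s"s3a://$id:$escapedSecret@other-account-bucket/input/")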
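And the endpoint setting I mean is fs.s3a.endpoint, set here on the same Configuration object as in the first sketch, e.g. for a bucket in Frankfurt. It's a single value per configuration, which is why one job can't span two v4-auth regions:

  // one endpoint per configuration: no cross-region access with v4 auth
  conf.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")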