Thanks for the input, Steven and Hariharan. I think this ended up being a
combination of a misconfiguration of the credential providers I was using
*and* using the wrong set of credentials for the test data I was trying to
access.
I was able to get this working with both Hadoop 2.8 and 3.1 by pulling
together the right configuration and credentials.
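For anyone who hits the same thing: on Hadoop 2.8+ you can pin S3A to a single credentials provider so it can't silently fall through a provider chain to the wrong credentials. A minimal sketch (the app name is made up, the provider shown is just one reasonable choice, and I'm assuming the keys live in environment variables; this isn't necessarily what my exact setup looked like):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: pin S3A to one credentials provider so it can't fall through
// a chain of providers and pick up the wrong credentials.
// SimpleAWSCredentialsProvider reads fs.s3a.access.key / fs.s3a.secret.key.
val spark = SparkSession.builder()
  .appName("s3a-credentials-check") // hypothetical app name
  .config("spark.hadoop.fs.s3a.aws.credentials.provider",
    "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
  .config("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  .config("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
  .getOrCreate()
```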
If you're using Hadoop 2.7 or below, you may also need the following
Hadoop settings:
```
fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
fs.AbstractFileSystem.s3.impl=org.apache.hadoop.fs.s3a.S3A
fs.AbstractFileSystem.s3a.impl=org.apache.hadoop.fs.s3a.S3A
```
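In case it's useful, the same settings can also be applied programmatically on the SparkContext's Hadoop configuration rather than via config files (a sketch; it just mirrors the properties above):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val hadoopConf = spark.sparkContext.hadoopConfiguration

// Mirror the settings above: map both s3:// and s3a:// URIs onto the
// S3A implementations for the FileSystem and AbstractFileSystem APIs.
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hadoopConf.set("fs.AbstractFileSystem.s3.impl", "org.apache.hadoop.fs.s3a.S3A")
hadoopConf.set("fs.AbstractFileSystem.s3a.impl", "org.apache.hadoop.fs.s3a.S3A")
```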
To successfully read from S3 using s3a, I've also had to set
```
spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
```
in addition to `spark.hadoop.fs.s3a.access.key` and
`spark.hadoop.fs.s3a.secret.key`. I've also needed to ensure Spark has
access to the AWS SDK jar; I downloaded it and made sure it was on
Spark's classpath.
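If you build with sbt, an alternative to downloading the jar by hand is to declare `hadoop-aws` as a dependency, which pulls in a matching `aws-java-sdk` transitively. A build.sbt sketch (the version below is illustrative and must match your Hadoop build):

```scala
// build.sbt sketch: hadoop-aws brings in the matching aws-java-sdk
// transitively, so Spark jobs can use the s3a:// scheme.
// The version is illustrative; it must match your Hadoop version.
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.8.5"
```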
Hello,
I'm attempting to run Spark within a Docker container with the hope of
eventually running Spark on Kubernetes. Nearly all the data we currently
process with Spark is stored in S3, so I need to be able to interface with
it using the S3A filesystem.
I feel like I've gotten close to getting this to work.
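For reference, the kind of access I'm ultimately after looks like this (the bucket and path are placeholders, not our real data):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("s3a-read-test") // hypothetical app name
  .getOrCreate()

// Hypothetical bucket and key, just to show the s3a:// scheme in use.
val df = spark.read.parquet("s3a://my-bucket/path/to/data")
df.show(5)
```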