steveloughran opened a new pull request #33064: URL: https://github.com/apache/spark/pull/33064
This patches the Hadoop configuration so that `fs.s3a.endpoint` is set to `s3.amazonaws.com` if neither it nor `fs.s3a.endpoint.region` is set. This stops S3A filesystem creation from failing with the error "Unable to find a region via the region provider chain." in some non-EC2 deployments. See: HADOOP-17771.

### What changes were proposed in this pull request?

When Spark options are propagated to the Hadoop configuration in `SparkHadoopUtil`, the `fs.s3a.endpoint` value is set to "s3.amazonaws.com" if it is unset and no explicit region is set in `fs.s3a.endpoint.region`.

### Why are the changes needed?

A regression has surfaced in Hadoop 3.3.1 which causes S3A filesystem instantiation to fail outside EC2 deployments if the host has neither a CLI configuration in `~/.aws/config` declaring the region nor the `AWS_REGION` environment variable set.

HADOOP-17771 fixes this in Hadoop 3.3.2+, but this Spark patch will correct the behavior when running Spark with the 3.3.1 artifacts. It is harmless for older versions and compatible with Hadoop releases containing the HADOOP-17771 fix.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

New tests to verify the propagation logic from the Spark conf to the Hadoop conf.
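
For readers who want the defaulting rule in concrete form, here is a minimal sketch of the logic described under "What changes were proposed". It is not the code in this patch: it uses a plain `java.util.Map` as a stand-in for the Hadoop `Configuration`, and the class and method names (`S3AEndpointDefault`, `applyDefaultEndpoint`) are hypothetical. Treating an empty endpoint string the same as an unset one is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a Map stands in for the Hadoop Configuration object. */
public class S3AEndpointDefault {
    static final String ENDPOINT_KEY = "fs.s3a.endpoint";
    static final String REGION_KEY = "fs.s3a.endpoint.region";
    static final String CENTRAL_ENDPOINT = "s3.amazonaws.com";

    /**
     * Sets the endpoint to the central S3 endpoint only when neither the
     * endpoint nor the region has been configured. Existing values are
     * never overwritten.
     */
    static void applyDefaultEndpoint(Map<String, String> conf) {
        String endpoint = conf.get(ENDPOINT_KEY);
        // Assumption: an empty endpoint string counts as "unset".
        boolean endpointUnset = endpoint == null || endpoint.isEmpty();
        boolean regionUnset = conf.get(REGION_KEY) == null;
        if (endpointUnset && regionUnset) {
            conf.put(ENDPOINT_KEY, CENTRAL_ENDPOINT);
        }
    }

    public static void main(String[] args) {
        // Case 1: nothing configured, so the default endpoint is applied.
        Map<String, String> empty = new HashMap<>();
        applyDefaultEndpoint(empty);
        System.out.println(empty.get(ENDPOINT_KEY));

        // Case 2: an explicit region is present, so the endpoint stays unset.
        Map<String, String> withRegion = new HashMap<>();
        withRegion.put(REGION_KEY, "eu-west-1");
        applyDefaultEndpoint(withRegion);
        System.out.println(withRegion.containsKey(ENDPOINT_KEY));
    }
}
```

Because the rule only fills in a value when both keys are absent, any deployment that already sets an endpoint or a region sees no change, which matches the claim that the patch is harmless for older Hadoop versions.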
