steveloughran opened a new pull request #33064:
URL: https://github.com/apache/spark/pull/33064


   
   This patches the Hadoop configuration so that fs.s3a.endpoint is set to
   s3.amazonaws.com if neither it nor fs.s3a.endpoint.region is set.
   
   This prevents S3A filesystem creation from failing with the error
   "Unable to find a region via the region provider chain."
   in some non-EC2 deployments.
   
   See: HADOOP-17771.
   
   
   
   ### What changes were proposed in this pull request?
   
   When Spark options are propagated to the Hadoop configuration
   in SparkHadoopUtil, the fs.s3a.endpoint value is set to
   "s3.amazonaws.com" if it is unset and no explicit region
   is set in fs.s3a.endpoint.region.
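
The rule above can be sketched roughly as follows. This is an illustration only, not the code in this patch: a plain Python dict stands in for the Hadoop configuration, the function name `apply_s3a_endpoint_default` is hypothetical, and treating an empty endpoint string as "unset" is an assumption.

```python
# Illustrative sketch only: a plain dict stands in for the Hadoop configuration.
S3A_ENDPOINT = "fs.s3a.endpoint"
S3A_ENDPOINT_REGION = "fs.s3a.endpoint.region"
DEFAULT_ENDPOINT = "s3.amazonaws.com"


def apply_s3a_endpoint_default(hadoop_conf):
    """Return a copy of hadoop_conf with the central endpoint filled in
    when neither an endpoint nor an explicit region is configured.

    Hypothetical helper, not a Spark or Hadoop API.
    """
    patched = dict(hadoop_conf)
    # Assumption: a missing or empty endpoint both count as "unset".
    endpoint_unset = not patched.get(S3A_ENDPOINT)
    # Assumption: the region counts as "unset" only when the key is absent.
    region_unset = patched.get(S3A_ENDPOINT_REGION) is None
    if endpoint_unset and region_unset:
        patched[S3A_ENDPOINT] = DEFAULT_ENDPOINT
    return patched


# Nothing configured: the default endpoint is filled in.
no_settings = apply_s3a_endpoint_default({})
# An explicit region is present: the configuration is left untouched.
region_only = apply_s3a_endpoint_default({S3A_ENDPOINT_REGION: "eu-west-1"})
```

Leaving the configuration alone whenever either option is present means explicit user settings are never overridden.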
   
   ### Why are the changes needed?
   
   A regression in Hadoop 3.3.1 has surfaced which causes S3A filesystem
   instantiation to fail outside EC2 deployments if the host has neither
   a CLI configuration in ~/.aws/config declaring the region nor
   the `AWS_REGION` environment variable set.
   
   HADOOP-17771 fixes this in Hadoop 3.3.2+, but
   this Spark patch will correct the behavior when running
   Spark with the 3.3.1 artifacts.
   
   The change is harmless with older Hadoop versions and compatible
   with Hadoop releases containing the HADOOP-17771
   fix.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   New tests verify the propagation logic from the Spark conf to the Hadoop conf.

