parthchandra opened a new pull request, #37558: URL: https://github.com/apache/spark/pull/37558
### What changes were proposed in this pull request?
This PR implements a mechanism to obtain credentials for a cloud service such as AWS from an external credentials provider, and to share those credentials between the driver and the executors. There are two commits: the first implements the feature using the same mechanism as Kerberos delegation tokens; the second refactors the code and merges the two implementations. The commits are kept separate to make the changes easier to follow.

### Why are the changes needed?
In large-scale Spark deployments that access data from cloud services such as AWS/S3, every executor in every Spark job needs access to credentials. These can be supplied on the command line when the job is launched (not entirely secure), but if the job runs for a long time the credentials may expire and need to be renewed. If every executor independently requests new credentials from the external service, the resulting thundering-herd load could disrupt the credentials provider service. This PR obtains the credentials once at startup, renews them again before they expire, and distributes them to the executors.

### Does this PR introduce _any_ user-facing change?
Yes. There are new configuration parameters, documented in the README.

### How was this patch tested?
Unit tests, plus manual testing in a user environment.
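The renewal flow described above (fetch once at startup, renew shortly before expiry, distribute to executors) can be sketched outside of Spark roughly as follows. Everything here (`Credentials`, `CredentialsProvider`, `CredentialsManager`, the `distribute` callback) is a hypothetical illustration, not the PR's actual API; in Spark the distribution step would go over the driver–executor RPC channel, similar to how Kerberos delegation tokens are renewed and broadcast by `HadoopDelegationTokenManager`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical credential record: an opaque token plus its expiry timestamp.
record Credentials(String token, long expiresAtMillis) {}

// Hypothetical provider interface, analogous in spirit to Spark's
// delegation token providers: fetch fresh credentials from an external
// service and report when they expire.
interface CredentialsProvider {
  Credentials obtainCredentials();
}

// Driver-side manager: fetch once at startup, then schedule the next
// renewal at a fraction of the remaining lifetime and push the result to
// executors via the supplied callback (an RPC broadcast in real Spark).
// Because only the driver talks to the provider, the executors never
// stampede the external credentials service.
class CredentialsManager {
  private final CredentialsProvider provider;
  private final Consumer<Credentials> distribute;
  private final double renewalRatio;
  private final AtomicReference<Credentials> current = new AtomicReference<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  CredentialsManager(CredentialsProvider provider,
                     Consumer<Credentials> distribute,
                     double renewalRatio) {
    this.provider = provider;
    this.distribute = distribute;
    this.renewalRatio = renewalRatio;
  }

  void start() {
    renewAndSchedule();
  }

  private void renewAndSchedule() {
    Credentials creds = provider.obtainCredentials();
    current.set(creds);
    distribute.accept(creds);
    // Renew before expiry, not at expiry, so executors never hold
    // credentials that have already lapsed.
    long delay = (long) ((creds.expiresAtMillis() - System.currentTimeMillis())
        * renewalRatio);
    scheduler.schedule(this::renewAndSchedule,
        Math.max(delay, 0L), TimeUnit.MILLISECONDS);
  }

  Credentials currentCredentials() {
    return current.get();
  }

  void stop() {
    scheduler.shutdownNow();
  }
}
```

The renewal ratio (e.g. 0.75 of the remaining lifetime) is the usual trade-off: renewing too close to expiry risks executors holding stale credentials, while renewing too often puts unnecessary load on the provider.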
