parthchandra opened a new pull request, #37558:
URL: https://github.com/apache/spark/pull/37558

   
   
   ### What changes were proposed in this pull request?
   The PR implements a mechanism to obtain credentials for a cloud service such as 
AWS from an external credentials provider and to share those credentials between 
the driver and the executors.
   
   There are two commits. The first implements the feature using the same 
mechanism as Kerberos delegation tokens; the second refactors the code and 
merges the two implementations. The commits are kept separate to make the 
changes easier to follow.
   
   ### Why are the changes needed?
   In large-scale deployments of Spark that access data from cloud services such 
as AWS/S3, every executor in every Spark job needs access to credentials. These 
can be provided on the command line when the job is launched (not entirely 
secure), but if the job runs for a long time, the credentials may expire and 
need to be renewed. If every executor fetches new credentials from the external 
service independently, a thundering-herd situation could disrupt the credentials 
provider service. This PR implements a mechanism whereby the credentials are 
obtained once at startup, renewed again shortly before expiry, and distributed 
to the executors.
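The renewal pattern described above (fetch once on the driver, refresh before expiry, push to executors) can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class and method names (`CredentialRenewer`, `TimedCredential`, the `provider` and `distributor` callbacks) are hypothetical, and the 75%-of-lifetime renewal point mirrors the convention used for Kerberos delegation tokens.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical driver-side renewer: one fetch for the whole application,
// then a scheduled refresh before each credential expires. In Spark the
// "distributor" step would correspond to sending the new credential to
// the executors (as is done for delegation tokens).
public class CredentialRenewer {

    // A credential paired with its expiry time (illustrative record).
    public record TimedCredential(String token, Instant expiry) {}

    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Supplier<TimedCredential> provider;     // external provider call
    private final Consumer<TimedCredential> distributor;  // e.g. push to executors

    public CredentialRenewer(Supplier<TimedCredential> provider,
                             Consumer<TimedCredential> distributor) {
        this.provider = provider;
        this.distributor = distributor;
    }

    public void start() {
        renew();
    }

    private void renew() {
        // Single call to the external service, regardless of executor count,
        // which avoids a thundering herd against the credentials provider.
        TimedCredential cred = provider.get();
        distributor.accept(cred);

        // Schedule the next renewal at ~75% of the remaining lifetime.
        long lifetimeMs = Duration.between(Instant.now(), cred.expiry()).toMillis();
        long delayMs = Math.max((long) (lifetimeMs * 0.75), 0L);
        scheduler.schedule(this::renew, delayMs, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

Executors would then read the latest credential from whatever channel the distributor writes to, instead of each contacting the provider themselves.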
   
   ### Does this PR introduce _any_ user-facing change?
   There are new configuration parameters documented in the README.
   
   ### How was this patch tested?
   Unit tests and manual testing in a user environment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

