GitHub user harishreedharan opened a pull request:
https://github.com/apache/spark/pull/4688
[SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS
Currently, Spark apps running on secure YARN/HDFS cannot write data to HDFS
after 7 days, because delegation tokens cannot be renewed beyond that limit.
This means long-running Spark Streaming apps cannot run on secure YARN.
This commit adds basic functionality to fix this issue. In this patch:
- two new parameters are added, principal and keytab, which can be used to log in to a KDC
- the client logs in and then gets tokens to start the AM
- the keytab is copied to the staging directory
- the AM waits until 60% of the time remaining before the tokens expire has passed, and then logs in using the keytab
- each time that 60% point is reached, new tokens are created and sent to the executors
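The renewal timing in the list above can be sketched as follows. This is a minimal illustration, not code from the patch: the object `TokenRenewalSchedule` and its methods are hypothetical, and it assumes "60% of the time till expiry" means 60% of the time remaining between now and the token's expiry timestamp.

```scala
import java.util.concurrent.{ScheduledExecutorService, TimeUnit}

// Hypothetical helper (not part of this patch) illustrating the
// "log in again after 60% of the remaining token lifetime" policy.
object TokenRenewalSchedule {
  // Percentage of the remaining lifetime to wait before logging in again.
  val RenewalPercent: Long = 60L

  /** Milliseconds to wait from `nowMs` before renewing a token that expires at
    * `expiryMs`: 60% of the time remaining. Returns 0 if the token has expired. */
  def delayUntilRenewalMs(nowMs: Long, expiryMs: Long): Long =
    math.max(0L, (expiryMs - nowMs) * RenewalPercent / 100)

  /** Runs `renew` once on `scheduler`, after the computed delay. The callback is
    * expected to obtain fresh tokens and call this method again with their expiry. */
  def scheduleRenewal(scheduler: ScheduledExecutorService, expiryMs: Long)(renew: () => Unit): Unit = {
    val delay = delayUntilRenewalMs(System.currentTimeMillis(), expiryMs)
    scheduler.schedule(new Runnable { def run(): Unit = renew() }, delay, TimeUnit.MILLISECONDS)
    ()
  }
}
```

For a token that expires 24 hours from now, the sketch would schedule the next login about 14.4 hours out. Integer arithmetic is used for the percentage so the delay never drifts due to floating-point rounding.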
Currently, to avoid complicating the architecture, we set the keytab and
principal in the SparkHadoopUtil singleton and schedule a login. Once the
login completes, a callback is scheduled.
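The singleton-plus-callback arrangement described above might look roughly like this. It is a sketch under assumptions: `KerberosLoginState`, `setCredentials`, and `scheduleLogin` are hypothetical names, not the actual SparkHadoopUtil API. The login step is passed in as a function; a real Hadoop-based implementation might wrap `UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath)`, but that wiring is likewise an assumption.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}

// Hypothetical stand-in for the keytab/principal state this patch keeps in the
// SparkHadoopUtil singleton; names and signatures are illustrative only.
object KerberosLoginState {
  // (principal, keytab path); set once before any login is scheduled.
  @volatile private var credentials: Option[(String, String)] = None

  // Single daemon thread, so a pending re-login never keeps the JVM alive.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "kerberos-relogin")
        t.setDaemon(true)
        t
      }
    })

  def setCredentials(principal: String, keytabPath: String): Unit =
    credentials = Some((principal, keytabPath))

  /** After `delayMs`, runs `login(principal, keytab)`; once it returns, runs
    * `onLoggedIn` (for example, creating new tokens and sending them to executors). */
  def scheduleLogin(delayMs: Long)(login: (String, String) => Unit)(onLoggedIn: () => Unit): Unit = {
    val (principal, keytab) = credentials.getOrElse(
      throw new IllegalStateException("principal and keytab must be set before scheduling a login"))
    scheduler.schedule(new Runnable {
      def run(): Unit = {
        login(principal, keytab)
        onLoggedIn()
      }
    }, delayMs, TimeUnit.MILLISECONDS)
    ()
  }
}
```

Keeping the login function and the callback as parameters means the scheduling logic can be exercised in a unit test without a KDC, which may help with the testing item on the to-do list.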
This is being posted to gather feedback on the general implementation.
There are currently several things left to do:
- [ ] logging
- [ ] testing: I plan to test this manually soon. If you have ideas on how to add unit tests, please comment.
- [ ] add code to complain if these params are set in non-YARN cluster mode
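The last to-do item could take the shape of a small validation function. This sketch assumes "non-YARN" means any master that does not start with `yarn` (Spark 1.x masters such as `yarn-client` and `yarn-cluster` do). The rule that principal and keytab must be supplied together is an added assumption, and `KerberosArgsValidation.validate` is a hypothetical name, not an existing Spark option or API.

```scala
// Hypothetical check for the "complain outside YARN" to-do item; the names and the
// "both or neither" rule are illustrative assumptions, not Spark's actual behavior.
object KerberosArgsValidation {
  /** Left(error) if principal/keytab are supplied for a non-YARN master, or if only
    * one of the two is supplied; Right(()) otherwise. */
  def validate(master: String, principal: Option[String], keytab: Option[String]): Either[String, Unit] = {
    val isYarn = master.startsWith("yarn")
    (principal, keytab) match {
      case (None, None)                 => Right(())
      case (Some(_), Some(_)) if isYarn => Right(())
      case (Some(_), Some(_)) =>
        Left(s"principal and keytab are only supported on YARN, but master is '$master'")
      case _ =>
        Left("principal and keytab must be specified together")
    }
  }
}
```

Returning an `Either` rather than throwing leaves the caller (for example, argument parsing in the submit path) free to decide whether to fail or only warn.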
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/harishreedharan/spark kerberos-longrunning
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4688.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4688
----
commit 77914dd74b3a4af3501bda7a72c658ffcdd0682f
Author: Hari Shreedharan <[email protected]>
Date: 2015-01-30T19:14:35Z
WIP: Add kerberos principal and keytab to YARN client.
commit ccba5bc3e7ceceb9b1f15072888454b88d1a2322
Author: Hari Shreedharan <[email protected]>
Date: 2015-02-02T23:06:30Z
WIP: More changes wrt kerberos
commit 2b0d745ec7b76c3dd992660c24ddac556ba1de6a
Author: Hari Shreedharan <[email protected]>
Date: 2015-02-19T05:47:14Z
[SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS.
----