GitHub user harishreedharan opened a pull request:

    https://github.com/apache/spark/pull/4688

    [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS

    Spark apps currently running on secure YARN/HDFS cannot write data to HDFS
    after 7 days, since delegation tokens cannot be renewed beyond that. This
    means long-running apps such as Spark Streaming jobs cannot keep running on
    secure YARN.
    
    This commit adds basic functionality to fix this issue. In this patch:
    - two new parameters are added, principal and keytab, which can be used to
      log in to a KDC
    - the client logs in and then gets tokens to start the AM
    - the keytab is copied to the staging directory
    - the AM waits for 60% of the time until the tokens expire and then logs
      in using the keytab
    - each time 60% of the time until expiry has elapsed, new tokens are
      created and sent to the executors
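
The 60% rule in the list above comes down to a small piece of arithmetic. The following is only a sketch of that calculation, not code from the patch: the function name `next_login_delay_ms` and the constant `RENEWAL_PERCENT` are hypothetical, and the 7-day figure in the usage note is used purely as an example expiry.

```python
RENEWAL_PERCENT = 60  # from the description: act after 60% of the time till expiry


def next_login_delay_ms(expiry_ms, now_ms, percent=RENEWAL_PERCENT):
    """Return how long to wait (in ms) before logging in again from the keytab.

    Hypothetical helper: expiry_ms and now_ms are epoch timestamps in
    milliseconds. Integer arithmetic is used so the result is exact.
    """
    remaining_ms = expiry_ms - now_ms
    if remaining_ms <= 0:
        # Tokens are already expired (or about to be): log in immediately.
        return 0
    return remaining_ms * percent // 100


SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000
delay_ms = next_login_delay_ms(expiry_ms=SEVEN_DAYS_MS, now_ms=0)
```

For tokens that expire 7 days from now, this gives a delay of 4.2 days, which leaves a 2.8-day margin for the new tokens to reach the executors before the old ones stop working.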
    
    Currently, to avoid complicating the architecture, we set the keytab and
    principal in the SparkHadoopUtil singleton and schedule a login. Once the
    login is completed, a callback is scheduled.
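
One way to picture the schedule-login-then-callback loop described above is the sketch below. It is an illustration under assumptions, not the patch itself: `LoginScheduler` and its `login`, `on_tokens` and `schedule` parameters are hypothetical names, and the scheduler is injected so the loop can be exercised without real timers or a real KDC.

```python
class LoginScheduler:
    """Hypothetical stand-in for a singleton that holds principal and keytab."""

    def __init__(self, principal, keytab, login, on_tokens, schedule):
        self.principal = principal
        self.keytab = keytab
        # login(principal, keytab, now_ms) -> expiry_ms of the freshly created tokens
        self._login = login
        # on_tokens(expiry_ms) is the callback run after each completed login,
        # e.g. to hand the new tokens to the executors
        self._on_tokens = on_tokens
        # schedule(delay_ms, fn) arranges for fn() to run after delay_ms
        self._schedule = schedule

    def start(self, now_ms, expiry_ms):
        # Wait for 60% of the time until expiry, then log in again.
        delay_ms = max(0, expiry_ms - now_ms) * 60 // 100
        self._schedule(delay_ms, lambda: self._relogin(now_ms + delay_ms))

    def _relogin(self, now_ms):
        expiry_ms = self._login(self.principal, self.keytab, now_ms)
        self._on_tokens(expiry_ms)
        # Schedule the next login relative to the new expiry.
        self.start(now_ms, expiry_ms)
```

In a long-running process, `schedule` could be backed by something like `threading.Timer`; passing it in keeps the loop itself free of threading concerns and easy to test.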
    
    This is being posted so I can gather feedback on the general
    implementation.
    
    There are still several things to do:
    - [ ] logging
    - [ ] testing - I plan to manually test this soon. If you have ideas on how
      to add unit tests, please comment.
    - [ ] add code to ensure that we complain if these params are set in
      non-YARN cluster mode
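
The last item could take a shape like the sketch below. This is a guess at the intent, not the planned implementation: `check_kerberos_params` is a hypothetical name, and the sketch assumes that "non-YARN" means any master string that does not start with `yarn`.

```python
def check_kerberos_params(master, principal=None, keytab=None):
    """Raise ValueError if principal or keytab is given for a non-YARN master.

    Hypothetical check: treats any master string that starts with "yarn"
    as a YARN deployment and everything else as non-YARN.
    """
    if (principal or keytab) and not master.startswith("yarn"):
        raise ValueError(
            "principal and keytab are only supported when running on YARN, "
            "got master=%r" % master
        )


# Accepted: the parameters are set and the master is YARN.
check_kerberos_params("yarn-cluster", "user@EXAMPLE.COM", "/path/to/user.keytab")
```

Failing fast at submission time would surface the misconfiguration immediately rather than days later, when the tokens expire.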

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harishreedharan/spark kerberos-longrunning

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4688.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4688
    
----
commit 77914dd74b3a4af3501bda7a72c658ffcdd0682f
Author: Hari Shreedharan <[email protected]>
Date:   2015-01-30T19:14:35Z

    WIP: Add kerberos principal and keytab to YARN client.

commit ccba5bc3e7ceceb9b1f15072888454b88d1a2322
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-02T23:06:30Z

    WIP: More changes wrt kerberos

commit 2b0d745ec7b76c3dd992660c24ddac556ba1de6a
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-19T05:47:14Z

    [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS.
    
    Spark apps currently running on secure YARN/HDFS cannot write data to HDFS
    after 7 days, since delegation tokens cannot be renewed beyond that. This
    means long-running apps such as Spark Streaming jobs cannot keep running on
    secure YARN.
    
    This commit adds basic functionality to fix this issue. In this patch:
    - two new parameters are added, principal and keytab, which can be used to
      log in to a KDC
    - the client logs in and then gets tokens to start the AM
    - the keytab is copied to the staging directory
    - the AM waits for 60% of the time until the tokens expire and then logs
      in using the keytab
    - each time 60% of the time until expiry has elapsed, new tokens are
      created and sent to the executors

----

