GitHub user harishreedharan opened a pull request:

    https://github.com/apache/spark/pull/5823

    [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS

    Take 2. Does the same thing as #4688, but fixes the Hadoop-1 build.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/harishreedharan/spark kerberos-longrunning

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5823.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5823
    
----
commit 77914dd74b3a4af3501bda7a72c658ffcdd0682f
Author: Hari Shreedharan <[email protected]>
Date:   2015-01-30T19:14:35Z

    WIP: Add kerberos principal and keytab to YARN client.

commit ccba5bc3e7ceceb9b1f15072888454b88d1a2322
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-02T23:06:30Z

    WIP: More changes wrt kerberos

commit 2b0d745ec7b76c3dd992660c24ddac556ba1de6a
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-19T05:47:14Z

    [SPARK-5342][YARN] Allow long running Spark apps to run on secure YARN/HDFS.
    
    Current Spark apps running on Secure YARN/HDFS would not be able to write data
    to HDFS after 7 days, since delegation tokens cannot be renewed beyond that. This
    means Spark Streaming apps will not be able to run on Secure YARN.
    
    This commit adds basic functionality to fix this issue. In this patch:
    - new parameters are added - principal and keytab, which can be used to log in to a KDC
    - the client logs in, and then gets tokens to start the AM
    - the keytab is copied to the staging directory
    - the AM waits for 60% of the time till expiry of the tokens and then logs in using the keytab
    - each time after 60% of the time, new tokens are created and sent to the executors
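
The 60% rule in the list above can be sketched as a small scheduling calculation. This is only an illustration and not the code in the patch: the function name `next_refresh_time` and the 7-day lifetime in the example are assumptions made here.

```python
from datetime import datetime, timedelta, timezone

# Taken from the commit message: act once 60% of the token lifetime has elapsed.
RENEWAL_PERCENT = 60


def next_refresh_time(issued_at, expires_at, percent=RENEWAL_PERCENT):
    """Return the instant at which `percent` of the token lifetime has elapsed.

    Hypothetical helper for illustration; the patch itself is Scala code
    running inside the YARN ApplicationMaster.
    """
    if expires_at <= issued_at:
        raise ValueError("token must expire after it is issued")
    lifetime = expires_at - issued_at
    # Integer arithmetic on timedelta avoids floating-point rounding.
    return issued_at + lifetime * percent // 100


if __name__ == "__main__":
    issued = datetime(2015, 2, 19, tzinfo=timezone.utc)
    expires = issued + timedelta(days=7)  # assumed 7-day token lifetime
    print(next_refresh_time(issued, expires))
```

Refreshing well before expiry leaves a margin for a failed login to be retried while the current tokens are still valid.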

commit f8fe694efd117d707313748c02cef42240a3aec7
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-19T18:46:46Z

    Handle None if keytab-login is not scheduled.

commit bcfc3747e14ce6f1a242ba9cd71c03017b77f36e
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-20T07:08:15Z

    Fix Hadoop-1 build by adding no-op methods in SparkHadoopUtil, with impl in YarnSparkHadoopUtil.

commit d282d7a69d563604776f6985760bd2c367725d0d
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-20T07:41:00Z

    Fix ClientSuite to set YARN mode, so that the correct class is used in tests.

commit 41efde0ce0523f53f5d27c85e5c760b6d20fe0d6
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-22T02:00:55Z

    Merge branch 'master' into kerberos-longrunning

commit fb27f46f2f2b06cffe15a4728e828371f92e17a5
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-24T01:38:51Z

    Make sure principal and keytab are set before CoarseGrainedSchedulerBackend is started.
    Also schedule re-logins in CoarseGrainedSchedulerBackend#start()

commit 8c6928a1cf966136847e6035a0610781b17e2769
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-24T03:29:46Z

    Fix issue caused by direct creation of Actor object.

commit d79b2b98532b1b5133026fe095104c5dc5f52af9
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-24T19:36:00Z

    Make sure correct credentials are passed to FileSystem#addDelegationTokens()

commit 0985b4e2fb1c51247eb993c28718c9b86f242563
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-27T22:00:29Z

    Write tokens to HDFS and read them back when required, rather than sending them over the wire.

commit b4cb917d8ed5e06b3470f43ec221dd7ecdba7ec8
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-28T00:04:07Z

    Send keytab to AM via DistributedCache rather than directly via HDFS

commit 5c11c3e348fecdd070f5ab471314bce94bb4b66e
Author: Hari Shreedharan <[email protected]>
Date:   2015-02-28T06:28:39Z

    Move tests to YarnSparkHadoopUtil to fix compile issues.

commit f6954dab2c1d7ebc614093bbde80ae1ae59bf97e
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-05T20:30:06Z

    Got rid of Akka communication to renew; instead, the executors check a known file's
    modification time to read the credentials.
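
The polling idea in this commit can be sketched as follows. This is a minimal illustration, not the patch's implementation: it uses the local filesystem as a stand-in for HDFS, and the class name `CredentialsFileWatcher` and its `poll` method are hypothetical.

```python
import os
from typing import Optional


class CredentialsFileWatcher:
    """Re-read a known file only when its modification time has advanced.

    Hypothetical stand-in: the real executors would ask HDFS for the
    file status instead of calling os.stat on a local path.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self._last_mtime_ns: Optional[int] = None

    def poll(self) -> Optional[bytes]:
        """Return the file contents if they are new, otherwise None."""
        try:
            mtime_ns = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None  # nothing has been written yet
        if self._last_mtime_ns is not None and mtime_ns <= self._last_mtime_ns:
            return None  # unchanged since the last successful read
        with open(self.path, "rb") as f:
            data = f.read()
        self._last_mtime_ns = mtime_ns
        return data
```

An executor would call `poll()` on a timer and load new credentials whenever it returns data, so no message from the driver is needed to trigger a refresh.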

commit f0f54cba1e579a0ee320dd27d0683ff4cd458cc4
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-05T21:19:09Z

    Be more defensive when updating the credentials file.

commit af6d5f0b2ca70f46507c98f1c930b21591a35e59
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-05T23:41:22Z

    Cleaning up files where changes weren't required.

commit 2debcea367aa7e54d49af2109b47568ba61f829b
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-06T08:49:38Z

    Change the file structure for credentials files. I will push a followup patch which
    adds a cleanup mechanism for old credentials files. The credentials files are small
    and few enough for it not to cause issues on HDFS.

commit f4fd711f44c24224178ca1ad9c7b4529bf62fa47
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-06T19:26:11Z

    Fix SparkConf usage.

commit 9ef5f1b7558731762435524636ff5bc87552a355
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-07T00:41:15Z

    Added explanation of how the credentials refresh works, some other minor fixes.

commit 55522e3733de3d6f1a9dc7d1206a6fe10f9fc8e9
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-07T01:28:23Z

    Fix failure caused by Preconditions ambiguity.

commit 0de27eeb4114cfaf10dc7f58b900086bd4b0af24
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-23T02:40:55Z

    Merge branch 'master' into kerberos-longrunning
    
    Conflicts:
        bin/utils.sh
        core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala

commit 42813b4d852e590aea742898d8603113f4bdb675
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-23T04:20:10Z

    Remove utils.sh, which was re-added due to merge with master.

commit fa233bd5c22078cc75f0ab5163974420e438e47f
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-24T19:43:24Z

    Adding logging, fixing minor formatting and ordering issues.

commit 62c45ce494cb242698db9a53c3e6068c86540e0d
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-24T23:59:11Z

    Relogin from keytab periodically.

commit 61b2b279e04fcbdde5c3206164a1605fe357abb3
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-25T20:54:41Z

    Account for AM restarts by making sure lastSuffix is read from the files on HDFS.

commit 2f9975c6169c11b4e33280b8604ddf3f87935618
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-26T00:20:10Z

    Ensure new tokens are written out immediately on AM restart. Also, pick up the latest suffix from HDFS if the AM is restarted.
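
Recovering the latest suffix after an AM restart, as the two commits above describe, might look like the sketch below. The file naming scheme (`credentials-<n>`) and the function name `latest_suffix` are assumptions made for illustration; the commit messages do not state the actual layout.

```python
import re
from typing import Iterable


def latest_suffix(filenames: Iterable[str], prefix: str = "credentials-") -> int:
    """Return the highest numeric suffix among matching file names, or 0.

    The `prefix` and the numeric-suffix convention are hypothetical;
    in the patch the names would come from a directory listing on HDFS.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    best = 0
    for name in filenames:
        match = pattern.match(name)
        if match:
            # Compare numerically so that 10 ranks above 9.
            best = max(best, int(match.group(1)))
    return best
```

A restarted AM could then write its next file with `latest_suffix(...) + 1`, so it never reuses a number that executors have already read.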

commit f74303c89c8fe542122680077e2a471d41b809c2
Author: Hari Shreedharan <[email protected]>
Date:   2015-03-27T00:50:56Z

    Move the new logic into specialized classes. Add cleanup for old credentials files.
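
One way a cleanup pass for old credentials files could select its targets is sketched below. The age-based policy, the function name `stale_credentials_files`, and the rule of always keeping the newest file are assumptions for illustration; the commit message does not say which policy the patch uses.

```python
from typing import Dict, List


def stale_credentials_files(
    mtimes: Dict[str, float], now: float, max_age_seconds: float
) -> List[str]:
    """Return the file names that are safe to delete, in sorted order.

    `mtimes` maps a file name to its modification time in seconds.
    The newest file is always kept, even if it is older than the limit,
    so that readers are never left without any credentials file.
    """
    if not mtimes:
        return []
    newest = max(mtimes, key=lambda name: mtimes[name])
    return sorted(
        name
        for name, mtime in mtimes.items()
        if name != newest and now - mtime > max_age_seconds
    )
```

Keeping the selection separate from the deletion makes the policy easy to test without touching a filesystem.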

commit bcd11f92ed75bfff5dda1f38d85bd757c07bb122
Author: Hari Shreedharan <[email protected]>
Date:   2015-04-08T19:58:24Z

    Refactor AM and Executor token update code into separate classes, also send tokens via akka on executor startup.

commit 7f1bc58affe3c88762c32f9641d0f4671a408b8a
Author: Hari Shreedharan <[email protected]>
Date:   2015-04-09T20:52:48Z

    Minor fixes, cleanup.

commit 0e9507e37adbd552bb2039bc8bdd5c8fa85e03b0
Author: Hari Shreedharan <[email protected]>
Date:   2015-04-09T22:31:46Z

    Merge branch 'master' into kerberos-longrunning
    
    Conflicts:
        core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
        core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
        yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala
        yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
