GitHub user themodernlife opened a pull request:
https://github.com/apache/spark/pull/17530
[SPARK-5158] Access kerberized HDFS from Spark standalone
## What changes were proposed in this pull request?
- Refactor `ConfigurableCredentialManager` and related
`CredentialProviders` so that they are no longer tied to YARN
- Set up credential renewal/updating from within the
`StandaloneSchedulerBackend`
- Ensure executors/drivers are able to find initial tokens for contacting
HDFS and renew them at regular intervals
The implementation does essentially the same thing as the YARN backend: the
keytab is copied to the driver/executors through an environment variable in
the `ApplicationDescription`.
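The renewal scheduling described above can be sketched roughly as below. This is a hedged illustration, not the patch's actual code: the class and method names (`TokenRenewer`, `nextRenewalDelayMs`, the 0.75 ratio) are hypothetical stand-ins for what `ConfigurableCredentialManager` and the scheduler backend do, and it is written in Java rather than Spark's Scala for self-containment.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of interval-based delegation-token renewal,
// loosely modeled on the YARN backend's behavior. Names here are
// illustrative, not Spark's real API.
public class TokenRenewer {
    // Schedule renewal at a fraction of the token's lifetime so fresh
    // tokens are obtained well before the old ones expire.
    static long nextRenewalDelayMs(long issueTimeMs, long expiryTimeMs,
                                   double ratio) {
        long lifetime = expiryTimeMs - issueTimeMs;
        return Math.max(0L, (long) (lifetime * ratio));
    }

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // Re-run the token-obtaining task (e.g. a re-login from the keytab
    // followed by fetching new HDFS delegation tokens) on a fixed delay.
    void start(Runnable obtainNewTokens, long delayMs) {
        scheduler.scheduleWithFixedDelay(
            obtainNewTokens, delayMs, delayMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // Token valid for 24h, renew at 75% of its lifetime -> 18h.
        long delay = nextRenewalDelayMs(0L, 24L * 3600 * 1000, 0.75);
        System.out.println(delay); // prints 64800000
    }
}
```

Driving renewal off a fraction of the token lifetime (rather than a fixed interval) is what lets the same logic work whether tokens last hours or days.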
## How was this patch tested?
https://github.com/themodernlife/spark-standalone-kerberos contains a
docker-compose environment with a KDC and Kerberized HDFS mini-cluster. The
README contains instructions for running the integration test script to see
credential refresh/updating occur. Credentials are set to update every 2
minutes or so.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/themodernlife/spark spark-5158
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17530.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17530
----
commit 62a6e20179dd63703d18de9784c8b3770077e968
Author: Ian Hummel <[email protected]>
Date: 2017-02-24T21:29:43Z
WIP
commit accfe0cebc645ed2b99aaded7629b93b56fcb7ea
Author: Ian Hummel <[email protected]>
Date: 2017-02-24T21:35:24Z
Add license header that somehow got removed
commit b8559b5895c81c871b1db00b75f038082b2dd4fb
Author: Ian Hummel <[email protected]>
Date: 2017-02-24T21:46:18Z
Fixup tests
commit 539cc6cf630e9429e7131e755d8e9fa12479cd0c
Author: Ian Hummel <[email protected]>
Date: 2017-02-26T01:01:12Z
WIP
commit 3f76281094493d63b6364fe38612e56f437c6a7c
Author: Ian Hummel <[email protected]>
Date: 2017-02-27T21:26:48Z
Push delegation token out to ExecutorRunner
commit 25e7639af248bba4f648d13f5dc76a4fe8bfca34
Author: Ian Hummel <[email protected]>
Date: 2017-02-28T21:21:10Z
More wip... probably borked
commit 847f6044d2fd0bf1af52d3d7c5d618c8e537e916
Author: Ian Hummel <[email protected]>
Date: 2017-03-02T16:48:45Z
Untested... make cluster mode work with standalone
commit 4689a55402f193199faf2dc2e2c6c4c904e34bf0
Author: Ian Hummel <[email protected]>
Date: 2017-03-07T16:35:51Z
Hadoop FileInputFormat is hardcoded to request delegation tokens with
renewer = yarn.resourcemanager.principal
commit 3e85aa5bfbaee2760d9eb3559d23546508b463d9
Author: Ian Hummel <[email protected]>
Date: 2017-03-07T20:59:21Z
Still need to sort out a few things, but overall much smaller patch-set
commit f743e6b207b7f71034fe617a402f54e0121b13a2
Author: Ian Hummel <[email protected]>
Date: 2017-03-08T17:06:14Z
WIP
commit 31c91dcec25718052ae5c775bfe1b41359e8840f
Author: Ian Hummel <[email protected]>
Date: 2017-03-08T18:14:48Z
WIP
commit 19644195af14c9b8a451609157b9d47f7251ced4
Author: Ian Hummel <[email protected]>
Date: 2017-03-08T22:15:41Z
Still something isn't working
commit b5bacf31e00243073e7311b768a13aec51c6b9db
Author: Ian Hummel <[email protected]>
Date: 2017-03-15T15:56:41Z
Merge master
commit 83f05014659e08a4cd8c9703941c98aaaba9eb31
Author: Ian Hummel <[email protected]>
Date: 2017-03-15T20:28:56Z
Actually use credential updater
commit 917b077ca1e05a9bb44bcb91c33ed64a1d1c364c
Author: Ian Hummel <[email protected]>
Date: 2017-04-04T14:38:18Z
Change order of configuration setting so that everything works
commit a4c22a92496271935a769313b09da1b8ae88107a
Author: Ian Hummel <[email protected]>
Date: 2017-04-04T14:39:44Z
Merge branch 'master' into spark-5158
* master: (164 commits)
[SPARK-20198][SQL] Remove the inconsistency in table/function name
conventions in SparkSession.Catalog APIs
[SPARK-20190][APP-ID] applications//jobs' in rest api,status should be
[running|s…
[SPARK-19825][R][ML] spark.ml R API for FPGrowth
[SPARK-20067][SQL] Unify and Clean Up Desc Commands Using Catalog
Interface
[SPARK-10364][SQL] Support Parquet logical type TIMESTAMP_MILLIS
[SPARK-19408][SQL] filter estimation on two columns of same table
[SPARK-20145] Fix range case insensitive bug in SQL
[SPARK-20194] Add support for partition pruning to in-memory catalog
[SPARK-19641][SQL] JSON schema inference in DROPMALFORMED mode produces
incorrect schema for non-array/object JSONs
[SPARK-19969][ML] Imputer doc and example
[SPARK-9002][CORE] KryoSerializer initialization does not include
'Array[Int]'
[MINOR][DOCS] Replace non-breaking space to normal spaces that breaks
rendering markdown
[SPARK-20166][SQL] Use XXX for ISO 8601 timezone instead of ZZ
(FastDateFormat specific) in CSV/JSON timeformat options
[SPARK-19985][ML] Fixed copy method for some ML Models
[SPARK-20159][SPARKR][SQL] Support all catalog API in R
[SPARK-20173][SQL][HIVE-THRIFTSERVER] Throw NullPointerException when
HiveThriftServer2 is shutdown
[SPARK-20123][BUILD] SPARK_HOME variable might have spaces in it(e.g.
$SPARK…
[SPARK-20143][SQL] DataType.fromJson should throw an exception with
better message
[SPARK-20186][SQL] BroadcastHint should use child's stats
[SPARK-19148][SQL][FOLLOW-UP] do not expose the external table concept in
Catalog
...
commit 16f9551ef1073160f13e9522600416d29388b85b
Author: Ian Hummel <[email protected]>
Date: 2017-04-04T15:48:09Z
Remove inadvertent file
commit 246c76a82554ee20ed31202a6b12d92a823d68a2
Author: Ian Hummel <[email protected]>
Date: 2017-04-04T16:39:23Z
Cleanup code
----