GitHub user themodernlife opened a pull request:
https://github.com/apache/spark/pull/17530
[SPARK-5158] Access kerberized HDFS from Spark standalone
## What changes were proposed in this pull request?
- Refactor `ConfigurableCredentialManager` and related
`CredentialProviders` so that they are no longer tied to YARN
- Set up credential renewal/updating from within the
`StandaloneSchedulerBackend`
- Ensure executors/drivers are able to find initial tokens for contacting
HDFS and renew them at regular intervals
The implementation does essentially the same thing as the YARN backend: the
keytab is copied to the driver/executors through an environment variable in
the `ApplicationDescription`.
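The renewal scheduling described above can be sketched roughly as below. This is a hedged illustration, not the patch's actual code: the class and method names (`TokenRenewer`, `nextRenewalDelayMs`, the 0.75 ratio) are hypothetical stand-ins for what `ConfigurableCredentialManager` and the scheduler backend do, and it is written in Java rather than Spark's Scala for self-containment.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of interval-based delegation-token renewal,
// loosely modeled on the YARN backend's behavior. Names here are
// illustrative, not Spark's real API.
public class TokenRenewer {
    // Schedule renewal at a fraction of the token's lifetime so fresh
    // tokens are obtained well before the old ones expire.
    static long nextRenewalDelayMs(long issueTimeMs, long expiryTimeMs,
                                   double ratio) {
        long lifetime = expiryTimeMs - issueTimeMs;
        return Math.max(0L, (long) (lifetime * ratio));
    }

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    // Re-run the token-obtaining task (e.g. a re-login from the keytab
    // followed by fetching new HDFS delegation tokens) on a fixed delay.
    void start(Runnable obtainNewTokens, long delayMs) {
        scheduler.scheduleWithFixedDelay(
            obtainNewTokens, delayMs, delayMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // Token valid for 24h, renew at 75% of its lifetime -> 18h.
        long delay = nextRenewalDelayMs(0L, 24L * 3600 * 1000, 0.75);
        System.out.println(delay); // prints 64800000
    }
}
```

Driving renewal off a fraction of the token lifetime (rather than a fixed interval) is what lets the same logic work whether tokens last hours or days.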
## How was this patch tested?
https://github.com/themodernlife/spark-standalone-kerberos contains a
docker-compose environment with a KDC and Kerberized HDFS mini-cluster. The
README contains instructions for running the integration test script to see
credential refresh/updating occur. Credentials are set to update every 2
minutes or so.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/themodernlife/spark spark-5158
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17530.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17530
----
commit 62a6e20179dd63703d18de9784c8b3770077e968
Author: Ian Hummel <[email protected]>
Date: 2017-02-24T21:29:43Z
WIP
commit accfe0cebc645ed2b99aaded7629b93b56fcb7ea
Author: Ian Hummel <[email protected]>
Date: 2017-02-24T21:35:24Z
Add license header that somehow got removed
commit b8559b5895c81c871b1db00b75f038082b2dd4fb
Author: Ian Hummel <[email protected]>
Date: 2017-02-24T21:46:18Z
Fixup tests
commit 539cc6cf630e9429e7131e755d8e9fa12479cd0c
Author: Ian Hummel <[email protected]>
Date: 2017-02-26T01:01:12Z
WIP
commit 3f76281094493d63b6364fe38612e56f437c6a7c
Author: Ian Hummel <[email protected]>
Date: 2017-02-27T21:26:48Z
Push delegation token out to ExecutorRunner
commit 25e7639af248bba4f648d13f5dc76a4fe8bfca34
Author: Ian Hummel <[email protected]>
Date: 2017-02-28T21:21:10Z
More wip... probably borked
commit 847f6044d2fd0bf1af52d3d7c5d618c8e537e916
Author: Ian Hummel <[email protected]>
Date: 2017-03-02T16:48:45Z
Untested... make cluster mode work with standalone
commit 4689a55402f193199faf2dc2e2c6c4c904e34bf0
Author: Ian Hummel <[email protected]>
Date: 2017-03-07T16:35:51Z
Hadoop FileInputFormat is hardcoded to request delegation tokens with
renewer = yarn.resourcemanager.principal
commit 3e85aa5bfbaee2760d9eb3559d23546508b463d9
Author: Ian Hummel <[email protected]>
Date: 2017-03-07T20:59:21Z
Still need to sort out a few things, but overall much smaller patch-set
commit f743e6b207b7f71034fe617a402f54e0121b13a2
Author: Ian Hummel <[email protected]>
Date: 2017-03-08T17:06:14Z
WIP
commit 31c91dcec25718052ae5c775bfe1b41359e8840f
Author: Ian Hummel <[email protected]>
Date: 2017-03-08T18:14:48Z
WIP
commit 19644195af14c9b8a451609157b9d47f7251ced4
Author: Ian Hummel <[email protected]>
Date: 2017-03-08T22:15:41Z
Still something isn't working
commit b5bacf31e00243073e7311b768a13aec51c6b9db
Author: Ian Hummel <[email protected]>
Date: 2017-03-15T15:56:41Z
Merge master
commit 83f05014659e08a4cd8c9703941c98aaaba9eb31
Author: Ian Hummel <[email protected]>
Date: 2017-03-15T20:28:56Z
Actually use credential updater
commit 917b077ca1e05a9bb44bcb91c33ed64a1d1c364c
Author: Ian Hummel <[email protected]>
Date: 2017-04-04T14:38:18Z
Change order of configuration setting so that everything works
commit a4c22a92496271935a769313b09da1b8ae88107a
Author: Ian Hummel <[email protected]>
Date: 2017-04-04T14:39:44Z
Merge branch 'master' into spark-5158
* master: (164 commits)
[SPARK-20198][SQL] Remove the inconsistency in table/function name
conventions in SparkSession.Catalog APIs
[SPARK-20190][APP-ID] applications//jobs' in rest api,status should be
[running|s…
[SPARK-19825][R][ML] spark.ml R API for FPGrowth
[SPARK-20067][SQL] Unify and Clean Up Desc Commands Using Catalog
Interface
[SPARK-10364][SQL] Support Parquet logical type TIMESTAMP_MILLIS
[SPARK-19408][SQL] filter estimation on two columns of same table
[SPARK-20145] Fix range case insensitive bug in SQL
[SPARK-20194] Add support for partition pruning to in-memory catalog
[SPARK-19641][SQL] JSON schema inference in DROPMALFORMED mode produces
incorrect schema for non-array/object JSONs
[SPARK-19969][ML] Imputer doc and example
[SPARK-9002][CORE] KryoSerializer initialization does not include
'Array[Int]'
[MINOR][DOCS] Replace non-breaking space to normal spaces that breaks
rendering markdown
[SPARK-20166][SQL] Use XXX for ISO 8601 timezone instead of ZZ
(FastDateFormat specific) in CSV/JSON timeformat options
[SPARK-19985][ML] Fixed copy method for some ML Models
[SPARK-20159][SPARKR][SQL] Support all catalog API in R
[SPARK-20173][SQL][HIVE-THRIFTSERVER] Throw NullPointerException when
HiveThriftServer2 is shutdown
[SPARK-20123][BUILD] SPARK_HOME variable might have spaces in it(e.g.
$SPARK…
[SPARK-20143][SQL] DataType.fromJson should throw an exception with
better message
[SPARK-20186][SQL] BroadcastHint should use child's stats
[SPARK-19148][SQL][FOLLOW-UP] do not expose the external table concept in
Catalog
...
commit 16f9551ef1073160f13e9522600416d29388b85b
Author: Ian Hummel <[email protected]>
Date: 2017-04-04T15:48:09Z
Remove inadvertent file
commit 246c76a82554ee20ed31202a6b12d92a823d68a2
Author: Ian Hummel <[email protected]>
Date: 2017-04-04T16:39:23Z
Cleanup code
----