GitHub user jerryshao opened a pull request:
https://github.com/apache/spark/pull/14065
[SPARK-16342][YARN][WIP] Add a configurable token manager for Spark running
on YARN
## What changes were proposed in this pull request?
Add a configurable token manager for Spark running on YARN.
### Current Problems ###
1. The set of supported token providers is hard-coded: currently only hdfs,
hbase and hive are supported, and users cannot add a new token provider
without code changes.
2. The same problem exists in the periodic token renewer and updater.
### Changes In This Proposal ###
To address the problems mentioned above and make the current code cleaner
and easier to understand, this proposal makes 3 main changes:
1. Abstract a `ServiceTokenProvider` interface, together with a
`ServiceTokenRenewable` interface, for token providers. Each service that
wants to communicate with Spark using tokens needs to implement this interface.
2. Provide a `ConfigurableTokenManager` to manage all the registered token
providers, as well as the token renewer and updater. This class also offers
the API for other modules to obtain tokens, get the renewal interval and so on.
3. Implement 3 built-in token providers, `HDFSTokenProvider`,
`HiveTokenProvider` and `HBaseTokenProvider`, to keep the same semantics as
supported today. Whether to load these built-in token providers is controlled
by the configuration `spark.yarn.security.tokens.${service}.enabled`; by
default all the built-in token providers are loaded.
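The three changes above could be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code in this patch: only the names `ServiceTokenProvider`, `ServiceTokenRenewable` and `ConfigurableTokenManager` come from the description, while the method signatures, the `Token` stand-in, the `StubProvider` class, the plain `Map` used as configuration, and the choice to treat renewal as optional are all assumptions.

```scala
// Hypothetical sketch: all signatures below are assumptions, not the patch's API.

// Stand-in for a Hadoop delegation token so the sketch needs no dependencies.
final case class Token(service: String, payload: String)

// A service that can hand out tokens for Spark to use.
trait ServiceTokenProvider {
  def serviceName: String
  def obtainTokens(conf: Map[String, String]): Seq[Token]
}

// A service whose tokens have a renewal interval (assumed to be optional here).
trait ServiceTokenRenewable {
  def tokenRenewalIntervalMs(conf: Map[String, String]): Long
}

// Manages registered providers and exposes token-related queries to other modules.
class ConfigurableTokenManager(
    conf: Map[String, String],
    providers: Seq[ServiceTokenProvider]) {

  // A provider is active unless spark.yarn.security.tokens.<service>.enabled is false.
  // Defaulting to "true" mirrors the described behavior of the built-in providers.
  private def enabled(p: ServiceTokenProvider): Boolean =
    conf.getOrElse(s"spark.yarn.security.tokens.${p.serviceName}.enabled", "true").toBoolean

  private val active: Seq[ServiceTokenProvider] = providers.filter(enabled)

  // Collect tokens from every active provider.
  def obtainTokens(): Seq[Token] = active.flatMap(_.obtainTokens(conf))

  // Smallest renewal interval among active renewable providers, if any.
  def renewalIntervalMs(): Option[Long] = {
    val intervals = active.collect {
      case r: ServiceTokenRenewable => r.tokenRenewalIntervalMs(conf)
    }
    if (intervals.isEmpty) None else Some(intervals.min)
  }
}

// Illustrative provider used only to exercise the manager.
class StubProvider(val serviceName: String, intervalMs: Long)
    extends ServiceTokenProvider with ServiceTokenRenewable {
  def obtainTokens(conf: Map[String, String]): Seq[Token] =
    Seq(Token(serviceName, "stub"))
  def tokenRenewalIntervalMs(conf: Map[String, String]): Long = intervalMs
}
```

In this sketch the manager is the only place that reads the `enabled` flags, so callers that need tokens or a renewal interval do not have to know which services are configured.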
### Behavior Changes ###
For the end user there is no behavior change: we still use the same
configuration `spark.yarn.security.tokens.${service}.enabled` to decide which
token provider is enabled (hbase or hive).
A user-implemented token provider (assume its name is "test") that needs to
be added to this class should set two configurations:
1. `spark.yarn.security.tokens.test.enabled` to true
2. `spark.yarn.security.tokens.test.class` to the fully qualified class name.
So we keep the same semantics as the current code while adding one new
configuration.
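As an illustration of how the two keys could be read together, the sketch below scans a configuration for `spark.yarn.security.tokens.<name>.class` entries and keeps only those whose matching `.enabled` key is true. The `TokenProviderConfig` object, the `enabledCustomProviders` helper, the use of a plain `Map` instead of `SparkConf`, and the class name `com.example.TestTokenProvider` are all hypothetical; the patch may resolve providers differently.

```scala
// Hypothetical helper: not part of the patch, shown only to illustrate the two keys.
object TokenProviderConfig {
  // Matches spark.yarn.security.tokens.<name>.class and captures <name>.
  private val ClassKey = """spark\.yarn\.security\.tokens\.([^.]+)\.class""".r

  // Returns provider name -> class name for every custom provider that is
  // explicitly enabled via spark.yarn.security.tokens.<name>.enabled=true.
  def enabledCustomProviders(conf: Map[String, String]): Map[String, String] =
    conf.collect {
      case (ClassKey(name), className)
          if conf.get(s"spark.yarn.security.tokens.$name.enabled")
            .exists(_.trim.equalsIgnoreCase("true")) =>
        name -> className
    }
}

// Example configuration for a provider named "test" (class name is made up).
object TokenProviderConfigExample {
  val conf: Map[String, String] = Map(
    "spark.yarn.security.tokens.test.enabled" -> "true",
    "spark.yarn.security.tokens.test.class" -> "com.example.TestTokenProvider"
  )
}
```

With a real job, the same two keys would typically be passed as Spark configuration properties rather than built as a `Map`.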
### Current Status ###
- [x] token provider interface and management framework.
- [x] implement built-in token providers (hdfs, hbase, hive).
- [ ] unit test coverage.
- [ ] integration test with a secure cluster.
## How was this patch tested?
Unit tests and integration tests.
Please review and share suggestions; any comment is greatly appreciated.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jerryshao/apache-spark SPARK-16342
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14065.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14065
----
commit 9e9311cd956eb0b2f900625b042c5c22d1016a08
Author: jerryshao <[email protected]>
Date: 2016-07-01T09:40:41Z
Add ConfigurableTokenManager initial commit
commit 3aaf0706d71321c2c150e0ddd21fee1cd218a4e1
Author: jerryshao <[email protected]>
Date: 2016-07-05T06:07:58Z
Further change on ConfigurableTokenManager
commit 9e8702140614f0551cad889e26f98ed36d7f6f15
Author: jerryshao <[email protected]>
Date: 2016-07-05T09:18:43Z
Some refactory works and unit test added
commit 8c0821b2074799a05c3dbb448368b3f195eff661
Author: jerryshao <[email protected]>
Date: 2016-07-06T06:38:29Z
Add more unit tests
commit 90f194e34ffd198d2b1ee5b04f24afd8c4454d90
Author: jerryshao <[email protected]>
Date: 2016-07-06T07:01:31Z
Add more comments
----