GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/15981
[SPARK-18547][core] Propagate I/O encryption key when executors register.
This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation credential propagation, this change explicitly distributes the key using the messages exchanged between driver and executor during registration. When RPC encryption is enabled, this means key propagation is also secure.
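A minimal sketch of the exchange described above follows. It models only the data flow: the driver answers an executor's registration with the application config plus an optional key, and the executor keeps the key in memory instead of receiving it through UGI credentials. `RegisterExecutor`, `AppConfig`, `Driver`, and `Executor` are hypothetical names chosen for illustration, not Spark's actual RPC classes. The transport, which would be encrypted when RPC encryption is on, is not modeled.

```scala
// Hypothetical registration messages; names are illustrative, not Spark's RPC classes.
final case class RegisterExecutor(executorId: String)
final case class AppConfig(
    sparkProperties: Seq[(String, String)],
    ioEncryptionKey: Option[Array[Byte]])

// Driver side: replies to every registering executor with the app config,
// attaching the key only when I/O encryption is enabled (key is Some).
final class Driver(props: Seq[(String, String)], key: Option[Array[Byte]]) {
  def handle(msg: RegisterExecutor): AppConfig = {
    require(msg.executorId.nonEmpty, "executor id must not be empty")
    AppConfig(props, key)
  }
}

// Executor side: stores the received key in memory only; it is never
// copied into the configuration properties it returns.
final class Executor(id: String) {
  private var key: Option[Array[Byte]] = None

  def register(driver: Driver): Seq[(String, String)] = {
    val cfg = driver.handle(RegisterExecutor(id))
    key = cfg.ioEncryptionKey.map(_.clone())
    cfg.sparkProperties
  }

  def ioEncryptionKey: Option[Array[Byte]] = key
}

// Usage: generate a random 128-bit key on the driver and register one executor.
val driverKey = new Array[Byte](16)
new java.security.SecureRandom().nextBytes(driverKey)
val driver = new Driver(Seq("spark.app.name" -> "demo"), Some(driverKey))
val executor = new Executor("1")
val props = executor.register(driver)
```

Because the key travels on a message the driver already sends, no cluster-manager-specific mechanism is needed; whether the key is protected in transit depends entirely on the RPC layer's encryption setting.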
This allows shuffle encryption to work in non-YARN mode, which also makes it easier to write unit tests for the areas of the code affected by the feature.
The key is stored in the SecurityManager; because there are many instances of that class used in the code, the key is only guaranteed to exist in the instance managed by the SparkEnv. This path was chosen to avoid storing the key in the SparkConf, which would risk the key being written to disk as part of the configuration (as, for example, is done when starting YARN applications).
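The storage choice can be illustrated with a small sketch, assuming a configuration object that gets serialized to disk as a properties file and a separate in-memory holder standing in for the SparkEnv's SecurityManager. `AppConf` and `KeyHolder` are hypothetical; `spark.io.encryption.enabled` is used only as an example setting.

```scala
import java.io.StringWriter
import java.util.Properties

// Hypothetical stand-in for SparkConf: plain settings that may be written to disk.
final class AppConf(settings: Map[String, String]) {
  // Serializes the settings as a properties file, as a launcher might do.
  def writeToString(): String = {
    val props = new Properties()
    settings.foreach { case (k, v) => props.setProperty(k, v) }
    val out = new StringWriter()
    props.store(out, null)
    out.toString
  }
}

// Hypothetical stand-in for the SparkEnv's SecurityManager: the key lives only here.
final class KeyHolder(key: Option[Array[Byte]]) {
  // Returns a copy so callers cannot mutate the stored key.
  def ioEncryptionKey: Option[Array[Byte]] = key.map(_.clone())
}

val key = Array.fill[Byte](16)(42)
val conf = new AppConf(Map("spark.io.encryption.enabled" -> "true"))
val holder = new KeyHolder(Some(key))
val written = conf.writeToString()
```

Anything placed in the config object ends up in the written file; keeping the key in the holder means it never appears there, at the cost of having to reach the one instance that actually holds it.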
Tested by new and existing unit tests (which were moved from the YARN module to core), and by running apps with shuffle encryption enabled.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-18547
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15981.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15981
----
commit 7ed0d7c0312224252768b6f463603e57ca5e65d4
Author: Marcelo Vanzin
Date: 2016-11-20T02:02:56Z
[SPARK-18547][core] Propagate I/O encryption key when executors register.
----