GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/19631
[SPARK-22372][core, yarn] Make cluster submission use SparkApplication.
The main goal of this change is to allow multiple cluster-mode
submissions from the same JVM without them ending up with mixed
configuration. That is done by extending the SparkApplication
trait, which was reasonably straightforward for the standalone
and Mesos modes.
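The idea behind the trait can be sketched as follows. This is an illustrative Java sketch, not Spark's actual Scala `SparkApplication` trait; the interface and class names here are assumptions made for the example:

```java
import java.util.Map;

// Illustrative sketch of the SparkApplication idea: each submission
// receives its own configuration object instead of reading mutable
// global state such as system properties, so multiple cluster-mode
// apps can be submitted from one JVM without interfering.
interface SparkApplicationLike {
    void start(String[] args, Map<String, String> conf);
}

class YarnClusterApp implements SparkApplicationLike {
    String seenMaster; // recorded only so the sketch is observable

    @Override
    public void start(String[] args, Map<String, String> conf) {
        // Everything comes from 'conf'; nothing is read from
        // System.getProperty, so concurrent submissions cannot
        // see each other's settings.
        seenMaster = conf.get("spark.master");
    }
}
```

Because each `start` call gets its own `conf`, two submissions in the same JVM cannot clobber each other's configuration the way shared system properties could.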
For YARN mode, there was a complication: YARN used a "SPARK_YARN_MODE"
system property to indirectly control behavior in many places,
mainly in the SparkHadoopUtil / YarnSparkHadoopUtil classes.
Most of the changes here remove that property.
Since support for Hadoop 1.x was removed, some methods that lived in
YarnSparkHadoopUtil can now live in SparkHadoopUtil. The remaining
methods don't need to be part of the class and can be called directly
on the YarnSparkHadoopUtil object, so there is now a single
implementation of SparkHadoopUtil.
One remaining use case was fetching the external shuffle
service port, which can come from the YARN configuration. That
is now done by checking the master used to submit the app,
instead of the system property.
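The master-based check can be sketched roughly as below. The method names are illustrative, not Spark's actual API; 7337 is Spark's documented default shuffle service port, but treating the master string prefix as the YARN signal is the assumption this sketch rests on:

```java
// Illustrative sketch: decide whether to take the external shuffle
// service port from the YARN configuration by inspecting the master
// the app was submitted with, not a global system property.
class ShuffleServicePort {
    static final int DEFAULT_PORT = 7337; // Spark's default shuffle service port

    // YARN submissions use a master string of "yarn" (historically also
    // "yarn-client" / "yarn-cluster"), so a prefix check suffices.
    static boolean isYarnMaster(String master) {
        return master != null && master.startsWith("yarn");
    }

    // Only consult the YARN-provided port when the app actually runs on YARN.
    static int resolvePort(String master, Integer yarnConfiguredPort) {
        if (isYarnMaster(master) && yarnConfiguredPort != null) {
            return yarnConfiguredPort;
        }
        return DEFAULT_PORT;
    }
}
```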
The other use case was propagation of the auth secret. Previously
that was done by stashing the secret in the current user's
credentials in YARN mode. Instead, the secret is now propagated
via the config / environment variable, as is done for standalone
and Mesos. That allowed a few methods in SparkHadoopUtil to go away.
There is still a little YARN-specific code in SecurityManager,
but it uses the conf's master to detect whether the app is a
YARN app. This also has the benefit of not stashing the secret
in a shared location (the current UGI), ensuring different apps
use different secrets.
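The per-app propagation described above can be sketched as follows. This is an assumption-laden illustration: the class name and the environment variable name are made up for the example, and the real Spark code paths differ; the point is only that each app gets its own secret in its own environment rather than a shared credential store:

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: each app generates its own secret and hands it
// to its executors through an app-specific environment map, rather
// than stashing it in the shared UGI credentials where concurrent
// apps would end up sharing one secret.
class AuthSecretSketch {
    static String generateSecret() {
        byte[] bytes = new byte[16];
        new SecureRandom().nextBytes(bytes);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b)); // hex-encode
        }
        return sb.toString();
    }

    static Map<String, String> executorEnv(String secret) {
        Map<String, String> env = new HashMap<>();
        // Propagated per-app, so two apps submitted from the same JVM
        // never share a secret. The variable name is illustrative.
        env.put("_SPARK_AUTH_SECRET", secret);
        return env;
    }
}
```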
With those out of the way, actually changing the YARN client
to extend SparkApplication was easy.
Tested with existing unit tests, and also by running YARN apps
with auth and Kerberos both on and off in a real cluster.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-22372
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19631.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19631
----
commit c80554bed52f96ada2e1c4b6a16f5f3e5c7e5317
Author: Marcelo Vanzin <[email protected]>
Date: 2017-10-30T21:06:41Z
[SPARK-22372][core, yarn] Make cluster submission use SparkApplication.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]