GitHub user vanzin opened a pull request:
https://github.com/apache/spark/pull/19631
[SPARK-22372][core, yarn] Make cluster submission use SparkApplication.
The main goal of this change is to allow multiple cluster-mode
submissions from the same JVM without them ending up with mixed
configuration. That is done by extending the SparkApplication
trait, which was reasonably straightforward for the standalone
and Mesos modes.
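The idea behind the trait can be sketched as follows. This is an illustrative Java sketch, not Spark's actual Scala `SparkApplication` trait; the interface and class names here are assumptions made for the example:

```java
import java.util.Map;

// Illustrative sketch of the SparkApplication idea: each submission
// receives its own configuration object instead of reading mutable
// global state such as system properties, so multiple cluster-mode
// apps can be submitted from one JVM without interfering.
interface SparkApplicationLike {
    void start(String[] args, Map<String, String> conf);
}

class YarnClusterApp implements SparkApplicationLike {
    String seenMaster; // recorded only so the sketch is observable

    @Override
    public void start(String[] args, Map<String, String> conf) {
        // Everything comes from 'conf'; nothing is read from
        // System.getProperty, so concurrent submissions cannot
        // see each other's settings.
        seenMaster = conf.get("spark.master");
    }
}
```

Because each `start` call gets its own `conf`, two submissions in the same JVM cannot clobber each other's configuration the way shared system properties could.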
For YARN mode, there was a complication: YARN used a "SPARK_YARN_MODE"
system property to indirectly control behavior in many places,
mainly in the SparkHadoopUtil / YarnSparkHadoopUtil classes.
Most of the changes here remove that property.
Since support for Hadoop 1.x was removed, some methods that lived in
YarnSparkHadoopUtil can now live in SparkHadoopUtil. The remaining
methods don't need to be part of the class and can be called directly
on the YarnSparkHadoopUtil object, so there is now a single
implementation of SparkHadoopUtil.
One remaining use case was fetching the external shuffle
service port, which can come from the YARN configuration. That
is now done by checking the master used to submit the app,
instead of the system property.
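The master-based check can be sketched roughly as below. The method names are illustrative, not Spark's actual API; 7337 is Spark's documented default shuffle service port, but treating the master string prefix as the YARN signal is the assumption this sketch rests on:

```java
// Illustrative sketch: decide whether to take the external shuffle
// service port from the YARN configuration by inspecting the master
// the app was submitted with, not a global system property.
class ShuffleServicePort {
    static final int DEFAULT_PORT = 7337; // Spark's default shuffle service port

    // YARN submissions use a master string of "yarn" (historically also
    // "yarn-client" / "yarn-cluster"), so a prefix check suffices.
    static boolean isYarnMaster(String master) {
        return master != null && master.startsWith("yarn");
    }

    // Only consult the YARN-provided port when the app actually runs on YARN.
    static int resolvePort(String master, Integer yarnConfiguredPort) {
        if (isYarnMaster(master) && yarnConfiguredPort != null) {
            return yarnConfiguredPort;
        }
        return DEFAULT_PORT;
    }
}
```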
The other use case was propagation of the auth secret. Previously
that was done by stashing the secret in the current user's
credentials in YARN mode. Instead, the secret is now propagated
via the config / environment variable, as is done for standalone
and Mesos. That allowed a few methods in SparkHadoopUtil to go away.
There is still a little YARN-specific code in SecurityManager,
but it uses the conf's master to detect whether the app is a
YARN app. This also has the benefit of not stashing the secret
in a shared location (the current UGI), ensuring different apps
use different secrets.
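The per-app propagation described above can be sketched as follows. This is an assumption-laden illustration: the class name and the environment variable name are made up for the example, and the real Spark code paths differ; the point is only that each app gets its own secret in its own environment rather than a shared credential store:

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: each app generates its own secret and hands it
// to its executors through an app-specific environment map, rather
// than stashing it in the shared UGI credentials where concurrent
// apps would end up sharing one secret.
class AuthSecretSketch {
    static String generateSecret() {
        byte[] bytes = new byte[16];
        new SecureRandom().nextBytes(bytes);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b)); // hex-encode
        }
        return sb.toString();
    }

    static Map<String, String> executorEnv(String secret) {
        Map<String, String> env = new HashMap<>();
        // Propagated per-app, so two apps submitted from the same JVM
        // never share a secret. The variable name is illustrative.
        env.put("_SPARK_AUTH_SECRET", secret);
        return env;
    }
}
```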
With those out of the way, actually changing the YARN client
to extend SparkApplication was easy.
Tested with existing unit tests, and also by running YARN apps
with auth and Kerberos both on and off in a real cluster.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/vanzin/spark SPARK-22372
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19631.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19631
----
commit c80554bed52f96ada2e1c4b6a16f5f3e5c7e5317
Author: Marcelo Vanzin <[email protected]>
Date: 2017-10-30T21:06:41Z
[SPARK-22372][core, yarn] Make cluster submission use SparkApplication.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]