GitHub user andrewor14 reopened a pull request:

    https://github.com/apache/spark/pull/1472

    [SPARK-2454] Do not assume drivers and executors share the same Spark home

    **Problem.** When standalone Workers launch executors, they inherit the 
Spark home set by the driver. This means if the worker machines do not share 
the same directory structure as the driver node, the Workers will attempt to 
run scripts (e.g. `bin/compute-classpath.sh`) that do not exist locally and 
fail. This is a common scenario if the driver is launched from outside of the 
cluster.
    
    **Solution.** Simply do not pass the driver's Spark home to the Workers. 
Note that we should still send *some* Spark home to the Workers, in case there 
are multiple installations of Spark on the worker machines and the application 
wants to pick among them.
    
    **Spark config changes.**
    - `spark.home` - This is deprecated and its internal usages are removed. 
The motivation is that this setting is currently used for 3+ different things 
and is often confused with `SPARK_HOME`.
    - `spark.executor.home` - This is the Spark home that the executors will 
use. It is not set by default; if unset, the Worker falls back to its own 
current working directory.
    - `spark.driver.home` - Same as above, but for the driver. This is only 
relevant for standalone-cluster mode (not yet supported; see SPARK-2260).
    - `spark.test.home` - This is the Spark home used only for tests.
    
    Note: #1392 proposes part of the solution described here.
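
    For illustration, an application could select a specific Spark 
installation on the worker machines by setting the new property at submit 
time. This is a sketch only; the installation path, master URL, and 
application name below are hypothetical, not taken from the PR:

    ```shell
    # Point executors at a Spark installation local to the worker machines,
    # independent of the driver's own directory layout.
    # (Paths and host names are illustrative.)
    spark-submit \
      --master spark://master:7077 \
      --conf spark.executor.home=/opt/spark \
      my_app.py
    ```

    If `spark.executor.home` is left unset, each Worker simply uses its own 
current working directory, so nothing is inherited from the driver.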

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark spark-home

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1472.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1472
    
----
commit 75923697a08e035c8e46b53b67a9d98938212915
Author: Andrew Or <[email protected]>
Date:   2014-07-17T18:42:18Z

    Allow applications to specify their executor/driver spark homes
    
    This allows the worker to launch a driver or an executor from a
    different installation of Spark on the same machine. To do so, the
    user needs to set "spark.executor.home" and/or "spark.driver.home".
    
    Note that this was already possible for the executors even before
    this commit. However, it used to rely on "spark.home", which was
    also used for 20 other things. The next step is to remove all usages
    of "spark.home", which was confusing to many users (myself included).

commit b90444d65744174ba6105da23459218e90788644
Author: Andrew Or <[email protected]>
Date:   2014-07-17T21:33:44Z

    Remove / deprecate all occurrences of spark.home
    
    This involves replacing spark.home with spark.test.home in tests.
    Looks like python still uses spark.home, however. The next commit
    will fix this.

commit 2a64cfcc63023a7ded58421f094421e9a1067e10
Author: Andrew Or <[email protected]>
Date:   2014-07-17T21:52:48Z

    Remove usages of spark.home in python

commit 81710627925ee6cbd2099215efd17c3173b7bed8
Author: Andrew Or <[email protected]>
Date:   2014-07-17T22:02:58Z

    Add back *SparkContext functionality to setSparkHome
    
    This is because we cannot deprecate these constructors easily...

commit 2333c0ecb8ccd16a2c9dbf1a97ae58d7c6e708eb
Author: Andrew Or <[email protected]>
Date:   2014-07-17T22:04:46Z

    Minor deprecation message change

commit b94020e13917ae59b1f3d8954cdecc7089c77141
Author: Andrew Or <[email protected]>
Date:   2014-07-17T22:46:58Z

    Document spark.executor.home (but not spark.driver.home)
    
    ... because the only mode that uses spark.driver.home right now is
    standalone-cluster, which is broken (SPARK-2260). It makes little
    sense to document that this feature exists on a mode that is broken.

commit a50f0e74d3916a7fc7e178ba8391260d4127ba36
Author: Andrew Or <[email protected]>
Date:   2014-07-17T22:47:59Z

    Merge branch 'master' of github.com:apache/spark into spark-home

commit 953997a279f5cd4a7f47f07d5fd32ff65c59620d
Author: Andrew Or <[email protected]>
Date:   2014-07-21T20:57:38Z

    Merge branch 'master' of github.com:apache/spark into spark-home

commit 00147646ec8594caa8915c9a3fb329fcbe0042a4
Author: Andrew Or <[email protected]>
Date:   2014-07-22T01:28:18Z

    Fix tests that use local-cluster mode

commit ecdfa92fd33f19fc57e041e4269405c011a43261
Author: Andrew Or <[email protected]>
Date:   2014-07-22T01:28:32Z

    Formatting changes (minor)

commit c81f506639a88d789fd6736c11a9098901b394cd
Author: Andrew Or <[email protected]>
Date:   2014-07-22T01:28:50Z

    Merge branch 'master' of github.com:apache/spark into spark-home

----

