GitHub user andrewor14 reopened a pull request:
https://github.com/apache/spark/pull/1472
[SPARK-2454] Do not assume drivers and executors share the same Spark home
**Problem.** When standalone Workers launch executors, they inherit the
Spark home set by the driver. This means if the worker machines do not share
the same directory structure as the driver node, the Workers will attempt to
run scripts (e.g. `bin/compute-classpath.sh`) that do not exist locally and
fail. This is a common scenario if the driver is launched from outside of the
cluster.
**Solution.** Simply do not pass the driver's Spark home to the Workers.
Note that we should still send *some* Spark home to the Workers, in case there
are multiple installations of Spark on the worker machines and the application
wants to pick among them.
**Spark config changes.**
- `spark.home` - This is deprecated and its usages are removed. The motivation is that it is currently used for 3+ different things and is often confused with `SPARK_HOME`.
- `spark.executor.home` - The Spark home that the executors will use. This is not set by default; if it is not set, the Worker falls back to its own current working directory.
- `spark.driver.home` - Same as above, but for the driver. This is only relevant for standalone-cluster mode (not yet supported; see SPARK-2260).
- `spark.test.home` - This is the Spark home used only for tests.
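As a sketch of how an application might use the configs proposed here (the master URL, paths, and class names below are hypothetical):

```shell
# Point executors at the Spark installation that exists on the worker
# machines, which may differ from the driver's installation path.
# spark.executor.home is the config proposed in this PR; if it is left
# unset, each Worker falls back to its own current working directory.
./bin/spark-submit \
  --master spark://master:7077 \
  --conf spark.executor.home=/opt/spark \
  --class com.example.MyApp \
  my-app.jar
```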
Note: #1392 proposes part of the solution described here.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/andrewor14/spark spark-home
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1472.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1472
----
commit 75923697a08e035c8e46b53b67a9d98938212915
Author: Andrew Or <[email protected]>
Date: 2014-07-17T18:42:18Z
Allow applications to specify their executor/driver spark homes
This allows the worker to launch a driver or an executor from a
different installation of Spark on the same machine. To do so, the
user needs to set "spark.executor.home" and/or "spark.driver.home".
Note that this was already possible for the executors even before
this commit. However, it used to rely on "spark.home", which was
also used for 20 other things. The next step is to remove all usages
of "spark.home", which was confusing to many users (myself included).
commit b90444d65744174ba6105da23459218e90788644
Author: Andrew Or <[email protected]>
Date: 2014-07-17T21:33:44Z
Remove / deprecate all occurrences of spark.home
This involves replacing spark.home with spark.test.home in tests.
Looks like python still uses spark.home, however. The next commit
will fix this.
commit 2a64cfcc63023a7ded58421f094421e9a1067e10
Author: Andrew Or <[email protected]>
Date: 2014-07-17T21:52:48Z
Remove usages of spark.home in python
commit 81710627925ee6cbd2099215efd17c3173b7bed8
Author: Andrew Or <[email protected]>
Date: 2014-07-17T22:02:58Z
Add back *SparkContext functionality to setSparkHome
This is because we cannot deprecate these constructors easily...
commit 2333c0ecb8ccd16a2c9dbf1a97ae58d7c6e708eb
Author: Andrew Or <[email protected]>
Date: 2014-07-17T22:04:46Z
Minor deprecation message change
commit b94020e13917ae59b1f3d8954cdecc7089c77141
Author: Andrew Or <[email protected]>
Date: 2014-07-17T22:46:58Z
Document spark.executor.home (but not spark.driver.home)
... because the only mode that uses spark.driver.home right now is
standalone-cluster, which is broken (SPARK-2260). It makes little
sense to document that this feature exists on a mode that is broken.
commit a50f0e74d3916a7fc7e178ba8391260d4127ba36
Author: Andrew Or <[email protected]>
Date: 2014-07-17T22:47:59Z
Merge branch 'master' of github.com:apache/spark into spark-home
commit 953997a279f5cd4a7f47f07d5fd32ff65c59620d
Author: Andrew Or <[email protected]>
Date: 2014-07-21T20:57:38Z
Merge branch 'master' of github.com:apache/spark into spark-home
commit 00147646ec8594caa8915c9a3fb329fcbe0042a4
Author: Andrew Or <[email protected]>
Date: 2014-07-22T01:28:18Z
Fix tests that use local-cluster mode
commit ecdfa92fd33f19fc57e041e4269405c011a43261
Author: Andrew Or <[email protected]>
Date: 2014-07-22T01:28:32Z
Formatting changes (minor)
commit c81f506639a88d789fd6736c11a9098901b394cd
Author: Andrew Or <[email protected]>
Date: 2014-07-22T01:28:50Z
Merge branch 'master' of github.com:apache/spark into spark-home
----