GitHub user JoshRosen opened a pull request:
https://github.com/apache/spark/pull/2002
[SPARK-2974] [SPARK-2975] Fix two bugs related to spark.local.dirs
This PR fixes two bugs related to `spark.local.dirs` and
`SPARK_LOCAL_DIRS`, one where `Utils.getLocalDir()` might return an invalid
directory (SPARK-2974) and another where the `SPARK_LOCAL_DIRS` override didn't
affect the driver, which could cause problems when running tasks in local mode
(SPARK-2975).
This patch fixes both issues: the new `Utils.getOrCreateLocalRootDirs(conf:
SparkConf)` utility method manages the creation of local directories and
handles the precedence among the different configuration options, so we should
see the same behavior whether we're running in local mode or on a worker.
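As a rough illustration of what a single "resolve and create" helper like this does, here is a minimal Python sketch. It is not the Scala code from the patch. The precedence order (the `SPARK_LOCAL_DIRS` environment variable first, then the configured value, then the system temp directory), the comma-separated format, and the function signature are all assumptions made for the example.

```python
import os
import tempfile
from typing import List, Mapping, Optional


def get_or_create_local_root_dirs(
    conf_dirs: Optional[str],
    env: Mapping[str, str],
) -> List[str]:
    """Resolve the local root directories and create any that are missing.

    Assumed precedence (not taken from the patch): SPARK_LOCAL_DIRS from the
    environment, then the configured value, then the system temp directory.
    """
    raw = env.get("SPARK_LOCAL_DIRS") or conf_dirs or tempfile.gettempdir()
    usable = []
    for path in (p.strip() for p in raw.split(",")):
        if not path:
            continue
        try:
            os.makedirs(path, exist_ok=True)
        except OSError:
            # Skip directories that cannot be created instead of returning them.
            continue
        if os.path.isdir(path):
            usable.append(path)
    return usable
```

Because every caller goes through the same function, the driver and the workers cannot disagree about which setting wins, and a directory that cannot be created is never handed back to the caller.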
Mocking environment variables in tests is awkward because there is no easy way
to mock `System.getenv`, so I added a `private[spark]` method to `SparkConf` for
reading environment variables; by default it just delegates to
`System.getenv`. By subclassing `SparkConf` and overriding this method, tests
can mock `SPARK_LOCAL_DIRS`.
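The subclass-and-override pattern described here can be sketched in Python as follows. The class and function names (`Conf`, `FakeEnvConf`, `local_dirs_override`) are hypothetical stand-ins for illustration and do not correspond to Spark's actual classes or to the method added by this patch.

```python
import os
from typing import Mapping, Optional


class Conf:
    """Hypothetical stand-in for a configuration class such as SparkConf."""

    def getenv(self, name: str) -> Optional[str]:
        # Default behaviour: delegate to the real process environment.
        return os.environ.get(name)


class FakeEnvConf(Conf):
    """Test double that serves environment variables from a dict."""

    def __init__(self, fake_env: Mapping[str, str]) -> None:
        self._fake_env = dict(fake_env)

    def getenv(self, name: str) -> Optional[str]:
        # Overridden: never touches the real environment.
        return self._fake_env.get(name)


def local_dirs_override(conf: Conf) -> Optional[str]:
    # Code under test reads the environment only through conf.getenv,
    # so a test can substitute FakeEnvConf without changing this function.
    return conf.getenv("SPARK_LOCAL_DIRS")
```

A test would then build `FakeEnvConf({"SPARK_LOCAL_DIRS": "/tmp/a,/tmp/b"})` and pass it wherever a `Conf` is expected, which leaves the real process environment untouched.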
I also fixed a typo in PySpark where we used `SPARK_LOCAL_DIR` instead of
`SPARK_LOCAL_DIRS` (I think this was technically innocuous, but it seemed worth
fixing).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/JoshRosen/spark local-dirs
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2002.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2002
----
commit 6d9259baab9cc7f292b6889a7e2484047fb7bdc5
Author: Josh Rosen <[email protected]>
Date: 2014-08-17T03:17:23Z
Fix typo in PySpark: SPARK_LOCAL_DIR should be SPARK_LOCAL_DIRS
commit 007298bd1916f1cede3a026313bf56ab90a10a8d
Author: Josh Rosen <[email protected]>
Date: 2014-08-17T03:18:10Z
Allow environment variables to be mocked in tests.
commit b2c473679eecdc99c39ec5d7520a98837561e268
Author: Josh Rosen <[email protected]>
Date: 2014-08-17T03:23:01Z
Add failing tests for SPARK-2974 and SPARK-2975.
commit 3e92d44db372c51fa6db3b962372b42bdeeec1c4
Author: Josh Rosen <[email protected]>
Date: 2014-08-17T19:47:56Z
Move local dirs override logic into Utils; fix bugs:
Now, the logic for determining the precedence of the different
configuration options is in Utils.getOrCreateLocalRootDirs().
DiskBlockManager now accepts a SparkConf rather than a list of root
directories and I've updated other tests to reflect this.
----