Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/3651#issuecomment-68074101
I did a quick `git grep` through the codebase to find uses of `SPARK_HOME`,
and it looks like there are only a few places where it's read:
SparkContext, where it's used as a fallback if `spark.home` is not set:
```
// core/src/main/scala/org/apache/spark/SparkContext.scala
/**
 * Get Spark's home location from either a value set through the constructor,
 * or the spark.home Java property, or the SPARK_HOME environment variable
 * (in that order of preference). If neither of these is set, return None.
 */
private[spark] def getSparkHome(): Option[String] = {
  conf.getOption("spark.home").orElse(Option(System.getenv("SPARK_HOME")))
}
```
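To make the precedence concrete, here's a minimal standalone sketch (a plain `Map` stands in for `SparkConf` here; the values are made up for illustration):
```
// Sketch: spark.home from the conf wins; SPARK_HOME is only consulted
// when the conf key is absent.
val conf = Map("spark.home" -> "/opt/spark-from-conf")
val sparkHome: Option[String] =
  conf.get("spark.home").orElse(sys.env.get("SPARK_HOME"))
// => Some("/opt/spark-from-conf"), regardless of the environment variable
```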
PythonUtils, with no fallback:
```
// core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala
private[spark] object PythonUtils {
  /** Get the PYTHONPATH for PySpark, either from SPARK_HOME, if it is set, or from our JAR */
  def sparkPythonPath: String = {
    val pythonPath = new ArrayBuffer[String]
    for (sparkHome <- sys.env.get("SPARK_HOME")) {
      pythonPath += Seq(sparkHome, "python").mkString(File.separator)
      pythonPath += Seq(sparkHome, "python", "lib", "py4j-0.8.2.1-src.zip").mkString(File.separator)
    }
    pythonPath ++= SparkContext.jarOfObject(this)
    pythonPath.mkString(File.pathSeparator)
  }
}
```
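To illustrate why the missing fallback matters, here's a standalone reduction of the failure mode (my own sketch, with `Option.empty` standing in for an unset `SPARK_HOME`):
```
import java.io.File
import scala.collection.mutable.ArrayBuffer

val pythonPath = new ArrayBuffer[String]
// The for-comprehension body never runs when the Option is empty, so no
// python dirs are added; only the JAR fallback (if any) remains.
for (sparkHome <- Option.empty[String]) {
  pythonPath += Seq(sparkHome, "python").mkString(File.separator)
}
println(pythonPath.mkString(File.pathSeparator)) // prints an empty string
```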
FaultToleranceTest, which isn't actually run in our tests (since it needs a
bunch of manual Docker setup to work):
```
// core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala
val zk = SparkCuratorUtil.newClient(conf)

var numPassed = 0
var numFailed = 0

val sparkHome = System.getenv("SPARK_HOME")
assertTrue(sparkHome != null, "Run with a valid SPARK_HOME")

val containerSparkHome = "/opt/spark"
val dockerMountDir = "%s:%s".format(sparkHome, containerSparkHome)
```
SparkSubmitArguments, which uses it without a fallback:
```
// core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
private def mergeSparkProperties(): Unit = {
  // Use common defaults file, if not specified by user
  if (propertiesFile == null) {
    val sep = File.separator
    val sparkHomeConfig = env.get("SPARK_HOME").map(sparkHome => s"${sparkHome}${sep}conf")
    val confDir = env.get("SPARK_CONF_DIR").orElse(sparkHomeConfig)

    confDir.foreach { sparkConfDir =>
      val defaultPath = s"${sparkConfDir}${sep}spark-defaults.conf"
      val file = new File(defaultPath)
```
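The resolution order here is worth spelling out; a minimal sketch of the same logic, extracted so it runs on its own:
```
import java.io.File

val env = sys.env
val sep = File.separator
// SPARK_CONF_DIR wins; otherwise fall back to SPARK_HOME/conf.
val sparkHomeConfig = env.get("SPARK_HOME").map(sparkHome => s"${sparkHome}${sep}conf")
val confDir = env.get("SPARK_CONF_DIR").orElse(sparkHomeConfig)
// With neither variable set, confDir is None and spark-defaults.conf is never read.
```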
Worker.scala, where it can be overridden by `spark.test.home` when testing:
```
// core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
val sparkHome =
  if (testing) {
    assert(sys.props.contains("spark.test.home"), "spark.test.home is not set!")
    new File(sys.props("spark.test.home"))
  } else {
    new File(sys.env.get("SPARK_HOME").getOrElse("."))
  }
var workDir: File = null
val executors = new HashMap[String, ExecutorRunner]
val finishedExecutors = new HashMap[String, ExecutorRunner]
val drivers = new HashMap[String, DriverRunner]
```
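Note that the non-testing branch silently falls back to the current working directory; a quick standalone sketch of what that evaluates to:
```
import java.io.File

// With SPARK_HOME unset, this resolves to wherever the worker was launched from.
val sparkHome = new File(sys.env.get("SPARK_HOME").getOrElse("."))
println(sparkHome.getAbsolutePath)
```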
There are also a few PySpark tests that use it, but the `./bin/pyspark`
script takes care of setting up the proper `SPARK_HOME` variable.
So, it looks like there are a couple of usages that are potentially hazardous
(PythonUtils and SparkSubmitArguments), but they're not in the YARN or REPL
packages.
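If we wanted to harden those two call sites, one option (just a sketch of a possible follow-up, not something this PR does; `requireSparkHome` is a name I made up) would be a shared fail-fast helper:
```
// Hypothetical helper: fail with a clear message instead of silently
// building a bad path when SPARK_HOME is required but unset.
def requireSparkHome(): String =
  sys.env.getOrElse("SPARK_HOME", throw new IllegalStateException(
    "SPARK_HOME is not set; export it before launching"))
```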