GitHub user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/21468#discussion_r197971721
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
---
@@ -811,10 +811,18 @@ private[spark] class Client(
// Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
if (pythonPath.nonEmpty) {
- val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+ val pythonPathStr = (sys.env.get("PYTHONPATH") ++=: pythonPath)
.mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
- env("PYTHONPATH") = pythonPathStr
- sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+ val newValue =
+ if (env.contains("PYTHONPATH")) {
+ env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
+ } else {
+ pythonPathStr
+ }
+ env("PYTHONPATH") = newValue
+ if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
--- End diff --
I see that the previous code was overriding PYTHONPATH in the executor env; but
perhaps the right thing here is to concatenate the values, otherwise the executor
might be missing the py4j/pyspark entries this class adds.
So, basically, what you want is (with pp = "PYTHONPATH"):
- driver: env.get(pp) ++ sys.env.get(pp) ++ pythonPath
- executor: pythonPath ++ sparkConf.getExecutorEnv(pp)
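
The ordering suggested above can be sketched in isolation. This is only a minimal sketch under stated assumptions, not the code from the pull request: `PythonPathSketch`, `driverPythonPath`, `executorPythonPath` and `Sep` are hypothetical names, the inputs are plain `Option[String]` / `Seq[String]` values rather than the real `env` map and `SparkConf`, and `":"` stands in for `ApplicationConstants.CLASS_PATH_SEPARATOR`.

```scala
object PythonPathSketch {
  // Assumption: ":" is a stand-in for ApplicationConstants.CLASS_PATH_SEPARATOR,
  // so the sketch runs without Hadoop on the classpath.
  val Sep = ":"

  // driver: env.get(pp) ++ sys.env.get(pp) ++ pythonPath
  // amEnv      - value already present in the AM env map, if any
  // sysEnv     - value of PYTHONPATH in the submitting process, if any
  // pythonPath - entries (py4j, pyspark, ...) collected by the client
  def driverPythonPath(
      amEnv: Option[String],
      sysEnv: Option[String],
      pythonPath: Seq[String]): String =
    (amEnv.toSeq ++ sysEnv.toSeq ++ pythonPath).mkString(Sep)

  // executor: pythonPath ++ sparkConf.getExecutorEnv(pp)
  // executorEnv - user-configured executor PYTHONPATH, if any
  def executorPythonPath(
      pythonPath: Seq[String],
      executorEnv: Option[String]): String =
    (pythonPath ++ executorEnv.toSeq).mkString(Sep)
}
```

With these definitions, `PythonPathSketch.executorPythonPath(Seq("py4j.zip"), Some("/user/libs"))` evaluates to `py4j.zip:/user/libs`: the entries added by the client come first and the user's executor setting is kept rather than overwritten.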
---