Github user zjffdu commented on a diff in the pull request:
https://github.com/apache/spark/pull/13599#discussion_r164037239
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -98,7 +98,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
private val reviveThread =
ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-revive-thread")
- class DriverEndpoint(override val rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
+ class DriverEndpoint(override val rpcEnv: RpcEnv)
--- End diff ---
Without this change, the following scenario won't work.
1. Launch a Spark app
2. Call `sc.install_packages("numpy")`
3. Run `sc.range(3).map(lambda x: np.__version__).collect()`
4. Restart an executor (kill it; the scheduler will schedule a replacement executor)
5. Run `sc.range(3).map(lambda x: np.__version__).collect()` again. This time it fails, because the newly scheduled executor cannot set up its virtualenv correctly: it never receives the updated `spark.pyspark.virtualenv.packages`. A sketch of these steps follows below.
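For reference, a rough PySpark-shell sketch of the scenario above. It assumes `sc.install_packages` is the virtualenv API proposed in this PR, that virtualenv mode is enabled for the app, and that the package also becomes importable on the driver; step 4 (killing the executor) happens out of band, e.g. by killing the executor JVM on its node.

```python
# Sketch of the reproduction steps (pyspark shell, `sc` already created).
# sc.install_packages is the API proposed in this PR; the driver-side
# `import numpy` assumes install_packages also updates the driver env.

# 1-2. Launch the app, then install numpy into the executors' virtualenv.
sc.install_packages("numpy")

import numpy as np

# 3. Works: the currently running executors have numpy installed.
print(sc.range(3).map(lambda x: np.__version__).collect())

# 4. Kill an executor out of band; the scheduler brings up a replacement.

# 5. Without this change, the replacement executor never sees the updated
#    spark.pyspark.virtualenv.packages, so its virtualenv lacks numpy and
#    the same job fails on the new executor.
print(sc.range(3).map(lambda x: np.__version__).collect())
```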
That's why I make this change in core. Now an executor always gets the updated SparkConf instead of the SparkConf captured when the Spark app started. There is some overhead, but I believe it is trivial and could be improved later.
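To illustrate the design point only (this is not Spark code; the class names are made up), here is a minimal, self-contained sketch of the difference between capturing properties at construction time and reading the live configuration whenever a new executor registers:

```python
# Hypothetical sketch: "snapshot at construction" vs. "read live conf on demand".
# Conf, DriverEndpointSnapshot and DriverEndpointLive are illustrative names,
# not Spark classes; they only show why the snapshot goes stale.

class Conf:
    def __init__(self):
        self._props = {}

    def set(self, key, value):
        self._props[key] = value

    def get_all(self):
        return dict(self._props)


class DriverEndpointSnapshot:
    """Captures the properties once, like the old constructor argument."""
    def __init__(self, conf):
        self._props = conf.get_all()      # frozen at app start

    def props_for_new_executor(self):
        return self._props                # never sees later updates


class DriverEndpointLive:
    """Keeps a reference to the conf and reads it on each registration."""
    def __init__(self, conf):
        self._conf = conf

    def props_for_new_executor(self):
        return self._conf.get_all()       # always the current values


conf = Conf()
snapshot_ep = DriverEndpointSnapshot(conf)
live_ep = DriverEndpointLive(conf)

# Later, e.g. after sc.install_packages("numpy") updates the conf:
conf.set("spark.pyspark.virtualenv.packages", "numpy")

print(snapshot_ep.props_for_new_executor())  # {} -> replacement executor misses numpy
print(live_ep.props_for_new_executor())      # {'spark.pyspark.virtualenv.packages': 'numpy'}
```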
---