Github user zjffdu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13599#discussion_r164037239

    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
    @@ -98,7 +98,7 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       private val reviveThread =
         ThreadUtils.newDaemonSingleThreadScheduledExecutor("driver-revive-thread")

    -  class DriverEndpoint(override val rpcEnv: RpcEnv, sparkProperties: Seq[(String, String)])
    +  class DriverEndpoint(override val rpcEnv: RpcEnv)
    --- End diff --

    Without this change, the following scenario won't work:

    1. Launch the Spark app.
    2. Call `sc.install_packages("numpy")`.
    3. Run `sc.range(3).map(lambda x: np.__version__).collect()`.
    4. Restart an executor (kill it; the scheduler will schedule a replacement executor).
    5. Run `sc.range(3).map(lambda x: np.__version__).collect()` again. This time it fails, because the newly scheduled executor cannot set up its virtualenv correctly: it cannot see the updated `spark.pyspark.virtualenv.packages`.

    That's why I made this change in core. Executors now always get the updated SparkConf instead of the SparkConf captured when the Spark app started. There is some overhead, but I believe it is trivial and could be improved later.
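    The failure mode above can be sketched in plain Python, independent of Spark. The class and property names below are illustrative stand-ins (only `spark.pyspark.virtualenv.packages` and `install_packages` come from the PR); the point is the difference between a conf snapshot taken at app start and a live read of the driver's current conf.

    ```python
    # Hypothetical sketch, not Spark code: why a replacement executor must
    # read the driver's *current* conf rather than a startup-time snapshot.

    class Driver:
        """Stand-in for the driver-side SparkConf holder."""
        def __init__(self):
            self.conf = {"spark.pyspark.virtualenv.packages": ""}

        def install_packages(self, pkg):
            # sc.install_packages() mutates the conf after the app started.
            self.conf["spark.pyspark.virtualenv.packages"] = pkg

    driver = Driver()

    # Old behaviour: DriverEndpoint captured sparkProperties once, at startup.
    startup_snapshot = dict(driver.conf)

    driver.install_packages("numpy")  # step 2 of the scenario

    # Step 4: a replacement executor registers. With the snapshot it sees a
    # stale value; reading the live conf, it sees the installed package.
    stale = startup_snapshot["spark.pyspark.virtualenv.packages"]  # ""
    fresh = driver.conf["spark.pyspark.virtualenv.packages"]       # "numpy"
    ```

    The trade-off the comment mentions follows directly: serving the live conf to each registering executor costs an extra lookup per registration, but keeps late-starting executors consistent with driver-side changes.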