Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/20151#discussion_r159819670
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -34,17 +34,25 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   import PythonWorkerFactory._
 
-  // Because forking processes from Java is expensive, we prefer to launch a single Python daemon
-  // (pyspark/daemon.py) and tell it to fork new workers for our tasks. This daemon currently
-  // only works on UNIX-based systems now because it uses signals for child management, so we can
-  // also fall back to launching workers (pyspark/worker.py) directly.
+  // Because forking processes from Java is expensive, we prefer to launch a single Python daemon,
+  // pyspark/daemon.py (by default) and tell it to fork new workers for our tasks. This daemon
+  // currently only works on UNIX-based systems now because it uses signals for child management,
+  // so we can also fall back to launching workers, pyspark/worker.py (by default) directly.
   val useDaemon = {
     val useDaemonEnabled = SparkEnv.get.conf.getBoolean("spark.python.use.daemon", true)
 
     // This flag is ignored on Windows as it's unable to fork.
     !System.getProperty("os.name").startsWith("Windows") && useDaemonEnabled
   }
 
+  // This configuration indicates the module to run the daemon to execute its Python workers.
+  val daemonModule = SparkEnv.get.conf.get("spark.python.daemon.module", "pyspark.daemon")
--- End diff --
generally, I thought we use the name "command" for the thing we execute
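
For reference, a minimal sketch of how this new setting could be wired up from the application side. The daemon module name `my_package.my_daemon` below is only a hypothetical placeholder, not something from this PR:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: point PySpark at a custom daemon module instead of the default
// pyspark.daemon. "my_package.my_daemon" is a hypothetical module name used
// purely for illustration; it would need to be importable by the Python
// workers and speak the same protocol as pyspark/daemon.py.
val conf = new SparkConf()
  .setAppName("custom-python-daemon-example")
  .set("spark.python.use.daemon", "true")                    // ignored on Windows, which cannot fork
  .set("spark.python.daemon.module", "my_package.my_daemon")

val sc = new SparkContext(conf)
```

The same could presumably be passed at launch time with `--conf spark.python.daemon.module=my_package.my_daemon` on spark-submit.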