Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/23055#discussion_r235226292
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
 private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
 // each python worker gets an equal part of the allocation. the worker pool will grow to the
 // number of concurrent tasks, which is determined by the number of cores in this executor.
- private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+ private val memoryMb = if (Utils.isWindows) {
--- End diff ---
I am doing this because:
1. Code consistency: usually a configuration is handled on the JVM side if possible; otherwise we would have to send the configuration to the worker and process it on the Python worker side. What we need to do here is disable the configuration explicitly under a certain condition.
2. If we do it only on the Python side, it introduces a third state for `memoryMb` in the JVM: the configuration value exists in the JVM (which means it's enabled) but the functionality is disabled on the Python side. Handling it on the JVM side avoids that extra complexity.
Why are you so against disabling it on the JVM side? I don't see a big downside to doing this. I also added a flag as you suggested. For clarity, it's a rather minor point.
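To illustrate the point being argued (not the actual patch): the idea is that the JVM resolves the configuration to "unset" on Windows, so the Python side never sees a value it would then have to ignore. A minimal, self-contained sketch, assuming a hypothetical `resolveMemoryMb` helper standing in for the real `conf.get(PYSPARK_EXECUTOR_MEMORY)` / `Utils.isWindows` logic from the diff:

```scala
// Hypothetical sketch of JVM-side gating of the pyspark memory config.
// `resolveMemoryMb` is an illustrative helper, not real Spark API:
// on Windows the configured value is dropped here, so downstream code
// (and the Python worker) only ever sees two states: set or unset.
object MemoryConfSketch {
  def resolveMemoryMb(isWindows: Boolean, configured: Option[Long]): Option[Long] =
    if (isWindows) None // feature disabled explicitly on Windows, in the JVM
    else configured     // otherwise pass the configured value through

  def main(args: Array[String]): Unit = {
    // On Windows the setting is ignored even if configured.
    assert(resolveMemoryMb(isWindows = true, Some(512L)).isEmpty)
    // Elsewhere the configured value is used as-is.
    assert(resolveMemoryMb(isWindows = false, Some(512L)).contains(512L))
    assert(resolveMemoryMb(isWindows = false, None).isEmpty)
    println("ok")
  }
}
```

With this shape there is no third state where the JVM thinks the feature is enabled while the Python worker silently disables it.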
---