Github user rdblue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23055#discussion_r236345625
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
    @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
       private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
       // each python worker gets an equal part of the allocation. the worker pool will grow to the
       // number of concurrent tasks, which is determined by the number of cores in this executor.
    -  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
    +  private val memoryMb = if (Utils.isWindows) {
    --- End diff --
    
    > functionality is disabled in Python side
    
    The only functionality that is disabled is limiting the memory space. The allocation for Python is still requested from resource managers. Setting the environment variable tells Python how much memory it was allocated, regardless of how that amount is used or enforced.
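    To make that separation concrete, here is a minimal sketch of a worker that treats "receiving the hint" and "enforcing it" as independent steps. The variable name `PYSPARK_EXECUTOR_MEMORY_MB` and the enforcement details are assumptions for illustration, not the exact Spark implementation:
    
    ```python
    import os
    
    # Hypothetical variable name for illustration; the real Spark setting
    # and worker-side handling may differ.
    MEMORY_ENV_VAR = "PYSPARK_EXECUTOR_MEMORY_MB"
    
    def memory_limit_bytes(env=os.environ):
        """Read the memory hint the JVM passed down, if any."""
        mb = int(env.get(MEMORY_ENV_VAR, "0"))
        return mb * 1024 * 1024 if mb > 0 else None
    
    def try_enforce(limit_bytes):
        """Best-effort enforcement. On Windows the POSIX-only 'resource'
        module does not exist, so this step is skipped -- but the hint
        above is still visible to any library (e.g. a future pandas
        spill-to-disk feature) that wants to honor it."""
        try:
            import resource
        except ImportError:
            return False  # enforcement unavailable; hint still delivered
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
        return True
    
    print(memory_limit_bytes({MEMORY_ENV_VAR: "512"}))  # → 536870912
    ```
    
    Under this shape, skipping the JVM-side `if (Utils.isWindows)` branch costs nothing: the worker simply falls through `try_enforce` where enforcement is unavailable.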
    
    > code consistency - usually the configuration is dealt with in JVM side if 
possible
    
    The JVM side handles the setting by requesting that memory for Python from the resource manager and passing the requested amount on to the Python process. The fact that the Python process can't enforce the limit doesn't affect how the JVM side should behave. Disabling it needlessly couples JVM and Python behavior on an assumption that may not hold in the future.
    
    > Why are you so against about disabling it in JVM side?
    
    There is no benefit to disabling this. It is more code with no purpose, and it makes assumptions about what Python can or cannot do that aren't obvious.
    
    What if pandas implements some method to spill to disk to limit memory consumption? Will implementers of that future feature know that the environment variable is not set when running on Windows? This adds complexity for no benefit because it changes neither the resource allocation in the JVM nor the behavior of the Python process. It only withholds valuable information. I see no reason for it.


---
