Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23055#discussion_r236483417
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
    @@ -74,8 +74,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
       private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
       // each python worker gets an equal part of the allocation. the worker pool will grow to the
       // number of concurrent tasks, which is determined by the number of cores in this executor.
    -  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
    +  private val memoryMb = if (Utils.isWindows) {
    --- End diff --
    
    > Strictly we should move it into JVM rather then adding more controls at 
workers.
    
    There's a reason why the value is sent to the worker: it enables a feature 
    that, when available, gives you better error information.
    
    With the Python-side check, if the app runs over the configured limit, you'll 
    get an error from Python saying it's using more memory than it should.
    
    Without it, you'll get a generic error from the resource manager that your 
    app exceeded its memory allocation, and you won't know exactly what caused 
    it (Java? Python? something else?).
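    For context, the worker-side enforcement relies on Python's `resource` 
    module, which is unavailable on Windows (which is what the diff guards 
    against). Below is a minimal sketch of how such a cap can be applied; the 
    function name and the "return the applied limit" shape are illustrative, 
    not PySpark's actual worker code:

    ```python
    import resource  # Unix-only; this import fails on Windows

    MB = 1024 * 1024

    def set_worker_memory_limit(memory_mb):
        """Apply a soft address-space cap (RLIMIT_AS) to this process.

        With the cap in place, an allocation past the limit fails inside
        Python (typically as MemoryError), pointing directly at the Python
        side; without it, the resource manager just kills the container
        with a generic over-allocation error.
        """
        limit = memory_mb * MB
        soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        if hard != resource.RLIM_INFINITY:
            # The soft limit may not exceed the hard limit.
            limit = min(limit, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))
        return limit
    ```

    The win is diagnostic: a MemoryError carries a Python traceback showing 
    exactly which code was allocating, which is what the quoted paragraph 
    above is getting at.
    
    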


---
