Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21977
@squito, this is much clearer for our user base. Right now, they can
control the YARN container allocation to make room for python by increasing the
overhead, but that does nothing to actually limit python to some defined space.
We've found that python actually needs a lot less memory than it uses, because
it doesn't know when to GC. If we only had overhead, then we wouldn't know what
to limit python to.
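To make that concrete, here's roughly what the two approaches look like
(a sketch only: the values are made up, and I'm assuming the
`spark.executor.pyspark.memory` name this PR proposes alongside the existing
`spark.executor.memoryOverhead` setting):

```python
from pyspark.sql import SparkSession

# Today: make room for python by growing the generic overhead. Nothing actually
# caps what the python workers use; overhead also covers JVM off-heap, etc.
overhead_only = (
    SparkSession.builder
    .config("spark.executor.memory", "8g")
    .config("spark.executor.memoryOverhead", "4g")
)

# With this PR: give python its own, enforced allocation.
explicit_python = (
    SparkSession.builder
    .config("spark.executor.memory", "8g")
    .config("spark.executor.pyspark.memory", "2g")
)
```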
If we made python memory a subset of overhead, then we would see a lot more
people misconfiguring jobs that don't use python when they copy another job's
settings. This way we can avoid requesting this memory if the job isn't
PySpark. I also think it is clearer to allocate memory to the JVM, python,
and overhead separately. That way executor memory and python executor memory
behave the same way, and you don't have to remember which one also requires you
to bump up overhead.
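The back-of-the-envelope container sizing, as I understand the proposal (my
sketch, not the PR's exact code), would be: python memory is only added to the
request when it's set, so non-PySpark jobs don't pay for it.

```python
def container_request_mb(executor_mb, overhead_mb, pyspark_mb=None):
    """YARN container size: JVM heap + overhead, plus python only if configured."""
    return executor_mb + overhead_mb + (pyspark_mb or 0)

container_request_mb(8192, 819)                    # JVM-only job: 9011 MB
container_request_mb(8192, 819, pyspark_mb=2048)   # PySpark job: 11059 MB
```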
For supported platforms, I think it's only Windows that doesn't
support the limits. Even on systems that don't support the limit, explicitly
allocating memory to python is better, because users see something specific to
increase when memory runs out, instead of needing to know that they should
increase some generic overhead setting.
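The Windows gap is basically the standard `resource` module, which is
Unix-only. Roughly how a per-worker limit can be enforced (my sketch of the
idea, not the PR's exact code):

```python
try:
    import resource

    def set_python_memory_limit(limit_bytes):
        # Cap the worker's address space; allocations past the limit fail
        # (typically a MemoryError) instead of blowing past the container.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
except ImportError:
    # Windows: no resource module, so the setting can only inform container
    # sizing; it can't be enforced in the worker.
    def set_python_memory_limit(limit_bytes):
        pass
```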