Github user squito commented on the issue:

    https://github.com/apache/spark/pull/21977
  
    > We've found that python requires a lot less memory than it actually uses 
because it doesn't know when to GC
    
    yes, totally agree, sorry I wasn't clear in my initial comment -- overall I 
think this is a great idea!
    
    > If we made python memory a subset of overhead, then we would see a lot 
more people misconfiguring jobs that don't use python when they copy another 
job's settings. This way we can avoid requesting this memory if the job isn't 
PySpark. I also think it is more clear to allocate memory to the JVM, python, 
and overhead separately. That way executor memory and python executor memory 
are similar and you don't have to remember which one requires you to bump up 
overhead as well.
    
    While I agree with this to some extent, when users copy configs they 
already get memory horribly wrong; they really just need to understand what 
their job is doing. My concern is that the meaning of the overhead parameter 
becomes pretty confusing. It's (off-heap JVM) + (any external process), unless 
you have this new python conf set, in which case it's (off-heap JVM) + (any 
external process other than python), though YARN still monitors based on 
everything combined. Maybe that's unavoidable.
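
    For reference, here's a rough sketch of how I understand the container 
sizing before and after this change. The conf name and the simple addition 
are my assumptions from reading the patch, not the final API:

    ```python
    # Rough sketch, assuming the new conf is spark.executor.pyspark.memory and
    # that it is simply added on top of heap + overhead in the YARN request.
    def container_request_mb(executor_mb, overhead_mb, pyspark_mb=None):
        # Before this PR: python workers had to fit inside overhead_mb.
        # After (as I read it): python memory is carved out explicitly, and
        # YARN still monitors the whole container.
        return executor_mb + overhead_mb + (pyspark_mb or 0)

    # e.g. 8 GiB heap + 2 GiB overhead + 4 GiB for python workers
    print(container_request_mb(8192, 2048, 4096))  # 14336
    ```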
    
    So if users don't set this conf, the behavior is the same as before, right? 
And when they want to take advantage of it, they change their confs to just 
move memory from the overhead to the new conf? I think I'm OK with it then; I 
thought this was doing something else on the first read.
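
    To make that concrete, this is roughly the conf move I have in mind. The 
new conf name is my assumption of what this PR adds (and the sizes are just 
for illustration), so treat it as a sketch rather than the exact API:

    ```python
    from pyspark.sql import SparkSession

    # Before: python workers had to fit inside spark.executor.memoryOverhead.
    # After (as I understand this PR): shrink the overhead and move that slice
    # into the new python conf instead, so the total container size stays put.
    spark = (SparkSession.builder
             .config("spark.executor.memory", "8g")
             .config("spark.executor.memoryOverhead", "2g")    # off-heap JVM only now
             .config("spark.executor.pyspark.memory", "4g")    # previously part of overhead
             .getOrCreate())
    ```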
    
    More general brainstorming -- I suppose there is no way to give python a 
hint to GC more often? This is sort of like moving from the UnifiedMemoryManager 
back to the static one, since you now put in a hard barrier. Seems worth it 
anyway, just thinking about what this means.
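
    For the GC-hint idea, the sketch below is roughly what I was picturing on 
the python worker side -- gc.set_threshold / gc.collect as the soft hint, 
versus an address-space rlimit as the hard barrier (purely illustrative; I'm 
not claiming this is exactly what the patch does):

    ```python
    import gc
    import resource

    # Soft hint: make CPython's generational collector run more eagerly and
    # force a full collection at a known-safe point. This is only a hint --
    # CPython still won't hand most freed memory back to the OS.
    gc.set_threshold(100, 5, 5)   # defaults are (700, 10, 10)
    gc.collect()

    # Hard barrier: cap the worker's address space, which is how I'd expect a
    # dedicated python memory carve-out to be enforced (4g is just an example).
    limit_bytes = 4 * 1024 * 1024 * 1024
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    ```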

