GitHub user rdblue commented on the issue:

    https://github.com/apache/spark/pull/21977
  
    > So if users don't set this conf, the behavior is the same as before, right?
    
    Yes.
    
    > My concern is that the meaning of the overhead parameter becomes pretty confusing.
    
    I think it's easier to reason about, but that's because most users don't really know what the overhead covers besides Python anyway.
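    To make that concrete, here is a hedged sketch of how the settings might fit together. The property name for the new limit is an assumption based on this PR, and the sizes are purely illustrative:

        from pyspark import SparkConf

        conf = (SparkConf()
                # JVM heap for the executor
                .set("spark.executor.memory", "4g")
                # Off-heap overhead; today this also has to absorb Python memory
                .set("spark.executor.memoryOverhead", "1g")
                # Hypothetical name for the separate hard cap on Python workers
                .set("spark.executor.pyspark.memory", "2g"))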
    
    > More general brainstorming -- I suppose there is no way to give Python a hint to gc more often? This is sort of like moving from the UnifiedMemoryManager back to the Static one, as now you put in a hard barrier.
    
    There are other Python memory limits, like heap size. We found that the heap size limit isn't enforced and YARN would still kill the process for exceeding its allocation. We also didn't see MemoryError, which is what signals to the user that Python is responsible. So I think this is the best we can do for now. We might be able to play around with other limits to get Python to gc more often, but we need the hard limit to keep the errors in Python and not in YARN.
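    For reference, a hard limit of this kind can be set from inside the worker process with Python's resource module. This is a minimal sketch, assuming the cap arrives as an environment variable (the variable name here is hypothetical):

        import os
        import resource

        # Hypothetical variable carrying the per-worker cap in bytes.
        limit = int(os.environ.get("PYSPARK_WORKER_MEMORY_BYTES", "0"))
        if limit > 0:
            # Cap the worker's total address space. Allocations past the cap
            # fail inside Python and surface as MemoryError, instead of the
            # process growing until YARN kills the whole container.
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            resource.setrlimit(resource.RLIMIT_AS, (limit, hard))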

