Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    I would like to clarify why we are doing this. The JIRA has some discussion on it, but I would like to know exactly what we are improving/fixing.
    
    If we do make this change, I definitely want a config for it, and I would prefer it off by default. As I mentioned in the JIRA, if you don't ask for containers, then when containers are freed up you might miss them, because YARN gives them to someone else who is already asking. This can be an issue on a very busy cluster, and I can see many apps getting worse performance.
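
    Just to be concrete about the config part: a minimal sketch of an off-by-default switch, in the style of Spark's internal config constants (the flag name here is made up for illustration, not a proposal):

```scala
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical flag name, for illustration only: an off-by-default switch
// as it might appear among Spark's internal YARN config constants.
private[spark] val YARN_THROTTLE_CONTAINER_REQUESTS =
  ConfigBuilder("spark.yarn.throttleContainerRequests")
    .doc("When true, the AM limits the container requests it sends to the " +
      "RM on each heartbeat. Disabled by default so behavior on busy " +
      "clusters is unchanged.")
    .booleanConf
    .createWithDefault(false)
```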
    
    The reasons I've seen mentioned in the JIRA are memory usage in the AM and event processing in the Spark driver. This change isn't going to help with either if your cluster is actually large enough to get, say, 50000 containers; the memory is still going to be used. That is why I want to define what the problem actually is.
    
    I'm also not sure why we don't just ask for all the containers up front. I was originally going to look into this to see if there was any information on it, but never got around to it; I think it was just the first implementation being cautious. Asking for everything up front, all at once, would save us from having to ask for containers on every heartbeat, and I assume it would help with the event-processing issues they mention in the driver (but I'm not sure, since there were no details). The only exception is if you are cancelling requests.
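
    To be concrete about what I mean by asking up front, here is a rough sketch against the raw AMRMClient API (this is not the YarnAllocator code; the function name, the fixed priority, and the lack of locality constraints are all simplifications for illustration):

```scala
import org.apache.hadoop.yarn.api.records.{Priority, Resource}
import org.apache.hadoop.yarn.client.api.AMRMClient
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest

// Queue every outstanding container request once, up front, instead of
// adding a slice of the remaining requests on each heartbeat.
def requestAllUpFront(
    amClient: AMRMClient[ContainerRequest],
    targetExecutors: Int,
    executorMemoryMb: Int,
    executorCores: Int): Unit = {
  val capability = Resource.newInstance(executorMemoryMb, executorCores)
  val priority = Priority.newInstance(1)
  // One request per desired executor; no node/rack constraints in this sketch.
  (1 to targetExecutors).foreach { _ =>
    amClient.addContainerRequest(
      new ContainerRequest(capability, null, null, priority))
  }
  // allocate() on each heartbeat then just collects whatever YARN has granted;
  // cancelling requests (removeContainerRequest) is the one case that would
  // still need per-heartbeat bookkeeping.
}
```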

