Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/19046
  
    > The MR AM does something similar.
    
    Can you be more specific here? Which exact code are you referring to? The MR AM does do some headroom calculation for things like slow start and preempting reducers to run maps, but I don't think it does so otherwise.
    
    From the comment:
    "Maps are scheduled as soon as their requests are received. Reduces are added to the pending and are ramped up (added to scheduled) based on completed maps and current availability in the cluster."
    
    Spark doesn't have slow start (it doesn't start the next stage's tasks before the previous stage has finished), so if you look at that code you see:
    // if all maps are assigned, then ramp up all reduces irrespective of the headroom
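    To make that concrete, here is a rough Scala sketch of the ramp-up decision that MR AM comment describes. This is illustrative only, not the actual Hadoop code, and all names are made up:

        // Illustrative sketch (hypothetical names, not the real RMContainerAllocator code).
        def reducesToSchedule(pendingMaps: Int, completedMaps: Int, totalMaps: Int,
                              pendingReduces: Int, headroomMb: Long, reduceMb: Long): Int = {
          if (pendingMaps == 0) {
            // "if all maps are assigned, then ramp up all reduces irrespective of the headroom"
            pendingReduces
          } else {
            // Otherwise ramp up based on map progress, capped by what fits in the current headroom.
            val byProgress = (pendingReduces.toLong * completedMaps / totalMaps).toInt
            val byHeadroom = (headroomMb / reduceMb).toInt
            math.min(byProgress, byHeadroom)
          }
        }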
    
    > Why? The only downside you mention is that if the cluster is completely 
full, then another app may be making requests and may get a container sooner 
than Spark. That sounds like an edge case where you really should be isolating 
those apps into separate queues if such resource contention is a problem.
    
    I disagree. Some of our clusters run very busy, and I think if you don't ask for the containers up front, you will rarely get the available headroom to ask for more. Whereas if you have some container requests in there when other containers free up, your application will get them since you are next in line. Users should not have to use different dedicated queues for this to work. This could just be an ad hoc queue, but the Spark users would lose out to Tez/MapReduce users. I'm pretty positive this will hurt Spark users on some of our clusters, so I would want performance numbers to prove it doesn't. Otherwise I would want a config to turn it off. Another way to possibly help this problem would be to ask for a certain percentage over the actual limit.
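    Something along these lines is what I mean by asking over the limit. This is a sketch only; the overshoot factor and all names are hypothetical, not an existing Spark config:

        // Cap the outstanding ask at some percentage above the reported headroom,
        // rather than at the exact headroom (illustrative, hypothetical names).
        def targetAsk(neededExecutors: Int, headroomExecutors: Int,
                      overshoot: Double = 0.10): Int = {
          val cap = math.ceil(headroomExecutors * (1 + overshoot)).toInt
          math.min(neededExecutors, cap)
        }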
    
    > If all of those containers are actually allocated to apps, or your 
application's queue actually limits the number of containers your app can get 
to a much smaller number, then yes it can help.
    
     If the problem is the RM memory issue, then how does this make it better? I still make 50,000 container requests; sure, they do eventually get allocated if the cluster is large enough, but I still had to make those requests and the RM had to handle them. Yes, when they get allocated they do go away, so I guess it could be a bit better, but either way the RM has to handle the worst case, so it has to handle 50,000 container requests from possibly more than one app. I don't see this helping anything for large clusters.
    
    > Wouldn't that mean that YARN would actually allocate resources you don't 
need? 
    
    No, you ask immediately for all the containers you are going to need for that stage. You would start them, unless of course by the time you get them you don't need them anymore. If you happen to have tasks that finish fast, then you would have to cancel the container requests or give them back, but if they don't you aren't wasting anything. I'm not sure how much it matters, since we ramp up the ask pretty quickly, but if that is causing a lot of overhead in the Spark driver then perhaps we should switch to doing that, or do some analysis on it.
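    Roughly, the behavior I'm describing is just the following (a sketch with hypothetical names, not the actual Spark allocation code):

        // Request enough executors for the whole stage up front, then shrink the
        // outstanding ask if tasks finish before the containers arrive.
        def outstandingAsk(pendingTasks: Int, tasksPerExecutor: Int,
                           runningExecutors: Int): Int = {
          // Executors still needed for the remaining work of this stage.
          val needed = math.ceil(pendingTasks.toDouble / tasksPerExecutor).toInt
          // Anything beyond this can be cancelled or given back to YARN.
          math.max(needed - runningExecutors, 0)
        }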
    
    Again, that comes down to the actual problem we are trying to solve here. Do you know whether the problem was purely the total # of container requests, the rate at which requests were being made, or something else? What was it causing on the RM: memory usage, scheduler slowdown?
    
    This feels to me like a hack for small clusters. If you have a small cluster, limit the max # of containers to ask for in the first place. I do that even on our large clusters and personally would be OK with changing the default in Spark. If you are still seeing a lot of YARN issues with it, then I would be OK putting this in, perhaps with the change to ask for a few over your limit, but I would still want a config for it unless we had some pretty positive performance numbers on a large cluster.
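    For reference, capping the ask up front can already be done with the existing dynamic allocation settings (the values below are just examples):

        import org.apache.spark.SparkConf

        // Bound how many executors the app will ever request from YARN.
        val conf = new SparkConf()
          .set("spark.dynamicAllocation.enabled", "true")
          .set("spark.dynamicAllocation.initialExecutors", "50")
          .set("spark.dynamicAllocation.maxExecutors", "2000")  // cap the total ask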

