[ 
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250137#comment-15250137
 ] 

Nathan Roberts commented on YARN-4963:
--------------------------------------

bq. IMO, application-specific configuration should live at the application 
level rather than at the scheduler level. Applications that are fine with 
off_switch containers can specify how many may be assigned, while the few 
applications that are very strict about node locality can configure 1 for 
off_switch.

bq. I feel the same. Is there any specific reason it has been set only at the 
scheduler level, other than the AMRM interface change? We can keep the default 
value as 1 so that it's still compatible. Also, allocation happens within the 
app's and queue's capacity limits anyway, so I feel it would be ideal for the 
app to decide how many allocations it takes on an off_switch node. Thoughts?

Thanks [~Naganarasimha], [~rohithsharma], [~leftnoteasy] for the comments. I 
think we're all in agreement that there needs to be some control at the 
application level for things like OFF_SWITCH allocations and locality delays. 
(That's what #2 was going for, and I think that should be a separate jira if 
folks are agreeable to that.) This new feature will require some discussion:
- The current value of 1 is a poor fit for almost all applications, so I 
think when we do the application-level support the default would need to be 
either unlimited or some high value. Otherwise we force every application to 
set this limit to something other than 1 just to get decent OFF_SWITCH 
scheduling behavior.
- This setting not only affects the application at hand, but can also affect 
the entire system. I can see many cases where applications will relax these 
settings significantly so that they schedule faster, even though that may not 
be the right thing for the system as a whole. Sure, my application scheduled 
very quickly, but my locality was terrible, so I caused a lot of unnecessary 
cross-switch traffic. So I think we'll need some system minimums that will 
prevent this type of abuse.
- These changes would potentially affect the FIFO-ness of the queues. If 
application A meets its OFF_SWITCH-per-node limit, do we offer the node to 
other applications in the same queue?
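
One possible reading of the "system minimums" idea in the second bullet, as a 
rough sketch only: the scheduler would clamp whatever an application asks for 
against admin-set bounds. Every name here (the class, both methods, and their 
parameters) is hypothetical and is not part of any existing YARN API or of the 
attached patch.

```java
/** Hypothetical sketch; none of these names exist in YARN. */
final class OffSwitchLimitPolicySketch {

    /**
     * An application may ask for many OFF_SWITCH assignments per node
     * heartbeat, but never gets more than the admin-configured ceiling.
     */
    static int effectiveOffSwitchLimit(int appRequested, int systemMax) {
        if (appRequested < 1 || systemMax < 1) {
            throw new IllegalArgumentException("limits must be >= 1");
        }
        return Math.min(appRequested, systemMax);
    }

    /**
     * An application may shorten its locality delay (missed scheduling
     * opportunities before relaxing locality), but not below the system floor.
     */
    static int effectiveLocalityDelay(int appRequested, int systemMin) {
        if (appRequested < 0 || systemMin < 0) {
            throw new IllegalArgumentException("delays must be >= 0");
        }
        return Math.max(appRequested, systemMin);
    }
}
```

With bounds like these, an application that requests a very high OFF_SWITCH 
limit or a zero locality delay still ends up within what the admin allows.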

So my suggestion is:
1) Have this jira make the system-level OFF-SWITCH check configurable so 
admins can easily crank this up and dramatically improve scheduling rate.
2) Have a second jira to address per-application settings for things like 
locality_delay and off_switch limits.
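
To illustrate what suggestion 1 changes, here is a simplified model of a 
single node heartbeat. It is a sketch under stated assumptions, not the 
CapacityScheduler code or the attached patch: the class, the enum, and the 
offSwitchLimit parameter are hypothetical, and real scheduling also checks 
resources, queue capacity, and locality delay.

```java
import java.util.List;

/** Hypothetical model of one node heartbeat; not actual YARN code. */
final class OffSwitchHeartbeatSketch {

    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    /**
     * Walks the pending requests in order and returns how many were assigned
     * in this heartbeat. Assignment stops as soon as another OFF_SWITCH
     * container would exceed offSwitchLimit.
     */
    static int assignOnHeartbeat(List<Locality> pending, int offSwitchLimit) {
        int assigned = 0;
        int offSwitchAssigned = 0;
        for (Locality locality : pending) {
            if (locality == Locality.OFF_SWITCH) {
                if (offSwitchAssigned >= offSwitchLimit) {
                    break; // limit reached; wait for the next heartbeat
                }
                offSwitchAssigned++;
            }
            assigned++;
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<Locality> pending = List.of(
            Locality.OFF_SWITCH, Locality.OFF_SWITCH,
            Locality.NODE_LOCAL, Locality.OFF_SWITCH);
        // Limit of 1 models today's fixed behavior; 3 models a raised limit.
        System.out.println(assignOnHeartbeat(pending, 1)); // prints 1
        System.out.println(assignOnHeartbeat(pending, 3)); // prints 4
    }
}
```

In this model, raising the limit from 1 to 3 lets the same heartbeat place 
four containers instead of one, which is the scheduling-rate improvement the 
configurable check is meant to allow.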

Reasonable?





> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat 
> configurable
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-4963
>                 URL: https://issues.apache.org/jira/browse/YARN-4963
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacityscheduler
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment 
> per heartbeat. With more and more non MapReduce workloads coming along, the 
> degree of locality is declining, causing scheduling to be significantly 
> slower. It's still important to limit the number of OFF_SWITCH assignments to 
> avoid densely packing OFF_SWITCH containers onto nodes. 
> Proposal is to add a simple config that makes the number of OFF_SWITCH 
> assignments configurable.
> Will upload candidate patch shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)