[
https://issues.apache.org/jira/browse/YARN-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250137#comment-15250137
]
Nathan Roberts commented on YARN-4963:
--------------------------------------
bq. IMO, application-specific configuration should live at the application
level rather than at the scheduler level. Applications that are fine with
off_switch assignments can specify how many containers may be assigned that
way, while the few applications that are very strict about node locality can
configure 1 for off_switch.
bq. I feel the same. Is there any specific reason it has been set only at the
scheduler level, other than the AMRM interface change? We can keep the default
value as 1 so that it's still compatible. Allocation happens within the app's
and queue's capacity limits anyway, so I feel it would be ideal for the app to
decide how many allocations it takes on an off_switch node. Thoughts?
Thanks [~Naganarasimha], [~rohithsharma], [~leftnoteasy] for the comments. I
think we're all in agreement that there needs to be some control at the
application level for things like OFF_SWITCH allocations and locality delays.
(That's what #2 was going for, and I think that should be a separate jira if
folks are agreeable to that.) This new feature will require some discussion:
- The current value of 1 is a poor choice for almost all applications, so I
think when we add application-level support the default would need to be
either unlimited or some high value. Otherwise we force all applications to set
this limit to something other than 1 to get decent OFF_SWITCH scheduling
behavior.
- This setting affects not only the application at hand but potentially the
entire system. I can see many cases where applications will relax these
settings significantly so that they schedule faster, even though that may not
be the right thing for the system as a whole. Sure, my application scheduled
very quickly, but my locality was terrible, so I caused a lot of unnecessary
cross-switch traffic. So I think we'll need some system-level minimums that
prevent this type of abuse.
- These changes would potentially affect the FIFO-ness of the queues. If
application A reaches its per-node OFF_SWITCH limit, do we offer the node to
other applications in the same queue?
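To make the second bullet concrete, here is a minimal sketch of one possible reading of "system-level minimums": the application states what it wants, and the scheduler clamps the request to admin-set bounds. The class, method, and constant names are all hypothetical; nothing here is an existing YARN API or property, and the bound values are made up for illustration.

```java
public class AppLocalityPolicySketch {
    // Hypothetical admin-configured bounds (assumptions, not real YARN settings).
    static final int SYSTEM_MAX_OFF_SWITCH_PER_HEARTBEAT = 10;
    static final int SYSTEM_MIN_LOCALITY_DELAY = 5;

    /** Caps the app's requested OFF_SWITCH limit at the system ceiling, never below 1. */
    public static int effectiveOffSwitchLimit(int appRequested) {
        int capped = Math.min(appRequested, SYSTEM_MAX_OFF_SWITCH_PER_HEARTBEAT);
        return Math.max(1, capped);
    }

    /** Raises the app's requested locality delay to the system floor if it is lower. */
    public static int effectiveLocalityDelay(int appRequested) {
        return Math.max(appRequested, SYSTEM_MIN_LOCALITY_DELAY);
    }

    public static void main(String[] args) {
        // An app that relaxes both settings aggressively is pulled back to the bounds.
        System.out.println(effectiveOffSwitchLimit(1000));
        System.out.println(effectiveLocalityDelay(0));
    }
}
```

An application that is strict about locality can still ask for a limit of 1 and a long delay; the clamp only constrains requests that are more relaxed than the admin allows.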
So my suggestion is:
1) Have this jira make the system-level OFF-SWITCH check configurable so
admins can easily crank this up and dramatically improve scheduling rate.
2) Have a second jira to address per-application settings for things like
locality_delay and off_switch limits.
Reasonable?
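For readers skimming the thread, suggestion 1 can be sketched roughly as follows. This is a simplified illustration, not the attached patch: the class, enum, and method names are hypothetical, and the real scheduler works with queues, applications, and resource requests rather than a flat list.

```java
import java.util.ArrayList;
import java.util.List;

public class OffSwitchLimitSketch {
    /** Locality of a pending request relative to the heartbeating node. */
    public enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    /**
     * Walks the pending requests for one node heartbeat and returns the ones
     * that get assigned. Assignment stops as soon as the number of OFF_SWITCH
     * assignments reaches offSwitchLimit (an assumed admin-configured value).
     */
    public static List<Locality> assignForHeartbeat(List<Locality> pending, int offSwitchLimit) {
        List<Locality> assigned = new ArrayList<>();
        int offSwitchCount = 0;
        for (Locality request : pending) {
            assigned.add(request);
            if (request == Locality.OFF_SWITCH) {
                offSwitchCount++;
                if (offSwitchCount >= offSwitchLimit) {
                    break; // limit reached; the rest waits for the next heartbeat
                }
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<Locality> pending = List.of(
            Locality.OFF_SWITCH, Locality.OFF_SWITCH,
            Locality.NODE_LOCAL, Locality.OFF_SWITCH);
        // With a limit of 1 only the first request is assigned on this heartbeat.
        System.out.println(assignForHeartbeat(pending, 1));
        // With a limit of 3 all four requests fit into one heartbeat.
        System.out.println(assignForHeartbeat(pending, 3));
    }
}
```

With the limit fixed at 1, an application with mostly OFF_SWITCH requests gets one container per node heartbeat; raising the limit lets a single heartbeat satisfy several such requests, which is where the scheduling-rate improvement would come from.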
> capacity scheduler: Make number of OFF_SWITCH assignments per heartbeat
> configurable
> ------------------------------------------------------------------------------------
>
> Key: YARN-4963
> URL: https://issues.apache.org/jira/browse/YARN-4963
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: capacityscheduler
> Affects Versions: 3.0.0, 2.7.2
> Reporter: Nathan Roberts
> Assignee: Nathan Roberts
> Attachments: YARN-4963.001.patch
>
>
> Currently the capacity scheduler will allow exactly 1 OFF_SWITCH assignment
> per heartbeat. With more and more non-MapReduce workloads coming along, the
> degree of locality is declining, causing scheduling to be significantly
> slower. It's still important to limit the number of OFF_SWITCH assignments to
> avoid densely packing OFF_SWITCH containers onto nodes.
> The proposal is to add a simple config that makes the number of OFF_SWITCH
> assignments configurable.
> Will upload a candidate patch shortly.