[
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669047#comment-15669047
]
Carlo Curino commented on YARN-5864:
------------------------------------
[~wangda] I think we are on the same page on the problem side, and I agree that
the scheduling invariants (that were once hard constraints) will eventually
look more like soft constraints, which we aim to meet/maximize but are ok to
compromise on in some cases.
Understanding how to trade one for the other, or how to make decisions that
maximize the number/amount of met constraints, is the hard problem. To this
end I would argue that (2) is structurally better positioned than any
combination of heuristics to capture all the tradeoffs in a compact and
easy-to-understand way. That said, how to design (2) in a scalable/fast way is
an open problem (an interesting direction recently appeared at OSDI 2016,
http://www.firmament.io/; while it is not enough, it has some good ideas we
could consider leveraging). So I am proposing it more as a north star than as
a short-term proposal for how to tackle this JIRA (or the scheduler issues in
general). On the other hand, (1) is an ongoing activity we can start
right away, and we should do it regardless of whether we eventually manage to
do something like (2) or not.
Regarding abuses/scope of the feature: I am certain that the initial scenarios
you are designing for have all the right properties to be
safe/reasonable/trusted, but once the feature is out there, people will start
using it in the most baroque ways, and some of the issues I alluded to might
come up. Having very crisply defined semantics, configuration-validation
mechanics (that prevent the worst configuration mistakes), and very tight unit
tests is probably our best line of defense.
> Capacity Scheduler preemption for fragmented cluster
> -----------------------------------------------------
>
> Key: YARN-5864
> URL: https://issues.apache.org/jira/browse/YARN-5864
> Project: Hadoop YARN
> Issue Type: New Feature
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Attachments: YARN-5864.poc-0.patch
>
>
> YARN-4390 added preemption for reserved containers. However, we found one
> case where a large container cannot be allocated even if all queues are under
> their limits.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50
> Two nodes: n1 and n2, each of them has 50 resource
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45.
> {code}
> The container could be reserved on either host, but no preemption will
> happen because all queues are under their limits.
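
The arithmetic behind the example above can be checked with a minimal sketch.
This is only an illustrative model, not YARN code: the dictionaries and helper
functions are hypothetical names, and it assumes the 50:50 capacities apply to
a cluster total of 100 (two nodes of 50 each).

```python
# Hypothetical model of the YARN-5864 example; none of these names are YARN APIs.
# Assumption: capacity 50:50 means each queue is entitled to 50 of the 100 total.
queue_capacity = {"a": 50, "b": 50}
node_capacity = {"n1": 50, "n2": 50}

# queue-a uses 10 on n1 and 10 on n2; queue-b currently uses nothing.
usage = {("a", "n1"): 10, ("a", "n2"): 10}

# queue-b asks for one single container with resource=45.
request = {"queue": "b", "size": 45}


def queue_used(queue):
    """Total resource used by a queue across all nodes."""
    return sum(v for (q, _), v in usage.items() if q == queue)


def node_free(node):
    """Resource still unallocated on a node."""
    used = sum(v for (_, n), v in usage.items() if n == node)
    return node_capacity[node] - used


# Queues above their capacity: the only ones queue-level preemption would target.
over_limit = [q for q in queue_capacity if queue_used(q) > queue_capacity[q]]

# Nodes that could host the requested container right now.
fits = [n for n in node_capacity if node_free(n) >= request["size"]]

# Free resource summed over the whole cluster.
total_free = sum(node_free(n) for n in node_capacity)

# The request would keep queue-b within its own capacity.
within_capacity = queue_used(request["queue"]) + request["size"] \
    <= queue_capacity[request["queue"]]

print("over-limit queues:", over_limit)
print("nodes that fit the request:", fits)
print("total free:", total_free, "request within queue-b capacity:", within_capacity)
```

Under these assumptions the cluster has 80 free in total and queue-b is entitled
to the 45 it asks for, yet each node has only 40 free, so the container fits
nowhere, and since no queue is over its limit nothing qualifies for preemption.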