[ 
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15669047#comment-15669047
 ] 

Carlo Curino commented on YARN-5864:
------------------------------------

[~wangda] I think we are on the same page on the problem side, and I agree that 
the scheduling invariants (that were once hard constraints) will eventually 
look more like soft constraints, which we aim to meet/maximize but are OK to 
compromise on in some cases. 

Understanding how to trade one for the other, or how to make decisions that 
maximize the number/amount of met constraints, is the hard problem. To this 
end I would argue that (2) is structurally better positioned than any 
combination of heuristics to capture all the tradeoffs in a compact and 
easy-to-understand way. That said, how to design (2) in a scalable/fast way is 
an open problem (an interesting direction recently appeared in OSDI 2016, 
http://www.firmament.io/; while it is not enough on its own, it has some good 
ideas we could consider leveraging). So I am proposing it more as a north star 
than as a short-term proposal for how to tackle this JIRA (or the scheduler 
issues in general). On the other hand, (1) is an ongoing activity we can start 
right away, and we should do it regardless of whether we eventually manage to 
do something like (2) or not. 

Regarding abuses/scope of the feature: I am certain that the initial scenarios 
you are designing for have all the right properties to be 
safe/reasonable/trusted, but once the feature is out there, people will start 
using it in the most baroque ways, and some of the issues I allude to might 
come up. Having very crisply defined semantics, configuration-validation 
mechanics (that prevent the worst configuration mistakes), and very tight unit 
tests are probably our best line of defense.



> Capacity Scheduler preemption for fragmented cluster 
> -----------------------------------------------------
>
>                 Key: YARN-5864
>                 URL: https://issues.apache.org/jira/browse/YARN-5864
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>         Attachments: YARN-5864.poc-0.patch
>
>
> YARN-4390 added preemption for reserved containers. However, we found one case 
> where a large container cannot be allocated even if all queues are under their 
> limit.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50 
> Two nodes: n1 and n2, each of them have 50 resource 
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45. 
> {code} 
> The container could be reserved on either host, but no preemption will 
> happen because all queues are under their limits. 
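
The arithmetic behind the quoted example can be checked with a small sketch. 
This is only an illustrative model under simplifying assumptions (a single 
abstract resource dimension, queue limits computed as a percentage of total 
cluster capacity); the names Node and analyze are hypothetical and are not 
part of YARN or of the attached patch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with a total capacity and per-queue usage (abstract resource units)."""
    name: str
    capacity: int
    used: dict = field(default_factory=dict)  # queue name -> units in use

    def free(self) -> int:
        return self.capacity - sum(self.used.values())


def analyze(nodes, queue_share_pct, requester, request):
    """Summarize why a single large request may be stuck on a fragmented cluster."""
    total = sum(n.capacity for n in nodes)
    # Assumption: each queue's limit is its percentage share of the whole cluster.
    limits = {q: total * pct // 100 for q, pct in queue_share_pct.items()}
    usage = {q: sum(n.used.get(q, 0) for n in nodes) for q in queue_share_pct}
    return {
        "free_per_node": {n.name: n.free() for n in nodes},
        "total_free": sum(n.free() for n in nodes),
        # A container must fit on one node; free space cannot be pooled across nodes.
        "fits_on_one_node": any(n.free() >= request for n in nodes),
        "requester_within_limit": usage[requester] + request <= limits[requester],
        # Limit-based preemption only has a victim if some queue exceeds its limit.
        "queues_over_limit": [q for q in limits if usage[q] > limits[q]],
    }


# Numbers from the issue description: queue-a uses 10 on each of two 50-unit nodes,
# and queue-b asks for one container of 45.
nodes = [Node("n1", 50, {"a": 10}), Node("n2", 50, {"a": 10})]
result = analyze(nodes, {"a": 50, "b": 50}, requester="b", request=45)
print(result)
```

With these numbers each node has 40 free (80 in total), so the request of 45 
fits on neither node, even though queue-b would stay within its limit and no 
queue is over its limit.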



