[ 
https://issues.apache.org/jira/browse/YARN-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146790#comment-14146790
 ] 

Carlo Curino commented on YARN-2592:
------------------------------------

I hear you, and I agree we will need to cope with expensive preemption for a 
while, and even in the long term not everyone will be nicely preemptable (our 
work on YARN-1051, for example, is designed to let people get strongly 
guaranteed and protected resources when needed). 

However, the compromise you propose means that the over-capacity "zone" is 
weirdly policed... on one side we expect the "giving" of containers to respect 
a notion of fairness (proportional to each queue's rightful capacity), while 
on the other side that fairness is not enforced by preemption. I find this 
inconsistent.
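
To make "proportional to your rightful capacity" concrete, here is a minimal 
sketch that applies it to the numbers from the issue description quoted below. 
It is a simplified illustration, not the actual preemption policy code: the 
function names, the tuple layout, and the round-based loop are assumptions 
made for this example.

```python
def ideal_allocation(total, queues):
    """Split `total` among queues in proportion to their guarantees,
    never giving a queue more than it asks for (current + pending).

    queues maps name -> (guaranteed, current, pending). Hypothetical layout.
    """
    demand = {name: cur + pend for name, (_, cur, pend) in queues.items()}
    ideal = {name: 0.0 for name in queues}
    unassigned = float(total)
    # Only queues that still want resources take part in a round.
    active = {name for name in queues if demand[name] > 0}
    while active and unassigned > 1e-9:
        weight = sum(queues[name][0] for name in active)
        if weight == 0:
            break
        offered = unassigned
        for name in list(active):
            # Offer each queue a share proportional to its guarantee,
            # capped at what it still needs.
            share = offered * queues[name][0] / weight
            take = min(share, demand[name] - ideal[name])
            ideal[name] += take
            unassigned -= take
            if demand[name] - ideal[name] <= 1e-9:
                active.discard(name)
    return ideal


def to_preempt(queues, ideal):
    """Amount each queue holds above its ideal allocation."""
    return {name: max(0.0, cur - ideal[name])
            for name, (_, cur, _) in queues.items()}


queues = {"A": (30, 40, 5), "B": (30, 50, 0), "C": (30, 0, 0)}
ideal = ideal_allocation(90, queues)
excess = to_preempt(queues, ideal)
print(ideal, excess)
```

Under these assumptions A and B each end up with an ideal share of 45, so B 
holds 5 above its share. Those are the 5 resources the issue description says 
the monitor would take from B.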

Moreover, as I was saying, I think this will only spare containers in a rather 
narrow band: an imbalance has arisen among over-capacity queues, no 
under-capacity queue is requesting resources yet, we are above the 
dead-zone, and tasks run longer than 2x the grace period. Is this a large 
enough use case to require special-casing?
If this is important in practice and an adoption show-stopper, I am fine with 
compromises, but we should make sure this is the case. 
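
The band can be written down as a single predicate. This is only one reading 
of the four conditions: the dictionary keys, the parameter names, and the 
interpretation of the dead-zone as a fraction above the guarantee are 
assumptions made for illustration.

```python
def in_narrow_band(queues, dead_zone, grace_period_s, task_runtime_s):
    """Return True when all four conditions of the band hold.

    queues maps name -> {"guaranteed", "current", "pending"} (hypothetical).
    dead_zone is assumed to be a fraction, e.g. 0.1 for 10% over guarantee.
    """
    over = [q for q in queues.values() if q["current"] > q["guaranteed"]]
    # Imbalance among over-capacity queues: at least two are over capacity
    # and at least one of them is still asking for more.
    imbalance = len(over) >= 2 and any(q["pending"] > 0 for q in over)
    # No under-capacity queue is requesting resources yet.
    under_asking = any(
        q["current"] < q["guaranteed"] and q["pending"] > 0
        for q in queues.values()
    )
    # Some queue is far enough over its guarantee to leave the dead-zone.
    above_dead_zone = any(
        q["current"] > q["guaranteed"] * (1 + dead_zone) for q in over
    )
    # Tasks run longer than 2x the grace period.
    long_tasks = task_runtime_s > 2 * grace_period_s
    return imbalance and not under_asking and above_dead_zone and long_tasks


example = {
    "A": {"guaranteed": 30, "current": 40, "pending": 5},
    "B": {"guaranteed": 30, "current": 50, "pending": 0},
    "C": {"guaranteed": 30, "current": 0, "pending": 0},
}
print(in_narrow_band(example, 0.1, 15, 120))
```

As soon as C asks for anything, or tasks finish within twice the grace 
period, the predicate is false and the special case changes nothing.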

A way to do this is to enable preemption but run it in "observe-only" mode, 
where the policy logs what it would like to do without actually doing it... We 
can then see whether on a real cluster we are often/ever in the scenario you 
are trying to address.
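
As a rough sketch of what "observe-only" means, the policy computes its plan 
as usual, logs it, and skips the kill step. The function below is a 
hypothetical stand-in, not the scheduler's API: `plan`, `kill`, and 
`observe_only` are names invented for this example.

```python
import logging

log = logging.getLogger("preemption")


def apply_preemption(plan, kill, observe_only=True):
    """Log every intended preemption; only act when observe_only is False.

    plan maps queue name -> amount to preempt.
    kill is a callable(queue, amount) that performs the real preemption.
    Returns the preemptions that were actually carried out.
    """
    acted = {}
    for queue, amount in sorted(plan.items()):
        if amount <= 0:
            continue
        log.info("would preempt %s from queue %s", amount, queue)
        if not observe_only:
            kill(queue, amount)
            acted[queue] = amount
    return acted


killed = []
dry_run = apply_preemption({"A": 0, "B": 5}, lambda q, a: killed.append((q, a)))
```

With the default `observe_only=True` the call only logs, so `dry_run` is empty 
and nothing is appended to `killed`. The log then shows how often the 
over-capacity scenario comes up without costing any work.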




> Preemption can kill containers to fulfil need of already over-capacity queue.
> -----------------------------------------------------------------------------
>
>                 Key: YARN-2592
>                 URL: https://issues.apache.org/jira/browse/YARN-2592
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.5.1
>            Reporter: Eric Payne
>
> There are scenarios in which one over-capacity queue can cause preemption of 
> another over-capacity queue. However, since killing containers may lose work, 
> it doesn't make sense to me to kill containers to feed an already 
> over-capacity queue.
> Consider the following:
> {code}
> root has A,B,C, total capacity = 90
> A.guaranteed = 30, A.pending = 5, A.current = 40
> B.guaranteed = 30, B.pending = 0, B.current = 50
> C.guaranteed = 30, C.pending = 0, C.current = 0
> {code}
> In this case, the queue preemption monitor will kill 5 resources from queue B 
> so that queue A can pick them up, even though queue A is already over its 
> capacity. This could lose any work that those containers in B had already 
> done.
> Is there a use case for this behavior? It seems to me that if a queue is 
> already over its capacity, it shouldn't destroy the work of other queues. If 
> the over-capacity queue needs more resources, that seems to be a problem that 
> should be solved by increasing its guarantee.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
