Eric Payne commented on YARN-3275:

Thanks very much, [~leftnoteasy], for reviewing this issue.
Actually, go over max capacity is possible, when a cluster with resource = 
1000G, and a queue reaches its max capacity, after the cluster resource goes 
down to 100G, it can over max capacity.
In addition, parent queue can go beyond max capacity as described in YARN-3243 
no matter if cluster resource changed or not. But child queue can only go 
beyond max capacity when cluster resource reduced.
It is possible that the total available capacity of the cluster dropped by some 
percentage, causing the leaf node to go over its abs max cap by 5%. The cluster 
has a large number of nodes and memory, and that value is always changing 
slightly as nodes are lost and re-register. This may not account for the 5% 
overage we saw on the small leaf queue, because that total memory number isn't 
varying by 5%.
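
To make the arithmetic above concrete, here is a rough sketch in Python. It is not scheduler code: the function names and the 10% max-capacity setting are assumptions chosen only for illustration. It shows how far a queue ends up over its absolute max capacity when the cluster shrinks, and how large a shrink would be needed to explain a 5% overage.

```python
def overage_fraction(used_gb, max_capacity_pct, cluster_gb):
    """Fraction by which usage exceeds the queue's absolute max capacity.

    Hypothetical helper: absolute max is modeled as a simple percentage
    of the current cluster resource.
    """
    abs_max_gb = cluster_gb * max_capacity_pct / 100.0
    return used_gb / abs_max_gb - 1.0


def cluster_drop_for_overage(overage):
    """Fractional cluster shrink that leaves a queue, previously sitting
    exactly at its max, `overage` (e.g. 0.05) over the new max."""
    return 1.0 - 1.0 / (1.0 + overage)


# Queue at its max (10% of 1000G = 100G), then the cluster drops to 100G.
before = overage_fraction(100, 10, 1000)
after = overage_fraction(100, 10, 100)

# Shrink needed to account for a 5% overage on its own.
needed_drop = cluster_drop_for_overage(0.05)
```

Under these assumptions, a 5% overage would require the cluster to lose a bit under 5% of its total resource, which is the size of swing the comment says was not observed.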
We haven't defined "disable-preemption" is more important than "max-capacity". 
IMO, if we should do this JIRA or not is still discussable.
I see your point. In other words, it could be argued that the preemption 
monitor is doing the right thing. That is, when it sees that the queue is over 
its absolute max capacity (which should not happen), the preemption monitor is 
moving those resources back into the usable pool.

However, the expectation of our users is that if they are running a job on a 
non-preemptable queue, their containers should never be preempted. From their 
point of view, it doesn't matter what the reason is; they are expecting the RM 
to obey the contract that says it will not preempt their resources.
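
The contract users expect can be sketched as follows. This is a simplified illustration in Python, not the preemption monitor's actual logic or the attached patch: the `Queue` class, its fields, and `over_max_reclaimable_gb` are hypothetical names, and only the portion above absolute max capacity is considered.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    # Hypothetical, simplified view of a leaf queue.
    name: str
    used_gb: float
    abs_max_gb: float
    preemption_disabled: bool


def over_max_reclaimable_gb(queue):
    """Resources a monitor may reclaim for being over absolute max,
    assuming the 'never preempt' flag takes priority over max capacity."""
    if queue.preemption_disabled:
        # Non-preemptable queue: nothing is taken, even when over max.
        return 0.0
    return max(0.0, queue.used_gb - queue.abs_max_gb)


protected = Queue("protected", used_gb=105, abs_max_gb=100, preemption_disabled=True)
normal = Queue("normal", used_gb=105, abs_max_gb=100, preemption_disabled=False)
```

With these inputs the non-preemptable queue yields nothing while the ordinary queue yields its 5G overage. Whether the flag should win over max capacity is exactly the open question raised above.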

> Preemption happening on non-preemptable queues
> ----------------------------------------------
>                 Key: YARN-3275
>                 URL: https://issues.apache.org/jira/browse/YARN-3275
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>         Attachments: YARN-3275.v1.txt
> YARN-2056 introduced the ability to turn preemption on and off at the queue 
> level. In cases where a queue goes over its absolute max capacity (YARN-3243, 
> for example), containers can be preempted from that queue, even though the 
> queue is marked as non-preemptable.
> We are using this feature in large, busy clusters and seeing this behavior.

This message was sent by Atlassian JIRA