[ 
https://issues.apache.org/jira/browse/YARN-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573619#comment-14573619
 ] 

Eric Payne commented on YARN-3769:
----------------------------------

The following configuration reproduces the problem:

|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 40 | 90 | N/A |
| A | 10 | 100 | 20 | 70 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |

One app is running in each queue. Both apps are asking for more resources, but 
each has reached its user limit, so no more resources are allocated to either 
app even though the cluster still has resources available (root is using 90 
of 100).

The preemption monitor sees that {{B}} is asking for more resources and that 
{{B}} is more underserved than {{A}}, so it tries to balance the queues by 
preempting resources (10, for example) from {{A}}. That leaves the queues in 
this state:

|| queue || capacity || max || pending || used || user limit ||
| root | 100 | 100 | 50 | 80 | N/A |
| A | 10 | 100 | 30 | 60 | 70 |
| B | 10 | 100 | 20 | 20 | 20 |

However, when the capacity scheduler tries to give the freed container to the 
app in {{B}}, the app recognizes that it has no headroom and refuses the 
container. So the capacity scheduler offers the container to the app in {{A}} 
again, which accepts it because it now has headroom, and the process starts 
over again.
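
A minimal sketch of this cycle, using the numbers from the tables above. This 
is not the capacity scheduler's code: the {{Queue}} class and the {{preempt}}, 
{{offer}} and {{run_cycle}} helpers are hypothetical names, and the sketch 
assumes one user per queue with headroom equal to user limit minus used. The 
10 resources that are free at the root level are left out, since neither app 
has the headroom to take them.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Queue:
    # Hypothetical model of a leaf queue running a single user's app.
    name: str
    used: int
    pending: int
    user_limit: int

    def headroom(self) -> int:
        # Assumption: headroom is simply user limit minus current usage.
        return max(0, self.user_limit - self.used)


def preempt(victim: Queue, amount: int) -> int:
    """Take resources from the victim; the lost work becomes pending again."""
    taken = min(amount, victim.used)
    victim.used -= taken
    victim.pending += taken
    return taken


def offer(queue: Queue, amount: int) -> int:
    """Offer resources; the app accepts only up to its pending ask and headroom."""
    accepted = min(amount, queue.pending, queue.headroom())
    queue.used += accepted
    queue.pending -= accepted
    return accepted


def run_cycle(a: Queue, b: Queue, amount: int = 10) -> Tuple[int, int]:
    """One round: preempt from A, offer to B first, then offer the rest to A."""
    freed = preempt(a, amount)
    to_b = offer(b, freed)
    to_a = offer(a, freed - to_b)
    return to_b, to_a


a = Queue("A", used=70, pending=20, user_limit=70)
b = Queue("B", used=20, pending=20, user_limit=20)
for i in range(3):
    to_b, to_a = run_cycle(a, b)
    print(f"cycle {i}: B accepted {to_b}, A accepted {to_a}, A used {a.used}")
```

In this model every cycle ends in the starting state: {{B}} accepts nothing 
because its headroom is 0, and {{A}} takes back exactly what was preempted.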

Note that this happens even when used cluster resources are below 100%, 
because used + pending for the cluster would put it above 100% (90 + 40 = 130 
in the first table).
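
The arithmetic behind that note, as a small sketch. The 
{{is_oversubscribed}} helper is a hypothetical name, and treating 
"used + pending exceeds the cluster total" as the condition is an assumption 
taken from the note above, not from the preemption monitor's source.

```python
CLUSTER_TOTAL = 100

# Per-queue figures from the first table.
queues = {
    "A": {"used": 70, "pending": 20},
    "B": {"used": 20, "pending": 20},
}


def is_oversubscribed(used: int, pending: int, total: int) -> bool:
    # Assumption: demand counts as exceeding the cluster when used + pending > total.
    return used + pending > total


used = sum(q["used"] for q in queues.values())
pending = sum(q["pending"] for q in queues.values())

print(f"used={used}, pending={pending}, demand={used + pending}")
print("oversubscribed:", is_oversubscribed(used, pending, CLUSTER_TOTAL))
```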

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-3769
>                 URL: https://issues.apache.org/jira/browse/YARN-3769
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)