Wangda Tan commented on YARN-3769:

This is a very interesting problem, and it's actually not only user-limit that causes it.

For example, fair ordering (YARN-3306), hard locality requirements (I want 
resources from rackA and nodeX only), the AM resource limit, and, in the near 
future, constraints (YARN-3409) can all lead to a situation where a resource is 
preempted from one queue, but the other queue cannot use it because of its 
specific resource requirements and limits.

One thing I've thought about for a while is adding a "lazy preemption" 
mechanism, which works as follows: when a container is marked preempted and has 
waited for max_wait_before_time, it becomes a "can_be_killed" container. If 
another queue can allocate on a node with a "can_be_killed" container, that 
container will be killed immediately to make room for the new containers.

With this mechanism, the preemption policy doesn't need to consider the complex 
resource requirements and limits inside a queue, and it also avoids killing 
containers unnecessarily.
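To make the idea concrete, here is a minimal sketch of the container state 
transitions I have in mind. All class, field, and method names here are 
hypothetical (and the grace period is a placeholder value), not actual 
CapacityScheduler code:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of lazy preemption: a container marked for preemption becomes
 * killable only after a grace period, and is actually killed only when
 * another queue can really place a container on its node.
 */
class LazyPreemptionSketch {
    enum State { RUNNING, MARKED_PREEMPTED, CAN_BE_KILLED }

    // Placeholder for max_wait_before_time
    static final long MAX_WAIT_BEFORE_TIME_MS = 15_000;

    static class Container {
        final String id;
        State state = State.RUNNING;
        long markedAtMs = -1;
        Container(String id) { this.id = id; }
    }

    final Map<String, Container> containers = new HashMap<>();

    /** Preemption policy marks a container instead of killing it outright. */
    void markPreempted(String id, long nowMs) {
        Container c = containers.get(id);
        c.state = State.MARKED_PREEMPTED;
        c.markedAtMs = nowMs;
    }

    /** Periodic check: after the grace period, the container becomes killable. */
    void tick(long nowMs) {
        for (Container c : containers.values()) {
            if (c.state == State.MARKED_PREEMPTED
                    && nowMs - c.markedAtMs >= MAX_WAIT_BEFORE_TIME_MS) {
                c.state = State.CAN_BE_KILLED;
            }
        }
    }

    /**
     * Scheduler path: kill a "can_be_killed" container only when another
     * queue can actually allocate on this node. Returns true if killed;
     * false means no kill happened, so no container is killed unnecessarily.
     */
    boolean tryKillForOtherQueue(String id) {
        Container c = containers.get(id);
        if (c != null && c.state == State.CAN_BE_KILLED) {
            containers.remove(id); // stand-in for the actual kill
            return true;
        }
        return false;
    }
}
```

The point of the sketch is that the kill decision moves out of the preemption 
policy and into the allocation path, where the "other queue can actually use 
this node" check is trivial to make.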

If you think it's fine, could I take a shot at it?

Thoughts? [~vinodkv].

> Preemption occurring unnecessarily because preemption doesn't consider user 
> limit
> ---------------------------------------------------------------------------------
>                 Key: YARN-3769
>                 URL: https://issues.apache.org/jira/browse/YARN-3769
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0
>            Reporter: Eric Payne
>            Assignee: Eric Payne
> We are seeing the preemption monitor preempting containers from queue A and 
> then seeing the capacity scheduler giving them immediately back to queue A. 
> This happens quite often and causes a lot of churn.

This message was sent by Atlassian JIRA