[ 
https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652608#comment-15652608
 ] 

Wangda Tan commented on YARN-5864:
----------------------------------

The problem in the description is hard because it is difficult to explain 
clearly why a queue would be preempted even though it is within its limit.

So I'm proposing to solve one use case only: in some of our customers' 
configurations, we have separate queues for long-running services (LRS), for 
example LLAP-queue for LLAP services. LLAP services scale up and down depending 
on the workload, and they ask for containers with lots of resources to make 
sure that hosts running LLAP daemons are not used by other applications.

And we want to allocate containers for such LRS sooner when they need to 
scale up.

There's one quick approach in my mind to handle the use case above: 
- Add a new preemption selector (which makes sure this feature can be disabled 
by configuration).
- Add a white-list of queues for the new selector: only queues in the 
white-list can preempt from other queues.
- When a reserved container from a white-listed queue has existed longer than 
a configured timeout, we will look at the node that reserves the container and 
select containers from non-white-listed queues on that node to preempt.
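
To make the three steps above concrete, here is a minimal sketch in Python. It 
is not the attached patch and not the Capacity Scheduler API: every name in it 
(Container, Node, Reservation, select_to_preempt, the enabled flag) is 
hypothetical, resources are plain integers, and picking the largest containers 
first is only an assumption, since the proposal does not say in which order 
candidates are chosen.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Container:
    container_id: str
    queue: str
    resource: int


@dataclass
class Node:
    name: str
    total: int
    running: List[Container] = field(default_factory=list)

    def available(self) -> int:
        # Free resource on this node, ignoring the reservation itself.
        return self.total - sum(c.resource for c in self.running)


@dataclass
class Reservation:
    queue: str
    resource: int
    node: Node
    created_at: float  # seconds, same clock as `now` below


def select_to_preempt(reservation: Reservation, whitelist: Set[str],
                      timeout_s: float, now: float,
                      enabled: bool = True) -> List[Container]:
    """Return containers to preempt on the reserving node, or [] if none."""
    # Step 1: the selector can be switched off by configuration.
    if not enabled:
        return []
    # Step 2: only white-listed queues may preempt from other queues.
    if reservation.queue not in whitelist:
        return []
    # Step 3: act only once the reservation is older than the timeout.
    if now - reservation.created_at < timeout_s:
        return []

    node = reservation.node
    needed = reservation.resource - node.available()
    if needed <= 0:
        return []  # the node already has room; nothing to preempt

    # Candidates come only from non-white-listed queues on the same node.
    # Largest-first ordering is an assumption made for this sketch.
    candidates = sorted(
        (c for c in node.running if c.queue not in whitelist),
        key=lambda c: c.resource,
        reverse=True,
    )
    selected: List[Container] = []
    freed = 0
    for c in candidates:
        if freed >= needed:
            break
        selected.append(c)
        freed += c.resource
    # Preempt only if doing so actually makes room for the reservation.
    return selected if freed >= needed else []
```

With a 50-resource node running two queue-a containers (10 and 30) and a 
60-second timeout, a 45-resource reservation from LLAP-queue selects nothing 
at 30 seconds and both containers at 120 seconds.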

Thoughts and suggestions? [~curino], [~eepayne], [~sunilg].

Attached patch for review as well.

> Capacity Scheduler preemption for fragmented cluster 
> -----------------------------------------------------
>
>                 Key: YARN-5864
>                 URL: https://issues.apache.org/jira/browse/YARN-5864
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>
> YARN-4390 added preemption for reserved containers. However, we found one 
> case in which a large container cannot be allocated even if all queues are 
> under their limit.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50 
> Two nodes: n1 and n2, each of them have 50 resource 
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45. 
> {code} 
> The container could be reserved on any of the hosts, but no preemption will 
> happen because all queues are under their limits. 
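
The arithmetic behind the quoted example can be checked with a short Python 
sketch. It assumes that "capacity 50:50" means each queue is guaranteed half 
of the cluster total and that resources are single integers; the variable 
names are illustrative only.

```python
# Numbers taken from the quoted example.
node_capacity = {"n1": 50, "n2": 50}
used_by_queue_a = {"n1": 10, "n2": 10}
request_from_queue_b = 45

# Free resource per node and in the whole cluster.
free = {n: node_capacity[n] - used_by_queue_a[n] for n in node_capacity}
total_free = sum(free.values())

# Assumption: a 50:50 split gives each queue half of the cluster total.
queue_limit = sum(node_capacity.values()) // 2
queue_a_used = sum(used_by_queue_a.values())

# The cluster as a whole has enough room for the request...
cluster_has_room = total_free >= request_from_queue_b
# ...but no single node does, which is the fragmentation problem.
fits_on_some_node = any(f >= request_from_queue_b for f in free.values())
# Both queues stay within their limit, so limit-based preemption never fires.
all_under_limit = (queue_a_used <= queue_limit
                   and request_from_queue_b <= queue_limit)

print(free, total_free, cluster_has_room, fits_on_some_node, all_under_limit)
```

Each node has 40 free, so 80 is free in total, yet the 45-resource container 
fits on neither node while both queues remain under their limit of 50.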


