[ https://issues.apache.org/jira/browse/YARN-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15652608#comment-15652608 ]
Wangda Tan commented on YARN-5864:
----------------------------------

The problem in the description is hard because it's hard to clearly explain why a queue would be preempted even when it is within its limit.

So I'm proposing to solve one use case only: in some of our customers' configurations, there are separate queues for long running services (LRS), for example an LLAP-queue for LLAP services. LLAP services scale up and down depending on the workload, and they ask for containers with lots of resources to make sure hosts running LLAP daemons are not used by other applications. We want to allocate containers for such LRS sooner when they need to scale up.

There's one quick approach in my mind to handle the use case above:
- Add a new preemption selector (which makes sure this feature can be disabled by configuration)
- Add a white-list of queues for the new selector: only queues in the white-list can preempt from other queues
- When a container reserved by a white-listed queue has been reserved for longer than a configured timeout, we look at the node that holds the reservation and select containers from non-white-listed queues on that node to preempt.

Thoughts and suggestions? [~curino], [~eepayne], [~sunilg].

Attached patch for review as well.

> Capacity Scheduler preemption for fragmented cluster
> -----------------------------------------------------
>
>                 Key: YARN-5864
>                 URL: https://issues.apache.org/jira/browse/YARN-5864
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>
> YARN-4390 added preemption for reserved containers. However, we found one case
> where a large container cannot be allocated even though all queues are under
> their limits.
> For example, we have:
> {code}
> Two queues, a and b, capacity 50:50
> Two nodes: n1 and n2, each of them has 50 resources
> Now queue-a uses 10 on n1 and 10 on n2
> queue-b asks for one single container with resource=45.
> {code}
> The container could be reserved on either host, but no preemption will
> happen because all queues are under their limits.
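
To make the proposal in the comment concrete, here is a rough Python sketch of the white-list selector applied to the two-node example from the description. It is only an illustration, not the attached patch: every name in it (`Container`, `Node`, `Reservation`, `select_for_preemption`) is hypothetical, and the choices to pick the largest victims first and to preempt nothing unless the whole request can be satisfied are assumptions the comment does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    queue: str
    size: int


@dataclass
class Node:
    name: str
    capacity: int
    containers: list = field(default_factory=list)

    def free(self) -> int:
        # Resources on this node not held by any running container.
        return self.capacity - sum(c.size for c in self.containers)


@dataclass
class Reservation:
    queue: str
    size: int
    node: Node
    created_at: float  # seconds; the clock source is up to the caller


def select_for_preemption(reservation, whitelist, timeout, now):
    """Return the containers to preempt on the reserving node, or []."""
    # Only white-listed queues may trigger this selector.
    if reservation.queue not in whitelist:
        return []
    # The reservation must have been pending for at least the timeout.
    if now - reservation.created_at < timeout:
        return []
    node = reservation.node
    shortfall = reservation.size - node.free()
    if shortfall <= 0:
        return []  # the request already fits, nothing to preempt
    # Assumption: only non-white-listed queues are victims, largest first.
    candidates = sorted(
        (c for c in node.containers if c.queue not in whitelist),
        key=lambda c: c.size,
        reverse=True,
    )
    victims = []
    for c in candidates:
        if shortfall <= 0:
            break
        victims.append(c)
        shortfall -= c.size
    # Assumption: preempt nothing if the request still would not fit.
    return victims if shortfall <= 0 else []


# The example from the description: two nodes of 50, queue-a uses 10 on each,
# queue-b asks for a single container of 45 and reserves it on n1.
n1 = Node("n1", 50, [Container("a", 10)])
n2 = Node("n2", 50, [Container("a", 10)])
res = Reservation(queue="b", size=45, node=n1, created_at=0.0)

# Neither node has 45 free, although queue-a is far below its 50% share.
largest_free = max(n1.free(), n2.free())

before_timeout = select_for_preemption(res, {"b"}, timeout=60, now=30)
after_timeout = select_for_preemption(res, {"b"}, timeout=60, now=90)
```

With queue-b white-listed and a 60-second timeout, nothing is selected while the reservation is still young; once the timeout has passed, the queue-a container on n1 is selected, which would leave enough room on that node for the 45-resource request.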