[ 
https://issues.apache.org/jira/browse/YARN-11015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Chung reassigned YARN-11015:
-----------------------------------

    Assignee: Andrew Chung

> Decouple queue capacity with ability to run OPPORTUNISTIC container
> -------------------------------------------------------------------
>
>                 Key: YARN-11015
>                 URL: https://issues.apache.org/jira/browse/YARN-11015
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: container-queuing, resourcemanager
>            Reporter: Andrew Chung
>            Assignee: Andrew Chung
>            Priority: Minor
>
> Motivation:
> With YARN-11005, we will be able to schedule OContainers on nodes based on
> resource availability. Given that, nodes with a queue capacity of 0 should
> also be allowed to run OContainers, since these containers should be started
> immediately if resources are available, even if they are placed on a
> "queue" first.
> However, with the current implementation, setting the queue length of NMs
> to 0 has inconsistent effects: the RM assumes infinite queue capacity, while
> the NM disables the running of any OContainers and kills OContainers that
> arrive directly.
> This issue addresses the above problems for the
> {{QUEUE_LENGTH_THEN_RESOURCES}} allocator.
> It does not aim to change the existing behavior of the
> {{QUEUE_LENGTH}} allocator.
> Proposed design:
> Add a new {{NodeManager}} config,
> {{opportunistic-containers-queue-policy}}, which specifies the queueing
> policy at the NM.
> We will start with two values, {{BY_RESOURCES}} and {{BY_QUEUE_LEN}}. If
> {{BY_RESOURCES}} is specified, the NM will queue a container as long as it
> has enough resources to run all pending + running containers. Otherwise, it
> will reject the {{OPPORTUNISTIC}} container.
> On the other hand, if {{BY_QUEUE_LEN}} is specified, the NM will only accept
> as many containers as its configured queue capacity allows.
> Thus, if {{BY_QUEUE_LEN}} is specified and the NM's queue capacity is 
> configured to be 0, the NM will reject all incoming {{OPPORTUNISTIC}} 
> containers (today's behavior).
> Note that this configuration *does not affect how the RM behaves*.
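
The NM-side admission rule described above can be sketched roughly as follows. This is an illustration only, not the actual NodeManager code: the class `NmQueuePolicySketch`, the `Res` type, and the `accept` method are hypothetical names, and the {{BY_RESOURCES}} check reflects one reading of the description (running + pending + the incoming request must fit in the node's capacity).

```java
/** Hypothetical sketch of the proposed NM queueing policies; not YARN code. */
final class NmQueuePolicySketch {

  /** Mirrors the two policy values named in the proposal. */
  enum QueuePolicy { BY_RESOURCES, BY_QUEUE_LEN }

  /** Simplified resource vector (assumption: memory and vcores only). */
  static final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }

    Res plus(Res other) {
      return new Res(memoryMb + other.memoryMb, vcores + other.vcores);
    }

    boolean fitsIn(Res capacity) {
      return memoryMb <= capacity.memoryMb && vcores <= capacity.vcores;
    }
  }

  /**
   * Decides whether an incoming OPPORTUNISTIC container is accepted.
   *
   * @param runningAndPending resources of all running plus queued containers
   * @param queuedCount       number of containers currently in the queue
   * @param maxQueueLength    configured queue capacity of the NM
   */
  static boolean accept(QueuePolicy policy, Res nodeCapacity,
      Res runningAndPending, Res request,
      int queuedCount, int maxQueueLength) {
    switch (policy) {
      case BY_RESOURCES:
        // Queue length is ignored; only total resource demand matters.
        return runningAndPending.plus(request).fitsIn(nodeCapacity);
      case BY_QUEUE_LEN:
        // With maxQueueLength == 0 this rejects everything (today's behavior).
        return queuedCount < maxQueueLength;
      default:
        throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }
}
```

Under these assumptions, a node configured with a queue capacity of 0 still accepts containers with {{BY_RESOURCES}} while it has headroom, and rejects all of them with {{BY_QUEUE_LEN}}.
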
> At the RM, if the queue capacity reported by the node is 0 *and* the
> allocation policy is set to {{QUEUE_LENGTH_THEN_RESOURCES}}, the RM assumes
> that the node can still run {{OPPORTUNISTIC}} containers if it has available
> resources; otherwise it skips the node.
> If instead the queue capacity reported by the node is 0 *and* the
> allocation policy is set to {{QUEUE_LENGTH}}, the RM still assumes that the
> node can run infinitely many {{OPPORTUNISTIC}} containers, and it is up to
> the NM to reject these containers (today's behavior).
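
The RM-side handling of a node that reports a queue capacity of 0 can be sketched as below. This is a minimal illustration, not the ResourceManager implementation: `RmNodeFilterSketch` and `zeroCapacityNodeIsCandidate` are hypothetical names, and the sketch covers only the zero-capacity case described above, since the description does not spell out the nonzero case.

```java
/** Hypothetical sketch of the RM decision for zero-capacity nodes; not YARN code. */
final class RmNodeFilterSketch {

  /** Mirrors the two allocation policies named in the description. */
  enum AllocationPolicy { QUEUE_LENGTH, QUEUE_LENGTH_THEN_RESOURCES }

  /**
   * Returns whether a node that reported queue capacity 0 is still considered
   * for OPPORTUNISTIC containers.
   *
   * @param hasAvailableResources whether the node reported free resources
   */
  static boolean zeroCapacityNodeIsCandidate(AllocationPolicy policy,
      boolean hasAvailableResources) {
    switch (policy) {
      case QUEUE_LENGTH_THEN_RESOURCES:
        // Node is usable only while it has free resources; otherwise skip it.
        return hasAvailableResources;
      case QUEUE_LENGTH:
        // Capacity 0 is read as unbounded; the NM is left to reject containers.
        return true;
      default:
        throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }
}
```
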



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

