Andrew Chung created YARN-11015:
-----------------------------------

             Summary: Decouple queue capacity with ability to run OPPORTUNISTIC 
container
                 Key: YARN-11015
                 URL: https://issues.apache.org/jira/browse/YARN-11015
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: container-queuing, resourcemanager
            Reporter: Andrew Chung


Motivation:
With YARN-11005, we will be able to schedule OContainers on nodes based on 
resource availability. Given that, nodes with 0 queue capacity should be 
allowed to run OContainers, since these containers should start immediately 
if resources are available, even if they are placed on a "queue" first.
However, in the current implementation, setting the queue length of NMs to 0 
has inconsistent effects: the RM assumes infinite queue capacity, while the NM 
disables the running of any OContainers and kills OContainers that arrive 
directly.
This issue addresses the above problems for the 
{{QUEUE_LENGTH_THEN_RESOURCES}} allocator.
It does not aim to change the existing behavior of the {{QUEUE_LENGTH}} 
allocator.

Proposed design:
Add a new {{NodeManager}} config, {{opportunistic-containers-queue-policy}}, 
which specifies the queueing policy at the NM.
The initial policies are {{BY_RESOURCES}} and {{BY_QUEUE_LEN}}. If 
{{BY_RESOURCES}} is specified, the NM will queue an {{OPPORTUNISTIC}} 
container as long as it has enough resources to run all pending + running 
containers; otherwise, it will reject the container.
On the other hand, if {{BY_QUEUE_LEN}} is specified, the NM will only accept as 
many containers as its configured queue capacity allows.
Thus, if {{BY_QUEUE_LEN}} is specified and the NM's queue capacity is 
configured to be 0, the NM will reject all incoming {{OPPORTUNISTIC}} 
containers (today's behavior).
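
The NM-side decision described above can be sketched roughly as follows. This 
is a minimal illustration, not the actual NodeManager code: the class 
{{OContainerAdmissionSketch}}, the {{Res}} type, and the {{shouldQueue}} method 
are hypothetical names, and resources are simplified to memory and vcores.

```java
/** Hypothetical sketch of NM-side admission for OPPORTUNISTIC containers. */
final class OContainerAdmissionSketch {

  /** Mirrors the two proposed policy values. */
  enum QueuePolicy { BY_RESOURCES, BY_QUEUE_LEN }

  /** Simplified resource vector (assumption: memory and vcores only). */
  static final class Res {
    final long memoryMb;
    final int vcores;

    Res(long memoryMb, int vcores) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
    }

    Res plus(Res other) {
      return new Res(memoryMb + other.memoryMb, vcores + other.vcores);
    }

    boolean fitsIn(Res capacity) {
      return memoryMb <= capacity.memoryMb && vcores <= capacity.vcores;
    }
  }

  /** Returns true if the incoming container is queued, false if rejected. */
  static boolean shouldQueue(QueuePolicy policy, Res nodeCapacity, Res running,
      Res queued, Res incoming, int queuedCount, int maxQueueLength) {
    switch (policy) {
      case BY_RESOURCES:
        // Accept while running + pending + incoming still fits on the node;
        // the configured queue length is ignored.
        return running.plus(queued).plus(incoming).fitsIn(nodeCapacity);
      case BY_QUEUE_LEN:
        // Accept only while the queue has a free slot; a configured length
        // of 0 therefore rejects everything (today's behavior).
        return queuedCount < maxQueueLength;
      default:
        throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }
}
```

With a queue length of 0, {{BY_RESOURCES}} still admits a container that fits, 
while {{BY_QUEUE_LEN}} rejects it.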

Note that this configuration *does not affect how the RM behaves*.
At the RM, if the queue capacity reported by the node is 0 *and* the 
allocation policy is set to {{QUEUE_LENGTH_THEN_RESOURCES}}, the RM assumes 
that the node can still run {{OPPORTUNISTIC}} containers if it has available 
resources; otherwise, it skips the node.
Conversely, if the queue capacity reported by the node is 0 *and* the 
allocation policy is set to {{QUEUE_LENGTH}}, the RM still assumes that the 
node can run infinitely many {{OPPORTUNISTIC}} containers, and it is up to the 
NM to reject these containers (today's behavior).
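
A rough sketch of the RM-side node check for the two cases above is shown 
below. It is illustrative only: {{RmNodeEligibilitySketch}} and 
{{canTakeOContainer}} are hypothetical names, resource availability is reduced 
to a boolean, and the branch for a non-zero queue capacity is an assumption, 
since this issue only describes the zero-capacity case.

```java
/** Hypothetical sketch of the RM-side check for a candidate node. */
final class RmNodeEligibilitySketch {

  /** The two allocators named in this issue. */
  enum AllocationPolicy { QUEUE_LENGTH, QUEUE_LENGTH_THEN_RESOURCES }

  /**
   * Returns true if the RM would consider the node for an OPPORTUNISTIC
   * container, false if it would skip the node.
   */
  static boolean canTakeOContainer(AllocationPolicy policy,
      int reportedQueueCapacity, int queueLength,
      boolean hasAvailableResources) {
    if (reportedQueueCapacity == 0) {
      if (policy == AllocationPolicy.QUEUE_LENGTH_THEN_RESOURCES) {
        // Zero queue capacity: the node is usable only if it can start
        // the container from currently available resources.
        return hasAvailableResources;
      }
      // QUEUE_LENGTH: capacity 0 is treated as infinite, and the NM is
      // left to reject the container (today's behavior).
      return true;
    }
    // Assumption (not specified in this issue): with a non-zero capacity,
    // the node is usable while its queue is not full.
    return queueLength < reportedQueueCapacity;
  }
}
```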



