[
https://issues.apache.org/jira/browse/YARN-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364893#comment-15364893
]
Hitesh Sharma commented on YARN-5216:
-------------------------------------
[~asuresh], [~kkaranasos], thank you for the feedback and comments.
Regarding the refactoring being done and the reason to pull queues into the
currently named {{OpportunisticContainerManager}}:
Roughly speaking the {{QueuingContainersManagerImpl}} does the following for
starting and stopping opportunistic containers:
* A running container is simply preempted, whereas a container waiting in the
queue is removed and the RM is notified so it can reallocate it elsewhere.
* Periodically it checks whether too many containers are waiting in the queue
and removes the excess so the RM can rebalance them.
* When a running container finishes, a waiting opportunistic container is run
if no guaranteed containers are waiting in the queue.
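The three behaviours above can be sketched as a toy model. This is only an illustration and not code from the patch: the class {{QueuingSketch}}, its methods, and the choice to shed excess containers from the tail of the queue are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: class, field and method names are hypothetical, not from the patch.
class QueuingSketch {
    private final Deque<String> queuedGuaranteed = new ArrayDeque<>();
    private final Deque<String> queuedOpportunistic = new ArrayDeque<>();
    private final List<String> killed = new ArrayList<>();
    private final List<String> reportedToRm = new ArrayList<>();
    private final int maxQueuedOpportunistic;

    QueuingSketch(int maxQueuedOpportunistic) {
        this.maxQueuedOpportunistic = maxQueuedOpportunistic;
    }

    void enqueueGuaranteed(String id) { queuedGuaranteed.addLast(id); }

    void enqueueOpportunistic(String id) { queuedOpportunistic.addLast(id); }

    // First bullet: a queued container is removed and reported to the RM;
    // anything not found in the queue is treated as running and killed.
    void stopOpportunistic(String id) {
        if (queuedOpportunistic.remove(id)) {
            reportedToRm.add(id);
        } else {
            killed.add(id);
        }
    }

    // Second bullet: periodic check.
    // Assumption: the excess is shed from the tail of the queue.
    void shedExcess() {
        while (queuedOpportunistic.size() > maxQueuedOpportunistic) {
            reportedToRm.add(queuedOpportunistic.pollLast());
        }
    }

    // Third bullet: returns the next container to start, or null if none waits.
    // Guaranteed containers always go before opportunistic ones.
    String onContainerFinished() {
        if (!queuedGuaranteed.isEmpty()) {
            return queuedGuaranteed.pollFirst();
        }
        return queuedOpportunistic.pollFirst();
    }

    List<String> killed() { return killed; }

    List<String> reportedToRm() { return reportedToRm; }
}
```

Note that the kill decision is hard-wired into {{stopOpportunistic}} here, which is exactly the part a configurable policy would need to take over.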
If the preemption policy is to kill the container then things are a little
simpler and you can leave the opportunistic container queue within
{{QueuingContainersManagerImpl}}. However if the preemption policy is different
then we need extension points to know about the operations that the
{{QueuingContainersManagerImpl}} wants to do and respond appropriately. Say the
preemption policy is to put the container in a pause state so that it can be
resumed once there is some room to run a container. This requires
distinguishing whether the {{QueuingContainersManagerImpl}} is looking to
run pending containers (e.g. we want to resume a preempted container over an OC
which is still waiting in the queue) or is looking to rebalance waiting
containers to other nodes (e.g. we can't reallocate a container in the pause
state). Largely for these reasons the pluggable policy is named
{{OpportunisticContainerManager}} as it allows you to preempt and start the
opportunistic containers and also manages the queue of the opportunistic
containers. I'm open to suggestions on how to do this differently without having
to change {{QueuingContainersManagerImpl}} a lot.
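To make the distinction concrete, here is a rough sketch of a pause-based policy. The interface {{OpportunisticPolicy}}, its method names, and the {{PauseAndResume}} class are hypothetical and are not the API in the attached patches. The sketch assumes paused containers are resumed before any waiting OC is started and that only waiting containers are eligible for rebalancing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: none of these names or signatures come from the patch.
class PausePolicySketch {

    // Extension points the queuing manager would call instead of
    // manipulating the opportunistic queue directly.
    interface OpportunisticPolicy {
        void enqueue(String containerId);
        void preempt(String runningContainerId);
        String nextToRun();                            // null when nothing can run
        List<String> removeForRebalance(int maxWaiting);
    }

    static final class PauseAndResume implements OpportunisticPolicy {
        private final Deque<String> paused = new ArrayDeque<>();
        private final Deque<String> waiting = new ArrayDeque<>();

        @Override
        public void enqueue(String containerId) {
            waiting.addLast(containerId);
        }

        // Pause instead of kill: the container stays on this node.
        @Override
        public void preempt(String runningContainerId) {
            paused.addLast(runningContainerId);
        }

        // "Run pending containers": a paused container is resumed before
        // any container that is still waiting in the queue.
        @Override
        public String nextToRun() {
            String resumed = paused.pollFirst();
            return resumed != null ? resumed : waiting.pollFirst();
        }

        // "Rebalance waiting containers": paused containers are never
        // returned, since they cannot be reallocated to another node.
        @Override
        public List<String> removeForRebalance(int maxWaiting) {
            List<String> removed = new ArrayList<>();
            while (waiting.size() > maxWaiting) {
                removed.add(waiting.pollLast());
            }
            return removed;
        }
    }
}
```

A kill-based policy would implement the same interface with an empty paused set, so the two queries would only collapse into one in that simpler case.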
[~asuresh], can you elaborate a little on why {{queuedGuaranteedContainers}}
should also be pulled into the {{OpportunisticContainerManagerImpl}}?
I will look into using the ServiceLoader framework instead of reflection and
add an extra constant to determine the default value.
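For reference, a minimal sketch of what loading through {{java.util.ServiceLoader}} with a default fallback might look like. The {{PreemptionPolicy}} interface, the {{KillPolicy}} class, and the {{DEFAULT_POLICY_NAME}} constant are placeholders invented for this example and not names from the patch. It assumes the first registered provider wins and the kill policy is used when none is registered.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical sketch: the interface, class and constant names are placeholders.
class PolicyLoaderSketch {

    // Placeholder constant naming the default policy.
    static final String DEFAULT_POLICY_NAME = "kill";

    public interface PreemptionPolicy {
        String name();
    }

    // Default used when no provider is registered on the classpath.
    public static final class KillPolicy implements PreemptionPolicy {
        @Override
        public String name() {
            return DEFAULT_POLICY_NAME;
        }
    }

    // Providers would be listed in a META-INF/services file named after the
    // binary name of the interface. The first provider found is used.
    static PreemptionPolicy load() {
        Iterator<PreemptionPolicy> providers =
            ServiceLoader.load(PreemptionPolicy.class).iterator();
        return providers.hasNext() ? providers.next() : new KillPolicy();
    }
}
```

Compared with reflection on a configured class name, this moves the choice of implementation from configuration to the classpath, so a config key would still be needed if several providers can be present at once.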
Thanks a lot for the feedback and comments.
> Expose configurable preemption policy for OPPORTUNISTIC containers running on
> the NM
> ------------------------------------------------------------------------------------
>
> Key: YARN-5216
> URL: https://issues.apache.org/jira/browse/YARN-5216
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Arun Suresh
> Assignee: Hitesh Sharma
> Attachments: YARN5216.001.patch, yarn5216.002.patch
>
>
> Currently, the default action taken by the QueuingContainerManager,
> introduced in YARN-2883, when a GUARANTEED Container is scheduled on an NM
> with OPPORTUNISTIC containers using up resources, is to KILL the running
> OPPORTUNISTIC containers.
> This JIRA proposes to expose a configurable hook to allow the NM to take a
> different action.