[ 
https://issues.apache.org/jira/browse/YARN-5972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15832683#comment-15832683
 ] 

Konstantinos Karanasos commented on YARN-5972:
----------------------------------------------

bq. My opinion is that the PAUSED state should not be handled any differently 
from the current QUEUED state we already persist in the store; this implies 
YARN-6059 can probably be closed. (We do need to fix the ContainerScheduler to 
populate it with the running containers, but this is orthogonal to the 
pause/resume feature and should be handled as a separate JIRA.)

Regarding this, I just had an offline chat with [~asuresh]. It seems that it 
would be better to add a separate entry in the StateStore for the PAUSED 
containers. This way we can decide what to do with them at node recovery. For 
example, we might want to kill them to release the node resources that they 
might be holding.
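
As a rough illustration of the recovery decision described above, here is a 
minimal Java sketch. All names in it (PausedRecoverySketch, StoredState, 
PausedPolicy, RecoveryAction) are hypothetical and are not taken from the 
NodeManager code. It only assumes that the store records PAUSED containers 
separately from QUEUED ones, so that recovery can apply a distinct policy to 
them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in YARN under these names.
final class PausedRecoverySketch {

    /** State recorded per container in the (assumed) state store. */
    enum StoredState { QUEUED, PAUSED, RUNNING }

    /** Assumed node-level policy for PAUSED containers found at recovery. */
    enum PausedPolicy { KILL, KEEP_PAUSED }

    /** Action chosen for a container during node recovery. */
    enum RecoveryAction { REQUEUE, KILL, KEEP_PAUSED, REATTACH }

    /**
     * Because PAUSED has its own entry, it can be treated differently from
     * QUEUED: for example, killed to release resources it may be holding.
     */
    static RecoveryAction decide(StoredState state, PausedPolicy policy) {
        switch (state) {
            case QUEUED:
                return RecoveryAction.REQUEUE;
            case PAUSED:
                return policy == PausedPolicy.KILL
                        ? RecoveryAction.KILL
                        : RecoveryAction.KEEP_PAUSED;
            default:
                // RUNNING containers are simply re-attached in this sketch.
                return RecoveryAction.REATTACH;
        }
    }

    /** Maps every stored container id to the action chosen for it. */
    static Map<String, RecoveryAction> recover(
            Map<String, StoredState> stored, PausedPolicy policy) {
        Map<String, RecoveryAction> actions = new LinkedHashMap<>();
        for (Map.Entry<String, StoredState> e : stored.entrySet()) {
            actions.put(e.getKey(), decide(e.getValue(), policy));
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, StoredState> stored = new LinkedHashMap<>();
        stored.put("container_1", StoredState.QUEUED);
        stored.put("container_2", StoredState.PAUSED);
        stored.put("container_3", StoredState.RUNNING);
        System.out.println(recover(stored, PausedPolicy.KILL));
    }
}
```

If PAUSED were stored as QUEUED instead, decide() could not tell the two 
apart, and the option of killing paused containers at recovery would be lost.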

I will take a look at YARN-5292 and YARN-5216 next week.

> Support Pausing/Freezing of opportunistic containers
> ----------------------------------------------------
>
>                 Key: YARN-5972
>                 URL: https://issues.apache.org/jira/browse/YARN-5972
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Hitesh Sharma
>            Assignee: Hitesh Sharma
>         Attachments: container-pause-resume.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add 
> capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> Instead of preempting a running container, the container can be moved to a 
> PAUSED state, where it remains until resources are freed up on the node. The 
> paused container can then resume to the running state.
> Note that process freezing is already supported by the 'cgroups freezer', 
> which is used internally by the docker pause functionality. Windows also has 
> OS level support of a similar nature.
> One scenario where this capability is useful is work preservation. How 
> preemption is done, and whether the container supports it, is implementation 
> specific.
> For instance, if the container is a virtual machine, then a preempt call 
> would pause the VM and a resume call would restore it to the running state.
> If the container executor / runtime doesn't support preemption, then preempt 
> would default to killing the container. 
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
