[ https://issues.apache.org/jira/browse/YARN-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15718618#comment-15718618 ]
Arun Suresh commented on YARN-5292:
-----------------------------------

Thanks for the patch, Hitesh. A couple of comments:

* {{ContainerExecutor}}: maybe the default behavior should not be to throw an Exception. We should probably LOG.warn() too.
* {{ContainerImpl}}: in a couple of places, you can collapse a bunch of transitions like this:
{noformat}
.addTransition(ContainerState.KILLING, ContainerState.KILLING,
    ContainerEventType.CONTAINER_LAUNCHED)
.addTransition(ContainerState.KILLING, ContainerState.KILLING,
    ContainerEventType.PAUSE_CONTAINER)
{noformat}
into
{noformat}
.addTransition(ContainerState.KILLING, ContainerState.KILLING,
    EnumSet.of(ContainerEventType.CONTAINER_LAUNCHED,
        ContainerEventType.PAUSE_CONTAINER))
{noformat}
* It looks like when a container is REINITIALIZING and receives a PAUSE event, you are killing the container. I think it might be better to re-queue the container somehow in this case, so the scheduler can restart it when resources become available.
* I was thinking PAUSED and RESUMING should be reported to the RM as SCHEDULED. SCHEDULED should be used to signify that the container's allocation is secured but the container is not running.

> Support for PAUSED container state
> ----------------------------------
>
>                 Key: YARN-5292
>                 URL: https://issues.apache.org/jira/browse/YARN-5292
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Hitesh Sharma
>            Assignee: Hitesh Sharma
>         Attachments: YARN-5292.001.patch, YARN-5292.002.patch,
> YARN-5292.003.patch, YARN-5292.004.patch, yarn-5292.pdf
>
>
> YARN-2877 introduced OPPORTUNISTIC containers, and YARN-5216 proposes to add
> the capability to customize how OPPORTUNISTIC containers get preempted.
> In this JIRA we propose introducing a PAUSED container state.
> When a running container gets preempted, it enters the PAUSED state, where it
> remains until resources are freed up on the node; the preempted container
> can then resume to the running state.
>
> One scenario where this capability is useful is work preservation. How
> preemption is done, and whether the container supports it, is
> implementation-specific.
> For instance, if the container is a virtual machine, then preempt would pause
> the VM and resume would restore it to the running state.
> If the container doesn't support preemption, then preempt would default to
> killing the container.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
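Arun's first point, combined with the description's rule that preempting a container without pause support should default to killing it, suggests a default that warns instead of throwing. The sketch below assumes hypothetical names (`SketchExecutor`, `pauseContainer`, `preempt`, `PreemptOutcome`, `ProcessExecutor`, `VmExecutor`) that are not the patch's actual API, and uses `java.util.logging` in place of whatever logging library Hadoop uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical outcome of a preemption request.
enum PreemptOutcome { PAUSED, KILLED }

// Hypothetical stand-in for ContainerExecutor; method names are illustrative only.
abstract class SketchExecutor {
  private static final Logger LOG = Logger.getLogger(SketchExecutor.class.getName());

  // Default: pausing is unsupported. Log a warning and return false instead of
  // throwing, so the caller can choose a fallback.
  boolean pauseContainer(String containerId) {
    LOG.warning(getClass().getSimpleName() + " does not support pausing " + containerId);
    return false;
  }

  abstract void killContainer(String containerId);

  // Preemption pauses when the executor supports it and kills otherwise.
  final PreemptOutcome preempt(String containerId) {
    if (pauseContainer(containerId)) {
      return PreemptOutcome.PAUSED;
    }
    killContainer(containerId);
    return PreemptOutcome.KILLED;
  }
}

// Executor without pause support, e.g. plain processes (hypothetical).
class ProcessExecutor extends SketchExecutor {
  final List<String> killed = new ArrayList<>();

  @Override
  void killContainer(String containerId) {
    killed.add(containerId);
  }
}

// Executor that can suspend its containers, e.g. VMs (hypothetical).
class VmExecutor extends SketchExecutor {
  final List<String> paused = new ArrayList<>();

  @Override
  boolean pauseContainer(String containerId) {
    paused.add(containerId);  // a real executor would suspend the VM here
    return true;
  }

  @Override
  void killContainer(String containerId) {
    throw new IllegalStateException("VM containers are paused, not killed");
  }
}
```

Returning a boolean keeps the kill-on-unsupported decision in one place (`preempt`) rather than in every executor subclass.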
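Arun's second point works because Hadoop's `StateMachineFactory` also offers an `addTransition` overload that takes a `Set` of event types, so an `EnumSet` registers one identical arc per event. The `TransitionTable` class and the trimmed-down enums below are a hypothetical toy, not Hadoop's classes; they only show that the collapsed form registers the same arcs as the separate calls.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Trimmed-down, hypothetical enums; the real ones have many more values.
enum ContainerState { RUNNING, KILLING }
enum ContainerEventType { CONTAINER_LAUNCHED, PAUSE_CONTAINER }

// Toy transition table (not Hadoop's StateMachineFactory).
final class TransitionTable {
  private final Map<ContainerState, Map<ContainerEventType, ContainerState>> arcs =
      new EnumMap<>(ContainerState.class);

  // One arc: pre --event--> post.
  TransitionTable addTransition(ContainerState pre, ContainerState post,
      ContainerEventType event) {
    arcs.computeIfAbsent(pre,
        s -> new EnumMap<ContainerEventType, ContainerState>(ContainerEventType.class))
        .put(event, post);
    return this;
  }

  // The same pre --> post arc for every event type in the set.
  TransitionTable addTransition(ContainerState pre, ContainerState post,
      Set<ContainerEventType> events) {
    for (ContainerEventType event : events) {
      addTransition(pre, post, event);
    }
    return this;
  }

  // Target state for (current, event), or null if no arc is registered.
  ContainerState next(ContainerState current, ContainerEventType event) {
    Map<ContainerEventType, ContainerState> out = arcs.get(current);
    return out == null ? null : out.get(event);
  }

  // The collapsed form from the review: KILLING stays in KILLING for both events.
  static TransitionTable killingSelfLoops() {
    return new TransitionTable().addTransition(ContainerState.KILLING, ContainerState.KILLING,
        EnumSet.of(ContainerEventType.CONTAINER_LAUNCHED, ContainerEventType.PAUSE_CONTAINER));
  }
}
```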
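Arun's third point, re-queueing a REINITIALIZING container that receives PAUSE instead of killing it, might look roughly like the sketch below. Everything in it is hypothetical: the `ToyNodeScheduler` class, its one-slot-per-container resource model, and its method names illustrate the idea and are not the NodeManager's actual scheduler.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical node-local scheduler; each container uses one "slot".
final class ToyNodeScheduler {
  private final Deque<String> queued = new ArrayDeque<>();
  private final Set<String> running = new LinkedHashSet<>();
  private int freeSlots;

  ToyNodeScheduler(int freeSlots) {
    this.freeSlots = freeSlots;
  }

  // Start now if a slot is free, otherwise queue. New arrivals (e.g. the
  // container that triggered preemption) may take a free slot immediately.
  void submit(String containerId) {
    if (freeSlots > 0) {
      freeSlots--;
      running.add(containerId);
    } else {
      queued.addLast(containerId);
    }
  }

  // PAUSE while reinitializing: instead of killing, release the slot and put
  // the container at the head of the queue so it restarts first.
  void onPauseWhileReinitializing(String containerId) {
    if (running.remove(containerId)) {
      freeSlots++;
      queued.addFirst(containerId);
    }
  }

  // Resources released elsewhere on the node: restart queued containers in order.
  void onResourcesFreed(int slots) {
    freeSlots += slots;
    while (freeSlots > 0 && !queued.isEmpty()) {
      freeSlots--;
      running.add(queued.pollFirst());
    }
  }

  boolean isRunning(String containerId) {
    return running.contains(containerId);
  }

  boolean isQueued(String containerId) {
    return queued.contains(containerId);
  }
}
```

Placing the re-queued container at the head rather than the tail is a design choice in this sketch: it favours work that was already started over newly queued work.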
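Arun's last point amounts to a mapping from NodeManager-side states to the smaller set the RM sees. A minimal sketch, assuming hypothetical enums `NmContainerState` and `RmContainerState`; only the PAUSED/RESUMING to SCHEDULED rows come from the comment, and the mapping of REINITIALIZING and KILLING to RUNNING is an illustrative assumption.

```java
// Hypothetical NodeManager-side states (a subset of those discussed in the JIRA).
enum NmContainerState { SCHEDULED, RUNNING, REINITIALIZING, PAUSED, RESUMING, KILLING, DONE }

// Hypothetical states as reported to the RM.
enum RmContainerState { SCHEDULED, RUNNING, COMPLETE }

final class StateReporting {
  private StateReporting() {}

  static RmContainerState toRm(NmContainerState state) {
    switch (state) {
      // Allocation is held on the node, but the container is not running.
      case SCHEDULED:
      case PAUSED:
      case RESUMING:
        return RmContainerState.SCHEDULED;
      // Assumption: these states still count as running from the RM's view.
      case RUNNING:
      case REINITIALIZING:
      case KILLING:
        return RmContainerState.RUNNING;
      case DONE:
        return RmContainerState.COMPLETE;
      default:
        throw new IllegalArgumentException("unmapped state: " + state);
    }
  }
}
```

Keeping the translation in one switch means the RM never needs to learn about PAUSED or RESUMING as separate states.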