[ https://issues.apache.org/jira/browse/YARN-4759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15370847#comment-15370847 ]
Shane Kumpf commented on YARN-4759:
-----------------------------------
I've started working on this again and have a patch ready based on the logic
above.

While the patch properly reacquires containers on NM restart, exceptions occur
when attempting to "docker stop" a container, because
container-executor#launch_docker_container_as_user removes the container once
it completes (docker rm container_id). Removal of the container should be
configurable so that users can debug a container that fails to launch or to
produce the desired outcome, but changing the function signature has
consequences elsewhere that need to be considered. I'm currently researching
the options to find the least impactful one.
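
To make the interaction concrete, here is a rough sketch of the idea in
Python. It is not the patch and not container-executor code: the function
names and the remove_on_exit flag are hypothetical, and the sketch only
builds the docker command lines rather than running them. It shows why a
"docker stop" issued after an unconditional "docker rm" has nothing to act
on, and how a configurable removal step would leave the container in place.

```python
from typing import List


def post_exit_commands(container_name: str, remove_on_exit: bool) -> List[List[str]]:
    """Commands to run once the container's main process has exited.

    remove_on_exit is a hypothetical setting; today the removal described
    in the comment above happens unconditionally.
    """
    if remove_on_exit:
        return [["docker", "rm", container_name]]
    # Leave the stopped container in place so it can be inspected later.
    return []


def signal_commands(container_name: str, container_exists: bool) -> List[List[str]]:
    """Commands used to stop a container.

    container_exists is assumed to be determined by the caller (for
    example via "docker inspect"); stopping a removed container fails.
    """
    if not container_exists:
        return []
    return [["docker", "stop", container_name]]


if __name__ == "__main__":
    name = "container_id"
    # Current behaviour: the container is removed on exit, so there is
    # nothing left to stop afterwards.
    print(post_exit_commands(name, remove_on_exit=True))
    print(signal_commands(name, container_exists=False))
    # With removal disabled, the container is kept and can be stopped.
    print(post_exit_commands(name, remove_on_exit=False))
    print(signal_commands(name, container_exists=True))
```

Guarding the stop on whether the container still exists is only one option;
it avoids the exception but does nothing for the debugging use case, which
is why the removal step itself is the part that would need to be
configurable.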
> Revisit signalContainer() for docker containers
> -----------------------------------------------
>
> Key: YARN-4759
> URL: https://issues.apache.org/jira/browse/YARN-4759
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: yarn
> Reporter: Sidharta Seethana
> Assignee: Shane Kumpf
>
> The current signal handling (in the DockerContainerRuntime) needs to be
> revisited for docker containers. For example, container reacquisition on NM
> restart might not work, depending on which user the process in the container
> runs as.