[
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420505#comment-16420505
]
Shane Kumpf commented on YARN-7973:
-----------------------------------
Thanks for trying out the patch [~eyang]!
{quote} Container relaunch is kind of working on my cluster using the example
above. If an app is stopped, and restarted, new containers would be acquired.
If container fails, and the same one will be used for relaunch. {quote}
So it seems that there may be inconsistent use of the container relaunch policy
in Native Services. That isn't really in scope for this patch, but sounds like
something we should review in a separate issue. This patch only changes the flow
when a container transitions to the relaunching state and Docker is in use; it
doesn't change how Native Services leverages that transition.
{quote}However, I encountered a problem where flexing containers from 2 to 3,
then decrease back to 2. The flexing command failed to be received by AM with
the following error message {quote}
I haven't been able to recreate this. Based on the exception type, it looks
like the Services API may have been down? Can you share the RM and NM logs when
this happens? I really wouldn't expect this patch to be related to that
exception as it doesn't touch the Services API.
> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Shane Kumpf
> Assignee: Shane Kumpf
> Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container
> when it exited. The removal is now handled by the
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse
> the workdir from the previous attempt, and does not call {{cleanupContainer}}
> prior to {{launchContainer}}. The container ID is reused as well. As a
> result, the previous Docker container still exists, resulting in an error
> from Docker indicating that a container by that name already exists.
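
The name collision described above can be illustrated with a minimal, self-contained sketch. It does not use any YARN or Docker API: {{FakeDocker}}, {{relaunchWithoutCleanup}}, {{relaunchAfterRemoval}}, and the container ID {{container_01}} are all hypothetical names invented for illustration. The only behavior assumed is that a container name stays registered after the container exits. Removing the old container before relaunch is shown as one possible way to avoid the conflict, not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

public class RelaunchNameConflictSketch {

    /** Hypothetical stand-in for the Docker daemon's container-name registry. */
    static class FakeDocker {
        // Maps container name to a simple state string ("running" or "exited").
        private final Map<String, String> containers = new HashMap<>();

        /** Creates and starts a container; fails if the name is already taken. */
        void run(String name) {
            if (containers.containsKey(name)) {
                throw new IllegalStateException(
                    "container name already in use: " + name);
            }
            containers.put(name, "running");
        }

        /** Marks the container as exited; the name stays registered. */
        void exit(String name) {
            containers.replace(name, "exited");
        }

        /** Removes the container, freeing its name. */
        void remove(String name) {
            containers.remove(name);
        }

        boolean exists(String name) {
            return containers.containsKey(name);
        }
    }

    /** Relaunch that reuses the container ID without removing the old container. */
    static boolean relaunchWithoutCleanup(FakeDocker docker, String containerId) {
        try {
            docker.run(containerId);
            return true;
        } catch (IllegalStateException e) {
            // The exited container from the previous attempt still owns the name.
            return false;
        }
    }

    /** Relaunch that first removes any leftover container with the same name. */
    static boolean relaunchAfterRemoval(FakeDocker docker, String containerId) {
        if (docker.exists(containerId)) {
            docker.remove(containerId);
        }
        docker.run(containerId);
        return true;
    }

    public static void main(String[] args) {
        FakeDocker docker = new FakeDocker();
        String containerId = "container_01"; // hypothetical ID, reused on relaunch

        docker.run(containerId);  // first attempt
        docker.exit(containerId); // container fails, but is not removed

        System.out.println("relaunch without cleanup succeeded: "
            + relaunchWithoutCleanup(docker, containerId));
        System.out.println("relaunch after removal succeeded: "
            + relaunchAfterRemoval(docker, containerId));
    }
}
```

In this sketch the first relaunch fails because the exited container still holds the name, while the second succeeds once the leftover container has been removed.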