[
https://issues.apache.org/jira/browse/YARN-7973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393346#comment-16393346
]
Billie Rinaldi commented on YARN-7973:
--------------------------------------
I started taking a look at patch 002. When I ran my first app, I had a
configuration problem: I was trying to run a privileged container as a user
that wasn't allowed to run privileged containers. The container failed with the
appropriate message about the user failing the ACL check, but when it was
relaunched the following was logged repeatedly. It seems like we could improve
the failure handling in scenarios like this.
{noformat}
2018-03-08 22:02:53,791 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Getting container-status for container_1520546307703_0001_01_000002
2018-03-08 22:02:53,791 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Returning ContainerStatus: [ContainerId:
container_1520546307703_0001_01_000002, ExecutionType: GUARANTEED, State:
RUNNING, Capability: <memory:1024, vCores:1>, Diagnostics: [2018-03-08
22:02:53.397]Exception from container-launch.
Container id: container_1520546307703_0001_01_000002
Exit code: -1
Exception message: <unknown>
Shell output: <unknown>
[2018-03-08 22:02:53.500]Diagnostic message from attempt 0 : [2018-03-08
22:02:53.500]
[2018-03-08 22:02:53.501]Container exited with a non-zero exit code -1.
, ExitStatus: -1, IP: null, Host: null, ContainerSubState: SCHEDULED]
{noformat}
> Support ContainerRelaunch for Docker containers
> -----------------------------------------------
>
> Key: YARN-7973
> URL: https://issues.apache.org/jira/browse/YARN-7973
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Shane Kumpf
> Assignee: Shane Kumpf
> Priority: Major
> Attachments: YARN-7973.001.patch, YARN-7973.002.patch
>
>
> Prior to YARN-5366, {{container-executor}} would remove the Docker container
> when it exited. The removal is now handled by the
> {{DockerLinuxContainerRuntime}}. {{ContainerRelaunch}} is intended to reuse
> the workdir from the previous attempt, and does not call {{cleanupContainer}}
> prior to {{launchContainer}}. The container ID is reused as well. As a
> result, the previous Docker container still exists, resulting in an error
> from Docker indicating that a container by that name already exists.
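
The name conflict described in the issue can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `FakeDockerDaemon`, `ContainerNameConflict`, and `relaunch` are hypothetical names invented for illustration, not YARN or Docker APIs. Removing the stale container before relaunching is shown as one way the conflict could be avoided, not as what the attached patches do.

```python
class ContainerNameConflict(Exception):
    """Raised when a container with the requested name already exists."""


class FakeDockerDaemon:
    """Hypothetical stand-in for Docker: container names must be unique."""

    def __init__(self):
        self.containers = set()

    def run(self, name):
        # Docker refuses to create a second container with an existing name.
        if name in self.containers:
            raise ContainerNameConflict(name)
        self.containers.add(name)

    def rm(self, name):
        # Removing a container frees its name for reuse.
        self.containers.discard(name)


def relaunch(daemon, container_id, cleanup_first):
    """Relaunch under the same container ID, optionally removing the old one."""
    if cleanup_first:
        daemon.rm(container_id)
    daemon.run(container_id)


daemon = FakeDockerDaemon()
cid = "container_1520546307703_0001_01_000002"
daemon.run(cid)  # first attempt; the exited container is left in place

# Relaunch without cleanup reuses the ID and collides with the old container.
try:
    relaunch(daemon, cid, cleanup_first=False)
    conflict = False
except ContainerNameConflict:
    conflict = True

# Relaunch after removing the old container succeeds.
relaunch(daemon, cid, cleanup_first=True)
```

In this model the first relaunch fails because the container ID doubles as the container name, which mirrors the situation the description attributes to {{ContainerRelaunch}} skipping {{cleanupContainer}}.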