[
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572593#comment-16572593
]
Chandni Singh commented on YARN-8160:
-------------------------------------
{quote}
Exit code 255 is coming from docker inspect
container_e02_1533231998644_0009_01_000003. There appears to be a race
condition where the ContainerLaunch thread has issued the termination on the
docker container pid, while LinuxContainerExecutor still has an independent
child process that is checking the liveness of the docker container.
{quote}
[~eyang], the container exit code comes from the statement below in
{{ContainerLaunch.call()}}:
{code}
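// ContainerLaunch.call(): the value assigned to 'ret' becomes the
// container's exit code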
ret = launchContainer(new ContainerStartContext.Builder()
.setContainer(container)
.setLocalizedResources(localResources)
.setNmPrivateContainerScriptPath(nmPrivateContainerScriptPath)
.setNmPrivateTokensPath(nmPrivateTokensPath)
.setUser(user)
.setAppId(appIdStr)
.setContainerWorkDir(containerWorkDir)
.setLocalDirs(localDirs)
.setLogDirs(logDirs)
.setFilecacheDirs(filecacheDirs)
.setUserLocalDirs(userLocalDirs)
.setContainerLocalDirs(containerLocalDirs)
.setContainerLogDirs(containerLogDirs)
.setUserFilecacheDirs(userFilecacheDirs)
.setApplicationLocalDirs(applicationLocalDirs).build());
{code}
A docker inspect of a container that has been stopped and cleaned up would
just report that the container is not alive. How does that affect the
container's exit code? I cannot find this in the code; could you please point
me to it?
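For reference, the behavior in question can be reproduced outside the NM: once
the container has been stopped and removed, {{docker inspect}} on its id exits
with a nonzero status. A quick, illustrative probe (not the NM's actual
liveness check):
{code}
import java.io.IOException;

public class DockerInspectProbe {
  // Runs `docker inspect <id>` and returns the process exit code. For a
  // container that has already been removed, docker exits nonzero, so a
  // liveness check racing with cleanup observes a failure.
  static int inspect(String containerId)
      throws IOException, InterruptedException {
    Process p = new ProcessBuilder("docker", "inspect", containerId)
        .inheritIO()
        .start();
    return p.waitFor();
  }
}
{code}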
I still think the following are the only two solutions for this:
1. In the node manager, if a container is in REINITIALIZING_AWAITING_KILL and
gets a CONTAINER_EXITED_WITH_FAILURE event, then it should handle it the same
way it currently handles CONTAINER_KILLED_ON_REQUEST (a rough sketch follows
this list).
2. Cleanup of the container files is not performed until the container exits.
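A rough illustration of option 1, modeled as a simplified state machine rather
than the actual {{ContainerImpl}} transitions (all class and method names here
are illustrative, not the real NM code):
{code}
// While the container is in REINITIALIZING_AWAITING_KILL, treat
// CONTAINER_EXITED_WITH_FAILURE the same way CONTAINER_KILLED_ON_REQUEST is
// treated today, i.e. continue with the re-initialization instead of
// marking the container as failed.
public class UpgradeStateModel {
  enum State { REINITIALIZING_AWAITING_KILL, RELAUNCHING, EXITED_WITH_FAILURE }
  enum Event { CONTAINER_KILLED_ON_REQUEST, CONTAINER_EXITED_WITH_FAILURE }

  private State state = State.REINITIALIZING_AWAITING_KILL;

  State onEvent(Event event) {
    if (state == State.REINITIALIZING_AWAITING_KILL) {
      switch (event) {
        case CONTAINER_KILLED_ON_REQUEST:
        case CONTAINER_EXITED_WITH_FAILURE: // proposed: same handling
          state = State.RELAUNCHING;        // proceed with the upgrade
          return state;
        default:
          break;
      }
    }
    state = State.EXITED_WITH_FAILURE;      // today's behavior for the failure
    return state;
  }
}
{code}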
> Yarn Service Upgrade: Support upgrade of services that use docker containers
> ----------------------------------------------------------------------------
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
> Attachments: container_e02_1533231998644_0009_01_000003.nm.log
>
>
> Ability to upgrade dockerized yarn native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} API.
> {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded
> container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
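> A minimal sketch of driving this path from a client, assuming the
> {{NMClient#reInitializeContainer}} API from YARN-5637 (the class and variable
> names below are illustrative only):
> {code}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.yarn.api.records.ContainerId;
> import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
> import org.apache.hadoop.yarn.client.api.NMClient;
>
> public class ReInitSketch {
>   // Re-initializes a running container in place: same ContainerId,
>   // new ContainerLaunchContext.
>   static void upgrade(ContainerId containerId,
>       ContainerLaunchContext upgradedContext) throws Exception {
>     NMClient nmClient = NMClient.createNMClient();
>     nmClient.init(new Configuration());
>     nmClient.start();
>     // autoCommit=true: the NM finalizes the upgrade without waiting for
>     // an explicit commitLastReInitialization() call.
>     nmClient.reInitializeContainer(containerId, upgradedContext, true);
>   }
> }
> {code}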
> NOTE: {{ContainerLaunchContext}} holds all the information needed to
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. This is not created by the NM; it is provided to it, and the
> RM is not creating another container allocation here.
> - {{localizedResources}}. These stay the same if the upgrade does *NOT*
> require additional resources, IIUC.
>
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container changes. It is *NOT* a
> relaunch.
> *Changes required in the case of docker containers*
> - {{reInitializeContainer}} does not seem to work with Docker containers.
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and
> modify {{reInitializeContainer}} to trigger the docker container launch
> without pulling the image first, which could be controlled by a flag (see the
> sketch below).
> -- When the service upgrade is initialized, we can provide the user with an
> option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls
> {{reInitializeContainer}} with the pull-image flag set to false, since the NM
> will have already pulled the images.
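> A sketch of what the proposed flow could look like; both methods below are
> hypothetical and do not exist in YARN today:
> {code}
> // Hypothetical NM-side API sketch for the future change described above.
> public interface NMUpgradeApiSketch {
>   // Pre-pull the docker image on the NM so the later relaunch is fast.
>   void pullImage(String dockerImage);
>
>   // Variant of reInitializeContainer that can skip the image pull when
>   // pullImage() has already fetched the image on this NM.
>   void reInitializeContainer(String containerId, boolean pullImage);
> }
> {code}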