[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16570918#comment-16570918 ]

Eric Yang commented on YARN-8160:
---------------------------------

[~csingh] Exit code 255 is coming from {{docker inspect 
container_e02_1533231998644_0009_01_000003}}. There appears to be a race 
condition: the ContainerLaunch thread has already issued termination of the 
docker container pid, while LinuxContainerExecutor still has an independent 
child process checking the liveness of the docker container. Because the two 
code paths are not coordinated, the container status records an incorrect, 
transient result from container-executor. One possible fix is to find the 
parent pid of the docker container and send it a SIGKILL, so that 
container-executor cannot write any further status, and only then send the 
signal to the docker container. This would prevent the node manager from 
processing the exit code from {{docker inspect 
container_e02_1533231998644_0009_01_000003}}.
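The proposed ordering (kill the status-writing parent first, then signal the container) can be sketched in Python as below. This is a minimal illustration assuming a Linux host with /proc; {{parent_pid}} and {{terminate_in_order}} are hypothetical helpers, not part of YARN or container-executor. Whether the parent of the docker container process really is the container-executor child that writes the status is taken from the comment above as an assumption, not verified here.

```python
import os
import signal


def parent_pid(pid: int) -> int:
    """Return the parent pid of `pid` by parsing /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so only split the text after the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    return int(fields[1])  # fields[0] is the process state, fields[1] the ppid


def terminate_in_order(monitor_pid: int, container_pid: int,
                       sig: int = signal.SIGTERM) -> None:
    """Hypothetical helper: stop the liveness monitor, then signal the container.

    Killing the monitor first removes the process that would otherwise
    observe and record the container's transient exit status.
    """
    os.kill(monitor_pid, signal.SIGKILL)  # SIGKILL cannot be caught or ignored
    os.kill(container_pid, sig)           # only now deliver the real signal
```

In the scenario described, a call would look like {{terminate_in_order(parent_pid(container_pid), container_pid)}}. Note that kill() only queues the signal; if a hard guarantee is needed, the caller would also have to wait until the monitor is actually gone before signalling the container.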


> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8160
>                 URL: https://issues.apache.org/jira/browse/YARN-8160
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: Docker
>         Attachments: container_e02_1533231998644_0009_01_000003.nm.log
>
>
> Ability to upgrade Dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via {{reInitializeContainer}} api. 
> {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded 
> container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. This is not created by the NM; it is provided to the NM, and 
> the RM does not create another container allocation here.
> - {{localizedResources}} this stays the same if the upgrade does *NOT* 
> require additional resources IIUC.
>  
> The following does change with {{reInitializeContainer}}:
> - the working directory of the upgraded container changes. It is *NOT* a 
> relaunch.
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} does not appear to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an API to the NM to pull images, and modify 
> {{reInitializeContainer}} to trigger the docker container launch without 
> first pulling the image, possibly controlled by a flag.
>     -- When the service upgrade is initialized, the user can be given an 
> option to only pull the images on the NMs.
>     -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the pull-image flag set to false, since the 
> NM will have already pulled the images.
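The invariants listed in the description (same container ID and localized resources, new working directory) and the proposed pre-pull flow can be captured in a toy Python model. Everything in it ({{LaunchContext}}, {{Container}}, {{prepull_images}}, {{reinitialize_container}}, the {{pull_image}} parameter, and a per-node image cache modelled as a set) is hypothetical and only mirrors the prose; it is not the NM's {{reInitializeContainer}} API.

```python
import uuid
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LaunchContext:
    """Hypothetical stand-in for ContainerLaunchContext: an image and a command."""
    image: str
    command: str


@dataclass(frozen=True)
class Container:
    container_id: str               # provided to the NM, never created by it
    launch_context: LaunchContext
    localized_resources: frozenset  # unchanged if no extra resources are needed
    work_dir: str
    running: bool = True


def prepull_images(image: str, node_caches: list) -> None:
    """Hypothetical 'pull only' step run when the service upgrade is initialized."""
    for cache in node_caches:
        cache.add(image)


def reinitialize_container(c: Container, new_ctx: LaunchContext,
                           node_cache: set, pull_image: bool = True) -> Container:
    # 1. Kill the existing process.
    c = replace(c, running=False)
    # 2. Clean up the container (not modelled beyond the fresh work_dir below).
    # 3. Launch again from the new context, pulling the image only if asked to.
    if pull_image:
        node_cache.add(new_ctx.image)
    elif new_ctx.image not in node_cache:
        raise RuntimeError(f"image {new_ctx.image} was not pre-pulled on this node")
    # Same container ID and localized resources; only the context and work dir change.
    return replace(c, launch_context=new_ctx,
                   work_dir=f"/work/{c.container_id}/{uuid.uuid4().hex}",
                   running=True)
```

In this model, {{pull_image=False}} raises if the image was not pre-pulled; that is one possible behaviour, chosen only to make the dependency on the pre-pull step visible.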



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
