[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-13 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579138#comment-16579138
 ] 

Chandni Singh edited comment on YARN-8160 at 8/14/18 2:15 AM:
--

[~eyang] Changing the resource of a component is not supported as part of a 
YARN service upgrade. This is not specific to Docker containers. For changes 
that YARN service upgrade does not support, initiation of the upgrade will 
fail with an appropriate error message.

{{UpgradeComponentsFinder}} contains the logic that determines which changes 
YARN service upgrade supports.
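
As a rough illustration of the behaviour described in this comment (this is 
not the actual {{UpgradeComponentsFinder}} code), the check can be pictured as 
comparing the current and target specs of a component and rejecting the 
upgrade when an unsupported field differs. {{ComponentSpec}}, 
{{UpgradeCheckSketch}} and {{validateUpgrade}} are hypothetical names used 
only for this sketch, and the error message text is an assumption.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for a component spec; not the YARN class.
final class ComponentSpec {
    final String name;
    final String artifact;   // e.g. a docker image id
    final int memoryMb;      // part of the component's resource
    final int vcores;        // part of the component's resource

    ComponentSpec(String name, String artifact, int memoryMb, int vcores) {
        this.name = name;
        this.artifact = artifact;
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

final class UpgradeCheckSketch {
    // Returns true if the component needs an upgrade (its artifact changed).
    // Throws if the difference is one this sketch treats as unsupported,
    // so that initiation of the upgrade fails with an error message.
    static boolean validateUpgrade(ComponentSpec current, ComponentSpec target) {
        if (current.memoryMb != target.memoryMb || current.vcores != target.vcores) {
            throw new UnsupportedOperationException(
                "changes to the resource of component " + current.name
                    + " are not supported by upgrade");
        }
        return !Objects.equals(current.artifact, target.artifact);
    }
}
```

In this sketch an artifact change is accepted and reported as needing an 
upgrade, while a change to memory or vcores makes the call throw before any 
container is touched.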


was (Author: csingh):
[~eyang] Changing resource for a component is not supported as part of yarn 
service upgrade. This is not specific to docker container. For changes that are 
not supported by yarn service upgrade, initiation of upgrade will fail with 
appropriate error message.

{{UpgradeComponentsFinder}} has the logic which outlines what changes which are 
supported by yarn service upgrade. 

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
> Attachments: YARN-8160.001.patch, YARN-8160.002.patch, 
> YARN-8160.003.patch, YARN-8160.004.patch, 
> container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} 
> API. {{reInitializeContainer}} does *NOT* change the ContainerId of the 
> upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. This is not created by the NM. It is provided to the NM, and 
> here the RM does not create another container allocation.
> - {{localizedResources}}. These stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container. It is *NOT* a relaunch.
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} does not seem to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and 
> modify {{reInitializeContainer}} so that, based on a flag, it can trigger the 
> docker container launch without pulling the image first.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM 
> will have already pulled the images.
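
The background in the quoted description can be sketched as a small model. 
This is only an illustration under stated assumptions: {{ContainerModel}} is a 
hypothetical class, the launch context is reduced to a string, and the 
working-directory naming scheme is invented for the example. It shows the 
three steps (kill, clean up, launch with the new context) and which fields 
survive a re-initialization.

```java
// Hypothetical, simplified model of a NodeManager container; not YARN code.
final class ContainerModel {
    final String containerId;   // provided to the NM, never changed by reinit
    String launchContext;       // stands in for ContainerLaunchContext
    String workDir;
    boolean processRunning;
    private int launchCount;

    ContainerModel(String containerId, String launchContext) {
        this.containerId = containerId;
        launch(launchContext);
    }

    private void launch(String ctx) {
        launchCount++;
        launchContext = ctx;
        // A fresh working directory per launch; the naming is an assumption.
        workDir = "/work/" + containerId + "/launch-" + launchCount;
        processRunning = true;
    }

    // Mirrors the three steps listed in the description.
    void reInitialize(String newLaunchContext) {
        processRunning = false;    // 1. kill the existing process
        workDir = null;            // 2. clean up the container
        launch(newLaunchContext);  // 3. launch again with the new context
    }
}
```

After {{reInitialize}} the model keeps the same {{containerId}} but has a new 
launch context and a different working directory, which matches the 
distinction the description draws between a re-initialization and a relaunch.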



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-13 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579138#comment-16579138
 ] 

Chandni Singh edited comment on YARN-8160 at 8/14/18 2:13 AM:
--

[~eyang] Changing the resource of a component is not supported as part of a 
YARN service upgrade. This is not specific to Docker containers. For changes 
that YARN service upgrade does not support, initiation of the upgrade will 
fail with an appropriate error message.

{{UpgradeComponentsFinder}} contains the logic that determines which changes 
YARN service upgrade supports.


was (Author: csingh):
[~eyang] Changing resource for a component is not supported as part of yarn 
service upgrade. This is not specific to docker container. For changes that are 
not supported by yarn service upgrade, initiation of upgrade will fail with 
appropriate error message.

{{UpgradeComponentsFinder}} has the logic which outlines what changes are 
supported by yarn service upgrade. 

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
> Attachments: YARN-8160.001.patch, YARN-8160.002.patch, 
> YARN-8160.003.patch, YARN-8160.004.patch, 
> container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} 
> API. {{reInitializeContainer}} does *NOT* change the ContainerId of the 
> upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. This is not created by the NM. It is provided to the NM, and 
> here the RM does not create another container allocation.
> - {{localizedResources}}. These stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container. It is *NOT* a relaunch.
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} does not seem to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and 
> modify {{reInitializeContainer}} so that, based on a flag, it can trigger the 
> docker container launch without pulling the image first.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM 
> will have already pulled the images.






[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-13 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579138#comment-16579138
 ] 

Chandni Singh edited comment on YARN-8160 at 8/14/18 2:12 AM:
--

[~eyang] Changing the resource of a component is not supported as part of a 
YARN service upgrade. This is not specific to Docker containers. For changes 
that YARN service upgrade does not support, initiation of the upgrade will 
fail with an appropriate error message.

{{UpgradeComponentsFinder}} contains the logic that determines which changes 
YARN service upgrade supports.


was (Author: csingh):
[~eyang] Changing resource for a component is not supported as part of yarn 
service upgrade. This is nothing in specific to docker container. For changes 
that are not supported by yarn service upgrade, initiation of upgrade will fail 
with appropriate error message.

{{UpgradeComponentsFinder}} has the logic which outlines what changes are 
supported by yarn service upgrade. 

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
> Attachments: YARN-8160.001.patch, YARN-8160.002.patch, 
> YARN-8160.003.patch, YARN-8160.004.patch, 
> container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} 
> API. {{reInitializeContainer}} does *NOT* change the ContainerId of the 
> upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. This is not created by the NM. It is provided to the NM, and 
> here the RM does not create another container allocation.
> - {{localizedResources}}. These stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container. It is *NOT* a relaunch.
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} does not seem to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and 
> modify {{reInitializeContainer}} so that, based on a flag, it can trigger the 
> docker container launch without pulling the image first.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM 
> will have already pulled the images.






[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-08 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573888#comment-16573888
 ] 

Chandni Singh edited comment on YARN-8160 at 8/8/18 9:30 PM:
-

The reapContainer call and the cleaning of container files during cleanup can 
be made mutually exclusive with the run. Attached is the patch for it. I am 
testing it.
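
One way to read the comment above is that the launch path and the cleanup 
path should never execute at the same time. The following is a minimal sketch 
of that idea using a lock. It is not taken from the attached patch: 
{{LaunchSketch}}, its methods and the trace strings are hypothetical and exist 
only to show the mutual exclusion.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; class and method names are not from the patch.
final class LaunchSketch {
    private final ReentrantLock launchLock = new ReentrantLock();
    private final StringBuilder trace = new StringBuilder();

    // Launch path: holds the lock for the whole launch sequence.
    void run() {
        launchLock.lock();
        try {
            trace.append("run-start;");
            trace.append("run-end;");
        } finally {
            launchLock.unlock();
        }
    }

    // Cleanup path: reaping the container and deleting its files
    // wait until no launch is in progress.
    void cleanup() {
        launchLock.lock();
        try {
            trace.append("reap;");
            trace.append("delete-files;");
        } finally {
            launchLock.unlock();
        }
    }

    // Returns the recorded order of steps.
    String trace() {
        launchLock.lock();
        try {
            return trace.toString();
        } finally {
            launchLock.unlock();
        }
    }
}
```

Because both paths take the same lock, the recorded steps of a run and of a 
cleanup can follow each other in either order but cannot interleave.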


was (Author: csingh):
The cleaning of container files during cleanup can be mutually exclusive to the 
run. Attached is the patch for it.

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
> Attachments: YARN-8610.001.patch, 
> container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} 
> API. {{reInitializeContainer}} does *NOT* change the ContainerId of the 
> upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. This is not created by the NM. It is provided to the NM, and 
> here the RM does not create another container allocation.
> - {{localizedResources}}. These stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container. It is *NOT* a relaunch.
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} does not seem to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and 
> modify {{reInitializeContainer}} so that, based on a flag, it can trigger the 
> docker container launch without pulling the image first.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM 
> will have already pulled the images.






[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-08 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573888#comment-16573888
 ] 

Chandni Singh edited comment on YARN-8160 at 8/8/18 9:09 PM:
-

The cleaning of container files during cleanup can be made mutually exclusive 
with the run. Attached is the patch for it.


was (Author: csingh):
The clean of container files during container cleanup can be mutually exclusive 
to the run. Attached is the patch for it.

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
> Attachments: YARN-8160.001.patch, 
> container_e02_1533231998644_0009_01_03.nm.log
>
>
> Ability to upgrade dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} 
> API. {{reInitializeContainer}} does *NOT* change the ContainerId of the 
> upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. This is not created by the NM. It is provided to the NM, and 
> here the RM does not create another container allocation.
> - {{localizedResources}}. These stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container. It is *NOT* a relaunch.
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} does not seem to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull the images, and 
> modify {{reInitializeContainer}} so that, based on a flag, it can trigger the 
> docker container launch without pulling the image first.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM 
> will have already pulled the images.






[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-06 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570774#comment-16570774
 ] 

Chandni Singh edited comment on YARN-8160 at 8/6/18 10:06 PM:
--

Attached are the logs of container 3, which fails to re-initialize. When it is 
re-initialized, the container is stopped and cleaned up. This causes the 
container to exit, but here it exits with code {{255}} instead of 
{{FORCE_KILLED}} or {{TERMINATED}}.

Since the container exits with a failure code, that is {{255}}, the status of 
the container in NM changes from {{REINITIALIZING_AWAITING_KILL}} to 
{{EXITED_WITH_FAILURE}}.
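
The effect of the exit code can be sketched as follows. This is a simplified 
illustration, not NodeManager code: {{ReinitExitSketch}} and {{nextState}} are 
hypothetical, {{REINIT_CONTINUES}} is a placeholder label rather than a real 
NM state, and the numeric values 137 and 143 for {{FORCE_KILLED}} and 
{{TERMINATED}} are the values I understand {{ContainerExecutor.ExitCode}} to 
use and should be treated as an assumption.

```java
// Hypothetical sketch of the exit-code handling described in this comment.
final class ReinitExitSketch {
    // Assumed values of FORCE_KILLED and TERMINATED
    // (128 + SIGKILL and 128 + SIGTERM).
    static final int FORCE_KILLED = 137;
    static final int TERMINATED = 143;

    // State the container moves to from REINITIALIZING_AWAITING_KILL.
    // "REINIT_CONTINUES" is a placeholder meaning the kill was recognised
    // as requested and re-initialization can go on.
    static String nextState(int exitCode) {
        if (exitCode == FORCE_KILLED || exitCode == TERMINATED) {
            return "REINIT_CONTINUES";
        }
        // Any other exit code, such as 255, is treated as a failure.
        return "EXITED_WITH_FAILURE";
    }
}
```

With this mapping, the exit code {{255}} seen in the logs falls into the 
failure branch, which is why the re-initialization stops instead of launching 
the new container.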

Below are the relevant log statements:

1. Reinit of the container is triggered
{code:java}
 ctr005.log:2018-08-02 22:30:41,100 DEBUG container.ContainerImpl 
(ContainerImpl.java:handle(2080)) - Processing 
container_e02_1533231998644_0009_01_03 of type REINITIALIZE_CONTAINER

ctr005.log:2018-08-02 22:30:41,101 INFO container.ContainerImpl 
(ContainerImpl.java:handle(2093)) - Container 
container_e02_1533231998644_0009_01_03 transitioned from RUNNING to 
REINITIALIZING_AWAITING_KIL
{code}
2. Reinit triggers cleanup of the container
{code:java}
ctr005.log:2018-08-02 22:30:41,102 INFO launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG recovery.NMLeveldbStateStoreService 
(NMLeveldbStateStoreService.java:storeContainerKilled(555)) - 
storeContainerKilled: containerId=container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(752)) - Marking container 
container_e02_1533231998644_0009_01_03 as inactive
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(759)) - Getting pid for container 
container_e02_1533231998644_0009_01_03 to kill from pid file 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:getContainerPid(1084)) - Accessing pid for container 
container_e02_1533231998644_0009_01_03 from pid file 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(53)) - Accessing pid from pid file 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(103)) - Got pid 364708 from path 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:getContainerPid(1096)) - Got pid 364708 for container 
container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:signalProcess(919)) - Sending signal to pid 364708 as 
user root for container container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: 
inspect docker-command=inspect format=\{{.State.Status}} 
name=container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,103 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) 
- Privileged Execution Command Array: 
[/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, 
--format=\{{.State.Status}}, container_e02_1533231998644_0009_01_03]
ctr005.log:2018-08-02 22:30:41,129 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:executePrivilegedOperation(155)) - 
[/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, 
--format=\{{.State.Status}}, container_e02_1533231998644_0009_01_03]
ctr005.log:2018-08-02 22:30:41,130 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:getContainerStatus(154)) - Container Status: 
running ContainerId: container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,131 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: 
stop docker-command=stop name=container_e02_1533231998644_0009_01_03
{code}
3. After 10 seconds, the stop command sent to the executor completes and the 
container is 

[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-06 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570774#comment-16570774
 ] 

Chandni Singh edited comment on YARN-8160 at 8/6/18 9:07 PM:
-

Attached are the logs of container 3, which fails to re-initialize. When it is 
re-initialized, the container is stopped and cleaned up. This causes the 
container to exit, but here it exits with code {{255}} instead of 
{{FORCE_KILLED}} or {{TERMINATED}}.

Since the container exits with a failure code, that is {{255}}, the status of 
the container in NM changes from {{REINITIALIZING_AWAITING_KILL}} to 
{{EXITED_WITH_FAILURE}}.

Below are the relevant log statements:

1. Reinit of the container is triggered
{code:java}
 ctr005.log:2018-08-02 22:30:41,100 DEBUG container.ContainerImpl 
(ContainerImpl.java:handle(2080)) - Processing 
container_e02_1533231998644_0009_01_03 of type REINITIALIZE_CONTAINER

ctr005.log:2018-08-02 22:30:41,101 INFO container.ContainerImpl 
(ContainerImpl.java:handle(2093)) - Container 
container_e02_1533231998644_0009_01_03 transitioned from RUNNING to 
REINITIALIZING_AWAITING_KIL
{code}
2. Reinit triggers cleanup of the container
{code:java}
ctr005.log:2018-08-02 22:30:41,102 INFO launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container 
container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG recovery.NMLeveldbStateStoreService 
(NMLeveldbStateStoreService.java:storeContainerKilled(555)) - 
storeContainerKilled: containerId=container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(752)) - Marking container 
container_e02_1533231998644_0009_01_03 as inactive
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(759)) - Getting pid for container 
container_e02_1533231998644_0009_01_03 to kill from pid file 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:getContainerPid(1084)) - Accessing pid for container 
container_e02_1533231998644_0009_01_03 from pid file 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(53)) - Accessing pid from pid file 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader 
(ProcessIdFileReader.java:getProcessId(103)) - Got pid 364708 from path 
/tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:getContainerPid(1096)) - Got pid 364708 for container 
container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch 
(ContainerLaunch.java:signalProcess(919)) - Sending signal to pid 364708 as 
user root for container container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: 
inspect docker-command=inspect format=\{{.State.Status}} 
name=container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,103 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) 
- Privileged Execution Command Array: 
[/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, 
--format=\{{.State.Status}}, container_e02_1533231998644_0009_01_03]
ctr005.log:2018-08-02 22:30:41,129 DEBUG privileged.PrivilegedOperationExecutor 
(PrivilegedOperationExecutor.java:executePrivilegedOperation(155)) - 
[/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, 
--format=\{{.State.Status}}, container_e02_1533231998644_0009_01_03]
ctr005.log:2018-08-02 22:30:41,130 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:getContainerStatus(154)) - Container Status: 
running ContainerId: container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,131 DEBUG docker.DockerCommandExecutor 
(DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: 
stop docker-command=stop name=container_e02_1533231998644_0009_01_03
{code}
3. After 10 seconds, the stop command sent to the executor completes and the 
container is 

[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-08-01 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566144#comment-16566144
 ] 

Chandni Singh edited comment on YARN-8160 at 8/1/18 11:27 PM:
--

Thanks [~eyang]. This explanation is very helpful. I completely agree that we 
don't need to worry about the logic to break the docker image download into 
separate steps at this time. I will create a separate ticket for that and use 
this one just to fix the bugs in {{reInitializeContainer}} with docker 
containers.


was (Author: csingh):
Thanks [~eyang]. This explanation is very helpful. Completely agree that we 
don't need to worry about logic to break docker image download into separate 
steps at this time. I will create a separate ticket for that and use this to 
just fix the bugs with \{{ reInitializeContainer}} with docker container.

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> 
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Chandni Singh
>Assignee: Chandni Singh
>Priority: Major
>  Labels: Docker
>
> Ability to upgrade dockerized  yarn native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via {{reInitializeContainer}} api. 
> {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded 
> container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change
> - container ID. This is not created by NM. It is provided to it and here RM 
> is not creating another container allocation.
> - {{localizedResources}} this stays the same if the upgrade does *NOT* 
> require additional resources IIUC.
>  
> The following changes with {{reInitializeContainer}}
> - the working directory of the upgraded container changes. It is *NOT* a 
> relaunch. 
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} seems to not be working with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional api to NM to pull the images and modify 
> {{reInitializeContainer}} to trigger docker container launch without pulling 
> the image first which could be based on a flag.
> -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images  on the NMs.
> -- When a component instance is upgraded, it calls the 
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM 
> will have already pulled the images.
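The re-init behaviour and the proposed pull-image flag described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: {{ReInitSketch}}, {{Node}}, {{Container}}, {{pullImage}}, the {{pullFirst}} parameter, and the working-directory naming scheme are all hypothetical and are not part of the YARN NodeManager API.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the re-init flow; none of these types are real YARN classes. */
public class ReInitSketch {

    /** Hypothetical per-container state tracked by the toy node. */
    static final class Container {
        final String containerId; // provided to the NM; never changes on re-init
        String image;
        String workDir;
        int launchCount = 1;

        Container(String containerId, String image) {
            this.containerId = containerId;
            this.image = image;
            this.workDir = workDirFor(containerId, launchCount);
        }
    }

    /** Hypothetical stand-in for a NodeManager. */
    static final class Node {
        final Set<String> pulledImages = new HashSet<>();
        int pullCount = 0;

        /** Models the proposed "pull only" API; pulling a known image is a no-op. */
        void pullImage(String image) {
            if (pulledImages.add(image)) {
                pullCount++;
            }
        }

        /** Models re-init: optionally pull, then launch again with the new image. */
        void reInitialize(Container c, String newImage, boolean pullFirst) {
            if (pullFirst) {
                pullImage(newImage);
            }
            // Killing the old process and cleaning up are not modeled here.
            c.launchCount++;
            c.image = newImage;
            c.workDir = workDirFor(c.containerId, c.launchCount);
        }
    }

    /** Made-up naming scheme, used only to show that the directory changes. */
    static String workDirFor(String containerId, int launchCount) {
        return "/tmp/" + containerId + "/launch_" + launchCount;
    }

    public static void main(String[] args) {
        Node node = new Node();
        Container c = new Container("container_01", "app:1.0");

        node.pullImage("app:2.0");              // upgrade initiated: pull only
        node.reInitialize(c, "app:2.0", false); // instance upgraded: no second pull

        System.out.println(c.containerId + " " + c.image + " " + c.workDir);
    }
}
```

In this model the container ID is a final field while the working directory is recomputed on every launch, which mirrors the "does not change" and "changes" lists in the description.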



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-07-31 Thread Chandni Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441414#comment-16441414
 ] 

Chandni Singh edited comment on YARN-8160 at 8/1/18 12:29 AM:
--

We do need to build a new CLC because upgrade lets users modify the 
artifacts/env/configs of their service. A likely scenario is that the version 
of the (docker-based) component changed. I haven't tested with a docker-based 
app, but I think this re-init should work seamlessly in the docker case. If it 
doesn't, then that would be a bug.

I don't clearly understand [~eyang]'s comments on the improvement that is being 
proposed. 



was (Author: csingh):
I looked at the {{ContainerImpl}} code and I think re-init already uses the 
re-launch logic.
As [~shaneku...@gmail.com] pointed out, re-init first will deactivate the 
existing container then do a relaunch with a new {{ContainerLaunchContext}}. 

We do need to build a new CLC because upgrade lets user modify the 
artifacts/env/configs of their service. Likely scenario is that the version of 
the component (docker based) changed. I haven't tested with a docker based app, 
but I think this re-init should work seamlessly in the docker case. If it 
doesn't then that would be a bug.

I don't clearly understand [~eyang]'s comments on the improvement that is being 
proposed. 








[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers

2018-04-17 Thread Shane Kumpf (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441045#comment-16441045
 ] 

Shane Kumpf edited comment on YARN-8160 at 4/17/18 3:43 PM:


{quote}It might help to refine relaunch logic and reuse some of the existing 
code.
{quote}
I think there are parts of Relaunch that could be useful in the upgrade case, 
but I believe some of the semantics will clash. The idea behind Relaunch is to 
reuse the same NodeManager, the container id, the previously localized 
resources, and the working directory. In the Docker case, we go a step further 
and run a {{docker start}} on the existing container. IIUC, the current upgrade 
design uses re-init, which cleans up the existing container prior to launching 
with a new CLC. The two approaches are quite different.

Where I see Relaunch features being beneficial is the reuse of the container 
id/NM, which may allow for reusing the IP address previously assigned. Reuse of 
the existing work dir could also be useful in some cases, but I expect those 
cases are limited. However, I'm concerned that Relaunch skips localization and 
that {{docker start}} won't work for an upgrade. At this point, it seems like 
re-init might be more appropriate.
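
To make the comparison concrete, here is a minimal sketch of what each approach reuses, as I read the discussion above. The names {{RelaunchVsReInit}}, {{Reused}}, {{Mode}} and {{picksUpNewLaunchContext}} are hypothetical and do not exist in YARN. Whether re-init keeps the localized resources depends on the upgrade, so the sketch conservatively leaves them out of the re-init set.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative comparison only; not real YARN classes. */
public class RelaunchVsReInit {

    /** Things a launch strategy may carry over from the previous run. */
    enum Reused { NODE_MANAGER, CONTAINER_ID, LOCALIZED_RESOURCES, WORK_DIR, DOCKER_CONTAINER }

    enum Mode {
        // Relaunch reuses everything and, for Docker, runs "docker start"
        // on the existing container.
        RELAUNCH(EnumSet.allOf(Reused.class)),
        // Re-init cleans up the existing container and launches with a new CLC
        // on the same NM under the same container id.
        RE_INIT(EnumSet.of(Reused.NODE_MANAGER, Reused.CONTAINER_ID));

        private final Set<Reused> reused;

        Mode(Set<Reused> reused) {
            this.reused = reused;
        }

        boolean reuses(Reused what) {
            return reused.contains(what);
        }

        /**
         * Assumption of this sketch: an upgrade needs fresh localization and
         * a fresh Docker container to pick up the new launch context.
         */
        boolean picksUpNewLaunchContext() {
            return !reuses(Reused.LOCALIZED_RESOURCES) && !reuses(Reused.DOCKER_CONTAINER);
        }
    }

    public static void main(String[] args) {
        for (Mode m : Mode.values()) {
            System.out.println(m + " picks up new launch context: " + m.picksUpNewLaunchContext());
        }
    }
}
```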

[~csingh] - based on your experience here, do you have a high level idea of how 
this will work in the Docker case?

 


was (Author: shaneku...@gmail.com):
{quote}It might help to refine relaunch logic and reuse some of the existing 
code.
{quote}
 

I think there are parts of Relaunch that could be useful in the upgrade case, 
but I believe some semantics will clash. The idea behind Relaunch is to reuse 
the same NodeManager, the container id, localization, and the working 
directory. In the Docker case, we go a step further and run a {{docker start}} 
on the existing container. IIUC, the current upgrade design uses re-init which 
cleans up the existing container prior to launching with a new CLC. These are 
quite different in their approaches.

Where I see Relaunch features being beneficial is the reuse of the container 
id/NM, which may allow for reusing the IP address previously assigned. Reuse of 
the existing work dir could also be useful in some case, but I expect those 
cases are limited. However, I'm concerned about Relaunch skipping localization 
and {{docker start}} won't work. At this point, it seems like re-init might be 
more appropriate.

[~csingh] - based on your experience here, do you have a high level idea of how 
this will work in the Docker case?

 



