[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579138#comment-16579138 ]

Chandni Singh edited comment on YARN-8160 at 8/14/18 2:15 AM:
--

[~eyang] Changing resource for a component is not supported as part of yarn service upgrade. This is not specific to docker container. For changes that are not supported by yarn service upgrade, initiation of upgrade will fail with appropriate error message. {{UpgradeComponentsFinder}} has the logic which outlines what changes are supported by yarn service upgrade.

was (Author: csingh):
[~eyang] Changing resource for a component is not supported as part of yarn service upgrade. This is not specific to docker container. For changes that are not supported by yarn service upgrade, initiation of upgrade will fail with appropriate error message. {{UpgradeComponentsFinder}} has the logic which outlines what changes which are supported by yarn service upgrade.

> Yarn Service Upgrade: Support upgrade of service that use docker containers
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
> Attachments: YARN-8160.001.patch, YARN-8160.002.patch, YARN-8160.003.patch, YARN-8160.004.patch, container_e02_1533231998644_0009_01_03.nm.log
>
> Ability to upgrade dockerized yarn native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} api.
> {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. This is not created by NM. It is provided to it and here RM is not creating another container allocation.
> - {{localizedResources}}: this stays the same if the upgrade does *NOT* require additional resources IIUC.
>
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container changes. It is *NOT* a relaunch.
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} seems to not be working with Docker containers. Investigate and fix this.
> - [Future change] Add an additional api to NM to pull the images and modify {{reInitializeContainer}} to trigger docker container launch without pulling the image first, which could be based on a flag.
> -- When the service upgrade is initialized, we can provide the user with an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls {{reInitializeContainer}} with the flag pull-image set to false, since the NM will have already pulled the images.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
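The comment above says that an upgrade is rejected at initiation when it contains an unsupported change, such as a changed resource. A minimal sketch of that kind of check follows. It is written in Python for brevity and is not the actual {{UpgradeComponentsFinder}} logic; the function name `find_upgradable_components` and the spec fields `artifact` and `resource` are hypothetical and exist only for this illustration.

```python
def find_upgradable_components(current, target):
    """Return the names of components whose spec changed in a supported way.

    `current` and `target` map a component name to a dict describing it.
    This sketch treats a resource change as the only unsupported change and
    raises ValueError for it, mirroring "initiation of upgrade will fail".
    """
    changed = []
    for name, cur in current.items():
        new = target.get(name)
        if new is None:
            raise ValueError(f"component {name!r} is missing from the target spec")
        if new["resource"] != cur["resource"]:
            raise ValueError(
                f"changing resource of component {name!r} is not supported by upgrade"
            )
        if new != cur:
            # Anything else that differs (for example the artifact) is
            # treated as an upgradable change in this sketch.
            changed.append(name)
    return changed
```

With this shape, a spec that only bumps the artifact yields the list of components to upgrade, while a spec that also changes memory or cpus fails before any container is touched.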
[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579138#comment-16579138 ]

Chandni Singh edited comment on YARN-8160 at 8/14/18 2:13 AM:
--

[~eyang] Changing resource for a component is not supported as part of yarn service upgrade. This is not specific to docker container. For changes that are not supported by yarn service upgrade, initiation of upgrade will fail with appropriate error message. {{UpgradeComponentsFinder}} has the logic which outlines what changes which are supported by yarn service upgrade.

was (Author: csingh):
[~eyang] Changing resource for a component is not supported as part of yarn service upgrade. This is not specific to docker container. For changes that are not supported by yarn service upgrade, initiation of upgrade will fail with appropriate error message. {{UpgradeComponentsFinder}} has the logic which outlines what changes are supported by yarn service upgrade.
[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579138#comment-16579138 ]

Chandni Singh edited comment on YARN-8160 at 8/14/18 2:12 AM:
--

[~eyang] Changing resource for a component is not supported as part of yarn service upgrade. This is not specific to docker container. For changes that are not supported by yarn service upgrade, initiation of upgrade will fail with appropriate error message. {{UpgradeComponentsFinder}} has the logic which outlines what changes are supported by yarn service upgrade.

was (Author: csingh):
[~eyang] Changing resource for a component is not supported as part of yarn service upgrade. This is nothing in specific to docker container. For changes that are not supported by yarn service upgrade, initiation of upgrade will fail with appropriate error message. {{UpgradeComponentsFinder}} has the logic which outlines what changes are supported by yarn service upgrade.
[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573888#comment-16573888 ]

Chandni Singh edited comment on YARN-8160 at 8/8/18 9:30 PM:
-

The reapContainer and cleaning of container files during cleanup can be mutually exclusive to the run. Attached is the patch for it. I am testing it.

was (Author: csingh):
The cleaning of container files during cleanup can be mutually exclusive to the run. Attached is the patch for it.

> Yarn Service Upgrade: Support upgrade of service that use docker containers
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
> Attachments: YARN-8610.001.patch, container_e02_1533231998644_0009_01_03.nm.log
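To illustrate what "mutually exclusive" means for the launch and the cleanup, here is a small sketch in Python. It is not the attached patch and does not use NodeManager classes; `ContainerLaunchSketch`, its method names and the event strings are hypothetical. It only shows the idea of guarding both code paths with one lock so that container files are never deleted while the launch is in the middle of using them.

```python
import threading


class ContainerLaunchSketch:
    """Toy model: run() and cleanup() share one lock, so they never overlap."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cleaned_up = False
        self.events = []  # ordered trace of what happened

    def run(self):
        """Prepare and start the container unless cleanup already happened."""
        with self._lock:
            if self._cleaned_up:
                self.events.append("run-skipped")
                return False
            self.events.append("run-begin")
            # ... write launch files and start the process (omitted) ...
            self.events.append("run-end")
            return True

    def cleanup(self):
        """Reap the container and delete its files, never during run()."""
        with self._lock:
            self.events.append("cleanup-begin")
            # ... reap container, delete container files (omitted) ...
            self._cleaned_up = True
            self.events.append("cleanup-end")
```

Whichever thread takes the lock first finishes its whole section before the other starts, so the trace is either a complete run followed by a complete cleanup, or a complete cleanup followed by a skipped run.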
[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573888#comment-16573888 ]

Chandni Singh edited comment on YARN-8160 at 8/8/18 9:09 PM:
-

The cleaning of container files during cleanup can be mutually exclusive to the run. Attached is the patch for it.

was (Author: csingh):
The clean of container files during container cleanup can be mutually exclusive to the run. Attached is the patch for it.

> Yarn Service Upgrade: Support upgrade of service that use docker containers
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
> Attachments: YARN-8160.001.patch, container_e02_1533231998644_0009_01_03.nm.log
[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570774#comment-16570774 ]

Chandni Singh edited comment on YARN-8160 at 8/6/18 10:06 PM:
--

Attached are the logs of container 3 that fails to re-initialize. When it is re-initialized, the container is stopped and cleaned up. This causes the container to exit, but here it exits with code {{255}} instead of {{FORCE_KILLED}} or {{TERMINATED}}. Since the container exits with a failure code, that is {{255}}, the status of the container in NM changes from {{REINITIALIZING_AWAITING_KILL}} to {{EXITED_WITH_FAILURE}}.

Below are the relevant log statements:

1. Reinit of the container is triggered
{code:java}
ctr005.log:2018-08-02 22:30:41,100 DEBUG container.ContainerImpl (ContainerImpl.java:handle(2080)) - Processing container_e02_1533231998644_0009_01_03 of type REINITIALIZE_CONTAINER
ctr005.log:2018-08-02 22:30:41,101 INFO container.ContainerImpl (ContainerImpl.java:handle(2093)) - Container container_e02_1533231998644_0009_01_03 transitioned from RUNNING to REINITIALIZING_AWAITING_KIL
{code}

2. Reinit triggers cleanup of the container
{code:java}
ctr005.log:2018-08-02 22:30:41,102 INFO launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:storeContainerKilled(555)) - storeContainerKilled: containerId=container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(752)) - Marking container container_e02_1533231998644_0009_01_03 as inactive
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(759)) - Getting pid for container container_e02_1533231998644_0009_01_03 to kill from pid file /tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:getContainerPid(1084)) - Accessing pid for container container_e02_1533231998644_0009_01_03 from pid file /tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader (ProcessIdFileReader.java:getProcessId(53)) - Accessing pid from pid file /tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader (ProcessIdFileReader.java:getProcessId(103)) - Got pid 364708 from path /tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:getContainerPid(1096)) - Got pid 364708 for container container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:signalProcess(919)) - Sending signal to pid 364708 as user root for container container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG docker.DockerCommandExecutor (DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: inspect docker-command=inspect format=\{{.State.Status}} name=container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,103 DEBUG privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) - Privileged Execution Command Array: [/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, --format=\{{.State.Status}}, container_e02_1533231998644_0009_01_03]
ctr005.log:2018-08-02 22:30:41,129 DEBUG privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(155)) - [/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, --format=\{{.State.Status}}, container_e02_1533231998644_0009_01_03]
ctr005.log:2018-08-02 22:30:41,130 DEBUG docker.DockerCommandExecutor (DockerCommandExecutor.java:getContainerStatus(154)) - Container Status: running ContainerId: container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,131 DEBUG docker.DockerCommandExecutor (DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: stop docker-command=stop name=container_e02_1533231998644_0009_01_03
{code}

3. After 10 seconds, the stop command sent to the executor completes and the container is
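The failure described in this comment comes down to how the exit code is interpreted while the container waits to be killed for re-initialization. The following Python sketch models that decision only. It is not NodeManager code: the numeric values 137 and 143 for {{FORCE_KILLED}} and {{TERMINATED}} are assumptions (128 + SIGKILL and 128 + SIGTERM), and `PROCEED_WITH_REINIT` is a made-up label for "the kill was expected, continue the re-init", not an NM state name.

```python
# Assumed numeric values for the two "expected kill" exit codes.
FORCE_KILLED = 137  # assumption: 128 + SIGKILL (9)
TERMINATED = 143    # assumption: 128 + SIGTERM (15)


def next_state_after_exit(state, exit_code):
    """Decide what happens when a container exits during re-initialization.

    Only the REINITIALIZING_AWAITING_KILL case from the comment is modeled.
    """
    if state != "REINITIALIZING_AWAITING_KILL":
        raise ValueError(f"state {state!r} is not modeled in this sketch")
    if exit_code in (FORCE_KILLED, TERMINATED):
        # The exit is recognized as the kill that re-init asked for.
        return "PROCEED_WITH_REINIT"  # hypothetical label, not an NM state
    # Any other code, such as the 255 seen in the logs, looks like a failure.
    return "EXITED_WITH_FAILURE"
```

Under these assumptions, an exit code of 255 takes the failure branch, which matches the transition to {{EXITED_WITH_FAILURE}} reported above.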
[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16570774#comment-16570774 ]

Chandni Singh edited comment on YARN-8160 at 8/6/18 9:07 PM:
-

Attached are the logs of container 3 that fails to re-initialize. When it is re-initialized, the container is stopped and cleaned up. This causes the container to exit, but here it exits with code {{255}} instead of {{FORCE_KILLED}} or {{TERMINATED}}. Since the container exits with a failure code, that is {{255}}, the status of the container in NM changes from {{REINITIALIZING_AWAITING_KILL}} to {{EXITED_WITH_FAILURE}}.

Below are the relevant log statements:

1. Reinit of the container is triggered
{code:java}
ctr005.log:2018-08-02 22:30:41,100 DEBUG container.ContainerImpl (ContainerImpl.java:handle(2080)) - Processing container_e02_1533231998644_0009_01_03 of type REINITIALIZE_CONTAINER
ctr005.log:2018-08-02 22:30:41,101 INFO container.ContainerImpl (ContainerImpl.java:handle(2093)) - Container container_e02_1533231998644_0009_01_03 transitioned from RUNNING to REINITIALIZING_AWAITING_KIL
{code}

2. Reinit triggers cleanup of the container
{code:java}
ctr005.log:2018-08-02 22:30:41,102 INFO launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(734)) - Cleaning up container container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG recovery.NMLeveldbStateStoreService (NMLeveldbStateStoreService.java:storeContainerKilled(555)) - storeContainerKilled: containerId=container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(752)) - Marking container container_e02_1533231998644_0009_01_03 as inactive
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(759)) - Getting pid for container container_e02_1533231998644_0009_01_03 to kill from pid file /tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:getContainerPid(1084)) - Accessing pid for container container_e02_1533231998644_0009_01_03 from pid file /tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader (ProcessIdFileReader.java:getProcessId(53)) - Accessing pid from pid file /tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG util.ProcessIdFileReader (ProcessIdFileReader.java:getProcessId(103)) - Got pid 364708 from path /tmp/hadoop/yarn/local/nmPrivate/application_1533231998644_0009/container_e02_1533231998644_0009_01_03/container_e02_1533231998644_0009_01_03.pid
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:getContainerPid(1096)) - Got pid 364708 for container container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG launcher.ContainerLaunch (ContainerLaunch.java:signalProcess(919)) - Sending signal to pid 364708 as user root for container container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,102 DEBUG docker.DockerCommandExecutor (DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: inspect docker-command=inspect format=\{{.State.Status}} name=container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,103 DEBUG privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:getPrivilegedOperationExecutionCommand(119)) - Privileged Execution Command Array: [/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, --format=\{{.State.Status}}, container_e02_1533231998644_0009_01_03]
ctr005.log:2018-08-02 22:30:41,129 DEBUG privileged.PrivilegedOperationExecutor (PrivilegedOperationExecutor.java:executePrivilegedOperation(155)) - [/hadoop_dist/hadoop-yarn/bin/container-executor, --inspect-docker-container, --format=\{{.State.Status}}, container_e02_1533231998644_0009_01_03]
ctr005.log:2018-08-02 22:30:41,130 DEBUG docker.DockerCommandExecutor (DockerCommandExecutor.java:getContainerStatus(154)) - Container Status: running ContainerId: container_e02_1533231998644_0009_01_03
ctr005.log:2018-08-02 22:30:41,131 DEBUG docker.DockerCommandExecutor (DockerCommandExecutor.java:executeDockerCommand(89)) - Running docker command: stop docker-command=stop name=container_e02_1533231998644_0009_01_03
{code}

3. After 10 seconds, the stop command sent to the executor completes and the container is
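When reading an NM log like the one quoted above, it can help to pull out only the container state transitions. The sketch below does that with a regular expression matched against the "transitioned from ... to ..." message format seen in the quoted log. The function name `parse_transitions` is hypothetical, and the pattern is only known to fit the lines shown here.

```python
import re

# Matches messages such as:
#   Container <id> transitioned from RUNNING to REINITIALIZING_AWAITING_KILL
TRANSITION = re.compile(r"Container (\S+) transitioned from (\w+) to (\w+)")


def parse_transitions(lines):
    """Yield (container_id, old_state, new_state) for each transition line."""
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            yield match.groups()
```

Lines that do not describe a transition, such as the "Processing ... of type REINITIALIZE_CONTAINER" line, are skipped.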
[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566144#comment-16566144 ]

Chandni Singh edited comment on YARN-8160 at 8/1/18 11:27 PM:
--

Thanks [~eyang]. This explanation is very helpful. Completely agree that we don't need to worry about logic to break docker image download into separate steps at this time. I will create a separate ticket for that and use this one just to fix the bugs in {{reInitializeContainer}} with docker containers.

was (Author: csingh):
Thanks [~eyang]. This explanation is very helpful. Completely agree that we don't need to worry about logic to break docker image download into separate steps at this time. I will create a separate ticket for that and use this one just to fix the bugs in {{ reInitializeContainer}} with docker containers.

> Yarn Service Upgrade: Support upgrade of service that use docker containers
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
>
> Ability to upgrade dockerized yarn native services.
> Ref: YARN-5637
>
> *Background*
> Container upgrade is supported by the NM via {{reInitializeContainer}} api.
> {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to upgrade the container.
>
> With {{reInitializeContainer}}, the following does *NOT* change
> - container ID. The NM does not create it; the ID is provided to the NM, and the RM does not create another container allocation here.
> - {{localizedResources}}. These stay the same if the upgrade does *NOT* require additional resources, IIUC.
>
> The following changes with {{reInitializeContainer}}
> - the working directory of the upgraded container changes. It is *NOT* a relaunch.
>
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} does not seem to work with Docker containers. Investigate and fix this.
> - [Future change] Add an additional api to the NM to pull the images, and modify {{reInitializeContainer}} so that it can launch the docker container without pulling the image first; this could be controlled by a flag.
> -- When the service upgrade is initialized, we can provide the user with an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls {{reInitializeContainer}} with the flag pull-image set to false, since the NM will have already pulled the images.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
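To make the re-init steps in the issue description concrete, here is a toy Python model. It is only a sketch, not NM code: {{ToyNodeManager}}, {{ToyContainer}}, {{LaunchContext}}, the work-dir path layout and the {{pull_image}} argument are all hypothetical names chosen for illustration. It shows the three steps (kill, clean up, launch with the new launch context), that the container ID is kept while the working directory changes, and how the proposed pull-image flag could skip the pull when the image was fetched ahead of time.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class LaunchContext:
    """Stand-in for ContainerLaunchContext: what is needed to start the container."""
    image: str
    command: str


@dataclass
class ToyContainer:
    """Hypothetical container record; the ID is assigned once and never changes."""
    container_id: str
    ctx: LaunchContext
    generation: int = 0  # bumped on every re-init (assumption, for illustration)

    @property
    def work_dir(self) -> str:
        # Assumed layout: a fresh working directory per re-init.
        return f"/work/{self.container_id}/{self.generation}"


@dataclass
class ToyNodeManager:
    """Toy NM; none of these methods are real YARN APIs."""
    pulled_images: Set[str] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def pull_image(self, image: str) -> None:
        # Models the separate "pull" api from the [Future change] item.
        self.pulled_images.add(image)
        self.log.append(f"pull {image}")

    def reinitialize(self, container: ToyContainer, new_ctx: LaunchContext,
                     pull_image: bool = True) -> None:
        self.log.append(f"kill {container.container_id}")   # 1. kill the process
        self.log.append(f"cleanup {container.work_dir}")    # 2. clean up
        if pull_image:
            self.pull_image(new_ctx.image)
        elif new_ctx.image not in self.pulled_images:
            raise RuntimeError(f"image {new_ctx.image} was not pre-pulled")
        container.generation += 1                           # new working directory
        container.ctx = new_ctx                             # 3. launch with new context
        self.log.append(f"launch {container.container_id} in {container.work_dir}")


nm = ToyNodeManager()
c = ToyContainer("container_01", LaunchContext("app:1.0", "run"))
old_dir = c.work_dir
nm.pull_image("app:2.0")  # done when the service upgrade is initiated
nm.reinitialize(c, LaunchContext("app:2.0", "run"), pull_image=False)
```

In this sketch, calling {{reinitialize}} with {{pull_image=False}} for an image that was never pulled raises an error, which is one possible way the flag could behave; the ticket does not specify this.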
[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441414#comment-16441414 ]

Chandni Singh edited comment on YARN-8160 at 8/1/18 12:29 AM:
--

We do need to build a new CLC because upgrade lets users modify the artifacts/env/configs of their service. A likely scenario is that the version of the (docker-based) component changed. I haven't tested with a docker-based app, but I think this re-init should work seamlessly in the docker case. If it doesn't, that would be a bug. I don't clearly understand [~eyang]'s comments on the improvement that is being proposed.

was (Author: csingh):
I looked at the {{ContainerImpl}} code and I think re-init already uses the re-launch logic. As [~shaneku...@gmail.com] pointed out, re-init will first deactivate the existing container and then do a relaunch with a new {{ContainerLaunchContext}}. We do need to build a new CLC because upgrade lets users modify the artifacts/env/configs of their service. A likely scenario is that the version of the (docker-based) component changed. I haven't tested with a docker-based app, but I think this re-init should work seamlessly in the docker case. If it doesn't, that would be a bug. I don't clearly understand [~eyang]'s comments on the improvement that is being proposed.

> Yarn Service Upgrade: Support upgrade of service that use docker containers
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
>
> Ability to upgrade dockerized yarn native services.
> Ref: YARN-5637
[jira] [Comment Edited] (YARN-8160) Yarn Service Upgrade: Support upgrade of service that use docker containers
[ https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441045#comment-16441045 ]

Shane Kumpf edited comment on YARN-8160 at 4/17/18 3:43 PM:

{quote}It might help to refine relaunch logic and reuse some of the existing code. {quote}
I think there are parts of Relaunch that could be useful in the upgrade case, but I believe some semantics will clash. The idea behind Relaunch is to reuse the same NodeManager, the container id, previous localized resources, and the working directory. In the Docker case, we go a step further and run a {{docker start}} on the existing container. IIUC, the current upgrade design uses re-init, which cleans up the existing container prior to launching with a new CLC. These are quite different in their approaches.

Where I see Relaunch features being beneficial is the reuse of the container id/NM, which may allow for reusing the IP address previously assigned. Reuse of the existing work dir could also be useful in some cases, but I expect those cases are limited. However, I'm concerned that Relaunch skips localization and that {{docker start}} won't work. At this point, it seems like re-init might be more appropriate. [~csingh] - based on your experience here, do you have a high-level idea of how this will work in the Docker case?

was (Author: shaneku...@gmail.com):
{quote}It might help to refine relaunch logic and reuse some of the existing code. {quote}
I think there are parts of Relaunch that could be useful in the upgrade case, but I believe some semantics will clash. The idea behind Relaunch is to reuse the same NodeManager, the container id, localization, and the working directory. In the Docker case, we go a step further and run a {{docker start}} on the existing container. IIUC, the current upgrade design uses re-init, which cleans up the existing container prior to launching with a new CLC. These are quite different in their approaches.

Where I see Relaunch features being beneficial is the reuse of the container id/NM, which may allow for reusing the IP address previously assigned. Reuse of the existing work dir could also be useful in some cases, but I expect those cases are limited. However, I'm concerned that Relaunch skips localization and that {{docker start}} won't work. At this point, it seems like re-init might be more appropriate. [~csingh] - based on your experience here, do you have a high-level idea of how this will work in the Docker case?

> Yarn Service Upgrade: Support upgrade of service that use docker containers
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
>
> Ability to upgrade dockerized yarn native services.
> Ref: YARN-5637
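As a rough illustration of the Relaunch vs. re-init trade-off discussed in this comment, the Python sketch below records what each approach reuses and picks one based on whether the launch context changed. It is only a simplified reading of the comment, not YARN code: the {{Strategy}} fields and {{pick_strategy}} are hypothetical names, and the dictionaries stand in for a {{ContainerLaunchContext}}.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """Hypothetical summary of what a restart approach keeps or allows."""
    name: str
    reuses_container_id: bool
    reuses_work_dir: bool
    localizes_new_resources: bool
    accepts_new_launch_context: bool


# Relaunch: same NM, container id, localized resources and work dir;
# in the Docker case it runs `docker start` on the existing container.
RELAUNCH = Strategy("relaunch", True, True, False, False)

# Re-init: cleans up the existing container, then launches with a new CLC.
REINIT = Strategy("re-init", True, False, True, True)


def pick_strategy(old_ctx: dict, new_ctx: dict) -> Strategy:
    """Relaunch only fits when nothing about the launch changes (assumption)."""
    return RELAUNCH if old_ctx == new_ctx else REINIT


# An upgrade that changes the image needs a new launch context, so re-init fits.
upgrade = pick_strategy({"image": "app:1.0"}, {"image": "app:2.0"})
# A plain restart with an unchanged context can use relaunch.
restart = pick_strategy({"image": "app:1.0"}, {"image": "app:1.0"})
```

Both strategies keep the container id in this model, which is the property that may allow the previously assigned IP address to be reused.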