[ 
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566098#comment-16566098
 ] 

Eric Yang commented on YARN-8160:
---------------------------------

The current upgrade-per-instance command is almost working.  There seem to be 
some bugs when I test the API.  First, I launched a service that looks like this:

{code}
{
  "name": "sleeper-service",
  "kerberos_principal" : {
    "principal_name" : "hbase/[email protected]",
    "keytab" : "file:///etc/security/keytabs/hbase.service.keytab"
  },
  "version": "1",
  "components" :
  [
    {
      "name": "ping",
      "number_of_containers": 2,
      "artifact": {
        "id": "hadoop/centos:6",
        "type": "DOCKER"
      },
      "launch_command": "sleep,90000",
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL":"true",
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
        },
        "properties": {
          "docker.network": "host"
        }
      }
    }
  ]
}
{code}

After the application is launched, the yarnfile is updated with a new docker 
image version, and the launch command is changed from sleep,90000 to sleep,90.

{code}
{
  "name": "sleeper-service",
  "kerberos_principal" : {
    "principal_name" : "hbase/[email protected]",
    "keytab" : "file:///etc/security/keytabs/hbase.service.keytab"
  },
  "version": "2",
  "components" :
  [
    {
      "name": "ping",
      "number_of_containers": 2,
      "artifact": {
        "id": "hadoop/centos:latest",
        "type": "DOCKER"
      },
      "launch_command": "sleep,90",
      "resource": {
        "cpus": 1,
        "memory": "256"
      },
      "configuration": {
        "env": {
          "YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL":"true",
          "YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
        },
        "properties": {
          "docker.network": "host"
        }
      }
    }
  ]
}
{code}
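Only three fields differ between the two yarnfiles: the service version, the artifact id, and the launch command. As a sanity check, the diff can be computed mechanically; the sketch below inlines the two specs trimmed to the relevant fields (in practice you would json.load() the actual files), and the {{diff}} helper is illustrative, not part of any YARN API:

```python
# Specs trimmed to the fields relevant to the upgrade; in practice,
# load the full v1/v2 yarnfiles with json.load().
V1 = {"version": "1",
      "components": [{"name": "ping",
                      "artifact": {"id": "hadoop/centos:6", "type": "DOCKER"},
                      "launch_command": "sleep,90000"}]}
V2 = {"version": "2",
      "components": [{"name": "ping",
                      "artifact": {"id": "hadoop/centos:latest", "type": "DOCKER"},
                      "launch_command": "sleep,90"}]}

def diff(a, b, path=""):
    """Recursively collect the dotted paths whose values differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = []
        for k in sorted(set(a) | set(b)):
            out += diff(a.get(k), b.get(k), f"{path}.{k}" if path else k)
        return out
    if isinstance(a, list) and isinstance(b, list):
        out = []
        for i, (x, y) in enumerate(zip(a, b)):
            out += diff(x, y, f"{path}[{i}]")
        return out
    return [] if a == b else [path]

print(diff(V1, V2))
# -> ['components[0].artifact.id', 'components[0].launch_command', 'version']
```

Both the artifact id and the launch command changed, so a correct upgrade must propagate both to the relaunched container.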

Then I proceeded with {{yarn app -upgrade sleeper -initiate yarnfile.v2}}, 
followed by {{yarn app -upgrade sleeper -instances ping-0,ping-1}}.
The container log shows:

{code}
Docker run command: /usr/bin/docker run 
--name=container_e02_1533070786532_0006_01_000002 --user=1013:1001 
--security-opt=no-new-privileges --net=host -v 
/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:rw
 -v 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:rw
 -v 
/tmp/hadoop-yarn/nm-local-dir/filecache:/tmp/hadoop-yarn/nm-local-dir/filecache:ro
 -v 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:ro
 --cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP 
--cap-add=SETPCAP --cap-add=FSETID --cap-add=CHOWN --cap-add=AUDIT_WRITE 
--cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID 
--cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE 
--hostname=ping-0.s1.hbase.ycluster --group-add 1001 --group-add 982 --env-file 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002/docker.container_e02_1533070786532_0006_01_0000026435836068142984694.env
 hadoop/centos:6 sleep 90000 
Launching docker container...
Docker run command: /usr/bin/docker run 
--name=container_e02_1533070786532_0006_01_000002 --user=1013:1001 
--security-opt=no-new-privileges --net=host -v 
/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:rw
 -v 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:rw
 -v 
/tmp/hadoop-yarn/nm-local-dir/filecache:/tmp/hadoop-yarn/nm-local-dir/filecache:ro
 -v 
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:ro
 --cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP 
--cap-add=SETPCAP --cap-add=FSETID --cap-add=CHOWN --cap-add=AUDIT_WRITE 
--cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID 
--cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE 
--hostname=ping-0.s1.hbase.ycluster --group-add 1001 --group-add 982 --env-file 
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002/docker.container_e02_1533070786532_0006_01_000002254751351532328192.env
 hadoop/centos:latest sleep 90000 
{code}

The container is relaunched using the centos:latest image instead of centos:6.  
This was verified using docker inspect, and docker exec, which confirm that the 
container image has changed.  However, the launch command does not reflect the 
change: the relaunched container still runs sleep 90000 instead of sleep 90.
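For anyone reproducing this: the image and command a container was actually started with can be read from the {{Config}} section of the {{docker inspect}} JSON. The sketch below inlines a trimmed example of that output (the real output has many more fields); in practice, feed it the result of {{docker inspect <container-name>}}:

```python
import json

# Trimmed example of `docker inspect <container>` output for the relaunched
# container (illustrative values matching the log above; real output is larger).
inspect_output = """
[{"Config": {"Image": "hadoop/centos:latest",
             "Cmd": ["sleep", "90000"]}}]
"""

cfg = json.loads(inspect_output)[0]["Config"]
print(cfg["Image"])           # the image was upgraded to centos:latest...
print(" ".join(cfg["Cmd"]))   # ...but the command is still the old sleep 90000
```

This matches the symptom above: {{Config.Image}} reflects the v2 artifact while {{Config.Cmd}} still carries the v1 launch command.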

> Yarn Service Upgrade: Support upgrade of service that use docker containers 
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8160
>                 URL: https://issues.apache.org/jira/browse/YARN-8160
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>              Labels: Docker
>
> Ability to upgrade dockerized YARN native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via the {{reInitializeContainer}} 
> API. {{reInitializeContainer}} does *NOT* change the ContainerId of the 
> upgraded container.
> The NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to 
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. It is not created by the NM; it is provided to it, and the 
> RM does not create another container allocation.
> - {{localizedResources}}: these stay the same if the upgrade does *NOT* 
> require additional resources, IIUC.
>  
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container changes. It is *NOT* a 
> relaunch. 
> *Changes required in the case of docker containers*
> - {{reInitializeContainer}} does not seem to work with Docker containers. 
> Investigate and fix this.
> - [Future change] Add an additional API to the NM to pull images, and modify 
> {{reInitializeContainer}} to trigger the docker container launch without 
> pulling the image first, controlled by a flag.
>     -- When the service upgrade is initialized, we can provide the user with 
> an option to just pull the images on the NMs.
>     -- When a component instance is upgraded, it calls 
> {{reInitializeContainer}} with the pull-image flag set to false, since the NM 
> will have already pulled the images.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
