[
https://issues.apache.org/jira/browse/YARN-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566098#comment-16566098
]
Eric Yang commented on YARN-8160:
---------------------------------
The current per-instance upgrade command is almost working, but I found some
bugs while testing the API. First, I launched a service that looks like this:
{code}
{
"name": "sleeper-service",
"kerberos_principal" : {
"principal_name" : "hbase/[email protected]",
"keytab" : "file:///etc/security/keytabs/hbase.service.keytab"
},
"version": "1",
"components" :
[
{
"name": "ping",
"number_of_containers": 2,
"artifact": {
"id": "hadoop/centos:6",
"type": "DOCKER"
},
"launch_command": "sleep,9000",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL":"true",
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
},
"properties": {
"docker.network": "host"
}
}
}
]
}
{code}
After the application launched, I updated the yarnfile with a new docker image
version and changed the launch command from sleep,90000 to sleep,90:
{code}
{
"name": "sleeper-service",
"kerberos_principal" : {
"principal_name" : "hbase/[email protected]",
"keytab" : "file:///etc/security/keytabs/hbase.service.keytab"
},
"version": "2",
"components" :
[
{
"name": "ping",
"number_of_containers": 2,
"artifact": {
"id": "hadoop/centos:latest",
"type": "DOCKER"
},
"launch_command": "sleep,90",
"resource": {
"cpus": 1,
"memory": "256"
},
"configuration": {
"env": {
"YARN_CONTAINER_RUNTIME_DOCKER_DELAYED_REMOVAL":"true",
"YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE":"true"
},
"properties": {
"docker.network": "host"
}
}
}
]
}
{code}
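To make the intended change explicit, here is a minimal Python sketch that diffs the components of two service specs. The helper names (flatten, component_changes) are hypothetical and not part of YARN, and the specs are trimmed to the relevant fields. The first launch command uses sleep,90000, the value reported in the container log.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a {dotted.path: value} mapping."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def component_changes(old_spec, new_spec):
    """Return {component: {field: (old, new)}} for fields that differ."""
    old = {c["name"]: flatten(c) for c in old_spec["components"]}
    new = {c["name"]: flatten(c) for c in new_spec["components"]}
    changes = {}
    for name in old.keys() & new.keys():
        fields = old[name].keys() | new[name].keys()
        diff = {
            f: (old[name].get(f), new[name].get(f))
            for f in fields
            if old[name].get(f) != new[name].get(f)
        }
        if diff:
            changes[name] = diff
    return changes


# Trimmed copies of the two yarnfiles (illustrative, not the full specs).
v1 = {"version": "1", "components": [{
    "name": "ping",
    "artifact": {"id": "hadoop/centos:6", "type": "DOCKER"},
    "launch_command": "sleep,90000",
}]}
v2 = {"version": "2", "components": [{
    "name": "ping",
    "artifact": {"id": "hadoop/centos:latest", "type": "DOCKER"},
    "launch_command": "sleep,90",
}]}

changes = component_changes(v1, v2)
print(changes)
```

Both artifact.id and launch_command differ for the ping component, so an upgraded instance would be expected to pick up both changes.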
I then ran {{yarn app -upgrade sleeper -initiate yarnfile.v2}}, followed by
{{yarn app -upgrade sleeper -instances ping-0,ping-1}}.
The container log shows:
{code}
Docker run command: /usr/bin/docker run
--name=container_e02_1533070786532_0006_01_000002 --user=1013:1001
--security-opt=no-new-privileges --net=host -v
/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:rw
-v
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:rw
-v
/tmp/hadoop-yarn/nm-local-dir/filecache:/tmp/hadoop-yarn/nm-local-dir/filecache:ro
-v
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:ro
--cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP
--cap-add=SETPCAP --cap-add=FSETID --cap-add=CHOWN --cap-add=AUDIT_WRITE
--cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID
--cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE
--hostname=ping-0.s1.hbase.ycluster --group-add 1001 --group-add 982 --env-file
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002/docker.container_e02_1533070786532_0006_01_0000026435836068142984694.env
hadoop/centos:6 sleep 90000
Launching docker container...
Docker run command: /usr/bin/docker run
--name=container_e02_1533070786532_0006_01_000002 --user=1013:1001
--security-opt=no-new-privileges --net=host -v
/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:/usr/local/hadoop-3.2.0-SNAPSHOT/logs/userlogs/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002:rw
-v
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/appcache/application_1533070786532_0006:rw
-v
/tmp/hadoop-yarn/nm-local-dir/filecache:/tmp/hadoop-yarn/nm-local-dir/filecache:ro
-v
/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:/tmp/hadoop-yarn/nm-local-dir/usercache/hbase/filecache:ro
--cap-drop=ALL --cap-add=SYS_CHROOT --cap-add=MKNOD --cap-add=SETFCAP
--cap-add=SETPCAP --cap-add=FSETID --cap-add=CHOWN --cap-add=AUDIT_WRITE
--cap-add=SETGID --cap-add=NET_RAW --cap-add=FOWNER --cap-add=SETUID
--cap-add=DAC_OVERRIDE --cap-add=KILL --cap-add=NET_BIND_SERVICE
--hostname=ping-0.s1.hbase.ycluster --group-add 1001 --group-add 982 --env-file
/tmp/hadoop-yarn/nm-local-dir/nmPrivate/application_1533070786532_0006/container_e02_1533070786532_0006_01_000002/docker.container_e02_1533070786532_0006_01_000002254751351532328192.env
hadoop/centos:latest sleep 90000
{code}
The container is relaunched with the centos:latest image instead of centos:6.
I confirmed the image change with docker inspect and docker exec. However, the
launch command was not updated: the relaunched container still runs sleep 90000
instead of sleep 90.
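The same check can be scripted against the "Docker run command" log lines. The following is a rough Python sketch, assuming the image name appears as its own token and everything after it is the container command. The function name split_image_and_command and the shortened sample line are illustrative only.

```python
import shlex


def split_image_and_command(docker_run_line, known_images):
    """Split a logged docker run line into (image, command tokens).

    Assumes the first token matching a known image name is the image and
    that all following tokens form the command run inside the container.
    """
    tokens = shlex.split(docker_run_line)
    for i, token in enumerate(tokens):
        if token in known_images:
            return token, tokens[i + 1:]
    return None, []


# Shortened version of the second log entry; most options are omitted.
relaunch_line = (
    "/usr/bin/docker run --name=container_e02_1533070786532_0006_01_000002 "
    "--net=host hadoop/centos:latest sleep 90000"
)

image, command = split_image_and_command(
    relaunch_line, {"hadoop/centos:6", "hadoop/centos:latest"}
)

# The yarnfile stores the command comma-separated; the log shows spaces.
expected_command = "sleep,90".split(",")

image_upgraded = image == "hadoop/centos:latest"
command_upgraded = command == expected_command
print(image_upgraded, command_upgraded)
```

For the relaunch entry above, the image matches the new spec but the command does not, which is the mismatch described here.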
> Yarn Service Upgrade: Support upgrade of service that use docker containers
> ----------------------------------------------------------------------------
>
> Key: YARN-8160
> URL: https://issues.apache.org/jira/browse/YARN-8160
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Chandni Singh
> Assignee: Chandni Singh
> Priority: Major
> Labels: Docker
>
> Ability to upgrade dockerized yarn native services.
> Ref: YARN-5637
> *Background*
> Container upgrade is supported by the NM via {{reInitializeContainer}} api.
> {{reInitializeContainer}} does *NOT* change the ContainerId of the upgraded
> container.
> NM performs the following steps during {{reInitializeContainer}}:
> - kills the existing process
> - cleans up the container
> - launches another container with the new {{ContainerLaunchContext}}
> NOTE: {{ContainerLaunchContext}} holds all the information needed to
> upgrade the container.
> With {{reInitializeContainer}}, the following does *NOT* change:
> - container ID. This is not created by the NM. It is provided to the NM, and
> here the RM is not creating another container allocation.
> - {{localizedResources}}. This stays the same if the upgrade does *NOT*
> require additional resources IIUC.
>
> The following changes with {{reInitializeContainer}}:
> - the working directory of the upgraded container changes. It is *NOT* a
> relaunch.
> *Changes required in the case of docker container*
> - {{reInitializeContainer}} seems to not be working with Docker containers.
> Investigate and fix this.
> - [Future change] Add an additional api to NM to pull the images and modify
> {{reInitializeContainer}} to trigger docker container launch without pulling
> the image first which could be based on a flag.
> -- When the service upgrade is initialized, we can provide the user with
> an option to just pull the images on the NMs.
> -- When a component instance is upgraded, it calls
> {{reInitializeContainer}} with the flag pull-image set to false, since the NM
> will have already pulled the images.
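The reinitialization semantics in the background section above can be illustrated with a toy Python model. This is only a sketch of the described steps (kill, clean up, launch with the new context). The classes LaunchContext and ToyContainer are hypothetical and do not correspond to the NodeManager implementation.

```python
from dataclasses import dataclass, field


@dataclass
class LaunchContext:
    """Stand-in for the information carried by a launch context."""
    image: str
    command: list


@dataclass
class ToyContainer:
    container_id: str
    context: LaunchContext
    running: bool = True
    events: list = field(default_factory=list)


def reinitialize(container, new_context):
    """Apply the three steps from the description to a toy container."""
    container.events.append("kill")      # kill the existing process
    container.running = False
    container.events.append("cleanup")   # clean up the container
    container.context = new_context      # swap in the new launch context
    container.running = True
    container.events.append("launch")    # launch with the new context
    return container


ping0 = ToyContainer(
    "container_e02_1533070786532_0006_01_000002",
    LaunchContext("hadoop/centos:6", ["sleep", "90000"]),
)
original_id = ping0.container_id
reinitialize(ping0, LaunchContext("hadoop/centos:latest", ["sleep", "90"]))
print(ping0.container_id == original_id, ping0.context)
```

In this model the container ID is unchanged while both the image and the command come from the new context, which is the behavior the upgrade is expected to show.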