[jira] [Assigned] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers
[ https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu reassigned MESOS-8158:
-----------------------------

    Assignee:     (was: Gilbert Song)

> Mesos Agent in docker neglects to retry discovering Task docker containers
> --------------------------------------------------------------------------
>
>                 Key: MESOS-8158
>                 URL: https://issues.apache.org/jira/browse/MESOS-8158
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization, docker, executor
>    Affects Versions: 1.4.0
>         Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
>            Reporter: Charles Allen
>            Priority: Major
>
> I have attempted to launch Mesos agents inside of a docker container in such
> a way where the agent docker can be replaced and recovered. Unfortunately I
> hit a major snag in the way the mesos docker launching works.
> To test simple functionality a marathon app is setup that simply has the
> following command: {{date && python -m SimpleHTTPServer $PORT0}}
> That way the HTTP port can be accessed to assure things are being assigned
> correctly, and the date is printed out in the log.
> When I attempt to start this marathon app, the mesos agent (inside a docker
> container) properly launches an executor which properly creates a second task
> that launches the python code. Here's the output from the executor logs (this
> looks correct):
> {code}
> I1101 20:34:03.420210 68270 exec.cpp:162] Version: 1.4.0
> I1101 20:34:03.427455 68281 exec.cpp:237] Executor registered on agent d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0
> I1101 20:34:03.428414 68283 executor.cpp:120] Registered docker executor on 10.0.75.2
> I1101 20:34:03.428680 68281 executor.cpp:160] Starting task testapp.fe35282f-bf43-11e7-a24b-0242ac110002
> I1101 20:34:03.428941 68281 docker.cpp:1080] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 -e HOST=10.0.75.2 -e MARATHON_APP_DOCKER_IMAGE=python:2 -e MARATHON_APP_ID=/testapp -e MARATHON_APP_LABELS= -e MARATHON_APP_RESOURCE_CPUS=1.0 -e MARATHON_APP_RESOURCE_DISK=0.0 -e MARATHON_APP_RESOURCE_GPUS=0 -e MARATHON_APP_RESOURCE_MEM=128.0 -e MARATHON_APP_VERSION=2017-11-01T20:33:44.869Z -e MESOS_CONTAINER_NAME=mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 -e PORT=31464 -e PORT0=31464 -e PORTS=31464 -e PORT_1=31464 -e PORT_HTTP=31464 -v /var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001/executors/testapp.fe35282f-bf43-11e7-a24b-0242ac110002/runs/84f9ae30-9d4c-484a-860c-ca7845b7ec75:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75 --label=MESOS_TASK_ID=testapp.fe35282f-bf43-11e7-a24b-0242ac110002 python:2 -c date && python -m SimpleHTTPServer $PORT0
> I1101 20:34:03.430402 68281 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:03.520303 68286 docker.cpp:1290] Retrying inspect with non-zero status code. cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.021216 68288 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.124490 68281 docker.cpp:1290] Retrying inspect with non-zero status code. cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:04.624964 68288 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> I1101 20:34:04.934087 68286 docker.cpp:1345] Retrying inspect since container not yet started. cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75', interval: 500ms
> I1101 20:34:05.435145 68288 docker.cpp:1243] Running docker -H unix:///var/run/docker.sock inspect mesos-84f9ae30-9d4c-484a-860c-ca7845b7ec75
> Wed Nov 1 20:34:06 UTC 2017
> {code}
> But, somehow there is a TASK_FAILED message sent to marathon.
> Upon further investigation, the following snippet can be found in the agent
> logs (running in a docker container)
> {code}
> I1101 20:34:00.949129 9 slave.cpp:1736] Got assigned task 'testapp.fe35282f-bf43-11e7-a24b-0242ac110002' for framework a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001
> I1101 20:34:00.950150 9 gc.cpp:93] Unscheduling '/var/run/mesos/slaves/d9bb6e96-ee26-43c2-977e-0c404fdd4e81-S0/frameworks/a5eb6da1-f8ac-4642-8d66-cdd2e5b14d45-0001' from gc
> I1101 20:34:00.950225 9 gc.cpp:93] Unscheduling
>
[jira] [Assigned] (MESOS-8158) Mesos Agent in docker neglects to retry discovering Task docker containers
[ https://issues.apache.org/jira/browse/MESOS-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilbert Song reassigned MESOS-8158:
-----------------------------------

    Assignee: Gilbert Song

> Mesos Agent in docker neglects to retry discovering Task docker containers
> --------------------------------------------------------------------------
>
>                 Key: MESOS-8158
>                 URL: https://issues.apache.org/jira/browse/MESOS-8158
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, containerization, docker, executor
>    Affects Versions: 1.4.0
>         Environment: Windows 10 with Docker version 17.09.0-ce, build afdb6d4
>            Reporter: Charles Allen
>            Assignee: Gilbert Song