I had the same issue as you did; here is how I fixed it: http://mail-archives.apache.org/mod_mbox/mesos-user/201410.mbox/browser
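In short, the thread below pins the kill on Marathon's default --task_launch_timeout (60 seconds) expiring before the executor registers. A sketch of the stopgap until the fixed release, assuming the flag takes milliseconds and a `./bin/start` launcher script (both assumptions to verify against your Marathon install):

```shell
# Compute a task launch timeout matching the slave's
# --executor_registration_timeout=5mins from the thread below.
# Assumption: Marathon's --task_launch_timeout is in milliseconds.
TIMEOUT_MS=$((5 * 60 * 1000))
echo "$TIMEOUT_MS"   # 300000

# Hypothetical Marathon restart with the longer timeout (script path
# and master URL are placeholders):
# ./bin/start --master zk://localhost:2181/mesos --task_launch_timeout "$TIMEOUT_MS"
```

With both timeouts at 5 minutes, Marathon should no longer issue killTask while the slave is still legitimately waiting for the executor to register.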
On Thu, Oct 23, 2014 at 1:42 AM, Connor Doyle <[email protected]> wrote:

> Hi Eduardo,
>
> There is a known defect in Mesos that matches your description:
> https://issues.apache.org/jira/browse/MESOS-1915
> https://issues.apache.org/jira/browse/MESOS-1884
>
> A fix will be included in the next release.
> https://reviews.apache.org/r/26486
>
> You see the killTask because the default --task_launch_timeout value for Marathon is 60 seconds.
> Created an issue to make the logging around this better:
> https://github.com/mesosphere/marathon/issues/732
>
> --
> Connor
>
>
> On Oct 22, 2014, at 16:18, Eduardo Jiménez <[email protected]> wrote:
>
> > Hi,
> >
> > I've started experimenting with Mesos using the Docker containerizer, and running a simple example I got into a very strange state.
> >
> > I have mesos-0.20.1 and marathon-0.7 set up on EC2, using Amazon Linux:
> >
> > Linux <ip> 3.14.20-20.44.amzn1.x86_64 #1 SMP Mon Oct 6 22:52:46 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> >
> > Docker version 1.2.0, build fa7b24f/1.2.0
> >
> > I start the mesos-slave with these relevant options:
> >
> > --cgroups_hierarchy=/cgroup
> > --containerizers=docker,mesos
> > --executor_registration_timeout=5mins
> > --isolation=cgroups/cpu,cgroups/mem
> >
> > I launched a very simple app, taken from the Mesosphere examples:
> >
> > {
> >   "container": {
> >     "type": "DOCKER",
> >     "docker": {
> >       "image": "libmesos/ubuntu"
> >     }
> >   },
> >   "id": "ubuntu-docker2",
> >   "instances": "1",
> >   "cpus": "0.5",
> >   "mem": "512",
> >   "uris": [],
> >   "cmd": "while sleep 10; do date -u +%T; done"
> > }
> >
> > The app launches, but then Mesos reports the task as KILLED, yet the Docker container is STILL running. Here's the sequence of logs from that mesos-slave.
> >
> > 1) Task gets created and assigned:
> >
> > I1022 17:44:13.971096 15195 slave.cpp:1002] Got assigned task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 20141017-172055-3489660938-5050-1603-0000
> > I1022 17:44:13.971367 15195 slave.cpp:1112] Launching task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 20141017-172055-3489660938-5050-1603-0000
> > I1022 17:44:13.973047 15195 slave.cpp:1222] Queuing task 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' for executor ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework '20141017-172055-3489660938-5050-1603-0000
> > I1022 17:44:13.989893 15195 docker.cpp:743] Starting container 'c1fc27c8-13e9-484f-a30c-cb062ec4c978' for task 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' (and executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799') of framework '20141017-172055-3489660938-5050-1603-0000'
> >
> > So far so good. The log statements right after "Starting container" are:
> >
> > I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000
> > I1022 17:45:14.894579 15196 slave.cpp:2088] Handling status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000 from @0.0.0.0:0
> > W1022 17:45:14.894798 15196 slave.cpp:1354] Killing the unregistered executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' of framework 20141017-172055-3489660938-5050-1603-0000 because it has no tasks
> > E1022 17:45:14.925014 15192 slave.cpp:2205] Failed to update resources for container c1fc27c8-13e9-484f-a30c-cb062ec4c978 of executor ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 running task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 on status update for terminal task, destroying container: No container found
> >
> > After this, there are several log messages like this:
> >
> > I1022 17:45:14.926197 15194 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000
> > I1022 17:45:14.926378 15194 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000 to [email protected]:5050
> > W1022 17:45:16.169214 15196 status_update_manager.cpp:181] Resending status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000
> > I1022 17:45:16.169275 15196 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 20141017-172055-3489660938-5050-1603-0000 to [email protected]:5050
> >
> > Eventually the TASK_KILLED update is acked and the Mesos UI shows the task as killed. By then, the process should be dead, but it's not.
> >
> > $ sudo docker ps
> > CONTAINER ID   IMAGE                    COMMAND                CREATED       STATUS       PORTS   NAMES
> > f76784e1af8b   libmesos/ubuntu:latest   "/bin/sh -c 'while s   5 hours ago   Up 5 hours           mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978
> >
> > The container shows in the UI like this:
> >
> > ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799   ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799   KILLED   5 hours ago   5 hours ago
> >
> > And it's been running the whole time.
> >
> > There's no other logging indicating why killTask was invoked, which makes this extremely frustrating to debug.
> >
> > Has anyone seen something similar?
> >
> > Thanks,
> >
> > Eduardo
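Until you are on a Mesos release with the fix, a container orphaned this way has to be removed by hand. A minimal sketch, relying on the naming convention visible in the `docker ps` output above (the Docker containerizer names containers `mesos-<containerId>`); the helper name is mine:

```shell
# The orphaned container above shows up as mesos-<containerId>,
# e.g. mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978. Filter a list of
# container names down to the ones launched by Mesos:
filter_mesos_containers() {
  grep '^mesos-'
}

# Demo with sample names such as `docker ps` would report:
printf 'mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978\nnginx\n' | filter_mesos_containers
```

From there, something like `sudo docker ps --no-trunc | awk '{print $NF}' | filter_mesos_containers | xargs -r sudo docker rm -f` would force-remove the leftovers. Review the list before removing anything: the pattern matches every name starting with `mesos-`, including containers a healthy slave is still managing.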

