Hi Eduardo,

There are known defects in Mesos that match your description:
https://issues.apache.org/jira/browse/MESOS-1915
https://issues.apache.org/jira/browse/MESOS-1884

A fix will be included in the next release.
https://reviews.apache.org/r/26486

You see the killTask because Marathon's default --task_launch_timeout is 60 
seconds: if a task hasn't reached TASK_RUNNING within that window (e.g. while 
the Docker image is still being pulled), Marathon asks Mesos to kill it.
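
If your containers legitimately need longer to start (a slow docker pull, for 
instance), one workaround is to raise that timeout when starting Marathon. A 
minimal sketch, assuming Marathon 0.7's start script and flag name; the value 
is in milliseconds, so verify against your version's --help:

    # Allow tasks 5 minutes to reach TASK_RUNNING before Marathon kills them
    ./bin/start --master zk://<zk-hosts>/mesos --task_launch_timeout 300000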
I've created an issue to improve the logging around this:
https://github.com/mesosphere/marathon/issues/732
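
In the meantime, a leaked container has to be removed by hand. Based on the 
docker ps output in your mail (Mesos names the container after its container 
ID), something like:

    # Stop and remove the orphaned container that Mesos no longer tracks
    sudo docker stop mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978
    sudo docker rm mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978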

--
Connor


On Oct 22, 2014, at 16:18, Eduardo Jiménez <[email protected]> wrote:

> Hi,
> 
> I've started experimenting with Mesos using the Docker containerizer, and a 
> simple example has gotten into a very strange state.
> 
> I have mesos-0.20.1 and marathon-0.7 set up on EC2, using Amazon Linux:
> 
> Linux <ip> 3.14.20-20.44.amzn1.x86_64 #1 SMP Mon Oct 6 22:52:46 UTC 2014 
> x86_64 x86_64 x86_64 GNU/Linux
> 
> Docker version 1.2.0, build fa7b24f/1.2.0
> 
> I start the mesos slave with these relevant options:
> 
> --cgroups_hierarchy=/cgroup
> --containerizers=docker,mesos
> --executor_registration_timeout=5mins
> --isolation=cgroups/cpu,cgroups/mem
> 
> I launched a very simple app taken from the mesosphere examples:
> 
> {
>   "container": {
>     "type": "DOCKER",
>     "docker": {
>       "image": "libmesos/ubuntu"
>     }
>   },
>   "id": "ubuntu-docker2",
>   "instances": "1",
>   "cpus": "0.5",
>   "mem": "512",
>   "uris": [],
>   "cmd": "while sleep 10; do date -u +%T; done"
> }
> 
> The app launches, but then Mesos reports the task as KILLED, yet the Docker 
> container is STILL running. Here's the sequence of logs from that mesos-slave.
> 
> 1) Task gets created and assigned:
> 
> I1022 17:44:13.971096 15195 slave.cpp:1002] Got assigned task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 
> 20141017-172055-3489660938-5050-1603-0000
> I1022 17:44:13.971367 15195 slave.cpp:1112] Launching task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework 
> 20141017-172055-3489660938-5050-1603-0000
> I1022 17:44:13.973047 15195 slave.cpp:1222] Queuing task 
> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' for executor 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> '20141017-172055-3489660938-5050-1603-0000
> I1022 17:44:13.989893 15195 docker.cpp:743] Starting container 
> 'c1fc27c8-13e9-484f-a30c-cb062ec4c978' for task 
> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' (and executor 
> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799') of framework 
> '20141017-172055-3489660938-5050-1603-0000'
> 
> So far so good. The log statements immediately following "Starting container" are:
> 
> I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603-0000
> I1022 17:45:14.894579 15196 slave.cpp:2088] Handling status update 
> TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603-0000 from @0.0.0.0:0
> W1022 17:45:14.894798 15196 slave.cpp:1354] Killing the unregistered executor 
> 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' of framework 
> 20141017-172055-3489660938-5050-1603-0000 because it has no tasks
> E1022 17:45:14.925014 15192 slave.cpp:2205] Failed to update resources for 
> container c1fc27c8-13e9-484f-a30c-cb062ec4c978 of executor 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 running task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 on status update for 
> terminal task, destroying container: No container found
> 
> After this, there are several log messages like this:
> 
> I1022 17:45:14.926197 15194 status_update_manager.cpp:320] Received status 
> update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603-0000
> I1022 17:45:14.926378 15194 status_update_manager.cpp:373] Forwarding status 
> update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603-0000 to [email protected]:5050
> W1022 17:45:16.169214 15196 status_update_manager.cpp:181] Resending status 
> update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603-0000
> I1022 17:45:16.169275 15196 status_update_manager.cpp:373] Forwarding status 
> update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework 
> 20141017-172055-3489660938-5050-1603-0000 to [email protected]:5050
> 
> 
> Eventually the TASK_KILLED update is acked and the Mesos UI shows the task as 
> killed. By then, the process should be dead, but it's not.
> 
> $ sudo docker ps
> CONTAINER ID        IMAGE                    COMMAND                CREATED       STATUS       PORTS   NAMES
> f76784e1af8b        libmesos/ubuntu:latest   "/bin/sh -c 'while s   5 hours ago   Up 5 hours           mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978
> 
> 
> The container shows in the UI like this:
> 
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799
> ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799     KILLED   5 hours ago   5 hours ago
> And it's been running the whole time.
> 
> There's no other logging indicating why killTask was invoked, which makes 
> this extremely frustrating to debug.
> 
> Has anyone seen something similar?
> 
> Thanks,
> 
> Eduardo
> 
