Hi,

I've started experimenting with mesos using the docker containerizer, and
running a simple example got into a very strange state.

I have mesos-0.20.1, marathon-0.7 setup on EC2, using Amazon Linux:

Linux <ip> 3.14.20-20.44.amzn1.x86_64 #1 SMP Mon Oct 6 22:52:46 UTC 2014
x86_64 x86_64 x86_64 GNU/Linux

Docker version 1.2.0, build fa7b24f/1.2.0

I start the mesos slave with these relevant options:

--cgroups_hierarchy=/cgroup
--containerizers=docker,mesos
--executor_registration_timeout=5mins
--isolation=cgroups/cpu,cgroups/mem

I launched a very simple app, which is from the mesosphere examples:

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "libmesos/ubuntu"
    }
  },
  "id": "ubuntu-docker2",
  "instances": "1",
  "cpus": "0.5",
  "mem": "512",
  "uris": [],
  "cmd": "while sleep 10; do date -u +%T; done"
}

The app launches, but then mesos states the task is KILLED, yet the docker
container is STILL running. Here's the sequence of logs from that
mesos-slave.

1) Task gets created and assigned:

I1022 17:44:13.971096 15195 slave.cpp:1002] Got assigned task
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework
20141017-172055-3489660938-5050-1603-0000
I1022 17:44:13.971367 15195 slave.cpp:1112] Launching task
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 for framework
20141017-172055-3489660938-5050-1603-0000
I1022 17:44:13.973047 15195 slave.cpp:1222] Queuing task
'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' for executor
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
'20141017-172055-3489660938-5050-1603-0000
I1022 17:44:13.989893 15195 docker.cpp:743] Starting container
'c1fc27c8-13e9-484f-a30c-cb062ec4c978' for task
'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' (and executor
'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799') of framework
'20141017-172055-3489660938-5050-1603-0000'

So far so good. The log statements right next to "Starting container" is:

I1022 17:45:14.893309 15196 slave.cpp:1278] Asked to kill task
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
20141017-172055-3489660938-5050-1603-0000
I1022 17:45:14.894579 15196 slave.cpp:2088] Handling status update
TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
20141017-172055-3489660938-5050-1603-0000 from @0.0.0.0:0
W1022 17:45:14.894798 15196 slave.cpp:1354] Killing the unregistered
executor 'ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799' of framework
20141017-172055-3489660938-5050-1603-0000 because it has no tasks
E1022 17:45:14.925014 15192 slave.cpp:2205] Failed to update resources for
container c1fc27c8-13e9-484f-a30c-cb062ec4c978 of executor
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 running task
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 on status update for
terminal task, destroying container: No container found

After this, there's several log messages like this:

I1022 17:45:14.926197 15194 status_update_manager.cpp:320] Received status
update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
20141017-172055-3489660938-5050-1603-0000
I1022 17:45:14.926378 15194 status_update_manager.cpp:373] Forwarding
status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for
task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
20141017-172055-3489660938-5050-1603-0000 to [email protected]:5050
W1022 17:45:16.169214 15196 status_update_manager.cpp:181] Resending status
update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for task
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
20141017-172055-3489660938-5050-1603-0000
I1022 17:45:16.169275 15196 status_update_manager.cpp:373] Forwarding
status update TASK_KILLED (UUID: 660dfd13-61a0-4e3f-9590-fba0d1a42ab2) for
task ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799 of framework
20141017-172055-3489660938-5050-1603-0000 to [email protected]:5050


Eventually the TASK_KILLED update is acked and the Mesos UI shows the task
as killed. By then, the process should be dead, but its not.

$ sudo docker ps
CONTAINER ID        IMAGE                    COMMAND                CREATED
            STATUS              PORTS               NAMES
f76784e1af8b        libmesos/ubuntu:latest   "/bin/sh -c 'while s   5 hours
ago         Up 5 hours
 mesos-c1fc27c8-13e9-484f-a30c-cb062ec4c978


The container shows in the UI like this:

ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799
ubuntu-docker2.0995fb7f-5a13-11e4-a18e-56847afe9799KILLED5 hours ago5 hours
ago
And its been running the whole time.

There's no other logging indicating why killTask was invoked, which makes
this extremely frustrating to debug.

Has anyone seen something similar?

Thanks,

Eduardo

Reply via email to