Hi,

Environment:
- Clean Vagrant install, 1 master, 1 slave (same behaviour on production cluster with 3 masters, 6 slaves)
- Mesos 0.20.1
- Marathon 0.7.3
- Docker 1.2.0

Slave config:
- containerizers: "docker,mesos"
- executor_registration_timeout: 5mins

When I start Docker container tasks, the image starts being pulled from the Hub, but after 1 minute Mesos kills the task. In the background, though, the pull keeps running, and once everything has been pulled the Docker container is started without Mesos knowing about it. When I start the same task in Mesos again (after I know the image pull is done), it runs normally.

So this leaves slaves with 'dirty' Docker containers, as Mesos has no knowledge of them.

From the logs I get this:

---
I1009 15:30:02.990291 1414 slave.cpp:1002] Got assigned task test-app.23755452-4fc9-11e4-839b-080027c4337a for framework 20140904-160348-185204746-5050-27588-0000
I1009 15:30:02.990979 1414 slave.cpp:1112] Launching task test-app.23755452-4fc9-11e4-839b-080027c4337a for framework 20140904-160348-185204746-5050-27588-0000
I1009 15:30:02.993341 1414 slave.cpp:1222] Queuing task 'test-app.23755452-4fc9-11e4-839b-080027c4337a' for executor test-app.23755452-4fc9-11e4-839b-080027c4337a of framework '20140904-160348-185204746-5050-27588-0000
I1009 15:30:02.995818 1409 docker.cpp:743] Starting container '25ac3310-71e4-4d10-8a4b-38add4537308' for task 'test-app.23755452-4fc9-11e4-839b-080027c4337a' (and executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a') of framework '20140904-160348-185204746-5050-27588-0000'
I1009 15:31:07.033287 1413 slave.cpp:1278] Asked to kill task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
I1009 15:31:07.034742 1413 slave.cpp:2088] Handling status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 from @0.0.0.0:0
W1009 15:31:07.034881 1413 slave.cpp:1354] Killing the unregistered executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a' of framework 20140904-160348-185204746-5050-27588-0000 because it has no tasks
E1009 15:31:07.034945 1413 slave.cpp:2205] Failed to update resources for container 25ac3310-71e4-4d10-8a4b-38add4537308 of executor test-app.23755452-4fc9-11e4-839b-080027c4337a running task test-app.23755452-4fc9-11e4-839b-080027c4337a on status update for terminal task, destroying container: No container found
I1009 15:31:07.035133 1413 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
I1009 15:31:07.035210 1413 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 to [email protected]:5050
I1009 15:31:07.046167 1408 status_update_manager.cpp:398] Received status update acknowledgement (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
I1009 15:35:02.993736 1414 slave.cpp:3010] Terminating executor test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 because it did not register within 5mins
---

I already posted my question on the Marathon board, as I first thought it was an issue on Marathon's end:
https://groups.google.com/forum/#!topic/marathon-framework/NT7_YIZnNoY

Kind regards,
Nils
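
For reference, the timing in the log excerpt can be checked with a few lines of Python. This is only a sketch: the helper name `glog_time` is made up for illustration, the three strings are shortened copies of log lines from the excerpt, and it assumes all entries fall on the same day. It measures how long after "Starting container" the kill request and the registration timeout occurred.

```python
from datetime import datetime

def glog_time(line):
    """Parse the time-of-day field from a glog-style line,
    e.g. 'I1009 15:30:02.995818 1409 docker.cpp:743] ...'.
    Hypothetical helper; ignores the date part (assumes same day)."""
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")

# Shortened copies of three lines from the log excerpt.
started = "I1009 15:30:02.995818 1409 docker.cpp:743] Starting container"
killed = "I1009 15:31:07.033287 1413 slave.cpp:1278] Asked to kill task"
timed_out = "I1009 15:35:02.993736 1414 slave.cpp:3010] Terminating executor"

kill_delay = (glog_time(killed) - glog_time(started)).total_seconds()
timeout_delay = (glog_time(timed_out) - glog_time(started)).total_seconds()

print(f"kill request after {kill_delay:.1f} s")
print(f"registration timeout after {timeout_delay:.1f} s")
```

The kill request arrives roughly 64 seconds after the container start, while the slave's own "did not register within 5mins" message only appears at about 300 seconds. So the 5-minute executor_registration_timeout does seem to be in effect on the slave, and the task is being killed well before it expires.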

