Hi,

Environment:
- Clean vagrant install, 1 master, 1 slave (same behaviour on production
cluster with 3 masters, 6 slaves)
- Mesos 0.20.1
- Marathon 0.7.3
- Docker 1.2.0

Slave config:
- containerizers: "docker,mesos"
- executor_registration_timeout: 5mins

When I start Docker container tasks, the images start being pulled from the
Hub, but after about 1 minute Mesos kills the tasks.
In the background, though, the pull keeps running, and once the image is
fully pulled the Docker container is started without Mesos knowing about it.
When I start the same task in Mesos again (after I know the pull of the
image is done), it runs normally.

So this leaves slaves with 'dirty' docker containers, as mesos has no
knowledge about them.

From the logs I get this:
---
I1009 15:30:02.990291  1414 slave.cpp:1002] Got assigned task
test-app.23755452-4fc9-11e4-839b-080027c4337a for framework
20140904-160348-185204746-5050-27588-0000
I1009 15:30:02.990979  1414 slave.cpp:1112] Launching task
test-app.23755452-4fc9-11e4-839b-080027c4337a for framework
20140904-160348-185204746-5050-27588-0000
I1009 15:30:02.993341  1414 slave.cpp:1222] Queuing task
'test-app.23755452-4fc9-11e4-839b-080027c4337a' for executor
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
'20140904-160348-185204746-5050-27588-0000
I1009 15:30:02.995818  1409 docker.cpp:743] Starting container
'25ac3310-71e4-4d10-8a4b-38add4537308' for task
'test-app.23755452-4fc9-11e4-839b-080027c4337a' (and executor
'test-app.23755452-4fc9-11e4-839b-080027c4337a') of framework
'20140904-160348-185204746-5050-27588-0000'

I1009 15:31:07.033287  1413 slave.cpp:1278] Asked to kill task
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000
I1009 15:31:07.034742  1413 slave.cpp:2088] Handling status update
TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000 from @0.0.0.0:0
W1009 15:31:07.034881  1413 slave.cpp:1354] Killing the unregistered
executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a' of framework
20140904-160348-185204746-5050-27588-0000 because it has no tasks
E1009 15:31:07.034945  1413 slave.cpp:2205] Failed to update resources for
container 25ac3310-71e4-4d10-8a4b-38add4537308 of executor
test-app.23755452-4fc9-11e4-839b-080027c4337a running task
test-app.23755452-4fc9-11e4-839b-080027c4337a on status update for terminal
task, destroying container: No container found
I1009 15:31:07.035133  1413 status_update_manager.cpp:320] Received status
update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000
I1009 15:31:07.035210  1413 status_update_manager.cpp:373] Forwarding
status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for
task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000 to [email protected]:5050
I1009 15:31:07.046167  1408 status_update_manager.cpp:398] Received status
update acknowledgement (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for
task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000

I1009 15:35:02.993736  1414 slave.cpp:3010] Terminating executor
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000 because it did not register
within 5mins
---
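
To make the timing in the log above concrete, here is a minimal sketch
that computes the gap between the "Starting container" line, the "Asked
to kill task" line and the "Terminating executor" line. It assumes the
usual glog layout, where the second whitespace-separated field is the
time of day; the helper names glog_time and seconds_between are made up
for this illustration and are not part of Mesos.

```python
from datetime import datetime


def glog_time(line):
    """Parse the time of day from a glog-style line (hypothetical helper)."""
    # Assumption: the second whitespace-separated field is HH:MM:SS.microseconds.
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")


def seconds_between(first, second):
    """Seconds elapsed from `first` to `second` (both on the same day)."""
    return (glog_time(second) - glog_time(first)).total_seconds()


# Line prefixes copied from the log excerpt above.
start = "I1009 15:30:02.995818  1409 docker.cpp:743] Starting container"
kill = "I1009 15:31:07.033287  1413 slave.cpp:1278] Asked to kill task"
timeout = "I1009 15:35:02.993736  1414 slave.cpp:3010] Terminating executor"

print(seconds_between(start, kill))     # gap until the kill request
print(seconds_between(start, timeout))  # gap until the registration timeout
```

With these lines, the kill request arrives roughly 64 seconds after the
container start, while the registration timeout only fires after roughly
300 seconds, which matches the 5mins setting.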

I already posted my question on the marathon board, as I first thought it
was an issue on marathon's end:
https://groups.google.com/forum/#!topic/marathon-framework/NT7_YIZnNoY


Kind regards,
Nils
