See also https://issues.apache.org/jira/browse/MESOS-1915
On Thu, Oct 16, 2014 at 2:59 AM, Dick Davies <[email protected]> wrote:

> One gotcha - the marathon timeout is in seconds, so pass '300' in your
> case.
>
> Let us know if it works. I spotted this the other day and anecdotally
> it addresses the issue for some users, so it would be good to get more
> feedback.
>
> On 16 October 2014 09:49, Grzegorz Graczyk <[email protected]> wrote:
> > Make sure you have --task_launch_timeout in marathon set to the same
> > value as executor_registration_timeout.
> >
> > https://github.com/mesosphere/marathon/blob/master/docs/docs/native-docker.md#configure-marathon
> >
> > On 16 October 2014 10:37, Nils De Moor <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> Environment:
> >> - Clean vagrant install, 1 master, 1 slave (same behaviour on production
> >>   cluster with 3 masters, 6 slaves)
> >> - Mesos 0.20.1
> >> - Marathon 0.7.3
> >> - Docker 1.2.0
> >>
> >> Slave config:
> >> - containerizers: "docker,mesos"
> >> - executor_registration_timeout: 5mins
> >>
> >> When I start docker container tasks, they start being pulled from the
> >> HUB, but after 1 minute mesos kills them.
> >> In the background, though, the pull is still finishing, and when
> >> everything is pulled in, the docker container is started without mesos
> >> knowing about it.
> >> When I start the same task in mesos again (after I know the pull of the
> >> image is done), it runs normally.
> >>
> >> So this leaves slaves with 'dirty' docker containers, as mesos has no
> >> knowledge about them.
> >>
> >> From the logs I get this:
> >> ---
> >> I1009 15:30:02.990291 1414 slave.cpp:1002] Got assigned task test-app.23755452-4fc9-11e4-839b-080027c4337a for framework 20140904-160348-185204746-5050-27588-0000
> >> I1009 15:30:02.990979 1414 slave.cpp:1112] Launching task test-app.23755452-4fc9-11e4-839b-080027c4337a for framework 20140904-160348-185204746-5050-27588-0000
> >> I1009 15:30:02.993341 1414 slave.cpp:1222] Queuing task 'test-app.23755452-4fc9-11e4-839b-080027c4337a' for executor test-app.23755452-4fc9-11e4-839b-080027c4337a of framework '20140904-160348-185204746-5050-27588-0000
> >> I1009 15:30:02.995818 1409 docker.cpp:743] Starting container '25ac3310-71e4-4d10-8a4b-38add4537308' for task 'test-app.23755452-4fc9-11e4-839b-080027c4337a' (and executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a') of framework '20140904-160348-185204746-5050-27588-0000'
> >>
> >> I1009 15:31:07.033287 1413 slave.cpp:1278] Asked to kill task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
> >> I1009 15:31:07.034742 1413 slave.cpp:2088] Handling status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 from @0.0.0.0:0
> >> W1009 15:31:07.034881 1413 slave.cpp:1354] Killing the unregistered executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a' of framework 20140904-160348-185204746-5050-27588-0000 because it has no tasks
> >> E1009 15:31:07.034945 1413 slave.cpp:2205] Failed to update resources for container 25ac3310-71e4-4d10-8a4b-38add4537308 of executor test-app.23755452-4fc9-11e4-839b-080027c4337a running task test-app.23755452-4fc9-11e4-839b-080027c4337a on status update for terminal task, destroying container: No container found
> >> I1009 15:31:07.035133 1413 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
> >> I1009 15:31:07.035210 1413 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 to [email protected]:5050
> >> I1009 15:31:07.046167 1408 status_update_manager.cpp:398] Received status update acknowledgement (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000
> >>
> >> I1009 15:35:02.993736 1414 slave.cpp:3010] Terminating executor test-app.23755452-4fc9-11e4-839b-080027c4337a of framework 20140904-160348-185204746-5050-27588-0000 because it did not register within 5mins
> >> ---
> >>
> >> I already posted my question on the marathon board, as I first thought it
> >> was an issue on marathon's end:
> >> https://groups.google.com/forum/#!topic/marathon-framework/NT7_YIZnNoY
> >>
> >>
> >> Kind regards,
> >> Nils
> >>
> >
>
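
For anyone lining up the two timeouts discussed in this thread, here is a rough Python sketch that converts a slave-style duration such as "5mins" into a plain number for Marathon's --task_launch_timeout. The helper names (duration_to_seconds, marathon_timeout) and the suffix table are my own assumptions for illustration, not part of Mesos or Marathon. The thread gives the Marathon value in seconds ('300'), but I believe some Marathon documentation describes this flag in milliseconds, so the unit is left as a parameter; please check it against the docs for the version you run.

```python
import re

# Suffixes as they appear in slave settings like "5mins".
# Assumption: this small table is illustrative only; it is not the Mesos parser.
_UNIT_SECONDS = {
    "ms": 0.001,
    "secs": 1,
    "mins": 60,
    "hrs": 3600,
}


def duration_to_seconds(text):
    """Convert a duration string such as '5mins' into seconds (float)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    value, unit = match.groups()
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unknown unit {unit!r} in {text!r}")
    return float(value) * _UNIT_SECONDS[unit]


def marathon_timeout(text, unit="seconds"):
    """Return the integer to pass to Marathon for the same duration.

    `unit` is deliberately explicit because the expected unit should be
    confirmed for the Marathon version in use.
    """
    seconds = duration_to_seconds(text)
    if unit == "seconds":
        return round(seconds)
    if unit == "milliseconds":
        return round(seconds * 1000)
    raise ValueError(f"unsupported unit: {unit!r}")


if __name__ == "__main__":
    slave_setting = "5mins"  # executor_registration_timeout from the report above
    print(marathon_timeout(slave_setting))                  # seconds
    print(marathon_timeout(slave_setting, "milliseconds"))  # milliseconds
```

With the slave setting from the original report this gives 300 when seconds are assumed and 300000 when milliseconds are assumed. A value that is too small would be consistent with the logs above, where the task is killed roughly a minute after launch even though the slave itself waits 5mins.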

