Thanks, I just had the same thought. I'm injecting it via an environment variable (MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins), but I don't know how to check that the setting took.
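
My best guess for verifying it (untested; I'm assuming the slave is listening on the default port 5051 and that flags are exposed in state.json on 0.20.x) would be something like:

$ curl -s http://localhost:5051/state.json | grep -o '"executor_registration_timeout":"[^"]*"'

or grepping the slave's startup lines in /var/log/mesos/mesos-slave.INFO for executor_registration_timeout. If there's a more reliable way, I'd like to know it.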

On Thu, Oct 2, 2014 at 2:24 PM, Dick Davies <[email protected]> wrote:
> One thing to check - have you upped
>
> --executor_registration_timeout
>
> from the default of 1min? a docker pull can easily take longer than that.
>
> On 2 October 2014 22:18, Michael Babineau <[email protected]> wrote:
> > I'm seeing an issue where tasks are being marked as killed but remain
> > running. The tasks all run via the native Docker containerizer and are
> > started from Marathon.
> >
> > The net result is additional, orphaned Docker containers that must be
> > stopped/removed manually.
> >
> > Versions:
> > - Mesos 0.20.1
> > - Marathon 0.7.1
> > - Docker 1.2.0
> > - Ubuntu 14.04
> >
> > Environment:
> > - 3 ZK nodes, 3 Mesos Masters, and 3 Mesos Slaves (all separate instances) on EC2
> >
> > Here's the task in the Mesos UI:
> >
> > (note that stderr continues to update with the latest container output)
> >
> > Here's the still-running Docker container:
> > $ docker ps|grep 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
> > 3d451b8213ea  docker.thefactory.com/ace-serialization:f7aa1d4f46f72d52f5a20ef7ae8680e4acf88bc0  "\"/bin/sh -c 'java  26 minutes ago  Up 26 minutes  9990/tcp  mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
> >
> > Here are the Mesos logs associated with the task:
> > $ grep eda431d7-4a74-11e4-a320-56847afe9799 /var/log/mesos/mesos-slave.INFO
> > I1002 20:44:39.176024  1528 slave.cpp:1002] Got assigned task serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:39.176257  1528 slave.cpp:1112] Launching task serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:39.177287  1528 slave.cpp:1222] Queuing task 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' for executor serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework '20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:39.191769  1528 docker.cpp:743] Starting container '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f' for task 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' (and executor 'serialization.eda431d7-4a74-11e4-a320-56847afe9799') of framework '20140919-224934-1593967114-5050-1518-0000'
> > I1002 20:44:43.707033  1521 slave.cpp:1278] Asked to kill task serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:43.707811  1521 slave.cpp:2088] Handling status update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework 20140919-224934-1593967114-5050-1518-0000 from @0.0.0.0:0
> > W1002 20:44:43.708273  1521 slave.cpp:1354] Killing the unregistered executor 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework 20140919-224934-1593967114-5050-1518-0000 because it has no tasks
> > E1002 20:44:43.708375  1521 slave.cpp:2205] Failed to update resources for container 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f of executor serialization.eda431d7-4a74-11e4-a320-56847afe9799 running task serialization.eda431d7-4a74-11e4-a320-56847afe9799 on status update for terminal task, destroying container: No container found
> > I1002 20:44:43.708524  1521 status_update_manager.cpp:320] Received status update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:43.708709  1521 status_update_manager.cpp:373] Forwarding status update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework 20140919-224934-1593967114-5050-1518-0000 to [email protected]:5050
> > I1002 20:44:43.728991  1526 status_update_manager.cpp:398] Received status update acknowledgement (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:47:05.904324  1527 slave.cpp:2538] Monitoring executor 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework '20140919-224934-1593967114-5050-1518-0000' in container '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f'
> > I1002 20:47:06.311027  1525 slave.cpp:1733] Got registration for executor 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework 20140919-224934-1593967114-5050-1518-0000 from executor(1)@10.2.1.34:29920
> >
> > I'll typically see a barrage of these in association with a Marathon app update (which deploys new tasks). Eventually, one container "sticks" and we get a RUNNING task instead of a KILLED one.
> >
> > Where else can I look?
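
For what it's worth, the manual cleanup I mentioned is just stopping and removing the orphan by name. The Docker containerizer appears to name containers mesos-<containerId> (as in the docker ps output above), so for this one it's something like:

$ docker stop mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
$ docker rm mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f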

