It doesn't appear to be related to the registration timeout; based on the logs, the time between task launch and kill was only about 4.5 seconds. -- Connor
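For anyone retracing that, the gap can be read straight off the slave log quoted below. A rough check, assuming the same log path and task ID from Michael's mail:

$ grep -E 'Launching task|Asked to kill task' /var/log/mesos/mesos-slave.INFO | grep eda431d7-4a74-11e4-a320-56847afe9799

which shows the task launched at 20:44:39.176 and the kill arriving at 20:44:43.707, roughly 4.5 seconds later.
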
> On Oct 2, 2014, at 14:24, Dick Davies <[email protected]> wrote:
>
> One thing to check - have you upped
>
> --executor_registration_timeout
>
> from the default of 1min? a docker pull can easily take longer than that.
>
>> On 2 October 2014 22:18, Michael Babineau <[email protected]> wrote:
>> I'm seeing an issue where tasks are being marked as killed but remain
>> running. The tasks all run via the native Docker containerizer and are
>> started from Marathon.
>>
>> The net result is additional, orphaned Docker containers that must be
>> stopped/removed manually.
>>
>> Versions:
>> - Mesos 0.20.1
>> - Marathon 0.7.1
>> - Docker 1.2.0
>> - Ubuntu 14.04
>>
>> Environment:
>> - 3 ZK nodes, 3 Mesos Masters, and 3 Mesos Slaves (all separate instances)
>> on EC2
>>
>> Here's the task in the Mesos UI:
>>
>> (note that stderr continues to update with the latest container output)
>>
>> Here's the still-running Docker container:
>> $ docker ps|grep 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
>> 3d451b8213ea
>> docker.thefactory.com/ace-serialization:f7aa1d4f46f72d52f5a20ef7ae8680e4acf88bc0
>> "\"/bin/sh -c 'java 26 minutes ago Up 26 minutes 9990/tcp
>> mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
>>
>> Here are the Mesos logs associated with the task:
>> $ grep eda431d7-4a74-11e4-a320-56847afe9799 /var/log/mesos/mesos-slave.INFO
>> I1002 20:44:39.176024 1528 slave.cpp:1002] Got assigned task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:39.176257 1528 slave.cpp:1112] Launching task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:39.177287 1528 slave.cpp:1222] Queuing task
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' for executor
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> '20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:39.191769 1528 docker.cpp:743] Starting container
>> '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f' for task
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' (and executor
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799') of framework
>> '20140919-224934-1593967114-5050-1518-0000'
>> I1002 20:44:43.707033 1521 slave.cpp:1278] Asked to kill task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:43.707811 1521 slave.cpp:2088] Handling status update
>> TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000 from @0.0.0.0:0
>> W1002 20:44:43.708273 1521 slave.cpp:1354] Killing the unregistered
>> executor 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
>> 20140919-224934-1593967114-5050-1518-0000 because it has no tasks
>> E1002 20:44:43.708375 1521 slave.cpp:2205] Failed to update resources for
>> container 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f of executor
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 running task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 on status update for
>> terminal task, destroying container: No container found
>> I1002 20:44:43.708524 1521 status_update_manager.cpp:320] Received status
>> update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:44:43.708709 1521 status_update_manager.cpp:373] Forwarding status
>> update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000 to [email protected]:5050
>> I1002 20:44:43.728991 1526 status_update_manager.cpp:398] Received status
>> update acknowledgement (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
>> serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
>> 20140919-224934-1593967114-5050-1518-0000
>> I1002 20:47:05.904324 1527 slave.cpp:2538] Monitoring executor
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
>> '20140919-224934-1593967114-5050-1518-0000' in container
>> '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f'
>> I1002 20:47:06.311027 1525 slave.cpp:1733] Got registration for executor
>> 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
>> 20140919-224934-1593967114-5050-1518-0000 from executor(1)@10.2.1.34:29920
>>
>> I'll typically see a barrage of these in association with a Marathon app
>> update (which deploys new tasks). Eventually, one container "sticks" and we
>> get a RUNNING task instead of a KILLED one.
>>
>> Where else can I look?
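
For anyone who does hit the slow-docker-pull case Dick describes, --executor_registration_timeout is a slave flag and takes a duration. A rough sketch (the 5mins value is just an example, not something from this thread):

$ mesos-slave --containerizers=docker,mesos --executor_registration_timeout=5mins ...   # plus your existing slave flags

If you run the Mesosphere packages, I believe the value can instead go in /etc/mesos-slave/executor_registration_timeout before restarting mesos-slave, but check how your slaves are actually launched.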
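
As a stopgap for the orphans themselves, something along these lines can be used to find and remove them by hand. It is only a sketch: it assumes every containerizer-launched container keeps the mesos-<Mesos container ID> name visible in the docker ps output above, so confirm in the Mesos UI that the task really is gone before stopping anything:

$ docker ps --no-trunc | grep ' mesos-' | awk '{print $1}'      # candidate container IDs
$ docker stop <container-id> && docker rm <container-id>        # after confirming the container is orphaned

where <container-id> is one of the IDs printed by the first command.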

