Thanks, I just had the same thought.

I'm injecting it via environment variable:
MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins

but I don't know how to verify that the setting actually took effect.
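Off the top of my head (untested here, and assuming the slave runs on the default port 5051 and logs to the default location), two ways to confirm the override was picked up:

# the slave reports the flags it is actually running with in its state endpoint
$ curl -s http://localhost:5051/state.json | python -m json.tool | grep executor_registration_timeout

# it may also show up among the flags the slave logs at startup (I haven't checked this on 0.20.x)
$ grep -i executor_registration_timeout /var/log/mesos/mesos-slave.INFO | head

If it still shows the 1-minute default, the environment variable isn't reaching the mesos-slave process.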


On Thu, Oct 2, 2014 at 2:24 PM, Dick Davies <[email protected]> wrote:

> One thing to check - have you upped
>
> --executor_registration_timeout
>
> from the default of 1min? A docker pull can easily take longer than that.
>
> On 2 October 2014 22:18, Michael Babineau <[email protected]> wrote:
> > I'm seeing an issue where tasks are being marked as killed but remain
> > running. The tasks all run via the native Docker containerizer and are
> > started from Marathon.
> >
> > The net result is additional, orphaned Docker containers that must be
> > stopped/removed manually.
> >
> > Versions:
> > - Mesos 0.20.1
> > - Marathon 0.7.1
> > - Docker 1.2.0
> > - Ubuntu 14.04
> >
> > Environment:
> > - 3 ZK nodes, 3 Mesos Masters, and 3 Mesos Slaves (all separate instances)
> > on EC2
> >
> > Here's the task in the Mesos UI:
> >
> > (note that stderr continues to update with the latest container output)
> >
> > Here's the still-running Docker container:
> > $ docker ps|grep 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
> > 3d451b8213ea
> > docker.thefactory.com/ace-serialization:f7aa1d4f46f72d52f5a20ef7ae8680e4acf88bc0
> > "\"/bin/sh -c 'java    26 minutes ago      Up 26 minutes       9990/tcp
> > mesos-1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f
> >
> > Here are the Mesos logs associated with the task:
> > $ grep eda431d7-4a74-11e4-a320-56847afe9799 /var/log/mesos/mesos-slave.INFO
> > I1002 20:44:39.176024  1528 slave.cpp:1002] Got assigned task
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
> > 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:39.176257  1528 slave.cpp:1112] Launching task
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 for framework
> > 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:39.177287  1528 slave.cpp:1222] Queuing task
> > 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' for executor
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
> > '20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:39.191769  1528 docker.cpp:743] Starting container
> > '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f' for task
> > 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' (and executor
> > 'serialization.eda431d7-4a74-11e4-a320-56847afe9799') of framework
> > '20140919-224934-1593967114-5050-1518-0000'
> > I1002 20:44:43.707033  1521 slave.cpp:1278] Asked to kill task
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
> > 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:43.707811  1521 slave.cpp:2088] Handling status update
> > TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
> > 20140919-224934-1593967114-5050-1518-0000 from @0.0.0.0:0
> > W1002 20:44:43.708273 1521 slave.cpp:1354] Killing the unregistered
> > executor 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
> > 20140919-224934-1593967114-5050-1518-0000 because it has no tasks
> > E1002 20:44:43.708375  1521 slave.cpp:2205] Failed to update resources for
> > container 1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f of executor
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 running task
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 on status update for
> > terminal task, destroying container: No container found
> > I1002 20:44:43.708524  1521 status_update_manager.cpp:320] Received status
> > update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
> > 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:44:43.708709  1521 status_update_manager.cpp:373] Forwarding status
> > update TASK_KILLED (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
> > 20140919-224934-1593967114-5050-1518-0000 to [email protected]:5050
> > I1002 20:44:43.728991  1526 status_update_manager.cpp:398] Received status
> > update acknowledgement (UUID: 4f5bd9f9-0625-43de-81f6-2c3423b1ce12) for task
> > serialization.eda431d7-4a74-11e4-a320-56847afe9799 of framework
> > 20140919-224934-1593967114-5050-1518-0000
> > I1002 20:47:05.904324  1527 slave.cpp:2538] Monitoring executor
> > 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
> > '20140919-224934-1593967114-5050-1518-0000' in container
> > '1d337fa3-8dd3-4b43-9d1e-a774cbcbc22f'
> > I1002 20:47:06.311027  1525 slave.cpp:1733] Got registration for executor
> > 'serialization.eda431d7-4a74-11e4-a320-56847afe9799' of framework
> > 20140919-224934-1593967114-5050-1518-0000 from executor(1)@10.2.1.34:29920
> >
> > I'll typically see a barrage of these in association with a Marathon app
> > update (which deploys new tasks). Eventually, one container "sticks" and we
> > get a RUNNING task instead of a KILLED one.
> >
> > Where else can I look?
>
