Orphaned Docker containers in Mesos 0.20.1

2014-10-02 Thread Michael Babineau
I'm seeing an issue where tasks are being marked as killed but remain running. The tasks all run via the native Docker containerizer and are started from Marathon. The net result is additional, orphaned Docker containers that must be stopped/removed manually. Versions: - Mesos 0.20.1 - Marathon
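One way to spot the orphans described above is to compare the Docker containers the slave launched against the containers Mesos still tracks. The sketch below assumes the Docker containerizer names its containers with a `mesos-` prefix followed by the Mesos container ID; the helper name and the sample IDs are hypothetical, and both input lists would need to be collected separately (for example from `docker ps` and from the slave's state endpoint).

```python
# Sketch only: assumes Docker containers launched by Mesos are named
# "mesos-<container id>". All IDs below are made up for illustration.
MESOS_PREFIX = "mesos-"


def find_orphans(docker_names, active_container_ids):
    """Return Mesos-launched Docker container names that Mesos no longer tracks."""
    active = set(active_container_ids)
    orphans = []
    for raw_name in docker_names:
        name = raw_name.lstrip("/")  # the Docker API reports names with a leading slash
        if not name.startswith(MESOS_PREFIX):
            continue  # not launched by the Mesos Docker containerizer
        container_id = name[len(MESOS_PREFIX):]
        if container_id not in active:
            orphans.append(name)
    return sorted(orphans)


# Hypothetical inputs: names as reported by Docker, IDs as reported by Mesos.
docker_names = ["mesos-aaaa-1111", "/mesos-bbbb-2222", "registry"]
active_ids = ["aaaa-1111"]
orphans = find_orphans(docker_names, active_ids)
print(orphans)
```

Containers reported this way are only candidates; it is worth confirming each one before stopping or removing it.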

Re: Orphaned Docker containers in Mesos 0.20.1

2014-10-02 Thread Dick Davies
One thing to check: have you raised --executor_registration_timeout from the default of 1 minute? A docker pull can easily take longer than that. On 2 October 2014 22:18, Michael Babineau michael.babin...@gmail.com wrote: I'm seeing an issue where tasks are being marked as killed but remain

Re: Orphaned Docker containers in Mesos 0.20.1

2014-10-02 Thread Michael Babineau
Thanks, I just had the same thought. I'm injecting it via an environment variable, MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins, but I don't know how to check that the setting took effect. On Thu, Oct 2, 2014 at 2:24 PM, Dick Davies d...@hellooperator.net wrote: One thing to check - have you upped
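One possible way to check whether the setting took effect is to read the flags the slave reports about itself. The sketch below assumes the slave serves its state as JSON at `/slave(1)/state.json` on port 5051 and that the document contains a top-level `flags` object; both the path and the key should be verified against the Mesos version in use. The function names and the sample document are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_slave_state(host, port=5051):
    """Fetch the slave's state document (endpoint path is an assumption)."""
    url = "http://{}:{}/slave(1)/state.json".format(host, port)
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_flag(state, name):
    """Return the reported value of a slave flag, or None if it is absent."""
    return state.get("flags", {}).get(name)


# Hypothetical sample standing in for a real response, so the lookup can be
# exercised without a running slave.
sample_state = {"flags": {"executor_registration_timeout": "5mins"}}
timeout = get_flag(sample_state, "executor_registration_timeout")
print(timeout)
```

Against a live slave, the same lookup would be `get_flag(fetch_slave_state("slave-host"), "executor_registration_timeout")`, where `slave-host` is a placeholder.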

Re: Orphaned Docker containers in Mesos 0.20.1

2014-10-02 Thread Michael Babineau
Supporting the registration timeout theory, logs for this example container confirm it didn't actually start until several minutes after Mesos had marked the task as killed. On Thu, Oct 2, 2014 at 2:29 PM, Michael Babineau michael.babin...@gmail.com wrote: Thanks, I just had the same thought

Re: Orphaned Docker containers in Mesos 0.20.1

2014-10-02 Thread Michael Babineau
Okay, I don't think the issue is with the executor registration timeout. The timeout parameter is being passed correctly, and there is only a 4-second delay between task start and task kill: I1002 *20:44:39.176024* 1528 slave.cpp:1002] Got assigned task

Re: Orphaned Docker containers in Mesos 0.20.1

2014-10-02 Thread Connor Doyle
It doesn't appear to be related to the registration timeout; based on the logs, the time between task launch and kill was only about 4.3 seconds. -- Connor On Oct 2, 2014, at 14:24, Dick Davies d...@hellooperator.net wrote: One thing to check - have you upped
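The launch-to-kill interval can be computed directly from the timestamps in the slave log. The sketch below assumes the usual glog line prefix (severity letter, month and day, then a time with microseconds) and strips the `*` emphasis markers that appear in the thread. The first line is the one quoted earlier in the thread; the second is a made-up placeholder chosen only so the arithmetic gives about 4.3 seconds, not a real log entry.

```python
import re
from datetime import datetime

# Severity letter, MMDD, then HH:MM:SS.uuuuuu (glog-style prefix).
GLOG_PREFIX = re.compile(r"^[IWEF](\d{2})(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{6})")


def parse_glog_time(line, year=2014):
    """Parse the timestamp at the start of a glog-style line.

    glog lines carry no year, so it has to be supplied by the caller.
    """
    match = GLOG_PREFIX.match(line.replace("*", ""))  # drop mail emphasis markers
    if match is None:
        raise ValueError("not a glog-style line: %r" % line)
    month, day, hour, minute, second, micros = map(int, match.groups())
    return datetime(year, month, day, hour, minute, second, micros)


def seconds_between(first_line, second_line):
    """Elapsed seconds from the first log line to the second."""
    delta = parse_glog_time(second_line) - parse_glog_time(first_line)
    return delta.total_seconds()


assigned = "I1002 *20:44:39.176024* 1528 slave.cpp:1002] Got assigned task"
# Placeholder line, NOT from the real log; only the timestamp matters here.
killed = "I1002 20:44:43.476024 1528 example.cpp:1] hypothetical kill message"
elapsed = seconds_between(assigned, killed)
print(elapsed)
```

An interval this short is well under any registration timeout, which is consistent with the conclusion above that the timeout is not the cause.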