[jira] [Commented] (MESOS-5188) docker executor thinks task is failed when docker container was stopped

2016-06-17 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15337520#comment-15337520
 ] 

haosdent commented on MESOS-5188:
-

This doesn't look like an issue for 1.0.0, so let me remove the fix version. [~liqlin]

> docker executor thinks task is failed when docker container was stopped
> ---
>
> Key: MESOS-5188
> URL: https://issues.apache.org/jira/browse/MESOS-5188
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 0.28.0
> Reporter: Liqiang Lin
>
> Test cases:
> 1. Launch a task with Swarm (on Mesos).
> {code}
> # docker -H 192.168.56.110:54375 run -d --cpu-shares 1 ubuntu sleep 300
> {code}
> 2. Then stop the docker container.
> {code}
> # docker -H 192.168.56.110:54375 ps
> CONTAINER ID   IMAGE    COMMAND       CREATED         STATUS         PORTS   NAMES
> b4813ba3ed4d   ubuntu   "sleep 300"   9 seconds ago   Up 8 seconds           mesos1/mesos-2cd5576e-6260-4262-a62c-b0dc45c86c45-S1.1595e79b-aef2-44b6-a313-ad4ff8626958
> # docker -H 192.168.56.110:54375 stop b4813ba3ed4d
> b4813ba3ed4d
> {code}
> 3. The task is reported as failed. See the Mesos slave log:
> {code}
> I0407 09:10:57.606552 32307 slave.cpp:1508] Got assigned task 99ee7dc74861 
> for framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-
> I0407 09:10:57.608230 32307 slave.cpp:1627] Launching task 99ee7dc74861 for 
> framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-
> I0407 09:10:57.609979 32307 paths.cpp:528] Trying to chown 
> '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-/executors/99ee7dc74861/runs/250a169f-7aba-474d-a4f5-cd24ecf0e7d9'
>  to user 'root'
> I0407 09:10:57.615881 32307 slave.cpp:5586] Launching executor 99ee7dc74861 
> of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- with resources 
> cpus(*):0.1; mem(*):32 in work directory 
> '/var/lib/mesos/slaves/2cd5576e-6260-4262-a62c-b0dc45c86c45-S0/frameworks/5b84aad8-dd60-40b3-84c2-93be6b7aa81c-/executors/99ee7dc74861/runs/250a169f-7aba-474d-a4f5-cd24ecf0e7d9'
> I0407 09:12:18.458449 32307 slave.cpp:1845] Queuing task '99ee7dc74861' for 
> executor '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c-
> I0407 09:12:18.459092 32307 slave.cpp:3711] No pings from master received 
> within 75secs
> I0407 09:12:18.460212 32307 slave.cpp:4593] Current disk usage 56.53%. Max 
> allowed age: 2.342613645432778days
> I0407 09:12:18.463484 32307 slave.cpp:928] Re-detecting master
> I0407 09:12:18.463969 32307 slave.cpp:975] Detecting new master
> I0407 09:12:18.464501 32307 slave.cpp:939] New master detected at 
> master@192.168.56.110:5050
> I0407 09:12:18.464848 32307 slave.cpp:964] No credentials provided. 
> Attempting to register without authentication
> I0407 09:12:18.465237 32307 slave.cpp:975] Detecting new master
> I0407 09:12:18.463611 32312 status_update_manager.cpp:174] Pausing sending 
> status updates
> I0407 09:12:18.465744 32312 status_update_manager.cpp:174] Pausing sending 
> status updates
> I0407 09:12:18.472323 32313 docker.cpp:1011] Starting container 
> '250a169f-7aba-474d-a4f5-cd24ecf0e7d9' for task '99ee7dc74861' (and executor 
> '99ee7dc74861') of framework '5b84aad8-dd60-40b3-84c2-93be6b7aa81c-'
> I0407 09:12:18.588739 32313 slave.cpp:1218] Re-registered with master 
> master@192.168.56.110:5050
> I0407 09:12:18.588927 32313 slave.cpp:1254] Forwarding total oversubscribed 
> resources
> I0407 09:12:18.589320 32313 slave.cpp:2395] Updating framework 
> 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- pid to 
> scheduler(1)@192.168.56.110:53375
> I0407 09:12:18.592079 32308 status_update_manager.cpp:181] Resuming sending 
> status updates
> I0407 09:12:18.592842 32313 slave.cpp:2534] Updated checkpointed resources 
> from  to
> I0407 09:12:18.592793 32308 status_update_manager.cpp:181] Resuming sending 
> status updates
> I0407 09:12:20.582041 32307 slave.cpp:2836] Got registration for executor 
> '99ee7dc74861' of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- from 
> executor(1)@192.168.56.110:40725
> I0407 09:12:20.584446 32307 docker.cpp:1308] Ignoring updating container 
> '250a169f-7aba-474d-a4f5-cd24ecf0e7d9' with resources passed to update is 
> identical to existing resources
> I0407 09:12:20.585093 32307 slave.cpp:2010] Sending queued task 
> '99ee7dc74861' to executor '99ee7dc74861' of framework 
> 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- at executor(1)@192.168.56.110:40725
> I0407 09:12:21.307077 32312 slave.cpp:3195] Handling status update 
> TASK_RUNNING (UUID: a7098650-cbf6-4445-8216-b5f658d2f5f4) for task 
> 99ee7dc74861 of framework 5b84aad8-dd60-40b3-84c2-93be6b7aa81c- from 
> 

[jira] [Commented] (MESOS-5188) docker executor thinks task is failed when docker container was stopped

2016-04-13 Thread Liqiang Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239074#comment-15239074
 ] 

Liqiang Lin commented on MESOS-5188:


In my case I actually used the Docker containerizer 
({{--containerizers=docker,mesos}}), rather than the Mesos containerizer you 
mentioned. I also debugged {{DockerContainerizerProcess::launch(...)}} in the 
Mesos Docker containerizer to verify that the built-in docker executor is 
created, rather than a custom executor started in a docker container.

The default docker executor is started if taskInfo is set:
{code}
  if (taskInfo.isSome() && flags.docker_mesos_image.isNone()) {
    // Launching task by forking a subprocess to run docker executor.
    // TODO(steveniemitz): We should call 'update' to set CPU/CFS/mem
    // quotas after 'launchExecutorProcess'. However, there is a race
    // where 'update' can be called before mesos-docker-executor
    // creates the Docker container for the task. See more details in
    // the comments of r33174.
    return container.get()->launch = fetch(containerId, slaveId)
      .then(defer(self(), [=]() { return pull(containerId); }))
      .then(defer(self(), [=]() {
        return mountPersistentVolumes(containerId);
      }))
      .then(defer(self(), [=]() { return launchExecutorProcess(containerId); }))
      .then(defer(self(), [=](pid_t pid) {
        return reapExecutor(containerId, pid);
      }));
  }
{code}

Otherwise, a custom executor is started in a docker container:
{code}
  return container.get()->launch = fetch(containerId, slaveId)
    .then(defer(self(), [=]() { return pull(containerId); }))
    .then(defer(self(), [=]() {
      return mountPersistentVolumes(containerId);
    }))
    .then(defer(self(), [=]() {
      return launchExecutorContainer(containerId, containerName);
    }))
    .then(defer(self(), [=](const Docker::Container& dockerContainer) {
      // Call update to set CPU/CFS/mem quotas at launch.
      // TODO(steveniemitz): Once the minimum docker version supported
      // is >= 1.7 this can be changed to pass --cpu-period and
      // --cpu-quota to the 'docker run' call in
      // launchExecutorContainer.
      return update(containerId, executorInfo.resources(), true)
        .then([=]() {
          return Future<Docker::Container>(dockerContainer);
        });
    }))
    .then(defer(self(), [=](const Docker::Container& dockerContainer) {
      return checkpointExecutor(containerId, dockerContainer);
    }))
    .then(defer(self(), [=](pid_t pid) {
      return reapExecutor(containerId, pid);
    }));
{code}

I will investigate the root cause of this problem further.

> docker executor thinks task is failed when docker container was stopped
> ---
>
> Key: MESOS-5188
> URL: https://issues.apache.org/jira/browse/MESOS-5188
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 0.28.0
> Reporter: Liqiang Lin
> Fix For: 0.29.0

[jira] [Commented] (MESOS-5188) docker executor thinks task is failed when docker container was stopped

2016-04-13 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238900#comment-15238900
 ] 

Jan Schlicht commented on MESOS-5188:
-

The executor does not stop the docker container in your particular case 
(running a docker container manually with Swarm).

The "docker executor code" you've shown above would be used if you had 
chosen the Mesos Docker containerizer ({{--containerizers=docker}}) instead of 
the default containerizer ({{--containerizers=mesos}}). The Mesos Docker 
containerizer would take care of starting and stopping docker tasks, and that's 
what you see in the above source code.

In your particular case the default Mesos containerizer is used. It starts a 
process (your {{docker -H 192.168.56.110:54375 run -d --cpu-shares 1 ubuntu 
sleep 300}}), waits for it to finish, and uses the return code of the process to 
determine whether the process finished successfully or not. Your manual {{docker 
stop}} causes the process monitored by the executor to terminate with a 
return code != 0, hence the {{TASK_FAILED}}.
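
As a rough illustration of that rule, here is a minimal shell sketch. It is not the Mesos executor code: {{task_state_for}} is a hypothetical helper made up for this example, and the only assumption carried over from the explanation above is that a return code of 0 means success and anything else means failure.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): run a command, wait for it,
# and print a task state name derived only from its exit status.
task_state_for() {
  status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo TASK_FINISHED
  else
    echo TASK_FAILED
  fi
}

task_state_for true                      # exit status 0
task_state_for false                     # exit status 1
task_state_for sh -c 'kill -TERM $$'     # killed by SIGTERM, status 128 + 15
```

The last call stands in for a process that is terminated from outside, which is what a manual {{docker stop}} amounts to from the point of view of whoever is waiting on the process.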

> docker executor thinks task is failed when docker container was stopped
> ---
>
> Key: MESOS-5188
> URL: https://issues.apache.org/jira/browse/MESOS-5188
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 0.28.0
> Reporter: Liqiang Lin
> Fix For: 0.29.0

[jira] [Commented] (MESOS-5188) docker executor thinks task is failed when docker container was stopped

2016-04-12 Thread Liqiang Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238353#comment-15238353
 ] 

Liqiang Lin commented on MESOS-5188:


If that is the case, the executor should remove the stopped docker container 
when it shuts down, rather than try to stop the docker container.

> docker executor thinks task is failed when docker container was stopped
> ---
>
> Key: MESOS-5188
> URL: https://issues.apache.org/jira/browse/MESOS-5188
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 0.28.0
> Reporter: Liqiang Lin
> Fix For: 0.29.0

[jira] [Commented] (MESOS-5188) docker executor thinks task is failed when docker container was stopped

2016-04-12 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237126#comment-15237126
 ] 

Jan Schlicht commented on MESOS-5188:
-

The executor will use the return code of a finished task to determine whether 
it was successful or not. By running a {{docker stop}} you'll send a 
{{SIGTERM}} to the sleep task. This will result in the sleep task terminating 
with a return code != 0, which the Mesos executor interprets as failed, 
because only a return code == 0 is seen as successful.
Tasks other than {{sleep}} might work, because they could catch that 
{{SIGTERM}} and exit gracefully. Hence whether this results in TASK_FAILED 
depends on the task you're running, which is something Mesos cannot cover.
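
The difference between a task that ignores {{SIGTERM}} handling and one that catches it can be reproduced without Docker or Mesos. The sketch below is only an illustration under that assumption: {{term_status}} is a hypothetical helper name, and the one-second pause is an arbitrary delay so the child has time to start and install its trap.

```shell
#!/bin/sh
# Hypothetical helper: start a command in the background, send it SIGTERM,
# wait for it, and leave its exit status in the variable "status".
term_status() {
  "$@" &
  pid=$!
  sleep 1                 # let the child start (and install any trap)
  kill -TERM "$pid"
  status=0
  wait "$pid" || status=$?
}

# Plain sleep installs no SIGTERM handler, so it is killed by the signal
# and the shell reports 128 + 15 = 143.
term_status sleep 300
default_status=$status

# A task that traps SIGTERM and exits 0 looks successful to whoever waits on it.
term_status sh -c 'trap "exit 0" TERM; while :; do sleep 1; done'
graceful_status=$status

echo "no handler: $default_status"
echo "with trap:  $graceful_status"
```

With a handler in place the waiting side sees status 0, which is why the outcome depends on the task rather than on the executor.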

> docker executor thinks task is failed when docker container was stopped
> ---
>
> Key: MESOS-5188
> URL: https://issues.apache.org/jira/browse/MESOS-5188
> Project: Mesos
> Issue Type: Bug
> Components: docker
> Affects Versions: 0.28.0
> Reporter: Liqiang Lin
> Fix For: 0.29.0