[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed

2018-08-27 Thread Joseph Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594059#comment-16594059
 ] 

Joseph Wu commented on MESOS-7386:
--

I believe the problem still exists.  When the {{mesos-docker-executor}} exits 
prematurely for any reason (like someone manually killing the executor), it 
will not have the chance to stop the associated docker container.

> Executor not cleaning up existing running docker containers if external 
> logrotate/logger processes die/killed
> -
>
> Key: MESOS-7386
> URL: https://issues.apache.org/jira/browse/MESOS-7386
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, docker, executor
>Affects Versions: 0.28.2, 1.2.0
> Environment: Mesos 0.28.2/1.2.0, docker 1.12.0/17.04.0-ce, marathon 
> v1.1.2/v1.4.2 , ubuntu trusty 14.04, 
> org_apache_mesos_LogrotateContainerLogger, 
> org_apache_mesos_ExternalContainerLogger
>Reporter: Pranay Kanwar
>Priority: Critical
>
> If the mesos-logrotate/external logger processes die or are killed, the 
> executor exits, the task fails and is relaunched, but the agent is unable to 
> clean up the existing running container.
> Logs 
> {noformat}
> slave-one_1  | I0413 12:45:17.707762  8989 status_update_manager.cpp:395] 
> Received status update acknowledgement (UUID: 
> 7262c443-e201-45f4-8de0-825d3d92c26b) for task 
> msg.dfb155bc-2046-11e7-8019-02427fa1c4d5 of framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-
> slave-one_1  | I0413 12:45:17.707813  8989 status_update_manager.cpp:832] 
> Checkpointing ACK for status update TASK_FAILED (UUID: 
> 7262c443-e201-45f4-8de0-825d3d92c26b) for task 
> msg.dfb155bc-2046-11e7-8019-02427fa1c4d5 of framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-
> slave-one_1  | I0413 12:45:18.615839  8991 slave.cpp:4388] Got exited event 
> for executor(1)@172.17.0.1:36471
> slave-one_1  | I0413 12:45:18.696413  8987 docker.cpp:2358] Executor for 
> container 665e86c8-ef36-4be3-b56e-3ba7edc81182 has exited
> slave-one_1  | I0413 12:45:18.696446  8987 docker.cpp:2052] Destroying 
> container 665e86c8-ef36-4be3-b56e-3ba7edc81182
> slave-one_1  | I0413 12:45:18.696482  8987 docker.cpp:2179] Running docker 
> stop on container 665e86c8-ef36-4be3-b56e-3ba7edc81182
> slave-one_1  | I0413 12:45:18.697042  8994 slave.cpp:4769] Executor 
> 'msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' of framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9- exited with status 0
> slave-one_1  | I0413 12:45:18.697077  8994 slave.cpp:4869] Cleaning up 
> executor 'msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' of framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9- at executor(1)@172.17.0.1:36471
> slave-one_1  | I0413 12:45:18.697424  8994 slave.cpp:4957] Cleaning up 
> framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-
> slave-one_1  | I0413 12:45:18.697530  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5/runs/665e86c8-ef36-4be3-b56e-3ba7edc81182'
>  for gc 6.9192952593days in the future
> slave-one_1  | I0413 12:45:18.697572  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5'
>  for gc 6.9192882963days in the future
> slave-one_1  | I0413 12:45:18.697607  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5/runs/665e86c8-ef36-4be3-b56e-3ba7edc81182'
>  for gc 6.9192843852days in the future
> slave-one_1  | I0413 12:45:18.697628  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5'
>  for gc 6.9192808889days in the future
> slave-one_1  | I0413 12:45:18.697649  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-'
>  for gc 6.9192731556days in the future
> slave-one_1  | I0413 12:45:18.697670  8994 gc.cpp:55] Scheduling 
> '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-'
>  for gc 6.9192698963days in the future
> slave-one_1  | I0413 12:45:18.697698  8994 status_update_manager.cpp:285] 
> Closing status update streams for framework 
> d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-
> {noformat}
> Container 665e86c8-ef36-4be3-b56e-3ba7edc81182 is still running
> {noformat}
> root@orobas:/# docker ps | grep 665e86c8-ef36-4be3-b56e-3ba7edc81182
> 8b4dd2ab340d  r4um/msg
> {noformat}

[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed

2018-08-27 Thread Greg Mann (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594047#comment-16594047
 ] 

Greg Mann commented on MESOS-7386:
--

[~r4um] [~kaysoky] do you guys know if this is still an issue? I came across 
the open PR while reviewing GitHub today.


[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed

2018-07-21 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551912#comment-16551912
 ] 

ASF GitHub Bot commented on MESOS-7386:
---

GitHub user r4um opened a pull request:

https://github.com/apache/mesos/pull/304

[MESOS-7386] Do a stop if container is not killed

Fixes [MESOS-7386](https://issues.apache.org/jira/browse/MESOS-7386)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/r4um/mesos master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/mesos/pull/304.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #304


commit 933cd229119925d8196149cbe6f775434b6e2cfa
Author: Pranay Kanwar 
Date:   2018-07-22T06:41:14Z

[MESOS-7386] Do a stop if container is not killed





[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed

2017-06-28 Thread Pranay Kanwar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067662#comment-16067662
 ] 

Pranay Kanwar commented on MESOS-7386:
--

Changing 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L2213
 to {{if (!killed) {}}
seems to fix the problem (for the usual launch/scale up/down cases) and all 
tests pass. Since the executor exiting at 
https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L2393
passes {{false}} to {{destroy}}, none of the code paths in {{destroy}} and 
subsequent functions ({{_destroy}}, etc.) seem to do a stop or kill.


[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed

2017-04-13 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968370#comment-15968370
 ] 

Joseph Wu commented on MESOS-7386:
--

This is probably due to the lack of a {{SIGHUP}} handler in the 
{{mesos-docker-executor}} helper binary.  When you kill the logger helper 
processes, this closes the stdout/stderr streams of the 
{{mesos-docker-executor}}, which then immediately exits.

In certain codepaths, the Docker containerizer monitors the 
{{mesos-docker-executor}} rather than the docker container.  If the executor 
exits, the containerizer does not necessarily know if a docker container was 
actually created (and hence, does not try to clean it up).


[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed

2017-04-13 Thread Pranay Kanwar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968266#comment-15968266
 ] 

Pranay Kanwar commented on MESOS-7386:
--

FYI, this can be reproduced without running the slave/agent in docker; we faced 
it in environments where the agent isn't running in docker.
