[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
[ https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594059#comment-16594059 ]

Joseph Wu commented on MESOS-7386:
----------------------------------

I believe the problem still exists. When the {{mesos-docker-executor}} exits prematurely for any reason (such as someone manually killing the executor), it does not get the chance to stop the associated docker container.

> Executor not cleaning up existing running docker containers if external
> logrotate/logger processes die/killed
> ------------------------------------------------------------------------
>
>                 Key: MESOS-7386
>                 URL: https://issues.apache.org/jira/browse/MESOS-7386
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, docker, executor
>    Affects Versions: 0.28.2, 1.2.0
>         Environment: Mesos 0.28.2/1.2.0, docker 1.12.0/17.04.0-ce, marathon v1.1.2/v1.4.2, ubuntu trusty 14.04,
>                      org_apache_mesos_LogrotateContainerLogger,
>                      org_apache_mesos_ExternalContainerLogger
>            Reporter: Pranay Kanwar
>            Priority: Critical
>
> If the mesos-logrotate/external logger processes die or are killed, the executor exits, the task fails and is relaunched, but the already-running container is never cleaned up.
>
> Logs
> {noformat}
> slave-one_1 | I0413 12:45:17.707762  8989 status_update_manager.cpp:395] Received status update acknowledgement (UUID: 7262c443-e201-45f4-8de0-825d3d92c26b) for task msg.dfb155bc-2046-11e7-8019-02427fa1c4d5 of framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-
> slave-one_1 | I0413 12:45:17.707813  8989 status_update_manager.cpp:832] Checkpointing ACK for status update TASK_FAILED (UUID: 7262c443-e201-45f4-8de0-825d3d92c26b) for task msg.dfb155bc-2046-11e7-8019-02427fa1c4d5 of framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-
> slave-one_1 | I0413 12:45:18.615839  8991 slave.cpp:4388] Got exited event for executor(1)@172.17.0.1:36471
> slave-one_1 | I0413 12:45:18.696413  8987 docker.cpp:2358] Executor for container 665e86c8-ef36-4be3-b56e-3ba7edc81182 has exited
> slave-one_1 | I0413 12:45:18.696446  8987 docker.cpp:2052] Destroying container 665e86c8-ef36-4be3-b56e-3ba7edc81182
> slave-one_1 | I0413 12:45:18.696482  8987 docker.cpp:2179] Running docker stop on container 665e86c8-ef36-4be3-b56e-3ba7edc81182
> slave-one_1 | I0413 12:45:18.697042  8994 slave.cpp:4769] Executor 'msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' of framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9- exited with status 0
> slave-one_1 | I0413 12:45:18.697077  8994 slave.cpp:4869] Cleaning up executor 'msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' of framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9- at executor(1)@172.17.0.1:36471
> slave-one_1 | I0413 12:45:18.697424  8994 slave.cpp:4957] Cleaning up framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-
> slave-one_1 | I0413 12:45:18.697530  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5/runs/665e86c8-ef36-4be3-b56e-3ba7edc81182' for gc 6.9192952593days in the future
> slave-one_1 | I0413 12:45:18.697572  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' for gc 6.9192882963days in the future
> slave-one_1 | I0413 12:45:18.697607  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5/runs/665e86c8-ef36-4be3-b56e-3ba7edc81182' for gc 6.9192843852days in the future
> slave-one_1 | I0413 12:45:18.697628  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-/executors/msg.dfb155bc-2046-11e7-8019-02427fa1c4d5' for gc 6.9192808889days in the future
> slave-one_1 | I0413 12:45:18.697649  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-' for gc 6.9192731556days in the future
> slave-one_1 | I0413 12:45:18.697670  8994 gc.cpp:55] Scheduling '/tmp/mesos/agent/meta/slaves/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-S0/frameworks/d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-' for gc 6.9192698963days in the future
> slave-one_1 | I0413 12:45:18.697698  8994 status_update_manager.cpp:285] Closing status update streams for framework d1d616b4-1ed1-4fed-92e5-0ee3d8619be9-
> {noformat}
> Container 665e86c8-ef36-4be3-b56e-3ba7edc81182 is still running
> {noformat}
> root@orobas:/# docker ps | grep 665e86c8-ef36-4be3-b56e-3ba7edc81182
> 8b4dd2ab340d        r4um/msg
> {noformat}
[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
[ https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594047#comment-16594047 ]

Greg Mann commented on MESOS-7386:
----------------------------------

[~r4um] [~kaysoky] Do you guys know if this is still an issue? I came across the open PR while reviewing GitHub today.
[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
[ https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551912#comment-16551912 ]

ASF GitHub Bot commented on MESOS-7386:
---------------------------------------

GitHub user r4um opened a pull request:

    https://github.com/apache/mesos/pull/304

    [MESOS-7386] Do a stop if container is not killed

    Fixes [MESOS-7386](https://issues.apache.org/jira/browse/MESOS-7386)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/r4um/mesos master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/mesos/pull/304.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #304

----
commit 933cd229119925d8196149cbe6f775434b6e2cfa
Author: Pranay Kanwar
Date:   2018-07-22T06:41:14Z

    [MESOS-7386] Do a stop if container is not killed
[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
[ https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16067662#comment-16067662 ]

Pranay Kanwar commented on MESOS-7386:
--------------------------------------

Changing https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L2213 to {{if (!killed) {}} seems to fix the problem (the usual launch/scale up/down still works) and all tests pass. Since the executor exiting at https://github.com/apache/mesos/blob/master/src/slave/containerizer/docker.cpp#L2393 passes {{false}} to {{destroy}}, none of the code paths in {{destroy}} and the subsequent functions ({{_destroy}}, etc.) do a stop or kill.
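For illustration only, here is a minimal, self-contained sketch of the control flow this comment proposes. It is not the actual Mesos containerizer code: the {{destroy}} signature and the {{stopDockerContainer}} helper are hypothetical stand-ins. The point is simply that {{docker stop}} gets issued even when {{killed}} is false, i.e. when the executor exited on its own:

{code:cpp}
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical helper: shells out to the docker CLI to stop a container.
static void stopDockerContainer(const std::string& containerId)
{
  const std::string command = "docker stop " + containerId;
  std::cout << "Running: " << command << std::endl;
  std::system(command.c_str()); // Exit status ignored for brevity.
}

// Hypothetical stand-in for the containerizer's destroy path. `killed` is
// true when the agent itself asked for the container to be killed, and
// false when the executor simply exited (the case hit in this bug).
static void destroy(const std::string& containerId, bool killed)
{
  if (!killed) {
    // The executor exited without being asked to, so the docker container
    // may still be running; stop it explicitly instead of assuming the
    // executor already did.
    stopDockerContainer(containerId);
  }

  // ... the rest of the cleanup would follow here, unchanged ...
}

int main()
{
  // The container from the logs above is left running after the executor
  // dies, so the destroy path runs with killed == false.
  destroy("665e86c8-ef36-4be3-b56e-3ba7edc81182", /* killed */ false);
  return 0;
}
{code}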
[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
[ https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968370#comment-15968370 ]

Joseph Wu commented on MESOS-7386:
----------------------------------

This is probably due to the lack of a {{SIGHUP}} handler in the {{mesos-docker-executor}} helper binary. When you kill the logger helper processes, this closes the stdout/stderr streams of the {{mesos-docker-executor}}, which then immediately exits.

In certain code paths, the Docker containerizer monitors the {{mesos-docker-executor}} rather than the docker container. If the executor exits, the containerizer does not necessarily know whether a docker container was actually created (and hence does not try to clean it up).
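As a rough illustration of the first point, and following the hypothesis in this comment (not the actual executor source), installing a {{SIGHUP}} handler in the executor binary could look roughly like the sketch below, so that losing the logger processes does not terminate the executor before it has a chance to stop its container:

{code:cpp}
#include <csignal>
#include <cstdio>
#include <unistd.h>

int main()
{
  // Ignore SIGHUP so that losing the attached logger/terminal does not
  // terminate the process; the default disposition kills it immediately.
  struct sigaction action = {};
  action.sa_handler = SIG_IGN;
  sigemptyset(&action.sa_mask);

  if (sigaction(SIGHUP, &action, nullptr) != 0) {
    perror("sigaction");
    return 1;
  }

  // ... the real executor would continue here: launch the docker
  // container, wait on it, and run `docker stop` in its shutdown path ...
  printf("SIGHUP is now ignored; the executor keeps running.\n");
  pause(); // Placeholder for the executor's normal event loop.

  return 0;
}
{code}

This only covers the signal itself; writes to the now-closed stdout/stderr pipes would still need separate handling.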
[jira] [Commented] (MESOS-7386) Executor not cleaning up existing running docker containers if external logrotate/logger processes die/killed
[ https://issues.apache.org/jira/browse/MESOS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968266#comment-15968266 ]

Pranay Kanwar commented on MESOS-7386:
--------------------------------------

FYI, this can be reproduced without running the slave/agent in docker; we hit it in environments where the agent isn't running in docker.