I can finally say that I found a way to clean up orphan containers. I learned that the executor does not remove its container when the task completes: it stops the container and exits. The container then remains in the exited state on the slave machine until --docker_remove_delay elapses. I set --docker_remove_delay="1mins", restarted the slave, and killed an executor process. After 1 minute, the container left behind by the killed executor was removed. This may not be a good way to solve my problem, but it works.

Thank you for your help, haosdent. 😊
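
In case it helps anyone who would rather detect such containers from a monitor on the slave than rely only on the removal delay, here is a minimal Python sketch. It is not part of Mesos. It assumes, as in my test, that each mesos-docker-executor process carries a --container=<name> argument and that the Docker container has exactly that name with a "mesos-" prefix. The function names (executor_containers, find_orphans, run_lines) are my own, and a container whose executor is still starting up could be reported by mistake, so treat the result as a list of candidates rather than something safe to delete blindly.

```python
import re
import subprocess

# Matches the --container=<name> argument seen on the mesos-docker-executor
# command line (an assumption based on the ps output in this thread).
CONTAINER_ARG = re.compile(r"--container=(\S+)")


def executor_containers(ps_lines):
    """Return the container names claimed by live mesos-docker-executor processes."""
    names = set()
    for line in ps_lines:
        if "mesos-docker-executor" not in line:
            continue
        match = CONTAINER_ARG.search(line)
        if match:
            names.add(match.group(1))
    return names


def find_orphans(container_names, ps_lines):
    """Return mesos-* containers that no live executor process claims."""
    claimed = executor_containers(ps_lines)
    return sorted(
        name for name in container_names
        if name.startswith("mesos-") and name not in claimed
    )


def run_lines(cmd):
    """Run a command and return its stdout as a list of lines."""
    return subprocess.check_output(cmd, text=True).splitlines()


if __name__ == "__main__":
    # Names of running containers, one per line.
    running = run_lines(["docker", "ps", "--format", "{{.Names}}"])
    # Full, untruncated command lines of all processes.
    processes = run_lines(["ps", "-eww", "-o", "args"])
    for name in find_orphans(running, processes):
        print(name)
```

Run on a slave machine, this prints one candidate orphan container name per line. What to do with each name (for example, stopping it with docker stop) is left to the operator.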
________________________________
From: haosdent <[email protected]>
Sent: Wednesday, March 23, 2016 11:17 AM
To: user
Subject: Re: About executor failover

But I think we could make sure the Docker container exits when the executor is killed. If you have clear requirements, could you file them at https://issues.apache.org/jira/browse/MESOS so that other folks can help check whether the request should be accepted?

On Wed, Mar 23, 2016 at 7:14 PM, haosdent <[email protected]> wrote:

As far as I know, a framework currently has no way to learn about orphan containers.

On Wed, Mar 23, 2016 at 6:50 PM, 琪 冯 <[email protected]> wrote:

Many thanks for the reply! I understand that orphan containers are removed by slave recovery. What I mean is: is there anything I can do from the framework, or with some other monitor, to detect or remove them automatically? Thanks for your help.

________________________________
From: haosdent <[email protected]>
Sent: Wednesday, March 23, 2016 3:22 AM
To: user
Subject: Re: About executor failover

Yes, in that case these orphan containers would be recovered or killed when you restart the slave.

On Wed, Mar 23, 2016 at 11:13 AM, 琪 冯 <[email protected]> wrote:

What if the executor process goes down while its Docker container is still alive? As a test, I killed an executor process on one of my Mesos slave machines. The process details were:

root 17166 9569 0 Mar22 ? 00:01:39 mesos-docker-executor --container=mesos-0d58cb85-e726-479a-a57a-83405e3ae580-S3.b995031b-9c46-4713-9050-518aa306c6aa --docker=docker --docker_socket=/var/run/docker.sock --help=false --mapped_directory=/mnt/mesos/sandbox --sandbox_directory=/data/mesos/slaves/0d58cb85-e726-479a-a57a-83405e3ae580-S3/frameworks/5cfc9845-05c0-45b1-acc0-595ab92075d2-0000/executors/archtools_hearthstone.eless_eless.uwsgi.353f920b-eff6-11e5-97d3-aeb4726ea116/runs/b995031b-9c46-4713-9050-518aa306c6aa --stop_timeout=0ns

Then I checked and found that the container named "mesos-0d58cb85-e726-479a-a57a-83405e3ae580-S3.b995031b-9c46-4713-9050-518aa306c6aa" was still alive. My Mesos version is 0.25.0, and the kernel version of the slave machine is Linux 3.10.0-229.11.1.el7.x86_64.

My point is that if the executor process crashes or is killed for whatever reason while the container stays alive, a new container will be launched in response to the TASK_LOST event. The container created by the dead executor process would then be undiscoverable to my framework. I want to know whether I am wrong, or whether there is a way to handle this scenario.

I hope my question is clear; if not, please let me know. Any feedback would be appreciated. 😊

--
Best Regards,
Haosdent Huang

