[jira] [Commented] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-19 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589394#comment-15589394
 ] 

Anindya Sinha commented on MESOS-6414:
--

Assume a task is launched and it creates a sub-cgroup through an external 
service, so the cgroup hierarchy looks something like:
/sys/fs/cgroup/freezer/mesos//

Say the task fails and the container exits. When launcher->destroy() is 
called, we do a recursive cgroups::get() to collect all cgroups, which returns 
absolute paths for both the container's cgroup and the sub-cgroup. A 
TasksKiller() is then initiated for each of the two, resulting in freeze(), 
thaw(), etc. for each of them in parallel, followed by a killed().
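
The enumeration step can be pictured with a small sketch. This is not Mesos 
code: it is Python, list_cgroups() is a hypothetical helper, and a temporary 
directory stands in for the freezer hierarchy. The names "container" and 
"external" are assumptions made for illustration only.

```python
import os
import tempfile


def list_cgroups(root):
    """Return absolute paths of every directory nested under root, deepest first."""
    found = []
    # topdown=False yields children before their parents, which is the order
    # needed if each directory is later removed with rmdir.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath != root:
            found.append(os.path.abspath(dirpath))
    return found


def demo_listing():
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical names: "container" stands for the cgroup Mesos created,
        # "external" for the sub-cgroup created by the external service.
        os.makedirs(os.path.join(root, "container", "external"))
        return [os.path.relpath(p, root) for p in list_cgroups(root)]
```

The point of the sketch is that a recursive listing picks up the externally 
created directory as well, so the cleanup that follows operates on a cgroup 
that Mesos never created.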

However, since the sub-cgroup is created by an external service, that service 
may clean it up without Mesos' knowledge. If that happens, any of the cleanup 
operations (freeze(), thaw(), etc.) on the sub-cgroup may fail in its 
TasksKiller() flow (since the external service removed 
/sys/fs/cgroup/freezer/mesos// before 
Mesos could do a cleanup in TasksKiller). As a result, we exit out of the 
cleanup at that point, which seems incorrect since all cleanup has actually 
happened.

To avoid this issue (i.e. the race between the external service and Mesos to 
clean up the sub-cgroup), I suggest treating a failure in any of these steps as 
a failure in all cases except when the failure is due to the cgroup no longer 
existing (i.e. it has already been cleaned up by the external service, so we 
treat this as a success).
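
A minimal sketch of that rule follows, under stated assumptions. It is Python 
rather than Mesos code, remove_cgroup() is a hypothetical helper, and a plain 
directory stands in for a cgroup. Only the "no longer exists" error is treated 
as success; every other error still propagates as a failure.

```python
import os
import tempfile


def remove_cgroup(path):
    """Remove one cgroup directory; report whether this call removed it.

    A path that no longer exists counts as success (returns False to say
    "already gone"); any other OSError still propagates as a failure.
    """
    try:
        os.rmdir(path)
        return True
    except FileNotFoundError:
        # Someone else (for example the external service) removed it first.
        return False


def demo_race():
    with tempfile.TemporaryDirectory() as root:
        sub = os.path.join(root, "external")  # hypothetical sub-cgroup name
        os.mkdir(sub)
        os.rmdir(sub)                # the external service wins the race
        first = remove_cgroup(sub)   # tolerated: already cleaned up
        os.mkdir(sub)
        second = remove_cgroup(sub)  # normal path: we remove it ourselves
        return first, second
```

Here demo_race() returns (False, True): the first call finds the directory 
already gone and does not fail, while the second removes it normally. The same 
check would presumably have to be applied to each step (freeze(), thaw(), 
remove()), since any of them can hit the missing cgroup.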



> Task cleanup fails when the containers includes cgroups not owned by Mesos
> --
>
> Key: MESOS-6414
> URL: https://issues.apache.org/jira/browse/MESOS-6414
> Project: Mesos
> Issue Type: Bug
> Components: cgroups
> Reporter: Anindya Sinha
> Assignee: Anindya Sinha
> Priority: Minor
>
> If a mesos task is launched in a cgroup outside of the context of Mesos, 
> Mesos is unaware of that cgroup created in the task context.
> Now when the Mesos task terminates, Mesos tries to clean up all cgroups within 
> the top level cgroup it knows about. If the cgroup created in the task 
> context exists when LinuxLauncherProcess::destroy() is called but is 
> subsequently cleaned up by the container before we do a freeze(), thaw(), or 
> remove(), Mesos fails at those stages, leading to an incomplete cleanup of the 
> container.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589364#comment-15589364
 ] 

haosdent commented on MESOS-6414:
-

Hi [~gilbert], I chatted with [~anindya.sinha] earlier. He means that cgroup 
destruction races between the Docker daemon and the Mesos agent when Docker is 
launched inside the Mesos container. 
Let me update the ticket. 



[jira] [Commented] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-19 Thread Gilbert Song (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589302#comment-15589302
 ] 

Gilbert Song commented on MESOS-6414:
-

[~anindya.sinha], would you mind providing more context on why you want a 
Mesos task launched in a cgroup that was not created by Mesos? 
LinuxLauncher::destroy() cleans up all cgroups created by fork(). It assumes 
that all cgroups under the freezer hierarchy were previously created by Mesos.

Or, as [~haosd...@gmail.com] mentioned, are you asking for cgroup namespace 
support?



[jira] [Commented] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587476#comment-15587476
 ] 

haosdent commented on MESOS-6414:
-

In addition, could you share the details of how to reproduce this? I would 
like to verify whether cgroup namespaces could resolve this, [~anindya.sinha].



[jira] [Commented] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587469#comment-15587469
 ] 

haosdent commented on MESOS-6414:
-

I think [~anindya.sinha] means that his tasks create cgroups while running, 
but those cgroups created by user tasks would be cleaned up by 
{{LinuxLauncherProcess::destroy()}}. I think the correct way to fix this 
problem is to use cgroup namespaces. 



[jira] [Commented] (MESOS-6414) Task cleanup fails when the containers includes cgroups not owned by Mesos

2016-10-18 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586919#comment-15586919
 ] 

Jie Yu commented on MESOS-6414:
---

I do not fully understand the problem. What do you mean by "a mesos task is 
launched in a cgroup outside of the context of Mesos"?
