[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.

2016-10-31 Thread Yan Xu (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622873#comment-15622873 ]

Yan Xu commented on MESOS-6489:
---

+1 on passing a single cgroup to Destroyer.

> Better support for containers that want to manage their own cgroup.
> ---
>
> Key: MESOS-6489
> URL: https://issues.apache.org/jira/browse/MESOS-6489
> Project: Mesos
> Issue Type: Improvement
> Components: cgroups
> Reporter: Jie Yu
> Labels: cgroups
>
> Some containers want to manage their cgroup by sub-dividing the cgroup that 
> Mesos allocates to them into multiple sub-cgroups and putting subprocesses 
> into the corresponding sub-cgroups.
> For instance, someone may want to run the Docker daemon in a Mesos container. 
> The Docker daemon will manage the cgroup assigned to it by Mesos (with the 
> help of, for example, the cgroups namespace).
> Problems arise during the teardown of the container because two entities 
> might be manipulating the same cgroup simultaneously. For example, the Mesos 
> cgroups::destroy might fail if the task running inside is trying to delete 
> the same nested cgroup at the same time.
> To support that case, we should consider killing all the processes in the 
> Mesos cgroup first, making sure that no one will be creating sub-cgroups or 
> moving new processes into sub-cgroups, and then destroying the cgroups 
> recursively.
> We need the freezer because we want to make sure all processes are stopped 
> while we are sending kill signals, to avoid a TOCTTOU race. I think it makes 
> more sense to freeze the cgroups (and sub-cgroups) from the top down (rather 
> than bottom up) because, typically, processes in the parent cgroup manipulate 
> sub-cgroups.





[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.

2016-10-28 Thread Jie Yu (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617022#comment-15617022 ]

Jie Yu commented on MESOS-6489:
---

+1 on moving all destroy logic to Destroyer.

1. I suggest we pass a single 'cgroup' to Destroyer as well as TasksKiller and 
let them decide how to deal with nested cgroups, i.e., let them call 
cgroups::get() instead of doing that in the top-level cgroups::destroy. The 
semantics are also pretty clear: destroy this cgroup (or kill all tasks in a 
cgroup) for me, regardless of whether there are nested cgroups. More 
importantly, this allows us to extend the support to destroying a cgroup even 
when a Docker daemon runs inside the container (i.e., from the top down).

2. Destroyer will first use TasksKiller to kill all tasks, including tasks in 
nested containers. It can recursively instantiate TasksKillers for nested 
cgroups; that way it can easily work top down (see the sketch after this 
list). This will be useful for the daemon-in-a-Mesos-container case because we 
no longer need `continueOnError` if no third-party entity is modifying the 
cgroup from outside.

3. To support your use case, we need to add a hack in 'Destroyer' as you 
suggested (`continueOnError`).

4. Then, as Yan Xu suggests, add a `repair` in the caller.
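
To make (1) and (2) concrete, here is a minimal sketch. `killAll()` and 
`killTasks()` are hypothetical stand-ins (the real Destroyer and TasksKiller 
are libprocess actors), and `cgroups::kill()` is assumed as the underlying 
primitive:

{code}
#include <signal.h>

#include <list>
#include <string>
#include <vector>

#include <process/collect.hpp>
#include <process/future.hpp>

#include <stout/foreach.hpp>
#include <stout/nothing.hpp>
#include <stout/try.hpp>

#include "linux/cgroups.hpp"

using process::Failure;
using process::Future;
using std::list;
using std::string;
using std::vector;

// Hypothetical stand-in for spawning a TasksKiller on one cgroup (the
// real one freezes, SIGKILLs, reaps, and thaws asynchronously).
static Future<Nothing> killTasks(const string& hierarchy, const string& cgroup)
{
  Try<Nothing> kill = cgroups::kill(hierarchy, cgroup, SIGKILL);
  if (kill.isError()) {
    return Failure(kill.error());
  }
  return Nothing();
}

// The caller names a single cgroup; the nesting is discovered here.
static Future<Nothing> killAll(const string& hierarchy, const string& cgroup)
{
  // Empty the parent first: once its tasks are gone, nothing can
  // create new sub-cgroups or migrate processes into them.
  return killTasks(hierarchy, cgroup)
    .then([=]() -> Future<Nothing> {
      Try<vector<string>> nested = cgroups::get(hierarchy, cgroup);
      if (nested.isError()) {
        return Failure(nested.error());
      }

      // The sub-cgroups can now be emptied safely, even in parallel.
      list<Future<Nothing>> killers;
      foreach (const string& subCgroup, nested.get()) {
        killers.push_back(killTasks(hierarchy, subCgroup));
      }

      return process::collect(killers)
        .then([]() { return Nothing(); });
    });
}
{code}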



[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.

2016-10-27 Thread Yan Xu (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613942#comment-15613942 ]

Yan Xu commented on MESOS-6489:
---

A slightly different proposal for discussion:

h4. Part one
Give {{Future<Nothing> cgroups::destroy(const string& hierarchy, const string& 
cgroup)}} a {{bool continueOnError = true}} argument. In it:

1. Extract all cgroups (including sub-cgroups) in bottom-up fashion via 
{{cgroups::get(hierarchy, cgroup)}}.
2. Hand things over to the {{Destroyer}} (i.e., don't remove cgroups in 
{{cgroups::destroy}} itself; let {{Destroyer}} abstract this part out, since it 
still does its job even without a freezer subsystem).
3. The {{Destroyer}} first tries to kill tasks. If the freezer subsystem is 
available, it kills the tasks using {{TaskKillers}}; if not, this step is a 
no-op.
4. The {{TaskKiller}} for an individual cgroup will fail if the cgroup is 
missing. It's OK to let it fail; we don't need to change the logic here.
5. The {{Destroyer}} is given a {{bool continueOnError}} mode; if {{true}}, it 
{{await}}s all futures (instead of {{collect}}ing them) and tries to remove the 
cgroups recursively regardless of failed futures (see the sketch below). This 
is safe because if there are running tasks in a cgroup, {{remove()}} would 
simply fail. We are also not singling out the "somebody removed the cgroup in 
a race" case but rather giving the {{Destroyer}} a more aggressive mode.

If Docker removed a nested cgroup and caused an error, we still propagate the 
error after the more aggressive cleanup, which still ends up destroying 
everything. The caller can decide how to deal with the error.
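
A minimal sketch of step 5's {{await}}-vs-{{collect}} distinction; the 
{{destroy()}} shape and the {{remove}} callback here are illustrative 
assumptions, not the actual {{Destroyer}} code:

{code}
#include <functional>
#include <list>

#include <process/collect.hpp>
#include <process/future.hpp>

#include <stout/nothing.hpp>

using process::Future;
using std::list;

// `killers` holds one TasksKiller future per (sub-)cgroup; `remove`
// removes the cgroups recursively, bottom up.
Future<Nothing> destroy(
    const list<Future<Nothing>>& killers,
    const std::function<Future<Nothing>()>& remove,
    bool continueOnError)
{
  if (continueOnError) {
    // await() waits for every killer to transition, failed or not,
    // so the removal is always attempted. (Per the proposal, a kill
    // failure would still be propagated to the caller afterwards;
    // that plumbing is elided here.)
    return process::await(killers)
      .then([remove]() { return remove(); });
  }

  // collect() fails fast on the first failed killer: no cgroup is
  // removed unless every TasksKiller succeeded.
  return process::collect(killers)
    .then([remove]() { return remove(); });
}
{code}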

h4. Part two
{{repair}} the Future returned by {{cgroups::destroy(...)}} in 
{{LinuxLauncherProcess::destroy(...)}}:

With the above, the "hacky" part is left to {{Future<Nothing> 
LinuxLauncherProcess::destroy(const ContainerID& containerId)}}, where you can 
repair the destroy:

{code}
return cgroups::destroy(
    freezerHierarchy,
    cgroup(container->id),
    cgroups::DESTROY_TIMEOUT)
  .repair([=](const Future<Nothing>& result) -> Future<Nothing> {
    // If the cgroup is already gone, the failure was just the race
    // with the container removing its own cgroups; treat it as a
    // success. Otherwise propagate the original failure.
    Try<bool> exists =
      cgroups::exists(freezerHierarchy, cgroup(container->id));

    if (exists.isSome() && !exists.get()) {
      return Nothing();
    }

    return result;
  });
{code}
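
Note that {{repair}} only invokes its callback once the destroy future has 
failed, so the success path is untouched and the extra {{cgroups::exists}} 
check runs only on the error path.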

----

As a separate improvement, we can freeze the cgroups top down by reversing the 
list returned by {{cgroups::get()}} when launching the {{TaskKillers}} 
(sketched below). The {{TaskKillers}} still run in parallel, but that should 
be OK.
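
A sketch of that ordering change; {{launchTasksKiller()}} and 
{{launchKillers()}} are hypothetical stand-ins, and {{cgroups::get()}} is 
assumed (as in step 1 above) to return the nested cgroups bottom up:

{code}
#include <algorithm>
#include <list>
#include <string>
#include <vector>

#include <process/future.hpp>

#include <stout/foreach.hpp>
#include <stout/nothing.hpp>
#include <stout/try.hpp>

#include "linux/cgroups.hpp"

using process::Future;
using std::list;
using std::string;
using std::vector;

// Hypothetical stand-in for spawning a TasksKiller on one cgroup.
Future<Nothing> launchTasksKiller(const string& hierarchy, const string& cgroup);

list<Future<Nothing>> launchKillers(const string& hierarchy, const string& cgroup)
{
  // cgroups::get() returns the nested cgroups bottom up; reversing
  // the list yields the top-down order we want, so each parent is
  // frozen before the sub-cgroups its processes may be manipulating.
  // (Error handling of the Try is elided for brevity.)
  vector<string> ordered = cgroups::get(hierarchy, cgroup).get();
  std::reverse(ordered.begin(), ordered.end());

  list<Future<Nothing>> killers;
  killers.push_back(launchTasksKiller(hierarchy, cgroup)); // Topmost first.

  foreach (const string& nested, ordered) {
    killers.push_back(launchTasksKiller(hierarchy, nested));
  }

  // The killers still run in parallel; only the launch (and hence the
  // initial freeze) order changes.
  return killers;
}
{code}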



[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.

2016-10-27 Thread Anindya Sinha (JIRA)

[ https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611011#comment-15611011 ]

Anindya Sinha commented on MESOS-6489:
--

Jotting down some thoughts based on our previous conversation:

In {{Future<Nothing> destroy(const string& hierarchy, const string& cgroup)}}:

1. Extract all cgroups (including sub-cgroups) in bottom-up fashion via 
{{cgroups::get(hierarchy, cgroup)}}.

2. If the freezer is available:
2a. We use {{TasksKiller}} to freeze cgroups, {{SIGKILL}} all tasks, and thaw 
cgroups (possibly in top-down fashion). However, we add a new attribute to 
this class, {{bool ignoreMissingCgroup}}. If it is set, we ignore, in 
{{TasksKiller::finished()}}, any error for cgroups that do not exist (see the 
sketch at the end of this comment).
2b. At this point, if {{TasksKiller}} reported no error, we remove the cgroups 
in bottom-up fashion. We bail out with an error if any cgroup removal fails. 
As in step 2a, we ignore errors for cgroups that do not exist.

3. If the freezer is unavailable, we remove the cgroups from the bottom up 
using {{cgroups::remove(hierarchy, cgroup)}}. If a removal fails because the 
cgroup is not present, we ignore that failure.

We will have the "ignore error due to missing cgroup" logic in two places, 
viz. {{TasksKiller::finished()}} and {{cgroups::destroy}}.
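
A rough sketch of the new attribute from step 2a; the class shape is heavily 
simplified and hypothetical, with only {{cgroups::exists()}} assumed to be the 
real API:

{code}
#include <string>

#include <process/future.hpp>
#include <process/process.hpp>

#include <stout/nothing.hpp>
#include <stout/try.hpp>

#include "linux/cgroups.hpp"

using process::Future;
using process::Promise;

// Heavily simplified sketch of the proposed attribute.
class TasksKiller : public process::Process<TasksKiller>
{
public:
  TasksKiller(
      const std::string& _hierarchy,
      const std::string& _cgroup,
      bool _ignoreMissingCgroup)
    : hierarchy(_hierarchy),
      cgroup(_cgroup),
      ignoreMissingCgroup(_ignoreMissingCgroup) {}

  Future<Nothing> future() { return promise.future(); }

protected:
  // Invoked on any error while freezing, killing, or thawing (the
  // real finished()/failed() paths are more involved).
  void fail(const std::string& message)
  {
    // A missing cgroup means all of its tasks are already gone,
    // which is exactly the end state we were driving towards.
    Try<bool> exists = cgroups::exists(hierarchy, cgroup);

    if (ignoreMissingCgroup && exists.isSome() && !exists.get()) {
      promise.set(Nothing());
    } else {
      promise.fail(message);
    }
  }

  const std::string hierarchy;
  const std::string cgroup;
  const bool ignoreMissingCgroup;
  Promise<Nothing> promise;
};
{code}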
