[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.
[ https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622873#comment-15622873 ]

Yan Xu commented on MESOS-6489:
---

+1 on passing a single cgroup to Destroyer.

> Better support for containers that want to manage their own cgroup.
> ---
>
> Key: MESOS-6489
> URL: https://issues.apache.org/jira/browse/MESOS-6489
> Project: Mesos
> Issue Type: Improvement
> Components: cgroups
> Reporter: Jie Yu
> Labels: cgroups
>
> Some containers want to manage their cgroup by sub-dividing the cgroup that
> Mesos allocates to them into multiple sub-cgroups and putting subprocesses
> into the corresponding sub-cgroups.
>
> For instance, someone wants to run the Docker daemon in a Mesos container.
> The Docker daemon will manage the cgroup assigned to it by Mesos (with the
> help of, for example, cgroup namespaces).
>
> Problems arise during teardown of the container because two entities might
> be manipulating the same cgroup simultaneously. For example, the Mesos
> cgroups::destroy might fail if the task running inside is trying to delete
> the same nested cgroup at the same time.
>
> To support that case, we should consider killing all the processes in the
> Mesos cgroup first, making sure that no one will be creating sub-cgroups or
> moving new processes into sub-cgroups, and then destroying the cgroups
> recursively. We need the freezer because we want to make sure all processes
> are stopped while we are sending kill signals, to avoid a TOCTTOU race.
> I think it makes more sense to freeze the cgroups (and sub-cgroups) top
> down rather than bottom up because, typically, processes in the parent
> cgroup manipulate the sub-cgroups.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
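The ordering constraints in the issue description can be illustrated with a small, self-contained sketch (this is not Mesos code; `collectCgroups` and `removeBottomUp` are hypothetical names, and a plain directory tree stands in for a freezer hierarchy): cgroups are visited top down for freezing and killing, and bottom up for removal.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;

// Collect a cgroup and all of its descendants. Lexicographic order of
// the paths is top-down (every parent sorts before its children), which
// is the order proposed for freezing and killing; reversing it gives
// the bottom-up order needed for removal, since rmdir on a cgroup only
// succeeds once all of its sub-cgroups are gone.
std::vector<fs::path> collectCgroups(const fs::path& root) {
  std::vector<fs::path> cgroups{root};
  for (const auto& entry : fs::recursive_directory_iterator(root)) {
    if (entry.is_directory()) {
      cgroups.push_back(entry.path());
    }
  }
  std::sort(cgroups.begin(), cgroups.end());
  return cgroups;
}

// Remove the collected cgroups children-first (bottom up).
void removeBottomUp(const std::vector<fs::path>& cgroups) {
  for (auto it = cgroups.rbegin(); it != cgroups.rend(); ++it) {
    fs::remove(*it);
  }
}
```

Freezing and killing would then iterate the collected list forward (top down), and the final removal would walk the same list in reverse.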
[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.
[ https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617022#comment-15617022 ]

Jie Yu commented on MESOS-6489:
---

+1 on moving all destroy logic to Destroyer.

1. I suggest we pass a single 'cgroup' to Destroyer, as well as to TasksKiller, and let them decide how to deal with nested cgroups. I.e., let them call cgroups::get(), instead of doing that in the top-level cgroups::destroy. The semantics are also pretty clear: destroy this cgroup (or kill all tasks in this cgroup) for me, no matter whether there are nested cgroups or not. More importantly, this allows us to extend the support to destroying a cgroup when we run a Docker daemon inside a container (i.e., from the top down).

2. Destroyer will call TasksKiller first, trying to kill all tasks, including tasks in nested cgroups. It can recursively instantiate TasksKiller for nested cgroups; that way, it can work top down easily. This will be useful for the daemon-in-a-Mesos-container case because we no longer need `continueOnError` if no third-party entity is modifying the cgroup from outside.

3. To support your use case, we need to add a hack in Destroyer as you suggested (continueOnError).

4. Then, as Yan Xu suggests, add a repair in the caller.
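The recursive scheme in point 2 above can be sketched as follows. This is a simplified stand-in, not the real Destroyer/TasksKiller (the names `killTasks` and `destroyCgroup` are hypothetical, a directory tree stands in for the cgroup hierarchy, and freeze/SIGKILL/thaw is stubbed out): the destroyer is handed a single cgroup, kills its tasks first (top down, so nothing is left to create new sub-cgroups underneath), recurses into whatever nested cgroups it finds, and removes each cgroup on the way back up (bottom up).

```cpp
#include <cassert>
#include <filesystem>
#include <iostream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Placeholder for freezing the cgroup, SIGKILLing its tasks, and thawing.
void killTasks(const fs::path& cgroup) {
  std::cout << "kill tasks in " << cgroup.string() << '\n';
}

bool destroyCgroup(const fs::path& cgroup) {
  // Parent first: after this, nothing in this cgroup can create
  // sub-cgroups or move processes into them.
  killTasks(cgroup);

  // Snapshot the nested cgroups so removal does not perturb iteration.
  std::vector<fs::path> children;
  for (const auto& entry : fs::directory_iterator(cgroup)) {
    if (entry.is_directory()) {
      children.push_back(entry.path());
    }
  }

  // Recurse top down; each recursive call removes its subtree.
  for (const fs::path& child : children) {
    if (!destroyCgroup(child)) {
      return false;
    }
  }

  // Children are gone, so removing this cgroup (bottom up) succeeds.
  std::error_code ec;
  return fs::remove(cgroup, ec);
}
```

Note that because each level kills its own tasks before recursing, no `continueOnError` is needed as long as nothing outside the destroyer is mutating the hierarchy.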
[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.
[ https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613942#comment-15613942 ]

Yan Xu commented on MESOS-6489:
---

A slightly different proposal for discussion:

h4. Part one

Give {{Future<Nothing> cgroups::destroy(const string& hierarchy, const string& cgroup)}} a {{bool continueOnError = true}} argument. In it:

1. Extract all cgroups (including sub-cgroups) in a bottom-up fashion via {{cgroups::get(hierarchy, cgroup)}}.
2. Hand everything over to the {{Destroyer}} (i.e., don't remove cgroups in {{cgroups::destroy}} itself; let {{Destroyer}} abstract this part out, since it still does its job without a freezer subsystem).
3. The Destroyer first tries to kill tasks. If the freezer subsystem is available, it kills the tasks using TasksKillers; if not, this step is a no-op.
4. The TasksKiller for an individual cgroup will fail if the cgroup is missing. It's OK if we let it fail; we don't need to change the logic here.
5. The Destroyer is given a {{bool continueOnError}} mode. If {{true}}, it {{await}}s all futures (instead of {{collect}}ing them) and tries to remove cgroups recursively regardless of failed futures. This is safe because if there are running tasks in a cgroup, {{remove()}} would just fail. We are also not singling out the "somebody has removed the cgroup in a race" case, but rather giving the Destroyer a more aggressive mode. If Docker removed a nested cgroup and caused an error, we still propagate the error after a more aggressive cleanup, which still ends up destroying everything. The caller can decide how to deal with the error.

h4. Part two

{{repair}} the Future returned by {{cgroups::destroy(...)}} in {{LinuxLauncherProcess::destroy(...)}}. With the above, the "hacky" part is left to {{Future<Nothing> LinuxLauncherProcess::destroy(const ContainerID& containerId)}}, where you can repair the destroy:

{code:title=}
return cgroups::destroy(
    freezerHierarchy,
    cgroup(container->id),
    cgroups::DESTROY_TIMEOUT)
  .repair([=](const Future<Nothing>& result) -> Future<Nothing> {
    // If the cgroup no longer exists, somebody else (e.g., a Docker
    // daemon inside the container) already removed it for us, so treat
    // the failed destroy as a success; otherwise propagate the failure.
    return cgroups::exists(cgroup(container->id)) ? result : Nothing();
  });
{code}

---

As a separate improvement, we can freeze the cgroups top down by reversing the list returned by {{cgroups::get()}} when launching the {{TasksKillers}}. The {{TasksKillers}} currently run in parallel, but that should be OK.
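The repair in part two above can be reduced to a tiny stand-in (libprocess Futures are elided; `DestroyResult` and `repairDestroy` are hypothetical names used only for illustration): a failed destroy is converted into success if and only if the cgroup no longer exists, i.e. the failure was "somebody else already removed it"; genuine failures are propagated.

```cpp
#include <cassert>
#include <string>

// Minimal stand-in for the outcome of cgroups::destroy.
struct DestroyResult {
  bool failed;
  std::string message;
};

// The repair logic: a failure is forgiven only when the cgroup is gone,
// which is what the caller wanted in the first place.
DestroyResult repairDestroy(const DestroyResult& result, bool cgroupExists) {
  if (result.failed && !cgroupExists) {
    return {false, ""};  // Repaired: the cgroup was already removed.
  }
  return result;  // Success, or a genuine failure (cgroup still there).
}
```

This mirrors the design choice in the proposal: the aggressive cleanup still runs and still reports errors, and only the caller decides that "already gone" counts as success.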
[jira] [Commented] (MESOS-6489) Better support for containers that want to manage their own cgroup.
[ https://issues.apache.org/jira/browse/MESOS-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15611011#comment-15611011 ]

Anindya Sinha commented on MESOS-6489:
---

Jotting down some thoughts based on our previous conversation. In {{Future<Nothing> destroy(const string& hierarchy, const string& cgroup)}}:

1. Extract all cgroups (including sub-cgroups) in a bottom-up fashion via {{cgroups::get(hierarchy, cgroup)}}.
2. If the freezer is available:
2a. We use {{TasksKiller}} to freeze the cgroups, {{SIGKILL}} all tasks, and thaw the cgroups (possibly in a top-down fashion). However, we add a new attribute to this class, {{bool ignoreMissingCgroup}}. If it is set, we ignore any error for cgroups that do not exist in {{TasksKiller::finished()}}.
2b. At this point, assuming {{TasksKiller}} reported no error, we remove the cgroups in a bottom-up fashion. We bail out with an error if any cgroup removal fails. As in step 2a, we ignore errors for cgroups that do not exist.
3. If the freezer is unavailable, we remove the cgroups from the bottom up using {{cgroups::remove(hierarchy, cgroup)}}. If the remove fails because the cgroup is not present, we ignore that failure.

We will thus have the "ignore error due to missing cgroup" logic in two places, viz. {{TasksKiller::finished()}} and {{cgroups::destroy}}.
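The "ignore error due to missing cgroup" rule from the proposal above can be sketched as follows (not Mesos code; `removeIgnoringMissing` is a hypothetical name, and POSIX {{rmdir(2)}} on plain directories stands in for cgroup removal): removing a cgroup that is already gone (ENOENT) is treated as success, since somebody inside the container may have removed it first, while any other error, e.g. a cgroup that is busy or non-empty, is a real failure and aborts the walk.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

// Remove cgroup directories in the given (bottom-up) order. ENOENT is
// forgiven: the cgroup was already removed by someone else, which is
// the outcome we wanted anyway. Any other errno (e.g. ENOTEMPTY for a
// cgroup that still has sub-cgroups) aborts and reports failure.
bool removeIgnoringMissing(const std::vector<std::string>& bottomUp) {
  for (const std::string& cgroup : bottomUp) {
    if (::rmdir(cgroup.c_str()) != 0 && errno != ENOENT) {
      return false;  // A genuine failure: bail out.
    }
  }
  return true;
}
```

For example, a list containing a path that was never created still succeeds, while a parent listed before its children fails with ENOTEMPTY.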