On Mon, Dec 01, 2014 at 12:06:03PM +0100, Martin Pitt wrote: > Hello all, > > In my efforts to make user LXC containers work I noticed that under a > "real" desktop (not just nspawn with VT login or ssh logins) my > carefully set up cgroups in the non-systemd controllers get reverted. > I. e. I put the session leader (and all other pids) of logind sessions > (/user.slice/user-1000.slice/session-XX.scope) into all cgroup > controllers, but a second after they are all back in the / cgroups (or > sometimes in /user.slice). After some days of debugging (during which > I learned a lot about the innards of systemd :-) I finally found out > why: > > During unit cgroup initialization (unit_realize_cgroup), sibling > cgroups are queued instead of initialized immediately. > unit_create_cgroups() makes an attempt to avoid initializing and > migrating a cgroup more than once: > > path = unit_default_cgroup_path(u); > [...] > r = hashmap_put(u->manager->cgroup_unit, path, u); > if (r < 0) { > log_error(r == -EEXIST ? "cgroup %s exists already: %s" : > "hashmap_put failed for %s: %s", path, strerror(-r)); > return r; > } > > But this doesn't work: "path" always gets allocated freshly in that > function, so the pointer is never in the hashmap and the -EEXISTS > never actually hits. This causes -.slice to be initialized and > recursively migrated a gazillion times, which explains the random > punting of sub-cgroup PIDs back to /. > > I fixed this with an explicit hashmap_get() call, which compares > values instead of pointers. > > I also added some debugging to that function: > > log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", u->id, > path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path)); > > (right before the hashmap_put() above), which illustrates the > problem: > > systemd[1]: Starting Root Slice. > systemd[1]: unit_create_cgroups -.slice: path= realized 0 hashmap (nil) > systemd[1]: Created slice Root Slice. > [...] > pid1 systemd[1]: unit_create_cgroups session-c1.scope: > path=/user.slice/user-1000.slice/session-c1.scope realized 0 hashmap (nil) > systemd[1]: Started Session c1 of user martin. > [... later on when most things have been started ...] > systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap > 0x7f0e14aa4850 > systemd[1]: unit_create_cgroups -.slice: cgroup exists already > systemd[1]: Failed to realize cgroups for queued unit user.slice: File exists > systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap > 0x7f0e14aa4850 > systemd[1]: unit_create_cgroups -.slice: cgroup exists already > systemd[1]: Failed to realize cgroups for queued unit grub-common.slice: File > exists > systemd[1]: unit_create_cgroups -.slice: path= realized 1 hashmap > 0x7f0e14aa4850 > systemd[1]: unit_create_cgroups -.slice: cgroup exists already > systemd[1]: Failed to realize cgroups for queued unit networking.slice: File > exists > > ... and so on, basically once for each .service. > > Initializing -.slice so many times is certainly an unintended effect > of the peer cgroup creation. Thus the patch fixes the multiple > initialization/creation, but a proper fix should also quiesce these > error messages. The condition could be checked explicitly, i. e. we > skip the "Failed to realize..." error for EEXISTS, or we generally > tone this down to log_debug. I'm open to suggestions. And of course > the log_debug should be removed; it's nice to illustrate the problem, > but doesn't really belong into production code. > > Thanks, > > Martin
Also this looks like a possible fix to the problem I tried to describe in, http://lists.freedesktop.org/archives/systemd-devel/2014-November/025607.html Michal > > -- > Martin Pitt | http://www.piware.de > Ubuntu Developer (www.ubuntu.com) | Debian Developer (www.debian.org) > From fd2f8a444c9c644a39dd3b619934e8768e03c2a3 Mon Sep 17 00:00:00 2001 > From: Martin Pitt <martin.p...@ubuntu.com> > Date: Mon, 1 Dec 2014 10:50:06 +0100 > Subject: [PATCH] Do not realize and migrate cgroups multiple times > > unit_create_cgroups() tries to check if a cgroup already exists. But has the > destination path is always allocated dynamically as a new string, that pointer > will never already be in the hashmap, thus hashmap_put() will never actually > fail with EEXISTS. Thus check for the existance of the cgroup path explicitly. > > Before this, "-.slice" got initialized and its child PIDs migrated many times > through queuing the realization of sibling units; thiss caused any cgroup > controller layout from sub-cgroups to be reverted and their pids moved back to > the root cgroup in all controllers (except systemd). > --- > src/core/cgroup.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/src/core/cgroup.c b/src/core/cgroup.c > index 8820bee..3d36080 100644 > --- a/src/core/cgroup.c > +++ b/src/core/cgroup.c > @@ -614,6 +614,13 @@ static int unit_create_cgroups(Unit *u, > CGroupControllerMask mask) { > if (!path) > return log_oom(); > > + log_debug("unit_create_cgroups %s: path=%s realized %i hashmap %p", > u->id, path, u->cgroup_realized, hashmap_get(u->manager->cgroup_unit, path)); > + > + if (hashmap_get(u->manager->cgroup_unit, path)) { > + log_error("unit_create_cgroups %s: cgroup %s exists > already", u->id, u->cgroup_path); > + return -EEXIST; > + } > + > r = hashmap_put(u->manager->cgroup_unit, path, u); > if (r < 0) { > log_error(r == -EEXIST ? "cgroup %s exists already: %s" : > "hashmap_put failed for %s: %s", path, strerror(-r)); > -- > 2.1.3 > > _______________________________________________ > systemd-devel mailing list > systemd-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/systemd-devel _______________________________________________ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel