I would really appreciate some help with this from someone who's familiar with the systemd internals.
What mechanism prevents cg_trim from removing a cgroup before the newly created child has completed cg_set_task_access? I've also created bug 63080 for this.

/Anders

> -----Original Message-----
> From: systemd-devel-[email protected] [mailto:systemd-[email protected]] On Behalf Of Anders Olofsson
> Sent: 27 March 2013 13:58
> To: [email protected]
> Subject: Re: [systemd-devel] Possible race condition for setting cgroup sticky bit
>
> I just tested it with systemd 199 and the problem still occurs.
>
> However, it now fails with "Failed at step CGROUP spawning /etc/init.d/rc:
> No such file or directory", just like in 197. It no longer segfaults, as it
> did (at least sometimes) with 198.
>
> /Anders
>
> > -----Original Message-----
> > From: systemd-devel-[email protected] [mailto:systemd-[email protected]] On Behalf Of Anders Olofsson
> > Sent: 26 March 2013 13:43
> > To: [email protected]
> > Subject: [systemd-devel] Possible race condition for setting cgroup sticky bit
> >
> > I'm seeing a problem where a service sometimes fails to start because of a
> > missing cgroup.
> > After some debugging I've made the following observations:
> >
> > After exec_spawn() forks, the child sets the sticky bit on the cgroup
> > (in cg_set_task_access), but sometimes the cgroup is missing (lstat returns
> > "No such file or directory").
> >
> > The cgroup is always created, but the main process calls cg_trim (from
> > cgroup_bonding_trim <- cgroup_bonding_trim_list <- cgroup_notify_empty
> > <- private_bus_message_filter ...), which removes the cgroup if the
> > sticky bit isn't set.
> >
> > This looks like a race condition.
> > If the child sets the sticky bit first, the parent leaves the cgroup alone.
> > But if the main process reaches cg_trim first, the cgroup is removed and
> > the child fails.
> >
> > We're using systemd 197. I've tried 198, but there the child dies with
> > SIGSEGV, so it's harder to debug what's happening.
> > The problem appeared when we switched from Linux 3.4 to 3.7. Since this
> > looks like a race in systemd, I'm not sure whether our local kernel tree is
> > to blame or the version bump just changed the timing enough to trigger the
> > race.
> >
> > Since I'm not familiar with the systemd internals and cgroups, I would
> > appreciate some help resolving this.
> >
> > I can reproduce this quite easily, usually within 5-10 boots. It's always
> > the same service that fails, and the services started before it never fail.
> >
> > /Anders
> > _______________________________________________
> > systemd-devel mailing list
> > [email protected]
> > http://lists.freedesktop.org/mailman/listinfo/systemd-devel
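The ordering problem in the quoted report can be replayed deterministically outside systemd. This is a minimal sketch, not systemd code. `mark_sticky()` and `trim_unless_sticky()` are hypothetical stand-ins for the child's cg_set_task_access() step and the parent's cg_trim() check. A plain directory created with mkdtemp() under /tmp stands in for the cgroup, and a POSIX system is assumed.

```c
#define _XOPEN_SOURCE 700 /* for mkdtemp(), lstat() and S_ISVTX */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the child's cg_set_task_access() step:
 * set the sticky bit on the group directory. Returns 0 on success or
 * -ENOENT if the directory has already been removed. */
static int mark_sticky(const char *path) {
    struct stat st;
    assert(path != NULL);
    if (lstat(path, &st) < 0)
        return -errno;
    if (chmod(path, (st.st_mode & 07777) | S_ISVTX) < 0)
        return -errno;
    return 0;
}

/* Hypothetical stand-in for the parent's cg_trim() check: remove the
 * directory unless its sticky bit is set. Returns 1 if removed,
 * 0 if kept, or a negative errno on failure. */
static int trim_unless_sticky(const char *path) {
    struct stat st;
    assert(path != NULL);
    if (lstat(path, &st) < 0)
        return -errno;
    if (st.st_mode & S_ISVTX)
        return 0;
    if (rmdir(path) < 0)
        return -errno;
    return 1;
}

/* Replay one interleaving on a fresh directory.
 * child_first == true: the child marks the directory before the parent trims.
 * Returns the child's result: 0 on success, -ENOENT if it lost the race. */
static int replay(bool child_first) {
    char tmpl[] = "/tmp/cgrace.XXXXXX";
    char *dir = mkdtemp(tmpl);
    if (dir == NULL)
        return -errno;

    int child_rc;
    if (child_first) {
        child_rc = mark_sticky(dir);  /* sticky bit set first...   */
        trim_unless_sticky(dir);      /* ...so the trim keeps it    */
    } else {
        trim_unless_sticky(dir);      /* trim removes the directory */
        child_rc = mark_sticky(dir);  /* child now sees ENOENT      */
    }

    (void)rmdir(dir); /* clean up; fails harmlessly if already gone */
    return child_rc;
}
```

replay(true) returns 0 because the sticky bit is already set when the trim runs. replay(false) returns -ENOENT, the same "No such file or directory" the child reports when cg_trim wins. Both outcomes come from the same two steps. Only their order differs, which is why a change in kernel timing alone could expose the failure.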
