Re: [systemd-devel] Automatically moving forked processes in a different cgroup based on children's UID
On Fri, Jan 7, 2022 at 4:11 PM Michal Koutný wrote:

> More viable way seems to me to modify the apache2-mpm-itk to put
> children into respective cgroups.

I'm assuming that the resource usage primarily comes from something like webapps running via mod_php, rather than Apache itself, in which case a better approach would be to move the webapps out of Apache entirely.

At work we also looked into mpm-itk for a student "shared hosting" server, but instead ended up with a setup where each user automatically gets their own PHP-FPM pool (with each vhost configured to talk to its user's PHP-FPM socket). Resource limits can then be set at the PHP-FPM level or, if needed, multiple php-fpm@.service instances can be run, each with its own cgroup.

--
Mantas Mikulėnas
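As a rough illustration of the "one PHP-FPM pool per user" idea, the sketch below renders a pool stanza for a given user. It is not Mantas's actual setup: the function name render_pool, the socket path under /run/php, the www-data socket owner and the default worker count are all assumptions made for the example. Only the directive names (user, group, listen, listen.owner, listen.group, pm, pm.max_children) are standard PHP-FPM pool settings.

```python
def render_pool(user: str, max_children: int = 5) -> str:
    """Return a PHP-FPM pool stanza for one user (hypothetical layout)."""
    lines = [
        f"[{user}]",                       # pool name, here simply the user name
        f"user = {user}",                  # workers run as this user
        f"group = {user}",
        f"listen = /run/php/{user}.sock",  # assumed per-user socket path
        "listen.owner = www-data",         # assumed web server account
        "listen.group = www-data",
        "pm = ondemand",                   # spawn workers only when needed
        f"pm.max_children = {max_children}",  # per-pool cap on worker count
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_pool("alice"), end="")
```

Each vhost would then point at its own user's socket, and pm.max_children acts as a coarse per-user limit even before any cgroup settings are applied.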
Re: [systemd-devel] Automatically moving forked processes in a different cgroup based on children's UID
Hello Wadih.

On Sat, Jan 01, 2022 at 04:41:12PM -0500, Wadih wrote:
> Is there a way to automatically classify child processes of a process
> in a different cgroup than the spawning process with systemd based on
> the children's new UID? I know apache2-mpm-itk calls setuid() on its
> children, so we would have to somehow hook on that.

You can summon the whole PAM machinery and include pam_systemd in the stack, which would create a new session scope for the user. (Or do it yourself from the process via the DBus call org.freedesktop.systemd1.Manager.StartTransientUnit(), which gives you more freedom.)

(Note that to keep the service lifecycle tracking under the name of apache2.service, the forked children should not reparent under PID 1, so that the service parent can properly track them.)

> I'd like to have the child processes that apache2-mpm-itk spawns go
> under their respective user, e.g.
> [...]
> system.slice/apache2.service/vhosts/%UID%

That's the alternative of maintaining the (relative) (sub)hierarchy yourself (and it doesn't require special treatment wrt the apache2.service lifecycle). Note that for this cgroup tree you'd need to specify the Delegate= directive for apache2.service, though.

> I've been able to do this with cgrulesengd and cgconfigparser for 3
> years, it's been rock solid.

I'm glad it work(s|ed) for you. The asynchronous classification via cgrulesengd is racy and may not always be reliable (wrt resource control). It's much better to do fork-classify-exec or clone3(CLONE_INTO_CGROUP)-exec synchronously in the migrated task.

> Would the only solution for me to create a daemon which monitors for
> setuid() calls of the parent apache process, and classify the children
> as per the new setuid user?

I'd discourage you from going down the path of cgrulesengd again. (And cgroupify too :-p)

> Or perhaps, I think root parent processes spawning specific UID
> children is a common security practise, perhaps there should be
> something in systemd out of the box for classifying the children under
> their respective cgroups?

Yes, on the low level it's the StartTransientUnit() DBus call or its specialized extensions for logind or machinectl.

> If my only solution is to create a daemon which monitors for setuid()
> I'll do it, although I've never done it before, not sure where I'd have
> to start. Any guidance would be great!

A more viable way, it seems to me, is to modify apache2-mpm-itk to put children into their respective cgroups.

HTH,
Michal
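A minimal sketch of the synchronous fork-classify-exec pattern Michal describes, written in Python for brevity (a real change to apache2-mpm-itk would be in C). The function name spawn_classified and its parameters are made up for the example. The sketch uses the fork-then-write variant because Python's standard library does not wrap clone3(), as far as I know. It assumes the caller is allowed to create the target cgroup, which on a real system means a delegated subtree.

```python
import os
from pathlib import Path
from typing import Sequence


def spawn_classified(cgroup_dir: Path, uid: int, argv: Sequence[str]) -> int:
    """Fork; the child joins cgroup_dir, drops to uid, then execs argv."""
    pid = os.fork()
    if pid == 0:
        try:
            # Creating the directory creates the cgroup on a cgroup v2 mount.
            cgroup_dir.mkdir(parents=True, exist_ok=True)
            # Classify: the child migrates itself before doing any real work,
            # so there is no window in which it runs unaccounted.
            (cgroup_dir / "cgroup.procs").write_text(f"{os.getpid()}\n")
            # Drop privileges only after the move (no-op if already that UID).
            # A real server would also set the group IDs first.
            if os.getuid() != uid:
                os.setuid(uid)
            os.execvp(argv[0], list(argv))
        finally:
            # Reached only if one of the steps above raised.
            os._exit(127)
    return pid
```

The ordering is the point: the write to cgroup.procs happens in the child, while it still has the privileges to do so, and before the setuid() call that an external watcher would have to race against.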
Re: [systemd-devel] Automatically moving forked processes in a different cgroup based on children's UID
Hi,

systemd will not help you with managing the cgroup sub-hierarchy underneath the daemon. I suppose the most generic solution would be something like cgrulesengd for cgroup v2. No idea if something like that exists.

I assume you have had a look at
https://systemd.io/CGROUP_DELEGATION/#three-scenarios
and other parts of that document, and that you are choosing option #2 for good reasons.

Managing the cgroup hierarchy is quite simple in principle (mkdir and then a write to cgroup.procs). Or, even better, use CLONE_INTO_CGROUP when creating the processes.

It is not that hard to write a small daemon that does this. If you want to do so, you could look into the cgroupify hack[1] in uresourced, which moves each process into its own cgroup. This is done for browsers in Fedora, as systemd-oomd always kills an entire cgroup. That said, it is not perfect and you'll need different logic overall. But it may be a good reference if you want to implement something similar yourself.

Having the cgroup management inside apache itself would probably be better overall and may not be much harder.

Benjamin

[1] https://gitlab.freedesktop.org/benzea/uresourced/-/blob/master/cgroupify/cgroupify.c
Startup works by installing a small template service
https://gitlab.freedesktop.org/benzea/uresourced/-/blob/master/data/user/cgroup...@.service.in
and a simple drop-in unit for every service that should be managed
https://gitlab.freedesktop.org/benzea/uresourced/-/blob/master/data/user/cgroupify.service

On Sat, 2022-01-01 at 16:41 -0500, Wadih wrote:
> Hi,
>
> I've been using apache2-mpm-itk with cgrulesengd in cgroupv1 to
> automatically classify the child processes that apache2-mpm-itk
> spawns when servicing web requests for different vhosts for about 3
> years, and it's been working great, when a vhost starts using up too
> much CPU/RAM, oom killer takes care of that specific vhost and leaves
> the others alone, as well as the parent process.
>
> I'm now preparing to move to Debian 11 as part of my yearly updates,
> and I'm finding out that I need to use cgroup v2 now. So I'm trying
> to bring my resource control solution to the new world.
>
> When I create my e.g.
> /etc/systemd/system/user-UID.slice.d/override.conf with the resource
> controls for that user, they don't apply to the forked processes, as
> cgrulesengd used to be able to do, as I am confirming with
> systemd-cgls. Instead, the parent and all its children all still
> belong to the same apache2.service slice. Which makes sense since it
> wasn't systemd that spawned the child processes.
>
> Is there a way to automatically classify child processes of a process
> in a different cgroup than the spawning process with systemd based on
> the children's new UID? I know apache2-mpm-itk calls setuid() on its
> children, so we would have to somehow hook on that.
>
> By default, the processes are now all in:
>
> system.slice/apache2.service
>
> I'd like to have the child processes that apache2-mpm-itk spawns go
> under their respective user, e.g.
>
> system.slice/apache2.service/vhosts/%UID%
>
> And then I would set a memory limit of 1G on
> system.slice/apache2.service/vhosts
>
> Then when the sum of the memory usage of the vhosts goes above 1G,
> oom killer will choose the biggest offending group under
> system.slice/apache2.service/vhosts/ and terminate that group,
> without touching the others nor the parent process. I've been able to
> do this with cgrulesengd and cgconfigparser for 3 years, it's been
> rock solid. I'm trying to bring that to the new systemd world.
>
> Would the only solution for me to create a daemon which monitors for
> setuid() calls of the parent apache process, and classify the
> children as per the new setuid user?
>
> Or perhaps, I think root parent processes spawning specific UID
> children is a common security practise, perhaps there should be
> something in systemd out of the box for classifying the children
> under their respective cgroups?
>
> If my only solution is to create a daemon which monitors for setuid()
> I'll do it, although I've never done it before, not sure where I'd
> have to start. Any guidance would be great!
>
> Thank you so much,
>
> Wadih Maalouf
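Benjamin's "mkdir and then a write to cgroup.procs" recipe can be sketched in a few lines. This is only an illustration: the helper name move_into_uid_cgroup is invented, and the vhosts/UID layout is taken from Wadih's message. On a real system service_cgroup would be something like /sys/fs/cgroup/system.slice/apache2.service, which assumes the unified hierarchy is mounted at /sys/fs/cgroup and that the unit has Delegate= set, as noted elsewhere in this thread.

```python
from pathlib import Path


def move_into_uid_cgroup(service_cgroup: Path, uid: int, pid: int) -> Path:
    """Create <service_cgroup>/vhosts/<uid> if needed and move pid into it."""
    target = service_cgroup / "vhosts" / str(uid)
    # The "mkdir" step: on cgroup v2, creating a directory creates the cgroup.
    target.mkdir(parents=True, exist_ok=True)
    # The "write to cgroup.procs" step: writing a PID migrates that process
    # (all of its threads) into the target cgroup.
    with open(target / "cgroup.procs", "w") as procs:
        procs.write(f"{pid}\n")
    return target
```

Doing this from an external daemon after the fact is what makes the approach racy. The same two steps executed by the forking process itself avoid that.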
[systemd-devel] Automatically moving forked processes in a different cgroup based on children's UID
Hi,

I've been using apache2-mpm-itk with cgrulesengd in cgroupv1 to automatically classify the child processes that apache2-mpm-itk spawns when servicing web requests for different vhosts for about 3 years, and it's been working great, when a vhost starts using up too much CPU/RAM, oom killer takes care of that specific vhost and leaves the others alone, as well as the parent process.

I'm now preparing to move to Debian 11 as part of my yearly updates, and I'm finding out that I need to use cgroup v2 now. So I'm trying to bring my resource control solution to the new world.

When I create my e.g. /etc/systemd/system/user-UID.slice.d/override.conf with the resource controls for that user, they don't apply to the forked processes, as cgrulesengd used to be able to do, as I am confirming with systemd-cgls. Instead, the parent and all its children all still belong to the same apache2.service slice. Which makes sense since it wasn't systemd that spawned the child processes.

Is there a way to automatically classify child processes of a process in a different cgroup than the spawning process with systemd based on the children's new UID? I know apache2-mpm-itk calls setuid() on its children, so we would have to somehow hook on that.

By default, the processes are now all in:

system.slice/apache2.service

I'd like to have the child processes that apache2-mpm-itk spawns go under their respective user, e.g.

system.slice/apache2.service/vhosts/%UID%

And then I would set a memory limit of 1G on system.slice/apache2.service/vhosts

Then when the sum of the memory usage of the vhosts goes above 1G, oom killer will choose the biggest offending group under system.slice/apache2.service/vhosts/ and terminate that group, without touching the others nor the parent process. I've been able to do this with cgrulesengd and cgconfigparser for 3 years, it's been rock solid. I'm trying to bring that to the new systemd world.

Would the only solution for me to create a daemon which monitors for setuid() calls of the parent apache process, and classify the children as per the new setuid user?

Or perhaps, I think root parent processes spawning specific UID children is a common security practise, perhaps there should be something in systemd out of the box for classifying the children under their respective cgroups?

If my only solution is to create a daemon which monitors for setuid() I'll do it, although I've never done it before, not sure where I'd have to start. Any guidance would be great!

Thank you so much,

Wadih Maalouf
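For reference, the 1G cap on the vhosts group described above maps onto two cgroup v2 interface files: memory.max on the parent, and memory.oom.group on each per-UID child so that a child is killed as a unit. The sketch below is only an illustration. The function name apply_vhost_limits is invented, and it assumes the per-UID directories already exist and that the memory controller has been enabled for them via cgroup.subtree_control in the parent cgroups.

```python
from pathlib import Path

GIB = 1024 ** 3


def apply_vhost_limits(vhosts_dir: Path, limit_bytes: int = GIB) -> list:
    """Cap the vhosts subtree and mark each per-UID child as one OOM unit."""
    # memory.max on the parent bounds the combined usage of all children.
    (vhosts_dir / "memory.max").write_text(f"{limit_bytes}\n")
    marked = []
    for child in sorted(p for p in vhosts_dir.iterdir() if p.is_dir()):
        # memory.oom.group=1 asks the kernel to kill all tasks in this
        # cgroup together when one of them is chosen as the OOM victim.
        (child / "memory.oom.group").write_text("1\n")
        marked.append(child.name)
    return marked
```

Two caveats: the kernel picks an OOM victim task rather than a "biggest group", so the group that gets killed is the one containing that task. Also, cgroup v2 only lets a non-root cgroup hand controllers such as memory to its children when it holds no processes itself, so the Apache parent process would likely need a leaf cgroup of its own as well.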