[systemd-devel] Rationale for mirroring cpu and systemd cgroup subsystems

2014-11-05 Thread Umut Tezduyar Lindskog
Hi,

What is the reasoning for not joining the cpu subsystem with the systemd subsystem?

There are a couple of ways you can mirror [1] the cpu and systemd subsystems,
and doing so can result in completely different cpu bandwidth for
processes.

I am wondering why we don't mirror them by default.

Not mirroring them results in PID 1, each kernel thread and each user
space task having the same cpu bandwidth (/sys/fs/cgroup/cpu/tasks).
Even worse, the cpu bandwidth PID 1 gets goes down with the number
of processes spawned, possibly opening the way to a DoS.

[1] - Simple changes that alter the cpu bandwidth every process gets:

a) DefaultCPUAccounting=yes will change the entire cpu bandwidth
allocation due to JoinControllers=cpu,cpuacct
b) Dropping in a .slice unit and putting even a single service in it.
c) systemctl set-property system.slice CPUShares=1024 (even though
1024 is the default cpu weight)
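
To make the bandwidth shift described above concrete, here is a rough
arithmetic sketch in Python. It is a simplified model and not how the
scheduler is implemented: it assumes a single CPU, every task always
runnable at nice 0 (weight 1024), and exactly one child group mirrored
into the cpu hierarchy. The function names are made up for illustration.

```python
from fractions import Fraction

# Assumption: a nice-0 task and a group with default shares both weigh 1024.
DEFAULT_WEIGHT = 1024


def flat_share(total_tasks):
    """Share of one task when every task sits in the root cpu cgroup."""
    return Fraction(1, total_tasks)


def mirrored_shares(root_tasks, slice_tasks, slice_shares=DEFAULT_WEIGHT):
    """Shares once one slice is mirrored into the cpu hierarchy.

    Tasks left in the root cgroup compete with the slice as a whole;
    the slice's portion is then split evenly among its own tasks.
    Returns (share per root task, share per task inside the slice).
    """
    total_weight = root_tasks * DEFAULT_WEIGHT + slice_shares
    per_root_task = Fraction(DEFAULT_WEIGHT, total_weight)
    per_slice_task = Fraction(slice_shares, total_weight) / slice_tasks
    return per_root_task, per_slice_task


if __name__ == "__main__":
    # 100 tasks, nothing mirrored: PID 1 is just one task among 100.
    print("flat:", flat_share(100))
    # Same 100 tasks, but 99 of them now live in a mirrored system.slice.
    print("mirrored:", mirrored_shares(root_tasks=1, slice_tasks=99))
```

Under these assumptions the single task left in the root goes from 1/100
to 1/2 of the CPU, while each task in the slice drops to 1/198, even
though the slice only carries the default weight of 1024.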

Umut
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Rationale for mirroring cpu and systemd cgroup subsystems

2014-11-05 Thread Lennart Poettering
On Wed, 05.11.14 13:41, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:

 Hi,
 
 What is the reasoning for not joining the cpu subsystem with the systemd subsystem?
 
 There are a couple of ways you can mirror [1] the cpu and systemd subsystems,
 and doing so can result in completely different cpu bandwidth for
 processes.
 
 I am wondering why we don't mirror them by default.

Because simply enabling the cpu controller for a unit already has
effects on the processes running in it. For example, you don't get RT
scheduling anymore, and the general scheduling is altered to schedule
your entire group evenly against all the groups on the same level.

systemd will mirror a cgroup in the cpu hierarchy as soon as you
set a property on it that requires the cpu or cpuacct hierarchy,
for example CPUAccounting=, CPUShares= or CPUQuota=.
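
One way to check whether a given process has been mirrored into the cpu
hierarchy is to compare its paths in /proc/<pid>/cgroup. Below is a
minimal Python sketch, assuming the legacy (non-unified) layout where
each line has the form hierarchy-ID:controller-list:path. The helper
names and the sample unit foo.service are hypothetical.

```python
def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    Expects the content of /proc/<pid>/cgroup on a legacy hierarchy,
    one "id:controllers:path" entry per line.
    """
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


def cpu_is_mirrored(text):
    """True if the cpu path equals the path in the name=systemd hierarchy."""
    paths = parse_proc_cgroup(text)
    return "cpu" in paths and paths["cpu"] == paths.get("name=systemd")


# Illustrative sample data, not captured from a real system.
SAMPLE_UNMIRRORED = (
    "4:cpu,cpuacct:/\n"
    "1:name=systemd:/system.slice/foo.service\n"
)
SAMPLE_MIRRORED = (
    "4:cpu,cpuacct:/system.slice/foo.service\n"
    "1:name=systemd:/system.slice/foo.service\n"
)

if __name__ == "__main__":
    print(cpu_is_mirrored(SAMPLE_UNMIRRORED))
    print(cpu_is_mirrored(SAMPLE_MIRRORED))
    # To inspect the current process on a real machine:
    # print(cpu_is_mirrored(open("/proc/self/cgroup").read()))
```

Note that a process living directly in the root cgroup (such as PID 1
here) trivially matches, since both paths are "/".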

But the general rule is: don't enable a controller for a unit, unless
we really need to. We must make sure the tree is always as minimal as
possible.

 Not mirroring them results in PID 1, each kernel thread and each user
 space task having the same cpu bandwidth (/sys/fs/cgroup/cpu/tasks).
 Even worse, the cpu bandwidth PID 1 gets goes down with the number
 of processes spawned, possibly opening the way to a DoS.

There has been a plan to introduce CPUFairScheduling= that you can set
on a slice, and that will turn on the cpu controller for all children
of that slice. Setting that on system.slice should have the desired
effect.

Regarding PID1: with the unified cgroup hierarchy it will not be
possible to have both populated subcgroups and processes in the same
cgroup. This means we will have to move PID 1 out of the root cgroup
anyway, probably into some unit in system.slice. This should fix
your problem, I figure? This would also allow applying cgroup resource
limits to PID 1 itself, for example to control the way it is scheduled
against other processes.

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] Rationale for mirroring cpu and systemd cgroup subsystems

2014-11-05 Thread Umut Tezduyar Lindskog
On Wed, Nov 5, 2014 at 2:05 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Wed, 05.11.14 13:41, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:

 Hi,

 What is the reasoning for not joining the cpu subsystem with the systemd subsystem?

 There are a couple of ways you can mirror [1] the cpu and systemd subsystems,
 and doing so can result in completely different cpu bandwidth for
 processes.

 I am wondering why we don't mirror them by default.

 Because simply enabling the cpu controller for a unit already has
 effects on the processes running in it. For example, you don't get RT
 scheduling anymore, and the general scheduling is altered to schedule
 your entire group evenly against all the groups on the same level.

Doesn't it make sense to turn it on by default and let users wanting
RT disable it? Seems like this was the case at some point -
http://www.freedesktop.org/wiki/Software/systemd/MyServiceCantGetRealtime/
(Very much outdated article, we don't have ControlGroup= anymore)


 systemd will mirror a cgroup in the cpu hierarchy as soon as you
 set a property on it that requires the cpu or cpuacct hierarchy,
 for example CPUAccounting=, CPUShares= or CPUQuota=.

You can turn on mirroring at runtime, but as far as I know there is
no way of going back without rebooting, right?


 But the general rule is: don't enable a controller for a unit, unless
 we really need to. We must make sure the tree is always as minimal as
 possible.

 Not mirroring them results in PID 1, each kernel thread and each user
 space task having the same cpu bandwidth (/sys/fs/cgroup/cpu/tasks).
 Even worse, the cpu bandwidth PID 1 gets goes down with the number
 of processes spawned, possibly opening the way to a DoS.

 There has been a plan to introduce CPUFairScheduling= that you can set
 on a slice, and that will turn on the cpu controller for all children
 of that slice. Setting that on system.slice should have the desired
 effect.

 Regarding PID1: with the unified cgroup hierarchy it will not be
 possible to have both populated subcgroups and processes in the same
 cgroup. This means we will have to move PID 1 out of the root cgroup
 anyway, probably into some unit in system.slice. This should fix
 your problem, I figure? This would also allow applying cgroup resource
 limits to PID 1 itself, for example to control the way it is scheduled
 against other processes.

We discussed putting systemd into its own cgroup during the hackfest
in Germany. It would solve the problem I have mentioned.

Umut




Re: [systemd-devel] Rationale for mirroring cpu and systemd cgroup subsystems

2014-11-05 Thread Lennart Poettering
On Wed, 05.11.14 16:00, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:

 On Wed, Nov 5, 2014 at 2:05 PM, Lennart Poettering
 lenn...@poettering.net wrote:
  On Wed, 05.11.14 13:41, Umut Tezduyar Lindskog (u...@tezduyar.com) wrote:
 
  Hi,
 
  What is the reasoning for not joining the cpu subsystem with the systemd subsystem?
 
  There are a couple of ways you can mirror [1] the cpu and systemd subsystems,
  and doing so can result in completely different cpu bandwidth for
  processes.
 
  I am wondering why we don't mirror them by default.
 
  Because simply enabling the cpu controller for a unit already has
  effects on the processes running in it. For example, you don't get RT
  scheduling anymore, and the general scheduling is altered to schedule
  your entire group evenly against all the groups on the same level.
 
 Doesn't it make sense to turn it on by default and let users wanting
 RT disable it? Seems like this was the case at some point -
 http://www.freedesktop.org/wiki/Software/systemd/MyServiceCantGetRealtime/
 (Very much outdated article, we don't have ControlGroup= anymore)

Yeah, I really need to update that article.

Generally we should try hard to keep the tree minimal. Resource
control enforcement is not free, and hence it should be opt-in, not
opt-out. This is something Tejun pretty explicitly asked us for: he
wants the most shallow tree that does what is needed.

  systemd will mirror a cgroup in the cpu hierarchy as soon as you
  set a property on it that requires the cpu or cpuacct hierarchy,
  for example CPUAccounting=, CPUShares= or CPUQuota=.
 
 You can turn on mirroring at runtime, but as far as I know there is
 no way of going back without rebooting, right?

In current versions it should correctly turn mirroring off again when
you reset the props to their defaults.

Lennart

-- 
Lennart Poettering, Red Hat