On Thu, 30.04.15 15:42, Alban Crequy (al...@endocode.com) wrote:

> > systemd-nspawn nowadays mounts all hierarchies into the container, but
> > mounts all controller hierarchies read-only, and of the name=systemd
> > hierarchy mounts everything read-only, except the subtree the
> > container is allowed to manage. That way only the cgroup tree the
> > container needs access to is writable to it. That solution however
> > does not hide the cgroup tree. A process running inside the container
> > can still go and explore the tree and its attributes. However, all
> > other groups will appear empty to it, since processes not in the
> > container PID namespaces will be suppressed when reading the member
> > process list.
>
> To sum up what systemd-nspawn is currently mounting in the container:
> - /sys/fs/cgroup/systemd/ --> mounted RO
> - /sys/fs/cgroup/systemd/machine.slice/machine-xxx.scope/ --> mounted RW
> - /sys/fs/cgroup/cpu,cpuacct/ --> mounted RO
> - etc. for other cgroup hierarchies --> mounted RO

Correct.

> In order to let systemd in the container restrict cpu, memory, etc. on
> some of its services (see manpage systemd.resource-control(5)), rkt
> would like systemd-nspawn to mount a subtree of some hierarchy
> (cpu,cpuacct, memory) in read-write mode.

That's really not a safe thing to do right now. The kernel isn't ready
for this, as cgroup access is currently an all-or-nothing thing: if you
have access to a cgroup and can create child cgroups in it, you have
access to *all* attributes you like, the dangerous ones as well as the
not so dangerous ones.

> Are there any issues with changing the systemd-nspawn mounts in the
> following way:
> - /sys/fs/cgroup/systemd/ --> mounted RO
> - /sys/fs/cgroup/systemd/machine.slice/machine-xxx.scope/ --> mounted RW
> - /sys/fs/cgroup/cpu,cpuacct/ --> mounted RO
> - /sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-xxx.scope/ --> mounted RW
> - etc. for other cgroup hierarchies.
>
> Iago wrote two experimental patches on systemd-nspawn to try that and
> it worked. Delegate=yes was enabled in systemd-nspawn in order to test
> this:
> https://github.com/endocode/systemd/commits/iaguis/delegate
>
> But I would like to know what is missing to make this safe (or if it
> is already safe to do).

Well, nspawn does not actually make any guarantees about security
currently. Since we pass CAP_SYS_ADMIN to the containers by default,
people can mount whatever they want and remount things freely from
within. Hence, opening this up would not make things much worse.

That said, I am a bit concerned about opening this up by default. Even
though containers are insecure, we should try to be safe wherever we
can, as long as it doesn't affect usability too much.

Adding a new cmdline switch for all of this doesn't sound too
attractive, but maybe a --delegate switch would be OK, which would open
up all controllers to the container. It would then have a similar
effect on containers as Delegate=yes has for service processes.
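
For illustration only, the two mount layouts discussed above can be
written out as a small Python sketch. This is not nspawn code: the
function name cgroup_mount_plan() and its "delegate" parameter are
made up here, and "delegate" merely stands in for the --delegate
switch floated above, which does not exist yet. The sketch only
computes which paths would be mounted RO or RW, it does not mount
anything.

```python
from typing import List, Tuple

CGROUP_ROOT = "/sys/fs/cgroup"


def cgroup_mount_plan(scope: str,
                      controllers: List[str],
                      delegate: bool = False) -> List[Tuple[str, str]]:
    """Return (path, mode) pairs describing the cgroup mounts for a container.

    Hypothetical helper: `delegate` models the proposed behaviour of
    opening the container's own subtree read-write in every controller
    hierarchy, not just in name=systemd.
    """
    plan = [
        # name=systemd hierarchy: read-only, except the container's subtree.
        (f"{CGROUP_ROOT}/systemd/", "ro"),
        (f"{CGROUP_ROOT}/systemd/machine.slice/{scope}/", "rw"),
    ]
    for ctrl in controllers:
        # Controller hierarchies are read-only as a whole...
        plan.append((f"{CGROUP_ROOT}/{ctrl}/", "ro"))
        if delegate:
            # ...and with delegation the container's subtree becomes writable.
            plan.append((f"{CGROUP_ROOT}/{ctrl}/machine.slice/{scope}/", "rw"))
    return plan


if __name__ == "__main__":
    for path, mode in cgroup_mount_plan("machine-xxx.scope",
                                        ["cpu,cpuacct", "memory"],
                                        delegate=True):
        print(f"{path} --> mounted {mode.upper()}")
```

With delegate=False the plan matches what nspawn mounts today; with
delegate=True it matches the layout proposed in the quoted mail.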

Lennart

-- 
Lennart Poettering, Red Hat