On Thu, Jul 18, 2024, 15:43 Thomas Köller <tho...@koeller.dyndns.org> wrote:
> On 18.07.24 at 14:04, Mantas Mikulėnas wrote:
> > Yes, but namespace persistence actually relies on filesystem access –
> > it's implemented as a bind-mount of the namespace file descriptor (onto
> > /run/netns for the 'ip netns' tool), as otherwise namespaces only exist
> > as long as processes that hold them.
> >
> > So if you have any service options that cause a new *mount* namespace to
> > be created (preventing its filesystem mounts from being visible outside
> > the unit), then it cannot pin persistent network namespaces.
>
> Quoting the manual page:
>
> ProtectSystem=
>     Takes a boolean argument or the special values "full" or
>     "strict". If true, mounts the /usr/ and the boot loader directories
>     (/boot and /efi) read-only for processes invoked by this unit. If set
>     to "full", the /etc/ directory is mounted read-only, too.
>
> No mention of /var or /run.

It still works this way whether it's mentioned or not. Once the unit's
process is put in a new mount namespace, the entire `/` is marked private
so that any mounts made underneath `/` remain visible only in that
namespace. This equally affects the "read-only /etc" mount done by systemd
itself, the /run/netns mount done by 'ip', and any other mounts done
anywhere else.

In theory it would be possible to carve out exceptions such as marking
/run shared again, but then /run/systemd would need to be marked private
again, etc. – and mount propagation across namespaces is complex enough as
it is.

> Also, note that the bind mounts in
> /var/run/netns and /run/netns are actually created by 'ip netns add',
> they just aren't usable.

No, the mount *points* in /run/netns are created (as regular empty files),
but they don't become actual mounts, which is why they're not usable.
There's a distinction between mount points (files or directories seen in
`ls`) and mounts (seen in `findmnt`) – make your service script log its
findmnt output to a file and compare it to the findmnt output seen from
the outside.
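If findmnt isn't handy inside the service, here is a rough sketch of the same comparison. It reads /proc/self/mountinfo directly and prints the mount point, filesystem type and propagation state of everything at or below a given path. The function name `list_mounts_under` and its optional second argument are made up for illustration, not part of any existing tool.

```shell
#!/bin/sh
# list_mounts_under PREFIX [MOUNTINFO]
#   (hypothetical helper, name and arguments are illustrative only)
# Prints "mountpoint fstype propagation" for every mount at or below PREFIX.
# MOUNTINFO defaults to /proc/self/mountinfo, i.e. the mount table as seen
# by the calling process in its own mount namespace.
list_mounts_under() {
    prefix=$1
    mountinfo=${2:-/proc/self/mountinfo}
    awk -v prefix="$prefix" '
        {
            mp = $5                      # field 5 is the mount point
            if (mp != prefix && index(mp, prefix "/") != 1) next
            # Optional fields (shared:N, master:N, ...) start at field 7
            # and run until a lone "-" separator.
            prop = ""
            for (i = 7; i <= NF && $i != "-"; i++)
                prop = prop (prop == "" ? "" : ",") $i
            # No optional fields at all means the mount is private.
            if (prop == "") prop = "private"
            print mp, $(i + 1), prop     # the field after "-" is the fs type
        }
    ' "$mountinfo"
}
```

Calling `list_mounts_under /run/netns > /tmp/inside.txt` from the service script, and the same thing with a different file name from a normal root shell, gives two files you can `diff`. An entry that shows up only in the "inside" file is a mount that never left the unit's namespace.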

(ember) /home/grawity $ mount | grep netns
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,size=3268196k,nr_inodes=819200,mode=755,inode64)
(ember) /home/grawity $ sudo systemd-run --shell -p ProtectSystem=full
Running as unit: run-u1253.service; invocation ID: 9d4675b9ef7c40d68486b3058ee8a60b
Press ^] three times within 1s to disconnect TTY.
root@ember /home/grawity # mount | grep netns
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,size=3268196k,nr_inodes=819200,mode=755,inode64)
root@ember /home/grawity # ip netns add foo
root@ember /home/grawity # mount | grep netns
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,size=3268196k,nr_inodes=819200,mode=755,inode64)
nsfs on /run/netns/foo type nsfs (rw)
root@ember /home/grawity # exit
Finished with result: success
Main processes terminated with: code=exited, status=0/SUCCESS
Service runtime: 18.451s
(ember) /home/grawity $ mount | grep netns
tmpfs on /run/netns type tmpfs (rw,nosuid,nodev,size=3268196k,nr_inodes=819200,mode=755,inode64)
(ember) /home/grawity $

(The non-systemd rough equivalent is `unshare --mount --propagation=private`,
and you can attach to a namespace using `nsenter` – an "ip netns exec" is
approximately an `nsenter --net`.)
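
To check whether a given process really sits in a different mount namespace than your shell, you can compare the namespace links under /proc. This is only a sketch: the function name `same_namespace` is made up, and reading another process's /proc/PID/ns entries generally needs root.

```shell
#!/bin/sh
# same_namespace TYPE PID1 PID2
#   (hypothetical helper, name is illustrative only)
# TYPE is a namespace name as found under /proc/PID/ns, e.g. "mnt" or "net".
# Returns 0 if both processes are in the same namespace of that type,
# 1 if they are in different ones, and 2 if either link cannot be read.
same_namespace() {
    a=$(readlink "/proc/$2/ns/$1" 2>/dev/null) || return 2
    b=$(readlink "/proc/$3/ns/$1" 2>/dev/null) || return 2
    # The link targets look like "mnt:[4026531840]"; equal text means
    # the same namespace.
    [ "$a" = "$b" ]
}
```

For example, `same_namespace mnt $$ "$(systemctl show --property=MainPID --value yourunit.service)"` (with `yourunit.service` standing in for the real unit name) returning 1 would mean the unit has its own mount namespace, so anything it bind-mounts onto /run/netns stays invisible to your shell.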