On Wednesday 2012-01-11 15:26, Lennart Poettering wrote:
>On Wed, 11.01.12 14:44, Jan Engelhardt (jeng...@medozas.de) wrote:
>
>> >> Forcing the use of @ introduces a policy, which should preferably not be
>> >> done. Since programs started from the initrd obviously should be having
>> >> a /proc/*/{cwd,exe} symlinks pointing to the initramfs vfsmount.
>> >
>> >They are in a different namespace, so that wouldn't work.
>>
>> Namespace as in clone(2)'s CLONE_NEWNS?
>
>No, my expression was a bit unclean there.
>
>What I meant is that the path in argv[0] and similar stops making sense
>after the switch to the root fs, since we did a MS_MOVE there, which
>invalidates all old paths...
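The namespace question in the quoted exchange can be checked directly on kernels newer than this thread: since Linux 3.8, each process has a /proc/PID/ns/mnt link whose target identifies its mount namespace, so two processes share a mount namespace exactly when their links match. The following is a minimal sketch under that assumption; the helper names mnt_ns_of and same_mnt_ns are hypothetical, and reading another user's links usually requires root.

```bash
#!/bin/bash
# Sketch: compare the mount namespaces of two processes via /proc/PID/ns/mnt
# (Linux 3.8 or later). Helper names are hypothetical, not from any tool.

# Print the mount-namespace identifier of a process, e.g. "mnt:[4026531840]".
# Accepts a numeric PID or "self".
mnt_ns_of() {
    readlink "/proc/$1/ns/mnt"
}

# Exit status 0 if processes $1 and $2 share a mount namespace,
# 1 if they do not, 2 if either link cannot be read.
same_mnt_ns() {
    local a b
    a=$(mnt_ns_of "$1") || return 2
    b=$(mnt_ns_of "$2") || return 2
    [ "$a" = "$b" ] || return 1
}

echo "this shell ($$): $(mnt_ns_of "$$")"
# A ( ... ) subshell is forked without CLONE_NEWNS, so it stays in the
# parent's mount namespace; inside it, $BASHPID is the subshell's own PID.
( same_mnt_ns "$$" "$BASHPID" && echo "forked subshell $BASHPID shares it" )
```

A plain fork, like the subshell above, never changes namespaces; only CLONE_NEWNS (via clone(2) or unshare(2)) would make the two links differ.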
Yeah, I was not talking about argv[0], since that is user-changeable anyhow. My words were about /proc/self/exe, which is a link to the absolute path of the executable and may differ from argv[0].

>But yeah, there's no new vfs namespace opened, just some major changes in
>what means what in the original namespace.

Since everybody seems to have a knot in their brain right now, let's attempt to shed some more light.

A mount namespace is a set of vfsmounts. The vfsmounts in your current mount namespace can be listed through /proc/self/mounts or /proc/self/mountinfo. Passing CLONE_NEWNS to clone(2) (or to unshare(2)) creates a new namespace that inherits all vfsmounts and their positions; as far as I know, that flag is the only way to create one. chroot does _NOT_ create a new mount namespace: a vfsmount created within the chroot can be unmounted by a different process outside the chroot jail. Since, as I gather, the system initialization procedures in dracut and systemd don't issue CLONE_NEWNS, we can ignore namespaces completely for the moment.

Now, the kernel has a rootfs-type vfsmount initially mounted on /. This is where your initramfs cpio is extracted to. You can see it as the first entry in /proc/self/mounts ("rootfs / rootfs rw 0 0").

Since commands say more than a thousand words:

    /bin/sleep 99999 &
    pid=$!
    mount /dev/sda3 /mnt
    mkdir /mnt/var/run/rootfs
    cd /mnt
    pivot_root /mnt /mnt/var/run/rootfs
    readlink -f /proc/$pid/exe
    => should now yield /var/run/rootfs/bin/sleep

Therefore you can detect which programs were started inside the rootfs vfsmount. That information can then influence killing decisions as needed.

Now, Kay Sievers claims (on IRC) that pivot_root is "10 years ago stuff" and points to util-linux's switchroot function for how things are supposed to be done today.
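The detection step described in this message (finding programs whose /proc/PID/exe resolves under /var/run/rootfs after the pivot_root) might be scripted roughly as follows. This is a sketch, not existing tooling: the helper name pids_with_exe_under is hypothetical, the default prefix is taken from the example commands, and it reads /proc/PID/exe rather than argv[0] because argv[0] is user-settable.

```bash
#!/bin/bash
# Sketch: list PIDs whose executable, as reported by the kernel through
# /proc/PID/exe, lies under a directory prefix such as /var/run/rootfs/.
# argv[0] (/proc/PID/cmdline) is deliberately ignored: it is user-settable.
# Processes whose exe link cannot be read (kernel threads, other users'
# processes when not root, processes that just exited) are skipped.

# Usage: pids_with_exe_under PREFIX   (hypothetical helper name)
pids_with_exe_under() {
    local prefix=${1%/}/   # exactly one trailing slash; "" or "/" matches all
    local dir exe
    for dir in /proc/[0-9]*; do
        exe=$(readlink "$dir/exe" 2>/dev/null) || continue
        case $exe in
            "$prefix"*) echo "${dir#/proc/}" ;;
        esac
    done
}

# Example: after the pivot_root sequence, list what still runs from the
# old (initramfs) root; the default path comes from that example.
pids_with_exe_under "${1:-/var/run/rootfs}"
```

The script only reports PIDs; how the list should feed into killing decisions is left open in the message, so that policy is not sketched here.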
But if we look at http://git.kernel.org/?p=utils/util-linux/util-linux.git;a=blob;f=sys-utils/switch_root.c;hb=HEAD#l150, what can really be seen there is that the new root (/dev/sda3 in my example commands above) is just moved (MS_MOVE) on top of the rootfs-type vfsmount, thereby concealing it. (That is not really a replacement for what pivot_root does.)

Of course, if you conceal the rootfs-type vfsmount, there is no way the proc trick is going to work. That is why I proposed using pivot_root instead of {MS_MOVE + chroot} and *keeping* the rootfs vfsmount around, in a visible fashion.

Similarly, when systemd wants to return to the initramfs, it can just pivot_root again, this time by

    cd /var/run/rootfs
    pivot_root /var/run/rootfs /var/run/rootfs/mnt

(or the C equivalent using pivot_root(2), of course).

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel