On 1/8/26 11:37, Dima Zavin wrote:
> Commit c04b565204eb6b7e3508ac8dd42539ab97752635 reworked how switch_root
> moves mounts into the new root, but it inadvertently removed the moving of
> the root itself onto / for the mount namespace before chrooting.

> This confuses future users of the mount namespaces since root mount gets
> preserved and thus entering any derived mount namespace retains the pre-chroot
> structure.

Sigh, switch_root is one of the commands I need to get scripts/test.sh to run under mkroot to automatically regression test.

> Found in yocto (scarthgap, toybox 0.8.11) where the mount namespaces
> contained just /rootfs in their /. Repro is simple:

Before:

```
% sudo nsenter -m -t $$
nsenter: failed to execute /bin/sh: No such file or directory

Huh, does nsenter -m effectively do a chdir / ? Does it _always_ break out of a normal chroot?

  $ cd toybox/root/x86_64
  $ sudo chroot fs
  password:
  $ mount -t proc proc /proc
  $ nsenter -m -t $$ /bin/sh
  # ls
  # head -n 1 /etc/os-release
  PRETTY_NAME="Devuan GNU/Linux 5 (daedalus)"

Apparently so. Good to know, I guess. (Dear lkml: what the? I know you refused to patch the cd ../../../.. hole but this is just silly.)

% sudo nsenter -m -t $$ /rootfs/usr/lib64/ld-linux-x86-64.so.2 \
    --library-path /rootfs/lib:/rootfs/lib64:/rootfs/usr/lib64:/rootfs/usr/lib \
   /rootfs/usr/sbin/chroot.coreutils /rootfs

You manually ran the dynamic linker against chroot.coreutils, to chroot into /rootfs, within which I'm assuming it ran /bin/sh. Not sure what that proved, you just chrooted _back_ without the mount --move a second time.

#
```

After:
```
% sudo nsenter -m -t $$
#
```

Fixes #557

Signed-off-by: Dima Zavin <[email protected]>
---
 toys/other/switch_root.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/toys/other/switch_root.c b/toys/other/switch_root.c
index 1c750608f..b63b92ec3 100644
--- a/toys/other/switch_root.c
+++ b/toys/other/switch_root.c
@@ -97,6 +97,12 @@ void switch_root_main(void)
   // Ok, enough safety checks: wipe root partition.
   dirtree_read("/", del_node);
+  // Fix the appearance of the mount table in the newroot chroot
+  if (mount(".", "/", NULL, MS_MOVE, NULL)) {
+    perror_msg("mount");
+    goto panic;
+  }
+
   // Enter the new root before starting init
   if (chroot(".")) {
     perror_msg("chroot");
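
For readers following along, the ordering the patch restores (move the new root's mount onto / and only then chroot) can be sketched in isolation roughly like this. This is not the toybox code: enter_new_root() is a hypothetical helper name, it assumes Linux with <sys/mount.h>, and it needs root privileges to get past the first step on a real system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

// Hypothetical helper (illustration only, not from toybox): make newroot
// the root of the current mount namespace, then chroot into it.
// Returns 0 on success, -1 at the first step that fails.
static int enter_new_root(const char *newroot)
{
  assert(newroot != NULL);

  // Work relative to the new root so "." names it in the calls below.
  if (chdir(newroot)) {
    perror("chdir");
    return -1;
  }

  // Move the new root's mount on top of "/", so the mount namespace
  // itself is rooted at the new filesystem rather than the old one.
  if (mount(".", "/", NULL, MS_MOVE, NULL)) {
    perror("mount");
    return -1;
  }

  // This process still resolves "/" to the old root until it chroots.
  if (chroot(".")) {
    perror("chroot");
    return -1;
  }

  // Stop holding the old tree through the working directory.
  if (chdir("/")) {
    perror("chdir");
    return -1;
  }

  return 0;
}
```

Skipping the MS_MOVE step still leaves the calling process in the right place after chroot(), which is presumably why the regression went unnoticed until something entered the namespace from outside.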

In theory the dirtree_read("/") is supposed to operate on "/" as well as the children. In practice there's a sequencing issue with mounts being under other mounts (of which this is a trivial case). If you have two mount points arranged dir1/dir2, you need to move dir2 to /tmp, move dir1 to new, and then move /tmp to new/dir1/dir2. (There's no MS_MOVE_ALL flag I'm aware of.)

The easy fix for the current case is to DIRTREE_COMEAGAIN and handle all the moves in the second callback, that way all children are handled before their parents. (This avoids adding a second explicit mount() call when the first mount() call can theoretically already handle it. Single Point of Truth and all that...)
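
The "children before parents" ordering is an ordinary post-order walk. As a rough, standalone illustration (not toybox code), POSIX nftw() with FTW_DEPTH stands in here for a dirtree callback that returns DIRTREE_COMEAGAIN and acts on the second visit. The names seen, visit(), walk_children_first(), and visit_index() are made up for this sketch, and it only records directory order rather than moving any mounts.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_SEEN 64

// Hypothetical record of the order directories were visited in.
static char seen[MAX_SEEN][256];
static int seen_count;

// With FTW_DEPTH a directory is reported as FTW_DP, after everything
// beneath it has already been reported. This is where a real tool
// would act on the directory (for example, move a mount).
static int visit(const char *path, const struct stat *st, int type,
                 struct FTW *ftw)
{
  (void)st;
  (void)ftw;
  if (type == FTW_DP && seen_count < MAX_SEEN)
    snprintf(seen[seen_count++], sizeof(seen[0]), "%s", path);
  return 0;  // a nonzero return would abort the walk
}

// Walk root in post-order, recording directories children-first.
// Returns 0 on success, -1 on error (same convention as nftw).
static int walk_children_first(const char *root)
{
  assert(root != NULL);
  seen_count = 0;
  return nftw(root, visit, 16, FTW_DEPTH | FTW_PHYS);
}

// Position of path in the recorded order, or -1 if it was not visited.
static int visit_index(const char *path)
{
  for (int i = 0; i < seen_count; i++)
    if (!strcmp(seen[i], path)) return i;
  return -1;
}
```

For a tree containing dir1/dir2, the walk records dir1/dir2 first, then dir1, then the starting directory, so anything done in the callback reaches a child before its parent.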

This doesn't solve the larger problem (ala /dev being a devtmpfs and /dev/pts being a devpts), but might address _this_ issue without adding significant code.

Do I _want_ to try to fix the larger issue? I'd need an arbitrary number of mountpoints to hold arbitrarily deep trees while moving them, and I'm not guaranteed to have any writeable space to mkdir in. That's why I didn't try to tackle it before. In theory "switch_root before doing your setup" has been the order of the day... in which case you don't need to care about any child mounts, you just want to swap two mounts the way pivot_root does.

Would the simpler non-recursive version break anybody? I have no idea. You'd want to move /dev if CONFIG_DEVTMPFS_MOUNT worked, but the kernel guys have refused https://landley.net/bin/mkroot/0.8.13/linux-patches/0003-Wire-up-CONFIG_DEVTMPFS_MOUNT-to-initramfs.patch and friends for NINE YEARS now. (Which is why the stupid "static initramfs has no stdin/stdout/stderr when it launches PID 1" bug keeps cropping back up, because the kernel has inconsistent behavior in different codepaths...)

Are there existing users who would be broken by doing less, or is everybody just calling switch_root as the first thing and then having the "real" init script live in the new filesystem?

Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
