[Bug 1824407] Re: why does booting any livefs squashfs has kernel complaining about unable to read metadata something rather

Dimitri John Ledkov Tue, 22 Oct 2019 07:21:34 -0700

** Description changed:

- Apr 11 18:32:52 ubuntu-server kernel: SQUASHFS error: squashfs_read_data failed to read block 0x6ff3660032757063
- Apr 11 18:32:52 ubuntu-server kernel: SQUASHFS error: Unable to read metadata cache entry [6ff3660032757063]
- Apr 11 18:32:55 ubuntu-server kernel: SQUASHFS error: squashfs_read_data failed to read block 0x6261746d79732e
- Apr 11 18:32:55 ubuntu-server kernel: SQUASHFS error: Unable to read metadata cache entry [6261746d79732e]
- Apr 11 18:33:05 ubuntu-server kernel: SQUASHFS error: squashfs_read_data failed to read block 0x6ff366df00333a37
- Apr 11 18:33:05 ubuntu-server kernel: SQUASHFS error: Unable to read metadata cache entry [6ff366df00333a37]
+ 1) Download the focal subiquity daily image
+ 2) Boot, press ESC, and edit the boot command line (F6 in BIOS, e in UEFI)
+ 3) Before --- insert the following options
+  bebroken debug init=/bin/bash 
+ 4) Continue boot (Enter in BIOS, Ctrl+X in UEFI)
  
- Happens when booting e.g. subiquity disco image. v5.0.0-8-generic kernel
+ 5) You will be dropped into the pivoted root filesystem, before systemd
+ is exec'ed as PID 1.
+ 6) /run/initramfs/ will contain a debug log showing how everything was
+ mounted: the cdrom is mounted, the squashfs images are set up with
+ losetup from there, a multi-lower overlay is set up from them and moved
+ to /root, and finally pivot-root to /root makes it /. The underlying
+ layers are moved into /cow for your convenience.
+ 
+ 7) At this point, modifying zero-byte files that exist in the lowest
+ layer but not in the middle one, in certain ways, will result in them
+ being corrupted after / is remounted.
+ 
+ 8) Exhibit A:
+ $ cat /etc/machine-id
+ (no output)
+ $ systemd-machine-id-setup
+ $ cat /etc/machine-id
+ (some machine id)
+ $ mount -o remount /
+ $ cat /etc/machine-id
+ I/O error
+ with overlay errors in dmesg
+ 
+ Similarly one can reproduce this with /etc/.pwd.lock & executing
+ systemd-sysusers.
+ 
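To look for other files that meet the condition in step 7 (zero-byte, present in the lowest layer, absent from the middle one), something like the following might help. It is only a sketch: the helper name list_candidates is made up here, and the demo runs on scratch directories because the actual layer mount points under /cow are not named in this report.

```shell
#!/bin/sh
# List zero-byte regular files that exist in the lowest layer but have no
# counterpart in the middle layer (the condition described in step 7).
# Usage: list_candidates LOWEST_DIR MIDDLE_DIR
list_candidates() {
    lowest="$1"
    middle="$2"
    ( cd "$lowest" && find . -type f -size 0 ) | while IFS= read -r rel; do
        rel="${rel#./}"
        # Report the path only if nothing with that name exists in the middle layer.
        [ -e "$middle/$rel" ] || printf '/%s\n' "$rel"
    done
}

# Demo on scratch directories standing in for the real layers (assumption).
demo=$(mktemp -d)
mkdir -p "$demo/lowest/etc" "$demo/middle/etc"
: > "$demo/lowest/etc/machine-id"        # empty, lowest layer only
: > "$demo/lowest/etc/.pwd.lock"         # empty, lowest layer only
: > "$demo/lowest/etc/shadowed"          # empty, but also present in the middle layer
: > "$demo/middle/etc/shadowed"
echo data > "$demo/lowest/etc/hostname"  # not zero-byte

list_candidates "$demo/lowest" "$demo/middle"
```

On a booted image the two arguments would be the lowest and middle layer directories found under /cow.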
+ systemd-machine-id-setup is probably the easiest to trace. It does a
+ simple open, truncate, lseek, write. On boot, the actual remount is done
+ by starting a unit which calls /lib/systemd/systemd-remount-fs.
+ 
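The open, truncate, lseek, write pattern can be approximated from the shell for experimenting. This is a rough sketch, not how systemd-machine-id-setup is implemented: the ID is just 16 random bytes in hex, and the target is a scratch file standing in for the empty /etc/machine-id.

```shell
#!/bin/sh
# Approximate an open / truncate / seek-to-0 / write sequence from the shell.
# Only the order of operations is mimicked; the real tool is not shell.
f=$(mktemp)    # stands in for the zero-byte /etc/machine-id (assumption)

# Hypothetical ID: 16 random bytes rendered as 32 lowercase hex characters.
id=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')

# Open the existing file and truncate it to zero length.
: > "$f"

# Reopen without truncating again (conv=notrunc) and write the ID plus a
# newline starting at offset 0 (seek=0).
printf '%s\n' "$id" | dd of="$f" bs=64 seek=0 conv=notrunc 2>/dev/null

cat "$f"
```

To try this against the real bug, the same sequence would be pointed at /etc/machine-id in the environment from step 5 and followed by the remount from Exhibit A. Whether this approximation triggers the corruption the same way the real tool does is untested here.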
+ Lots of things break once machine-id and .pwd.lock are corrupted, e.g.
+ unable to dhcp, connect to dbus, add/remove/change users or groups, etc.
+ 
+ We were unable to recreate the issue outside of booting things with
+ casper, i.e. statically on a regular host machine without pivot-root.
+ But hopefully booting to a quiet state with nothing running is
+ sufficient to reproduce this.
+ 
+ Instead of booting with `bebroken init=/bin/bash` you can boot with
+ `bebroken systemd.mask=systemd-remount-fs.service`. This will complete
+ the boot with /etc/machine-id & .pwd.lock modified, meaning that a
+ remount of / will cause I/O errors on those files.
+ 
+ Currently, we are shipping two hacks in casper to "rm" the offending
+ files and create them again on the upper rw layer. They then survive
+ remount without I/O errors. However, we'd rather not ship those hacks,
+ and would prefer to have kernel overlay fixed to work correctly with
+ multi-lower-dir and not corrupt files upon remounting /.
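
The "rm and create again" workaround described above might look roughly like this. It is a sketch under assumptions, not the actual casper code: the helper name recreate_on_upper is invented, file mode and ownership are not preserved, and the demo runs in a scratch directory instead of the live /etc.

```shell
#!/bin/sh
# Hypothetical helper: replace a file inherited from a lower layer with a
# fresh zero-byte file, so that the new file is created on the upper rw layer.
recreate_on_upper() {
    f="$1"
    [ -e "$f" ] || return 0    # nothing to do if the file is absent
    rm -f "$f" && : > "$f"     # remove, then create an empty file again
}

# Demo against a scratch tree; on an affected system the targets would be
# /etc/machine-id and /etc/.pwd.lock.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/etc"
: > "$demo_root/etc/machine-id"
: > "$demo_root/etc/.pwd.lock"

for f in "$demo_root/etc/machine-id" "$demo_root/etc/.pwd.lock"; do
    recreate_on_upper "$f"
done

ls -la "$demo_root/etc"
```

On an overlay mount, removing the lower-layer file and creating a new one with the same name is what places the new file in the upper layer. In a plain directory, as in this demo, it only shows the sequence of operations.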


** Changed in: linux (Ubuntu)
       Status: Incomplete => New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1824407

Title:
   why does booting any livefs squashfs has kernel complaining about
  unable to read metadata something rather

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1824407/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
