[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
I ran the above test:
  https://autopkgtest.ubuntu.com/results/autopkgtest-jammy-dannf-test/jammy/amd64/l/livecd-rootfs/20240123_035147_6470b@/log.gz

It does appear that systemd-udevd is trying to scan partitions at the
same time as losetup:

1599s ++ losetup --show -f -P -v binary/boot/disk-uefi.ext4
1600s + loop_device=/dev/loop0
1600s + '[' '!' -b /dev/loop0 ']'
1600s + rootfs_dev_mapper=/dev/loop0p1
1600s + '[' '!' -b /dev/loop0p1 ']'
1600s + echo '/dev/loop0p1 is not a block device'
1600s /dev/loop0p1 is not a block device
1600s + echo '=== dmesg ==='
1600s === dmesg ===
1600s + dmesg -c
1600s [ 986.014824] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
1600s [ 992.684380] EXT4-fs (loop0p1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
1600s [ 1043.171603] loop0: detected capacity change from 0 to 4612096
1600s [ 1043.171924] loop0: p1 p14 p15
1600s [ 1043.190421] loop0: p1 p14 p15
1600s + cat /sys/kernel/debug/tracing/trace
1600s # tracer: function
1600s #
1600s # entries-in-buffer/entries-written: 2/2 #P:4
1600s #
1600s #_-=> irqs-off
1600s # / _=> need-resched
1600s # | / _---=> hardirq/softirq
1600s # || / _--=> preempt-depth
1600s # ||| / _-=> migrate-disable
1600s # / delay
1600s # TASK-PID CPU# | TIMESTAMP FUNCTION
1600s # | | | | | |
1600s losetup-50167 [002] . 1043.176845: bdev_disk_changed <-loop_reread_partitions
1600s systemd-udevd-321 [000] . 1043.195003: bdev_disk_changed <-blkdev_get_whole
1600s + echo 0
1600s + ls -l /dev/loop0p1
1600s brw--- 1 root root 259, 3 Jan 23 03:51 /dev/loop0p1
1600s + exit 1
1600s + clean_loops

Maybe we just need something like this?
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 48c530b83000e..52fda87f5d674 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1366,13 +1366,13 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
 	if (partscan)
 		lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
 
-	/* enable and uncork uevent now that we are done */
-	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
-
 	loop_global_unlock(lo, is_loop);
 	if (partscan)
 		loop_reread_partitions(lo);
 
+	/* enable and uncork uevent now that we are done */
+	dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
+
 	if (!(mode & FMODE_EXCL))
 		bd_abort_claiming(bdev, loop_configure);

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable,
  race-free way of loop-mounting partitions from a disk image during
  image build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx. The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is
  https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

  https://autopkgtest.ubuntu.com/packages/l/livecd-rootfs/noble/ppc64el

  The first failure happened on November 16. While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp
[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
I ran into this on jammy/amd64:
  https://autopkgtest.ubuntu.com/results/autopkgtest-jammy/jammy/amd64/l/livecd-rootfs/20240121_173406_e4f9a@/log.gz

I downloaded all of the amd64 failures and searched for this failure
pattern. These were the kernels that were running at the time:

"Linux 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023"
"Linux 6.2.0-21-generic #21-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 14 12:34:02 UTC 2023"
"Linux 6.3.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun 8 16:02:30 UTC 2023"
"Linux 6.5.0-9-generic #9-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 7 01:35:40 UTC 2023"
"Linux 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 30 10:27:29 UTC 2023"

Here's the count of failures per image type:

     12 017-disk-image-uefi.binary
      3 018-disk-image.binary
      3 020-kvm-image.binary
      1 023-vagrant.binary
      1 024-vagrant.binary

I can confirm that /dev/loop0p1 is created by devtmpfs. This surprised
me because I'd never actually needed to know what devtmpfs was, and I
saw devices being created even though I had SIGSTOP'd systemd-udevd.
But watching udevadm monitor and forktrace output convinced me.

I had a theory that something was opening the first created partition
before all partitions were created. loop_reread_partitions() can fail
without returning an error to userspace:
  https://elixir.bootlin.com/linux/v5.15.147/source/drivers/block/loop.c#L676

That could happen if bdev_disk_changed() aborts because it finds that
another partition on the device is open:
  https://elixir.bootlin.com/linux/v5.15.147/source/block/partitions/core.c#L662

But then we should see this in dmesg:
  pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n"

I added dmesg calls to check that:
  https://autopkgtest.ubuntu.com/results/autopkgtest-jammy-dannf-test/jammy/amd64/l/livecd-rootfs/20240122_161631_62ecd@/log.gz

.. but no such message appeared, so that's not it.
But what *is* interesting there is that it shows *2* partition scan
lines:

1248s [ 990.855361] loop0: detected capacity change from 0 to 4612096
1248s [ 990.855628] loop0: p1 p14 p15
1248s [ 990.874241] loop0: p1 p14 p15

Previously we just saw 1:

1189s [ 932.268459] loop0: detected capacity change from 0 to 4612096
1189s [ 932.268715] loop0: p1 p14 p15

That only gets printed when bdev_disk_changed() is called. So do we
have 2 racing callers?

One thing that seems off is that loop_configure() unsuppresses uevents
for the full device before the partition scan, but loop_change_fd()
waits until the partition scan is complete. Shouldn't they be following
the same pattern? I wonder if that could cause the following race:

[livecd-rootfs] losetup creates /dev/loop0
[livecd-rootfs] kernel sends uevent for /dev/loop0
[livecd-rootfs] /dev/loop0p* appear in devtmpfs
[udev] receives uevent for loop0
[udev] partprobe /dev/loop0
[livecd-rootfs] losetup exit(0)
[partprobe] /dev/loop0p* cleared
[livecd-rootfs] check for /dev/loop0p1 FAILS
[partprobe] /dev/loop0p* recreated

I tried checking for this using ftrace in a local jammy VM. I haven't
been able to reproduce this in a local VM, but I wanted to see what
happens in a normal losetup.. er... setup.

> First I used losetup to create the device:

root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# loopdev="$(losetup --show -f -P -v /home/ubuntu/disk.img)"
root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 1/1 #P:1
#
#_-=> irqs-off
# / _=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# / delay
# TASK-PID CPU# | TIMESTAMP FUNCTION
# | | | | | |
losetup-1996 [000] . 657.573994: bdev_disk_changed <-loop_reread_partitions

> Only the expected bdev_disk_changed() call

> Then I remove the device:

root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# losetup -v -d $loopdev
root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 3/3 #P:1
#
#_-=> irqs-off
# / _=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# / delay
# TASK-PID CPU# | TIMESTAMP FUNCTION
# | | | | | |
losetup-1996 [000] . 657.573994: bdev_disk_changed <-loop_reread_partitions
systemd-udevd-2524 [000] . 680.555336: bdev_disk_changed <-blkdev_get_whole
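For anyone wanting to repeat this kind of tracing, here is a minimal sketch of the setup. It assumes the function tracer and the standard tracefs control files (tracing_on, set_ftrace_filter, current_tracer, trace). The helper name trace_function and its directory argument are hypothetical and are not taken from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): record every call to one kernel
# function using the ftrace function tracer. Needs root on a real tracefs.
#   $1 = tracing directory, e.g. /sys/kernel/debug/tracing
#   $2 = kernel function to trace, e.g. bdev_disk_changed
trace_function() {
    tracing_dir=$1
    func=$2
    echo 0 > "$tracing_dir/tracing_on" || return 1          # pause tracing
    echo "$func" > "$tracing_dir/set_ftrace_filter" || return 1
    echo function > "$tracing_dir/current_tracer" || return 1
    : > "$tracing_dir/trace" || return 1                    # drop old entries
    echo 1 > "$tracing_dir/tracing_on" || return 1          # resume tracing
}
```

Usage would be something like `trace_function /sys/kernel/debug/tracing bdev_disk_changed`, then running losetup, then `cat /sys/kernel/debug/tracing/trace` to see which tasks called the function and from where.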
Re: [Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
> Is only asking kernel to scan the device; to then generate "kernel udev"
> events; for then udev to wakeup and process/emit "udev udev" events; and
> create the required device nodes.
>

It's not udev that creates nodes like /dev/loop1p1 though is it? That's
devtmpfs surely.
[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
I don't have a good explanation, but in the past I've "fixed" such
races by adding a `sync "$loop_device"` before using any of the newly
created partitions. Maybe it's worth trying.
[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
My expectation is that udev should be running (somewhere, not sure if
it needs to be both the host and the lxd guest) and that it should
process the device using locks
https://systemd.io/BLOCK_DEVICE_LOCKING/.

After that is done, the device should be safe to operate on, in a
consistent manner.

After all,

  -P, --partscan
      Force the kernel to scan the partition table on a newly created
      loop device. Note that the partition table parsing depends on
      sector sizes. The default is sector size is 512 bytes, otherwise
      you need to use the option --sector-size together with
      --partscan.

Is only asking kernel to scan the device; to then generate "kernel
udev" events; for then udev to wakeup and process/emit "udev udev"
events; and create the required device nodes.

We have always been fixing and supporting running udev inside the lxd
containers, because of such things (in contexts of privileged
containers, but outside of lp-buildd) to make all of this work.
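As far as the convention at https://systemd.io/BLOCK_DEVICE_LOCKING/ goes, udev takes a shared BSD lock on the whole-disk node before probing it, and skips the device while another process holds an exclusive lock. Below is a minimal sketch of taking that lock around a command with flock(1) from util-linux. The wrapper name with_device_lock is hypothetical, and nothing in this thread establishes that holding the lock closes the losetup -P race.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the thread): hold an exclusive BSD lock on a
# device node while a command runs, following the udev locking convention.
#   $1 = whole-disk device node (e.g. /dev/loop0), rest = command to run
with_device_lock() {
    dev=$1
    shift
    # flock(1) opens "$dev", takes the exclusive lock, runs the command,
    # and exits with the command's exit status.
    flock -x "$dev" "$@"
}
```

For example, `with_device_lock "$loop_device" ls -l "${loop_device}p1"` would list the partition node while udev is kept from probing the device. The lock has to go on the whole-disk node, not on a partition node.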
[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
Oh. To the question of whether there was a systemd change in this
window: yes absolutely, because this is the point at which the riscv64
builders moved from lgw manually-operated qemu with a 20.04 guest
image, to bos03 openstack-operated qemu with a 22.04 guest image.

Which is also why we've moved from 5.13.0-1019-generic to
5.19.0-1021-generic.

But again, it was my understanding that these devices are supposed to
be created synchronously WITHOUT the involvement of udev. In fact, we
had to make launchpad-buildd changes to make use of these devices at
all because udev would NOT set them up for us. So if these are now
being set up via udev, that's a significant departure from expectations
and it's not clear we even CAN have synchronous behavior given that
they would be set up by the host udev and not the udev in the lxd
container!
Re: [Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
On Sat, Dec 09, 2023 at 05:13:28PM -, Andy Whitcroft wrote:
> Was there any systemd/udev change in this timeframe? As the device
> files are very much connected to those.

My understanding is that these devices are supposed to be created
directly by the kernel on devtmpfs and NOT via udev, which is part of
how we expected to fix the earlier races.

And systemd did not change in this time frame in any release.

If there was a change to the HOST udev in this timeframe causing a
regression because a new base image was published that includes a newer
udev, we don't have visibility on it.
[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
Was there any systemd/udev change in this timeframe? As the device
files are very much connected to those.
[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
https://launchpad.net/~ubuntu-cdimage/+livefs/ubuntu/noble/cpc/+build/544490
is a log from a build with a new livecd-rootfs that spits out more
debugging info on failure.

+ sgdisk binary/boot/disk-uefi.ext4 --print
Disk binary/boot/disk-uefi.ext4: 9437184 sectors, 4.5 GiB
Sector size (logical): 512 bytes
Disk identifier (GUID): CD1DD3AE-E4C8-4C5F-BD64-9236C39B9824
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 9437150
Partitions will be aligned on 2-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size        Code  Name
   1          235520         9437150   4.4 GiB      8300
  12          227328          235519   4.0 MiB      8300  CIDATA
  13              34            2081   1024.0 KiB         loader1
  14            2082           10239   4.0 MiB            loader2
  15           10240          227327   106.0 MiB    EF00
+ mount_image binary/boot/disk-uefi.ext4 1
+ trap clean_loops EXIT
+ backing_img=binary/boot/disk-uefi.ext4
+ local rootpart=1
++ losetup --show -f -P -v binary/boot/disk-uefi.ext4
+ loop_device=/dev/loop5
+ '[' '!' -b /dev/loop5 ']'
+ rootfs_dev_mapper=/dev/loop5p1
+ '[' '!' -b /dev/loop5p1 ']'
+ echo '/dev/loop5p1 is not a block device'
/dev/loop5p1 is not a block device
+ ls -l /dev/loop5p1 /dev/loop5p12
brw--- 1 root root 259, 2 Dec 9 04:16 /dev/loop5p1
brw--- 1 root root 259, 3 Dec 9 04:16 /dev/loop5p12
+ exit 1

This clearly shows that:
 - there are 5 partitions on the image being passed to losetup
 - after losetup exits, /dev/loop5p1 is not present
 - after this check fails, an ls of /dev/loop5p* shows devices present
   for two of the partitions - including /dev/loop5p1 that we were
   looking for in the first place - but not all 5.

So this definitely means we have a race after calling losetup -P.

Is this the expected behavior from the kernel? How do we make this
race-free?
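Until the kernel-side question is answered, one stopgap is to poll for every expected partition node rather than checking once right after losetup returns. The sketch below is a workaround that hides the race and does not fix it. The function name wait_for_nodes, the attempt count, and the one-second interval are all hypothetical choices, not anything livecd-rootfs is known to do.

```shell
#!/bin/sh
# Hypothetical workaround sketch (not from the thread): poll until every
# listed path passes a test(1) check, or give up after a number of attempts.
#   $1 = test operator (-b for block devices)
#   $2 = maximum number of attempts, one second apart
#   rest = paths that must all pass the check
wait_for_nodes() {
    op=$1
    attempts=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        missing=0
        for node in "$@"; do
            test "$op" "$node" || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

For the image above this could be called as `wait_for_nodes -b 10 /dev/loop5p1 /dev/loop5p12 /dev/loop5p13 /dev/loop5p14 /dev/loop5p15`, failing the build only if some node is still missing after ten attempts.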
[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
Failing build had kernel

Kernel version: Linux bos03-riscv64-014 5.19.0-1021-generic #23~22.04.1-Ubuntu SMP Thu Jun 22 12:49:35 UTC 2023 riscv64

The build immediately before the first failure had kernel

Kernel version: Linux riscv64-qemu-lgw01-069 5.13.0-1019-generic #21~20.04.1-Ubuntu SMP Thu Mar 24 22:36:01 UTC 2022 riscv64

So maybe this is a kernel regression?
[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble
November 16 was 2 days after livecd-rootfs 24.04.4 landed in the noble
release pocket, superseding 24.04.2.

The code delta between 24.04.2 and 24.04.4 includes removal of support
for "legacy" images (SUBPROJECT=legacy) which doesn't apply here; and
some reorganization of code related to "preinstalled" images which
could affect the riscv64+generic image, that is a preinstalled image
using the cpc project, but there were no code changes touching any of
the image partitioning code so it's unclear how those code changes
could have introduced this bug.