[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2024-01-22 Thread dann frazier
I ran the above test:
  
https://autopkgtest.ubuntu.com/results/autopkgtest-jammy-dannf-test/jammy/amd64/l/livecd-rootfs/20240123_035147_6470b@/log.gz

It does appear that systemd-udevd is trying to scan partitions at the
same time as losetup:

1599s ++ losetup --show -f -P -v binary/boot/disk-uefi.ext4
1600s + loop_device=/dev/loop0
1600s + '[' '!' -b /dev/loop0 ']'
1600s + rootfs_dev_mapper=/dev/loop0p1
1600s + '[' '!' -b /dev/loop0p1 ']'
1600s + echo '/dev/loop0p1 is not a block device'
1600s /dev/loop0p1 is not a block device
1600s + echo '=== dmesg ==='
1600s === dmesg ===
1600s + dmesg -c
1600s [  986.014824] EXT4-fs (loop0p1): mounted filesystem with ordered data 
mode. Opts: (null). Quota mode: none.
1600s [  992.684380] EXT4-fs (loop0p1): mounted filesystem with ordered data 
mode. Opts: (null). Quota mode: none.
1600s [ 1043.171603] loop0: detected capacity change from 0 to 4612096
1600s [ 1043.171924]  loop0: p1 p14 p15
1600s [ 1043.190421]  loop0: p1 p14 p15
1600s + cat /sys/kernel/debug/tracing/trace
1600s # tracer: function
1600s #
1600s # entries-in-buffer/entries-written: 2/2   #P:4
1600s #
1600s #_-=> irqs-off
1600s #   / _=> need-resched
1600s #  | / _---=> hardirq/softirq
1600s #  || / _--=> preempt-depth
1600s #  ||| / _-=> migrate-disable
1600s #   / delay
1600s #   TASK-PID CPU#  |  TIMESTAMP  FUNCTION
1600s #  | | |   | | |
1600s  losetup-50167   [002] .  1043.176845: bdev_disk_changed 
<-loop_reread_partitions
1600ssystemd-udevd-321 [000] .  1043.195003: bdev_disk_changed 
<-blkdev_get_whole
1600s + echo 0
1600s + ls -l /dev/loop0p1
1600s brw--- 1 root root 259, 3 Jan 23 03:51 /dev/loop0p1
1600s + exit 1
1600s + clean_loops


Maybe we just need something like this?

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 48c530b83000e..52fda87f5d674 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1366,13 +1366,13 @@ static int loop_configure(struct loop_device *lo, 
fmode_t mode,
if (partscan)
lo->lo_disk->flags &= ~GENHD_FL_NO_PART_SCAN;
 
-   /* enable and uncork uevent now that we are done */
-   dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
-
loop_global_unlock(lo, is_loop);
if (partscan)
loop_reread_partitions(lo);
 
+   /* enable and uncork uevent now that we are done */
+   dev_set_uevent_suppress(disk_to_dev(lo->lo_disk), 0);
+
if (!(mode & FMODE_EXCL))
bd_abort_claiming(bdev, loop_configure);

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2024-01-22 Thread dann frazier
I ran into this on jammy/amd64:
https://autopkgtest.ubuntu.com/results/autopkgtest-
jammy/jammy/amd64/l/livecd-rootfs/20240121_173406_e4f9a@/log.gz

I downloaded all of the amd64 failures and searched for this failure
pattern. These were the kernels that were running at the time:

"Linux 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023"
"Linux 6.2.0-21-generic #21-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 14 12:34:02 UTC 
2023"
"Linux 6.3.0-7-generic #7-Ubuntu SMP PREEMPT_DYNAMIC Thu Jun  8 16:02:30 UTC 
2023"
"Linux 6.5.0-9-generic #9-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct  7 01:35:40 UTC 
2023"
"Linux 6.6.0-14-generic #14-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 30 10:27:29 UTC 
2023"

Here's the count of failures per image type:
 12 017-disk-image-uefi.binary
  3 018-disk-image.binary
  3 020-kvm-image.binary
  1 023-vagrant.binary
  1 024-vagrant.binary

I can confirm that /dev/loop0p1 is created by devtmpfs. This surprised
me because I'd never actually need to know what devtmpfs was, and I saw
devices being created even though I had SIGSTOP'd systemd-udevd. But
watching udevadm monitor and forktrace output convinced me.

I had a theory that something was opening the first created partition before 
all partitions were created. loop_reread_partitions() can fail without 
returning an error to userspace:
 https://elixir.bootlin.com/linux/v5.15.147/source/drivers/block/loop.c#L676

that could happen if bdev_disk_changed() aborts because it finds another 
partition on the device is open:
 https://elixir.bootlin.com/linux/v5.15.147/source/block/partitions/core.c#L662

But then we should see this in dmesg:
  pr_warn("%s: partition scan of loop%d (%s) failed (rc=%d)\n"

I added dmesg calls to check that:
https://autopkgtest.ubuntu.com/results/autopkgtest-jammy-dannf-test/jammy/amd64/l/livecd-rootfs/20240122_161631_62ecd@/log.gz

.. but no such message appeared, so that's not it. But what *is*
interesting there is that it shows *2* partition scan lines:

1248s [  990.855361] loop0: detected capacity change from 0 to 4612096
1248s [  990.855628]  loop0: p1 p14 p15
1248s [  990.874241]  loop0: p1 p14 p15

Previously we just saw 1:

1189s [  932.268459] loop0: detected capacity change from 0 to 4612096
1189s [  932.268715]  loop0: p1 p14 p15

That only gets printed when bdev_disk_changed() is called. So do we have
2 racing callers?

One thing that seems off is that loop_configure() unsuppresses uevents
for the full device before the partition scan, but loop_change_fd()
waits until the partition scan is complete. Shouldn't they be following
the same pattern? I wonder if that could cause the following race:

[livecd-rootfs] losetup creates /dev/loop0
[livecd-rootfs] kernel sends uevent for /dev/loop0
[livecd-rootfs] /dev/loop0p* appear in devtmpfs
  [udev] receives uevent for loop0
  [udev] partprobe /dev/loop0
[livecd-rootfs] losetup exit(0)
  [partprobe] /dev/loop0p* cleared
[livecd-rootfs] check for /dev/loop0p1 FAILS
  [partprobe] /dev/loop0p* recreated

I tried checking for this using ftrace in a local jammy VM. I haven't been able 
to reproduce this in a local VM, but I wanted to see what happens in a normal 
losetup.. er... setup.
 
> First I used losetup to create the device:

root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# loopdev="$(losetup 
--show -f -P -v /home/ubuntu/disk.img)"
root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 1/1   #P:1
#
#_-=> irqs-off
#   / _=> need-resched
#  | / _---=> hardirq/softirq
#  || / _--=> preempt-depth
#  ||| / _-=> migrate-disable
#   / delay
#   TASK-PID CPU#  |  TIMESTAMP  FUNCTION
#  | | |   | | |
 losetup-1996[000] .   657.573994: bdev_disk_changed 
<-loop_reread_partitions

> Only the expected bdev_disk_change() call
> Then I remove the device:

root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# losetup -v -d $loopdev
root@dannf-livecd-rootfs-debug:/sys/kernel/debug/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 3/3   #P:1
#
#_-=> irqs-off
#   / _=> need-resched
#  | / _---=> hardirq/softirq
#  || / _--=> preempt-depth
#  ||| / _-=> migrate-disable
#   / delay
#   TASK-PID CPU#  |  TIMESTAMP  FUNCTION
#  | | |   | | |
 losetup-1996[000] .   657.573994: bdev_disk_changed 
<-loop_reread_partitions
   systemd-udevd-2524[000] .   680.555336: bdev_disk_changed 
<-blkdev_get_whole
 

Re: [Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2023-12-11 Thread Michael Hudson-Doyle
> Is only asking kernel to scan the device; to then generate "kernel udev"
> events; for then udev to wakeup and process/emit "udev udev" events; and
> create the required device nodes.
>

It's not udev that creates nodes like /dev/loop1p1 though is it? That's
devtmpfs surely.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2023-12-11 Thread Emil Renner Berthing
I don't have a good explanation, but in the past I've "fixed" such races
by adding a `sync "$loop_device"` before using any of the newly created
partitons. Maybe it's worth trying.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2023-12-11 Thread Dimitri John Ledkov
my expectation is that udev should be running (somewhere, not sure if it
needs to be both the host and the lxd guest) and that it should process
the device using locks https://systemd.io/BLOCK_DEVICE_LOCKING/.

After that is done, the device should be safe to operate on, in a
consistent manner.

After all,

   -P, --partscan
   Force the kernel to scan the partition table on a newly created
   loop device. Note that the partition table parsing depends on
   sector sizes. The default is sector size is 512 bytes, otherwise
   you need to use the option --sector-size together with --partscan.

Is only asking kernel to scan the device; to then generate "kernel udev"
events; for then udev to wakeup and process/emit "udev udev" events; and
create the required device nodes.

We have always been fixing and supporting running udev inside the lxd
containers, because of such things (in contexts of priviledged
containers, but outside of lp-buildd) to make all of this work.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2023-12-09 Thread Steve Langasek
Oh.  To the question of whether there was a systemd change in this
window: yes absolutely, because this is the point at which the riscv64
builders moved from lgw manually-operated qemu with a 20.04 guest image,
to bos03 openstack-operated qemu with a 22.04 guest image.

Which is also why we've moved from 5.13.0-1019-generic to
5.19.0-1021-generic.

But again, it was my understanding that these devices are supposed to be
created synchronously WITHOUT the involvement of udev.  In fact, we had
to make launchpad-buildd changes to make use of these devices at all
because udev would NOT set them up for us.

So if these are now being set up via udev, that's a significant
departure from expectations and it's not clear we even CAN have
synchronous behavior given that they would be set up by the host udev
and not the udev in the lxd container!

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


Re: [Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2023-12-09 Thread Steve Langasek
On Sat, Dec 09, 2023 at 05:13:28PM -, Andy Whitcroft wrote:
> Was there any systemd/udev change in this timeframe?  As the device
> files are very much connected to those.

My understanding is that these devices are supposed to be created directly
by the kernel on devtmpfs and NOT via udev, which is part of how we expected
to fix the earlier races.

And systemd did not change in this time frame in any release.  If there was
a change to the HOST udev in this timeframe causing a regression because a
new base image was published that includes a newer udev, we don't have
visibility on it.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2023-12-09 Thread Andy Whitcroft
Was there any systemd/udev change in this timeframe?  As the device
files are very much connected to those.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2023-12-08 Thread Steve Langasek
https://launchpad.net/~ubuntu-
cdimage/+livefs/ubuntu/noble/cpc/+build/544490 is a log from a build
with a new livecd-rootfs that spits out more debugging info on failure.

+ sgdisk binary/boot/disk-uefi.ext4 --print
Disk binary/boot/disk-uefi.ext4: 9437184 sectors, 4.5 GiB
Sector size (logical): 512 bytes
Disk identifier (GUID): CD1DD3AE-E4C8-4C5F-BD64-9236C39B9824
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 9437150
Partitions will be aligned on 2-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)End (sector)  Size   Code  Name
   1  235520 9437150   4.4 GiB 8300  
  12  227328  235519   4.0 MiB 8300  CIDATA
  13  342081   1024.0 KiB    loader1
  142082   10239   4.0 MiB   loader2
  15   10240  227327   106.0 MiB   EF00  
+ mount_image binary/boot/disk-uefi.ext4 1
+ trap clean_loops EXIT
+ backing_img=binary/boot/disk-uefi.ext4
+ local rootpart=1
++ losetup --show -f -P -v binary/boot/disk-uefi.ext4
+ loop_device=/dev/loop5
+ '[' '!' -b /dev/loop5 ']'
+ rootfs_dev_mapper=/dev/loop5p1
+ '[' '!' -b /dev/loop5p1 ']'
+ echo '/dev/loop5p1 is not a block device'
/dev/loop5p1 is not a block device
+ ls -l /dev/loop5p1 /dev/loop5p12
brw--- 1 root root 259, 2 Dec  9 04:16 /dev/loop5p1
brw--- 1 root root 259, 3 Dec  9 04:16 /dev/loop5p12
+ exit 1

This clearly shows that:
- there are 5 partitions on the image being passed to losetup
- after losetup exits, /dev/loop5p1 is not present
- after this check fails, an ls of /dev/loop5p* shows devices present for two 
of the partitions - including /dev/loop5p1 that we were looking for in the 
first place - but not all 5.

So this definitely means we have a race after calling losetup -P.

Is this the expected behavior from the kernel?  How do we make this
race-free?

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2023-12-04 Thread Steve Langasek
Failing build had kernel

Kernel version: Linux bos03-riscv64-014 5.19.0-1021-generic
#23~22.04.1-Ubuntu SMP Thu Jun 22 12:49:35 UTC 2023 riscv64

The build immediately before the first failure had kernel

Kernel version: Linux riscv64-qemu-lgw01-069 5.13.0-1019-generic
#21~20.04.1-Ubuntu SMP Thu Mar 24 22:36:01 UTC 2022 riscv64

So maybe this is a kernel regression?

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2045586] Re: livecd-rootfs uses losetup -P for theoretically reliable/synchronous partition setup but it's not reliable in noble

2023-12-04 Thread Steve Langasek
November 16 was 2 days after livecd-rootfs 24.04.4 landed in the noble
release pocket, superseding 24.04.2.

The code delta between 24.04.2 and 24.04.4 includes removal of support
for "legacy" images (SUBPROJECT=legacy) which doesn't apply here; and
some reorganization of code related to "preinstalled" images which could
affect the riscv64+generic image, that is a preinstalled image using the
cpc project, but there were no code changes touching any of the image
partitioning code so it's unclear how those code changes could have
introduced this bug.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to util-linux in Ubuntu.
https://bugs.launchpad.net/bugs/2045586

Title:
  livecd-rootfs uses losetup -P for theoretically reliable/synchronous
  partition setup but it's not reliable in noble

Status in linux package in Ubuntu:
  New
Status in livecd-rootfs package in Ubuntu:
  New
Status in util-linux package in Ubuntu:
  New

Bug description:
  In mantic, we migrated livecd-rootfs to use losetup -P instead of
  kpartx, with the expectation that this would give us a reliable, race-
  free way of loop-mounting partitions from a disk image during image
  build.

  In noble, we are finding that it is no longer reliable, and in fact
  fails rather often.

  It is most noticeable with riscv64 builds, which is the architecture
  where we most frequently ran into problems before with kpartx.  The
  first riscv64+generic build in noble where the expected loop partition
  device is not available is

https://launchpad.net/~ubuntu-
  cdimage/+livefs/ubuntu/noble/cpc/+build/531790

  The failure is however not unique to riscv64, and the autopkgtest for
  the latest version of livecd-rootfs (24.04.7) - an update that
  specifically tries to add more debugging code for this scenario - has
  also failed on ppc64el.

https://autopkgtest.ubuntu.com/packages/l/livecd-
  rootfs/noble/ppc64el

  The first failure happened on November 16.  While there has been an
  update to the util-linux package in noble, this did not land until
  November 23.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2045586/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp