> Ah, no I think this might be along right lines: udev is calling blkid
> on the _partition_ of course, so it can probe for filesystem etc without
> looking at the partition table. After it's done that, it does look for
> the partition table so it can read the ID_PART_ENTRY_* values from it,
> but if it fails to load the partition table it just gives up and still
> returns success.

Ah so this was in the right direction, but not completely right: the
failure is not in reading the partition table of the disk being probed,
but failing to figure out which entry in that table corresponds to the
partition being probed:

Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: 556: libblkid: LOWPROBE: trying to convert devno 0x811 to partition
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: 556: libblkid: LOWPROBE: searching by offset/size
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: 556: libblkid: LOWPROBE: not found partition for device
Feb 06 00:37:42 test-xrdpdnvfctsofyygmzan systemd-udevd[556]: 556: libblkid: LOWPROBE: parts: end probing for partition entry [nothing]
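For reference, the devno in these libblkid lines decodes to major 8, minor 17. The sd driver gives each disk 16 minors, so 8:16 is sdb and 8:17 is sdb1, which matches the partition named in the kernel messages further down. A small Python check using the standard os.major/os.minor helpers (the function name is just for illustration):

```python
import os


def devno_to_major_minor(devno: int) -> tuple[int, int]:
    """Split a Linux dev_t into (major, minor) using the C library's macros."""
    return os.major(devno), os.minor(devno)


major, minor = devno_to_major_minor(0x811)
# sd disks get 16 minors each: minor 16 is sdb itself, 17 is its first partition.
print(f"/sys/dev/block/{major}:{minor}")  # prints /sys/dev/block/8:17
```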

The function of interest in libblkid here is
blkid_partlist_devno_to_partition, which (unless some APIs have very
misleading names) reads the offset and size of the partition from
sysfs. What must be happening is that udev sees the disk being closed
by gdisk and somehow runs the builtin blkid command on the partition
before the kernel has been informed that the partition was resized.
The offset and size in sysfs are then still the old ones, so no entry
in the new on-disk table matches them. And indeed, we can see when the
kernel notices this:

Feb 06 00:37:43 test-xrdpdnvfctsofyygmzan kernel: EXT4-fs (sdb1): resizing filesystem from 548091 to 7836155 blocks
Feb 06 00:37:44 test-xrdpdnvfctsofyygmzan kernel: EXT4-fs (sdb1): resized filesystem to 7836155

That is at least a second later. (In passing_journalctl_output.txt, this
message is printed before udev even receives the inotify event about sdb
being closed.)
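To make the failure mode concrete, here is a simplified Python model of the "searching by offset/size" step. It is not libblkid's code: it assumes the kernel's view comes from the partition's sysfs `start` and `size` attributes (both in 512-byte sectors) and that a match requires both values to agree with an entry parsed from the on-disk table. `PartEntry`, the helper names and the sector numbers are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass(frozen=True)
class PartEntry:
    partno: int
    start: int  # first sector, in 512-byte units
    size: int   # length, in 512-byte units


def read_sysfs_geometry(part_dir: str) -> tuple[int, int]:
    """Kernel's current view of a partition, e.g. part_dir='/sys/class/block/sdb1'."""
    base = Path(part_dir)
    start = int((base / "start").read_text())
    size = int((base / "size").read_text())
    return start, size


def find_by_offset_size(entries: Sequence[PartEntry],
                        start: int, size: int) -> Optional[PartEntry]:
    """Return the on-disk entry whose start AND size both match, else None."""
    for entry in entries:
        if entry.start == start and entry.size == size:
            return entry
    return None


# Hypothetical numbers: the GPT on disk already describes the grown partition...
on_disk = [PartEntry(partno=1, start=2048, size=62912512)]
# ...but the kernel (and therefore sysfs) still reports the old size.
kernel_view = (2048, 4194304)
print(find_by_offset_size(on_disk, *kernel_view))  # prints None
```

Once the kernel has been told about the resize, sysfs reports the new size and the same lookup succeeds, which is why the outcome depends on timing.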

So there is always a race here, but I don't know why we only see it on
Azure with newer releases. A proper fix would be to get sgdisk to call
partx_resize_partition before closing the disk, but it would also be
interesting to see why it sometimes takes so long to get to that part.
growpart is too much shell for me, but maybe it's possible to get it to
run partx under strace and capture that output somehow?
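The proposed fix amounts to telling the kernel about the new partition size before the disk is closed and udev reacts. As far as I know, util-linux's partx_resize_partition is a thin wrapper around the BLKPG ioctl with op BLKPG_RESIZE_PARTITION, taking start and length in bytes. Below is a Python/ctypes sketch of that ioctl, assuming Linux and the struct layouts from <linux/blkpg.h>. The function names are hypothetical, it needs root, and it has not been run against a real disk.

```python
import ctypes
import fcntl

BLKPG = 0x1269              # _IO(0x12, 105) in <linux/fs.h>
BLKPG_RESIZE_PARTITION = 3  # from <linux/blkpg.h>


class BlkpgPartition(ctypes.Structure):
    _fields_ = [
        ("start", ctypes.c_longlong),   # byte offset of the partition
        ("length", ctypes.c_longlong),  # length in bytes
        ("pno", ctypes.c_int),          # partition number
        ("devname", ctypes.c_char * 64),
        ("volname", ctypes.c_char * 64),
    ]


class BlkpgIoctlArg(ctypes.Structure):
    _fields_ = [
        ("op", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("datalen", ctypes.c_int),
        ("data", ctypes.c_void_p),      # points at a BlkpgPartition
    ]


def build_resize_arg(partno: int, start_sectors: int, size_sectors: int):
    """Build the ioctl argument; sector values are in 512-byte units."""
    part = BlkpgPartition(start=start_sectors * 512,
                          length=size_sectors * 512, pno=partno)
    arg = BlkpgIoctlArg(op=BLKPG_RESIZE_PARTITION, flags=0,
                        datalen=ctypes.sizeof(part),
                        data=ctypes.addressof(part))
    return arg, part  # keep `part` alive: `arg` holds a raw pointer to it


def resize_partition(disk_path: str, partno: int,
                     start_sectors: int, size_sectors: int) -> None:
    """Hypothetical helper: ask the kernel to update one partition's size."""
    arg, part = build_resize_arg(partno, start_sectors, size_sectors)
    with open(disk_path, "rb") as disk:  # the whole disk, e.g. /dev/sdb
        fcntl.ioctl(disk.fileno(), BLKPG, arg)
    del part  # only released after the ioctl has returned
```

Doing this while the disk is still open would mean that, by the time udev probes sdb1, sysfs already matches the new on-disk table.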

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1834875

Title:
  cloud-init growpart race with udev
