[N.B. I wrote the below before I saw Ryan's comment, so there is some repetition.]
OK, I've spent some time catching up on this properly so I can summarise: per comment #24, the issue is that when udev processes the events emitted by the kernel, it (sometimes) doesn't determine the correct partition information. The kernel _does_ emit all the events we would expect, and udev _does_ handle all the events we would expect (which is to say that `udevadm settle` doesn't change behaviour here, it merely ensures that the broken behaviour has completed before we proceed).

The hypothesised race condition is somewhere between the kernel and udev: I believe the kernel event is emitted before the partition table has necessarily been fully updated, so when udev processes the event and reads the partition table, sometimes it finds the partition and sometimes it doesn't.

To be clear, the kernel event generation and the buggy udev event handling all happen as a result of the resize command, _not_ as a result of anything else cloud-init runs subsequently. So as far as I can tell, this bug would occur regardless of what runs the resize command, and no matter what commands are executed after the resize command. (It might be possible to work around this bug by issuing commands that force a re-read of the partition table on a disk, for example, but this bug _would_ still have occurred before then.)

cloud-init could potentially work around a (kernel|systemd) that isn't handling partitions correctly, but we really shouldn't have to. Until we're satisfied that they cannot actually be fixed, we shouldn't do that. (I am _not_ convinced that this cannot be fixed in (the kernel|systemd), because using a different kernel and using a different udevadm have both caused the issue to stop reproducing.)

So, let me be a little more categorical. The information we have at the moment indicates an issue in the interactions between the kernel and udev on partition resize. cloud-init's involvement is merely as the initiator of that resize.
Until we have more information that indicates the issue to be in cloud-init, this isn't a valid cloud-init issue. Once we have more information from the kernel and/or systemd folks, if it indicates that cloud-init _is_ at fault, please move this back to New.

--
https://bugs.launchpad.net/bugs/1834875

Title:
  cloud-init growpart race with udev

Status in cloud-init:
  Incomplete
Status in systemd package in Ubuntu:
  New

Bug description:
  On Azure, it happens regularly (20-30%), that cloud-init's growpart module fails to extend the partition to full size.

  Such as in this example:

  ========================================

  2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
  2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
  2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
      freq=freq)
    File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
      return self._runners.run(name, functor, args, freq, clear_on_fail)
    File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
      results = functor(*args)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
      func=resize_devices, args=(resizer, devices))
    File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
      ret = func(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
      (old, new) = resizer.resize(disk, ptnum, blockdev)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
      return (before, get_size(partdev))
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
      fd = os.open(filename, os.O_RDONLY)
  FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'
  ========================================

  @rcj suggested this is a race with udev. This seems to only happen on Cosmic and later.
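
For illustration only: the traceback shows the failure surfacing at the `os.open` call in `get_size`, when the `/dev/disk/by-partuuid/...` path is missing. A caller-side workaround of the sort the comment argues cloud-init should not have to carry might look roughly like the sketch below. This is not cloud-init's code; the helper name `get_size_with_retry` and its `timeout` and `interval` parameters are hypothetical, and the sketch assumes the node eventually reappears once udev has processed the events correctly. It only hides the window in which the path is absent; the underlying kernel/udev race would still have occurred.

```python
import os
import time


def get_size_with_retry(path, timeout=5.0, interval=0.1):
    """Return the size in bytes of `path`, retrying while it does not exist.

    Hypothetical helper, not part of cloud-init. It masks the period in
    which the device node or symlink is missing; it does not fix the race.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(path, os.O_RDONLY)
        except FileNotFoundError:
            if time.monotonic() >= deadline:
                raise  # give up: the path never (re)appeared
            time.sleep(interval)
            continue
        try:
            # Seeking to the end reports the size for regular files and
            # for block devices alike.
            return os.lseek(fd, 0, os.SEEK_END)
        finally:
            os.close(fd)
```

If the path never comes back within `timeout`, the original `FileNotFoundError` is re-raised, so the failure remains visible in the logs rather than being silently swallowed.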