[N.B. I wrote the below before I saw Ryan's comment, so there is some
repetition.]

OK, I've spent some time catching up on this properly so I can
summarise: per comment #24, the issue is that when udev processes the
events emitted by the kernel, it (sometimes) doesn't determine the
correct partition information.  The kernel _does_ emit all the events we
would expect, and udev _does_ handle all the events we would expect
(which is to say that `udevadm settle` doesn't change behaviour here, it
merely ensures that the broken behaviour has completed before we
proceed).  The hypothesised race condition is somewhere between the
kernel and udev: I believe the kernel event is emitted before the
partition table has necessarily been fully updated, so when udev
processes the event and reads the partition table, sometimes it finds
the partition and sometimes it doesn't.  To be clear, the kernel event
generation and the buggy udev event handling both happen as a result of
the resize command, _not_ as a result of anything else cloud-init runs
subsequently.
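
For anyone who wants to check a failing instance by hand, here is a
rough diagnostic sketch (not cloud-init code) that compares what the
kernel exposes with what udev has published for one partition.  The
function name `partition_views` and its parameters are made up for
illustration, and I'm assuming the standard `/sys/class/block` and
`/dev/disk/by-partuuid` locations:

```python
import os


def partition_views(part_name, partuuid,
                    sys_block="/sys/class/block",
                    by_partuuid="/dev/disk/by-partuuid"):
    """Report whether the kernel and udev each currently expose a partition.

    part_name is the kernel name (e.g. "sda1"); partuuid is the GPT
    partition UUID that udev uses for the by-partuuid symlink.
    """
    return {
        # The kernel creates a sysfs entry for every partition it knows about.
        "kernel": os.path.exists(os.path.join(sys_block, part_name)),
        # udev creates this symlink only if it read the partition information.
        "udev": os.path.lexists(os.path.join(by_partuuid, partuuid)),
    }
```

Calling `partition_views("sda1", "a5f2b49f-abd6-427f-bbc4-ba5559235cf3")`
after `udevadm settle` on an affected instance would show whether the
two views disagree; it doesn't tell us _why_ they disagree.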

So as far as I can tell, this bug would occur regardless of what runs
the resize command, and no matter what commands are executed after the
resize command.  (It might be possible to work around this bug by
issuing commands that force a re-read of the partition table on a disk,
for example, but this bug _would_ still have occurred before then.)
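
To make the parenthetical above concrete, this is roughly what such a
workaround could look like.  It is only a sketch under assumptions, not
something cloud-init does today: `reread_and_wait` and `wait_for_path`
are hypothetical names, and I'm assuming `partprobe` (from parted) is
installed and is an acceptable way to request the re-read:

```python
import os
import subprocess
import time


def wait_for_path(path, timeout=5.0, interval=0.1,
                  exists=os.path.exists, sleep=time.sleep,
                  clock=time.monotonic):
    """Poll until path exists or timeout seconds pass; True if it appeared."""
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def reread_and_wait(disk, partdev, runner=subprocess.run):
    """Hypothetical workaround: force a partition table re-read, let udev
    finish processing the resulting events, then wait for the device path."""
    # Assumption: partprobe is available on the image.
    runner(["partprobe", disk], check=True)
    # settle only waits for udev's queue to drain; it does not fix anything.
    runner(["udevadm", "settle"], check=True)
    return wait_for_path(partdev)
```

Note that this only papers over the symptom: by the time it runs, udev
has already mishandled the events from the original resize.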

cloud-init could potentially work around a (kernel|systemd) that isn't
handling partitions correctly, but we really shouldn't have to.  Until
we're satisfied that they cannot actually be fixed, we shouldn't do
that.  (I am _not_ convinced that this cannot be fixed in (the
kernel|systemd), because using a different kernel and using a different
udevadm have both caused the issue to stop reproducing.)

So, let me be a little more categorical.  The information we have at the
moment indicates an issue in the interactions between the kernel and
udev on partition resize.  cloud-init's involvement is merely as the
initiator of that resize.  Until we have more information that indicates
the issue to be in cloud-init, this isn't a valid cloud-init issue.
Once we have more information from the kernel and/or systemd folks, if
it indicates that cloud-init _is_ at fault, please move this back to
New.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875

Title:
  cloud-init growpart race with udev

Status in cloud-init:
  Incomplete
Status in systemd package in Ubuntu:
  New

Bug description:
  On Azure, cloud-init's growpart module regularly (20-30%) fails to
  extend the partition to full size.

  Such as in this example:

  ========================================

  2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
  2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
  2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
      freq=freq)
    File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
      return self._runners.run(name, functor, args, freq, clear_on_fail)
    File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
      results = functor(*args)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
      func=resize_devices, args=(resizer, devices))
    File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
      ret = func(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
      (old, new) = resizer.resize(disk, ptnum, blockdev)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
      return (before, get_size(partdev))
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
      fd = os.open(filename, os.O_RDONLY)
  FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'

  ========================================

  @rcj suggested this is a race with udev. This seems to only happen on
  Cosmic and later.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions
