A couple of comments on the suggested path:

> Imho the sequency of commands should be:
> * take flock on the device, to neutralise udev

+1  on this approach. Do you know if the flock will block
systemd's inotify write watch on the block device which triggers
udevd?  This is the typical race we see with partition creation
and rules executing.

> * modify device with sfdisk
> * reread partitions tables (i would say with blockdev --rereadpt, rather than 
> partx/partprobe)

I'm not sure we can use blockdev --rereadpt as we are operating upon the
root disk we're booted on and my understanding is that the ioctl that
partx uses is the only way to update the kernel partition table while
the disk is in use, otherwise we'd see the normal warning message like
when you fdisk your booted device and it says the disk is busy and
cannot read the partition table.

> * release the flock

+1

> * udevadm trigger --action=add --wait device (or trigger && settle)

I don't relish the idea of *re-adding* actions on the disk again since the 
partx update
should have already emitted the uevents associated with the new partitions.  
However,
we could do this as a way to force reloading of everything.  I'd like to 
withhold
judgement on whether we need this after testing with use of flock on the device.

> And like have a canary "only use locked codepath on this region" such
that we can assert through testing that this no longer happens with new
code, but does with old code.

The change to cloud-utils growpart could add a flag (--use-flock) so
cloud-init could emit different log messages on which path it takes
(including a warning if we cannot use flock (ie, you may race).

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875

Title:
  cloud-init growpart race with udev

Status in cloud-init:
  Incomplete
Status in cloud-utils:
  New
Status in linux-azure package in Ubuntu:
  New
Status in systemd package in Ubuntu:
  Incomplete

Bug description:
  On Azure, it happens regularly (20-30%), that cloud-init's growpart
  module fails to extend the partition to full size.

  Such as in this example:

  ========================================

  2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', 
'--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, 
capture=True)
  2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', 
'/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
  2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
  2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: 
init-network/config-growpart: FAIL: running config-growpart with frequency 
always
  2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 
'cloudinit.config.cc_growpart' from 
'/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 
'cloudinit.config.cc_growpart' from 
'/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in 
_run_modules
      freq=freq)
    File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
      return self._runners.run(name, functor, args, freq, clear_on_fail)
    File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
      results = functor(*args)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 
351, in handle
      func=resize_devices, args=(resizer, devices))
    File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in 
log_time
      ret = func(*args, **kwargs)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 
298, in resize_devices
      (old, new) = resizer.resize(disk, ptnum, blockdev)
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 
159, in resize
      return (before, get_size(partdev))
    File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 
198, in get_size
      fd = os.open(filename, os.O_RDONLY)
  FileNotFoundError: [Errno 2] No such file or directory: 
'/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'

  ========================================

  @rcj suggested this is a race with udev. This seems to only happen on
  Cosmic and later.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to