On Mon, Aug 26, 2019 at 4:05 AM Tobias Koch <1834...@bugs.launchpad.net>
wrote:

> > (Odds are that whatever causes it to be recreated later in boot would be
> > blocked by cloud-init waiting.)
>
> But that's not happening. The instance does boot normally, the only
> service degraded is cloud-init and there is no significant delay either.
>

> So conversely, if I put a loop into cloud-init and just waited on the
> symlink to appear and if that worked with minimal delay, would that
> refute the above?
>

That's still a workaround for a race we don't understand: we don't know
exactly why it races, nor why it isn't more widespread. The code in
cloud-init, growpart, sgdisk and partx is stable (it has not changed
significantly in some time).

We don't have a root cause for the race at this time. When cloud-init
invokes growpart, the symlink exists, and when growpart returns, sometimes
it does not. If anything, growpart should address the race itself; and at
this point, it would have to pick up a workaround as well.
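
To make that before/after observation reproducible, something like the
following could record whether a /dev/disk/by-* link resolves around a
single step. This is a minimal sketch, not cloud-init code:
symlink_presence() is a hypothetical helper, and `action` stands in for
whatever actually runs growpart (e.g. a subprocess call).

```python
import os
from typing import Callable, Tuple


def symlink_presence(link: str, action: Callable[[], None]) -> Tuple[bool, bool]:
    """Hypothetical helper: report whether `link` resolves before and after `action`.

    os.path.exists() follows symlinks, so a dangling link counts as absent,
    which is what a consumer opening /dev/disk/by-... would observe.
    """
    before = os.path.exists(link)
    action()  # e.g. a wrapper around subprocess.run(["growpart", disk, partno])
    after = os.path.exists(link)
    return before, after
```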

Let's at least make sure we understand the actual race before we look
further into workarounds.

From what I can see of what growpart is doing, the sgdisk command will
clear the partition tables (this involves removing the partition and then
re-adding it, which triggers udev). Further, Dan showed that partx --update
can also trigger a remove and an add. Looking at the partx update code,
*sometimes* it will remove and add; however, if the partition to be updated
*exists*, then it will instead issue an update IOCTL which only updates the
size value in sysfs.

https://github.com/karelzak/util-linux/blob/53ae7d60cfeacd4e87bfe6fcc015b58b78ef4555/disk-utils/partx.c#L451
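
A deliberately simplified model of the two partx --update paths described
above, assuming the only inputs are which partitions the kernel knows about
and which are in the on-disk table (partition number -> size in sectors).
plan_partx_update() is hypothetical and is not partx's own logic, which
handles more cases than this; it only encodes the reading that add/delete
emit uevents while the resize IOCTL does not.

```python
from typing import Dict, List, Tuple


def plan_partx_update(kernel: Dict[int, int],
                      disk: Dict[int, int]) -> List[Tuple[int, str, bool]]:
    """Hypothetical model: return (partno, action, emits_uevent) per partition.

    `kernel` and `disk` map partition number -> size in sectors.
    """
    plan = []
    for n in sorted(set(kernel) | set(disk)):
        if n not in disk:
            plan.append((n, "delete", True))   # gone from the table: remove uevent
        elif n not in kernel:
            plan.append((n, "add", True))      # new to the kernel: add uevent
        elif kernel[n] != disk[n]:
            plan.append((n, "resize", False))  # resize IOCTL: sysfs size only
        else:
            plan.append((n, "unchanged", False))
    return plan
```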

This makes me think that in the successful path we're seeing partx
--update take the partx_resize_partition path, which submits the resize
IOCTL:

https://github.com/karelzak/util-linux/blob/917f53cf13c36d32c175f80f2074576595830573/include/partx.h#L54

which the Linux kernel handles here:

https://elixir.bootlin.com/linux/latest/source/block/ioctl.c#L100

and just updates the size value in sysfs:

https://elixir.bootlin.com/linux/latest/source/block/ioctl.c#L146

which, AFAICT, does not emit any new uevents.
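
If the resize path is being taken, one way to see it is to compare the
partition's sysfs size attribute before and after growpart; you would
expect it to change without a corresponding event for the partition in
`udevadm monitor`. A minimal sketch, where sysfs_size_sectors() and
sysfs_size_bytes() are hypothetical helpers (the sysfs size attribute is
reported in 512-byte sectors):

```python
import os


def sysfs_size_sectors(dev: str, sysfs_root: str = "/sys/class/block") -> int:
    """Hypothetical helper: read a block device's size in 512-byte sectors."""
    with open(os.path.join(sysfs_root, dev, "size")) as f:
        return int(f.read().strip())


def sysfs_size_bytes(dev: str, sysfs_root: str = "/sys/class/block") -> int:
    # The sysfs size attribute is always in 512-byte units.
    return sysfs_size_sectors(dev, sysfs_root) * 512
```

For example, read sysfs_size_sectors("sda1") before and after growpart
and compare the two values.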


Lastly, in either path (partx updates vs. partx removes/adds), invoking
udevadm settle after the binary has exited is the reasonable way to ensure
that *if* any uevents were created, they are processed.
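
The settle step can be sketched as below, assuming it simply runs after
the resizing command exits. run_and_settle() is a hypothetical helper;
`udevadm settle --timeout=SECONDS` is the real CLI, and it only waits for
events already in the queue, which is why the *if* above matters.

```python
import subprocess
from typing import Sequence

DEFAULT_SETTLE = ("udevadm", "settle", "--timeout=30")


def run_and_settle(cmd: Sequence[str], settle_cmd: Sequence[str] = DEFAULT_SETTLE) -> None:
    """Hypothetical helper: run `cmd`, then wait for the udev event queue to drain.

    udevadm settle only covers events already queued when it runs; it cannot
    wait for uevents that nothing has emitted.
    """
    subprocess.run(list(cmd), check=True)         # e.g. ["growpart", "/dev/sda", "1"]
    subprocess.run(list(settle_cmd), check=True)  # raises CalledProcessError on non-zero exit
```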

growpart could add udevadm settle code; so could cloud-init. We actually
did that in our first test package, and it did not ensure that the symlink
was present.

All of this suggests to me that *something* isn't processing the sequence
of uevents in such a way that, once they've all been processed, the
symlink exists.

We must be missing some other bit of information in the failing path,
where the symlink is eventually recreated (possibly due to some other write
or close on the disk which re-triggers rules).
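
To gather that missing information, the failing path could be instrumented
to log how long the link takes to reappear, alongside `udevadm monitor`
output. This is a diagnostic sketch, not a proposed fix; wait_for_path()
is a hypothetical helper, and the timeout and polling interval are
arbitrary.

```python
import os
import time
from typing import Optional


def wait_for_path(path: str, timeout: float = 30.0, interval: float = 0.1) -> Optional[float]:
    """Hypothetical helper: poll until `path` exists.

    Returns the seconds waited, or None if `timeout` elapsed first.
    """
    start = time.monotonic()
    while True:
        if os.path.exists(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```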



> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1834875
>
> Title:
>   cloud-init growpart race with udev
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions
>

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1834875

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
