Hello Dave, or anyone else affected,

Accepted livecd-rootfs into bionic-proposed. The package will build now
and be available at
https://launchpad.net/ubuntu/+source/livecd-rootfs/2.525.39 in a few
hours, and then in the -proposed repository.
Please help us by testing this new package. See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, and change the tag from
verification-needed-bionic to verification-done-bionic. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-bionic. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Description changed:

- We've just come across a situation where cloud-init races systemd to
- mount a partition, with the result that either cloud-init fails to load
- its seed, or a relatively important partition fails to mount. I'm not
- entirely sure this is a bug - it could well be argued this is
- mis-configuration on our part, and there's a trivial workaround, but
- I'll leave the cloud-init devs to make that call.
+ [Impact]
+ 
+ We've just come across a situation where cloud-init races systemd to
+ mount a partition, with the result that either cloud-init fails to load
+ its seed, or a relatively important partition fails to mount.
+ This can basically lead to situations where the device will not boot.
+ 
+ [Test Case]
+ 
+ Boot the image built using the bionic-proposed livecd-rootfs multiple
+ times and make sure it is booting up correctly.
+ 
+ TODO: find a better test case?
+ 
+ [Regression Potential]
+ 
+ The change seems relatively safe and it doesn't seem like there are any
+ obvious regressions it could cause, but one should be on the lookout
+ for any cloud-init weirdness or issues at boot time.
+ 
+ [Original Description]

We've just come across a situation where cloud-init races systemd to
mount a partition, with the result that either cloud-init fails to load
its seed, or a relatively important partition fails to mount. I'm not
entirely sure this is a bug - it could well be argued this is
mis-configuration on our part, and there's a trivial workaround, but
I'll leave the cloud-init devs to make that call.

The configuration is as follows:

1. On recent Ubuntu classic images for the Raspberry Pi, cloud-init's
seed is located on the boot partition (/dev/mmcblk0p1, labelled
"system-boot"). This is desirable as it's a straightforward FAT
partition, and thus easily accessible and editable on any OS (as opposed
to the original seed location on the root ext4 partition).

2. cloud-init is configured to look for its seed with
"fs_label: system-boot".

3. systemd is configured (via /etc/fstab) to mount
/dev/disk/by-label/system-boot (/dev/mmcblk0p1) on /boot/firmware.

The result on boot-up is one of three situations:

1. cloud-init mounts /dev/mmcblk0p1 on a temporary path. While it is
attempting to read its seed, systemd attempts to mount /dev/mmcblk0p1 on
/boot/firmware but fails because the device is already mounted
elsewhere. cloud-init succeeds in reading its seed, unmounts
/dev/mmcblk0p1, and the system boots successfully, but /boot/firmware
isn't mounted (leading to issues further down the line with other
things, like flash-kernel, that expect it to be mounted).

2. cloud-init checks for /dev/mmcblk0p1 in /proc/mounts and doesn't see
it there. It goes to mount /dev/mmcblk0p1 on a temporary path, but
before it can do so, systemd mounts it on /boot/firmware.
cloud-init's mount fails (the device is already mounted), so it fails to
read its seed and uses a default config instead (which isn't entirely
desirable for us, as the default involves a long wait if ethernet is not
connected; our default seed overrides that).

3. Everything works fine. This occurs when either a) systemd mounts
/boot/firmware first, then cloud-init sees this mount in /proc/mounts
and uses it, or b) cloud-init mounts /dev/mmcblk0p1 on a temp path,
reads its seed and unmounts it, *then* systemd mounts /boot/firmware.

Here's the relevant snippet of traceback from when the second result
occurs:

...
2020-01-14 20:48:01,337 - util.py[DEBUG]: Read 386 bytes from /etc/os-release
2020-01-14 20:48:01,338 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr0'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-01-14 20:48:01,348 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr1'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-01-14 20:48:01,357 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=vfat', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-01-14 20:48:01,415 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=iso9660', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-01-14 20:48:01,432 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=SYSTEM-BOOT', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-01-14 20:48:01,448 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=system-boot', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
2020-01-14 20:48:01,466 - DataSourceNoCloud.py[DEBUG]: Attempting to use data from /dev/mmcblk0p1
2020-01-14 20:48:01,467 - util.py[DEBUG]: Reading from /proc/mounts (quiet=False)
2020-01-14 20:48:01,467 - util.py[DEBUG]: Read 2062 bytes from /proc/mounts
2020-01-14 20:48:01,468 - util.py[DEBUG]: Fetched {'sysfs': {'fstype': 'sysfs', 'mountpoint': '/sys',
  'opts': 'rw,nosuid,nodev,noexec,relatime'},
 'proc': {'fstype': 'proc', 'mountpoint': '/proc', 'opts': 'rw,nosuid,nodev,noexec,relatime'},
 'udev': {'fstype': 'devtmpfs', 'mountpoint': '/dev', 'opts': 'rw,nosuid,relatime,size=1913472k,nr_inodes=156992,mode=755'},
 'devpts': {'fstype': 'devpts', 'mountpoint': '/dev/pts', 'opts': 'rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000'},
 'tmpfs': {'fstype': 'tmpfs', 'mountpoint': '/etc/machine-id', 'opts': 'ro,relatime,size=391432k,mode=755'},
 '/dev/mmcblk0p2': {'fstype': 'ext4', 'mountpoint': '/', 'opts': 'rw,relatime'},
 'securityfs': {'fstype': 'securityfs', 'mountpoint': '/sys/kernel/security', 'opts': 'rw,nosuid,nodev,noexec,relatime'},
 'cgroup': {'fstype': 'cgroup', 'mountpoint': '/sys/fs/cgroup/freezer', 'opts': 'rw,nosuid,nodev,noexec,relatime,freezer'},
 'pstore': {'fstype': 'pstore', 'mountpoint': '/sys/fs/pstore', 'opts': 'rw,nosuid,nodev,noexec,relatime'},
 'systemd-1': {'fstype': 'autofs', 'mountpoint': '/proc/sys/fs/binfmt_misc', 'opts': 'rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1728'},
 'debugfs': {'fstype': 'debugfs', 'mountpoint': '/sys/kernel/debug', 'opts': 'rw,relatime'},
 'mqueue': {'fstype': 'mqueue', 'mountpoint': '/dev/mqueue', 'opts': 'rw,relatime'},
 'tracefs': {'fstype': 'tracefs', 'mountpoint': '/sys/kernel/debug/tracing', 'opts': 'rw,relatime'},
 'configfs': {'fstype': 'configfs', 'mountpoint': '/sys/kernel/config', 'opts': 'rw,relatime'},
 'fusectl': {'fstype': 'fusectl', 'mountpoint': '/sys/fs/fuse/connections', 'opts': 'rw,relatime'}} mounts from proc
2020-01-14 20:48:01,468 - util.py[DEBUG]: Running command ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4'] with allowed return codes [0] (shell=False, capture=True)
2020-01-14 20:48:01,529 - util.py[DEBUG]: Failed mount of '/dev/mmcblk0p1' as 'auto': Unexpected error while running command.
Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4']
Exit code: 32
Reason: -
Stdout:
Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware.
2020-01-14 20:48:01,530 - util.py[WARNING]: Failed to mount /dev/mmcblk0p1 when looking for data
2020-01-14 20:48:01,533 - util.py[DEBUG]: Failed to mount /dev/mmcblk0p1 when looking for data
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceNoCloud.py", line 128, in _get_data
    pp2d_kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1758, in mount_cb
    (device, tmpd, failure_reason))
cloudinit.util.MountFailedError: Failed mounting /dev/mmcblk0p1 to /run/cloud-init/tmp/tmp2z8rueu4 due to: Unexpected error while running command.
Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4']
Exit code: 32
Reason: -
Stdout:
Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware.
2020-01-14 20:48:01,549 - __init__.py[DEBUG]: Datasource DataSourceNoCloud [seed=None][dsmode=net] not updated for events: New instance first boot
2020-01-14 20:48:01,549 - handlers.py[DEBUG]: finish: init-local/search-NoCloud: SUCCESS: no local data found from DataSourceNoCloud
2020-01-14 20:48:01,549 - main.py[DEBUG]: No local datasource found
...

The relevant code is mount_cb() in util.py, which calls mounts() to read
/proc/mounts and then attempts to mount the device when it's not found
there (leading to a classic race between the check and the action).
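As an illustration only, the check-then-mount sequence and the
"re-check on failure" mitigation discussed in this report could look
roughly like the following. This is a simplified, hypothetical sketch,
not cloud-init's actual mount_cb() code; all names and structure here
are illustrative:

```python
# Hypothetical sketch of the check-then-mount race, plus a
# "re-check /proc/mounts on failure" mitigation.  NOT cloud-init's
# actual implementation.
import subprocess
import tempfile


def parse_mounts(text):
    """Parse /proc/mounts-style text into {device: mountpoint}."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            mounts[fields[0]] = fields[1]
    return mounts


def read_mounts():
    with open("/proc/mounts") as f:
        return parse_mounts(f.read())


def mount_cb(device, callback):
    # 1. The check: is the device already mounted somewhere?
    mounts = read_mounts()
    if device in mounts:
        return callback(mounts[device])
    # 2. The action: mount it ourselves.  Between steps 1 and 2,
    #    systemd may mount the device itself -- the race described
    #    in this bug.
    tmpd = tempfile.mkdtemp(prefix="seed-")
    try:
        subprocess.run(["mount", "-o", "ro", device, tmpd], check=True)
    except subprocess.CalledProcessError:
        # Mitigation: re-check /proc/mounts before treating this as a
        # hard failure; if systemd won the race, reuse its mountpoint.
        mounts = read_mounts()
        if device in mounts:
            return callback(mounts[device])
        raise
    try:
        return callback(tmpd)
    finally:
        subprocess.run(["umount", tmpd], check=False)
```

With a re-check along these lines, the second outcome above (systemd
winning the race) would degrade into the harmless third outcome instead
of losing the seed.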
It might be argued that it should simply attempt the mount, and check
/proc/mounts in the event of failure, but I'm wary that mounting
filesystems is a potentially costly exercise (can the seed be on a
remote mount?), and there are probably edge cases here I'm unaware of
(is it possible, perhaps dangerous, to double-mount certain devices?).
Perhaps it could be adjusted to check /proc/mounts, attempt the mount,
then *re-check* /proc/mounts in the case of failure?

Nevertheless, there's a trivial workaround in our case: just add an
override for "RequiresMountsFor=/boot/firmware" to our systemd config
for cloud-init-local, thus ensuring the mount is definitely available
before cloud-init looks for its seed. This in turn argues that the issue
could simply be viewed as mis-configuration on our part: we *know* the
seed is on /dev/mmcblk0p1, as that's where we've configured it to be;
should we not also configure cloud-init in such a way that we guarantee
the device containing its seed is definitely mounted prior to it
running?

** Changed in: cloud-init (Ubuntu)
       Status: Triaged => Fix Committed

** Changed in: cloud-init (Ubuntu)
       Status: Fix Committed => Invalid

** Also affects: livecd-rootfs (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: livecd-rootfs (Ubuntu)
       Status: New => Fix Committed

** Also affects: livecd-rootfs (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: cloud-init (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: livecd-rootfs (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: livecd-rootfs (Ubuntu Bionic)
   Importance: Undecided => Critical

** Changed in: livecd-rootfs (Ubuntu)
   Importance: Undecided => High

** Changed in: livecd-rootfs (Ubuntu Bionic)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-bionic

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1860046

Title:
  Race to mount seed device

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1860046/+subscriptions

-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
