Public bug reported:

With the current Ubuntu resolute server dailies for Raspberry Pi, cloud-init sometimes fails to use any network resources. This is a bit of a scatter-gun bug, as I think it is related to several changes and may need several workarounds. But first:

[ Reproduction ]

* With a Raspberry Pi 4 preferably; the Pi 5 *sometimes* exhibits this problem, sometimes doesn't if it's quick enough to do certain things, but the 4 reproduces this *fairly* reliably
* Grab the current resolute daily for raspi:
  * wget https://cdimage.ubuntu.com/ubuntu-server/daily-preinstalled/current/resolute-preinstalled-server-arm64+raspi.img.xz
* Flash the image to a spare SD card (substitute /dev/SDCARD with your card reader):
  * xzcat resolute-preinstalled-server-arm64+raspi.img.xz | dd of=/dev/SDCARD bs=1M status=progress
* Mount the boot partition to edit the cloud-init configuration (substitute /dev/SDCARD as needed)
  * mkdir /mnt/boot
  * mount /dev/SDCARDp1 /mnt/boot
* Edit /mnt/boot/network-config to the following (mandatory ethernet):

  network:
    version: 2
    ethernets:
      eth0:
        dhcp4: true
        optional: false

* Edit /mnt/boot/user-data to the following (adjust capitalized bits as needed)

  #cloud-config
  ssh_import_id:
    - lp:YOUR_LAUNCHPAD_ID
  package_update: true
  package_upgrade: true
  packages:
    - avahi-daemon

* umount /mnt/boot
* Place the card in the Pi 4, ensure ethernet is connected, and boot the machine
* Login with ubuntu/ubuntu (and change password)
* cat .ssh/authorized_keys
  * Observe that SSH keys are *not* pulled in for the default user
* apt policy avahi-daemon
  * Observe that avahi-daemon is *not* installed by cloud-init

One or both of the failures above might not happen depending on how rapidly some things come up.

[ Causes ]

Firstly, the network configuration above *should* require that cloud-init does not proceed until network connectivity has been established ("routable" in networkd parlance).
It appears that netplan *tries* to ensure this is the case by injecting the following into systemd-networkd-wait-online:

$ systemctl cat systemd-networkd-wait-online | tail -9
# /run/systemd/generator.late/systemd-networkd-wait-online.service.d/10-netplan.conf
[Unit]
ConditionPathIsSymbolicLink=/run/systemd/generator/network-online.target.wants/systemd-networkd-wait-online.service
After=systemd-resolved.service

[Service]
ExecStart=
ExecStart=/lib/systemd/systemd-networkd-wait-online -i eth0:degraded
ExecStart=/lib/systemd/systemd-networkd-wait-online --any --dns -o routable -i eth0

However, systemd-networkd-wait-online gets skipped entirely because a condition is not met:

$ systemctl status systemd-networkd-wait-online | tail -1
Jan 19 21:30:43 ubuntu systemd[1]: systemd-networkd-wait-online.service - Wait for Network to be Online was skipped because of an unmet condition check (ConditionPathExists=/run/networkd/initrd/neednet).

I've not been able to find where specifically this condition is defined (it doesn't appear anywhere in the cat output for the service, or in network-online.target), but it appears that neednet is something dracut introduced (bear in mind we only made the switch to dracut on the Pi images in resolute; questing still used initramfs-tools).

Re-flashing the image, and adding rd.neednet to the kernel command line (current/cmdline.txt on the boot partition) causes systemd-networkd-wait-online to wait for connectivity as it used to. But now another problem rears its head, causing network operations to fail: the Pi has no RTC (well, before the Pi 5, anyway) so all TLS connections fail until the system clock is set accurately. This is handled by chrony, which we also used in questing, but now it seems to be expecting some TLS checks to work as well:

$ journalctl --unit chrony | head -20
Jan 19 21:31:09 miss-piggy systemd[1]: Starting chrony.service - chrony, an NTP client/server...
Jan 19 21:31:10 miss-piggy chronyd[1589]: chronyd version 4.8 starting (+CMDMON +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +NTS +SECHASH +IPV6 -DEBUG)
Jan 19 21:31:10 miss-piggy chronyd[1589]: Loaded 0 symmetric keys
Jan 19 21:31:10 miss-piggy chronyd[1589]: Using leap second list /usr/share/zoneinfo/leap-seconds.list
Jan 19 21:31:10 miss-piggy chronyd[1589]: Loaded seccomp filter (level 1)
Jan 19 21:31:10 miss-piggy systemd[1]: Started chrony.service - chrony, an NTP client/server.
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool 1.ntp.ubuntu.com
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool 2.ntp.ubuntu.com
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool 3.ntp.ubuntu.com
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool 4.ntp.ubuntu.com
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool ntp-bootstrap.ubuntu.com
Jan 19 21:31:13 miss-piggy chronyd[1589]: TLS handshake with [2620:2d:4000:1::2123]:4460 (1.ntp.ubuntu.com) failed : Error in the certificate verification. The certificate is NOT trusted. The certificate chain uses not yet valid certificate.
Jan 19 21:31:13 miss-piggy chronyd[1589]: TLS handshake with [2620:2d:4000:1::3123]:4460 (2.ntp.ubuntu.com) failed : Error in the certificate verification. The certificate is NOT trusted. The certificate chain uses not yet valid certificate.
Jan 19 21:31:19 miss-piggy chronyd[1589]: TLS handshake with [2620:2d:4002:1::3123]:4460 (4.ntp.ubuntu.com) failed : Error in the certificate verification. The certificate is NOT trusted. The certificate chain uses not yet valid certificate.
Jan 19 21:31:20 miss-piggy chronyd[1589]: Selected source 2620:2d:4000:1::1123 (ntp-bootstrap.ubuntu.com)
Jan 19 21:31:20 miss-piggy chronyd[1589]: System clock wrong by 4899846.142287 seconds
Mar 17 14:35:26 miss-piggy chronyd[1589]: System clock was stepped by 4899846.142287 seconds
Mar 17 14:35:26 miss-piggy chronyd[1589]: System clock TAI offset set to 37 seconds
Mar 17 14:35:35 miss-piggy chronyd[1589]: NTS-KE session with [2620:2d:4002:1::2123]:4460 (3.ntp.ubuntu.com) timed out
Mar 17 14:35:40 miss-piggy chronyd[1589]: Selected source 185.125.190.122 (1.ntp.ubuntu.com)

Note chrony starts at 21:31:09 (according to the system clock), fails to validate various certificates (because the certificate is "not yet valid", because the system clock is way out), and it's 21:31:20 before it's figured out how wrong the system clock is, and steps it forward.

I have a feeling this is another thing affected by the switch to dracut, given that initramfs-tools used to have a "fixrtc" workaround which set the system clock from the last fs mount timestamp when that timestamp was in advance of the system clock. Perhaps a similar workaround is required in dracut.

Finally, just adding rd.neednet to the kernel command line isn't a sufficient workaround on its own since, in the situation where all interfaces are undefined or optional, it causes systemd-networkd-wait-online to wait for several minutes on every boot. This didn't happen under questing and I'm still trying to understand the differences here: under questing with all interfaces optional / undefined, the systemd-networkd-wait-online service is simply disabled; with mandatory interfaces it becomes "enabled-runtime". It seems in resolute it's always "enabled" but normally inhibited by neednet. Whether this is a difference in netplan's or systemd's behaviour I'm still digging into.
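
For the clock problem described above, the general shape of a fixrtc-style check might look something like the following. This is only a rough sketch in POSIX shell (assuming GNU stat and date), not the actual initramfs-tools script and not a proposed dracut module. The function names later_epoch and step_clock_from_file are made up for illustration, and the mtime of a reference file stands in for the filesystem's last mount timestamp.

```shell
#!/bin/sh
# Hypothetical fixrtc-style sketch: only ever moves the clock forwards.

# Print whichever of two epoch timestamps (in seconds) is later.
later_epoch() {
    if [ "$2" -gt "$1" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Step the system clock to the mtime of a reference file, but only when
# that mtime is ahead of the current system time. Setting the clock
# needs root. The reference file is an assumption standing in for the
# filesystem's last mount timestamp.
step_clock_from_file() {
    ref_file=$1
    now=$(date +%s)
    ref=$(stat -c %Y "$ref_file") || return 1
    target=$(later_epoch "$now" "$ref")
    if [ "$target" -gt "$now" ]; then
        date -s "@$target" >/dev/null
    fi
}
```

Something like "step_clock_from_file /PATH/TO/REFERENCE_FILE", run as root before chrony starts, would then at least get the clock past the "not yet valid" point for recently issued certificates, provided the reference file was written after those certificates became valid.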

** Affects: cloud-init (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: dracut (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: netplan.io (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: systemd (Ubuntu)
     Importance: Undecided
         Status: New

-- 
https://bugs.launchpad.net/bugs/2144891

Title:
  Rpi resolute dailies fail to use network resources
