Public bug reported:

With the current Ubuntu resolute server dailies for Raspberry Pi,
cloud-init sometimes fails to carry out the steps that need network
access (importing SSH keys, installing packages). This is a bit of a
scatter-gun bug as I think it is related to several changes, and may
need several workarounds. But first:

[ Reproduction ]

* Use a Raspberry Pi 4, preferably; the Pi 5 only *sometimes* exhibits this
problem (it doesn't if it's quick enough to do certain things), but the 4
reproduces it *fairly* reliably
* Grab the current resolute daily for raspi:
* wget https://cdimage.ubuntu.com/ubuntu-server/daily-preinstalled/current/resolute-preinstalled-server-arm64+raspi.img.xz
* Flash the image to a spare SD card (substitute /dev/SDCARD with your card reader):
* xzcat resolute-preinstalled-server-arm64+raspi.img.xz | dd of=/dev/SDCARD bs=1M status=progress
* Mount the boot partition to edit the cloud-init configuration (substitute /dev/SDCARD as needed):
* mkdir /mnt/boot
* mount /dev/SDCARDp1 /mnt/boot
* Edit /mnt/boot/network-config to the following (mandatory ethernet):

network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
      optional: false

* Edit /mnt/boot/user-data to the following (adjust capitalized bits as needed):

#cloud-config

ssh_import_id:
- lp:YOUR_LAUNCHPAD_ID
package_update: true
package_upgrade: true
packages:
- avahi-daemon

* umount /mnt/boot
* Place the card in the Pi 4, ensure ethernet is connected, and boot the machine
* Log in with ubuntu/ubuntu (and change the password)
* cat .ssh/authorized_keys
* Observe that SSH keys are *not* pulled in for the default user
* apt policy avahi-daemon
* Observe that avahi-daemon is *not* installed by cloud-init

One or both of the failures above might not happen depending on how
rapidly some things come up.

[ Causes ]

Firstly, the network configuration above *should* ensure that cloud-init
does not proceed until network connectivity has been established
("routable" in networkd parlance). It appears that netplan *tries* to
ensure this by injecting the following drop-in into
systemd-networkd-wait-online:

$ systemctl cat systemd-networkd-wait-online | tail -9
# /run/systemd/generator.late/systemd-networkd-wait-online.service.d/10-netplan.conf
[Unit]
ConditionPathIsSymbolicLink=/run/systemd/generator/network-online.target.wants/systemd-networkd-wait-online.service
After=systemd-resolved.service

[Service]
ExecStart=
ExecStart=/lib/systemd/systemd-networkd-wait-online -i eth0:degraded
ExecStart=/lib/systemd/systemd-networkd-wait-online --any --dns -o routable -i eth0

However, systemd-networkd-wait-online gets skipped entirely because a
condition is not met:

$ systemctl status systemd-networkd-wait-online | tail -1
Jan 19 21:30:43 ubuntu systemd[1]: systemd-networkd-wait-online.service - Wait for Network to be Online was skipped because of an unmet condition check (ConditionPathExists=/run/networkd/initrd/neednet).

I've not been able to find where specifically this condition is defined
(it doesn't appear anywhere in the cat output for the service, or in
network-online.target), but it appears that neednet is something dracut
introduced (bear in mind we only made the switch to dracut on the Pi
images in resolute; questing still used initramfs-tools).
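
One way to hunt for where the condition is defined is to search the unit
directories on the running system for the neednet path. The following is
only a rough sketch: the function name find_setting and the directory
list are assumptions made for illustration (the list covers the usual
unit and generator locations, but may not be complete for this image),
and it will find nothing if the condition is defined somewhere outside
those directories.

```python
from pathlib import Path

# Usual systemd unit and generator directories. This list is an assumption
# and may need extending for the system being inspected.
UNIT_DIRS = [
    "/etc/systemd/system",
    "/run/systemd/system",
    "/run/systemd/generator",
    "/run/systemd/generator.late",
    "/usr/lib/systemd/system",
]


def find_setting(needle, dirs=UNIT_DIRS):
    """Return sorted (file, line number, line) tuples for lines containing needle."""
    hits = []
    for d in dirs:
        root = Path(d)
        if not root.is_dir():
            continue  # directory absent on this system
        for path in root.rglob("*"):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="replace")
            except OSError:
                continue  # unreadable file, skip it
            for number, line in enumerate(text.splitlines(), 1):
                if needle in line:
                    hits.append((str(path), number, line.strip()))
    return sorted(hits)
```

Calling find_setting("/run/networkd/initrd/neednet") on the affected
machine and printing the result would list every unit file or drop-in in
those directories that mentions the path.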

Re-flashing the image and adding rd.neednet to the kernel command line
(current/cmdline.txt on the boot partition) causes
systemd-networkd-wait-online to wait for connectivity as it used to. But
now another problem rears its head, causing network operations to fail:
the Pi has no RTC (well, before the Pi 5, anyway) so all TLS connections
fail until the system clock is set accurately. This is handled by
chrony, which we also used in questing, but now it seems to attempt NTS
(note the NTS-KE sessions on port 4460 below), which needs TLS
certificate checks to work as well:

$ journalctl --unit chrony | head -20
Jan 19 21:31:09 miss-piggy systemd[1]: Starting chrony.service - chrony, an NTP client/server...
Jan 19 21:31:10 miss-piggy chronyd[1589]: chronyd version 4.8 starting (+CMDMON +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +NTS +SECHASH +IPV6 -DEBUG)
Jan 19 21:31:10 miss-piggy chronyd[1589]: Loaded 0 symmetric keys
Jan 19 21:31:10 miss-piggy chronyd[1589]: Using leap second list /usr/share/zoneinfo/leap-seconds.list
Jan 19 21:31:10 miss-piggy chronyd[1589]: Loaded seccomp filter (level 1)
Jan 19 21:31:10 miss-piggy systemd[1]: Started chrony.service - chrony, an NTP client/server.
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool 1.ntp.ubuntu.com
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool 2.ntp.ubuntu.com
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool 3.ntp.ubuntu.com
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool 4.ntp.ubuntu.com
Jan 19 21:31:10 miss-piggy chronyd[1589]: Added pool ntp-bootstrap.ubuntu.com
Jan 19 21:31:13 miss-piggy chronyd[1589]: TLS handshake with [2620:2d:4000:1::2123]:4460 (1.ntp.ubuntu.com) failed : Error in the certificate verification. The certificate is NOT trusted. The certificate chain uses not yet valid certificate.
Jan 19 21:31:13 miss-piggy chronyd[1589]: TLS handshake with [2620:2d:4000:1::3123]:4460 (2.ntp.ubuntu.com) failed : Error in the certificate verification. The certificate is NOT trusted. The certificate chain uses not yet valid certificate.
Jan 19 21:31:19 miss-piggy chronyd[1589]: TLS handshake with [2620:2d:4002:1::3123]:4460 (4.ntp.ubuntu.com) failed : Error in the certificate verification. The certificate is NOT trusted. The certificate chain uses not yet valid certificate.
Jan 19 21:31:20 miss-piggy chronyd[1589]: Selected source 2620:2d:4000:1::1123 (ntp-bootstrap.ubuntu.com)
Jan 19 21:31:20 miss-piggy chronyd[1589]: System clock wrong by 4899846.142287 seconds
Mar 17 14:35:26 miss-piggy chronyd[1589]: System clock was stepped by 4899846.142287 seconds
Mar 17 14:35:26 miss-piggy chronyd[1589]: System clock TAI offset set to 37 seconds
Mar 17 14:35:35 miss-piggy chronyd[1589]: NTS-KE session with [2620:2d:4002:1::2123]:4460 (3.ntp.ubuntu.com) timed out
Mar 17 14:35:40 miss-piggy chronyd[1589]: Selected source 185.125.190.122 (1.ntp.ubuntu.com)

Note that chrony starts at 21:31:09 (according to the system clock),
fails to validate various certificates (they appear "not yet valid"
because the system clock is way behind), and it's 21:31:20 before it has
figured out how wrong the system clock is and stepped it forward.
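
As a sanity check on the figures in that log, the size of the step can be
worked through with a few lines of Python. The journal timestamps carry
no year, so 2026 is assumed here; the variable names are purely
illustrative.

```python
from datetime import datetime, timedelta

# The journal omits the year; 2026 is an assumption.
ASSUMED_YEAR = 2026

chrony_start = datetime(ASSUMED_YEAR, 1, 19, 21, 31, 9)
before_step = datetime(ASSUMED_YEAR, 1, 19, 21, 31, 20)
offset = timedelta(seconds=4899846.142287)  # "System clock wrong by ..."

after_step = before_step + offset

print(before_step - chrony_start)         # 0:00:11
print(offset.days)                        # 56
print(after_step.replace(microsecond=0))  # 2026-03-17 14:35:26
```

So the clock was roughly eight weeks behind, chrony needed about 11
seconds after starting to correct it, and the computed result matches
the first post-step timestamp in the log.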

I have a feeling this is another thing affected by the switch to dracut
given that initramfs-tools used to have a "fixrtc" workaround which set
the system clock from the last fs mount timestamp when it was in advance
of the system clock. Perhaps a similar workaround is required in dracut.
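
The logic of such a workaround, as described above, is small. The sketch
below is not the initramfs-tools implementation, just the comparison it
is described as making; the function name clock_floor is hypothetical,
and the last-mount timestamp is assumed to have already been read from
the filesystem by some other means.

```python
from datetime import datetime, timezone


def clock_floor(system_now, last_mount):
    """Return the time to step the clock to, or None if no step is needed.

    The clock is only ever moved forwards: if the filesystem's last mount
    time is later than the system clock, the system clock must be wrong.
    """
    if last_mount > system_now:
        return last_mount
    return None


# Hypothetical values for illustration only.
system_now = datetime(2026, 1, 19, 21, 30, tzinfo=timezone.utc)
last_mount = datetime(2026, 3, 10, 9, 0, tzinfo=timezone.utc)

target = clock_floor(system_now, last_mount)
if target is not None:
    # Actually setting the clock needs root, e.g. via
    # time.clock_settime(time.CLOCK_REALTIME, target.timestamp()).
    print("would step clock to", target.isoformat())
```

A clock set this way is still wrong, only less so: it just needs to be
late enough for certificates to stop looking "not yet valid", after which
chrony can do the real correction.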

Finally, just adding rd.neednet into the kernel command line isn't a
workaround since, in the situation where all interfaces are undefined or
optional, this then causes systemd-networkd-wait-online to wait for
several minutes on every boot. This didn't happen under questing and I'm
still trying to understand the differences here: under questing with all
interfaces optional / undefined, the systemd-networkd-wait-online
service is simply disabled; with mandatory interfaces it becomes
"enabled-runtime". It seems in resolute it's always "enabled" but
normally inhibited by neednet. Whether this is a difference in netplan
or systemd's behaviour I'm still digging into.
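
To make the distinction concrete, here is a sketch of a more selective
workaround: only add rd.neednet when the netplan configuration defines at
least one mandatory interface. This is not how netplan or systemd make
the decision; the function names, the list of device sections, and the
sample command line are all assumptions for illustration. The
configuration is taken as an already-parsed mapping.

```python
# Netplan device sections to inspect (an assumed, possibly incomplete list).
DEVICE_SECTIONS = ("ethernets", "wifis", "bridges", "bonds", "vlans")


def has_mandatory_interface(netplan):
    """True if any defined interface is not marked optional.

    Netplan's "optional" key defaults to false, so an interface without
    the key counts as mandatory.
    """
    network = netplan.get("network", {})
    for section in DEVICE_SECTIONS:
        for settings in (network.get(section) or {}).values():
            if not isinstance(settings, dict) or not settings.get("optional", False):
                return True
    return False


def with_neednet(cmdline, netplan):
    """Return cmdline with rd.neednet appended only if an interface is mandatory."""
    args = cmdline.split()
    present = any(a == "rd.neednet" or a.startswith("rd.neednet=") for a in args)
    if has_mandatory_interface(netplan) and not present:
        args.append("rd.neednet")
    return " ".join(args)


# The network-config from the reproduction steps, as a parsed mapping.
config = {
    "network": {
        "version": 2,
        "ethernets": {"eth0": {"dhcp4": True, "optional": False}},
    }
}

# Hypothetical command line, not the one shipped on the image.
print(with_neednet("console=tty1 rootwait", config))
# console=tty1 rootwait rd.neednet
```

With every interface optional, or none defined, the command line is
returned unchanged, which avoids the multi-minute wait described above.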

** Affects: cloud-init (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: dracut (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: netplan.io (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: systemd (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2144891

Title:
  Rpi resolute dailies fail to use network resources
