Public bug reported:

[Intro]
Cloud-init makes use of the "netplan" systemd generator, but calls
"netplan generate" manually at runtime, while the initial systemd boot
transaction is already executing, instead of letting it run as intended
at the systemd generator stage (i.e. via "systemctl daemon-reload").
This is due to restrictions cloud-init has regarding the fetching of its
data source (e.g. the netplan YAML config).

[Problem]
This leads to problems at first boot: the systemd unit dependencies are
calculated after the generator stage but ahead of the boot transaction
(e.g. via "systemctl daemon-reload"). Therefore the new service units
and their dependencies, which are generated by manually calling systemd
generators, are ignored during the first-boot transaction. On subsequent
boots (where the cloud-init data source, netplan YAML config and unit
files are already in place), everything works as expected.

It is a tricky situation, as cloud-init:

1/ does not yet have the full config needed to run the systemd
   generators (e.g. the netplan YAML) before the systemd boot
   transaction. It first needs to fetch it via a DataSource, possibly
   via a network connection.
2/ cannot execute the generators manually (e.g. "netplan generate")
   during the systemd boot transaction, because the newly generated
   service units and their dependencies will then be ignored.
3/ cannot re-execute the systemd generators after the initial boot
   transaction, as it is already too late at this point: applications
   expect a fully configured network setup once cloud-final.target has
   been reached.

[References]
Such problems have been reported and discussed for WiFi on Raspberry Pi
(LP: #1870346) and for Open vSwitch setups in MAAS
(https://github.com/CanonicalLtd/netplan/pull/157), where some of the
generated service units/dependencies (netplan-ovs-*.service or
netplan-wpa-*.service, possibly SR-IOV units as well...) are not
properly executed on first boot.
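
To check an affected system for this symptom, something along the
following lines might help. It is only a rough sketch: the helper names
are made up for illustration, and it assumes the netplan generator
places its units in /run/systemd/system (adjust the directory if your
setup differs). It lists the netplan-ovs-*/netplan-wpa-* units present
on disk and prints their status, so units that were generated but never
executed during the first-boot transaction can be spotted.

```shell
#!/bin/sh
# Hypothetical helper: print the names of netplan-generated service units
# found in a unit directory. The default path is an assumption.
list_netplan_units() {
    dir="${1:-/run/systemd/system}"
    for f in "$dir"/netplan-ovs-*.service "$dir"/netplan-wpa-*.service; do
        [ -e "$f" ] || continue   # skip glob patterns that matched nothing
        basename "$f"
    done
}

# Hypothetical helper: show the status of each unit found, so it can be
# inspected by hand (same check as "systemctl status <unit>").
show_netplan_unit_status() {
    list_netplan_units "$1" | while read -r unit; do
        echo "== $unit"
        # systemctl status exits non-zero for inactive units; ignore that
        systemctl status --no-pager "$unit" || true
    done
}
```

Running "show_netplan_unit_status" right after the first boot, and again
after a reboot, should make the difference between the two boots
visible.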

[Suggestion]
A possible solution I discussed with @xnox would be to re-engineer how
cloud-init targets work a bit, by splitting up the cloud-init boot
sequence into multiple stages, e.g.:

* Start "Stage 0" systemd transaction: systemctl isolate cloud-stage0.target
  - execute the init local modules
  - set up basic networking (DHCP on eth0/ens3)
  - fetch data source & place netplan YAML in /etc/netplan/
* Finish "Stage 0" transaction
* Call systemctl daemon-reload
  - This will trigger all systemd generators (incl. netplan generate)
    and re-calculate all dependencies
* Start "Stage 1" systemd transaction: systemctl isolate default.target
  - execute all the normal cloud-init modules and start all the normal
    services, e.g. via cloud-final.target
* Finish "Stage 1" transaction
* System is now fully booted

The idea here is to split up the boot sequence into two (or more?)
systemd transactions, so we can call "systemctl daemon-reload" in
between (but not within a running systemd transaction) to re-run all the
generators and re-calculate all the dependencies. This way all
generators would be used in their intended way and should work as
expected, even on first boot.

Doing that would also allow users to do interesting things with systemd
via cloud-config, like changing the default.target from
multi-user.target to emergency.target, adding / masking / removing units
used in early boot, or "just writing fstab" and letting
systemd-fstab-generator process it and mount things, etc...

### Config used to reproduce the problem in a LXD container:
"systemctl status netplan-ovs-ovs0.service" will show that this unit has
not been executed on first boot.

config:
  user.network-config: |
    # cloud-config
    version: 2
    bridges:
      ovs0:
        addresses: [10.10.10.20/24]
        interfaces: [eth0.21]
        parameters:
          stp: false
        openvswitch: {}
    ethernets:
      eth0:
        addresses: [10.10.10.30/24]
    vlans:
      eth0.21:
        id: 21
        link: eth0
description: My OVS debugging profile
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: myovs

** Affects: cloud-init
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1892851

Title:
  Staged boot, to fix integration of systemd generators

Status in cloud-init:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1892851/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp

