This bug was fixed in the package netplan.io - 0.100-0ubuntu4

---------------
netplan.io (0.100-0ubuntu4) groovy; urgency=medium

  * debian/tests/cloud-init
    - Improve reboot test to avoid failure on arm64

 -- Lukas Märdian <[email protected]>  Mon, 21 Sep 2020 12:23:02 +0200

** Changed in: netplan.io (Ubuntu)
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1892851

Title:
  Staged boot, to fix integration of systemd generators

Status in cloud-init:
  Invalid
Status in netplan:
  Fix Committed
Status in netplan.io package in Ubuntu:
  Fix Released

Bug description:
  [Intro]
  Cloud-init relies on the "netplan" systemd generator, but it calls
  "netplan generate" manually at runtime, while the initial systemd boot
  transaction is already executing. The generator is intended to run at
  the systemd generator stage (as triggered by "systemctl daemon-reload").
  Cloud-init cannot use it that way, because it first has to fetch its
  data source (e.g. the netplan YAML config).

  [Problem]
  This leads to problems at first boot. The systemd unit dependencies are
  calculated after the generator stage, but ahead of the boot transaction
  (e.g. via systemctl daemon-reload). The new service units and their
  dependencies, which are generated by manually calling systemd
  generators, are therefore ignored during the first-boot transaction. In
  subsequent boots (where the cloud-init data source, netplan YAML config
  and unit files are already in place), everything works as expected.

  It is a tricky situation, as cloud-init
   1/ does not yet have the full config needed to run the systemd
      generators (e.g. netplan YAML) before the systemd boot transaction
      starts. It first needs to fetch it via a DataSource, possibly over a
      network connection.
   2/ cannot execute the generators manually (e.g. "netplan generate")
      during the systemd boot transaction, because the newly generated
      service units and corresponding dependencies will then be ignored.
   3/ cannot re-execute the systemd generators after the initial boot
      transaction, as it is already too late at this point: applications
      expect a readily configured network setup once cloud-final.target
      has been reached.
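
  The effect behind points 2/ and 3/ can be pictured with a toy model. This
  is a simplified sketch, not how systemd is implemented: units are a plain
  dictionary, a transaction is the set of jobs computed once from that
  dictionary, and the edge from network.target to the generated unit is an
  assumption made only for illustration.

```python
from typing import Dict, List, Set


def build_transaction(units: Dict[str, List[str]], target: str) -> Set[str]:
    """Collect every unit reachable from `target`, using the units known right now."""
    jobs: Set[str] = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name in jobs or name not in units:
            continue
        jobs.add(name)
        stack.extend(units[name])
    return jobs


# Units known when the first-boot transaction is computed (illustrative names).
units = {
    "default.target": ["cloud-init.service", "network.target"],
    "cloud-init.service": [],
    "network.target": [],
}
first_boot_jobs = build_transaction(units, "default.target")

# "netplan generate" is called manually while that transaction is running:
# the unit appears on disk, but the job set above was already fixed.
units["netplan-ovs-ovs0.service"] = []
units["network.target"].append("netplan-ovs-ovs0.service")  # assumed edge

# On the next boot the unit is present before the transaction is computed.
second_boot_jobs = build_transaction(units, "default.target")

print("netplan-ovs-ovs0.service" in first_boot_jobs)
print("netplan-ovs-ovs0.service" in second_boot_jobs)
```

  In this model the generated unit is missing from the first job set and
  present in the second, which matches the first-boot versus later-boot
  behaviour described above.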

  [References]
  Such problems have been reported and discussed for WiFi on Raspberry Pi
  (LP: #1870346) and for Open vSwitch setups in MAAS
  (https://github.com/CanonicalLtd/netplan/pull/157), where some of the
  generated service units/dependencies (netplan-ovs-*.service or
  netplan-wpa-*.service, possibly SR-IOV units as well) are not properly
  executed on first boot.

  [Suggestion]
  A possible solution I discussed with @xnox would be to re-engineer how 
cloud-init targets work a bit, by splitting up the cloud-init boot sequence 
into multiple stages, e.g.:

  * Start "Stage 0" systemd transaction: systemctl isolate cloud-stage0.target
    - execute the init local modules
    - setup basic networking (DHCP on eth0/ens3)
    - fetch data source & place netplan YAML in /etc/netplan/
  * Finish "Stage 0" transaction
  * Call systemctl daemon-reload
    - This will trigger all systemd generators (incl. netplan generate)
      and re-calculate all dependencies
  * Start "Stage 1" systemd transaction: systemctl isolate default.target
    - execute all the normal cloud-init modules and start all the normal
      services, e.g. via cloud-final.target
  * Finish "Stage 1" transaction
  * System is now fully booted

  The idea here is to split up the boot sequence into two (or more?)
  systemd transactions, so we can call "systemctl daemon-reload" in
  between (but not within a running systemd transaction) to re-run all
  the generators and re-calculate all the dependencies. This way all
  generators would be used in their intended way and should work as
  expected, even on first boot.
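
  The sequence above can be sketched as a small driver. This is a minimal
  illustration under stated assumptions, not cloud-init code:
  "cloud-stage0.target" is the hypothetical target named in this proposal,
  the names STAGED_BOOT_STEPS and run_staged_boot are made up for the
  example, and the sketch ignores the question of which process would
  issue these calls during a real boot.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# Ordered steps of the proposed two-stage boot. "cloud-stage0.target" is the
# hypothetical target from the proposal; it does not ship with cloud-init.
STAGED_BOOT_STEPS: List[List[str]] = [
    ["systemctl", "isolate", "cloud-stage0.target"],  # Stage 0 transaction
    ["systemctl", "daemon-reload"],                   # re-run all generators
    ["systemctl", "isolate", "default.target"],       # Stage 1 transaction
]


def run_staged_boot(
    runner: Optional[Callable[[Sequence[str]], object]] = None,
) -> None:
    """Run each step in order, finishing one before starting the next."""
    if runner is None:
        # Assumption: systemctl blocks until the requested job has finished,
        # which is its default behaviour unless --no-block is passed.
        def runner(cmd: Sequence[str]) -> object:
            return subprocess.run(list(cmd), check=True)

    for step in STAGED_BOOT_STEPS:
        runner(list(step))


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    run_staged_boot(runner=print)
```

  Passing a custom runner keeps the example safe to try: the dry run only
  prints the three commands, with daemon-reload placed between the two
  transactions rather than inside either of them.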

  Doing that would also allow users to do interesting things with
  systemd via cloud-config, such as changing the default.target from
  multi-user.target to emergency.target, adding / masking / removing
  units used in early boot, or "just write fstab" and let
  systemd-fstab-generator process it and mount things, etc.

  
  ### Config used to reproduce the problem in an LXD container:
  "systemctl status netplan-ovs-ovs0.service" will show that this unit has
  not been executed on first boot.

  config:
    user.network-config: |
      # cloud-config
      version: 2
      bridges:
        ovs0:
          addresses: [10.10.10.20/24]
          interfaces: [eth0.21]
          parameters:
            stp: false
          openvswitch: {}
      ethernets:
        eth0:
          addresses: [10.10.10.30/24]
      vlans:
        eth0.21:
          id: 21
          link: eth0
  description: My OVS debugging profile
  devices:
    eth0:
      name: eth0
      network: lxdbr0
      type: nic
    root:
      path: /
      pool: default
      type: disk
  name: myovs
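
  To know which units to inspect after first boot, the generated unit
  names can be derived from the network config. The sketch below mirrors
  the "user.network-config" section of the profile as a Python dict and
  assumes that the wildcard in netplan-ovs-*.service is the interface
  name, as it is for ovs0 in this report. The helper name
  expected_ovs_units is made up for this example.

```python
from typing import Any, Dict, List

# The network config from the LXD profile above, written as a Python dict.
network_config: Dict[str, Any] = {
    "version": 2,
    "bridges": {
        "ovs0": {
            "addresses": ["10.10.10.20/24"],
            "interfaces": ["eth0.21"],
            "parameters": {"stp": False},
            "openvswitch": {},
        },
    },
    "ethernets": {"eth0": {"addresses": ["10.10.10.30/24"]}},
    "vlans": {"eth0.21": {"id": 21, "link": "eth0"}},
}


def expected_ovs_units(config: Dict[str, Any]) -> List[str]:
    """List the OVS service units expected for bridges that set 'openvswitch'.

    Assumption: the unit is named netplan-ovs-<interface>.service.
    """
    return [
        f"netplan-ovs-{name}.service"
        for name, settings in config.get("bridges", {}).items()
        if "openvswitch" in settings
    ]


for unit in expected_ovs_units(network_config):
    # Print the command to run inside the container; nothing is executed here.
    print(f"systemctl status {unit}")
```

  For this profile the only bridge carrying an "openvswitch" key is ovs0,
  so the single command printed is the one quoted above.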

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1892851/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
