For the bug verification, note that:
- Instead of enabling -proposed and installing open-iscsi, we needed to 
generate images with the package pre-installed, to verify behavior on first 
boot. All tests include a call to `apt policy`, which shows the same version 
string for the installed open-iscsi as the one in -proposed. Although a 
matching version string does not by itself guarantee that it is the same 
package, this is the best check available to us, and we confirm it is the 
same package.

- There is a systemd failure for systemd-networkd-wait-online.service in all of 
the Baremetal logs. This is not caused by this change; the following logs (from 
Noble) show that it is already present in instances with the open-iscsi 
version from the archive:
https://pastebin.ubuntu.com/p/Jm9dPCTZrc/

- Each of the network configuration checks in the assertions for the first 
scenario (keeping current behavior without the flag) applies to one instance 
type. The /run/net files will not be present on the non-ISCSI instances (as 
there is no initramfs network configuration); those will have the cloud-init 
network config logs as described. ISCSI instances (baremetal) are configured 
in initramfs, so the /run/net files will be present and cloud-init shows no 
warnings.
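
The two checks mentioned in these notes (the installed version string and the 
presence of /run/net files) could be scripted roughly as below. This is only 
a sketch: the helper names `policy_field` and `has_initramfs_net_files` are 
hypothetical, the version string is made up, and the sample text merely 
mimics the usual `apt policy` layout rather than output from the tested 
instances.

```sh
#!/bin/sh
# Hypothetical helpers for the two checks described in the notes above.

# Print one field ("Installed" or "Candidate") from `apt policy <pkg>`
# output read on stdin.
policy_field() {
    awk -v field="$1:" '$1 == field { print $2; exit }'
}

# Return 0 if initramfs left network config files in the given directory
# (default: /run), 1 otherwise.
has_initramfs_net_files() {
    dir="${1:-/run}"
    for f in "$dir"/net-* "$dir"/net6-*; do
        # An unmatched glob stays literal, so check that the path exists.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Sample `apt policy` output with a made-up version string.
sample_policy='open-iscsi:
  Installed: 1.2.3-0ubuntu1
  Candidate: 1.2.3-0ubuntu1'
expected='1.2.3-0ubuntu1'   # version published in -proposed (made up here)

installed=$(printf '%s\n' "$sample_policy" | policy_field Installed)
if [ "$installed" = "$expected" ]; then
    echo "open-iscsi version matches: $installed"
else
    echo "open-iscsi version mismatch: $installed != $expected"
fi

if has_initramfs_net_files; then
    echo "initramfs configured networking (/run/net-* present)"
else
    echo "no /run/net-* files; networking is left to cloud-init"
fi
```

On a real instance the sample text would be replaced by piping 
`apt policy open-iscsi` into `policy_field Installed`.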

** Description changed:

  [ Impact ]
  Oracle Cloud provides users with baremetal instances, and two types of VM 
instances (native and paravirtualized). Native VMs and baremetal use ISCSI, 
while the paravirtualized VMs don't.
  Oracle requires a single image which can run in all instance types, so it's 
not possible to provide an image with ISCSI enabled only for the instances that 
boot from it. Our images set ISCSI_AUTO to be compatible with those. 
Additionally, clouds generally don't specify command line args at boot so they 
can't simply enable or disable ISCSI on a per instance basis.
  
  Oracle now has IPv6-only instances. On fully virtualized instances there
  is no IP configuration coming from ibft, so configure_networking()
  tries to get network information through DHCP in initramfs, starting
  with IPv4. That causes a significant delay (up to 5 minutes) when
  booting. Even the IPv6 address the instance gets is not useful, as
  the network can be configured later through cloud-init.
  
  The fix here skips configure_networking(), delegating it to cloud-init,
  and speeding up the boot process on Oracle Cloud instances.
  
  [ Test Plan ]
  Thanks to Alec Warren <[email protected]> for the detailed test plan.
  
  1. Maintains current behaviour by default when cmdline arg is NOT set
-   a. Test setup:
-     - Ubuntu image (which uses ISCSI_AUTO mode) containing this change
-     - New cmdline arg "iscsi_auto_skip_initramfs_networking" NOT set
-     - Instance configurations:
-       - non-ISCSI instance on Oracle Cloud (paravirtualized VM)
-       - ISCSI instance on Oracle Cloud (native VM and Baremetal instance)
+   a. Test setup:
+     - Ubuntu image (which uses ISCSI_AUTO mode) containing this change
+     - New cmdline arg "iscsi_auto_skip_initramfs_networking" NOT set
+     - Instance configurations:
+       - non-ISCSI instance on Oracle Cloud (paravirtualized VM)
+       - ISCSI instance on Oracle Cloud (native VM and Baremetal instance)
  
-   b. Test Assertions:
-     - Verified that the change does nothing and maintains current behavior
-     - The echo call is NOT in the serial console logs during initramfs stage
-     - Instance DOES have networking configured during initramfs and ephemeral 
networking is NOT needed by cloud-init
-       - Verifiable via cloud-init logs (states that network is configured and 
does not need to setup ephemeral network)
-       - Verifiable by presence of /run/net-* files (these are created by 
configure_networking in initramfs)
+   b. Test Assertions:
+     - Verified that the change does nothing and maintains current behavior
+     - The echo call is NOT in the serial console logs during initramfs stage
+     - Instance DOES have networking configured during initramfs and ephemeral 
networking is NOT needed by cloud-init
+       - Verifiable via cloud-init logs (states that network is configured and 
does not need to setup ephemeral network)
+       - Verifiable by presence of /run/net-* files (these are created by 
configure_networking in initramfs)
  
  2. Does not break ISCSI use case on ISCSI instances when enabled via cmdline 
arg
-   a. Test setup:
-     - Ubuntu image (which uses ISCSI_AUTO mode) containing this change
-     - New cmdline arg "iscsi_auto_skip_initramfs_networking" IS set using grub
-     - Instance configuration:
-       - ISCSI instance on Oracle Cloud (native VM and Baremetal instance)
+   a. Test setup:
+     - Ubuntu image (which uses ISCSI_AUTO mode) containing this change
+     - New cmdline arg "iscsi_auto_skip_initramfs_networking" IS set using grub
+     - Instance configuration:
+       - ISCSI instance on Oracle Cloud (native VM and Baremetal instance)
  
-   b.Test Assertions:
-     - The echo call is NOT in the serial console logs during initramfs stage
-     - Instance DOES have networking configured during initramfs and ephemeral 
networking is NOT needed by cloud-init
-       - Verifiable via cloud-init logs (states that network is configured and 
does not need to setup ephemeral network)
-       - Verifiable by presence of /run/net-* files (these are created by 
configure_networking in initramfs)
+   b.Test Assertions: (edited; see comment #21)
+     - The echo call is NOT in the serial console logs during initramfs stage
+     - Instance DOES have networking configured during initramfs and ephemeral 
networking is NOT needed by cloud-init
+       - Verifiable via cloud-init logs (states that network is configured and 
does not need to setup ephemeral network) (on non-ISCSI instances)
+       - Verifiable by presence of /run/net-* files (these are created by 
configure_networking in initramfs) (on ISCSI instances)
  
  3. Skips configuring networking on non-ISCSI instances when enabled via 
cmdline arg
-   a. Test setup:
-     - Ubuntu image (which uses ISCSI_AUTO mode) containing this change
-     - New cmdline arg "iscsi_auto_skip_initramfs_networking" IS set using grub
-     - Instance configuration:
-       - non-ISCSI instance on Oracle Cloud (paravirtualized VM)
+   a. Test setup:
+     - Ubuntu image (which uses ISCSI_AUTO mode) containing this change
+     - New cmdline arg "iscsi_auto_skip_initramfs_networking" IS set using grub
+     - Instance configuration:
+       - non-ISCSI instance on Oracle Cloud (paravirtualized VM)
  
-   b. Test Assertions:
-     - The echo call IS present in the serial console logs during initramfs 
stage
-     - Instance does NOT have networking configured during initramfs and 
ephemeral networking IS needed and setup by cloud-init
-       - Verifiable via cloud-init logs (states that there is no networking 
from initramfs and sets up ephemeral network itself)
-       - Verifiable by no /run/net-* files existing (these would be created by 
configure_networking in initramfs)
-     - Boot speed is measurably faster than normal (~10-12s instead of the 
normal 20s+)
+   b. Test Assertions:
+     - The echo call IS present in the serial console logs during initramfs 
stage
+     - Instance does NOT have networking configured during initramfs and 
ephemeral networking IS needed and setup by cloud-init
+       - Verifiable via cloud-init logs (states that there is no networking 
from initramfs and sets up ephemeral network itself)
+       - Verifiable by no /run/net-* files existing (these would be created by 
configure_networking in initramfs)
+     - Boot speed is measurably faster than normal (~10-12s instead of the 
normal 20s+)
  
  [ Where problems could occur ]
  Because this change targets a bug in a specific scenario, the check 
explicitly applies only to instances where the flag is present and ISCSI_AUTO 
is set, but there is no ibft data in the system. Mistakes in the logic would 
make this change run in other scenarios, which is not the goal of this fix.
  
  Any mistake in trying to make this configuration completely opt-in would
  break existing instances in the sense that configure_networking() may
  not run when it should. To avoid that we explicitly check for the flag,
  and don't act if it is not set. The expected behavior can be verified
  using the test steps above.
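
A minimal sketch of the opt-in condition described above (flag present, 
ISCSI_AUTO set, no ibft data) is given here for illustration. It is not the 
actual open-iscsi change: the function names and the way the three inputs are 
passed in are assumptions made for this example. Only the flag name and the 
/sys/firmware/ibft path come from this report.

```sh
#!/bin/sh
# Return 0 if the kernel command line in $1 contains the exact word $2.
# The padding with spaces avoids matching a substring of another argument.
cmdline_has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
    esac
    return 1
}

# Decide whether initramfs networking may be skipped.
# Arguments (hypothetical interface chosen for this sketch):
#   $1 kernel command line
#   $2 value of ISCSI_AUTO (empty means unset)
#   $3 ibft sysfs directory, e.g. /sys/firmware/ibft
should_skip_networking() {
    # Opt-in: without the flag, behavior stays exactly as before.
    cmdline_has_flag "$1" iscsi_auto_skip_initramfs_networking || return 1
    # Only relevant when the image runs in ISCSI_AUTO mode.
    [ -n "$2" ] || return 1
    # If ibft data exists, this is an ISCSI boot and networking is needed.
    [ ! -d "$3" ] || return 1
    return 0
}
```

A caller could pass `"$(cat /proc/cmdline)"`, `"$ISCSI_AUTO"` and 
`/sys/firmware/ibft`, and run configure_networking() whenever the function 
returns non-zero.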
  
  Usage-wise, if there is any mistake in setting the flag, the worst that
  can happen is that the code won't detect it, the bug still triggers,
  and users experience longer boot times, just as happens now without
  the change.
  
  [ Other Info ]
  As explained above, there is a requirement from Oracle Cloud that makes it 
impossible to just unset the ISCSI configuration on the images when spinning 
up non-ISCSI instances. This is the reason an opt-in flag is used to opt out 
of the network configuration. We know it may not be ideal, but it enables our 
cloud teams to set the flag on Oracle Cloud images without harming other 
users, who simply don't use it.
  
  This changeset has been forwarded to Debian, but on their side there
  were some questions and suggestions to improve the approach taken. If
  Debian ends up changing the way this situation is handled, we may change
  it in the development release to eliminate, or at least reduce, the
  delta which was introduced. However, no new SRUs should happen on this
  matter, as this change is considered maintainable for the foreseeable
  future.
  
  [ Original Description ]
  Cloud instances that configure the network over DHCP in initramfs will go 
through a "for ROUNDTTT in 30 60 90 120" loop inside configure_networking().
  
  If the DHCP server is only offering IPv6 (no IPv4), the instance will
  take more than 5 minutes to boot, because it will first go through a
  loop trying to obtain an IPv4 address (dhcpcd -1KL -t $ROUNDTTT -4
  ${DEVICE:+"${DEVICE}"}) for 30+60+90+120 seconds (300 seconds in total,
  i.e. 5 minutes), which cannot succeed, and only resume the boot process
  once it times out.
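
The worst-case delay quoted above is simply the sum of the per-round 
timeouts. The snippet below only reproduces that arithmetic; it does not call 
dhcpcd, and the variable `total` is introduced for illustration.

```sh
#!/bin/sh
# Sum the per-round timeouts of the retry loop quoted above.
total=0
for ROUNDTTT in 30 60 90 120; do
    total=$((total + ROUNDTTT))
done
echo "worst-case IPv4 DHCP wait: ${total}s"   # prints "worst-case IPv4 DHCP wait: 300s"
```

This is consistent with the console log further down, where the kernel 
timestamps jump from about 3 seconds to about 303 seconds.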
  
  In https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/2091904
  initramfs-tools improved this situation by looking for IPv6 information
  in /sys/firmware/ibft/ethernet*/ip-addr to decide whether to look for
  IPv6 or IPv4. However, that assumes that IP information will be
  available through ibft, which is not always true.
  
  If no IP information is available through ibft, we still go through this
  incorrect loop, delaying the boot process.
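
The limitation can be illustrated with a rough sketch of an address-family 
check based on the ibft files. This is not the initramfs-tools code: 
`pick_ip_family` is a hypothetical name, and treating "contains a colon" as 
"is IPv6" is a simplification made for this example.

```sh
#!/bin/sh
# Print "6" if any ip-addr file under the given ibft directory holds an
# IPv6 address (contains a colon), otherwise print "4".
# With no ibft data at all, no file matches and the result falls back to
# "4", which is the case that still runs the slow IPv4 loop.
pick_ip_family() {
    ibft_dir="${1:-/sys/firmware/ibft}"
    for f in "$ibft_dir"/ethernet*/ip-addr; do
        # An unmatched glob stays literal; skip anything unreadable.
        [ -r "$f" ] || continue
        case "$(cat "$f")" in
            *:*) echo 6; return 0 ;;
        esac
    done
    echo 4
}
```

On an IPv6-only instance without ibft, such a check has nothing to read, so 
it cannot select IPv6 up front.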
  
  Example from an instance booting through virtual disks, with no ibft,
  and IPv6-only on Oracle Cloud:
  
  ```
  [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.12.0-1001-oracle 
root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0 
nvme.shutdown_timeout=10 libiscsi.debug_libiscsi_eh=1 crash_kexec_post_notifiers
  [...]
  Begin: Running /scripts/init-premount ... done.
  Begin: Mounting root file system ... Begin: Running /scripts/local-top ... [    2.863248] No iBFT detected.
  Could not setup fw entries.
  Begin: Waiting up to 180 secs for any network device to become available ... 
done.
  dhcpcd-10.1.0 starting
  dev: loaded udev
  [    2.906793] 8021q: 802.1Q VLAN Support v1.8
  [    2.917496] 8021q: adding VLAN 0 to HW filter on device enp0s5
  DUID 00:03:00:01:02:00:17:36:95:6d
  enp0s5: IAID 17:36:95:6d
  enp0s5: carrier acquired
  enp0s5: IAID 17:36:95:6d
  [    2.983134] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 7 
times, consider switching to WQ_UNBOUND
  enp0s5: soliciting a DHCP lease
  timed out
  exiting due to oneshot
  dhcpcd exited
  Sleeping 0 seconds before retrying getting a DHCP lease
  dhcpcd-10.1.0 starting
  dev: loaded udev
  DUID 00:03:00:01:02:00:17:36:95:6d
  enp0s5: IAID 17:36:95:6d
  enp0s5: soliciting a DHCP lease
  timed out
  exiting due to oneshot
  dhcpcd exited
  Sleeping 0 seconds before retrying getting a DHCP lease
  dhcpcd-10.1.0 starting
  dev: loaded udev
  DUID 00:03:00:01:02:00:17:36:95:6d
  enp0s5: IAID 17:36:95:6d
  enp0s5: soliciting a DHCP lease
  timed out
  exiting due to oneshot
  dhcpcd exited
  Sleeping 0 seconds before retrying getting a DHCP lease
  dhcpcd-10.1.0 starting
  dev: loaded udev
  DUID 00:03:00:01:02:00:17:36:95:6d
  enp0s5: IAID 17:36:95:6d
  enp0s5: soliciting a DHCP lease
  timed out
  exiting due to oneshot
  dhcpcd exited
  Sleeping 0 seconds before retrying getting a DHCP lease
  no search or nameservers found in /run/net-.conf /run/net-*.conf 
/run/net6-*.conf
  [  303.057039] Loading iSCSI transport class v2.0-870.
  [  303.069113] iscsi: registered transport (tcp)
  Could not get boot entry.
  done.
  ```
  
  Full log: https://pastebin.ubuntu.com/p/Sk5dcvpPyY/
  
  This loop is visible between lines 1136 and 1176 of the full log.

** Description changed:

  [ Impact ]
  Oracle Cloud provides users with baremetal instances, and two types of VM 
instances (native and paravirtualized). Native VMs and baremetal use ISCSI, 
while the paravirtualized VMs don't.
  Oracle requires a single image which can run in all instance types, so it's 
not possible to provide an image with ISCSI enabled only for the instances that 
boot from it. Our images set ISCSI_AUTO to be compatible with those. 
Additionally, clouds generally don't specify command line args at boot so they 
can't simply enable or disable ISCSI on a per instance basis.
  
  Oracle now has IPv6-only instances. On fully virtualized instances there
  is no IP configuration coming from ibft, so configure_networking()
  tries to get network information through DHCP in initramfs, starting
  with IPv4. That causes a significant delay (up to 5 minutes) when
  booting. Even the IPv6 address the instance gets is not useful, as
  the network can be configured later through cloud-init.
  
  The fix here skips configure_networking(), delegating it to cloud-init,
  and speeding up the boot process on Oracle Cloud instances.
  
  [ Test Plan ]
  Thanks to Alec Warren <[email protected]> for the detailed test plan.
  
  1. Maintains current behaviour by default when cmdline arg is NOT set
    a. Test setup:
      - Ubuntu image (which uses ISCSI_AUTO mode) containing this change
      - New cmdline arg "iscsi_auto_skip_initramfs_networking" NOT set
      - Instance configurations:
        - non-ISCSI instance on Oracle Cloud (paravirtualized VM)
        - ISCSI instance on Oracle Cloud (native VM and Baremetal instance)
  
-   b. Test Assertions:
+   b. Test Assertions:(edited; see comment #21)
      - Verified that the change does nothing and maintains current behavior
      - The echo call is NOT in the serial console logs during initramfs stage
      - Instance DOES have networking configured during initramfs and ephemeral 
networking is NOT needed by cloud-init
-       - Verifiable via cloud-init logs (states that network is configured and 
does not need to setup ephemeral network)
-       - Verifiable by presence of /run/net-* files (these are created by 
configure_networking in initramfs)
+       - Verifiable via cloud-init logs (states that network is configured and 
does not need to setup ephemeral network) (on non-ISCSI instances)
+       - Verifiable by presence of /run/net-* files (these are created by 
configure_networking in initramfs) (on ISCSI instances)
  
  2. Does not break ISCSI use case on ISCSI instances when enabled via cmdline 
arg
    a. Test setup:
      - Ubuntu image (which uses ISCSI_AUTO mode) containing this change
      - New cmdline arg "iscsi_auto_skip_initramfs_networking" IS set using grub
      - Instance configuration:
        - ISCSI instance on Oracle Cloud (native VM and Baremetal instance)
  
-   b.Test Assertions: (edited; see comment #21)
+   b.Test Assertions: 
      - The echo call is NOT in the serial console logs during initramfs stage
      - Instance DOES have networking configured during initramfs and ephemeral 
networking is NOT needed by cloud-init
-       - Verifiable via cloud-init logs (states that network is configured and 
does not need to setup ephemeral network) (on non-ISCSI instances)
-       - Verifiable by presence of /run/net-* files (these are created by 
configure_networking in initramfs) (on ISCSI instances)
+       - Verifiable via cloud-init logs (states that network is configured and 
does not need to setup ephemeral network) 
+       - Verifiable by presence of /run/net-* files (these are created by 
configure_networking in initramfs)
  
  3. Skips configuring networking on non-ISCSI instances when enabled via 
cmdline arg
    a. Test setup:
      - Ubuntu image (which uses ISCSI_AUTO mode) containing this change
      - New cmdline arg "iscsi_auto_skip_initramfs_networking" IS set using grub
      - Instance configuration:
        - non-ISCSI instance on Oracle Cloud (paravirtualized VM)
  
    b. Test Assertions:
      - The echo call IS present in the serial console logs during initramfs 
stage
      - Instance does NOT have networking configured during initramfs and 
ephemeral networking IS needed and setup by cloud-init
        - Verifiable via cloud-init logs (states that there is no networking 
from initramfs and sets up ephemeral network itself)
        - Verifiable by no /run/net-* files existing (these would be created by 
configure_networking in initramfs)
      - Boot speed is measurably faster than normal (~10-12s instead of the 
normal 20s+)
  
  [ Where problems could occur ]
  Because this change targets a bug in a specific scenario, the check 
explicitly applies only to instances where the flag is present and ISCSI_AUTO 
is set, but there is no ibft data in the system. Mistakes in the logic would 
make this change run in other scenarios, which is not the goal of this fix.
  
  Any mistake in trying to make this configuration completely opt-in would
  break existing instances in the sense that configure_networking() may
  not run when it should. To avoid that we explicitly check for the flag,
  and don't act if it is not set. The expected behavior can be verified
  using the test steps above.
  
  Usage-wise, if there is any mistake in setting the flag, the worst that
  can happen is that the code won't detect it, the bug still triggers,
  and users experience longer boot times, just as happens now without
  the change.
  
  [ Other Info ]
  As explained above, there is a requirement from Oracle Cloud that makes it 
impossible to just unset the ISCSI configuration on the images when spinning 
up non-ISCSI instances. This is the reason an opt-in flag is used to opt out 
of the network configuration. We know it may not be ideal, but it enables our 
cloud teams to set the flag on Oracle Cloud images without harming other 
users, who simply don't use it.
  
  This changeset has been forwarded to Debian, but on their side there
  were some questions and suggestions to improve the approach taken. If
  Debian ends up changing the way this situation is handled, we may change
  it in the development release to eliminate, or at least reduce, the
  delta which was introduced. However, no new SRUs should happen on this
  matter, as this change is considered maintainable for the foreseeable
  future.
  
  [ Original Description ]
  Cloud instances that configure the network over DHCP in initramfs will go 
through a "for ROUNDTTT in 30 60 90 120" loop inside configure_networking().
  
  If the DHCP server is only offering IPv6 (no IPv4), the instance will
  take more than 5 minutes to boot, because it will first go through a
  loop trying to obtain an IPv4 address (dhcpcd -1KL -t $ROUNDTTT -4
  ${DEVICE:+"${DEVICE}"}) for 30+60+90+120 seconds (300 seconds in total,
  i.e. 5 minutes), which cannot succeed, and only resume the boot process
  once it times out.
  
  In https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/2091904
  initramfs-tools improved this situation by looking for IPv6 information
  in /sys/firmware/ibft/ethernet*/ip-addr to decide whether to look for
  IPv6 or IPv4. However, that assumes that IP information will be
  available through ibft, which is not always true.
  
  If no IP information is available through ibft, we still go through this
  incorrect loop, delaying the boot process.
  
  Example from an instance booting through virtual disks, with no ibft,
  and IPv6-only on Oracle Cloud:
  
  ```
  [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-6.12.0-1001-oracle 
root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0 
nvme.shutdown_timeout=10 libiscsi.debug_libiscsi_eh=1 crash_kexec_post_notifiers
  [...]
  Begin: Running /scripts/init-premount ... done.
  Begin: Mounting root file system ... Begin: Running /scripts/local-top ... [    2.863248] No iBFT detected.
  Could not setup fw entries.
  Begin: Waiting up to 180 secs for any network device to become available ... 
done.
  dhcpcd-10.1.0 starting
  dev: loaded udev
  [    2.906793] 8021q: 802.1Q VLAN Support v1.8
  [    2.917496] 8021q: adding VLAN 0 to HW filter on device enp0s5
  DUID 00:03:00:01:02:00:17:36:95:6d
  enp0s5: IAID 17:36:95:6d
  enp0s5: carrier acquired
  enp0s5: IAID 17:36:95:6d
  [    2.983134] workqueue: drm_fb_helper_damage_work hogged CPU for >10000us 7 
times, consider switching to WQ_UNBOUND
  enp0s5: soliciting a DHCP lease
  timed out
  exiting due to oneshot
  dhcpcd exited
  Sleeping 0 seconds before retrying getting a DHCP lease
  dhcpcd-10.1.0 starting
  dev: loaded udev
  DUID 00:03:00:01:02:00:17:36:95:6d
  enp0s5: IAID 17:36:95:6d
  enp0s5: soliciting a DHCP lease
  timed out
  exiting due to oneshot
  dhcpcd exited
  Sleeping 0 seconds before retrying getting a DHCP lease
  dhcpcd-10.1.0 starting
  dev: loaded udev
  DUID 00:03:00:01:02:00:17:36:95:6d
  enp0s5: IAID 17:36:95:6d
  enp0s5: soliciting a DHCP lease
  timed out
  exiting due to oneshot
  dhcpcd exited
  Sleeping 0 seconds before retrying getting a DHCP lease
  dhcpcd-10.1.0 starting
  dev: loaded udev
  DUID 00:03:00:01:02:00:17:36:95:6d
  enp0s5: IAID 17:36:95:6d
  enp0s5: soliciting a DHCP lease
  timed out
  exiting due to oneshot
  dhcpcd exited
  Sleeping 0 seconds before retrying getting a DHCP lease
  no search or nameservers found in /run/net-.conf /run/net-*.conf 
/run/net6-*.conf
  [  303.057039] Loading iSCSI transport class v2.0-870.
  [  303.069113] iscsi: registered transport (tcp)
  Could not get boot entry.
  done.
  ```
  
  Full log: https://pastebin.ubuntu.com/p/Sk5dcvpPyY/
  
  This loop is visible between lines 1136 and 1176 of the full log.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2098515

Title:
  [SRU] IPv6-only (single stack) instances configuring network over dhcp
  in initramfs will take a long time to boot due to loop in dhcpcd -4

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/2098515/+subscriptions

