[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-04-11 Thread Launchpad Bug Tracker
This bug was fixed in the package systemd - 245.4-4ubuntu3.21

---
systemd (245.4-4ubuntu3.21) focal; urgency=medium

  * udev: avoid NIC renaming race with kernel (LP: #2002445)
    Files:
    - debian/patches/lp2002445-netlink-do-not-fail-when-new-interface-name-is-already-us.patch
    - debian/patches/lp2002445-netlink-introduce-rtnl_get-delete_link_alternative_names.patch
    - debian/patches/lp2002445-sd-netlink-restore-altname-on-error-in-rtnl_set_link_name.patch
    - debian/patches/lp2002445-udev-attempt-device-rename-even-if-interface-is-up.patch
    - debian/patches/lp2002445-udev-net-allow-new-link-name-as-an-altname-before-renamin.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=69ab4a02e828e20ea0ddbd75179324df7a8d1175
  * test-seccomp: accept ENOSYS from sysctl(2) too (LP: #1933090)
    Thanks to Roxana Nicolescu
    File: debian/patches/lp1933090-test-seccomp-accept-ENOSYS-from-sysctl-2-too.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=adaddd1441370ebcdb8bc33d7406b95d85b744f9
  * debian/test: ignore systemd-remount-fs.service failure in containers (LP: #1991285)
    File: debian/tests/boot-and-services
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=264bdc86f1e4dcd10e8d914d095581c54c33199a

 -- Nick Rosbrook  Wed, 15 Mar 2023 11:04:15 -0400

** Changed in: systemd (Ubuntu Focal)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Fix Released
Status in systemd source package in Jammy:
  Fix Released
Status in systemd source package in Kinetic:
  Fix Released
Status in systemd source package in Lunar:
  Fix Released

Bug description:
  [Impact]
  On systems with Mellanox NICs, udev's NIC renaming races with the mlx5_core
  driver's own configuration of subordinate interfaces. When the kernel wins this
  race, the device cannot be renamed as udev has attempted, and this causes
  systemd-network-online.target to time out waiting for links to be configured.
  This ultimately results in boot being delayed by about 2 minutes.

  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instance types will generally hit this
  race around 1 in 10 runs. Create a VM snapshot with updated systemd from
  ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated
  systemd. Assert no failure in cloud-init status and no 2 minute delay in
  network-online.target.
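
  As a rough check on the sample size: if each launch is assumed to hit the
  race independently with probability about 0.1 (the "1 in 10" figure above;
  the independence is an assumption made here, not something the report
  states), the chance that a batch of launches exercises the race at least
  once can be sketched as below. The function name is illustrative only.

```python
def prob_race_seen(launches: int, race_rate: float = 0.1) -> float:
    """Probability that at least one of `launches` boots hits the race.

    Assumes each launch hits the race independently with probability
    `race_rate`; this is a simplifying assumption, not a measured property.
    """
    if launches < 0:
        raise ValueError("launches must be non-negative")
    if not 0.0 <= race_rate <= 1.0:
        raise ValueError("race_rate must be within [0, 1]")
    # P(at least one hit) = 1 - P(no hit in any launch)
    return 1.0 - (1.0 - race_rate) ** launches


if __name__ == "__main__":
    for n in (10, 29, 100):
        print(f"{n:>3} launches: {prob_race_seen(n):.4f}")
```

  Under that assumption a run of 100 launches is very likely to include
  several boots that hit the race, so a clean run says something about the
  fix rather than about luck.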

  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole from
      systemd-analyze blame.
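
  A minimal sketch of that check, assuming the delay shows up in
  "systemd-analyze blame" under systemd-networkd-wait-online.service (the
  unit name is an assumption based on the wait-online timeout described
  above) and that the blame text has already been captured as a string.
  The helper names and the sample output are hypothetical.

```python
# Unit assumed to top "systemd-analyze blame" when the rename race is lost.
WAIT_ONLINE_UNIT = "systemd-networkd-wait-online.service"


def slowest_unit(blame_output: str) -> str:
    """Return the unit named on the first non-empty line, or '' if none.

    "systemd-analyze blame" lists units slowest first, with the unit
    name as the last whitespace-separated field of each line.
    """
    for line in blame_output.splitlines():
        fields = line.split()
        if fields:
            return fields[-1]
    return ""


def network_wait_is_longest_pole(blame_output: str) -> bool:
    """True when the wait-online unit is the slowest unit in the output."""
    return slowest_unit(blame_output) == WAIT_ONLINE_UNIT


if __name__ == "__main__":
    # Illustrative output only, not taken from a real boot log.
    delayed = (
        "2min 312ms systemd-networkd-wait-online.service\n"
        "      4.1s cloud-init.service\n"
    )
    healthy = (
        "4.1s cloud-init.service\n"
        "1.2s systemd-networkd-wait-online.service\n"
    )
    print(network_wait_is_longest_pole(delayed))
    print(network_wait_is_longest_pole(healthy))
```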

  To assert success condition during net rename busy race:
    - Assert that, when "eth1" is still the primary device name, two altnames
      are listed (the altname is preserved because the primary NIC rename hit
      the race).
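
  The altname condition can be sketched against the JSON printed by
  "ip -j addr", which is what the sample script inspects. The helper name
  and the interface data below are hypothetical; only the "ifname" and
  "altnames" keys are taken from the script.

```python
import json
from typing import Optional


def altnames_preserved(ip_addr_json: str, ifname: str = "eth1") -> Optional[bool]:
    """Check the altname condition for an interface that kept its old name.

    Returns None when `ifname` is absent (the rename went through, so the
    race was not hit), True when it is present with more than one altname,
    and False when it is present with one altname or none.
    """
    for dev in json.loads(ip_addr_json):
        if dev.get("ifname") == ifname:
            return len(dev.get("altnames", [])) > 1
    return None


if __name__ == "__main__":
    # Made-up interface data for illustration.
    raced = json.dumps(
        [
            {"ifname": "eth0"},
            {"ifname": "eth1", "altnames": ["enP1p0s2", "enP1s1"]},
        ]
    )
    renamed = json.dumps([{"ifname": "eth0"}, {"ifname": "enP1s1"}])
    print(altnames_preserved(raced))
    print(altnames_preserved(renamed))
```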

  The sample script uses pycloudlib to create a modified base image for the
  test and launches 100 VMs of type Standard_D8ds_v5, counting both successes
  and any failures seen.

  #!/usr/bin/env python3
  # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""

  import json
  import logging
  import os
  import sys
  from enum import Enum

  import pycloudlib

  LOG = logging.getLogger()

  base_cfg = """#cloud-config
  ssh-import-id: [chad.smith, enr0n, falcojr, holmanb, aciba]
  """

  #source: "deb [allow-insecure=yes] https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
  # - apt install systemd udev -y --allow-unauthenticated

  apt_cfg = """
  # Add developer PPA
  apt:
   sources:
     systemd-testing:
       source: {source}
  # upgrade systemd after cloud-init is nearly done
  runcmd:
   - apt install systemd udev -y --allow-unauthenticated
  """

  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug

  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug
      LogRateLimitIntervalSec=0
  """

  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg


  class BootCondition(Enum):
      SUCCESS_WITHOUT_RENAME_RACE = "network bringup success without rename race"
      SUCCESS_WITH_RENAME_RACE = "network bringup success rename race condition"
      ERROR_NETWORK_TIMEOUT = "error: timeout on systemd-networkd-wait-online"


  def batch_launch_vm(

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-04-11 Thread Launchpad Bug Tracker
This bug was fixed in the package systemd - 249.11-0ubuntu3.9

---
systemd (249.11-0ubuntu3.9) jammy; urgency=medium

  * udev: gracefully handle rename failures (LP: #2002445)
    Files:
    - debian/patches/lp2002445/core-device-ignore-failed-uevents.patch
    - debian/patches/lp2002445/sd-device-introduce-device_get_property_int.patch
    - debian/patches/lp2002445/sd-device-make-device_set_syspath-clear-sysname-and-sysnu.patch
    - debian/patches/lp2002445/udev-restore-syspath-and-properties-on-failure.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=a7ad4a9fc708500c61e3b8127f112d8c90049b2c

systemd (249.11-0ubuntu3.8) jammy; urgency=medium

  * network/dhcp4: accept local subnet routes from DHCP (LP: #2004478)
    File: debian/patches/lp2004478-network-dhcp4-accept-local-subnet-routes-from-DHCP.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=96928d5f45ebbfe682b47e842d63506fa0ac9583
  * udev: avoid NIC renaming race with kernel (LP: #2002445)
    Files:
    - debian/patches/lp2002445/sd-netlink-add-a-test-for-rtnl_set_link_name.patch
    - debian/patches/lp2002445/sd-netlink-do-not-swap-old-name-and-alternative-name.patch
    - debian/patches/lp2002445/sd-netlink-restore-altname-on-error-in-rtnl_set_link_name.patch
    - debian/patches/lp2002445/udev-attempt-device-rename-even-if-interface-is-up.patch
    - debian/patches/lp2002445/udev-net-allow-new-link-name-as-an-altname-before-renamin.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=20dc4d51a340669c26c446c23b5a84516e82ea74
  * network: create stacked netdevs after the underlying link is (LP: #2000880)
    File: debian/patches/lp2000880-network-create-stacked-netdevs-after-the-underlying-link-.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=ab620e709f3f62eda86af26fd66c00d6e5165a25
  * Enable /dev/sgx_vepc access for the group 'sgx' (LP: #2009502)
    File: debian/patches/lp2009502-Enable-dev-sgx_vepc-access-for-the-group-sgx.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=434480ae4059a16ccbde9613be0c26ff1983cc3a

 -- Nick Rosbrook  Mon, 20 Mar 2023 10:32:08 -0400

** Changed in: systemd (Ubuntu Jammy)
   Status: Fix Committed => Fix Released

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Fix Released
Status in systemd source package in Jammy:
  Fix Released
Status in systemd source package in Kinetic:
  Fix Released
Status in systemd source package in Lunar:
  Fix Released

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-04-11 Thread Launchpad Bug Tracker
This bug was fixed in the package systemd - 251.4-1ubuntu7.3

---
systemd (251.4-1ubuntu7.3) kinetic; urgency=medium

  * udev: gracefully handle rename failures (LP: #2002445)
    Files:
    - debian/patches/lp2002445/core-device-ignore-failed-uevents.patch
    - debian/patches/lp2002445/sd-device-introduce-device_get_property_int.patch
    - debian/patches/lp2002445/sd-device-make-device_set_syspath-clear-sysname-and-sysnu.patch
    - debian/patches/lp2002445/udev-restore-syspath-and-properties-on-failure.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=deb435fbb84fde1fd39da47231a7473fc2a412e8

systemd (251.4-1ubuntu7.2) kinetic; urgency=medium

  * network/dhcp4: accept local subnet routes from DHCP (LP: #2004478)
    File: debian/patches/lp2004478-network-dhcp4-accept-local-subnet-routes-from-DHCP.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=751bac59b405025964d76c4ef8e0603457a605af
  * udev: avoid NIC renaming race with kernel (LP: #2002445)
    Files:
    - debian/patches/lp2002445/sd-netlink-add-a-test-for-rtnl_set_link_name.patch
    - debian/patches/lp2002445/sd-netlink-do-not-swap-old-name-and-alternative-name.patch
    - debian/patches/lp2002445/sd-netlink-restore-altname-on-error-in-rtnl_set_link_name.patch
    - debian/patches/lp2002445/udev-attempt-device-rename-even-if-interface-is-up.patch
    - debian/patches/lp2002445/udev-net-allow-new-link-name-as-an-altname-before-renamin.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=ffb1e85fdd3f0fe9b158b28a95cfa6d241fcbe70

 -- Nick Rosbrook  Mon, 20 Mar 2023 10:25:23 -0400

** Changed in: systemd (Ubuntu Kinetic)
   Status: Fix Committed => Fix Released

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Fix Committed
Status in systemd source package in Jammy:
  Fix Released
Status in systemd source package in Kinetic:
  Fix Released
Status in systemd source package in Lunar:
  Fix Released

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-29 Thread Nick Rosbrook
I used the test script to verify that 245.4-4ubuntu3.21 from
focal-proposed fixes the issue:

- Test run complete: 29 attempted -
Successes without rename race: 27
Successes with rename race and preserved altname: 2
Failures due to network delay: 0
Failures due to no altnames persisted: 0
===

See attached log for full output.

The customer tested systemd and udev from 245.4-4ubuntu3.21 in their own
environment and confirmed the fix as well.

** Attachment added: "sru-systemd-focal.log"
   https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2002445/+attachment/5658701/+files/sru-systemd-focal.log

** Tags removed: verification-needed verification-needed-focal
** Tags added: verification-done verification-done-focal

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Fix Committed
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Released

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-27 Thread Nick Rosbrook
** Tags removed: verification-needed-jammy verification-needed-kinetic
** Tags added: verification-done-jammy verification-done-kinetic

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Fix Committed
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Released

Bug description:
  [Impact]
  On systems with Mellanox NICs, udev's NIC renaming races with the mlx5_core
  driver's own configuration of subordinate interfaces. When the kernel wins this
  race, the device cannot be renamed as udev has attempted, and this causes
  systemd-network-online.target to time out waiting for links to be configured.
  This ultimately results in boot being delayed by about 2 minutes.

  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instance types will generally hit this
  race around 1 in 10 runs. Create a VM snapshot with updated systemd from
  ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated
  systemd. Assert no failure in cloud-init status and no 2 minute delay in
  network-online.target.

  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole from
      systemd-analyze blame.

  To assert success condition during net rename busy race:
    - Assert that, when "eth1" is still the primary device name, two altnames
      are listed (the altname is preserved because the primary NIC rename hit
      the race).

  The sample script uses pycloudlib to create a modified base image for the
  test and launches 100 VMs of type Standard_D8ds_v5, counting both successes
  and any failures seen.

  #!/usr/bin/env python3
  # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""

  import json
  import logging
  import os
  import sys
  from enum import Enum

  import pycloudlib

  LOG = logging.getLogger()

  base_cfg = """#cloud-config
  ssh-import-id: [chad.smith, enr0n, falcojr, holmanb, aciba]
  """

  #source: "deb [allow-insecure=yes] https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
  # - apt install systemd udev -y --allow-unauthenticated

  apt_cfg = """
  # Add developer PPA
  apt:
   sources:
     systemd-testing:
       source: {source}
  # upgrade systemd after cloud-init is nearly done
  runcmd:
   - apt install systemd udev -y --allow-unauthenticated
  """

  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug

  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug
      LogRateLimitIntervalSec=0
  """

  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg


  class BootCondition(Enum):
      SUCCESS_WITHOUT_RENAME_RACE = "network bringup success without rename race"
      SUCCESS_WITH_RENAME_RACE = "network bringup success rename race condition"
      ERROR_NETWORK_TIMEOUT = "error: timeout on systemd-networkd-wait-online"


  def batch_launch_vm(
      client, instance_type, image_id, user_data, instance_count=5
  ):
      instances = []
      while len(instances) < instance_count:
          instances.append(
              client.launch(
                  image_id=image_id,
                  instance_type=instance_type,
                  user_data=user_data,
              )
          )
      return instances


  def get_boot_condition(test_idx, instance):
      blame = instance.execute("systemd-analyze blame").splitlines()
      try:
          LOG.info(
              f"--- Attempt {test_idx} ssh ubuntu@{instance.ip} Blame: {blame[0]}"
          )
      except IndexError:
          # f-prefix added so the placeholders are actually interpolated
          LOG.warning(f"--- Attempt {test_idx} Empty blame {blame}?")
          LOG.info(instance.execute("systemd-analyze blame"))
          blame = [""]
      altnames_persisted = False
      ip_addr = json.loads(instance.execute("ip -j addr").stdout)
      rename_race_present = False  # set true when we see eth1 not renamed
      for d in ip_addr:
          if d["ifname"] == "eth1":
              rename_race_present = True
              if len(d.get("altnames", [])) > 1:
                  LOG.info(
                      f"--- SUCCESS persisting altnames {d['altnames']} due to rename race on resource busy on {d['ifname']}"
                  )
                  altnames_persisted = True
              else:

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-27 Thread Łukasz Zemczak
Hello Nick, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.21 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from
verification-needed-focal to verification-done-focal. If it does not fix
the bug for you, please add a comment stating that, and change the tag to
verification-failed-focal. In either case, without details of your testing
we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: systemd (Ubuntu Focal)
   Status: In Progress => Fix Committed

** Tags added: verification-needed-focal

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Fix Committed
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Released

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-25 Thread Chad Smith
Kinetic -proposed launches succeeded: the rename race was seen and handled
properly, without delaying boot time due to
systemd-networkd-wait-online.target.

 *** 251.4-1ubuntu7.3 500
        500 http://archive.ubuntu.com/ubuntu kinetic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
...
- Test run complete: 100 attempted -
Successes without rename race: 81   
Successes with rename race and preserved altname: 19
Failures due to network delay: 0
Failures due to no altnames persisted: 81   
===

** Attachment added: "sru-systemd-kinetic.log"
   https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2002445/+attachment/5657411/+files/sru-systemd-kinetic.log

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  In Progress
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Released

Bug description:
  [Impact]
  On systems with mellanox NICs, udev's NIC renaming races with the mlx5_core 
driver's own configuration of subordinate interfaces. When the kernel wins this 
race, the device cannot be renamed as udev has attempted, and this causes 
systemd-network-online.target to timeout waiting for links to be configured. 
This ultimately results in boot being delayed by about 2 minutes.

  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instance types will generally hit this 
race around 1 in 10 runs. Create a vm snapshot with updated systemd from 
ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated 
systemd. Assert not failure in cloud-init status and no 2 minute delay in 
network-online.target.

  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole from 
systemd-analyze blame.

  To assert success condition during net rename busy race:
    - assert when "eth1" is still the primary device name, that two altnames 
are listed (preserving the altname due to the primary NIC rename being hit).

  Sample script uses pycloudlib to create modified base image for test
  and launches 100 VMs of type Standard_D8ds_v5, counting both successes
  and any failures seen.

  #!/usr/bin/env python3
  # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""

  import json
  import logging
  import os
  import sys
  from enum import Enum

  import pycloudlib

  LOG = logging.getLogger()

  base_cfg = """#cloud-config
  ssh-import-id: [chad.smith, enr0n, falcojr, holmanb, aciba]
  """

  #source: "deb [allow-insecure=yes] https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
  # - apt install systemd udev -y --allow-unauthenticated

  apt_cfg = """
  # Add developer PPA
  apt:
   sources:
     systemd-testing:
       source: {source}
  # upgrade systemd after cloud-init is nearly done
  runcmd:
   - apt install systemd udev -y --allow-unauthenticated
  """

  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug

  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug
      LogRateLimitIntervalSec=0
  """

  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg

  
  class BootCondition(Enum):
      SUCCESS_WITHOUT_RENAME_RACE = "network bringup success without rename race"
      SUCCESS_WITH_RENAME_RACE = "network bringup success rename race condition"
      ERROR_NETWORK_TIMEOUT = "error: timeout on systemd-networkd-wait-online"


  def batch_launch_vm(
      client, instance_type, image_id, user_data, instance_count=5
  ):
      instances = []
      while len(instances) < instance_count:
          instances.append(
              client.launch(
                  image_id=image_id,
                  instance_type=instance_type,
                  user_data=user_data,
              )
          )
      return instances


  def get_boot_condition(test_idx, instance):
      blame = instance.execute("systemd-analyze blame").splitlines()
      try:
          LOG.info(

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-25 Thread Launchpad Bug Tracker
This bug was fixed in the package systemd - 252.5-2ubuntu3

---
systemd (252.5-2ubuntu3) lunar; urgency=medium

  * udev: gracefully handle rename failures (LP: #2002445)
Files:
- debian/patches/lp2002445/core-device-ignore-failed-uevents.patch
- debian/patches/lp2002445/sd-device-introduce-device_get_property_int.patch
- debian/patches/lp2002445/sd-device-make-device_set_syspath-clear-sysname-and-sysnu.patch
- debian/patches/lp2002445/udev-restore-syspath-and-properties-on-failure.patch

https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=79536dbb165dbcc402629684e0911693626df5b1

 -- Nick Rosbrook  Mon, 20 Mar 2023 10:17:24 -0400

** Changed in: systemd (Ubuntu Lunar)
   Status: Fix Committed => Fix Released


Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  In Progress
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Released

Bug description:
  [Impact]
  On systems with Mellanox NICs, udev's NIC renaming races with the mlx5_core
  driver's own configuration of subordinate interfaces. When the kernel wins this
  race, the device cannot be renamed as udev attempted, which causes
  systemd-network-online.target to time out waiting for links to be configured.
  This ultimately delays boot by about 2 minutes.

  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instance types will generally hit this
  race in around 1 in 10 runs. Create a VM snapshot with updated systemd from
  ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated
  systemd. Assert no failure in cloud-init status and no 2 minute delay in
  network-online.target.

  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole from 
systemd-analyze blame.

  To assert success condition during net rename busy race:
    - assert when "eth1" is still the primary device name, that two altnames 
are listed (preserving the altname due to the primary NIC rename being hit).
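
The two checks above could be sketched roughly as follows. This is an illustration only, not the code from the referenced script: the helper names `longest_pole` and `altnames_for` are hypothetical, the sample data is made up, and it assumes that `ip -json link show` reports alternative names under an `altnames` key.

```python
import json


def longest_pole(blame_lines):
    """Return the unit named on the first line of `systemd-analyze blame`."""
    # Lines look like "2min 194ms some-unit.service"; the unit name is the
    # last whitespace-separated field on the line.
    fields = blame_lines[0].split() if blame_lines else []
    return fields[-1] if fields else None


def altnames_for(ip_link_json, ifname):
    """Return the altnames listed for `ifname` in `ip -json link show` output."""
    for link in json.loads(ip_link_json):
        if link.get("ifname") == ifname:
            # Assumption: altnames appear as a list under the "altnames" key.
            return link.get("altnames", [])
    return []


# Hypothetical sample data, not taken from the attached logs.
blame = ["1.5s cloud-init.service", "900ms systemd-networkd-wait-online.service"]
links = '[{"ifname": "eth1", "altnames": ["enP1p0s2", "enP1s1"]}]'

# Failure symptom: the wait-online unit dominates boot time.
wait_online_is_longest = "wait-online" in (longest_pole(blame) or "")
# Success during the race: eth1 kept as primary name with two altnames.
race_handled = len(altnames_for(links, "eth1")) == 2

print(wait_online_is_longest, race_handled)
```

In a real run the inputs would come from executing `systemd-analyze blame` and `ip -json link show` on each launched instance.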

  Sample script uses pycloudlib to create modified base image for test
  and launches 100 VMs of type Standard_D8ds_v5, counting both successes
  and any failures seen.

  #!/usr/bin/env python3
  # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""

  import json
  import logging
  import os
  import sys
  from enum import Enum

  import pycloudlib

  LOG = logging.getLogger()

  base_cfg = """#cloud-config
  ssh-import-id: [chad.smith, enr0n, falcojr, holmanb, aciba]
  """

  #source: "deb [allow-insecure=yes] https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
  # - apt install systemd udev -y --allow-unauthenticated

  apt_cfg = """
  # Add developer PPA
  apt:
   sources:
     systemd-testing:
       source: {source}
  # upgrade systemd after cloud-init is nearly done
  runcmd:
   - apt install systemd udev -y --allow-unauthenticated
  """

  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug

  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug
      LogRateLimitIntervalSec=0
  """

  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg

  
  class BootCondition(Enum):
      SUCCESS_WITHOUT_RENAME_RACE = "network bringup success without rename race"
      SUCCESS_WITH_RENAME_RACE = "network bringup success rename race condition"
      ERROR_NETWORK_TIMEOUT = "error: timeout on systemd-networkd-wait-online"


  def batch_launch_vm(
      client, instance_type, image_id, user_data, instance_count=5
  ):
      instances = []
      while len(instances) < instance_count:
          instances.append(
              client.launch(
                  image_id=image_id,
                  instance_type=instance_type,
                  user_data=user_data,
              )
          )
      return instances


  def get_boot_condition(test_idx, instance):
      blame = instance.execute("systemd-analyze blame").splitlines()
      try:
          LOG.info(
              f"--- Attempt {test_idx} ssh ubuntu@{instance.ip} Blame: {blame[0]}"
          )
      except IndexError:
          LOG.warning(f"--- Attempt {test_idx} Empty blame {blame}?")

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-25 Thread Chad Smith
Jammy verification succeeded for launches from jammy-proposed: the rename
race was seen and handled properly, without delaying boot time due to
systemd-networkd-wait-online.target.


 *** 249.11-0ubuntu3.9 500
        500 http://archive.ubuntu.com/ubuntu jammy-proposed/main amd64 Packages
        100 /var/lib/dpkg/status


- Test run complete: 100 attempted -
Successes without rename race: 93   
Successes with rename race and preserved altname: 7 
Failures due to network delay: 0
Failures due to no altnames persisted: 93   
===  
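
A minimal sketch of how per-launch outcomes might be tallied into a summary like the one above. It reuses the `BootCondition` enum from the bug description; the `summarize` helper is hypothetical and covers only the three conditions that enum defines, so it is not the script's actual reporting code.

```python
from collections import Counter
from enum import Enum


class BootCondition(Enum):
    SUCCESS_WITHOUT_RENAME_RACE = "network bringup success without rename race"
    SUCCESS_WITH_RENAME_RACE = "network bringup success rename race condition"
    ERROR_NETWORK_TIMEOUT = "error: timeout on systemd-networkd-wait-online"


def summarize(conditions):
    """Render per-launch outcomes as a short text summary (hypothetical helper)."""
    counts = Counter(conditions)  # missing conditions count as 0
    no_race = counts[BootCondition.SUCCESS_WITHOUT_RENAME_RACE]
    with_race = counts[BootCondition.SUCCESS_WITH_RENAME_RACE]
    timeouts = counts[BootCondition.ERROR_NETWORK_TIMEOUT]
    return "\n".join(
        [
            f"- Test run complete: {len(conditions)} attempted -",
            f"Successes without rename race: {no_race}",
            f"Successes with rename race and preserved altname: {with_race}",
            f"Failures due to network delay: {timeouts}",
        ]
    )


# Outcome counts from the Jammy run above: 93 without the race, 7 with it.
conditions = (
    [BootCondition.SUCCESS_WITHOUT_RENAME_RACE] * 93
    + [BootCondition.SUCCESS_WITH_RENAME_RACE] * 7
)
summary = summarize(conditions)
print(summary)
```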


** Attachment added: "sru-systemd-jammy.log"
   
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2002445/+attachment/5657410/+files/sru-systemd-jammy.log


Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Committed
Status in systemd source package in Focal:
  In Progress
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Committed


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-25 Thread Chad Smith
Test script referenced in the bug description
https://paste.ubuntu.com/p/Xkc7bSZ8fB/


** Description changed:

  [Impact]
  On systems with mellanox NICs, udev's NIC renaming races with the mlx5_core 
driver's own configuration of subordinate interfaces. When the kernel wins this 
race, the device cannot be renamed as udev has attempted, and this causes 
systemd-network-online.target to timeout waiting for links to be configured. 
This ultimately results in boot being delayed by about 2 minutes.
  
  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instance types will generally hit this 
race around 1 in 10 runs. Create a vm snapshot with updated systemd from 
ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated 
systemd. Assert not failure in cloud-init status and no 2 minute delay in 
network-online.target.
  
  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole from 
systemd-analyze blame.
  
  To assert success condition during net rename busy race:
-   - assert when "eth1" is still the primary device name, that two altnames 
are listed (preserving the altname due to the primary NIC rename being hit.
+   - assert when "eth1" is still the primary device name, that two altnames 
are listed (preserving the altname due to the primary NIC rename being hit).
  
  Sample script uses pycloudlib to create modified base image for test and
  launches 100 VMs of type Standard_D8ds_v5, counting both successes and
  any failures seen.
  
  #!/usr/bin/env python3
  # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""
  
+ import json
  import logging
- import json
+ import os
+ import sys
+ from enum import Enum
  
  import pycloudlib
+ 
  LOG = logging.getLogger()
  
  base_cfg = """#cloud-config
- ssh-import-id: [chad.smith, enr0n]
+ ssh-import-id: [chad.smith, enr0n, falcojr, holmanb, aciba]
  """
+ 
+ #source: "deb [allow-insecure=yes] 
https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
+ # - apt install systemd udev -y --allow-unauthenticated
  
  apt_cfg = """
  # Add developer PPA
  apt:
-  sources:
-    systemd-testing:
-  source: "deb [allow-insecure=yes] 
https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
+  sources:
+systemd-testing:
+  source: {source}
  # upgrade systemd after cloud-init is nearly done
  runcmd:
-  - apt install systemd udev -y --allow-unauthenticated
+  - apt install systemd udev -y --allow-unauthenticated
  """
  
  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
-   owner: root:root
-   defer: {defer}
-   content: |
- [Service]
- Environment=SYSTEMD_LOG_LEVEL=debug
- 
+   owner: root:root
+   defer: {defer}
+   content: |
+ [Service]
+ Environment=SYSTEMD_LOG_LEVEL=debug
+ 
  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
-   owner: root:root
-   defer: {defer}
-   content: |
- [Service]
- Environment=SYSTEMD_LOG_LEVEL=debug
- LogRateLimitIntervalSec=0
+   owner: root:root
+   defer: {defer}
+   content: |
+ [Service]
+ Environment=SYSTEMD_LOG_LEVEL=debug
+ LogRateLimitIntervalSec=0
  """
  
  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg
  
- def debug_systemd_image_launch_overlake_v5_with_snapshot():
- """Test overlake v5 timeouts
- 
- test procedure:
- - Launch base focal image
- - enable ppa:enr0n/systemd-245 and systemd/udev debugging
- - cloud-init clean --logs && deconfigure waalinux agent before shutdown
- - snapshot a base image
- - launch v5 system from snapshot
- - check systemd-analyze for expected timeout
- """
- client = pycloudlib.Azure(tag="azure")
- 
- image_id = client.daily_image(release="focal")
- pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
- priv_path = "/home/ubuntu/.ssh/id_rsa"
- 
- client.use_key(pub_path, priv_path)
- 
- base_instance = client.launch(
- image_id=image_id,
- instance_type="Standard_DS1_v2",
- user_data=cloud_config.format(defer="true"),
- )
- 
- LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
- base_instance.wait()
- LOG.info(base_instance.execute("apt cache policy systemd"))
- snapshotted_image_id = client.snapshot(base_instance)
- 
- reproducer = False
- tries = 0
- success_count_with_race = 0
- success_count_no_race = 0
- failure_count_network_delay = 0
- failure_count_no_altnames = 0
- TEST_SUMMARY_TMPL = """
- - Test run complete: {tries} attempted -
- Successes without rename race: {success_count_no_race}
- Successes with rename race and preserved altname: 
{success_count_with_race}
- Failures due to network delay: 

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-24 Thread Nick Rosbrook
** Changed in: systemd (Ubuntu Focal)
 Assignee: Mustafa Kemal Gilor (mustafakemalgilor) => Nick Rosbrook (enr0n)


Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Committed
Status in systemd source package in Focal:
  In Progress
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Committed

Bug description:
  [Impact]
  On systems with Mellanox NICs, udev's NIC renaming races with the mlx5_core
  driver's own configuration of subordinate interfaces. When the kernel wins this
  race, the device cannot be renamed as udev attempted, which causes
  systemd-network-online.target to time out waiting for links to be configured.
  This ultimately delays boot by about 2 minutes.

  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instance types will generally hit this
  race in around 1 in 10 runs. Create a VM snapshot with updated systemd from
  ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated
  systemd. Assert no failure in cloud-init status and no 2 minute delay in
  network-online.target.

  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole from 
systemd-analyze blame.

  To assert success condition during net rename busy race:
    - assert when "eth1" is still the primary device name, that two altnames 
are listed (preserving the altname due to the primary NIC rename being hit.

  Sample script uses pycloudlib to create modified base image for test
  and launches 100 VMs of type Standard_D8ds_v5, counting both successes
  and any failures seen.

  #!/usr/bin/env python3
  # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""

  import logging
  import json

  import pycloudlib
  LOG = logging.getLogger()

  base_cfg = """#cloud-config
  ssh-import-id: [chad.smith, enr0n]
  """

  apt_cfg = """
  # Add developer PPA
  apt:
   sources:
     systemd-testing:
       source: "deb [allow-insecure=yes] https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
  # upgrade systemd after cloud-init is nearly done
  runcmd:
   - apt install systemd udev -y --allow-unauthenticated
  """

  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug

  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug
      LogRateLimitIntervalSec=0
  """

  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg

  def debug_systemd_image_launch_overlake_v5_with_snapshot():
      """Test overlake v5 timeouts

      test procedure:
      - Launch base focal image
      - enable ppa:enr0n/systemd-245 and systemd/udev debugging
      - cloud-init clean --logs && deconfigure walinuxagent before shutdown
      - snapshot a base image
      - launch v5 system from snapshot
      - check systemd-analyze for expected timeout
      """
      client = pycloudlib.Azure(tag="azure")

      image_id = client.daily_image(release="focal")
      pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
      priv_path = "/home/ubuntu/.ssh/id_rsa"

      client.use_key(pub_path, priv_path)

      base_instance = client.launch(
          image_id=image_id,
          instance_type="Standard_DS1_v2",
          user_data=cloud_config.format(defer="true"),
      )

      LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
      base_instance.wait()
      # `apt cache policy` is not a valid command; use apt-cache instead.
      LOG.info(base_instance.execute("apt-cache policy systemd"))
      snapshotted_image_id = client.snapshot(base_instance)

      reproducer = False
      tries = 0
      success_count_with_race = 0
      success_count_no_race = 0
      failure_count_network_delay = 0
      failure_count_no_altnames = 0
      TEST_SUMMARY_TMPL = """
  - Test run complete: {tries} attempted -
  Successes without rename race: {success_count_no_race}
  Successes with rename race and preserved altname: {success_count_with_race}
  Failures due to network delay: {failure_count_network_delay}
  Failures due to no altnames persisted: {failure_count_no_altnames}
  ===
  """
      while tries < 100 and not reproducer:
          tries += 1
          new_instance = client.launch(
              image_id=snapshotted_image_id,
  

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-24 Thread Timo Aaltonen
Hello Nick, or anyone else affected,

Accepted systemd into kinetic-proposed. The package will build now and
be available at
https://launchpad.net/ubuntu/+source/systemd/251.4-1ubuntu7.3 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from
verification-needed-kinetic to verification-done-kinetic. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-kinetic. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: systemd (Ubuntu Kinetic)
   Status: In Progress => Fix Committed

** Tags removed: verification-failed-kinetic
** Tags added: verification-needed-kinetic

** Changed in: systemd (Ubuntu Jammy)
   Status: In Progress => Fix Committed

** Tags removed: verification-failed-jammy
** Tags added: verification-needed-jammy


Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Committed
Status in systemd source package in Focal:
  In Progress
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Committed


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-24 Thread Mustafa Kemal Gilor
** Changed in: systemd (Ubuntu Focal)
 Assignee: (unassigned) => Mustafa Kemal Gilor (mustafakemalgilor)

** Changed in: systemd (Ubuntu Focal)
   Status: Triaged => In Progress


Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Committed
Status in systemd source package in Focal:
  In Progress
Status in systemd source package in Jammy:
  In Progress
Status in systemd source package in Kinetic:
  In Progress
Status in systemd source package in Lunar:
  Fix Committed


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-21 Thread Lukas Märdian
** Changed in: systemd (Ubuntu Lunar)
   Status: In Progress => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Committed
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  In Progress
Status in systemd source package in Kinetic:
  In Progress
Status in systemd source package in Lunar:
  Fix Committed

Bug description:
  [Impact]
  On systems with Mellanox NICs, udev's NIC renaming races with the mlx5_core
  driver's own configuration of subordinate interfaces. When the kernel wins this
  race, the device cannot be renamed as udev attempted, which causes
  systemd-network-online.target to time out waiting for links to be configured.
  This ultimately delays boot by about 2 minutes.

  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instance types will generally hit this
  race in around 1 in 10 runs. Create a VM snapshot with updated systemd from
  ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated
  systemd. Assert no failure in cloud-init status and no 2 minute delay in
  network-online.target.

  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole from 
systemd-analyze blame.
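
The "longest pole" check above can be sketched in Python roughly as follows. This is a minimal illustration, not part of the original test script: the function names, the sample blame output, and the assumption that the delay shows up under a unit whose name contains "wait-online" (for example systemd-networkd-wait-online.service) are all hypothetical.

```python
import re

# Time-span suffixes as printed by `systemd-analyze blame`, in seconds.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "min": 60.0, "h": 3600.0}
_TOKEN_RE = re.compile(r"(\d+(?:\.\d+)?)(ms|min|s|h)")


def parse_blame(text):
    """Return (unit, seconds) pairs parsed from blame-style output."""
    entries = []
    for line in text.splitlines():
        # The unit name is the last field; everything before it is the span.
        span, _, unit = line.strip().rpartition(" ")
        if not span:
            continue
        seconds = 0.0
        for token in span.split():
            match = _TOKEN_RE.fullmatch(token)
            if match is None:
                break  # unrecognised token: skip this line entirely
            seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        else:
            entries.append((unit, seconds))
    return entries


def slowest_unit(text):
    """Name of the unit with the largest activation time, or None."""
    entries = parse_blame(text)
    return max(entries, key=lambda entry: entry[1])[0] if entries else None


def network_wait_is_longest(text, needle="wait-online"):
    """True when the slowest unit looks like the network wait service."""
    unit = slowest_unit(text)
    return unit is not None and needle in unit


# Hypothetical sample output, for illustration only.
SAMPLE_DELAYED = """\
2min 1.113s systemd-networkd-wait-online.service
     5.321s cloud-init.service
      812ms systemd-udevd.service
"""

SAMPLE_HEALTHY = """\
     5.321s cloud-init.service
     1.200s systemd-networkd-wait-online.service
"""
```

A test harness would feed the real output of `systemd-analyze blame` from each launched instance into `network_wait_is_longest` and count a failure whenever it returns True.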

  To assert the success condition during the net rename busy race:
    - Assert that when "eth1" is still the primary device name, two
      altnames are listed (preserving the altname because the primary
      NIC rename race was hit).
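
One way to express the altname check above is to parse the JSON printed by `ip -json link show`. The sketch below assumes that output format (a list of objects with "ifname" and an optional "altnames" list); the helper name, the sample data, and the choice of "at least two" as the threshold are illustrative assumptions rather than the original script's logic.

```python
import json


def altnames_preserved(ip_json, ifname="eth1", expected=2):
    """Check altnames on a link that kept its kernel name.

    Returns True/False when `ifname` is present, or None when no link
    carries that name (for example because the rename succeeded).
    """
    for link in json.loads(ip_json):
        if link.get("ifname") == ifname:
            # Assumption: "two altnames" is treated as "at least two".
            return len(link.get("altnames", [])) >= expected
    return None


# Hypothetical `ip -json link show` excerpts, for illustration only.
SAMPLE_RACE_HIT = json.dumps([
    {"ifname": "lo"},
    {"ifname": "eth0"},
    {"ifname": "eth1", "altnames": ["enP1p0s2", "enP1s1"]},
])

SAMPLE_ALTNAME_LOST = json.dumps([
    {"ifname": "eth0"},
    {"ifname": "eth1", "altnames": ["enP1p0s2"]},
])

SAMPLE_RENAMED = json.dumps([
    {"ifname": "eth0"},
    {"ifname": "enP1s1", "altnames": ["enP1p0s2"]},
])
```

In a test loop, a True result would count as a success with the race hit, False as a failure with no altnames persisted, and None as a run where the race was not hit.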

  Sample script uses pycloudlib to create modified base image for test
  and launches 100 VMs of type Standard_D8ds_v5, counting both successes
  and any failures seen.

  #!/usr/bin/env python3
  # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""

  import logging
  import json

  import pycloudlib

  LOG = logging.getLogger()

  base_cfg = """#cloud-config
  ssh-import-id: [chad.smith, enr0n]
  """

  apt_cfg = """
  # Add developer PPA
  apt:
    sources:
      systemd-testing:
        source: "deb [allow-insecure=yes] https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
  # upgrade systemd after cloud-init is nearly done
  runcmd:
    - apt install systemd udev -y --allow-unauthenticated
  """

  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug

  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug
      LogRateLimitIntervalSec=0
  """

  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg


  def debug_systemd_image_launch_overlake_v5_with_snapshot():
      """Test overlake v5 timeouts

      test procedure:
      - Launch base focal image
      - enable ppa:enr0n/systemd-245 and systemd/udev debugging
      - cloud-init clean --logs && deconfigure waalinux agent before shutdown
      - snapshot a base image
      - launch v5 system from snapshot
      - check systemd-analyze for expected timeout
      """
      client = pycloudlib.Azure(tag="azure")

      image_id = client.daily_image(release="focal")
      pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
      priv_path = "/home/ubuntu/.ssh/id_rsa"

      client.use_key(pub_path, priv_path)

      # Build the base image: install the PPA systemd and enable debug logs.
      base_instance = client.launch(
          image_id=image_id,
          instance_type="Standard_DS1_v2",
          user_data=cloud_config.format(defer="true"),
      )

      LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
      base_instance.wait()
      LOG.info(base_instance.execute("apt-cache policy systemd"))
      snapshotted_image_id = client.snapshot(base_instance)

      reproducer = False
      tries = 0
      success_count_with_race = 0
      success_count_no_race = 0
      failure_count_network_delay = 0
      failure_count_no_altnames = 0
      TEST_SUMMARY_TMPL = """
      - Test run complete: {tries} attempted -
      Successes without rename race: {success_count_no_race}
      Successes with rename race and preserved altname: {success_count_with_race}
      Failures due to network delay: {failure_count_network_delay}
      Failures due to no altnames persisted: {failure_count_no_altnames}
      ===
      """
      # Launch v5 instances from the snapshot until the race reproduces.
      while tries < 100 and not reproducer:
          tries += 1
          new_instance = client.launch(
              image_id=snapshotted_image_id,
              instance_type="Standard_D8ds_v5",
  

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-20 Thread Nick Rosbrook
** Description changed:

  [Impact]
  On systems with mellanox NICs, udev's NIC renaming races with the mlx5_core 
driver's own configuration of subordinate interfaces. When the kernel wins this 
race, the device cannot be renamed as udev has attempted, and this causes 
systemd-network-online.target to timeout waiting for links to be configured. 
This ultimately results in boot being delayed by about 2 minutes.
  
  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instance types will generally hit this 
race around 1 in 10 runs. Create a vm snapshot with updated systemd from 
ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated 
systemd. Assert not failure in cloud-init status and no 2 minute delay in 
network-online.target.
  
  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole from 
systemd-analyze blame.
  
  To assert success condition during net rename busy race:
    - assert when "eth1" is still the primary device name, that two altnames 
are listed (preserving the altname due to the primary NIC rename being hit.
  
  Sample script uses pycloudlib to create modified base image for test and
  launches 100 VMs of type Standard_D8ds_v5, counting both successes and
  any failures seen.
- 
  
  #!/usr/bin/env python3
  # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""
  
  import logging
  import json
  
  import pycloudlib
  LOG = logging.getLogger()
  
  base_cfg = """#cloud-config
  ssh-import-id: [chad.smith, enr0n]
  """
  
  apt_cfg = """
  # Add developer PPA
  apt:
-  sources:
-systemd-testing:
-  source: "deb [allow-insecure=yes] 
https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
+  sources:
+    systemd-testing:
+  source: "deb [allow-insecure=yes] 
https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
  # upgrade systemd after cloud-init is nearly done
  runcmd:
-  - apt install systemd udev -y --allow-unauthenticated
+  - apt install systemd udev -y --allow-unauthenticated
  """
  
  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
-   owner: root:root
-   defer: {defer}
-   content: |
- [Service]
- Environment=SYSTEMD_LOG_LEVEL=debug
- 
+   owner: root:root
+   defer: {defer}
+   content: |
+ [Service]
+ Environment=SYSTEMD_LOG_LEVEL=debug
+ 
  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
-   owner: root:root
-   defer: {defer}
-   content: |
- [Service]
- Environment=SYSTEMD_LOG_LEVEL=debug
- LogRateLimitIntervalSec=0
+   owner: root:root
+   defer: {defer}
+   content: |
+ [Service]
+ Environment=SYSTEMD_LOG_LEVEL=debug
+ LogRateLimitIntervalSec=0
  """
  
  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg
  
+ def debug_systemd_image_launch_overlake_v5_with_snapshot():
+ """Test overlake v5 timeouts
  
- def debug_systemd_image_launch_overlake_v5_with_snapshot():
- """Test overlake v5 timeouts
- 
- test procedure:
- - Launch base focal image
- - enable ppa:enr0n/systemd-245 and systemd/udev debugging
- - cloud-init clean --logs && deconfigure waalinux agent before shutdown
- - snapshot a base image
- - launch v5 system from snapshot
- - check systemd-analyze for expected timeout
- """
- client = pycloudlib.Azure(tag="azure")
+ test procedure:
+ - Launch base focal image
+ - enable ppa:enr0n/systemd-245 and systemd/udev debugging
+ - cloud-init clean --logs && deconfigure waalinux agent before shutdown
+ - snapshot a base image
+ - launch v5 system from snapshot
+ - check systemd-analyze for expected timeout
+ """
+ client = pycloudlib.Azure(tag="azure")
  
- image_id = client.daily_image(release="focal")
- pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
- priv_path = "/home/ubuntu/.ssh/id_rsa"
+ image_id = client.daily_image(release="focal")
+ pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
+ priv_path = "/home/ubuntu/.ssh/id_rsa"
  
- client.use_key(pub_path, priv_path)
+ client.use_key(pub_path, priv_path)
  
- base_instance = client.launch(
- image_id=image_id,
- instance_type="Standard_DS1_v2",
- user_data=cloud_config.format(defer="true"),
- )
+ base_instance = client.launch(
+ image_id=image_id,
+ instance_type="Standard_DS1_v2",
+ user_data=cloud_config.format(defer="true"),
+ )
  
- LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
- base_instance.wait()
- LOG.info(base_instance.execute("apt cache policy systemd"))
- snapshotted_image_id = client.snapshot(base_instance)
+ LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
+ 

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-20 Thread Launchpad Bug Tracker
** Merge proposal linked:
   
https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/439251

** Merge proposal linked:
   
https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/439253

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  In Progress
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  In Progress
Status in systemd source package in Kinetic:
  In Progress
Status in systemd source package in Lunar:
  In Progress


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-20 Thread Launchpad Bug Tracker
** Merge proposal linked:
   
https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/439250

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  In Progress
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  In Progress
Status in systemd source package in Kinetic:
  In Progress
Status in systemd source package in Lunar:
  In Progress


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-20 Thread Nick Rosbrook
The fix is incomplete for jammy and newer, and we will need additional
patches from upstream.

** Tags removed: verification-needed-jammy verification-needed-kinetic
** Tags added: verification-failed-jammy verification-failed-kinetic

** Changed in: systemd (Ubuntu Lunar)
   Status: Fix Released => In Progress

** Changed in: systemd (Ubuntu Jammy)
   Status: Fix Committed => In Progress

** Changed in: systemd (Ubuntu Kinetic)
   Status: Fix Committed => In Progress

** Changed in: systemd (Ubuntu Jammy)
 Assignee: (unassigned) => Nick Rosbrook (enr0n)

** Changed in: systemd (Ubuntu Kinetic)
 Assignee: (unassigned) => Nick Rosbrook (enr0n)

** Changed in: systemd (Ubuntu Lunar)
 Assignee: (unassigned) => Nick Rosbrook (enr0n)

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  In Progress
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  In Progress
Status in systemd source package in Kinetic:
  In Progress
Status in systemd source package in Lunar:
  In Progress


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-15 Thread Launchpad Bug Tracker
** Merge proposal linked:
   
https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/438988

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Released


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-10 Thread Steve Langasek
Hello Nick, or anyone else affected,

Accepted systemd into jammy-proposed. The package will build now and be
available at
https://launchpad.net/ubuntu/+source/systemd/249.11-0ubuntu3.8 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from
verification-needed-jammy to verification-done-jammy. If it does not fix
the bug for you, please add a comment stating that, and change the tag to
verification-failed-jammy. In either case, without details of your
testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: systemd (Ubuntu Jammy)
   Status: Triaged => Fix Committed

** Tags added: verification-needed-jammy

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Fix Committed
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Released


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-10 Thread Steve Langasek
Hello Nick, or anyone else affected,

Accepted systemd into kinetic-proposed. The package will build now and
be available at
https://launchpad.net/ubuntu/+source/systemd/251.4-1ubuntu7.2 in a few
hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested, what testing has been
performed on the package and change the tag from
verification-needed-kinetic to verification-done-kinetic. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-kinetic. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Changed in: systemd (Ubuntu Kinetic)
   Status: Triaged => Fix Committed

** Tags added: verification-needed verification-needed-kinetic

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Triaged
Status in systemd source package in Kinetic:
  Fix Committed
Status in systemd source package in Lunar:
  Fix Released


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-08 Thread Nick Rosbrook
** Changed in: systemd (Ubuntu Kinetic)
   Status: New => Triaged


Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Triaged
Status in systemd source package in Kinetic:
  Triaged
Status in systemd source package in Lunar:
  Fix Released

Bug description:
  [Impact]
  On systems with Mellanox NICs, udev's NIC renaming races with the mlx5_core
  driver's own configuration of subordinate interfaces. When the kernel wins
  this race, udev's attempt to rename the device fails, and this causes
  network-online.target to time out waiting for links to be configured.
  This ultimately delays boot by about 2 minutes.

  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instances generally hit this race in
  about 1 in 10 runs. Create a VM snapshot with the updated systemd from
  ppa:enr0n/systemd-245, then launch 100 Standard_D8ds_v5 instances from that
  snapshot. Assert that cloud-init status reports no failure and that there
  is no 2 minute delay in network-online.target.

  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole in
      systemd-analyze blame.
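
  The check above can be sketched roughly as follows. This is only an
  illustration, not part of the test script: the helper names are
  hypothetical, and the assumption that the delay shows up in blame output
  under systemd-networkd-wait-online.service (blame normally lists
  services rather than targets) should be confirmed on a real instance.

```python
import re

# Time spans in `systemd-analyze blame` look like "2min 1.234s" or "512ms".
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "min": 60.0, "h": 3600.0}
_SPAN = re.compile(r"^(\d+(?:\.\d+)?)(us|ms|s|min|h)$")


def parse_blame(text):
    """Return a list of (unit_name, seconds) parsed from blame output."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        *spans, unit = parts
        total = 0.0
        for span in spans:
            match = _SPAN.match(span)
            if match is None:
                break  # not a blame line we understand; skip it
            total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        else:
            entries.append((unit, total))
    return entries


def slowest_unit(text):
    """Return the (unit_name, seconds) pair with the largest time, or None."""
    entries = parse_blame(text)
    return max(entries, key=lambda entry: entry[1]) if entries else None


def network_wait_is_long_pole(
    text,
    # Assumed unit names; adjust to whatever the affected image reports.
    suspects=("network-online.target", "systemd-networkd-wait-online.service"),
):
    """True when the slowest unit is one of the network-wait suspects."""
    slowest = slowest_unit(text)
    return slowest is not None and slowest[0] in suspects
```

  In a test run, the text would come from executing systemd-analyze blame
  on the launched instance; a True result corresponds to the failure
  symptom described above.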

  To assert success condition during net rename busy race:
    - Assert that when "eth1" is still the primary device name, two altnames
      are listed (the altname is preserved when the primary NIC rename hits
      the race).
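
  One way to express the altname check is to parse the JSON output of
  "ip -json link show". The sketch below assumes that output format (a
  list of objects with "ifname" and an optional "altnames" list); the
  function names and the sample altnames in the usage note are made up
  for illustration.

```python
import json


def altnames_of(ip_link_json, ifname):
    """Return the altnames listed for ifname, or None if ifname is absent."""
    for link in json.loads(ip_link_json):
        if link.get("ifname") == ifname:
            return list(link.get("altnames", []))
    return None


def race_hit_with_altnames_kept(ip_link_json, primary="eth1", expected=2):
    """Tri-state check for the success condition during the rename race.

    None  -> primary name is gone, so the rename went through (no race hit)
    True  -> primary name is still present and the expected altnames survive
    False -> primary name is still present but altnames were lost
    """
    names = altnames_of(ip_link_json, primary)
    if names is None:
        return None
    return len(names) == expected
```

  For example, output describing eth1 with altnames ["alt-a", "alt-b"]
  yields True, while the same interface with an empty altname list yields
  False.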

  Sample script uses pycloudlib to create modified base image for test
  and launches 100 VMs of type Standard_D8ds_v5, counting both successes
  and any failures seen.

  
  #!/usr/bin/env python3
  # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""

  import logging
  import json

  import pycloudlib

  LOG = logging.getLogger()

  # Minimal cloud-config: import SSH keys for the testers.
  base_cfg = """#cloud-config
  ssh-import-id: [chad.smith, enr0n]
  """

  apt_cfg = """
  # Add developer PPA
  apt:
    sources:
      systemd-testing:
        source: "deb [allow-insecure=yes] https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
  # upgrade systemd after cloud-init is nearly done
  runcmd:
    - apt install systemd udev -y --allow-unauthenticated
  """

  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug

  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
    owner: root:root
    defer: {defer}
    content: |
      [Service]
      Environment=SYSTEMD_LOG_LEVEL=debug
      LogRateLimitIntervalSec=0
  """

  # cloud_config installs the PPA systemd; cloud_config2 only enables debug.
  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg


  def debug_systemd_image_launch_overlake_v5_with_snapshot():
      """Test overlake v5 timeouts

      test procedure:
      - Launch base focal image
      - enable ppa:enr0n/systemd-245 and systemd/udev debugging
      - cloud-init clean --logs && deconfigure walinux agent before shutdown
      - snapshot a base image
      - launch v5 system from snapshot
      - check systemd-analyze for expected timeout
      """
      client = pycloudlib.Azure(tag="azure")

      image_id = client.daily_image(release="focal")
      pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
      priv_path = "/home/ubuntu/.ssh/id_rsa"

      client.use_key(pub_path, priv_path)

      # Base instance: upgrades systemd from the PPA and writes the overrides.
      base_instance = client.launch(
          image_id=image_id,
          instance_type="Standard_DS1_v2",
          user_data=cloud_config.format(defer="true"),
      )

      LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
      base_instance.wait()
      # Record which systemd version ended up in the image.
      LOG.info(base_instance.execute("apt-cache policy systemd"))
      snapshotted_image_id = client.snapshot(base_instance)

      reproducer = False
      tries = 0
      success_count_with_race = 0
      success_count_no_race = 0
      failure_count_network_delay = 0
      failure_count_no_altnames = 0
      TEST_SUMMARY_TMPL = """
      - Test run complete: {tries} attempted -
      Successes without rename race: {success_count_no_race}
      Successes with rename race and preserved altname: {success_count_with_race}
      Failures due to network delay: {failure_count_network_delay}
      Failures due to no altnames persisted: {failure_count_no_altnames}
      ===
      """
      # Launch up to 100 v5 instances from the snapshot.
      while tries < 100 and not reproducer:
          tries += 1
          new_instance = client.launch(
              image_id=snapshotted_image_id,
              instance_type="Standard_D8ds_v5",
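
  The listing is cut off at this point in the archive, so the body of the
  loop is not shown. As a rough sketch only, the four counters named in the
  summary template could be derived from the two checks in the test plan as
  below. The classify() helper and its inputs are hypothetical and are not
  taken from the original script.

```python
from collections import Counter


def classify(network_delayed, primary_name, altname_count):
    """Map one instance's observations onto a summary counter name.

    network_delayed: True if the 2 minute network-online delay was seen.
    primary_name:    primary name of the NIC after boot (for example "eth1").
    altname_count:   number of altnames listed for that NIC.
    """
    if network_delayed:
        return "failure_count_network_delay"
    if primary_name == "eth1":
        # The rename lost the race; success requires both altnames to remain.
        if altname_count == 2:
            return "success_count_with_race"
        return "failure_count_no_altnames"
    return "success_count_no_race"


def tally(observations):
    """Count outcomes for an iterable of (delayed, name, altname_count)."""
    counts = Counter()
    for delayed, name, altname_count in observations:
        counts[classify(delayed, name, altname_count)] += 1
    return counts
```

  The resulting counts could then be passed to a summary template like the
  one above with str.format().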
  

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-08 Thread Launchpad Bug Tracker
** Merge proposal linked:
   
https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/438555


Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Released
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Triaged
Status in systemd source package in Kinetic:
  Triaged
Status in systemd source package in Lunar:
  Fix Released


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-06 Thread Launchpad Bug Tracker
This bug was fixed in the package systemd - 252.5-2ubuntu1

---
systemd (252.5-2ubuntu1) lunar; urgency=medium

  * Merge 252.5-2 from Debian unstable
    - Drop test-handle-Debian-s-etc-default-locale-in-testsuite-74.f.patch.
      Applied upstream: https://github.com/systemd/systemd/commit/9b42646b22
      File: debian/patches/test-handle-Debian-s-etc-default-locale-in-testsuite-74.f.patch
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=1b0789416172ec60d8086fe2b458b5396bb7e857
    - Drop test-make-sure-mount-point-exists-in-testsuite-64.sh.patch.
      Applied upstream: https://github.com/systemd/systemd/commit/07e4787106
      File: debian/patches/test-make-sure-mount-point-exists-in-testsuite-64.sh.patch
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=f97b2d5ae1a1f35668c4648f1c7fc715a588de50
    - Drop test-remove-no-longer-needed-quirk-for-set-locale-on-Debi.patch.
      Fixed upstream: https://github.com/systemd/systemd-stable/commit/1c325f6d7f
      File: debian/patches/test-remove-no-longer-needed-quirk-for-set-locale-on-Debi.patch
      https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=5f85226d61393c08d7ea51c2f28db7fd4c79bcc6
  * udev: avoid NIC renaming race with kernel (LP: #2002445)
    Files:
    - debian/patches/lp2002445-sd-netlink-add-a-test-for-rtnl_set_link_name.patch
    - debian/patches/lp2002445-sd-netlink-do-not-swap-old-name-and-alternative-name.patch
    - debian/patches/lp2002445-sd-netlink-restore-altname-on-error-in-rtnl_set_link_name.patch
    - debian/patches/lp2002445-test-network-add-a-test-for-renaming-device-to-current-al.patch
    - debian/patches/lp2002445-udev-attempt-device-rename-even-if-interface-is-up.patch
    - debian/patches/lp2002445-udev-net-allow-new-link-name-as-an-altname-before-renamin.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=58d29c2b376f03c44ed5a719877c95b332018cdc
  * Deny-list TEST-74-AUX-UTILS on s390x.
    Since this currently is only known to fail on the autopkgtest
    infrastructure, we believe this is a temporary issue.
    File: debian/patches/Deny-list-TEST-74-AUX-UTILS-on-s390x.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=a3a059d86e2fe3a104419ae2afcab557171f9809
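
The lp2002445 patch titles in the entry above suggest the following
behaviour: if the requested name is currently an altname, drop that
altname, attempt the rename, and put the altname back if the kernel
refuses. The toy model below illustrates that idea only. It is inferred
from the patch titles, is not the systemd C code, and every name in it
(ToyLink, RenameBusy, set_link_name) is hypothetical.

```python
class RenameBusy(Exception):
    """Stands in for the kernel rejecting a rename request."""


class ToyLink:
    """In-memory stand-in for a network link with a name and altnames."""

    def __init__(self, name, altnames=(), busy=False):
        self.name = name
        self.altnames = set(altnames)
        self.busy = busy  # when True, every rename attempt is rejected

    def kernel_rename(self, new_name):
        if self.busy:
            raise RenameBusy(new_name)
        self.name = new_name


def set_link_name(link, new_name):
    """Try to rename link; restore a removed altname if the rename fails.

    Returns True on success and False if the rename was rejected.
    """
    altname_deleted = False
    if new_name in link.altnames:
        # A name cannot be both the primary name and an altname.
        link.altnames.discard(new_name)
        altname_deleted = True
    try:
        link.kernel_rename(new_name)
    except RenameBusy:
        if altname_deleted:
            link.altnames.add(new_name)  # keep the name reachable as altname
        return False
    return True
```

With a busy link, the primary name stays unchanged and the altname set is
the same after the call as before it, which matches the "two altnames are
listed" success condition in the test plan.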

systemd (252.5-2) unstable; urgency=medium

  * Fix boot-and-services autopkgtest.

systemd (252.5-1) unstable; urgency=medium

  [ Nick Rosbrook ]
  * debian/tests: remove systemd-fsckd autopkgtest. This test never runs
    in Debian autopkgtest because of missing machine isolation
    requirements, and it never runs in Ubuntu because:
      SKIP: root file system is being checked by initramfs already
    Since the test is not providing any good feedback, and generally has
    not been maintained, let's just remove it.

  [ Luca Boccassi ]
  * New upstream version 252.5
  * Drop patches merged in v252.5
  * Refresh patches
  * Set default status format to 'combined': show both unit name and
    description in logs/boot messages

systemd (252.4-2) unstable; urgency=medium

  [ Michael Biebl ]
  * Refresh patches
  * Tweak description of systemd and systemd-sysv package.
    Remove redundancy and de-emphasize sysvinit.
  * autopkgtest: add psmisc to upstream suite.
    Needed for the killall binary.
    See https://github.com/systemd/systemd/pull/24569
  * autopkgtest: add xkb-data, locales and locales-all to upstream suite.
    Use locales-all so all necessary locales can be installed into the test
    image without having to generate them on-the-fly.
    See https://github.com/systemd/systemd/pull/23709
  * autopkgtest: prefer knot-dnssecutils over knot-dnsutils for upstream
    suite.
    The kzonecheck utility required by TEST-75-RESOLVED was split out from
    knot-dnsutils into knot-dnssecutils so update the test dependencies
    accordingly. Keep knot-dnsutils as alternative dependency to make
    backports easier.
  * Cherry-pick upstream fixes for TEST-74-AUX-UTILS
  * Cherry-pick upstream fix for TEST-73-LOCALE
  * Skip firstboot --prompt-keymap check in TEST-74-AUX-UTILS.
    This test requires compatible keymaps from kbd which are not available
    in Debian.

  [ Luca Boccassi ]
  * autopkgtest: add netlabel-tools to networkd-test.py suite.
    The netlabelctl tool is needed to test the NetLabel integration.
    See https://github.com/systemd/systemd/pull/23888
  * autopkgtest: add bsdutils to upstream suite.
    The logger utility is now used in TEST-04-JOURNAL.
    See https://github.com/systemd/systemd/pull/23086
  * autopkgtest: add knot, knot-dnsutils, bind9-dnsutils, bind9-host to
    upstream suite.
    Needed by TEST-75-RESOLVED.
    See https://github.com/systemd/systemd/pull/23104
  * autopkgtest: add jq to upstream suite.
    Needed by TEST-58-REPART.
    See https://github.com/systemd/systemd/pull/24572
  * autopkgtest: add mtools to upstream suite.

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-03-02 Thread Launchpad Bug Tracker
** Merge proposal linked:
   
https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/438247


Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Committed
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Triaged
Status in systemd source package in Kinetic:
  New
Status in systemd source package in Lunar:
  Fix Committed


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-02-27 Thread Chad Smith
** Description changed:

  [Impact]
  On systems with mellanox NICs, udev's NIC renaming races with the mlx5_core 
driver's own configuration of subordinate interfaces. When the kernel wins this 
race, the device cannot be renamed as udev has attempted, and this causes 
systemd-network-online.target to timeout waiting for links to be configured. 
This ultimately results in boot being delayed by about 2 minutes.
  
  [Test Plan]
  Repeated launches of Standard_D8ds_v5 instance types will generally hit this 
race around 1 in 10 runs. Create a vm snapshot with updated systemd from 
ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated 
systemd. Assert not failure in cloud-init status and no 2 minute delay in 
network-online.target.
  
  To check for failure symptom:
    - Assert that network-online.target isn't the longest pole from 
systemd-analyze blame.
  
  To assert success condition during net rename busy race:
    - assert when "eth1" is still the primary device name, that two altnames 
are listed (preserving the altname due to the primary NIC rename being hit.
  
  Sample script uses pycloudlib to create modified base image for test and
  launches 100 VMs of type Standard_D8ds_v5, counting both successes and
  any failures seen.
  
+ 
  #!/usr/bin/env python3
- 
+ # This file is part of pycloudlib. See LICENSE file for license information.
  """Basic examples of various lifecycle with an Azure instance."""
  
  import logging
  import json
  
  import pycloudlib
  LOG = logging.getLogger()
  
  base_cfg = """#cloud-config
  ssh-import-id: [chad.smith, enr0n]
  """
  
  apt_cfg = """
  # Add developer PPA
  apt:
-  sources:
-    systemd-testing:
-  source: "deb [allow-insecure=yes] 
https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
+  sources:
+systemd-testing:
+  source: "deb [allow-insecure=yes] 
https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
  # upgrade systemd after cloud-init is nearly done
  runcmd:
-  - apt install systemd udev -y --allow-unauthenticated
+  - apt install systemd udev -y --allow-unauthenticated
  """
  
  debug_systemd_cfg = """
  # Create systemd-udev debug override.conf in base image
  write_files:
  - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
-   owner: root:root
-   defer: {defer}
-   content: |
- [Service]
- Environment=SYSTEMD_LOG_LEVEL=debug
- 
+   owner: root:root
+   defer: {defer}
+   content: |
+ [Service]
+ Environment=SYSTEMD_LOG_LEVEL=debug
+ 
  - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
-   owner: root:root
-   defer: {defer}
-   content: |
- [Service]
- Environment=SYSTEMD_LOG_LEVEL=debug
- LogRateLimitIntervalSec=0
+   owner: root:root
+   defer: {defer}
+   content: |
+ [Service]
+ Environment=SYSTEMD_LOG_LEVEL=debug
+ LogRateLimitIntervalSec=0
  """
  
  cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
  cloud_config2 = base_cfg + debug_systemd_cfg
  
+ 
  def debug_systemd_image_launch_overlake_v5_with_snapshot():
- """Test overlake v5 timeouts
+ """Test overlake v5 timeouts
+ 
+ test procedure:
+ - Launch base focal image
+ - enable ppa:enr0n/systemd-245 and systemd/udev debugging
+ - cloud-init clean --logs && deconfigure waalinux agent before shutdown
+ - snapshot a base image
+ - launch v5 system from snapshot
+ - check systemd-analyze for expected timeout
+ """
+ client = pycloudlib.Azure(tag="azure")
  
- test procedure:
- - Launch base focal image
- - enable ppa:enr0n/systemd-245 and systemd/udev debugging
- - cloud-init clean --logs && deconfigure waalinux agent before shutdown
- - snapshot a base image
- - launch v5 system from snapshot
- - check systemd-analyze for expected timeout
- """
- client = pycloudlib.Azure(tag="azure")
+ image_id = client.daily_image(release="focal")
+ pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
+ priv_path = "/home/ubuntu/.ssh/id_rsa"
  
- image_id = client.daily_image(release="focal")
- pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
- priv_path = "/home/ubuntu/.ssh/id_rsa"
+ client.use_key(pub_path, priv_path)
  
- client.use_key(pub_path, priv_path)
+ base_instance = client.launch(
+ image_id=image_id,
+ instance_type="Standard_DS1_v2",
+ user_data=cloud_config.format(defer="true"),
+ )
  
- base_instance = client.launch(
- image_id=image_id,
- instance_type="Standard_DS1_v2",
- user_data=cloud_config.format(defer="true"),
- )
+ LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
+ base_instance.wait()
+ LOG.info(base_instance.execute("apt cache policy systemd"))
+ snapshotted_image_id = client.snapshot(base_instance)
  
- LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
- base_instance.wait()
- LOG.info(base_instance.execute("apt cache 

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-02-27 Thread Chad Smith
** Description changed:

  [Impact]
  On systems with mellanox NICs, udev's NIC renaming races with the mlx5_core 
driver's own configuration of subordinate interfaces. When the kernel wins this 
race, the device cannot be renamed as udev has attempted, and this causes 
systemd-network-online.target to timeout waiting for links to be configured. 
This ultimately results in boot being delayed by about 2 minutes.
  
  [Test Plan]
- Since this is a race condition, we need to boot many instances before we see 
the issue. The Ubuntu Server team will help coordinate the testing at scale to 
confirm the fix.
+ Repeated launches of Standard_D8ds_v5 instance types will generally hit this 
race around 1 in 10 runs. Create a vm snapshot with updated systemd from 
ppa:enr0n/systemd-245. Launch 100 Standard_D8ds_v5 instances with updated 
systemd. Assert not failure in cloud-init status and no 2 minute delay in 
network-online.target.
+ 
+ 
+ To check for failure symptom:
+   - Assert that network-online.target isn't the longest pole from 
systemd-analyze blame. 
+ 
+ To assert success condition during net rename busy race:
+   - assert when "eth1" is still the primary device name, that two altnames 
are listed (preserving the altname due to the primary NIC rename being hit.
+ 
+ 
+ Sample script uses pycloudlib to create modified base image for test and
+ launches 100 VMs of type Standard_D8ds_v5, counting both successes and
+ any failures seen.
+ 
+ #!/usr/bin/env python3
+ 
+ 
+ """Basic examples of various lifecycle with an Azure instance."""
+ 
+ import logging
+ import json
+ 
+ import pycloudlib
+ LOG = logging.getLogger()
+ 
+ base_cfg = """#cloud-config
+ ssh-import-id: [chad.smith, enr0n]
+ """
+ 
+ apt_cfg = """
+ # Add developer PPA
+ apt:
+  sources:
+    systemd-testing:
+      source: "deb [allow-insecure=yes] https://ppa.launchpadcontent.net/enr0n/systemd-245/ubuntu focal main"
+ # upgrade systemd after cloud-init is nearly done
+ runcmd:
+  - apt install systemd udev -y --allow-unauthenticated
+ """
+ 
+ debug_systemd_cfg = """
+ # Create systemd-udev debug override.conf in base image
+ write_files:
+ - path: /etc/systemd/system/systemd-networkd.service.d/override.conf
+   owner: root:root
+   defer: {defer}
+   content: |
+     [Service]
+     Environment=SYSTEMD_LOG_LEVEL=debug
+ 
+ - path: /etc/systemd/system/systemd-udevd.service.d/override.conf
+   owner: root:root
+   defer: {defer}
+   content: |
+     [Service]
+     Environment=SYSTEMD_LOG_LEVEL=debug
+     LogRateLimitIntervalSec=0
+ """
+ 
+ cloud_config = base_cfg + apt_cfg + debug_systemd_cfg
+ cloud_config2 = base_cfg + debug_systemd_cfg
+ 
+ 
+ def debug_systemd_image_launch_overlake_v5_with_snapshot():
+     """Test overlake v5 timeouts
+ 
+     test procedure:
+     - Launch base focal image
+     - enable ppa:enr0n/systemd-245 and systemd/udev debugging
+     - cloud-init clean --logs && deconfigure walinuxagent before shutdown
+     - snapshot a base image
+     - launch v5 system from snapshot
+     - check systemd-analyze for expected timeout
+     """
+     client = pycloudlib.Azure(tag="azure")
+ 
+     image_id = client.daily_image(release="focal")
+     pub_path = "/home/ubuntu/.ssh/id_rsa.pub"
+     priv_path = "/home/ubuntu/.ssh/id_rsa"
+ 
+     client.use_key(pub_path, priv_path)
+ 
+     base_instance = client.launch(
+         image_id=image_id,
+         instance_type="Standard_DS1_v2",
+         user_data=cloud_config.format(defer="true"),
+     )
+ 
+     LOG.info(f"base instance: ssh ubuntu@{base_instance.ip}")
+     base_instance.wait()
+     # "apt cache" is not a valid apt operation; apt-cache policy is
+     LOG.info(base_instance.execute("apt-cache policy systemd"))
+     snapshotted_image_id = client.snapshot(base_instance)
+ 
+     reproducer = False
+     tries = 0
+     success_count_with_race = 0
+     success_count_no_race = 0
+     failure_count_network_delay = 0
+     failure_count_no_altnames = 0
+     TEST_SUMMARY_TMPL = """
+ - Test run complete: {tries} attempted -
+ Successes without rename race: {success_count_no_race}
+ Successes with rename race and preserved altname: {success_count_with_race_persist_altname}
+ Failures due to network delay: {failure_count_network_delay}
+ Failures due to no altnames persisted: {failure_count_no_altnames}
+ ===
+ """
+     while tries < 10 and not reproducer:
+         tries += 1
+         new_instance = client.launch(
+             image_id=snapshotted_image_id,
+             instance_type="Standard_D8ds_v5",
+             user_data=cloud_config.format(defer="false"),
+         )
+         # breadcrumb for us: pycloudlib/Azure will reuse available IPs
+         new_instance.wait()
+         blame = new_instance.execute("systemd-analyze blame").splitlines()
+         LOG.info(f"--- Attempt {tries} ssh ubuntu@{new_instance.ip} Blame: {blame[0]}")
+         ip_addr = json.loads(new_instance.execute("ip -j addr").stdout)
+         for d in ip_addr:
+   

[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-02-27 Thread Nick Rosbrook
** Changed in: systemd (Ubuntu Lunar)
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  Fix Committed
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Triaged
Status in systemd source package in Kinetic:
  New
Status in systemd source package in Lunar:
  Fix Committed

Bug description:
  [Impact]
  On systems with Mellanox NICs, udev's NIC renaming races with the mlx5_core
  driver's own configuration of subordinate interfaces. When the kernel wins
  this race, the device cannot be renamed as udev has attempted, and this
  causes systemd-network-online.target to time out waiting for links to be
  configured. This ultimately delays boot by about 2 minutes.

  [Test Plan]
  Since this is a race condition, we need to boot many instances before we see 
the issue. The Ubuntu Server team will help coordinate the testing at scale to 
confirm the fix.

  [Where problems could occur]
  The patches effectively make it so that if an interface cannot be renamed
  by udev, then the new name is left as an alternative name as a fallback.
  If problems occur, they would be related to device renaming, and
  particularly to the device's alternative names.
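
  The fallback described above can be pictured with a small in-memory model.
  This is a toy sketch of the described behavior, not systemd's code: FakeLink
  and rename_with_altname_fallback are hypothetical names, the busy flag stands
  in for the kernel winning the race, and adding the name as an altname on
  failure even if it was not one before is an assumption.

```python
import errno
from typing import Iterable, List


class FakeLink:
    """In-memory stand-in for a network link; not a real netlink API."""

    def __init__(self, name: str, altnames: Iterable[str] = (), busy: bool = False):
        self.name = name
        self.altnames: List[str] = list(altnames)
        self.busy = busy  # True models the kernel winning the race

    def set_name(self, new_name: str) -> None:
        if self.busy:
            raise OSError(errno.EBUSY, "Device or resource busy")
        self.name = new_name


def rename_with_altname_fallback(link: FakeLink, new_name: str) -> bool:
    """Try to rename `link`; on failure keep `new_name` as an altname.

    Returns True when the primary name was changed, False when the rename
    failed and the name was left (or restored) as an alternative name.
    """
    # A name cannot be primary and alternative at once, so drop it first.
    if new_name in link.altnames:
        link.altnames.remove(new_name)
    try:
        link.set_name(new_name)
    except OSError:
        # Rename failed: make sure the name stays reachable as an altname.
        if new_name not in link.altnames:
            link.altnames.append(new_name)
        return False
    return True


if __name__ == "__main__":
    busy_link = FakeLink("eth1", altnames=["example-alt0"], busy=True)
    print(rename_with_altname_fallback(busy_link, "example-new"))
    print(busy_link.name, busy_link.altnames)
```

  In this model a failed rename leaves the primary name untouched and the
  desired name available as an altname, which matches the success condition
  the test plan looks for when the race is hit.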

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/2002445/+subscriptions


-- 
Mailing list: https://launchpad.net/~touch-packages
Post to : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-02-23 Thread Launchpad Bug Tracker
** Merge proposal linked:
   
https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/437809

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  New
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Triaged
Status in systemd source package in Kinetic:
  New
Status in systemd source package in Lunar:
  New


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-02-13 Thread Lukas Märdian
Fixed upstream in https://github.com/systemd/systemd/pull/25221

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  New
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Triaged
Status in systemd source package in Kinetic:
  New
Status in systemd source package in Lunar:
  New


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-02-10 Thread Launchpad Bug Tracker
** Merge proposal linked:
   
https://code.launchpad.net/~enr0n/ubuntu/+source/systemd/+git/systemd/+merge/437150

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  New
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Triaged
Status in systemd source package in Kinetic:
  New
Status in systemd source package in Lunar:
  New


[Touch-packages] [Bug 2002445] Re: udev NIC renaming race with mlx5_core driver

2023-02-09 Thread Nick Rosbrook
** Also affects: systemd (Ubuntu Lunar)
   Importance: Undecided
   Status: New

** Also affects: systemd (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: systemd (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: systemd (Ubuntu Kinetic)
   Importance: Undecided
   Status: New

** Description changed:

  [Impact]
  On systems with Mellanox NICs, udev's NIC renaming races with the mlx5_core
  driver's own configuration of subordinate interfaces. When the kernel wins
  this race, the device cannot be renamed as udev has attempted, and this
  causes systemd-network-online.target to time out waiting for links to be
  configured. This ultimately delays boot by about 2 minutes.
  
  [Test Plan]
- TODO
+ Since this is a race condition, we need to boot many instances before we see 
the issue. The Ubuntu Server team will help coordinate the testing at scale to 
confirm the fix.
  
  [Where problems could occur]
  The patches effectively make it so that if an interface cannot be renamed
  by udev, then the new name is left as an alternative name as a fallback.
  If problems occur, they would be related to device renaming, and
  particularly to the device's alternative names.

** Changed in: systemd (Ubuntu Focal)
   Status: New => Triaged

** Changed in: systemd (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: systemd (Ubuntu Jammy)
   Status: New => Triaged

** Changed in: systemd (Ubuntu Jammy)
   Importance: Undecided => Medium

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/2002445

Title:
  udev NIC renaming race with mlx5_core driver

Status in systemd package in Ubuntu:
  New
Status in systemd source package in Focal:
  Triaged
Status in systemd source package in Jammy:
  Triaged
Status in systemd source package in Kinetic:
  New
Status in systemd source package in Lunar:
  New
