** Description changed:

  [ Impact ]
  
  * netplan-sriov-apply.service can sometimes fail to configure sriov
  interfaces.
  
  * Issue happens when netplan is performing per interface configuration and 
udev rules
-   are modifying PF interface names. If that happens netplan will fail to get 
some PF related data
-   as expected /sys/class/net/<ifname>/ directory will no longer exist.
-   
+   are modifying PF interface names. If that happens netplan will fail to get 
some PF related data
+   as expected /sys/class/net/<ifname>/ directory will no longer exist.
+ 
  * Depending on the timing between netplan-sriov-apply.service and udev rules 
execution, one or more
-   PF interfaces might be unconfigured.
-   
+   PF interfaces might be unconfigured.
+ 
  * This issue might be a root cause of the following netplan bugs:
-   - https://bugs.launchpad.net/netplan/+bug/1988018
-   - https://bugs.launchpad.net/netplan/+bug/2020409
-   
+   - https://bugs.launchpad.net/netplan/+bug/1988018
+   - https://bugs.launchpad.net/netplan/+bug/2020409
+ 
  * A proposed solution is to make sure that udev rules are triggered and 
finished before netplan-sriov-apply.service
-   starts executing.
+   starts executing.
  
  * Issue was most likely introduced by 
https://bugs.launchpad.net/netplan/+bug/1988018
-   - this change introduced netplan-sriov-apply.service
-   - jammy 0.107.1-3ubuntu0.22.04.2 is still in -proposed
-   - noble/questing/resolute released it as part of v1.0
-   
+   - this change introduced netplan-sriov-apply.service
+   - jammy 0.107.1-3ubuntu0.22.04.2 is still in -proposed
+   - noble/questing/resolute released it as part of v1.0
+ 
  * Issue is reproduced when user specifies set-name config value with a name 
different than what systemd networkd generated
-   - During the boot process, interface will first be renamed to ethX, then 
networkd will apply its PCI address based naming,
-     and only then udev will process rules created by using set-name config 
value.
-   - If set-name is not used or name specified in set-name is the same as the 
one networkd generated, issue will not reproduce.
- 
+   - During the boot process, interface will first be renamed to ethX, then 
networkd will apply its PCI address based naming,
+     and only then udev will process rules created by using set-name config 
value.
+   - If set-name is not used or name specified in set-name is the same as the 
one networkd generated, issue will not reproduce.
  
  [ Test Plan ]
  
-  * Create a netplan config which modifies interface name and sets sriov 
config, for instance:
-   50-if.yaml:
-  network:
-   ethernets:
-     ens1f0:
-       match:
-         macaddress: b8:3f:d2:09:38:94
-       mtu: 1500
-       optional: true
-       set-name: ens1f0
-     ens1f1:
-       match:
-         macaddress: b8:3f:d2:09:38:94
-       mtu: 1500
-       optional: true
-       set-name: ens1f1
+  * Create a netplan config which modifies interface name and sets sriov 
config, for instance:
+   50-if.yaml:
+  network:
+   ethernets:
+     ens1f0:
+       match:
+         macaddress: b8:3f:d2:09:38:94
+       mtu: 1500
+       optional: true
+       set-name: ens1f0
+     ens1f1:
+       match:
+         macaddress: b8:3f:d2:09:38:94
+       mtu: 1500
+       optional: true
+       set-name: ens1f1
  
-  99-sriov.yaml:
-  network:
-   version: 2
-   ethernets:
-     ens1f0:
-       virtual-function-count: 32
-       embedded-switch-mode: switchdev
-       delay-virtual-functions-rebind: true
-   ethernets:
-     ens1f1:
-       virtual-function-count: 32
-       embedded-switch-mode: switchdev
-       delay-virtual-functions-rebind: true
+  99-sriov.yaml:
+  network:
+   version: 2
+   ethernets:
+     ens1f0:
+       virtual-function-count: 32
+       embedded-switch-mode: switchdev
+       delay-virtual-functions-rebind: true
+   ethernets:
+     ens1f1:
+       virtual-function-count: 32
+       embedded-switch-mode: switchdev
+       delay-virtual-functions-rebind: true
  
+ NOTE: name generated for these interfaces by networkd are ens1f0np0 and
+ ens1f1np1
  
- NOTE: name generated for these interfaces by networkd are ens1f0np0 and 
ens1f1np1
+  * Reboot the host with above config
  
-  * Reboot the host with above config
+  * After reboot verify if sriov configuration was properly applied on the 
interface.
+  Expected result:
+ Config was properly applied by netplan-sriov-apply.service
  
-  * After reboot verify if sriov configuration was properly applied on the 
interface.
-  Expected result:
- Config was properly applied by netplan-sriov-apply.service
-  
-  Actual results:
+  Actual results:
  Feb 02 12:15:49 doopliss netplan[1163]: ERROR:root:could not determine vendor 
and device ID of ens1f1np1: [Errno 2] No such file or directory: 
'/sys/class/net/ens1f1np1/device/vendor'
  Feb 02 12:15:49 doopliss systemd[1]: netplan-sriov-apply.service: Main 
process exited, code=exited, status=1/FAILURE
  Feb 02 12:15:49 doopliss systemd[1]: netplan-sriov-apply.service: Failed with 
result 'exit-code'.
  
  In this example, netplan-sriov-apply.service started around Feb 02 12:15:27 
and properly configured the first interface using its old name ens1f0np0.
  Then the second interface, ens1f1np1, was renamed:
  Feb 02 12:15:37 doopliss kernel: mlx5_core 0000:4b:00.1 ens1f1: renamed from 
ens1f1np1
  Netplan, still using the name ens1f1np1, failed to read 
/sys/class/net/ens1f1np1/device/vendor, because the correct path is now 
/sys/class/net/ens1f1/device/vendor
  
  This is just an example. If an interface name changes while 
netplan-sriov-apply.service is running, netplan can fail in different parts of 
the code, producing a similar error log containing
  "[Errno 2] No such file or directory", such as the one mentioned in LP1988018:
  Apr 16 15:44:44 romano netplan[1171]: failed parsing sriov_totalvfs for 
ens7f1np1: [Errno 2] No such file or directory: 
'/sys/class/net/ens7f1np1/device/sriov_totalvfs'
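
The failure mode described above can be sketched in isolation. This is a minimal illustration, not netplan's code: `read_vendor` is a hypothetical helper, a temporary directory stands in for /sys/class/net (where the real entries are symlinks managed by the kernel), and 0x15b3 is used as a sample vendor ID.

```python
import os
import tempfile


def read_vendor(ifname, sysfs_root="/sys/class/net"):
    """Read the PCI vendor ID for an interface name (hypothetical helper)."""
    path = os.path.join(sysfs_root, ifname, "device", "vendor")
    with open(path) as f:
        return f.read().strip()


# Simulate the rename in a scratch directory instead of the real sysfs.
with tempfile.TemporaryDirectory() as root:
    device_dir = os.path.join(root, "ens1f1np1", "device")
    os.makedirs(device_dir)
    with open(os.path.join(device_dir, "vendor"), "w") as f:
        f.write("0x15b3\n")

    # Lookup by the old name works before the rename.
    before_rename = read_vendor("ens1f1np1", root)

    # udev applies the set-name rule while the service is still running.
    os.rename(os.path.join(root, "ens1f1np1"), os.path.join(root, "ens1f1"))

    # The name cached by the caller is now stale.
    try:
        read_vendor("ens1f1np1", root)
        stale_lookup_failed = False
    except FileNotFoundError:
        stale_lookup_failed = True

    # The same data is still reachable under the new name.
    after_rename = read_vendor("ens1f1", root)
```

Any lookup that caches the interface name across the rename hits the same FileNotFoundError, which is why the error can surface in different parts of the code.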
  
- 
  [ Where problems could occur ]
  
-  * Proposed change is making sure that udev rules are triggered and done 
before netplan-sriov-apply.service starts.
-    Inspecting current `netplan apply` logic shows that this is already 
performed in the code for `netplan apply` command
-    but is missing from `netplan apply --sriov-only` which is called by 
netplan-sriov-apply.service.
+  * Proposed change is making sure that udev rules are triggered and done 
before netplan-sriov-apply.service starts.
+    Inspecting current `netplan apply` logic shows that this is already 
performed in the code for `netplan apply` command
+    but is missing from `netplan apply --sriov-only` which is called by 
netplan-sriov-apply.service.
  
-  * If there are any other processes which are modifying interface names,
+  * If there are any other processes which are modifying interface names,
  issue can still be reproduced.
  
-  * With new change following commands will be executed:
-    - udevadm control --reload
-    - udevadm trigger --action=add --subsystem-match=net
-    - udevadm settle
-    If any of the commands hangs, service might not start properly and leave 
interfaces unconfigured.
- 
+  * With new change following commands will be executed:
+    - udevadm control --reload
+    - udevadm trigger --action=add --subsystem-match=net
+    - udevadm settle
+    If any of the commands hangs, service might not start properly and leave 
interfaces unconfigured.
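
One way to bound the hang risk mentioned above is to run each command with a timeout. The following is only a sketch under assumptions, not the actual netplan change: `settle_udev`, its `timeout` default, and the injectable `run` parameter are hypothetical names introduced for illustration. The udevadm invocations are the three listed above.

```python
import subprocess

# The three commands listed in this section, in the order given.
UDEV_COMMANDS = [
    ["udevadm", "control", "--reload"],
    ["udevadm", "trigger", "--action=add", "--subsystem-match=net"],
    ["udevadm", "settle"],
]


def settle_udev(timeout=30.0, run=subprocess.run):
    """Run the udev reload/trigger/settle sequence (hypothetical helper).

    With the default runner, subprocess.TimeoutExpired is raised if a
    command exceeds `timeout` seconds, and subprocess.CalledProcessError
    is raised if a command exits with a non-zero status.
    """
    for cmd in UDEV_COMMANDS:
        run(cmd, check=True, timeout=timeout)
```

The caller can then decide whether a timeout should abort the SR-IOV configuration or only log a warning. The `run` parameter exists so the sequence can be exercised without a real udev daemon.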
  
  [ Other Info ]
  
-  * Issue can be reproduced quite reliably on jammy-proposed
+  * Issue can be reproduced quite reliably on jammy-proposed
  
-  * I was not able to reproduce the issue on Noble when applying the same 
configuration. By the time netplan-sriov-apply.service starts, interfaces are 
already set to their proper names. This might point to differences in systemd.
-    This doesn't mean that the issue can't be reproduced there. The service 
requires interface names to be already set, and the current settings do not 
guarantee that.
+  * I was not able to reproduce the issue on Noble when applying the same 
configuration. By the time netplan-sriov-apply.service starts, interfaces are 
already set to their proper names. This might point to differences in systemd.
+    This doesn't mean that the issue can't be reproduced there. The service 
requires interface names to be already set, and the current settings do not 
guarantee that.
+ 
+ * Fix was verified on PS6 environment which reported issues in LP2020409

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2139598

Title:
  Netplan can crash when applying sriov config

To manage notifications about this bug go to:
https://bugs.launchpad.net/netplan/+bug/2139598/+subscriptions

