(b) Regarding the kernel, I found some oddities too [ output of dmesg | grep
-v "audit\|apparmor" ]:

GOOD: <suppressed unrelated intermixed entries, like ext4's>

[ 1627.732924] hv_netvsc 00224840-7fbf-0022-4840-7fbf00224840 eth0: VF slot 1 
added
[ 1627.733637] hv_pci fb9ea909-d0dd-41b6-a1c2-98b1233e987d: PCI VMBus probing: 
Using version 0x10002
[ 1627.742469] hv_pci fb9ea909-d0dd-41b6-a1c2-98b1233e987d: PCI host bridge to 
bus d0dd:00
[ 1627.742472] pci_bus d0dd:00: root bus resource [mem 0xfe0000000-0xfe00fffff 
window]
[ 1627.743208] pci d0dd:00:02.0: [15b3:1016] type 00 class 0x020000
[ 1627.747302] pci d0dd:00:02.0: reg 0x10: [mem 0xfe0000000-0xfe00fffff 64bit 
pref]
[ 1627.825728] pci d0dd:00:02.0: enabling Extended Tags
[ 1627.830106] pci d0dd:00:02.0: 0.000 Gb/s available PCIe bandwidth, limited 
by Unknown x0 link at d0dd:00:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe 
x8 link)
[ 1627.834774] pci d0dd:00:02.0: BAR 0: assigned [mem 0xfe0000000-0xfe00fffff 
64bit pref]
[ 1627.926227] mlx5_core d0dd:00:02.0: firmware version: 14.25.8362
[ 1627.936526] mlx5_core d0dd:00:02.0: handle_hca_cap:526:(pid 619): log_max_qp 
value in current profile is 18, changing it to HCA capability limit (12)
[ 1628.131452] mlx5_core d0dd:00:02.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) 
RxCqeCmprss(0)
[ 1628.256107] hv_netvsc 00224840-7fbf-0022-4840-7fbf00224840 eth0: VF 
registering: eth1
[ 1628.256147] mlx5_core d0dd:00:02.0 eth1: joined to eth0
[ 1628.257059] mlx5_core d0dd:00:02.0 eth1: Disabling LRO, not supported in 
legacy RQ
[ 1628.266117] mlx5_core d0dd:00:02.0 eth1: Disabling LRO, not supported in 
legacy RQ
[ 1628.266803] mlx5_core d0dd:00:02.0 enP53469s1np0: renamed from eth1
[ 1628.305588] mlx5_core d0dd:00:02.0 enP53469s1np0: Disabling LRO, not 
supported in legacy RQ
[ 1628.444056] mlx5_core d0dd:00:02.0 enP53469s1np0: Link up
[ 1628.445592] hv_netvsc 00224840-7fbf-0022-4840-7fbf00224840 eth0: Data path 
switched to VF: enP53469s1np0

BAD:
[    5.211059] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF slot 1 
added
[    5.211618] hv_pci e94e34d7-6e53-4ab5-95d1-4328102a7c87: PCI VMBus probing: 
Using version 0x10002
[    5.220736] hv_pci e94e34d7-6e53-4ab5-95d1-4328102a7c87: PCI host bridge to 
bus 6e53:00
[    5.220739] pci_bus 6e53:00: root bus resource [mem 0xfe0000000-0xfe00fffff 
window]
[    5.221465] pci 6e53:00:02.0: [15b3:1016] type 00 class 0x020000
[    5.239844] pci 6e53:00:02.0: reg 0x10: [mem 0xfe0000000-0xfe00fffff 64bit 
pref]
[    5.317841] pci 6e53:00:02.0: enabling Extended Tags
[    5.322168] pci 6e53:00:02.0: 0.000 Gb/s available PCIe bandwidth, limited 
by Unknown x0 link at 6e53:00:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe 
x8 link)
[    5.325766] pci 6e53:00:02.0: BAR 0: assigned [mem 0xfe0000000-0xfe00fffff 
64bit pref]
[    5.412415] mlx5_core 6e53:00:02.0: firmware version: 14.25.8102
[    5.424153] mlx5_core 6e53:00:02.0: handle_hca_cap:526:(pid 7): log_max_qp 
value in current profile is 18, changing it to HCA capability limit (12)
[    5.613614] mlx5_core 6e53:00:02.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) 
RxCqeCmprss(0)
[    5.727686] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF 
registering: eth1
[    5.727714] mlx5_core 6e53:00:02.0 eth1: joined to eth0
[    5.728473] mlx5_core 6e53:00:02.0 eth1: Disabling LRO, not supported in 
legacy RQ
[    5.740989] mlx5_core 6e53:00:02.0 eth1: Disabling LRO, not supported in 
legacy RQ
[    5.741670] mlx5_core 6e53:00:02.0 enP28243s1np0: renamed from eth1
[    5.785980] mlx5_core 6e53:00:02.0 enP28243s1np0: Disabling LRO, not 
supported in legacy RQ
[    8.790213] mlx5_core 6e53:00:02.0 enP28243s1np0: Link up
[    8.791599] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: Data path 
switched to VF: enP28243s1np0
[    8.792382] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    9.759956] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: Data path 
switched from VF: enP28243s1np0
[   10.388005] mlx5_core 6e53:00:02.0 enP28243s1np0: Link up
[   10.389894] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: Data path 
switched to VF: enP28243s1np0
[...]
[79404.014695] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: Data path 
switched from VF: enP28243s1np0
[79404.015275] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF 
unregistering: enP28243s1np0
[79404.988951] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF slot 1 
removed
[79536.467267] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF slot 1 
added
[79536.467725] hv_pci e94e34d7-6e53-4ab5-95d1-4328102a7c87: PCI VMBus probing: 
Using version 0x10002
[79536.477073] hv_pci e94e34d7-6e53-4ab5-95d1-4328102a7c87: PCI host bridge to 
bus 6e53:00
[79536.477075] pci_bus 6e53:00: root bus resource [mem 0xfe0000000-0xfe00fffff 
window]
[79536.477816] pci 6e53:00:02.0: [15b3:1016] type 00 class 0x020000
[79536.491917] pci 6e53:00:02.0: reg 0x10: [mem 0xfe0000000-0xfe00fffff 64bit 
pref]
[79536.883663] pci 6e53:00:02.0: enabling Extended Tags
[79536.889425] pci 6e53:00:02.0: 0.000 Gb/s available PCIe bandwidth, limited 
by Unknown x0 link at 6e53:00:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe 
x8 link)
[79536.893546] pci 6e53:00:02.0: BAR 0: assigned [mem 0xfe0000000-0xfe00fffff 
64bit pref]
[79536.899810] mlx5_core 6e53:00:02.0: firmware version: 14.25.8102
[79537.097715] mlx5_core 6e53:00:02.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) 
RxCqeCmprss(0)
[79537.245178] hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF 
registering: eth1
[79537.245230] mlx5_core 6e53:00:02.0 eth1: joined to eth0
[79537.246141] mlx5_core 6e53:00:02.0 eth1: Disabling LRO, not supported in 
legacy RQ
[79537.251960] mlx5_core 6e53:00:02.0 eth1: Disabling LRO, not supported in 
legacy RQ
[79537.253846] mlx5_core 6e53:00:02.0 enP28243s1np0: renamed from eth1
[79537.287953] mlx5_core 6e53:00:02.0 enP28243s1np0: Disabling LRO, not 
supported in legacy RQ
[79537.429864] mlx5_core 6e53:00:02.0 enP28243s1np0: Link up

Three interesting observations:
(1) The first "Link up" message took > 3 seconds in the BAD case, whereas it 
took < 150ms in the GOOD case.
(2) In the BAD case we have "data path switch" events:

eth0: Data path switched to VF: enP28243s1np0
eth0: Data path switched from VF: enP28243s1np0
eth0: Data path switched to VF: enP28243s1np0

This is triggered in the Hyper-V netvsc driver, in function netvsc_vf_changed() 
- see the comment in the function:
/* Change the data path when VF UP/DOWN/CHANGE are detected.
 *
 * Typically a UP or DOWN event is followed by a CHANGE event, so
 * net_device_ctx->data_path_is_vf is used to cache the current data path
 * to avoid the duplicate call of netvsc_switch_datapath() and the duplicate
 * message.
 */

The event itself arrives via the netdevice notifier_call mechanism.

(3) We have a hotplug-like event long after the initial network setup:

[...]
hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF slot 1 removed
hv_netvsc 000d3ad6-1871-000d-3ad6-1871000d3ad6 eth0: VF slot 1 added
[...]

This happens in netvsc_send_vf(), which seems to be another type of
packet processing (I need to study it more; it seems related to so-called
inband messages, a raw packet communication channel with the host).

In my understanding, items (2) and (3) point to events at the
hypervisor level.

Also, I'm curious whether we could try to reproduce this in an image
running the Hirsute 5.11 kernel and in Groovy (running its native 5.8
kernel) - so far the issue seems to have been observed in Hirsute images
running kernel 5.8.

Thanks!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1919177

Title:
  Azure: issues with accelerated networking on Hirsute

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1919177/+subscriptions
