Hi All,

I am currently experimenting with a feature newly supported in DPDK 23.11, 
known as "multiport 
e-switch<https://doc.dpdk.org/guides/nics/mlx5.html#multiport-e-switch>", to 
improve communication reliability on the server side. During the trials, I 
encountered an issue in which activating multiport e-switch mode on the NIC 
disrupts the hypervisor's software running on the second PF interface (PF1). 
More specifically, right after the multiport e-switch mode is set for the NIC 
as described in the documentation, packets arriving on the second PF (PF1) are 
no longer delivered to the hypervisor's kernel network stack. A comparison of 
the packet traces captured on the second PF (PF1, ens2f1np1) before and after 
setting the multiport e-switch mode is included below. Packets marked in 
gray/italic in the second trace are the ones missing under multiport e-switch 
mode.

----<test environment>-----
ConnectX-6 Dx with firmware version 22.39.1002
Linux kernel version: 6.6.16
DPDK: 23.11
----</test environment>------

----<packet trace after setting multiport e-switch mode>------
14:37:24.835716 04:3f:72:e8:cf:cb > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), 
length 78: fe80::63f:72ff:fee8:cfcb > ff02::1: ICMP6, router advertisement, 
length 24

14:37:28.527829 90:3c:b3:33:83:fb > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), 
length 78: fe80::923c:b3ff:fe33:83fb > ff02::1: ICMP6, router advertisement, 
length 24

14:37:28.528359 04:3f:72:e8:cf:cb > 90:3c:b3:33:83:fb, ethertype IPv6 (0x86dd), 
length 94: fe80::63f:72ff:fee8:cfcb.54096 > fe80::923c:b3ff:fe33:83fb.179: 
Flags [S], seq 2779843599, win 33120, options [mss 1440,sackOK,TS val 
1610632473 ecr 0,nop,wscale 7], length 0 // link-local addresses are used

14:37:29.559918 04:3f:72:e8:cf:cb > 90:3c:b3:33:83:fb, ethertype IPv6 (0x86dd), 
length 94: fe80::63f:72ff:fee8:cfcb.54096 > fe80::923c:b3ff:fe33:83fb.179: 
Flags [S], seq 2779843599, win 33120, options [mss 1440,sackOK,TS val 
1610633505 ecr 0,nop,wscale 7], length 0

14:37:30.583925 04:3f:72:e8:cf:cb > 90:3c:b3:33:83:fb, ethertype IPv6 (0x86dd), 
length 94: fe80::63f:72ff:fee8:cfcb.54096 > fe80::923c:b3ff:fe33:83fb.179: 
Flags [S], seq 2779843599, win 33120, options [mss 1440,sackOK,TS val 
1610634529 ecr 0,nop,wscale 7], length 0
----</packet trace after setting multiport e-switch mode>------

----<packet trace before setting multiport e-switch mode>------
16:09:40.375865 90:3c:b3:33:83:fb > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), 
length 78: fe80::923c:b3ff:fe33:83fb > ff02::1: ICMP6, router advertisement, 
length 24

16:09:40.376473 fa:e4:cf:2d:11:b9 > 90:3c:b3:33:83:fb, ethertype IPv6 (0x86dd), 
length 94: fe80::f8e4:cfff:fe2d:11b9.36168 > fe80::923c:b3ff:fe33:83fb.179: 
Flags [S], seq 3409227589, win 33120, options [mss 1440,sackOK,TS val 
2302010436 ecr 0,nop,wscale 7], length 0

16:09:40.376692 90:3c:b3:33:83:fb > fa:e4:cf:2d:11:b9, ethertype IPv6 (0x86dd), 
length 94: fe80::923c:b3ff:fe33:83fb.179 > fe80::f8e4:cfff:fe2d:11b9.36168: 
Flags [S.], seq 3495571820, ack 3409227590, win 63196, options [mss 
9040,sackOK,TS val 1054058675 ecr 2302010436,nop,wscale 9], length 0

16:09:40.376711 fa:e4:cf:2d:11:b9 > 90:3c:b3:33:83:fb, ethertype IPv6 (0x86dd), 
length 86: fe80::f8e4:cfff:fe2d:11b9.36168 > fe80::923c:b3ff:fe33:83fb.179: 
Flags [.], ack 1, win 259, options [nop,nop,TS val 2302010436 ecr 1054058675], 
length 0

16:09:40.376865 fa:e4:cf:2d:11:b9 > 90:3c:b3:33:83:fb, ethertype IPv6 (0x86dd), 
length 193: fe80::f8e4:cfff:fe2d:11b9.36168 > fe80::923c:b3ff:fe33:83fb.179: 
Flags [P.], seq 1:108, ack 1, win 259, options [nop,nop,TS val 2302010436 ecr 
1054058675], length 107: BGP

16:09:40.376986 90:3c:b3:33:83:fb > fa:e4:cf:2d:11:b9, ethertype IPv6 (0x86dd), 
length 86: fe80::923c:b3ff:fe33:83fb.179 > fe80::f8e4:cfff:fe2d:11b9.36168: 
Flags [.], ack 108, win 124, options [nop,nop,TS val 1054058676 ecr 
2302010436], length 0
----</packet trace before setting multiport e-switch mode>------

Attempts to ping this hypervisor from another directly connected host also 
result in the incoming ICMP packets not being captured, and this behavior is 
reproducible in a second test environment. In the end, I was able to restore 
communication on the second PF by using a vdev TAP device and forwarding 
packets between the TAP device and PF1, as shown in our public example 
code<https://github.com/byteocean/multiport-eswitch-example>.
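
For context, the core of that workaround is nothing more than a plain 
forwarding loop between the TAP vdev port and PF1. The snippet below is only a 
simplified sketch of the idea (EAL, mempool and port setup as well as error 
handling are omitted); please refer to the linked repository for the complete 
code.

----<simplified TAP/PF1 forwarding sketch>------
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Shuttle packets between PF1 and the TAP vdev port in both directions.
 * Assumes both ports are already configured and started with one RX/TX
 * queue each. */
static void
forward_between(uint16_t pf1_port, uint16_t tap_port)
{
    struct rte_mbuf *bufs[BURST_SIZE];
    uint16_t nb_rx, nb_tx, i;

    for (;;) {
        /* PF1 -> TAP: hand packets arriving on the wire to the kernel
         * network stack via the TAP interface. */
        nb_rx = rte_eth_rx_burst(pf1_port, 0, bufs, BURST_SIZE);
        nb_tx = rte_eth_tx_burst(tap_port, 0, bufs, nb_rx);
        for (i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);

        /* TAP -> PF1: send traffic generated by the kernel stack back
         * out on the wire through PF1. */
        nb_rx = rte_eth_rx_burst(tap_port, 0, bufs, BURST_SIZE);
        nb_tx = rte_eth_tx_burst(pf1_port, 0, bufs, nb_rx);
        for (i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}
----</simplified TAP/PF1 forwarding sketch>------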

Enabling isolation mode on PF1, either by starting testpmd with 
--flow-isolate-all or programmatically via `rte_flow_isolate()`, does not 
change the behavior described above; it only affects whether packets can be 
captured and processed by the DPDK application.

----<command to start testpmd> ------
sudo ./dpdk-testpmd -a 
3b:00.0,dv_flow_en=2,dv_esw_en=1,fdb_def_rule_en=1,representor=pf0-1vf0 -- -i 
--rxq=1 --txq=1 --flow-isolate-all
----</command to start testpmd> ------
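
For completeness, the programmatic path mentioned above is essentially a 
single call per port (minimal sketch, error handling trimmed):

----<minimal rte_flow_isolate() sketch>------
#include <rte_flow.h>

/* Enable flow isolation on the given port, the programmatic counterpart
 * of testpmd's --flow-isolate-all. With set = 1, only traffic matching
 * flows explicitly created through rte_flow is directed to the DPDK
 * queues. */
static int
enable_isolation(uint16_t port_id)
{
    struct rte_flow_error error;

    return rte_flow_isolate(port_id, 1, &error);
}
----</minimal rte_flow_isolate() sketch>------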

Any shared experience or comments on the issue described above would be very 
much appreciated. Thanks a lot in advance.

Best regards,
Tao Li
