On 2/7/25 06:46, Mike Pattrick wrote:
> On Thu, Feb 6, 2025 at 9:15 AM David Marchand <[email protected]> 
> wrote:
>>
>> Hello,
>>
>> On Wed, Feb 5, 2025 at 1:55 PM Ilya Maximets <[email protected]> wrote:
>>>
>>> On 1/23/25 16:56, David Marchand wrote:
>>>> Rather than drop all pending Tx offloads on recirculation,
>>>> preserve inner offloads (and mark packet with outer Tx offloads)
>>>> when parsing the packet again.
>>>>
>>>> Fixes: c6538b443984 ("dpif-netdev: Fix crash due to tunnel offloading on 
>>>> recirculation.")
>>>> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
>>>> Signed-off-by: David Marchand <[email protected]>
>>>> ---
>>>> Changes since v1:
>>>> - rebased,
>>>> - dropped API change on miniflow_extract(), rely on tunnel offloading
>>>>   flag presence instead,
>>>> - introduced dp_packet_reset_outer_offsets,
>>>>
>>>> ---
>>>>  lib/dp-packet.h   | 23 +++++++++++------------
>>>>  lib/dpif-netdev.c | 27 ---------------------------
>>>>  lib/flow.c        | 34 ++++++++++++++++++++++++++++------
>>>>  3 files changed, 39 insertions(+), 45 deletions(-)
>>>
>>> Hi, David.  Thanks for the patch!
>>>
>>> Did you run some performance tests with this change?  It touches the very
>>> core of packet parsing, so we need to check how that impacts normal V2V or
>>> PVP scenarios even without tunneling.
>>
>> I would be surprised if those added branches added much to the already
>> large number of branches in miniflow_extract.
>> Though I can understand the concern about decreased performance.
>>
>>
>> I did a "simple" test with testpmd as a traffic generator and simple
>> port0 -> port1 and port1 -> port0 OpenFlow rules.
>> One PMD thread per port on an isolated CPU, no thread siblings.
>>
>> I used current main branch:
>> 481bc0979 - (HEAD, origin/main, origin/HEAD) route-table: Allow
>> parsing routes without nexthop. (7 days ago) <Martin Kalcok>
>>
>> Unexpectedly, I see a slight improvement (I repeated builds,
>> configuration and tests a few times).
> 
> Hello David,
> 
> I also did a few performance tests. In all tests below I generated
> traffic in a VM with iperf3, transited an OVS netdev datapath, and
> egressed through an i40e network card. All tests were repeated 10
> times, and I restarted OVS between some tests.
> 
> First I tested with tso + tunnel encapsulation with a vxlan tunnel.
> 
> Without patch:
> Mean: 6.09 Gbps
> Stdev: 0.098
> 
> With patch:
> Mean: 6.20 Gbps
> Stdev: 0.097
> 
> From this it's clear that in the tunnel + TSO case there is a noticeable
> improvement!
> 
> Next I tested just a straight path from the VM, through OVS, to the NIC.
> 
> Without patch:
> Mean: 16.81 Gbps
> Stdev: 0.86
> 
> With patch:
> Mean: 17.68 Gbps
> Stdev: 0.91
> 
> Again we see a small but paradoxical performance improvement with
> the patch. There weren't a lot of samples overall, but I ran a t-test
> and found a p-value of 0.045, suggesting significance.
> 
> Cheers,
> M
> 
>>
>>
>> - testpmd (txonly) mlx5 -> mlx5 OVS mlx5 <-> mlx5 testpmd (mac)
>> * Before patch:
>> flow-dump from pmd on cpu core: 6
>> ufid:5ba3b6ab-7595-4904-aeb3-410ec10f0f84,
>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:91/00:00:00:00:00:00,dst=04:3f:72:b2:c0:90/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>> packets:100320113, bytes:6420487232, used:0.000s, dp:ovs,
>> actions:dpdk0, dp-extra-info:miniflow_bits(4,1)
>> flow-dump from pmd on cpu core: 4
>> ufid:3627c676-e0f9-4293-b86b-6824c35f9a6c,
>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:90/00:00:00:00:00:00,dst=02:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>> packets:106807423, bytes:6835675072, used:0.000s, dp:ovs,
>> actions:dpdk1, dp-extra-info:miniflow_bits(4,1)
>>
>>   Rx-pps:     11367442          Rx-bps:   5820130688
>>   Tx-pps:     11367439          Tx-bps:   5820128800
>>
>> * After patch:
>> flow-dump from pmd on cpu core: 6
>> ufid:41a51bc1-f6cb-4810-8372-4a9254a1db52,
>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:91/00:00:00:00:00:00,dst=04:3f:72:b2:c0:90/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>> packets:32408002, bytes:2074112128, used:0.000s, dp:ovs,
>> actions:dpdk0, dp-extra-info:miniflow_bits(4,1)
>> flow-dump from pmd on cpu core: 4
>> ufid:115e4654-1e01-467b-9360-de75eb1e872b,
>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:90/00:00:00:00:00:00,dst=02:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>> packets:37689559, bytes:2412131776, used:0.000s, dp:ovs,
>> actions:dpdk1, dp-extra-info:miniflow_bits(4,1)
>>
>>   Rx-pps:     12084135          Rx-bps:   6187077192
>>   Tx-pps:     12084135          Tx-bps:   6187077192
>>
>>
>> - testpmd (txonly) virtio-user -> vhost-user OVS vhost-user ->
>> virtio-user testpmd (mac)
>> * Before patch:
>> flow-dump from pmd on cpu core: 6
>> ufid:79248354-3697-4d2e-9d70-cc4df5602ff9,
>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:56/00:00:00:00:00:00,dst=00:11:22:33:44:55/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>> packets:23402111, bytes:1497735104, used:0.000s, dp:ovs,
>> actions:vhost0, dp-extra-info:miniflow_bits(4,1)
>> flow-dump from pmd on cpu core: 4
>> ufid:ca8974b4-2c7e-49c1-bdc6-5d90638997b6,
>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:55/00:00:00:00:00:00,dst=00:11:22:33:44:66/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>> packets:23402655, bytes:1497769920, used:0.001s, dp:ovs,
>> actions:vhost1, dp-extra-info:miniflow_bits(4,1)
>>
>>   Rx-pps:      6022487          Rx-bps:   3083513840
>>   Tx-pps:      6022487          Tx-bps:   3083513840
>>
>> * After patch:
>> flow-dump from pmd on cpu core: 6
>> ufid:c2bac91a-d8a6-4a96-9d56-aee133d1f047,
>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:56/00:00:00:00:00:00,dst=00:11:22:33:44:55/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>> packets:53921535, bytes:3450978240, used:0.000s, dp:ovs,
>> actions:vhost0, dp-extra-info:miniflow_bits(4,1)
>> flow-dump from pmd on cpu core: 4
>> ufid:c4989fca-2662-4645-8291-8971c00b7cb4,
>> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:55/00:00:00:00:00:00,dst=00:11:22:33:44:66/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
>> packets:53921887, bytes:3451000768, used:0.000s, dp:ovs,
>> actions:vhost1, dp-extra-info:miniflow_bits(4,1)
>>
>>   Rx-pps:      6042410          Rx-bps:   3093714208
>>   Tx-pps:      6042407          Tx-bps:   3093712616
>>

Hi, Mike and David.

Thanks for the test results, but I don't think they are relevant, at least
David's.  The datapath flows show no matches on eth addresses, which suggests
that the simple match is in use.  miniflow_extract is not called in that case,
so the test doesn't really exercise the changes.  The variance in the test
results is also concerning: nothing should have changed in the datapath, yet
the performance changes for some reason.
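
For example, rules along these lines should keep miniflow_extract on the hot
path, since matching on Ethernet addresses makes the resulting megaflows
ineligible for simple match (NORMAL has the same effect through MAC learning).
The bridge/port names and MACs below are only placeholders:

# Match on dl_dst so the megaflows can't use simple match and every packet
# goes through miniflow_extract.
ovs-ofctl del-flows ovsbr
ovs-ofctl add-flow ovsbr "in_port=dpdk0,dl_dst=00:11:22:33:44:55,actions=output:dpdk1"
ovs-ofctl add-flow ovsbr "in_port=dpdk1,dl_dst=00:11:22:33:44:66,actions=output:dpdk0"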

Mike, what OpenFlow rules are you using in your setup?


On my end, I did my own set of runs with and without this patch and I see about
0.82% performance degradation for a V2V scenario with a NORMAL OpenFlow rule and
no real difference with simple match, which is expected.  My numbers are:

        NORMAL            Simple match
    patch    main        patch    main
    7420.0   7481.6      8333.1   8333.9

        -0.82 %              -0.009 %
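
(The percentages are just the relative difference of the averages; for the
NORMAL column, for instance:)

# Relative difference between the averaged 'patch' and 'main' numbers;
# the Simple match column is computed the same way.
awk 'BEGIN { printf "%.2f %%\n", (7420.0 - 7481.6) / 7481.6 * 100 }'
# prints: -0.82 %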

The numbers are averages over 14 alternating runs of each type, so the results
should be statistically significant.  The fact that there is no difference in
the simple match case also suggests that the difference with NORMAL is real.

Could you re-check your tests?
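
As a quick sanity check, assuming an OVS build recent enough to expose these
counters, the datapath flow dump and the PMD stats show whether packets are
taking the simple match path at all:

# Fully wildcarded eth() addresses in the dump and a dominant 'simple match'
# hit counter mean that miniflow_extract is effectively bypassed.
ovs-appctl dpctl/dump-flows -m
ovs-appctl dpif-netdev/pmd-stats-show | grep -i hits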

For the reference, my configuration is:

---
./configure CFLAGS="-msse4.2 -g -Ofast -march=native"  --with-dpdk=static CC=gcc

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x14
ovs-vsctl add-br ovsbr -- set bridge ovsbr datapath_type=netdev
ovs-vsctl add-port ovsbr vhost0 \
  -- set Interface vhost0 type=dpdkvhostuserclient \
     options:vhost-server-path=/tmp/vhost0
ovs-vsctl set Interface vhost0 other_config:pmd-rxq-affinity=0:2,1:2
ovs-vsctl add-port ovsbr vhost1 \
  -- set Interface vhost1 type=dpdkvhostuserclient \
     options:vhost-server-path=/tmp/vhost1
ovs-vsctl set Interface vhost1 other_config:pmd-rxq-affinity=0:4,1:4

ovs-vsctl set Open_vSwitch . other_config:dpdk-extra='--no-pci --single-file-segments'
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=try

ovs-ofctl del-flows ovsbr
ovs-ofctl add-flow ovsbr actions=NORMAL

./build-24.11/bin/dpdk-testpmd -l 12,14 -n 4 --socket-mem=1024,0 --no-pci \
  --vdev="net_virtio_user,path=/tmp/vhost1,server=1,mac=E6:49:42:EC:67:3C,in_order=1" \
  --in-memory --single-file-segments -- \
  --burst=32 --txd=2048 --rxd=2048 --rxq=1 --txq=1 --nb-cores=1 \
  --eth-peer=0,5A:90:B6:77:22:F8 --forward-mode=txonly --stats-period=5

./build-24.11/bin/dpdk-testpmd -l 8,10 -n 4 --socket-mem=1024,0 --no-pci \
  --vdev="net_virtio_user,path=/tmp/vhost0,server=1,mac=5A:90:B6:77:22:F8,in_order=1" \
  --in-memory --single-file-segments -- \
  --burst=32 --txd=2048 --rxd=2048 --rxq=1 --txq=1 --nb-cores=1 \
  --eth-peer=0,E6:49:42:EC:67:3C --forward-mode=mac --stats-period=5
---

Best regards, Ilya Maximets.