On Mon, Feb 10, 2025 at 8:30 AM Ilya Maximets <[email protected]> wrote:
>
> On 2/7/25 06:46, Mike Pattrick wrote:
> > On Thu, Feb 6, 2025 at 9:15 AM David Marchand <[email protected]>
> > wrote:
> >>
> >> Hello,
> >>
> >> On Wed, Feb 5, 2025 at 1:55 PM Ilya Maximets <[email protected]> wrote:
> >>>
> >>> On 1/23/25 16:56, David Marchand wrote:
> >>>> Rather than drop all pending Tx offloads on recirculation,
> >>>> preserve inner offloads (and mark packet with outer Tx offloads)
> >>>> when parsing the packet again.
> >>>>
> >>>> Fixes: c6538b443984 ("dpif-netdev: Fix crash due to tunnel offloading on recirculation.")
> >>>> Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
> >>>> Signed-off-by: David Marchand <[email protected]>
> >>>> ---
> >>>> Changes since v1:
> >>>> - rebased,
> >>>> - dropped API change on miniflow_extract(), rely on tunnel offloading
> >>>> flag presence instead,
> >>>> - introduced dp_packet_reset_outer_offsets,
> >>>>
> >>>> ---
> >>>> lib/dp-packet.h | 23 +++++++++++------------
> >>>> lib/dpif-netdev.c | 27 ---------------------------
> >>>> lib/flow.c | 34 ++++++++++++++++++++++++++++------
> >>>> 3 files changed, 39 insertions(+), 45 deletions(-)
> >>>
> >>> Hi, David. Thanks for the patch!
> >>>
> >>> Did you run some performance tests with this change? It touches the very
> >>> core of packet parsing, so we need to check how that impacts normal V2V or
> >>> PVP scenarios even without tunneling.
> >>
> >> I would be surprised if those added branches contributed much on top of the
> >> already sizable number of branches in miniflow_extract.
> >> Though I can understand the concern about decreased performance.
> >>
> >>
> >> I did a "simple" test with testpmd as a tgen and a simple port0 ->
> >> port1 and port1 -> port0 openflow rules.
> >> 1 pmd thread per port on isolated cpu, no thread sibling.
> >>
> >> I used current main branch:
> >> 481bc0979 - (HEAD, origin/main, origin/HEAD) route-table: Allow parsing routes without nexthop. (7 days ago) <Martin Kalcok>
> >>
> >> Unexpectedly, I see a slight improvement (I repeated builds,
> >> configuration and tests a few times).
> >
> > Hello David,
> >
> > I also did a few performance tests. In all tests below I generated
> > traffic in a VM with iperf3; the traffic transited OVS with the netdev
> > datapath and egressed through an i40e network card. All tests were
> > repeated 10 times, and I restarted OVS in between some tests.
> >
> > First I tested TSO combined with VXLAN tunnel encapsulation.
> >
> > Without patch:
> > Mean: 6.09 Gbps
> > Stdev: 0.098
> >
> > With patch:
> > Mean: 6.20 Gbps
> > Stdev: 0.097
> >
> > From this it's clear that in the tunnel + TSO case there is a noticeable
> > improvement!
> >
> > Next I tested just a straight path from the VM, through OVS, to the
> > NIC.
> >
> > Without patch:
> > Mean: 16.81 Gbps
> > Stdev: 0.86
> >
> > With patch:
> > Mean: 17.68 Gbps
> > Stdev: 0.91
> >
> > Again we see a small but paradoxical performance improvement with
> > the patch. There weren't a lot of samples overall, but I ran a t-test
> > and found a p-value of 0.045, suggesting significance.
> >
> > Cheers,
> > M
> >
> >>
> >>
> >> - testpmd (txonly) mlx5 -> mlx5 OVS mlx5 <-> mlx5 testpmd (mac)
> >> * Before patch:
> >> flow-dump from pmd on cpu core: 6
> >> ufid:5ba3b6ab-7595-4904-aeb3-410ec10f0f84,
> >> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:91/00:00:00:00:00:00,dst=04:3f:72:b2:c0:90/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
> >> packets:100320113, bytes:6420487232, used:0.000s, dp:ovs,
> >> actions:dpdk0, dp-extra-info:miniflow_bits(4,1)
> >> flow-dump from pmd on cpu core: 4
> >> ufid:3627c676-e0f9-4293-b86b-6824c35f9a6c,
> >> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:90/00:00:00:00:00:00,dst=02:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
> >> packets:106807423, bytes:6835675072, used:0.000s, dp:ovs,
> >> actions:dpdk1, dp-extra-info:miniflow_bits(4,1)
> >>
> >> Rx-pps: 11367442 Rx-bps: 5820130688
> >> Tx-pps: 11367439 Tx-bps: 5820128800
> >>
> >> * After patch:
> >> flow-dump from pmd on cpu core: 6
> >> ufid:41a51bc1-f6cb-4810-8372-4a9254a1db52,
> >> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:91/00:00:00:00:00:00,dst=04:3f:72:b2:c0:90/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
> >> packets:32408002, bytes:2074112128, used:0.000s, dp:ovs,
> >> actions:dpdk0, dp-extra-info:miniflow_bits(4,1)
> >> flow-dump from pmd on cpu core: 4
> >> ufid:115e4654-1e01-467b-9360-de75eb1e872b,
> >> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(dpdk0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=04:3f:72:b2:c0:90/00:00:00:00:00:00,dst=02:00:00:00:00:00/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
> >> packets:37689559, bytes:2412131776, used:0.000s, dp:ovs,
> >> actions:dpdk1, dp-extra-info:miniflow_bits(4,1)
> >>
> >> Rx-pps: 12084135 Rx-bps: 6187077192
> >> Tx-pps: 12084135 Tx-bps: 6187077192
> >>
> >>
> >> - testpmd (txonly) virtio-user -> vhost-user OVS vhost-user ->
> >> virtio-user testpmd (mac)
> >> * Before patch:
> >> flow-dump from pmd on cpu core: 6
> >> ufid:79248354-3697-4d2e-9d70-cc4df5602ff9,
> >> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:56/00:00:00:00:00:00,dst=00:11:22:33:44:55/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
> >> packets:23402111, bytes:1497735104, used:0.000s, dp:ovs,
> >> actions:vhost0, dp-extra-info:miniflow_bits(4,1)
> >> flow-dump from pmd on cpu core: 4
> >> ufid:ca8974b4-2c7e-49c1-bdc6-5d90638997b6,
> >> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:55/00:00:00:00:00:00,dst=00:11:22:33:44:66/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
> >> packets:23402655, bytes:1497769920, used:0.001s, dp:ovs,
> >> actions:vhost1, dp-extra-info:miniflow_bits(4,1)
> >>
> >> Rx-pps: 6022487 Rx-bps: 3083513840
> >> Tx-pps: 6022487 Tx-bps: 3083513840
> >>
> >> * After patch:
> >> flow-dump from pmd on cpu core: 6
> >> ufid:c2bac91a-d8a6-4a96-9d56-aee133d1f047,
> >> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost1),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:56/00:00:00:00:00:00,dst=00:11:22:33:44:55/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
> >> packets:53921535, bytes:3450978240, used:0.000s, dp:ovs,
> >> actions:vhost0, dp-extra-info:miniflow_bits(4,1)
> >> flow-dump from pmd on cpu core: 4
> >> ufid:c4989fca-2662-4645-8291-8971c00b7cb4,
> >> recirc_id(0),dp_hash(0/0),skb_priority(0/0),in_port(vhost0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),packet_type(ns=0,id=0),eth(src=00:11:22:33:44:55/00:00:00:00:00:00,dst=00:11:22:33:44:66/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=198.18.0.1/0.0.0.0,dst=198.18.0.2/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=9/0,dst=9/0),
> >> packets:53921887, bytes:3451000768, used:0.000s, dp:ovs,
> >> actions:vhost1, dp-extra-info:miniflow_bits(4,1)
> >>
> >> Rx-pps: 6042410 Rx-bps: 3093714208
> >> Tx-pps: 6042407 Tx-bps: 3093712616
> >>
>
> Hi, Mike and David.
>
> Thanks for the test results, but I don't think they are relevant, at least
> not David's. The datapath flows show no matches on eth addresses, which
> suggests that the simple match is in use. miniflow_extract is not called in
> this case, so the test doesn't really exercise the changes. The variance in
> the test results is also concerning, as nothing should have changed in the
> datapath, yet the performance changes for some reason.
>
> Mike, what OpenFlow rules are you using in your setup?
>
>
> On my end, I did my own set of runs with and without this patch, and I see
> about 0.82% performance degradation for a V2V scenario with a NORMAL OpenFlow
> rule and no real difference with simple match, which is expected. My numbers
> are:
>
>              NORMAL             Simple match
>           patch     main       patch     main
>          7420.0   7481.6      8333.1   8333.9
>
>         -0.82 %              -0.009 %
>
> The numbers are averages over 14 alternating runs of each type, so the results
> should be statistically significant. The fact that there is no difference in
> the simple match case also suggests that the difference with NORMAL is real.
>
> Could you re-check your tests?
I reran the test as a V2V scenario and confirmed that a NORMAL action was
used. This time I increased the sample count to 60 runs per branch and
restarted OVS between each test. Results:

             patch      main
mean       1421798   1427197   KBytes/sec
stdev        35954     35824
            -0.38 %
p = 0.41

The high p value suggests the difference is not statistically significant.
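For reference, the p values are from a two-sample t-test on the per-run
throughput numbers, as before. A minimal sketch of the computation, assuming a
Welch-style statistic with a normal approximation of the two-sided tail
probability (fine for 60 samples per branch); the two_sided_p() helper and the
hard-coded numbers below are only illustrative, not the exact script I used:

---
#include <math.h>
#include <stdio.h>

/* Welch's two-sample t statistic from summary stats, with a normal
 * approximation of the two-sided p value (reasonable for n >= ~30). */
static double
two_sided_p(double mean1, double stdev1, double n1,
            double mean2, double stdev2, double n2)
{
    double se = sqrt(stdev1 * stdev1 / n1 + stdev2 * stdev2 / n2);
    double t = (mean1 - mean2) / se;

    /* P(|Z| > |t|) for a standard normal Z. */
    return erfc(fabs(t) / sqrt(2.0));
}

int
main(void)
{
    /* Summary stats from the V2V NORMAL runs above (KBytes/sec, n = 60). */
    double p = two_sided_p(1421798, 35954, 60, 1427197, 35824, 60);

    printf("p = %.2f\n", p);   /* Prints roughly 0.41. */
    return 0;
}
---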
I also ran this test with a ct(),recirc,normal action:

             patch      main
mean       1435295   1393243   KBytes/sec
stdev        33540     36335
             +3 %
I also threw together a micro benchmark just to time miniflow_extract:
raw: https://gist.github.com/mkp-rh/a47356202a485dac4c838cc07c3b77ba

After 100 samples of 1,000,000 iterations each:

             patch    main
mean            47      41   ns
stdev            4       2   ns
           -11 %
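The gist has the real harness; stripped down, the timing loop looks roughly
like the sketch below. It assumes it is built inside an OVS source tree for
the headers, and the hand-rolled Ethernet/IPv4/UDP frame and N_ITER value are
just placeholders:

---
#include <config.h>
#include <stdio.h>
#include <time.h>

#include "dp-packet.h"
#include "flow.h"
#include "packets.h"

int
main(void)
{
    /* Minimal 64-byte UDP-in-IPv4 frame; bytes not listed here are zero. */
    static const uint8_t frame[64] = {
        /* Ethernet: dst, src, ethertype 0x0800 (IPv4). */
        0x04, 0x3f, 0x72, 0xb2, 0xc0, 0x90,
        0x04, 0x3f, 0x72, 0xb2, 0xc0, 0x91,
        0x08, 0x00,
        /* IPv4: ihl=5, len=50, ttl=64, proto=17 (UDP), 198.18.0.1 -> .2. */
        0x45, 0x00, 0x00, 0x32, 0x00, 0x00, 0x00, 0x00, 0x40, 0x11,
        0x00, 0x00, 198, 18, 0, 1, 198, 18, 0, 2,
        /* UDP: src 9, dst 9, len 30, csum 0. */
        0x00, 0x09, 0x00, 0x09, 0x00, 0x1e, 0x00, 0x00,
    };
    enum { N_ITER = 1000000 };

    struct dp_packet *pkt = dp_packet_new(sizeof frame);
    dp_packet_put(pkt, frame, sizeof frame);
    pkt_metadata_init(&pkt->md, 0);

    /* A miniflow needs inline storage for its packed data right after it. */
    struct {
        struct miniflow mf;
        uint64_t buf[FLOW_U64S];
    } m;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < N_ITER; i++) {
        miniflow_extract(pkt, &m.mf);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
                + (end.tv_nsec - start.tv_nsec);
    printf("%.1f ns per miniflow_extract() call\n", ns / N_ITER);

    dp_packet_delete(pkt);
    return 0;
}
---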
Cheers,
M
>
> For the reference, my configuration is:
>
> ---
> ./configure CFLAGS="-msse4.2 -g -Ofast -march=native" --with-dpdk=static \
>     CC=gcc
>
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x14
> ovs-vsctl add-br ovsbr -- set bridge ovsbr datapath_type=netdev
> ovs-vsctl add-port ovsbr vhost0 \
> -- set Interface vhost0 type=dpdkvhostuserclient \
> options:vhost-server-path=/tmp/vhost0
> ovs-vsctl set Interface vhost0 other_config:pmd-rxq-affinity=0:2,1:2
> ovs-vsctl add-port ovsbr vhost1 \
> -- set Interface vhost1 type=dpdkvhostuserclient \
> options:vhost-server-path=/tmp/vhost1
> ovs-vsctl set Interface vhost1 other_config:pmd-rxq-affinity=0:4,1:4
>
> ovs-vsctl set Open_vSwitch . \
>     other_config:dpdk-extra='--no-pci --single-file-segments'
> ovs-vsctl set Open_vSwitch . other_config:dpdk-init=try
>
> ovs-ofctl del-flows ovsbr
> ovs-ofctl add-flow ovsbr actions=NORMAL
>
> ./build-24.11/bin/dpdk-testpmd -l 12,14 -n 4 --socket-mem=1024,0 --no-pci \
>     --vdev="net_virtio_user,path=/tmp/vhost1,server=1,mac=E6:49:42:EC:67:3C,in_order=1" \
>     --in-memory --single-file-segments -- \
>     --burst=32 --txd=2048 --rxd=2048 --rxq=1 --txq=1 --nb-cores=1 \
>     --eth-peer=0,5A:90:B6:77:22:F8 --forward-mode=txonly --stats-period=5
>
> ./build-24.11/bin/dpdk-testpmd -l 8,10 -n 4 --socket-mem=1024,0 --no-pci \
>     --vdev="net_virtio_user,path=/tmp/vhost0,server=1,mac=5A:90:B6:77:22:F8,in_order=1" \
>     --in-memory --single-file-segments -- \
>     --burst=32 --txd=2048 --rxd=2048 --rxq=1 --txq=1 --nb-cores=1 \
>     --eth-peer=0,E6:49:42:EC:67:3C --forward-mode=mac --stats-period=5
> ---
>
> Best regards, Ilya Maximets.
>