Re: [ovs-discuss] [ovs-dpdk] Performance drops after 3-4 minutes

2023-10-17 Thread Алексей Кашавкин via discuss


> On 16 Oct 2023, at 14:48, Ilya Maximets wrote:
> 
> On 10/6/23 20:10, Алексей Кашавкин via discuss wrote:
>> Hello!
>> 
>> I am using OVS with DPDK in OpenStack. This is an RDO+TripleO deployment of
>> the Train release. I am trying to measure the performance of a DPDK compute
>> node. I have created two VMs [1]: one as a DUT with DPDK and one as a
>> traffic generator with SR-IOV [2]. Both of them run Pktgen.
>> 
>> What happens is the following: for the first 3-4 minutes the DUT receives
>> 2.6 Gbit/s [3]; after that the rate always drops to 400 Mbit/s [4]. During
>> this time the output of the `pmd-rxq-show` command always shows only one of
>> the bond interfaces loaded [5]. Occasionally, after the active interface
>> flaps, the rate at the DUT rises to 5 Gbit/s and `pmd-rxq-show` starts to
>> show load on both interfaces [6], but again after 3-4 minutes the rate
>> drops, this time to 700 Mbit/s, while `pmd-rxq-show` keeps showing the same
>> load on both bond interfaces. The logs show nothing but the flapping [7] of
>> the bond interfaces, and the flapping has no effect on the rate drop after
>> 3-4 minutes of testing. After the rate has dropped, if I send traffic from
>> the DUT itself towards the traffic generator [8] for a while and then stop,
>> the rate at the DUT is restored: 2.6 Gbit/s again with traffic going through
>> one interface, or 5 Gbit/s with traffic going through both, but once more
>> only for 3-4 minutes. If I run the test with the traffic generator limited
>> to 2.5 Gbit/s or 1 Gbit/s, the rate at the DUT also drops after 4-5 minutes.
>> I have enabled debug logging for bond, dpdk, netdev_dpdk and dpif_netdev,
>> but have not seen anything that clarifies what is going on. It is also
>> unclear why traffic sometimes starts going through both bond interfaces
>> after the active interface flaps; this happens rarely, not in every test.
> 
> Since the rate is restored after you send some traffic in the backward
> direction, I'd say you have MAC learning somewhere on the path and
> it is getting expired.  For example, if you use NORMAL action in one
> of the bridges, once the MAC is expired, the bridge will start flooding
> packets to all ports of the bridge, which is very slow.  You may look
> at datapath flow dump to confirm which actions are getting executed
> on your packets: ovs-appctl dpctl/dump-flows.
> 
> In general, you should always continuously send some traffic back
> for learned MAC addresses to not expire.  I'm not sure if Pktgen is
> doing that these days, but it wasn't a very robust piece of software
> in the past.

Yes, that is exactly what is happening. I noticed that the DUT's MAC entry
expires in the bridge FDB table and that it also expires in the IP fabric. If
both of these tables are cleared, performance drops. The rate does not drop as
long as at least one of the tables still has the MAC address entry.

Now, when performance drops, I check the FDB tables, and if the MAC really is
gone from them, I send a single ping packet with Pktgen from the DUT VM towards
the traffic generator, after which the MAC is learned again and the rate is
restored.
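
A minimal check sequence for this might look like the following (a sketch only:
br-int is assumed to be the integration bridge and <DUT MAC> is a placeholder,
adjust to your setup):

  # is the DUT MAC still present in the OVS bridge FDB?
  ovs-appctl fdb/show br-int | grep -i <DUT MAC>

  # what does the datapath actually do with the test traffic?
  # a single output port in "actions" means the MAC is learned; a long list
  # of output ports means the bridge is flooding (the slow case)
  ovs-appctl dpctl/dump-flows | grep -i <DUT MAC>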

Thank you, Ilya.

>> 
>> [4] The flapping of the interface through which traffic is going to the DUT
>> VM is probably caused by the fact that it alone carries the heavy load in
>> the bond and no LACP PDUs get through on it in either direction. The log
>> shows that it is kept down for 30 seconds because the LACP rate is set to
>> slow mode.
> 
> Dropped LACP packets can cause bond flapping indeed.  The only way to
> fix that in older versions of OVS is to reduce the load.  With OVS 3.2
> you may try experimental 'rx-steering' configuration that was designed
> exactly for this scenario and should ensure that PDU packets are not
> dropped.
> 
> Also, balancing depends on packet hashes, so you need to send many
> different traffic flows in order to get consistent balancing.
> 
>> 
>> I have tried the DUT on different OSes, with different versions of DPDK and
>> Pktgen, but the same thing always happens: after 3-4 minutes the rate drops.
>> I did not change anything on the DPDK compute node itself. The compute node
>> has an Intel E810 network card with 25 Gbit ports and an Intel Xeon Gold
>> 6230R CPU. The PMD threads use cores 11, 21, 63, 73 on NUMA 0 and cores 36,
>> 44, 88, 96 on NUMA 1.
> 
> All in all, 2.6Gbps seems like a small number for the type of a
> system you have.  You might have some other configuration issues.

This figure is probably related to the 64-byte packet size: the traffic
generator sends 64-byte frames.
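
For context, 2.6 Gbit/s of 64-byte frames works out to roughly
2.6e9 / ((64 + 20) * 8) ≈ 3.9 Mpps if the figure includes preamble and
inter-frame gap, or about 5.1 Mpps at the pure L2 rate, so the bottleneck here
is packets per second rather than bandwidth.

Regarding the rx-steering suggestion above: as far as I can tell from the 3.2
documentation, it is enabled per physical DPDK port, along the lines of the
following (please check the exact option name against the docs for your
version; the port name below is just an example):

  ovs-vsctl set Interface dpdk0 options:rx-steering=rss+lacp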

[ovs-discuss] [ovs-dpdk] Performance drops after 3-4 minutes

2023-10-09 Thread Алексей Кашавкин via discuss
Hello!

I am using OVS with DPDK in OpenStack. This is an RDO+TripleO deployment of the
Train release. I am trying to measure the performance of a DPDK compute node. I
have created two VMs [1]: one as a DUT with DPDK and one as a traffic generator
with SR-IOV [2]. Both of them run Pktgen.

What happens is the following: for the first 3-4 minutes the DUT receives
2.6 Gbit/s [3]; after that the rate always drops to 400 Mbit/s [4]. During this
time the output of the `pmd-rxq-show` command always shows only one of the bond
interfaces loaded [5]. Occasionally, after the active interface flaps, the rate
at the DUT rises to 5 Gbit/s and `pmd-rxq-show` starts to show load on both
interfaces [6], but again after 3-4 minutes the rate drops, this time to
700 Mbit/s, while `pmd-rxq-show` keeps showing the same load on both bond
interfaces. The logs show nothing but the flapping [7] of the bond interfaces,
and the flapping has no effect on the rate drop after 3-4 minutes of testing.
After the rate has dropped, if I send traffic from the DUT itself towards the
traffic generator [8] for a while and then stop, the rate at the DUT is
restored: 2.6 Gbit/s again with traffic going through one interface, or
5 Gbit/s with traffic going through both, but once more only for 3-4 minutes.
If I run the test with the traffic generator limited to 2.5 Gbit/s or 1 Gbit/s,
the rate at the DUT also drops after 4-5 minutes. I have enabled debug logging
for bond, dpdk, netdev_dpdk and dpif_netdev, but have not seen anything that
clarifies what is going on. It is also unclear why traffic sometimes starts
going through both bond interfaces after the active interface flaps; this
happens rarely, not in every test.

[4] The flapping of the interface through which traffic is going to the DUT VM
is probably caused by the fact that it alone carries the heavy load in the bond
and no LACP PDUs get through on it in either direction. The log shows that it
is kept down for 30 seconds because the LACP rate is set to slow mode.
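
(For reference, the LACP rate on the OVS side is a per-Port setting,
`other_config:lacp-time`: slow mode sends PDUs every 30 seconds, fast mode
every second. Something like
`ovs-vsctl set port <bond-port> other_config:lacp-time=fast` would switch it,
where <bond-port> is a placeholder for the bond port name.)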

I have tried the DUT on different OSes, with different versions of DPDK and
Pktgen, but the same thing always happens: after 3-4 minutes the rate drops. I
did not change anything on the DPDK compute node itself. The compute node has
an Intel E810 network card with 25 Gbit ports and an Intel Xeon Gold 6230R CPU.
The PMD threads use cores 11, 21, 63, 73 on NUMA 0 and cores 36, 44, 88, 96 on
NUMA 1.
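
For reference, this pinning corresponds to an `other_config:pmd-cpu-mask` value
with bits 11, 21, 36, 44, 63, 73, 88 and 96 set. Outside of the TripleO
templates it would be configured with something like the following (the hex
value is simply the sum of 2^core over those cores; in this deployment the
value is rendered by TripleO, so it is shown here only for illustration):

  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1010002008000101000200800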

In addition:
[9] ovs-vsctl show
[10] OVSDB dump
[11] pmd-stats-show
[12] bond info with ovs-appctl
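
(Presumably, for anyone reproducing the collection: [11] and [12] come from
`ovs-appctl dpif-netdev/pmd-stats-show` and `ovs-appctl bond/show`, and the rxq
distribution referenced earlier comes from
`ovs-appctl dpif-netdev/pmd-rxq-show`.)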

For compute nodes, I use Rocky Linux 8.5, Open vSwitch 2.15.5, and DPDK 20.11.1.


What could be the cause of this behavior? I don't understand where I should 
look to find out exactly what is going on.


1. https://that.guru/blog/pktgen-between-two-openstack-guests
2. https://freeimage.host/i/J206p8Q
3. https://freeimage.host/i/J20Po9p
4. https://freeimage.host/i/J20PRPs
5. https://pastebin.com/rpaggexZ
6. https://pastebin.com/Zhm779vT
7. https://pastebin.com/Vt5P35gc
8. https://freeimage.host/i/J204SkB
9. https://pastebin.com/rNJZeyPy
10. https://pastebin.com/wEifvivH
11. https://pastebin.com/pELywZUQ
12. https://pastebin.com/PTV6fWEb


