Re: [ovs-discuss] Restarting the network triggers the deletion of one ovs port

2023-10-16 Thread Liqi An via discuss
Hi experts,
I have simplified the steps to reproduce the issue:

cluster12-b: # cat ovs-network.xml
(The XML markup of ovs-network.xml was stripped by the mail archive; only the
network name below survives from the file's contents.)
2.11-ovs-network
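
A libvirt network definition for this kind of OVS setup usually looks roughly
like the sketch below. This is a reconstruction, not the original file: the
forward mode, virtualport type and portgroup names are assumptions, and the
VLAN IDs are simply taken from the tags (3932/3933) seen on the VM ports
further down in the thread.

<network>
  <name>2.11-ovs-network</name>
  <forward mode='bridge'/>
  <bridge name='br-oam'/>
  <virtualport type='openvswitch'/>
  <portgroup name='vlan-3932'>          <!-- portgroup name assumed -->
    <vlan>
      <tag id='3932'/>
    </vlan>
  </portgroup>
  <portgroup name='vlan-3933'>          <!-- portgroup name assumed -->
    <vlan>
      <tag id='3933'/>
    </vlan>
  </portgroup>
</network>
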
cluster12-b: # virsh list --all
 Id   Name   State


cluster12-b: # virsh net-list --all
 Name   State   Autostart   Persistent


cluster12-b: # virsh net-define ovs-network.xml
Network 2.11-ovs-network defined from ovs-network.xml

cluster12-b: # virsh net-list --all
 Name   State  Autostart   Persistent
---
 2.11-ovs-network   inactive   no  yes

cluster12-b: # virsh net-start 2.11-ovs-network
Network 2.11-ovs-network started

cluster12-b: # virsh net-list --all
 Name   StateAutostart   Persistent
-
 2.11-ovs-network   active   no  yes

cluster12-b: # ovs-vsctl show
2e9bf291-50ac-4c3a-ac55-2d590df1880d
ovs_version: "2.14.2"
cluster12-b: # ovs-vsctl add-br br-oam
cluster12-b: # ovs-vsctl show
2e9bf291-50ac-4c3a-ac55-2d590df1880d
    Bridge br-oam
        Port br-oam
            Interface br-oam
                type: internal
    ovs_version: "2.14.2"
cluster12-b: # ovs-vsctl add-port br-oam bond1 trunks=3932,3933
cluster12-b: # ovs-vsctl show
2e9bf291-50ac-4c3a-ac55-2d590df1880d
    Bridge br-oam
        Port br-oam
            Interface br-oam
                type: internal
        Port bond1
            trunks: [3932, 3933]
            Interface bond1
    ovs_version: "2.14.2"
cluster12-b: # date
Tue Oct 17 13:47:02 CST 2023
cluster12-b: # service network restart 
cluster12-b: # ovs-vsctl show
2e9bf291-50ac-4c3a-ac55-2d590df1880d
    Bridge br-oam
        Port br-oam
            Interface br-oam
                type: internal
    ovs_version: "2.14.2"
cluster12-b: #

It seems like a common issue.
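
On SLES the "network" service is handled by wicked, which tears down and
re-creates the interfaces it manages on every restart, so anything added only
with ad-hoc ovs-vsctl commands can get lost. I wonder whether describing the
bridge and its port in wicked's ifcfg files would keep them persistent;
something like the sketch below (untested, and the OVS_* variable names are
from memory of the SUSE Open vSwitch documentation, so please double-check
them against your release):

cluster12-b: # cat /etc/sysconfig/network/ifcfg-br-oam
STARTMODE='auto'
BOOTPROTO='static'
OVS_BRIDGE='yes'
OVS_BRIDGE_PORT_DEVICE_1='bond1'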

//An

-Original Message-
From: Ilya Maximets  
Sent: Monday, October 16, 2023 6:54 PM
To: Liqi An ; ovs-discuss@openvswitch.org
Cc: Cheng Chi ; Jonas Yi ; Yawei 
Lu ; i.maxim...@ovn.org
Subject: Re: [ovs-discuss] Restarting the network triggers the deletion of one 
ovs port

On 10/16/23 07:38, Liqi An via discuss wrote:
> Hi experts,
> 
>    I am having a very strange problem with virtual machine installations 
> that use Open vSwitch. My operating system is SLES 15 SP4:
> 
> cluster12-b:~ # cat /etc/os-release
> NAME="SLES"
> VERSION="15-SP4"
> VERSION_ID="15.4"
> PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
> ID="sles"
> ID_LIKE="suse"
> ANSI_COLOR="0;32"
> CPE_NAME="cpe:/o:suse:sles:15:sp4"
> DOCUMENTATION_URL="https://documentation.suse.com/"
> 
> cluster12-b:~ # rpm -qa |grep openvswitch 
> openvswitch-2.14.2-150400.22.23.x86_64
> 
> cluster12-b:~ # virsh net-list --all
> 
> Name               State    Autostart   Persistent
> ---------------------------------------------------
> 2.11-ovs-network   active   yes         yes
> 
> bond1 was used by the VMs:
> …
>    Bridge br-oam
>        Port bond1
>            trunks: [3932, 3933]
>            Interface bond1
>        Port "2.11-SC-2-eth1"
>            tag: 3932
>            Interface "2.11-SC-2-eth1"
>        Port br-oam
>            Interface br-oam
>                type: internal
>        Port "2.11-SC-2-eth2"
>            tag: 3933
>            Interface "2.11-SC-2-eth2"
> 
>  But when I restarted the network service with the command "service network 
> restart", the port bond1 was removed from the bridge br-oam, and there were 
> some abnormal entries in the system log. Detailed operation logs are attached.
> 
> …
> 25302 2023-10-16T13:07:12.708071+08:00 cluster12-b kernel: 
> [340552.475586][ T2447] device eth1 left promiscuous mode
> 25303 2023-10-16T13:07:12.824022+08:00 cluster12-b kernel: 
> [340552.593298][ T2447] bonding: bond0 is being deleted...
> 25304 2023-10-16T13:07:12.824045+08:00 cluster12-b kernel: 
> [340552.593393][ T2447] bond0 (unregistering): Released all slaves
> 25305 2023-10-16T13:07:12.881576+08:00 cluster12-b systemd[1]: 
> Starting Generate issue file for login session...
> 25306 2023-10-16T13:07:12.905589+08:00 cluster12-b systemd[1]: 
> issue-generator.service: Deactivated successfully.
> 25307 2023-10-16T13:07:12.905662+08:00 cluster12-b systemd[1]: 
> Finished Generate issue file for login session.
> 25308 2023-10-16T13:07:17.668420+08:00 cluster12-b ovs-vsctl: 
> ovs|1|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port br-oam bond1
> 25309 2023-10-16T13:07:17.676015+08:00 cluster12-b kernel: 
> [340557.444150][ T2261] device bond1 left promiscuous mode
> 25310 2023-10-16T13:07:17.720080+08:00 cluster12-b kernel: 
> [340557.486796][ T2447] bonding: bond1 is being deleted...
> 25311 2023-10-16T13:07:17.720097+08:00 cluster12-b kernel: 
> [340557.486891][ T2447] bond1 (unregistering): Released all slaves

[ovs-discuss] OpenvSwitch with DPDK, Mirror + VLAN translation

2023-10-16 Thread Fred Licht via discuss
Hi All,
I am looking for suggestions/advice on how to set up a configuration.  I 
have found methods for VLAN translation and for mirroring (with the added 
complication of DPDK), but not a combined solution.

The goal is to mirror all traffic on a given VLAN, translate the mirrored 
tagged VLAN to a new VLAN ID, and send it back over the same OVS bridge, 
ensuring that the mirrored traffic only traverses the bonded trunk back to a 
physical switch.

VLAN 123 => SPAN/Mirror => VLAN 1123 => OVS Bond => switch
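
One idea I have been considering is a mirror defined on the bridge itself,
using the Mirror table's select_vlan and output_vlan columns; the sketch
below is untested with DPDK, and the bridge name "br0" is a placeholder:

  ovs-vsctl -- --id=@m create Mirror name=span-123 select_all=true \
               select_vlan=123 output_vlan=1123 \
            -- set Bridge br0 mirrors=@m

My concern is the "only over the bonded trunk" requirement: as far as I
understand, traffic mirrored to an output VLAN is sent out every port that
trunks that VLAN, so VLAN 1123 would probably need to be trunked only on the
bond.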

TIA,
Fred Licht
___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


Re: [ovs-discuss] [ovs-dpdk] Performance drops after 3-4 minutes

2023-10-16 Thread Ilya Maximets via discuss
On 10/6/23 20:10, Алексей Кашавкин via discuss wrote:
> Hello!
> 
> I am using OVS with DPDK in OpenStack. This is an RDO+TripleO deployment with 
> the Train release. I am trying to measure the performance of the DPDK compute 
> node. I have created two VMs [1], one as a DUT with DPDK and one as a traffic 
> generator with SR-IOV [2]. Both of them are using Pktgen. 
> 
> What happens is the following: for the first 3-4 minutes I see 2.6Gbit [3] 
> reception in DUT, after that the speed always drops to 400Mbit [4]. At the 
> same time in the output of `pmd-rxq-show` command I always see one of the 
> interfaces in the bond loaded [5], but it happens that after flapping of the 
> active interface the speed in DUT increases up to 5Gbit and in the output of 
> `pmd-rxq-show` command I start to see the load on two interfaces [6]. But at 
> the same time after 3-4 minutes the speed drops to 700Mbit and I continue to 
> see the same load on the two interfaces in the bond in the `pmd-rxq-show` 
> command. In the logs I see nothing but flapping [7] of the interfaces in bond 
> and the flapping has no effect on the speed drop after 3-4 minutes of test. 
> After the speed drop from the DUT itself I run traffic towards the traffic 
> generator [8] for a while and stop, then the speed on the DUT is restored to 
> 2.6Gbit again with traffic going through one interface or 5Gbit with traffic
> going through two interfaces, but again only for 3-4 minutes. If I run a 
> test with the traffic generator limited to 2.5 Gbit or 1 Gbit, the speed at 
> the DUT still drops after 4-5 minutes. I've enabled debug logging for bond, 
> dpdk, netdev_dpdk and dpif_netdev, but haven't seen anything that clarifies 
> what's going on. It's also unclear why traffic sometimes starts going 
> through both interfaces in the bond after the active interface flaps; this 
> happens rarely, not in every test.

Since the rate is restored after you send some traffic in the backward
direction, I'd say you have MAC learning somewhere on the path and
it is getting expired.  For example, if you use the NORMAL action in one
of the bridges, once the MAC is expired, the bridge will start flooding
packets to all ports of the bridge, which is very slow.  You may look
at the datapath flow dump to confirm which actions are getting executed
on your packets: ovs-appctl dpctl/dump-flows.
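
For example (the flows below are purely illustrative, not from this setup), a
bridge that has lost the learned destination MAC will show datapath flows
whose actions replicate the packet to many ports, while a learned MAC gives a
single output port:

  $ ovs-appctl dpctl/dump-flows
  ..., eth(src=aa:..,dst=bb:..), ..., actions:2,3,5,7   <- flooding
  ..., eth(src=aa:..,dst=bb:..), ..., actions:5         <- learned, unicast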

In general, you should continuously send some traffic back so that
learned MAC addresses do not expire.  I'm not sure if Pktgen is
doing that these days, but it wasn't a very robust piece of software
in the past.
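
If keeping return traffic running is not practical with the generator, another
option for a test run might be to raise the MAC learning timeout on the bridge
doing the forwarding (the bridge name here is a placeholder; mac-aging-time is
in seconds and defaults to 300):

  ovs-vsctl set Bridge br-int other_config:mac-aging-time=3600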

> 
> [4] The flapping of the interface through which traffic is going to the DUT 
> VM is probably due to the fact that it is heavily loaded alone in the bond 
> and there are no LACP PDU packets going to or from it. The log shows that it 
> is down for 30 seconds because the LACP rate is set to slow mode.

Dropped LACP packets can indeed cause bond flapping.  The only way to
fix that in older versions of OVS is to reduce the load.  With OVS 3.2
you may try the experimental 'rx-steering' configuration, which was
designed exactly for this scenario and should ensure that LACP PDUs
are not dropped.
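
If I remember the 3.2 documentation correctly, it is enabled per DPDK physical
port roughly as below ('dpdk-p0' is a placeholder; please check
Documentation/topics/dpdk/phy.rst for the exact option name and values before
relying on it):

  ovs-vsctl set Interface dpdk-p0 options:rx-steering=rss+lacp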

Also, balancing depends on packet hashes, so you need to send many
different traffic flows in order to get consistent balancing.
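
You can watch how the hashes are spread across the bond members with the
command below (the bond port name is a placeholder); with enough distinct
flows the per-hash load should no longer sit on a single member:

  ovs-appctl bond/show dpdkbond0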

> 
> I have run the DUT on different OSes, with different versions of DPDK and 
> Pktgen, but the same thing always happens: after 3-4 minutes the speed drops. 
> The only thing I didn't change is the DPDK compute node itself. The compute 
> node has an Intel E810 network card with 25Gbit ports and an Intel Xeon Gold 
> 6230R CPU. The PMD threads use cores 11, 21, 63, 73 on NUMA 0 and 36, 44, 
> 88, 96 on NUMA 1.

All in all, 2.6 Gbps seems like a small number for the type of
system you have.  You might have some other configuration issues.

> 
> In addition:
> [9] ovs-vsctl show
> [10] OVSDB dump
> [11] pmd-stats-show
> [12] bond info with ovs-appctl
> 
> For compute nodes, I use Rocky Linux 8.5, Open vSwitch 2.15.5, and DPDK 
> 20.11.1.

FWIW, OVS 2.15 reached EOL ~1.5 years ago.

Best regards, Ilya Maximets.

> 
> 
> What could be the cause of this behavior? I don't understand where I should 
> look to find out exactly what is going on.
> 
> 
> 1. https://that.guru/blog/pktgen-between-two-openstack-guests 
> 
> 2. https://freeimage.host/i/J206p8Q 
> 3. https://freeimage.host/i/J20Po9p 
> 4. https://freeimage.host/i/J20PRPs 
> 5. https://pastebin.com/rpaggexZ 
> 6. https://pastebin.com/Zhm779vT 
> 7. https://pastebin.com/Vt5P35gc 
> 8. https://freeimage.host/i/J204SkB 
> 9. https://pastebin.com/rNJZeyPy 
> 10. https://pastebin.com/wEifvivH 

Re: [ovs-discuss] Restarting the network triggers the deletion of one ovs port

2023-10-16 Thread Ilya Maximets via discuss
On 10/16/23 07:38, Liqi An via discuss wrote:
> Hi experts,
> 
>    I am having a very strange problem with virtual machine installations 
> that use Open vSwitch. My operating system is SLES 15 SP4:
> 
> cluster12-b:~ # cat /etc/os-release
> NAME="SLES"
> VERSION="15-SP4"
> VERSION_ID="15.4"
> PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
> ID="sles"
> ID_LIKE="suse"
> ANSI_COLOR="0;32"
> CPE_NAME="cpe:/o:suse:sles:15:sp4"
> DOCUMENTATION_URL="https://documentation.suse.com/"
> 
> cluster12-b:~ # rpm -qa |grep openvswitch
> openvswitch-2.14.2-150400.22.23.x86_64
> 
> cluster12-b:~ # virsh net-list --all
> 
> Name               State    Autostart   Persistent
> ---------------------------------------------------
> 2.11-ovs-network   active   yes         yes
> 
> bond1 was used by the VMs:
> …
>    Bridge br-oam
>        Port bond1
>            trunks: [3932, 3933]
>            Interface bond1
>        Port "2.11-SC-2-eth1"
>            tag: 3932
>            Interface "2.11-SC-2-eth1"
>        Port br-oam
>            Interface br-oam
>                type: internal
>        Port "2.11-SC-2-eth2"
>            tag: 3933
>            Interface "2.11-SC-2-eth2"
> 
>  But when I restarted the network service with the command "service network 
> restart", the port bond1 was removed from the bridge br-oam, and there were 
> some abnormal entries in the system log. Detailed operation logs are attached.
> 
> …
> 25302 2023-10-16T13:07:12.708071+08:00 cluster12-b kernel: [340552.475586][ 
> T2447] device eth1 left promiscuous mode
> 25303 2023-10-16T13:07:12.824022+08:00 cluster12-b kernel: [340552.593298][ 
> T2447] bonding: bond0 is being deleted...
> 25304 2023-10-16T13:07:12.824045+08:00 cluster12-b kernel: [340552.593393][ 
> T2447] bond0 (unregistering): Released all slaves
> 25305 2023-10-16T13:07:12.881576+08:00 cluster12-b systemd[1]: Starting 
> Generate issue file for login session...
> 25306 2023-10-16T13:07:12.905589+08:00 cluster12-b systemd[1]: 
> issue-generator.service: Deactivated successfully.
> 25307 2023-10-16T13:07:12.905662+08:00 cluster12-b systemd[1]: Finished 
> Generate issue file for login session.
> 25308 2023-10-16T13:07:17.668420+08:00 cluster12-b ovs-vsctl: 
> ovs|1|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port br-oam bond1
> 25309 2023-10-16T13:07:17.676015+08:00 cluster12-b kernel: [340557.444150][ 
> T2261] device bond1 left promiscuous mode
> 25310 2023-10-16T13:07:17.720080+08:00 cluster12-b kernel: [340557.486796][ 
> T2447] bonding: bond1 is being deleted...
> 25311 2023-10-16T13:07:17.720097+08:00 cluster12-b kernel: [340557.486891][ 
> T2447] bond1 (unregistering): Released all slaves

IIUC, the 'bond1' is some sort of a kernel bonding device configured
outside of OVS.  And it is getting removed.
When you restart the network, the system will execute whatever network
configuration is in your system settings, e.g. stuff from
/etc/sysconfig/network-scripts, maybe NetworkManager is going to re-apply
its configuration or netplan, I don't really know what SUSE is using.
So, you should look in these places for things that manage the bond1
interface.
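
On SLES that usually means wicked and its ifcfg files, so something like the
following should show what claims the bond1 interface (paths are the standard
SUSE locations, adjust as needed):

  ls /etc/sysconfig/network/ifcfg-*
  grep -rl bond1 /etc/sysconfig/network/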

Best regards, Ilya Maximets.

> 
> It seems that restarting the host's network service automatically triggered 
> the command: /usr/bin/ovs-vsctl del-port br-oam bond1
> 
> Also, restarting the host causes the same issue. Would you please help check 
> and give some advice? Thanks.
> 
> //An

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss


[ovs-discuss] Restarting the network triggers the deletion of one ovs port

2023-10-16 Thread Liqi An via discuss
Hi experts,
   I am having a very strange problem with virtual machine installations 
that use Open vSwitch. My operating system is SLES 15 SP4:

cluster12-b:~ # cat /etc/os-release
NAME="SLES"
VERSION="15-SP4"
VERSION_ID="15.4"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp4"
DOCUMENTATION_URL=https://documentation.suse.com/

cluster12-b:~ # rpm -qa |grep openvswitch
openvswitch-2.14.2-150400.22.23.x86_64

cluster12-b:~ # virsh net-list --all
Name   StateAutostart   Persistent
-
2.11-ovs-network   active   yes yes


bond1 was used by the VMs:
...
   Bridge br-oam
       Port bond1
           trunks: [3932, 3933]
           Interface bond1
       Port "2.11-SC-2-eth1"
           tag: 3932
           Interface "2.11-SC-2-eth1"
       Port br-oam
           Interface br-oam
               type: internal
       Port "2.11-SC-2-eth2"
           tag: 3933
           Interface "2.11-SC-2-eth2"

 But when I restarted the network service with the command "service network 
restart", the port bond1 was removed from the bridge br-oam, and there were 
some abnormal entries in the system log. Detailed operation logs are attached.
...
25302 2023-10-16T13:07:12.708071+08:00 cluster12-b kernel: [340552.475586][ 
T2447] device eth1 left promiscuous mode
25303 2023-10-16T13:07:12.824022+08:00 cluster12-b kernel: [340552.593298][ 
T2447] bonding: bond0 is being deleted...
25304 2023-10-16T13:07:12.824045+08:00 cluster12-b kernel: [340552.593393][ 
T2447] bond0 (unregistering): Released all slaves
25305 2023-10-16T13:07:12.881576+08:00 cluster12-b systemd[1]: Starting 
Generate issue file for login session...
25306 2023-10-16T13:07:12.905589+08:00 cluster12-b systemd[1]: 
issue-generator.service: Deactivated successfully.
25307 2023-10-16T13:07:12.905662+08:00 cluster12-b systemd[1]: Finished 
Generate issue file for login session.
25308 2023-10-16T13:07:17.668420+08:00 cluster12-b ovs-vsctl: 
ovs|1|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port br-oam bond1
25309 2023-10-16T13:07:17.676015+08:00 cluster12-b kernel: [340557.444150][ 
T2261] device bond1 left promiscuous mode
25310 2023-10-16T13:07:17.720080+08:00 cluster12-b kernel: [340557.486796][ 
T2447] bonding: bond1 is being deleted...
25311 2023-10-16T13:07:17.720097+08:00 cluster12-b kernel: [340557.486891][ 
T2447] bond1 (unregistering): Released all slaves

It seems that restarting the host's network service automatically triggered 
the command: /usr/bin/ovs-vsctl del-port br-oam bond1

Also, restarting the host causes the same issue. Would you please help check 
and give some advice? Thanks.


//An

cluster12-b:~ # rpm -qa |grep openvswitch
openvswitch-2.14.2-150400.22.23.x86_64
libopenvswitch-2_14-0-2.14.2-150400.22.23.x86_64
cluster12-b:~ # cat /etc/os-release 
NAME="SLES"
VERSION="15-SP4"
VERSION_ID="15.4"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp4"
DOCUMENTATION_URL="https://documentation.suse.com/";
cluster12-b:~ # rpm -qa |grep openvswitch
openvswitch-2.14.2-150400.22.23.x86_64
libopenvswitch-2_14-0-2.14.2-150400.22.23.x86_64
cluster12-b:~ # rpm -qa |grep libvirt
libvirt-client-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-nodedev-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-storage-iscsi-direct-8.0.0-150400.5.8.x86_64
libvirt-daemon-config-network-8.0.0-150400.5.8.x86_64
libvirt-libs-8.0.0-150400.5.8.x86_64
libvirt-glib-1_0-0-4.0.0-150400.1.10.x86_64
libvirt-daemon-driver-qemu-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-interface-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-storage-mpath-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-storage-disk-8.0.0-150400.5.8.x86_64
libvirt-daemon-qemu-8.0.0-150400.5.8.x86_64
python3-libvirt-python-8.0.0-150400.1.6.x86_64
system-group-libvirt-20170617-150400.22.33.noarch
libvirt-daemon-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-secret-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-network-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-storage-rbd-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-storage-iscsi-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-storage-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-storage-core-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-storage-scsi-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-nwfilter-8.0.0-150400.5.8.x86_64
libvirt-daemon-driver-storage-logical-8.0.0-150400.5.8.x86_64
cluster12-b:~ # rpm -qa |grep qemu
qemu-chardev-spice-6.2.0-150400.35.10.x86_64
qemu-sgabios-8-150400.35.10.noarch
qemu-hw-usb-host-6.2.0-150400.35.10.x86_64
qemu-ui-spice-core-6.2.0-150400.35.10.x86_64
qemu-block-rbd-6.2.0-150400.35.10.x86_64
qemu-ovmf-x86_64-202202-150400.3.3.noarch
qemu-accel-tcg-x86-6.2.0-150400.35.10.x86_64
qemu-hw-display-virtio-gpu-6.2.0-150400.35.10.x86_64
qemu-ipxe-1.0.0+-150400.3