[ovs-discuss] Gratuitous ARP is missing
Hi, At times, when I allocate a VM to a compute node, the network fails to learn an ARP entry. This happens randomly, which makes it hard to predict when it will occur. To better understand and address the problem, I would like to log every instance in which ovn-controller sends a Gratuitous ARP (GARP); at the moment I cannot tell whether ovn-controller is sending the GARP at all. Is this a known issue? When we run into it, restarting ovn-controller fixes the problem. Are there any other hacks or workarounds? Thanks Srini ___ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
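A GARP is an ARP request or reply in which the sender and target protocol (IP) addresses are the same. One way to check whether GARPs actually leave the hypervisor is to capture ARP traffic on the VM's interface or the uplink (for example `tcpdump -i <interface> -w garp.pcap arp`) and classify each frame. The sketch below classifies a single raw frame; the function name `parse_garp` is hypothetical, and it assumes untagged Ethernet II frames carrying IPv4 ARP (no 802.1Q header). It is not part of OVS or OVN.

```python
import struct
from typing import Optional

ETH_P_ARP = 0x0806  # EtherType for ARP


def parse_garp(frame: bytes) -> Optional[dict]:
    """Return sender MAC/IP if `frame` is a gratuitous ARP, else None.

    Assumes an untagged Ethernet II frame (no 802.1Q header).
    """
    if len(frame) < 14 + 28:  # Ethernet header + IPv4 ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", frame[14:22])
    # Only Ethernet/IPv4 ARP: hardware type 1, protocol 0x0800.
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    sha = frame[22:28]  # sender hardware address
    spa = frame[28:32]  # sender protocol address
    tpa = frame[38:42]  # target protocol address
    # Gratuitous: the sender announces its own IP as the target.
    if spa != tpa:
        return None
    return {
        "op": "request" if oper == 1 else "reply" if oper == 2 else oper,
        "mac": ":".join(f"{b:02x}" for b in sha),
        "ip": ".".join(str(b) for b in spa),
    }
```

Logging the result of every match together with a timestamp would show whether a GARP was sent for the new VM's port before the ARP entry went missing.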
[ovs-discuss] OVN upgrade
Hi Team, We are currently running the versions of ovn-controller and OVS shown below. We are planning to migrate ovn-controller to a 23.x release and OVS to a 3.x release, preferably LTS versions. For a smooth migration of the DB schemas, which versions are compatible with each other (i.e., backward compatible)?
# ovn-controller --version
ovn-controller 22.09.2
Open vSwitch Library 3.0.3
OpenFlow versions 0x6:0x6
SB DB Schema 20.25.0
# ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.17.6
DB Schema 8.3.0
Thanks Srini
Re: [ovs-discuss] OVN 100% CPU - massive number of ARP entries
On 11/1/23 06:06, Gavin McKee via discuss wrote:
> Hi,
>

Hi Gavin,

> We are seeing ovn-controller churning constantly at 100% CPU usage.
>
> (Open vSwitch) 2.17.6
> ovn-controller 22.09.2
>

Is this a deployment that can be upgraded to newer OVS/OVN versions?

> 2023-11-01T04:54:08.406Z|01514|poll_loop|INFO|wakeup due to [POLLIN] on fd 24 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
> 2023-11-01T04:54:11.053Z|01515|poll_loop|INFO|wakeup due to 2641-ms timeout at lib/rconn.c:687 (100% CPU usage)
> 2023-11-01T04:54:11.058Z|01516|poll_loop|INFO|wakeup due to [POLLIN] on fd 23 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
>
> We don't have a huge logical scale, maybe like 400 hypervisors; most machines have max 8 logical switch ports bound in the br-int.
>

It really depends on how the logical networks are connected to each other. If the 8 locally bound ports can logically reach other remote ports (e.g., via a logical router, or because they are connected to the same logical switch), then flows for those remote ports must be installed in the local br-int too. 400 hypervisors may be quite a lot, though. What is the state of the Southbound database? Is it handling the load properly?

> I think we are generating a massive number of ARP entries to the ovs switch, I'm wondering if this is why we are seeing the high CPU.
> ovs-ofctl dump-flows br-int | grep -i arp | wc -l
> 273259
>

This does look suspicious, given that there are only a few locally bound ports. How many MAC bindings do you have in the Southbound database? What about FDB entries in the Southbound? Do you have the "localnet_learn_fdb" option enabled on the localnet logical switch ports?

> Why would we see so many ARP entries being generated?
>

One thing that comes to mind is that MAC_Binding (aka ARP cache) aging and FDB aging are disabled by default or not supported in that release.
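Instead of only counting the lines that mention ARP, it can help to break the `ovs-ofctl dump-flows br-int` output down by OpenFlow table, to see where the 273259 flows actually live. This is a minimal sketch, assuming the usual `key=value, ...` flow line format with a `table=N` field; the function name is hypothetical, and the mapping from table numbers to OVN pipeline stages differs between OVN releases, so it is not interpreted here.

```python
import re
from collections import Counter

TABLE_RE = re.compile(r"\btable=(\d+)")


def arp_flows_per_table(dump: str) -> Counter:
    """Count flows mentioning 'arp' (case-insensitive), grouped by table.

    Mirrors `grep -i arp | wc -l`, but keeps a per-table breakdown.
    Lines without a table= field are counted under the key None.
    """
    counts = Counter()
    for line in dump.splitlines():
        if "arp" not in line.lower():
            continue
        match = TABLE_RE.search(line)
        counts[int(match.group(1)) if match else None] += 1
    return counts
```

Feeding it the saved output of `ovs-ofctl dump-flows br-int` and printing `counts.most_common(5)` shows which tables dominate.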
But it would be interesting to see how many such records (MAC_Binding/FDB) there actually are in the SB.

Another possible scenario is that there are ACLs with complex matches. For example, !=, <, or > on IP addresses or layer 4 ports can cause lots of OpenFlow flows to be generated. Also, if I'm not mistaken, commit 422ab29e76b5 ("expr: Remove supersets from OR expressions.") [0] is not in OVN 22.09.2.

> ovn-appctl coverage/show
> Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=14bcd236:
> cmap_expand                    0.0/sec     0.000/sec           0./sec   total: 3
> netlink_sent                   0.0/sec     0.000/sec       0.0692/sec   total: 338
> netlink_received               0.0/sec     0.000/sec       0.0692/sec   total: 338
> vconn_sent                     1.8/sec     0.600/sec       1.3228/sec   total: 650858
> vconn_received                 2.0/sec     0.617/sec       0.3764/sec   total: 1743
> vconn_open                     0.0/sec     0.000/sec           0./sec   total: 3
> util_xalloc                13326.6/sec  8847.767/sec  143201.4078/sec   total: 657040773
> unixctl_replied                0.2/sec     0.017/sec       0.0003/sec   total: 1
> unixctl_received               0.2/sec     0.017/sec       0.0003/sec   total: 1
> stream_open                    0.0/sec     0.000/sec           0./sec   total: 5
> pstream_open                   0.0/sec     0.000/sec           0./sec   total: 1
> seq_change                     6.2/sec     3.617/sec       7.6122/sec   total: 72727
> rconn_sent                     1.8/sec     0.600/sec       1.3197/sec   total: 648897
> rconn_queued                   1.8/sec     0.600/sec       1.3197/sec   total: 648897
> poll_zero_timeout              0.0/sec     0.017/sec       0.0175/sec   total: 88
> poll_create_node              19.4/sec    10.550/sec      22.4789/sec   total: 225870
> txn_success                    0.0/sec     0.000/sec       0.0150/sec   total: 84
> txn_incomplete                 0.0/sec     0.000/sec       0.3678/sec   total: 3092
> txn_unchanged                  1.6/sec     1.067/sec       2.5200/sec   total: 30484
> hmap_expand                  302.2/sec   201.183/sec     642.3989/sec   total: 5011612
> hmap_pathological              7.2/sec     4.800/sec      24.6650/sec   total: 151135
> miniflow_malloc                0.0/sec     0.000/sec   11609.2186/sec   total: 43790437
> flow_extract                   1.6/sec     0.367/sec       0.0908/sec   total: 386
> physical_run                   0.0/sec     0.000/sec       0.0158/sec   total: 58
> pinctrl_total_pin_pkts         1.6/sec     0.367/sec       0.0908/sec   total: 386
> pinctrl_notify_main_thread     0.0/sec     0.000/sec       0.0003/sec   total: 1
> lflow_conj_free                0.0/sec     0.000/sec       0.0089/sec   total: 32
> lflow_conj_alloc               0.0/sec     0.000/sec       1.7628/sec   total: 6550
> lflow_cache_trim               0.0/sec     0.000/sec       0.0019/sec   total: 9
> lflow_cache_delete             0.0/sec     0.000/sec       0.0175/sec
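To illustrate why relational matches can be expensive: an OpenFlow flow can only match a field against a value under a bitmask, so a range or an inequality on a field has to be split into several masked matches, each of which becomes a separate flow, and an ACL that combines several such matches may multiply the count further. The sketch below shows the generic prefix-splitting technique for a 16-bit field such as an L4 port. It only illustrates the idea and is not OVN's expression code; the function names are hypothetical.

```python
WIDTH = 16  # e.g. an L4 port field such as tcp.dst


def range_to_masks(lo: int, hi: int, width: int = WIDTH) -> list:
    """Split [lo, hi] into (value, mask) pairs: x matches if x & mask == value."""
    full = (1 << width) - 1
    out = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo...
        size = (lo & -lo) if lo else (1 << width)
        # ...that still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        out.append((lo, full ^ (size - 1)))
        lo += size
    return out


def ne_to_masks(v: int, width: int = WIDTH) -> list:
    """x != v holds iff some bit of x differs from v: one match per bit."""
    out = []
    for i in range(width):
        bit = 1 << i
        out.append((~v & bit, bit))
    return out
```

For example, `tcp.dst > 1000` corresponds to `range_to_masks(1001, 65535)`, which yields 10 masked matches, and `tcp.dst != 80` yields 16 via `ne_to_masks(80)`; a single ACL combining both could need up to 160 flows if the two sets are simply crossed.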
Re: [ovs-discuss] bugzilla.kernel.org 218039
On 11/1/23 17:05, Chuck Lever III wrote:
>
>> On Nov 1, 2023, at 3:18 AM, Ilya Maximets wrote:
>>
>> On 10/31/23 22:00, Chuck Lever III via discuss wrote:
>>> Hi-
>>>
>>> I recently made some changes to tmpfs/shmemfs and someone has reported
>>> an occasional memory leak, possibly when using ovs_vswitch.service. Can
>>> I get someone to have a look at this report and perhaps make some
>>> suggestions (or shoot down my working theory)?
>>
>> Hi, Chuck.
>>
>> I looked at the bug, but the ovs-ctl script doesn't really do anything
>> exceptional with tmpfs. It does use it, though, in the following way:
>>
>> 1. Create a couple of files with mktemp:
>>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L629
>> 2. Write some commands into these files, e.g.:
>>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L588
>> 3. Make them executable:
>>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L663
>> 4. Execute:
>>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L617
>> 5. Files are removed with 'rm -f' by an exit trap:
>>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L632
>>
>> Also, the ovsdb-server process uses the most basic kind of temporary
>> file at runtime, created by tmpfile(3). But this process is not even
>> restarted when ovs-vswitchd is restarted.
>
> Ilya, your response is very much appreciated. Would you mind
> if I copied it to 218039?

Sure. No problem. The list is actually public:
https://mail.openvswitch.org/pipermail/ovs-discuss/2023-November/052783.html
(bugs@ is an alias for ovs-discuss@)

Best regards, Ilya Maximets.
Re: [ovs-discuss] bugzilla.kernel.org 218039
> On Nov 1, 2023, at 3:18 AM, Ilya Maximets wrote:
>
> On 10/31/23 22:00, Chuck Lever III via discuss wrote:
>> Hi-
>>
>> I recently made some changes to tmpfs/shmemfs and someone has reported
>> an occasional memory leak, possibly when using ovs_vswitch.service. Can
>> I get someone to have a look at this report and perhaps make some
>> suggestions (or shoot down my working theory)?
>
> Hi, Chuck.
>
> I looked at the bug, but the ovs-ctl script doesn't really do anything
> exceptional with tmpfs. It does use it, though, in the following way:
>
> 1. Create a couple of files with mktemp:
>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L629
> 2. Write some commands into these files, e.g.:
>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L588
> 3. Make them executable:
>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L663
> 4. Execute:
>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L617
> 5. Files are removed with 'rm -f' by an exit trap:
>    https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L632
>
> Also, the ovsdb-server process uses the most basic kind of temporary
> file at runtime, created by tmpfile(3). But this process is not even
> restarted when ovs-vswitchd is restarted.

Ilya, your response is very much appreciated. Would you mind if I copied it to 218039?

--
Chuck Lever
Re: [ovs-discuss] bugzilla.kernel.org 218039
On 10/31/23 22:00, Chuck Lever III via discuss wrote:
> Hi-
>
> I recently made some changes to tmpfs/shmemfs and someone has reported
> an occasional memory leak, possibly when using ovs_vswitch.service. Can
> I get someone to have a look at this report and perhaps make some
> suggestions (or shoot down my working theory)?

Hi, Chuck.

I looked at the bug, but the ovs-ctl script doesn't really do anything exceptional with tmpfs. It does use it, though, in the following way:

1. Create a couple of files with mktemp:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L629
2. Write some commands into these files, e.g.:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L588
3. Make them executable:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L663
4. Execute:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L617
5. Files are removed with 'rm -f' by an exit trap:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L632

Also, the ovsdb-server process uses the most basic kind of temporary file at runtime, created by tmpfile(3). But this process is not even restarted when ovs-vswitchd is restarted.

HTH.

Best regards, Ilya Maximets.
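The temporary-script lifecycle in steps 1-5 can be reproduced outside of the shell, for example to exercise the same create/write/chmod/execute/remove pattern in a loop while looking for the leak. This is a minimal Python sketch, not the ovs-lib code itself; the function name and `tmpdir` parameter are hypothetical, and the files only land on tmpfs if the chosen directory (by default `$TMPDIR` or `/tmp`) is mounted as tmpfs and not mounted `noexec`.

```python
import os
import stat
import subprocess
import tempfile
from typing import Optional


def run_temp_script(commands: str, tmpdir: Optional[str] = None) -> str:
    """Create, fill, chmod, execute and remove a temporary shell script."""
    # 1. Create the file, like `mktemp`.
    fd, path = tempfile.mkstemp(dir=tmpdir)
    try:
        # 2. Write the commands into it (closing the fd before executing).
        with os.fdopen(fd, "w") as f:
            f.write("#!/bin/sh\n" + commands + "\n")
        # 3. Make it executable, like `chmod +x`.
        os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
        # 4. Execute it and capture its output.
        result = subprocess.run([path], capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        # 5. Remove it on the way out, like `rm -f` in an exit trap.
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
```

For comparison, the tmpfile(3) usage mentioned for ovsdb-server corresponds roughly to Python's `tempfile.TemporaryFile()`: an anonymous temporary file that is removed automatically when it is closed.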