[ovs-discuss] Gratuitous ARP is missing

2023-11-01 Thread Sri kor via discuss
Hi,

At times, when I allocate a VM to a compute node, the network fails to
learn an ARP entry for it. This happens randomly, which makes it hard to
predict when it will occur. To better understand and address the problem,
I would like to log every instance where the controller sends a
Gratuitous ARP (GARP); I am not sure whether ovn-controller is sending
the GARP or not. Is this a known issue?
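
Would capturing ARP on the provider/uplink interface of the chassis and
looking for gratuitous ARPs (sender IP equal to target IP) be a reasonable
way to verify this? For example (the interface name is just a placeholder):

  # capture ARP on the uplink interface; eno1 is only an example
  tcpdump -nei eno1 arp
  # a gratuitous ARP shows up with the sender IP equal to the target IP, e.g.
  #   Request who-has 10.0.0.5 tell 10.0.0.5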

When we run into this issue, restarting ovn-controller fixes the problem.

Are there any other hacks or workarounds to fix the problem?


Thanks

Srini


[ovs-discuss] OVN upgrade

2023-11-01 Thread Sri kor via discuss
Hi Team,

   We are currently running the versions of ovn-controller and OVS shown
below. We are planning to migrate ovn-controller to a 23.x release and OVS
to a 3.x release, preferably the LTS versions. For a smooth migration of
the DB schemas, which versions are compatible with each other and backward
compatible with what we run today?


# ovn-controller --version

ovn-controller 22.09.2

Open vSwitch Library 3.0.3

OpenFlow versions 0x6:0x6

SB DB Schema 20.25.0


# ovs-vsctl --version

ovs-vsctl (Open vSwitch) 2.17.6

DB Schema 8.3.0
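
In case it is useful, this is how I was planning to compare the schema
versions currently served by the running databases against the schemas
shipped with the new packages (the socket and schema file paths below are
assumptions and may differ on our installs):

  # schema version currently served by the running databases
  ovsdb-client get-schema-version unix:/var/run/ovn/ovnsb_db.sock OVN_Southbound
  ovsdb-client get-schema-version unix:/var/run/openvswitch/db.sock Open_vSwitch

  # schema version shipped with the new packages
  ovsdb-tool schema-version /usr/share/ovn/ovn-sb.ovsschema
  ovsdb-tool schema-version /usr/share/openvswitch/vswitch.ovsschema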


Thanks

Srini


Re: [ovs-discuss] OVN 100% CPU - massive number of ARP entries

2023-11-01 Thread Dumitru Ceara via discuss
On 11/1/23 06:06, Gavin McKee via discuss wrote:
> Hi ,
> 

Hi Gavin,

> We are seeing ovn-controller churning constantly at 100% CPU usage.
> 
> (Open vSwitch) 2.17.6
> ovn-controller 22.09.2
> 

Is this a deployment that can be upgraded to newer OVS/OVN versions?

> 2023-11-01T04:54:08.406Z|01514|poll_loop|INFO|wakeup due to [POLLIN] on fd 24 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
> 2023-11-01T04:54:11.053Z|01515|poll_loop|INFO|wakeup due to 2641-ms timeout at lib/rconn.c:687 (100% CPU usage)
> 2023-11-01T04:54:11.058Z|01516|poll_loop|INFO|wakeup due to [POLLIN] on fd 23 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
> 
> We don't have a huge logical scale , maybe like 400 hypervisors , most
> machines have max 8 logical switch ports bound in the br-int .  
> 

It really depends on how the logical networks are connected to each
other.  If the 8 locally bound ports can logically access other remote
ports (e.g., via a logical router or if connected to the same logical
switch) then we need to install flows for those in the local br-int too.

400 hypervisors may be quite a lot though.  What is the state of the
Southbound database, is it handling the load properly?
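
For example, something along these lines can give a rough idea of how
loaded the SB ovsdb-server is (the control socket path is an assumption,
adjust it for your deployment; cluster/status only applies if the SB
database is clustered):

  # memory and connection statistics of the SB ovsdb-server
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl memory/show
  # RAFT status, if the SB database runs as a cluster
  ovs-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound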

> I think we are generating a massive number of ARP entries to the ovs
> switch , I'm wondering if this is why we are seeing the high CPU .
> ovs-ofctl dump-flows br-int | grep -i arp | wc -l
> 273259
> 

This does look suspicious, considering that there are only a few
locally bound ports.  How many MAC bindings do you have in the
Southbound database?  What about FDB entries in the Southbound?

Do you have the "localnet_learn_fdb" option enabled on the localnet
logical switch ports?
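
A rough way to count the MAC_Binding/FDB records and to check for that
option, for example:

  ovn-sbctl list MAC_Binding | grep -c _uuid
  ovn-sbctl list FDB | grep -c _uuid
  ovn-nbctl --columns=name,options find Logical_Switch_Port type=localnet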

> Why would we see so many ARP entries being generated?
> 

One thing that comes to mind is that MAC_Binding (aka ARP cache) aging
and FDB aging are disabled by default/not supported in that release.
But it would be interesting to see how many such records
(MAC_Binding/FDB) there actually are in the SB.
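
For what it's worth, newer OVN releases add knobs for this; from memory it
is something along these lines, but please double-check the option names
against the ovn-nb(5) man page of the release you end up on:

  # MAC_Binding (ARP cache) aging per logical router, in seconds (23.03+, IIRC)
  ovn-nbctl set Logical_Router <router> options:mac_binding_age_threshold=300
  # FDB aging per logical switch, in seconds (23.09+, IIRC)
  ovn-nbctl set Logical_Switch <switch> other_config:fdb_age_threshold=300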

Another possible scenario is that there are ACLs with complex matches.
For example, !=, <, or > matches on IPs or layer-4 ports might cause lots
of OpenFlow rules to be generated.  In particular, if I'm not mistaken,
commit 422ab29e76b5 ("expr: Remove supersets from OR expressions.") [0]
is not in OVN 22.09.2.

> ovn-appctl coverage/show
> Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=14bcd236:
> cmap_expand                0.0/sec     0.000/sec        0./sec   total: 3
> netlink_sent               0.0/sec     0.000/sec        0.0692/sec   total: 338
> netlink_received           0.0/sec     0.000/sec        0.0692/sec   total: 338
> vconn_sent                 1.8/sec     0.600/sec        1.3228/sec   total: 650858
> vconn_received             2.0/sec     0.617/sec        0.3764/sec   total: 1743
> vconn_open                 0.0/sec     0.000/sec        0./sec   total: 3
> util_xalloc              13326.6/sec  8847.767/sec   143201.4078/sec   total: 657040773
> unixctl_replied            0.2/sec     0.017/sec        0.0003/sec   total: 1
> unixctl_received           0.2/sec     0.017/sec        0.0003/sec   total: 1
> stream_open                0.0/sec     0.000/sec        0./sec   total: 5
> pstream_open               0.0/sec     0.000/sec        0./sec   total: 1
> seq_change                 6.2/sec     3.617/sec        7.6122/sec   total: 72727
> rconn_sent                 1.8/sec     0.600/sec        1.3197/sec   total: 648897
> rconn_queued               1.8/sec     0.600/sec        1.3197/sec   total: 648897
> poll_zero_timeout          0.0/sec     0.017/sec        0.0175/sec   total: 88
> poll_create_node          19.4/sec    10.550/sec       22.4789/sec   total: 225870
> txn_success                0.0/sec     0.000/sec        0.0150/sec   total: 84
> txn_incomplete             0.0/sec     0.000/sec        0.3678/sec   total: 3092
> txn_unchanged              1.6/sec     1.067/sec        2.5200/sec   total: 30484
> hmap_expand              302.2/sec   201.183/sec      642.3989/sec   total: 5011612
> hmap_pathological          7.2/sec     4.800/sec       24.6650/sec   total: 151135
> miniflow_malloc            0.0/sec     0.000/sec    11609.2186/sec   total: 43790437
> flow_extract               1.6/sec     0.367/sec        0.0908/sec   total: 386
> physical_run               0.0/sec     0.000/sec        0.0158/sec   total: 58
> pinctrl_total_pin_pkts     1.6/sec     0.367/sec        0.0908/sec   total: 386
> pinctrl_notify_main_thread   0.0/sec     0.000/sec        0.0003/sec   total: 1
> lflow_conj_free            0.0/sec     0.000/sec        0.0089/sec   total: 32
> lflow_conj_alloc           0.0/sec     0.000/sec        1.7628/sec   total: 6550
> lflow_cache_trim           0.0/sec     0.000/sec        0.0019/sec   total: 9
> lflow_cache_delete         0.0/sec     0.000/sec        0.0175/sec

Re: [ovs-discuss] bugzilla.kernel.org 218039

2023-11-01 Thread Ilya Maximets via discuss
On 11/1/23 17:05, Chuck Lever III wrote:
> 
> 
>> On Nov 1, 2023, at 3:18 AM, Ilya Maximets  wrote:
>>
>> On 10/31/23 22:00, Chuck Lever III via discuss wrote:
>>> Hi-
>>>
>>> I recently made some changes to tmpfs/shmemfs and someone has reported
>>> an occasional memory leak, possibly when using ovs_vswitch.service. Can
>>> I get someone to have a look at this report and perhaps make some
>>> suggestions (or shoot down my working theory) ?
>>
>> Hi, Chuck.
>>
>> I looked at the bug, but the ovs-ctl script doesn't really do anything
>> exceptional with tmpfs.  It does use it though in a following way:
>>
>> 1. Create a couple of files with mktemp:
>>   
>> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L629
>>
>> 2. Write some commands into these files, e.g.:
>>   
>> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L588
>>
>> 3. Make them executable:
>>   
>> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L663
>>
>> 4. Execute:
>>   
>> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L617
>>
>> 5. Files are removed with 'rm -f' by an exit trap:
>>   
>> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L632
>>
>> Also, ovsdb-server process is using the most basic version of a
>> temporary file in its runtime, created by tmpfile(3).  But this
>> process is not even restarted by the ovs-vswitchd restart.
> 
> Ilya, your response is very much appreciated. Would you mind
> if I copied it to 218039 ?

Sure.  No problem.  The list is actually public:
  https://mail.openvswitch.org/pipermail/ovs-discuss/2023-November/052783.html

(bugs@ is an alias for ovs-discuss@)

Best regards, Ilya Maximets.


Re: [ovs-discuss] bugzilla.kernel.org 218039

2023-11-01 Thread Chuck Lever III via discuss



> On Nov 1, 2023, at 3:18 AM, Ilya Maximets  wrote:
> 
> On 10/31/23 22:00, Chuck Lever III via discuss wrote:
>> Hi-
>> 
>> I recently made some changes to tmpfs/shmemfs and someone has reported
>> an occasional memory leak, possibly when using ovs_vswitch.service. Can
>> I get someone to have a look at this report and perhaps make some
>> suggestions (or shoot down my working theory) ?
> 
> Hi, Chuck.
> 
> I looked at the bug, but the ovs-ctl script doesn't really do anything
> exceptional with tmpfs.  It does use it though in a following way:
> 
> 1. Create a couple of files with mktemp:
>   
> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L629
> 
> 2. Write some commands into these files, e.g.:
>   
> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L588
> 
> 3. Make them executable:
>   
> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L663
> 
> 4. Execute:
>   
> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L617
> 
> 5. Files are removed with 'rm -f' by an exit trap:
>   
> https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L632
> 
> Also, ovsdb-server process is using the most basic version of a
> temporary file in its runtime, created by tmpfile(3).  But this
> process is not even restarted by the ovs-vswitchd restart.

Ilya, your response is very much appreciated. Would you mind
if I copied it to 218039?


--
Chuck Lever




Re: [ovs-discuss] bugzilla.kernel.org 218039

2023-11-01 Thread Ilya Maximets via discuss
On 10/31/23 22:00, Chuck Lever III via discuss wrote:
> Hi-
> 
> I recently made some changes to tmpfs/shmemfs and someone has reported
> an occasional memory leak, possibly when using ovs_vswitch.service. Can
> I get someone to have a look at this report and perhaps make some
> suggestions (or shoot down my working theory) ?

Hi, Chuck.

I looked at the bug, but the ovs-ctl script doesn't really do anything
exceptional with tmpfs.  It does use it, though, in the following way
(a rough shell sketch of the pattern follows the list):

1. Create a couple of files with mktemp:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L629

2. Write some commands into these files, e.g.:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L588

3. Make them executable:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L663

4. Execute:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L617

5. Files are removed with 'rm -f' by an exit trap:
   https://github.com/openvswitch/ovs/blob/fdbf0bb2aed53e70b455eb1adcfda8d8278ea690/utilities/ovs-lib.in#L632
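
Roughly, the pattern is the following (just a sketch of the shape, not
the actual ovs-lib code):

  script=$(mktemp)                                      # 1. temporary file, typically on tmpfs
  trap 'rm -f "$script"' EXIT                           # 5. removed by an exit trap
  echo 'echo "restart commands go here"' > "$script"    # 2. commands written into the file
  chmod +x "$script"                                    # 3. made executable
  "$script"                                             # 4. executed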

Also, the ovsdb-server process uses the most basic kind of temporary
file at runtime, created by tmpfile(3).  But that process is not even
restarted by an ovs-vswitchd restart.

HTH.

Best regards, Ilya Maximets.