>-----Original Message-----
>From: Ilya Maximets <i.maxim...@ovn.org>
>Sent: Tuesday, 1 November 2022 12:23
>To: Eli Britstein <el...@nvidia.com>; Donald Sharp
><donaldshar...@gmail.com>; ovs-discuss@openvswitch.org;
>e...@eecs.berkeley.edu
>Cc: i.maxim...@ovn.org
>Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
>
>On 11/1/22 10:50, Eli Britstein wrote:
>>
>>
>>> -----Original Message-----
>>> From: Ilya Maximets <i.maxim...@ovn.org>
>>> Sent: Monday, 31 October 2022 23:54
>>> To: Donald Sharp <donaldshar...@gmail.com>;
>>> ovs-disc...@openvswitch.org; e...@eecs.berkeley.edu; Eli Britstein
>>> <el...@nvidia.com>
>>> Cc: i.maxim...@ovn.org
>>> Subject: Re: [ovs-discuss] ovs-vswitchd running at 100% cpu
>>>
>>> On 10/31/22 17:25, Donald Sharp via discuss wrote:
>>>> Hi!
>>>>
>>>> I work on the FRRouting project (https://frrouting.org) and have
>>>> noticed that when I have a full BGP feed on a system that is also
>>>> running ovs-vswitchd, ovs-vswitchd sits at 100% cpu:
>>>>
>>>> top - 09:43:12 up 4 days, 22:53, 3 users, load average: 1.06, 1.08, 1.08
>>>> Tasks: 188 total, 3 running, 185 sleeping, 0 stopped, 0 zombie
>>>> %Cpu(s): 12.3 us, 14.7 sy, 0.0 ni, 72.8 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
>>>> MiB Mem : 7859.3 total, 2756.5 free, 2467.2 used, 2635.6 buff/cache
>>>> MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 5101.9 avail Mem
>>>>
>>>>    PID USER  PR  NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
>>>>    730 root  10 -10  146204 146048 11636 R  98.3  1.8  6998:13 ovs-vswitchd
>>>> 169620 root  20   0       0      0     0 I   3.3  0.0  1:34.83 kworker/0:3-events
>>>>     21 root  20   0       0      0     0 S   1.3  0.0 14:09.59 ksoftirqd/1
>>>> 131734 frr   15  -5 2384292 609556  6612 S   1.0  7.6 21:57.51 zebra
>>>> 131739 frr   15  -5 1301168   1.0g  7420 S   1.0 13.3 18:16.17 bgpd
>>>>
>>>> When I turn off FRR (or turn off the bgp feed), ovs-vswitchd stops
>>>> running at 100%:
>>>>
>>>> top - 09:48:12 up 4 days, 22:58, 3 users, load average: 0.08, 0.60, 0.89
>>>> Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie
>>>> %Cpu(s): 0.2 us, 0.4 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
>>>> MiB Mem : 7859.3 total, 4560.6 free, 663.1 used, 2635.6 buff/cache
>>>> MiB Swap: 2048.0 total, 2048.0 free, 0.0 used. 6906.1 avail Mem
>>>>
>>>>    PID USER     PR NI   VIRT    RES  SHR S %CPU %MEM    TIME+ COMMAND
>>>> 179064 sharpd   20  0  11852   3816 3172 R  1.0  0.0  0:00.09 top
>>>>   1037 zerotie+ 20  0 291852 113180 7408 S  0.7  1.4 19:09.17 zerotier-one
>>>>   1043 Debian-+ 20  0  34356  21988 7588 S  0.3  0.3 22:04.42 snmpd
>>>> 178480 root     20  0      0      0    0 I  0.3  0.0  0:01.21 kworker/1:2-events
>>>> 178622 sharpd   20  0  14020   6364 4872 S  0.3  0.1  0:00.10 sshd
>>>>      1 root     20  0 169872  13140 8272 S  0.0  0.2  2:33.26 systemd
>>>>      2 root     20  0      0      0    0 S  0.0  0.0  0:00.60 kthreadd
>>>>
>>>> I do not have any particular ovs configuration on this box:
>>>> sharpd@janelle:~$ sudo ovs-vsctl show
>>>> c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>>>> ovs_version: "2.13.8"
>>>>
>>>>
>>>> sharpd@janelle:~$ sudo ovs-vsctl list o .
>>>> _uuid : c72d327c-61eb-4877-b4e7-dcf7e07e24fc
>>>> bridges : []
>>>> cur_cfg : 0
>>>> datapath_types : [netdev, system]
>>>> datapaths : {}
>>>> db_version : "8.2.0"
>>>> dpdk_initialized : false
>>>> dpdk_version : none
>>>> external_ids : {hostname=janelle, rundir="/var/run/openvswitch", system-id="a1031fcf-8acc-40a9-9fd6-521716b0faaa"}
>>>> iface_types : [erspan, geneve, gre, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
>>>> manager_options : []
>>>> next_cfg : 0
>>>> other_config : {}
>>>> ovs_version : "2.13.8"
>>>> ssl : []
>>>> statistics : {}
>>>> system_type : ubuntu
>>>> system_version : "20.04"
>>>>
>>>> sharpd@janelle:~$ sudo ovs-appctl dpctl/dump-flows -m
>>>> ovs-vswitchd: no datapaths exist
>>>> ovs-vswitchd: datapath not found (Invalid argument)
>>>> ovs-appctl: ovs-vswitchd: server returned an error
>>>>
>>>> Eli Britstein suggested I update openvswitch to the latest version.
>>>> I did, and saw the same behavior. When I pulled up the running code
>>>> in a debugger, I saw that ovs-vswitchd is running in the loop below
>>>> pretty much 100% of the time:
>>>>
>>>> (gdb) f 4
>>>> #4  0x0000559498b4e476 in route_table_run () at lib/route-table.c:133
>>>> 133         nln_run(nln);
>>>> (gdb) l
>>>> 128     OVS_EXCLUDED(route_table_mutex)
>>>> 129 {
>>>> 130     ovs_mutex_lock(&route_table_mutex);
>>>> 131     if (nln) {
>>>> 132         rtnetlink_run();
>>>> 133         nln_run(nln);
>>>> 134
>>>> 135         if (!route_table_valid) {
>>>> 136             route_table_reset();
>>>> 137         }
>>>> (gdb) l
>>>> 138     }
>>>> 139     ovs_mutex_unlock(&route_table_mutex);
>>>> 140 }
>>>>
>>>> I pulled up where route_table_valid is set:
>>>>
>>>> 298 static void
>>>> 299 route_table_change(const struct route_table_msg *change OVS_UNUSED,
>>>> 300                    void *aux OVS_UNUSED)
>>>> 301 {
>>>> 302     route_table_valid = false;
>>>> 303 }
>>>>
>>>>
>>>> If I am reading the code correctly, every RTM_NEWROUTE netlink
>>>> message that ovs-vswitchd receives sets the route_table_valid
>>>> global variable to false and causes route_table_reset() to be run.
>>>> This makes sense in the context of what FRR is doing: a full BGP
>>>> feed *always* has churn. So ovs-vswitchd receives an RTM_NEWROUTE
>>>> message, parses it, decides in route_table_change() that the route
>>>> table is no longer valid, and calls route_table_reset(), which
>>>> re-dumps the entire routing table to ovs-vswitchd. In this case
>>>> there are ~115k ipv6 routes in the linux fib.
>>>>
>>>> I hesitate to make any changes here since I really don't understand
>>>> what the end goal is. ovs-vswitchd receives a route change from the
>>>> kernel but in turn re-dumps the entire routing table. What should
>>>> the correct behavior be from ovs-vswitchd's perspective here?
>>>
>>> Hi, Donald.
>>>
>>> Your analysis is correct. On each netlink notification about route
>>> changes, OVS invalidates the cached routing table and re-dumps it
>>> in full on the next access.
>>>
>>> Looking back into the commit history, OVS used to maintain the
>>> cache and only incrementally add/remove what was in each netlink
>>> message. But that changed in 2011 with the following commit:
>>>
>>> commit f0e167f0dbadbe2a8d684f63ad9faf68d8cb9884
>>> Author: Ethan J. Jackson <e...@eecs.berkeley.edu>
>>> Date: Thu Jan 13 16:29:31 2011 -0800
>>>
>>> route-table: Handle route updates more robustly.
>>>
>>> The kernel does not broadcast rtnetlink route messages in all cases
>>> one would expect. This can cause stale entires to end up in the
>>> route table which may cause incorrect results for
>>> route_table_get_ifindex() queries. This commit causes rtnetlink
>>> route messages to dump the entire route table on the next
>>> route_table_get_ifindex() query.
>>>
>>> And indeed, looking at the history of different projects' attempts
>>> to use route notifications, they all face issues, and it seems that
>>> none of them is able to handle all the notifications fully
>>> correctly, just because these notifications are notoriously bad. In
>>> certain cases it is impossible to tell what exactly changed and
>>> how. There can be duplicate or missing notifications. And the code
>>> of projects that try to maintain a route cache in userspace is
>>> insanely complex and still doesn't handle 100% of cases.
>>>
>>> There were attempts to convince kernel developers to add unique
>>> identifiers to routes, so userspace could tell them apart, but all
>>> of them seem to have died, leaving the problem unresolved.
>>>
>>> These are some discussions/bugs that I found:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1337855
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1722728
>>> https://github.com/thom311/libnl/issues/226
>>> https://github.com/thom311/libnl/issues/224
>>>
>>> None of those bugs seems to have been resolved; most were closed
>>> for non-technical reasons.
>>>
>>> I suppose Ethan just decided not to deal with that horribly
>>> unreliable kernel interface and simply re-dump the route table on
>>> changes.
>>>
>>>
>>> As for your actual problem here, I'm not sure we can fix it that
>>> easily.
>>>
>>> Is it necessary for OVS to know about these routes?
>>> If not, it might be possible to isolate them in a separate network
>>> namespace, so OVS would not receive all the route updates.
>>>
>>> Do you know how long it takes to dump the route table once?
>>> Maybe it's worth limiting that process to one dump per second, or
>>> one every few seconds. That should alleviate the load if the actual
>>> dump is relatively fast.
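>>>
>>> A minimal, untested sketch of that idea against route_table_run()
>>> in lib/route-table.c, reusing time_msec() from lib/timeval.h (the
>>> 1000 ms interval is just an example value):
>>>
>>> void
>>> route_table_run(void)
>>>     OVS_EXCLUDED(route_table_mutex)
>>> {
>>>     /* Time of the last full re-dump, in milliseconds. */
>>>     static long long int last_reset_ms;
>>>
>>>     ovs_mutex_lock(&route_table_mutex);
>>>     if (nln) {
>>>         rtnetlink_run();
>>>         nln_run(nln);
>>>
>>>         /* Re-dump at most once per second instead of on every
>>>          * invalidation; the cache may then stay stale for up to a
>>>          * second after a change. */
>>>         if (!route_table_valid
>>>             && time_msec() - last_reset_ms >= 1000) {
>>>             route_table_reset();
>>>             last_reset_ms = time_msec();
>>>         }
>>>     }
>>>     ovs_mutex_unlock(&route_table_mutex);
>>> }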
>> In this setup OVS just runs without any use. There is no datapath (no
>> bridges/ports) configured. It is useless to run this mechanism at all
>> in that case.
>> We could run this mechanism only when at least one datapath is
>> configured (or even only when at least one tunnel is configured);
>> a rough sketch follows below. What do you think?
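>>
>> A rough sketch of the gating idea. route_table_users and the
>> enable/disable hooks are hypothetical names (nothing like them
>> exists in OVS today); the bridge/tunnel setup code would call them
>> when the first datapath appears and when the last one goes away:
>>
>> static int route_table_users;  /* Datapaths/tunnels needing routes. */
>>
>> void
>> route_table_enable(void)
>> {
>>     ovs_mutex_lock(&route_table_mutex);
>>     route_table_users++;
>>     ovs_mutex_unlock(&route_table_mutex);
>> }
>>
>> void
>> route_table_disable(void)
>> {
>>     ovs_mutex_lock(&route_table_mutex);
>>     ovs_assert(route_table_users > 0);
>>     if (!--route_table_users) {
>>         /* Force a fresh dump when the table is next enabled. */
>>         route_table_valid = false;
>>     }
>>     ovs_mutex_unlock(&route_table_mutex);
>> }
>>
>> route_table_run() would then skip rtnetlink processing entirely
>> while route_table_users == 0.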
>
>Hmm. Why don't you just stop/disable the service then?
Indeed, that's possible. It's just turned on by default on this system
(Debian), and Donald noticed the CPU consumption.
>
>>>
>>> Best regards, Ilya Maximets.
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss