On 2/13/25 13:07, Dumitru Ceara wrote:
> On 2/10/25 7:55 AM, Ales Musil wrote:
>> On Fri, Feb 7, 2025 at 2:18 PM Lucas Vargas Dias via dev <
>> [email protected]> wrote:
>>
>>> CMS like neutron uses NB Global nb_cfg to check liveness
>>> of compute nodes. It will increment nb_cfg of Chassis private
>>> and with monitor all enabled, all chassis exchange between them
>>> messages of update Chassis Private. So, CPU load of ovn-controller
>>> will be high in scenario with many chassis.
>>> To fix it, each chassis monitor its Chassis Private just only.
>>>
>>> Signed-off-by: Lucas Vargas Dias <[email protected]>
>>> ---
<snip>
>
> With these changes I pushed the patch to main and branches 24.09
> and 24.03.  I also added Lucas to the AUTHORS list.

Hi, Lucas, Ales and Dumitru.

Unfortunately, this change significantly increases the Sb DB CPU usage.
Since the conditions are no longer all 'true', ovsdb-server can't use
the JSON cache, so it has to prepare a separate update for each client,
which removes one of the main benefits of the monitor-all configuration.
Southbound DB CPU usage multiplies with the number of clients, and this
is causing very long poll intervals, up to several minutes at high
scale.

Our weekly ovn-heater runs report a 3x CPU usage increase in 250-node
tests and a complete failure of 500-node tests, with port creation
latency exceeding 3.5 minutes vs 10 seconds before this change, in a
cluster density test scenario.

We can work on optimizing ovsdb-server for this use case, but that may
also require reducing cache efficiency (e.g. with per-table caches),
and users will pair new OVN releases with older versions of OVS for
a while anyway.

So, for now, I think we should revert this change, as the current
approach will break large clusters.

Thoughts?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
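
The cache argument above can be illustrated with a toy model. This is
only a sketch and not ovsdb-server code: the names `serialize` and
`send_updates`, the single-table layout, and the "one cached JSON
string" simplification are all assumptions made for illustration. It
counts how many times an update has to be serialized when every client
monitors with a 'true' condition, compared with each client monitoring
only its own row.

```python
import json


def serialize(update):
    """Stand-in for the expensive step: encoding an update as JSON."""
    return json.dumps(update, sort_keys=True)


def send_updates(clients, rows):
    """Toy model of one update round (hypothetical, single table).

    clients: dict of client name -> condition; a condition is either
             True (match every row) or a set of row names to match.
    rows:    dict of row name -> row contents.
    Returns (message per client, number of serializations performed).
    """
    cached = None          # shared JSON, usable only for 'true' conditions
    serializations = 0
    messages = {}
    for name, cond in clients.items():
        if cond is True:
            # All such clients see the same data, so encode it once.
            if cached is None:
                cached = serialize(rows)
                serializations += 1
            messages[name] = cached
        else:
            # A non-'true' condition needs its own filtered update.
            visible = {r: v for r, v in rows.items() if r in cond}
            messages[name] = serialize(visible)
            serializations += 1
    return messages, serializations


if __name__ == "__main__":
    n = 250
    rows = {f"chassis-{i}": {"nb_cfg": 1} for i in range(n)}

    monitor_all = {f"chassis-{i}": True for i in range(n)}
    own_row_only = {f"chassis-{i}": {f"chassis-{i}"} for i in range(n)}

    print(send_updates(monitor_all, rows)[1])   # 1
    print(send_updates(own_row_only, rows)[1])  # 250
```

In this model the serialization count grows linearly with the number of
clients once the conditions differ, which matches the "multiplies with
the number of clients" behaviour described above. It deliberately says
nothing about the actual cost per update in a real deployment.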
