On 2/13/25 13:07, Dumitru Ceara wrote:
> On 2/10/25 7:55 AM, Ales Musil wrote:
>> On Fri, Feb 7, 2025 at 2:18 PM Lucas Vargas Dias via dev <
>> [email protected]> wrote:
>>
>>> A CMS like neutron uses NB Global nb_cfg to check the liveness
>>> of compute nodes. This increments nb_cfg in Chassis Private,
>>> and with monitor-all enabled, all chassis receive each other's
>>> Chassis Private updates. So, the CPU load of ovn-controller
>>> will be high in scenarios with many chassis.
>>> To fix it, each chassis monitors only its own Chassis Private.
>>>
>>> Signed-off-by: Lucas Vargas Dias <[email protected]>
>>> ---

<snip>

> 
> With these changes I pushed the patch to main and branches 24.09
> and 24.03.  I also added Lucas to the AUTHORS list.

Hi, Lucas, Ales and Dumitru.

Unfortunately, this change significantly increases the Sb DB CPU usage.
Since the conditions are not all 'true', ovsdb-server can't use the
JSON cache and so it has to prepare separate updates for each client,
removing one of the main benefits of the monitor-all configuration.
Southbound DB CPU usage then scales with the number of clients, causing
very long poll intervals, up to several minutes at high scale.
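
To illustrate the scaling argument, here is a toy model in Python.  It is
not ovsdb-server code: the class, the function and the string form of the
conditions are all hypothetical, and the real cache is keyed differently.
It only counts how many times an update has to be serialized when clients
share one condition versus when each client has its own.

```python
import json


class ToyUpdateCache:
    """Toy stand-in for a serialized-update cache (hypothetical)."""

    def __init__(self):
        self.cache = {}
        self.serializations = 0

    def update_for(self, condition, rows):
        # Assumption: a condition is either the string "true" (monitor
        # everything) or a chassis name (monitor only that row).
        if condition not in self.cache:
            matching = [r for r in rows
                        if condition == "true" or r["name"] == condition]
            self.cache[condition] = json.dumps(matching, sort_keys=True)
            self.serializations += 1
        return self.cache[condition]


def serializations_needed(conditions, rows):
    """Count how many distinct updates must be built for one change."""
    cache = ToyUpdateCache()
    for condition in conditions:
        cache.update_for(condition, rows)
    return cache.serializations


if __name__ == "__main__":
    n = 250
    rows = [{"name": f"chassis-{i}", "nb_cfg": 7} for i in range(n)]
    # All clients share the 'true' condition: one serialization is reused.
    print(serializations_needed(["true"] * n, rows))
    # Each client has its own condition: one serialization per client.
    print(serializations_needed([f"chassis-{i}" for i in range(n)], rows))
```

With identical conditions the work per change is constant; with per-client
conditions it grows linearly with the number of connected chassis.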

Our weekly ovn-heater runs report a 3x CPU usage increase in 250-node
tests and a complete failure of 500-node tests, with port creation latency
exceeding 3.5 minutes vs. 10 seconds before this change, in a cluster
density test scenario.

We can work on optimizing ovsdb-server for this use case, but that may
also require reducing cache efficiency (e.g. with per-table caches), and
users will pair new OVN releases with older versions of OVS for a while
anyway.  So, for now, I think we should revert this change, as the
current approach will break large clusters.

Thoughts?  

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
