On 2/17/25 16:11, Ilya Maximets wrote:
> On 2/17/25 16:02, Lucas Vargas Dias via dev wrote:
>> Hi Ales, Dumitru and Ilya,
>>
>> I agree with the revert, but I have one point:
>>
>> With monitor-all enabled in a deployment with many ovn-controllers, each
>> of them receives a lot of unnecessary messages for updates to other
>> chassis' Chassis Private records, and the CPU load of ovn-controller gets
>> high (this is happening in my setup with around 150 ovn-controllers).
> 
> Have you considered turning monitor-all off in your setup?
> 
> ovn-heater runs with the latest OVS/OVN seem to survive (even though
> there is a CPU usage increase) at 250 nodes without JSON cache.  And
> I assume you tested this change on your setup and it works fine even
> without the cache.
> 
> So, maybe your 150-node setup can live without monitor-all?
> 
>>
>> Maybe we could discuss other solutions to avoid the high CPU load in
>> ovn-controller when the monitor-all option is enabled and an update of
>> NB_Global nb_cfg consequently updates Chassis Private nb_cfg.
> 
> Sure.  Do you know where the CPU usage is coming from though?
> IIUC, ideally it should just wake up and go back to sleep pretty much
> for all the unrelated updates.  Is it just waking up that is causing
> the high CPU usage or does ovn-controller perform some other heavy work
> that it can avoid doing?

And since those are very small updates arriving in quick succession,
batching them may be helpful.  E.g. maybe we can do in ovn-controller
something similar to what we do in northd:
  703949bd8b9a ("northd: Accumulate more database updates before processing.")
?
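
To make the batching idea concrete, here is a minimal sketch in Python
(not OVN code; the class, its methods and the 100 ms interval are all
hypothetical and only illustrate the general technique).  It assumes the
main loop calls run() on every wakeup: updates are queued, and a batch is
processed only if at least backoff_ms passed since the previous run.

```python
# Hypothetical sketch; names and numbers are illustrative, not OVN APIs.
class UpdateBatcher:
    """Accumulate small updates and process them at most once per interval."""

    def __init__(self, backoff_ms, process):
        self.backoff_ms = backoff_ms
        self.process = process      # callback that receives a list of updates
        self.pending = []           # updates received since the last run
        self.last_run_ms = None     # timestamp of the last processing run

    def add(self, update):
        self.pending.append(update)

    def run(self, now_ms):
        """Call on every wakeup; returns True if a batch was processed."""
        if not self.pending:
            return False
        if (self.last_run_ms is not None
                and now_ms - self.last_run_ms < self.backoff_ms):
            return False            # too soon: keep accumulating
        batch, self.pending = self.pending, []
        self.last_run_ms = now_ms
        self.process(batch)
        return True


batches = []
batcher = UpdateBatcher(backoff_ms=100, process=batches.append)
batcher.add("nb_cfg=1")
batcher.run(0)        # nothing processed before, so this runs immediately
batcher.add("nb_cfg=2")
batcher.run(10)       # inside the backoff window, update is held
batcher.add("nb_cfg=3")
batcher.run(50)       # still held
batcher.run(120)      # window elapsed: both held updates go out as one batch
```

The trade-off is added latency (up to the backoff interval) in exchange
for fewer processing runs when many unrelated updates arrive back to back.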

> 
> Best regards, Ilya Maximets.
> 
>>
>>
>> Regards,
>> Lucas
>>
>> On Mon, Feb 17, 2025 at 10:51, Dumitru Ceara <[email protected]>
>> wrote:
>>
>>> On 2/17/25 11:38 AM, Ilya Maximets wrote:
>>>> On 2/13/25 13:07, Dumitru Ceara wrote:
>>>>> On 2/10/25 7:55 AM, Ales Musil wrote:
>>>>>> On Fri, Feb 7, 2025 at 2:18 PM Lucas Vargas Dias via dev <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> A CMS like Neutron uses NB_Global nb_cfg to check the liveness
>>>>>>> of compute nodes.  This causes nb_cfg of Chassis Private to be
>>>>>>> incremented and, with monitor-all enabled, every chassis receives
>>>>>>> the Chassis Private updates of all the others.  So, the CPU load of
>>>>>>> ovn-controller will be high in scenarios with many chassis.
>>>>>>> To fix it, each chassis monitors only its own Chassis Private.
>>>>>>>
>>>>>>> Signed-off-by: Lucas Vargas Dias <[email protected]>
>>>>>>> ---
>>>>
>>>> <snip>
>>>>
>>>>>
>>>>> With these changes I pushed the patch to main and branches 24.09
>>>>> and 24.03.  I also added Lucas to the AUTHORS list.
>>>>
>>>> Hi, Lucas, Ales and Dumitru.
>>>>
>>>
>>> Hi Ilya,
>>>
>>>> Unfortunately, this change significantly increases the Sb DB CPU usage.
>>>> Since the conditions are not all 'true', ovsdb-server can't use the
>>>> JSON cache and so it has to prepare separate updates for each client,
>>>> removing one of the main benefits of the monitor-all configuration.
>>>> Southbound DB CPU usage multiplies with the number of clients and it's
>>>> causing very long poll intervals, up to several minutes, at high scale.
>>>>
>>>> Our weekly ovn-heater runs report a 3x CPU usage increase in 250-node tests
>>>> and a complete failure of 500-node tests with port creation latency
>>>> exceeding 3.5 minutes, vs 10 seconds before this change on a cluster
>>>> density test scenario.
>>>>
>>>
>>> Thanks for the bug report!
>>>
>>>> We can work on optimizing ovsdb-server for this use case, but that may
>>>> also require reducing cache efficiency (e.g. with per-table caches), and
>>>> users will pair new OVN releases with older versions of OVS for a while
>>>> anyway.  So, for now, I think, we should revert this change, as the
>>>> current approach will break large clusters.
>>>>
>>>> Thoughts?
>>>>
>>>
>>> I agree, I'll post a patch to revert this change.  We can revisit it
>>> after we figure out a way to avoid the ovsdb-server side performance hit.
>>>
>>>> Best regards, Ilya Maximets.
>>>>
>>>
>>> Regards,
>>> Dumitru
>>>
>>>
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
