Hi Ales, Dumitru and Ilya,

I agree with the revert, but I have one point:

With monitor-all enabled and many ovn-controllers, each controller
receives a lot of unnecessary messages whenever another chassis updates
its Chassis_Private record, and the CPU load of ovn-controller becomes
high (this is happening in my deployment of around 150 ovn-controllers).

Maybe we could discuss other solutions to avoid the high CPU load in
ovn-controller when the monitor-all option is enabled and an update of
NB_Global nb_cfg consequently triggers an update of Chassis_Private nb_cfg.
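
As a rough illustration of the trade-off, here is a toy cost model. It is
not actual ovsdb-server or ovn-controller code: the function name, the
counters, and the single dict standing in for the JSON cache are all made
up for this example, and it batches all row updates into one payload for
simplicity.

```python
import json


def simulate_nb_cfg_bump(n_chassis, conditional):
    """Toy cost model only; not real ovsdb-server or ovn-controller code."""
    # One Chassis_Private row update per chassis after an nb_cfg bump.
    updates = [
        {"table": "Chassis_Private", "name": f"chassis-{i}", "nb_cfg": 1}
        for i in range(n_chassis)
    ]
    serializations = 0  # server side: times an update is turned into JSON
    rows_received = 0   # client side: rows the controllers have to process
    cache = {}          # hypothetical stand-in for the server's JSON cache

    for client in range(n_chassis):
        if conditional:
            # Each client has its own condition, so its payload is unique
            # and has to be built separately.
            mine = [u for u in updates if u["name"] == f"chassis-{client}"]
            json.dumps(mine)
            serializations += 1
            rows_received += len(mine)
        else:
            # All conditions are 'true': every client gets the same
            # payload, so it is serialized once and then reused.
            if "all" not in cache:
                cache["all"] = json.dumps(updates)
                serializations += 1
            rows_received += len(updates)
    return serializations, rows_received


print(simulate_nb_cfg_bump(150, conditional=False))
print(simulate_nb_cfg_bump(150, conditional=True))
```

In this model, monitor-all keeps the server work constant but the rows
processed by the controllers grow with the square of the number of
chassis, while per-chassis conditions do the opposite.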


Regards,
Lucas

On Mon, Feb 17, 2025 at 10:51, Dumitru Ceara <[email protected]>
wrote:

> On 2/17/25 11:38 AM, Ilya Maximets wrote:
> > On 2/13/25 13:07, Dumitru Ceara wrote:
> >> On 2/10/25 7:55 AM, Ales Musil wrote:
> >>> On Fri, Feb 7, 2025 at 2:18 PM Lucas Vargas Dias via dev <
> >>> [email protected]> wrote:
> >>>
> >>>> A CMS like Neutron uses NB_Global nb_cfg to check the liveness
> >>>> of compute nodes. This increments nb_cfg of Chassis_Private
> >>>> and, with monitor-all enabled, all chassis exchange Chassis_Private
> >>>> update messages with each other. So, the CPU load of ovn-controller
> >>>> will be high in a scenario with many chassis.
> >>>> To fix it, each chassis monitors only its own Chassis_Private.
> >>>>
> >>>> Signed-off-by: Lucas Vargas Dias <[email protected]>
> >>>> ---
> >
> > <snip>
> >
> >>
> >> With these changes I pushed the patch to main and branches 24.09
> >> and 24.03.  I also added Lucas to the AUTHORS list.
> >
> > Hi, Lucas, Ales and Dumitru.
> >
>
> Hi Ilya,
>
> > Unfortunately, this change significantly increases the Sb DB CPU usage.
> > Since the conditions are not all 'true', ovsdb-server can't use the
> > JSON cache and so it has to prepare separate updates for each client,
> > removing one of the main benefits of the monitor-all configuration.
> > Southbound DB CPU usage multiplies with the number of clients, and it's
> > causing very long poll intervals, up to several minutes, at high scale.
> >
> > Our weekly ovn-heater runs report 3x CPU usage increase in 250-node tests
> > and a complete failure of 500-node tests with port creation latency
> > exceeding 3.5 minutes, vs 10 seconds before this change on a cluster
> > density test scenario.
> >
>
> Thanks for the bug report!
>
> > We can work on optimizing ovsdb-server for this use case, but that may
> > also require reducing cache efficiency (e.g. with per-table caches), and
> > users will pair new OVN releases with older versions of OVS for a while
> > anyway.  So, for now, I think, we should revert this change, as the
> > current approach will break large clusters.
> >
> > Thoughts?
> >
>
> I agree, I'll post a patch to revert this change.  We can revisit it
> after we figure out a way to avoid the ovsdb-server side performance hit.
>
> > Best regards, Ilya Maximets.
> >
>
> Regards,
> Dumitru
>
>


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
