Hi Ales, Dumitru and Ilya,

I agree with the revert, but I have one concern:
With monitor-all enabled and many ovn-controllers, each controller
receives a large number of unnecessary messages for updates to other
chassis' Chassis_Private rows, and the ovn-controller CPU load becomes
high (this is happening in my deployment with around 150
ovn-controllers).

Maybe we could discuss other solutions to avoid the high CPU load in
ovn-controller when monitor-all is enabled and an update of NB_Global
nb_cfg consequently triggers an update of Chassis_Private nb_cfg.

Regards,
Lucas

On Mon, Feb 17, 2025 at 10:51, Dumitru Ceara <[email protected]> wrote:

> On 2/17/25 11:38 AM, Ilya Maximets wrote:
> > On 2/13/25 13:07, Dumitru Ceara wrote:
> >> On 2/10/25 7:55 AM, Ales Musil wrote:
> >>> On Fri, Feb 7, 2025 at 2:18 PM Lucas Vargas Dias via dev <
> >>> [email protected]> wrote:
> >>>
> >>>> CMS like neutron uses NB Global nb_cfg to check liveness
> >>>> of compute nodes. It will increment nb_cfg of Chassis private
> >>>> and with monitor all enabled, all chassis exchange between them
> >>>> messages of update Chassis Private. So, CPU load of ovn-controller
> >>>> will be high in scenario with many chassis.
> >>>> To fix it, each chassis monitor its Chassis Private just only.
> >>>>
> >>>> Signed-off-by: Lucas Vargas Dias <[email protected]>
> >>>> ---
> >
> > <snip>
> >
> >>
> >> With these changes I pushed the patch to main and branches 24.09
> >> and 24.03. I also added Lucas to the AUTHORS list.
> >
> > Hi, Lucas, Ales and Dumitru.
> >
>
> Hi Ilya,
>
> > Unfortunately, this change significantly increases the Sb DB CPU usage.
> > Since the conditions are not all 'true', ovsdb-server can't use the
> > JSON cache and so it has to prepare separate updates for each client,
> > removing one of the main benefits of the monitor-all configuration.
> > Southbound DB CPU usage multiplies with the number of clients and it's
> > causing a very long poll intervals up to several minutes at high scale.
> >
> > Our weekly ovn-heater runs report 3x CPU usage increase in 250-node tests
> > and a complete failure of 500-node tests with port creation latency
> > exceeding 3.5 minutes, vs 10 seconds before this change on a cluster
> > density test scenario.
> >
>
> Thanks for the bug report!
>
> > We can work on optimizing ovsdb-server for this use case, but that may
> > also require reducing cache efficiency (e.g. with per-table caches), and
> > users will pair new OVN releases with older versions of OVS for a while
> > anyway. So, for now, I think, we should revert this change, as the
> > current approach will break large clusters.
> >
> > Thoughts?
> >
>
> I agree, I'll post a patch to revert this change. We can revisit it
> after we figure out a way to avoid the ovsdb-server side performance hit.
>
> > Best regards, Ilya Maximets.
> >
>
> Regards,
> Dumitru
>
>

--
'This message is intended only for the addresses listed in the initial
header. If you are not listed among the addresses in the header, we ask
that you completely disregard the content of this message; any copying,
forwarding and/or execution of the actions mentioned are immediately
void and prohibited.'

'Although Magazine Luiza takes all reasonable precautions to ensure that
no virus is present in this e-mail, the company cannot accept
responsibility for any loss or damage caused by this e-mail or its
attachments.'
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
