On Mon, Jun 30, 2025 at 4:28 PM Felix Huettner <felix.huettner@stackit.cloud>
wrote:

> On Tue, Jun 24, 2025 at 04:56:56PM +0200, Ales Musil wrote:
> > On Mon, Jun 23, 2025 at 11:14 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> >
> > > On 6/10/25 2:03 PM, Felix Huettner via dev wrote:
> > > > Previously the needed GARPs and RARPs were calculated in each
> > > > loop of the ovn-controller in pinctrl_run. This is quite wasteful
> > > > as most of the time nothing relevant changes. In large external
> > > > networks this can have a significant performance impact on the
> > > > ovn-controller.
> > > >
> > > > Previous patches
> > > > https://patchwork.ozlabs.org/project/ovn/patch/zzxvora5keucn...@sit-sdelap1634.int.lidl.net/
> > > > extracted just a limited part of the functionality to an engine
> > > > node. This patchset now limits the logic in pinctrl_run to the
> > > > absolute minimum. It also addresses some issues with incremental
> > > > processing changes while the southbound connection is unavailable.
> > > >
> > > > v11->v12:
> > > >   * rebased
> > > > v10->v11:
> > > >   * removed merged patches
> > > >   * fix concurrency bug in updating announcement times
> > > > v9->v10:
> > > >   * rebased
> > > >   * fixed countdown in first patch
> > > > v8->v9:
> > > >   * rebased
> > > >   * rework usage of cmap for garp_rarp data
> > > > v7->v8:
> > > >   * rebased
> > > >   * improved handling of daemon_started_recently
> > > > v6->v7:
> > > >   * rebased
> > > >   * fixed ct-zone inconsistencies that broke CI
> > > > v5->v6: rebased
> > > >
> > > > Felix Huettner (1):
> > > >   controller: Extract garp_rarp to engine node.
> > >
> > >
> > > Hello there.  Unfortunately, our ovn-heater scale CI reported a
> > > significant performance regression this week.  Further bisection
> > > points to this change being the root cause.
> > >
> > > What's happening is after this change, ovn-controller is in a
> > > constant loop of (forced) recomputes, which is very heavy at scale:
> > >
> > > 2025-06-22T16:14:02.292Z|02003|binding|INFO|Claiming lport lp-0-75 for this chassis.
> > > 2025-06-22T16:14:02.292Z|02004|binding|INFO|lp-0-75: Claiming 22:94:4c:ca:da:9b 16.0.0.76
> > > 2025-06-22T16:14:03.317Z|02005|inc_proc_eng|INFO|node: lb_data, recompute (forced) took 613ms
> > > 2025-06-22T16:14:08.939Z|02006|inc_proc_eng|INFO|node: lflow_output, recompute (forced) took 5622ms
> > > 2025-06-22T16:14:09.322Z|02007|binding|INFO|Setting lport lp-0-75 ovn-installed in OVS
> > > 2025-06-22T16:14:09.322Z|02008|binding|INFO|Setting lport lp-0-75 up in Southbound
> > > 2025-06-22T16:14:09.323Z|02009|timeval|WARN|Unreasonably long 6964ms poll interval (5657ms user, 1287ms system)
> > >
> > > Before this change, the cluster-density test, for example, didn't
> > > have any long poll intervals and everything was computed
> > > incrementally.  After the change it recomputes pretty much on every
> > > iteration and average ovn-installed latency went up from 2.8 to 9.2
> > > seconds.
> > >
> > > Could someone, please, take a look?
> > >
> > > Best regards, Ilya Maximets.
> > >
> > >
> > Hi Felix and Ilya,
> >
> > I have posted a patch to hopefully mitigate the issue
> > for most workloads [0], we will need some long term solution.
> >
> > [0]
> > https://patchwork.ozlabs.org/project/ovn/patch/20250624145405.3197685-1-amu...@redhat.com/
>
> Hi Ales, Hi Ilya,
>

Hi Felix,


>
> sorry for that and thanks for fixing it already.
>
> If I got the above and the commit message right, the issue is basically
> that there is no incremental processing yet, and if recomputes are not
> allowed, this triggers a recompute of all engine nodes.
>

That is correct.
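
For context, the way the node is wired makes this visible: engine
inputs are registered together with an optional change handler, and a
NULL handler means a change on that input can only be handled by a
full recompute. Roughly like this (an illustrative snippet modeled on
the inc-proc-eng usage pattern; the exact inputs of the garp_rarp node
may differ):

    /* A NULL change handler means changes to this input cannot be
     * processed incrementally; the node gets marked for a full
     * recompute instead. */
    engine_add_input(&en_garp_rarp, &en_sb_port_binding, NULL);

    /* If the iteration disallows recomputes (SB transaction still in
     * flight, or a vswitchd backlog), the engine run is aborted and a
     * forced recompute is scheduled, which is what the "recompute
     * (forced)" log lines above show. */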


>
> While we could definitely add incremental processing here, I wonder
> about making this less problematic.
>
> From the comments it looks like we disallow recomputes if:
> 1. We do not have a southbound transaction (e.g. the previous one is
>    still in flight or we are not connected)
> 2. OR we have a backlog on the vswitchd connection
>
> However, the first is only problematic if we write to the southbound
> and the second only if we write to vswitchd.
> For a node like the garp_rarp one, which just processes data for
> ovn-controller to work with, both of these are irrelevant. We neither
> write to southbound nor to vswitchd.
>
> Would it be an option to make the decision whether recomputes are
> allowed on a per-engine-node basis? Then we could make the
> recompute_allowed decision more granular and thereby maybe skip global
> recomputes in some cases.
>


That is something that we were discussing internally. The garp_rarp
node is not the only one that was triggering cancellations, it just
happened more frequently. So this is a global problem, and as you said,
one of the solutions might be to "mark" nodes as not SB dependent; in
that case we can allow a recompute while SB is read-only, because it
doesn't affect the SB data at all. The only problem that we are trying
to solve before proceeding is how to make this "marking" error-proof
programmatically, mostly for cases where a node is wrongly considered
SB independent but still writes into SB. For that we would probably
need support from the IDL layer, but any other ideas on how to achieve
that are very much welcome.
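
As a strawman for the per-node direction, something like the following
could work (all names here are hypothetical, not the existing
inc-proc-eng API): each node declares which side effects it has, and
the recompute decision is derived from that declaration per node
instead of globally.

    #include <stdbool.h>

    /* Hypothetical sketch: per-node side-effect declaration. */
    enum node_effects {
        NODE_WRITES_NOTHING = 0,
        NODE_WRITES_SB      = 1 << 0, /* commits rows to the Southbound DB */
        NODE_WRITES_OVS     = 1 << 1, /* sends updates to vswitchd */
    };

    struct engine_iter_ctx {
        bool sb_txn_available; /* no SB transaction currently in flight */
        bool ovs_backlog;      /* vswitchd connection has queued data */
    };

    /* A node may recompute iff every side effect it declares can be
     * committed in this iteration.  A pure data node (effects == 0),
     * like garp_rarp, would always be allowed to recompute. */
    static bool
    node_recompute_allowed(enum node_effects effects,
                           const struct engine_iter_ctx *ctx)
    {
        if ((effects & NODE_WRITES_SB) && !ctx->sb_txn_available) {
            return false;
        }
        if ((effects & NODE_WRITES_OVS) && ctx->ovs_backlog) {
            return false;
        }
        return true;
    }

The error-proofing concern above would then translate to: a node
declared as NODE_WRITES_NOTHING that still opens an SB transaction
should trip an assertion, which is where the IDL-layer support would
come in.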


> Thanks a lot,
> Felix
>
> >
> > Thanks,
> > Ales
>
>
>
Thanks,
Ales