On Thu, Jul 3, 2025 at 9:46 AM Felix Huettner <felix.huettner@stackit.cloud> wrote:
> On Thu, Jul 03, 2025 at 08:48:23AM +0200, Ales Musil wrote:
> > On Thu, Jul 3, 2025 at 8:39 AM Felix Huettner
> > <felix.huettner@stackit.cloud> wrote:
> > > On Thu, Jul 03, 2025 at 08:27:01AM +0200, Ales Musil via dev wrote:
> > > > On Mon, Jun 30, 2025 at 4:28 PM Felix Huettner
> > > > <felix.huettner@stackit.cloud> wrote:
> > > > > On Tue, Jun 24, 2025 at 04:56:56PM +0200, Ales Musil wrote:
> > > > > > On Mon, Jun 23, 2025 at 11:14 AM Ilya Maximets <i.maxim...@ovn.org> wrote:
> > > > > > > On 6/10/25 2:03 PM, Felix Huettner via dev wrote:
> > > > > > > > Previously the needed GARPs and RARPs were calculated in each
> > > > > > > > loop of ovn-controller in pinctrl_run. This is quite wasteful,
> > > > > > > > as most of the time nothing relevant changes. In large
> > > > > > > > external networks this can have a significant performance
> > > > > > > > impact on ovn-controller.
> > > > > > > >
> > > > > > > > Previous patches
> > > > > > > > https://patchwork.ozlabs.org/project/ovn/patch/zzxvora5keucn...@sit-sdelap1634.int.lidl.net/
> > > > > > > > extracted just a limited part of the functionality to an
> > > > > > > > engine node. This patchset now limits the logic in pinctrl_run
> > > > > > > > to the absolute minimum. It also addresses some issues with
> > > > > > > > incremental processing changes while the southbound connection
> > > > > > > > is unavailable.
> > > > > > > >
> > > > > > > > v11->v12:
> > > > > > > > * rebased
> > > > > > > > v10->v11:
> > > > > > > > * removed merged patches
> > > > > > > > * fix concurrency bug in updating announcement times
> > > > > > > > v9->v10:
> > > > > > > > * rebased
> > > > > > > > * fixed countdown in first patch
> > > > > > > > v8->v9:
> > > > > > > > * rebased
> > > > > > > > * rework usage of cmap for garp_rarp data
> > > > > > > > v7->v8:
> > > > > > > > * rebased
> > > > > > > > * improved handling of daemon_started_recently
> > > > > > > > v6->v7:
> > > > > > > > * rebased
> > > > > > > > * fixed ct-zone inconsistencies that break CI
> > > > > > > > v5->v6: rebased
> > > > > > > >
> > > > > > > > Felix Huettner (1):
> > > > > > > >   controller: Extract garp_rarp to engine node.
> > > > > > >
> > > > > > > Hello there. Unfortunately, our ovn-heater scale CI reported a
> > > > > > > significant performance regression this week. Further bisection
> > > > > > > points to this change being the root cause.
> > > > > > >
> > > > > > > What's happening is that after this change, ovn-controller is
> > > > > > > in a constant loop of (forced) recomputes, which is very heavy
> > > > > > > at scale:
> > > > > > >
> > > > > > > 2025-06-22T16:14:02.292Z|02003|binding|INFO|Claiming lport lp-0-75 for this chassis.
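To make Felix's idea below concrete, here is a rough, self-contained
sketch of how steps 1-6 could fit together. It is purely illustrative:
apart from "needs_context", "engine_get_context" and "engine_recompute"
the names are made up, and this is not the actual incremental processing
engine code.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for the real engine context. */
struct engine_context {
    void *ovnsb_idl_txn;
};

/* Step 1: tag each node with whether it may need a txn/client_ctx. */
struct engine_node {
    const char *name;
    bool needs_context;
};

/* Step 2: hypothetical initializers used at node definition time. */
#define ENGINE_NODE(NAME)              { .name = NAME, .needs_context = false }
#define ENGINE_NODE_WITH_CONTEXT(NAME) { .name = NAME, .needs_context = true }

static struct engine_context ctx;

/* Steps 4 + 5: hand the context only to nodes that declared the need;
 * anything else crashes immediately, making wrong markings loud. */
static struct engine_context *
engine_get_context(struct engine_node *node)
{
    assert(node->needs_context);
    return &ctx;
}

/* Step 6: "!allowed" (no SB txn, or a vswitchd backlog) only blocks
 * nodes that could actually write somewhere. */
static void
engine_recompute(struct engine_node *node, bool allowed)
{
    if (!allowed && node->needs_context) {
        printf("%s: recompute canceled\n", node->name);
        return;
    }
    if (node->needs_context) {
        struct engine_context *c = engine_get_context(node);
        (void) c;   /* a real node would use c->ovnsb_idl_txn here */
    }
    printf("%s: recompute done\n", node->name);
}

int
main(void)
{
    struct engine_node garp_rarp = ENGINE_NODE("garp_rarp");
    struct engine_node lflow = ENGINE_NODE_WITH_CONTEXT("lflow_output");

    /* An iteration where recomputes used to be globally forbidden: */
    engine_recompute(&garp_rarp, false);   /* now still recomputes */
    engine_recompute(&lflow, false);       /* canceled, as before */
    return 0;
}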
> > > > > > > 2025-06-22T16:14:02.292Z|02004|binding|INFO|lp-0-75: Claiming 22:94:4c:ca:da:9b 16.0.0.76
> > > > > > > 2025-06-22T16:14:03.317Z|02005|inc_proc_eng|INFO|node: lb_data, recompute (forced) took 613ms
> > > > > > > 2025-06-22T16:14:08.939Z|02006|inc_proc_eng|INFO|node: lflow_output, recompute (forced) took 5622ms
> > > > > > > 2025-06-22T16:14:09.322Z|02007|binding|INFO|Setting lport lp-0-75 ovn-installed in OVS
> > > > > > > 2025-06-22T16:14:09.322Z|02008|binding|INFO|Setting lport lp-0-75 up in Southbound
> > > > > > > 2025-06-22T16:14:09.323Z|02009|timeval|WARN|Unreasonably long 6964ms poll interval (5657ms user, 1287ms system)
> > > > > > >
> > > > > > > Before this change, the cluster-density test, for example,
> > > > > > > didn't have any long poll intervals and everything was computed
> > > > > > > incrementally. After the change it recomputes pretty much on
> > > > > > > every iteration, and the average ovn-installed latency went up
> > > > > > > from 2.8 to 9.2 seconds.
> > > > > > >
> > > > > > > Could someone, please, take a look?
> > > > > > >
> > > > > > > Best regards, Ilya Maximets.
> > > > > >
> > > > > > Hi Felix and Ilya,
> > > > > >
> > > > > > I have posted a patch to hopefully mitigate the issue for most
> > > > > > workloads [0]; we will need some long-term solution.
> > > > > >
> > > > > > [0] https://patchwork.ozlabs.org/project/ovn/patch/20250624145405.3197685-1-amu...@redhat.com/
> > > > >
> > > > > Hi Ales, Hi Ilya,
> > > > >
> > > >
> > > > Hi Felix,
> > > >
> > > > > sorry for that and thanks for fixing it already.
> > > > >
> > > > > If I got the above and the commit message right, the issue is
> > > > > basically that there is no incremental processing yet, and if
> > > > > recomputes are not allowed this will trigger a recompute of all
> > > > > engine nodes.
> > > >
> > > > That is correct.
> > > >
> > > > > While we could definitely add incremental processing here, I wonder
> > > > > about making this less problematic.
> > > > >
> > > > > From the comments it looks like we disallow recomputes if:
> > > > > 1. We do not have a southbound transaction (e.g. the previous one
> > > > >    is still in flight or we are not connected)
> > > > > 2. OR we have a backlog on the vswitchd connection
> > > > >
> > > > > However, the first is only problematic if we write to southbound,
> > > > > and the second only if we write to vswitchd.
> > > > > For a node like the garp_rarp one, which just processes data for
> > > > > ovn-controller to work with, both of these are irrelevant. We
> > > > > neither write to southbound nor to vswitchd.
> > > > >
> > > > > Would it be an option to make the decision whether recomputes are
> > > > > allowed on a per-engine-node basis? Then we could make the
> > > > > recompute_allowed decision more granular and thereby maybe skip
> > > > > global recomputes in some cases.
> > > >
> > > > That is something that we were discussing internally; the garp_rarp
> > > > node is not the only one that was triggering cancellations, but it
> > > > did so more frequently.
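The catch Ales describes above can be mocked up in a few lines: the
idl-generated column setters do not take the transaction as a parameter,
they reach the idl's active transaction internally, so a needs_context
assertion inside engine_get_context() would never fire for such a write.
A toy version of that pattern follows; none of this is the real idl
code, just the shape of the problem.

#include <assert.h>
#include <stdio.h>

/* Toy stand-in for the idl: like ovsdb_idl, it keeps the active
 * transaction internally rather than taking it as a parameter. */
struct toy_txn {
    int n_writes;
};

static struct toy_txn *active_txn;

/* What an idl-generated setter effectively does: it picks up the
 * active transaction behind the caller's back. */
static void
toy_rec_set_column(int value)
{
    assert(active_txn);
    active_txn->n_writes++;
    printf("wrote %d through the internal txn\n", value);
}

/* A node marked "!needs_context" never calls engine_get_context(),
 * so an assertion there cannot catch this write. */
static void
supposedly_sb_independent_node(void)
{
    toy_rec_set_column(42);
}

int
main(void)
{
    struct toy_txn txn = { .n_writes = 0 };

    active_txn = &txn;              /* the main loop opened the txn */
    supposedly_sb_independent_node();
    printf("writes that slipped through: %d\n", txn.n_writes);
    return 0;
}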
> > > > So this is a global problem, and as you said, one of the solutions
> > > > might be to "mark" nodes as not SB-dependent; in that case we can
> > > > allow a recompute while SB is read-only, because it doesn't affect
> > > > the data at all. The only problem that we are trying to solve before
> > > > proceeding is how to make this "marking" error-proof
> > > > programmatically, mostly for cases where a node is wrongly
> > > > considered SB-independent but still writes into SB. For that we
> > > > would probably need support from the idl layer, but any other ideas
> > > > on how to achieve that are very much welcome.
> > >
> > > Hi Ales,
> > >
> >
> > Hi Felix,
> >
> > > I just had an idea about that. Not sure if it is nice though :)
> >
> > It makes sense, but there is one catch, see down below.
> >
> > > 1. Extend "struct engine_node" with a bool "needs_context"
> > > 2. Add a define to set the value during node definition
> > > 3. Set this on all nodes that require any txn/client_ctx
> > > 4. Require "struct engine_node" to be passed to "engine_get_context"
> > > 5. Crash "engine_get_context" if the provided engine node has
> > >    "!needs_context"
> > > 6. In "engine_recompute" change the following
> > >
> > > - if (!allowed) {
> > > + if (!allowed && node->needs_context) {
> >
> > That would work only for SB calls that need the txn directly.
> > Unfortunately, any SB update of an existing row acquires the txn
> > internally in the idl instead of through the context.
> >
> > So that part of SB writes wouldn't be error-proof.
>
> Thanks a lot, that's good to know.
>
> Would we then need some functionality on the OVS side to block the
> transaction in some way?
> But I currently do not see any way the idl supports that.

Or an indication that there was an attempt to write; either way we will
need something from the idl layer itself.

> Thanks,
> Felix
>
> > > Thanks a lot,
> > > Felix
> >
> > Thanks,
> > Ales
> >
> > > > > Thanks a lot,
> > > > > Felix
> > > > >
> > > > > > Thanks,
> > > > > > Ales
> > > >
> > > > Thanks,
> > > > Ales

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev