On Fri, Oct 10, 2025 at 2:51 AM Mark Michelson <[email protected]> wrote:
> On Thu, Oct 9, 2025 at 5:25 PM Mark Michelson <[email protected]> wrote:
>
> > Hi Ales,

Hi Mark, thank you for the review, but I'm probably missing something here.

> > I've been thinking through this change, and I think the idea might have some flaws. Let's consider an arbitrary engine node B that takes engine node A as input. In this scenario, engine node B writes to the SB DB and engine node A does not. With this change, we mark engine node B as SB_WRITE, but we do not do the same with node A.

Why would we need to? A's state doesn't depend on the SB being writable.

> > We perform an engine run where recomputation is not allowed because the SB DB is read only. When engine node A runs, since it does not have SB_WRITE specified, it is able to recompute its data. However, when engine node B runs, it cannot recompute its data.
> >
> > Now, what happens during the next engine run?
> >
> > * Let's say engine node A has no new input data. Engine node A's state will be EN_UNCHANGED. In this case, engine node B sees that engine node A is unchanged, so engine node B does not run. However, B really should run because in the previous engine node run, A recomputed. In this case, B's data is incorrect.
> >
> > * Let's say engine node A has new data and can process it incrementally. Engine node A's state will be EN_UPDATED.
> >
> >   * Engine node B may also try to incrementally handle A's tracked data. However, the only tracked data that B will see is from the current engine run. The data that A recomputed in the previous engine run will be unknown to B. In this case, B's data is incorrect.
> >
> >   * Engine node B may not have an input handler for A. In this case, B will recompute and all will be well.
> >
> > * Let's say engine node A has new data and cannot process it incrementally. In this case, A will recompute, and then B will also likely recompute. The data integrity is fine in this case. However, A has had to recompute twice. Before this change, A would only have recomputed on the second engine run.
> >
> > In 2 of the 4 cases, B ends up with incorrect data. In 1 of the 4 cases, the data is correct in engine node B, but A has to recompute twice. And in 1 of the 4 cases, the data is correct in engine node B and we successfully avoided an unnecessary recomputation of A. The only time this change has a positive effect is if the SB_WRITE node has no input handlers for its input nodes, and the input nodes are able to incrementally process their data.

> I've realized that in the scenario I outlined, a couple of those outcomes can't happen. When the first engine run happens, B will fail to recompute, and the engine run will be canceled. In ovn-controller, if an engine run is canceled, then we call engine_set_force_recompute_immediate(). This means that the next run will force all nodes to recompute. This means that the second and third outcomes (both based on A being able to incrementally process its data) cannot happen. Also, when engine_force_recompute is true, the state of input nodes is not considered. This means that if A is unchanged, it does not matter. B will still recompute and will therefore have all of A's data. So my concerns about data integrity are not correct.

Right, so none of the highlighted outcomes can happen: if B is canceled, we immediately run a forced recompute loop that sets everything straight.
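To make that sequence concrete, the handling in the ovn-controller main loop is roughly the following. This is only a simplified sketch: sb_txn_writable and the engine_canceled() check stand in for however the real code in controller/ovn-controller.c expresses those conditions, and the placement is paraphrased rather than copied from the tree.

    /* Simplified sketch, not the literal ovn-controller.c code. */
    engine_run(sb_txn_writable);    /* Recompute allowed only when the SB
                                     * transaction can be written. */
    if (engine_canceled()) {
        /* Some node could not recompute, so schedule an immediate full
         * recompute; the next iteration rebuilds every node from scratch. */
        engine_set_force_recompute_immediate();
        poll_immediate_wake();
    }

That forced run is also why the data-integrity cases above cannot leave B with stale data.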
> However, the fact that the engine forces all nodes to recompute means that A will recompute on both engine runs. This means that with the patch, A will recompute two times across the two engine runs. Without the patch, A only recomputes once. So I still think the patch is hurting more than it is helping.

That's the case only in the worst scenario; keep in mind that it doesn't always happen, so it's not as if we suddenly start doing two recomputes back to back for certain nodes. In fact, I see the opposite happening. Only 3 nodes in ovn-controller are marked as SB_WRITE; any of those nodes might cancel, but the rest are free to recompute however they like. We saw cancels for nodes that do not require SB write, and that was harming performance; good examples are the pflow_output and lflow_output nodes. Those are very heavy CPU time consumers, and now they are free to recompute even if the SB is read only during the current run.

> > Another consideration is that the incremental engine in northd hardcodes the allow_recompute argument to true when it calls engine_run(). Therefore this change will have no effect on the northd incremental engine.

> ovn-northd doesn't have any consideration for canceled engine runs since it knows engine runs can't be canceled there. If it were possible to cancel engine runs in ovn-northd, I imagine it would have the same logic as ovn-controller.

This is done for consistency. Yes, northd cannot cancel, but that doesn't mean we cannot mark the nodes as SB_WRITE; if anything, I can see the added value of having it documented which nodes write into the SB.

> > For the above reasons, I don't think this patch should be accepted in its current state.
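In any case, to make the mechanics concrete before the patch quoted below: marking a node only sets a flag on the engine node struct. With the SB_WRITE and ENGINE_NODE2 macros from the patch, ENGINE_NODE(garp_rarp, SB_WRITE) expands to roughly the following. This is a simplified sketch; the en_garp_rarp variable name and the fields hinted at in the comment follow the usual ENGINE_NODE conventions rather than anything new in this patch.

    /* Rough expansion, for illustration only. */
    struct engine_node en_garp_rarp = {
        .name = "garp_rarp",
        /* ... init/run/cleanup callbacks and the other fields that
         * ENGINE_NODE_DEF_START normally sets up ... */
        .sb_write = true,    /* Contributed by SB_WRITE(garp_rarp). */
    };

engine_recompute() then checks node->sb_write before deciding whether a disallowed recompute has to cancel the run, which is the whole behavioral change.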
> > On Thu, Oct 9, 2025 at 8:47 AM Ales Musil via dev <[email protected]> wrote:
> > >
> > > The engine would cancel node recompute if it wasn't allowed,
> > > typically the SB is in read-only mode. That would lead to cancel
> > > of the node recompute and force recompute the next engine run, which
> > > is very wasteful. Add a way to mark the engine nodes as SB write.
> > > Only nodes that are marked as SB write will lead to cancel when
> > > recompute is not allowed. Other nodes (SB read-only) will be able
> > > to proceed with recompute without any issue.
> > >
> > > To help with correct marking of the nodes there is a new IDL mechanism
> > > that will assert if the txn is marked as read-only but something
> > > attempted to write into it.
> > >
> > > Signed-off-by: Ales Musil <[email protected]>
> > > ---
> > >  controller/ovn-controller.c |  6 +++---
> > >  lib/inc-proc-eng.c          |  7 ++++++-
> > >  lib/inc-proc-eng.h          |  6 ++++++
> > >  northd/inc-proc-northd.c    | 40 ++++++++++++++++++-------------------
> > >  ovs                         |  2 +-
> > >  5 files changed, 36 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > > index 6fbf3a227..3cc7ab340 100644
> > > --- a/controller/ovn-controller.c
> > > +++ b/controller/ovn-controller.c
> > > @@ -6758,7 +6758,7 @@ main(int argc, char *argv[])
> > >      ENGINE_NODE(template_vars, CLEAR_TRACKED_DATA);
> > >      ENGINE_NODE(ct_zones, CLEAR_TRACKED_DATA, IS_VALID);
> > >      ENGINE_NODE(ovs_interface_shadow, CLEAR_TRACKED_DATA);
> > > -    ENGINE_NODE(runtime_data, CLEAR_TRACKED_DATA);
> > > +    ENGINE_NODE(runtime_data, CLEAR_TRACKED_DATA, SB_WRITE);
> > >      ENGINE_NODE(non_vif_data);
> > >      ENGINE_NODE(mff_ovn_geneve);
> > >      ENGINE_NODE(ofctrl_is_connected);
> > > @@ -6779,9 +6779,9 @@ main(int argc, char *argv[])
> > >      ENGINE_NODE(acl_id, IS_VALID);
> > >      ENGINE_NODE(route);
> > >      ENGINE_NODE(route_table_notify);
> > > -    ENGINE_NODE(route_exchange);
> > > +    ENGINE_NODE(route_exchange, SB_WRITE);
> > >      ENGINE_NODE(route_exchange_status);
> > > -    ENGINE_NODE(garp_rarp);
> > > +    ENGINE_NODE(garp_rarp, SB_WRITE);
> > >      ENGINE_NODE(host_if_monitor);
> > >      ENGINE_NODE(neighbor);
> > >      ENGINE_NODE(neighbor_table_notify);
> > > diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
> > > index 78187c8fd..bd366cb90 100644
> > > --- a/lib/inc-proc-eng.c
> > > +++ b/lib/inc-proc-eng.c
> > > @@ -27,6 +27,7 @@
> > >  #include "openvswitch/hmap.h"
> > >  #include "openvswitch/poll-loop.h"
> > >  #include "openvswitch/vlog.h"
> > > +#include "ovsdb-idl.h"
> > >  #include "inc-proc-eng.h"
> > >  #include "timeval.h"
> > >  #include "unixctl.h"
> > > @@ -426,7 +427,7 @@ engine_recompute(struct engine_node *node, bool allowed,
> > >      reason = xvasprintf(reason_fmt, reason_args);
> > >      va_end(reason_args);
> > >
> > > -    if (!allowed) {
> > > +    if (node->sb_write && !allowed) {
> > >          VLOG_DBG("node: %s, recompute (%s) canceled", node->name, reason);
> > >          engine_set_node_state(node, EN_CANCELED, "recompute not allowed");
> > >          goto done;
> > > @@ -565,10 +566,14 @@ engine_run(bool recompute_allowed)
> > >          return;
> > >      }
> > >
> > > +    struct ovsdb_idl_txn *sb_txn = engine_get_context()->ovnsb_idl_txn;
> > > +
> > >      engine_run_canceled = false;
> > >      struct engine_node *node;
> > >      VECTOR_FOR_EACH (&engine_nodes, node) {
> > > +        ovsdb_idl_txn_assert_read_only(sb_txn, !node->sb_write);
> > >          engine_run_node(node, recompute_allowed);
> > > +        ovsdb_idl_txn_assert_read_only(sb_txn, false);
> > >
> > >          if (node->state == EN_CANCELED) {
> > >              node->stats.cancel++;
> > > diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
> > > index d9174effa..9d39045b0 100644
> > > --- a/lib/inc-proc-eng.h
> > > +++ b/lib/inc-proc-eng.h
> > > @@ -272,6 +272,9 @@ struct engine_node {
> > >
> > >      /* Engine stats. */
> > >      struct engine_stats stats;
> > > +
> > > +    /* Indication if the node writes to SB DB. */
> > > +    bool sb_write;
> > >  };
> > >
> > >  /* Initialize the data for the engine nodes. It calls each node's
> > > @@ -435,6 +438,9 @@ void engine_ovsdb_node_add_index(struct engine_node *, const char *name,
> > >  #define COMPUTE_FAIL_INFO(NAME) \
> > >      .get_compute_failure_info = en_##NAME##_compute_failure_info,
> > >
> > > +#define SB_WRITE(NAME) \
> > > +    .sb_write = true
> > > +
> > >  #define ENGINE_NODE2(NAME, ARG1) \
> > >      ENGINE_NODE_DEF_START(NAME, #NAME) \
> > >      ARG1(NAME), \
> > > diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
> > > index ff9515be7..a5e4f9a53 100644
> > > --- a/northd/inc-proc-northd.c
> > > +++ b/northd/inc-proc-northd.c
> > > @@ -153,22 +153,22 @@ enum sb_engine_node {
> > >
> > >  /* Define engine nodes for other nodes. They should be defined as static to
> > >   * avoid sparse errors. */
> > > -static ENGINE_NODE(northd, CLEAR_TRACKED_DATA);
> > > -static ENGINE_NODE(sync_from_sb);
> > > +static ENGINE_NODE(northd, CLEAR_TRACKED_DATA, SB_WRITE);
> > > +static ENGINE_NODE(sync_from_sb, SB_WRITE);
> > >  static ENGINE_NODE(sampling_app);
> > > -static ENGINE_NODE(lflow);
> > > -static ENGINE_NODE(mac_binding_aging);
> > > +static ENGINE_NODE(lflow, SB_WRITE);
> > > +static ENGINE_NODE(mac_binding_aging, SB_WRITE);
> > >  static ENGINE_NODE(mac_binding_aging_waker);
> > >  static ENGINE_NODE(northd_output);
> > > -static ENGINE_NODE(sync_meters);
> > > -static ENGINE_NODE(sync_to_sb);
> > > -static ENGINE_NODE(sync_to_sb_addr_set);
> > > -static ENGINE_NODE(port_group, CLEAR_TRACKED_DATA);
> > > -static ENGINE_NODE(fdb_aging);
> > > +static ENGINE_NODE(sync_meters, SB_WRITE);
> > > +static ENGINE_NODE(sync_to_sb, SB_WRITE);
> > > +static ENGINE_NODE(sync_to_sb_addr_set, SB_WRITE);
> > > +static ENGINE_NODE(port_group, CLEAR_TRACKED_DATA, SB_WRITE);
> > > +static ENGINE_NODE(fdb_aging, SB_WRITE);
> > >  static ENGINE_NODE(fdb_aging_waker);
> > > -static ENGINE_NODE(sync_to_sb_lb);
> > > -static ENGINE_NODE(sync_to_sb_pb);
> > > -static ENGINE_NODE(global_config, CLEAR_TRACKED_DATA);
> > > +static ENGINE_NODE(sync_to_sb_lb, SB_WRITE);
> > > +static ENGINE_NODE(sync_to_sb_pb, SB_WRITE);
> > > +static ENGINE_NODE(global_config, CLEAR_TRACKED_DATA, SB_WRITE);
> > >  static ENGINE_NODE(lb_data, CLEAR_TRACKED_DATA);
> > >  static ENGINE_NODE(lr_nat, CLEAR_TRACKED_DATA);
> > >  static ENGINE_NODE(lr_stateful, CLEAR_TRACKED_DATA);
> > > @@ -176,20 +176,20 @@ static ENGINE_NODE(ls_stateful, CLEAR_TRACKED_DATA);
> > >  static ENGINE_NODE(route_policies);
> > >  static ENGINE_NODE(routes);
> > >  static ENGINE_NODE(bfd);
> > > -static ENGINE_NODE(bfd_sync);
> > > -static ENGINE_NODE(ecmp_nexthop);
> > > -static ENGINE_NODE(multicast_igmp);
> > > -static ENGINE_NODE(acl_id);
> > > -static ENGINE_NODE(advertised_route_sync);
> > > -static ENGINE_NODE(learned_route_sync, CLEAR_TRACKED_DATA);
> > > +static ENGINE_NODE(bfd_sync, SB_WRITE);
> > > +static ENGINE_NODE(ecmp_nexthop, SB_WRITE);
> > > +static ENGINE_NODE(multicast_igmp, SB_WRITE);
> > > +static ENGINE_NODE(acl_id, SB_WRITE);
> > > +static ENGINE_NODE(advertised_route_sync, SB_WRITE);
> > > +static ENGINE_NODE(learned_route_sync, CLEAR_TRACKED_DATA, SB_WRITE);
> > >  static ENGINE_NODE(dynamic_routes);
> > >  static ENGINE_NODE(group_ecmp_route, CLEAR_TRACKED_DATA);
> > >  static ENGINE_NODE(datapath_logical_router, CLEAR_TRACKED_DATA);
> > >  static ENGINE_NODE(datapath_logical_switch, CLEAR_TRACKED_DATA);
> > > -static ENGINE_NODE(datapath_sync, CLEAR_TRACKED_DATA);
> > > +static ENGINE_NODE(datapath_sync, CLEAR_TRACKED_DATA, SB_WRITE);
> > >  static ENGINE_NODE(datapath_synced_logical_router, CLEAR_TRACKED_DATA);
> > >  static ENGINE_NODE(datapath_synced_logical_switch, CLEAR_TRACKED_DATA);
> > > -static ENGINE_NODE(ic_learned_svc_monitors);
> > > +static ENGINE_NODE(ic_learned_svc_monitors, SB_WRITE);
> > >
> > >  void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
> > >                            struct ovsdb_idl_loop *sb)
> > > diff --git a/ovs b/ovs
> > > index 852f07e52..42d99eb38 160000
> > > --- a/ovs
> > > +++ b/ovs
> > > @@ -1 +1 @@
> > > -Subproject commit 852f07e5251c6a0c0d5c43dc980d12a4f1bcd370
> > > +Subproject commit 42d99eb38725c6e6c99d558d6130db7ade9ba121
> > > --
> > > 2.51.0
> > >
> > > _______________________________________________
> > > dev mailing list
> > > [email protected]
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
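One note on the IDL side, since it only shows up above as the submodule bump: conceptually, ovsdb_idl_txn_assert_read_only() latches an "expected to be read-only" flag on the transaction, and the row-modification paths assert on it. The sketch below is purely illustrative; the member name is made up for this sketch and the real implementation in the OVS tree may differ.

    /* Illustration only; 'assert_read_only' is not the real member name. */
    void
    ovsdb_idl_txn_assert_read_only(struct ovsdb_idl_txn *txn, bool read_only)
    {
        if (txn) {
            txn->assert_read_only = read_only;
        }
    }

With something like ovs_assert(!txn->assert_read_only) in the write paths, any SB write coming from a node that is not marked SB_WRITE trips the assertion right away, which is what keeps the markings honest.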
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev