On Thu, Oct 9, 2025 at 5:25 PM Mark Michelson <[email protected]> wrote:
>
> Hi Ales,
>
> I've been thinking through this change, and I think the idea might have some 
> flaws. Let's consider an arbitrary engine node B that takes engine node A as 
> input. In this scenario, engine node B writes to the SB DB and engine node A 
> does not. With this change, we mark engine node B as SB_WRITE, but we do not 
> do the same with node A.
>
> We perform an engine run where recomputation is not allowed because the SB DB 
> is read only. When engine node A runs, since it does not have SB_WRITE 
> specified, it is able to recompute its data. However, when engine node B 
> runs, it cannot recompute its data.
>
> Now, what happens during the next engine run?
> * Let's say engine node A has no new input data. Engine node A's state will 
> be EN_UNCHANGED. In this case, engine node B sees that engine node A is 
> unchanged, so engine node B does not run. However, B really should run 
> because in the previous engine node run, A recomputed. In this case, B's data 
> is incorrect.
> * Let's say engine node A has new data and can process it incrementally. 
> Engine node A's state will be EN_UPDATED.
>     * Engine node B may also try to incrementally handle A's tracked data. 
> However, the only tracked data that B will see is from the current engine 
> run. The data that A recomputed in the previous engine run will be unknown to 
> B. In this case, B's data is incorrect.
>     * Engine node B may not have an input handler for A. In this case, B will 
> recompute and all will be well.
> * Let's say engine node A has new data and cannot process it incrementally. 
> In this case, A will recompute, and then B will also likely recompute. The 
> data integrity is fine in this case. However, A has had to recompute twice. 
> Before this change, A would only have recomputed on the second engine run.
>
> In 2 of the 4 cases, B ends up with incorrect data. In 1 of the 4 cases, the 
> data is correct in engine node B, but A has to recompute twice. And in 1 of 
> the 4 cases, the data is correct in engine node B and we successfully avoided 
> an unnecessary recomputation of A. The only time this change has a positive 
> effect is if the SB_WRITE node has no input handlers for its input nodes, and 
> the input nodes are able to incrementally process their data.

I've realized that in the scenario I outlined, a couple of those
outcomes can't happen. When the first engine run happens, B will fail
to recompute, and the engine run will be canceled. In ovn-controller,
if an engine run is canceled, then we call
engine_set_force_recompute_immediate(). This means that the next run
will force all nodes to recompute. This means that the second and
third outcomes (both based on A being able to incrementally process
its data) cannot happen. Also, when engine_force_recompute is true,
the state of input nodes is not considered. This means that if A is
unchanged, it does not matter. B will still recompute and will
therefore have all of A's data. So my concerns about data integrity
are not correct.

However, the fact that the engine forces all nodes to recompute means
that A will recompute on both engine runs. This means that with the
patch, A will recompute two times across the two engine runs. Without
the patch, A only recomputes once. So I still think the patch is
hurting more than it is helping.

>
> Another consideration is that the incremental engine in northd hardcodes the 
> allow_recompute argument to true when it calls engine_run(). Therefore this 
> change will have no effect on the northd incremental engine.

ovn-northd doesn't have any consideration for canceled engine runs
since it knows engine runs can't be canceled there. If it were
possible to cancel engine runs in ovn-northd, I imagine it would have
the same logic as ovn-controller.


>
> For the above reasons, I don't think this patch should be accepted in its 
> current state.
>
> On Thu, Oct 9, 2025 at 8:47 AM Ales Musil via dev <[email protected]> 
> wrote:
> >
> > The engine would cancel node recompute if it wasn't allowed,
> > typically the SB is in read-only mode. That would lead to cancel
> > of the node recompute and force recompute the next engine run, which
> > is very wasteful. Add a way to mark the engine nodes as SB write.
> > Only nodes that are marked as SB write will lead to cancel when
> > recompute is not allowed. Other nodes (SB read-only) will be able
> > to proceed with recompute without any issue.
> >
> > To help with correct marking of the nodes there is a new IDL mechinsm
> > that will assert if the txn is marked as read-only but something
> > attempted to write into it.
> >
> > Signed-off-by: Ales Musil <[email protected]>
> > ---
> >  controller/ovn-controller.c |  6 +++---
> >  lib/inc-proc-eng.c          |  7 ++++++-
> >  lib/inc-proc-eng.h          |  6 ++++++
> >  northd/inc-proc-northd.c    | 40 ++++++++++++++++++-------------------
> >  ovs                         |  2 +-
> >  5 files changed, 36 insertions(+), 25 deletions(-)
> >
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 6fbf3a227..3cc7ab340 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -6758,7 +6758,7 @@ main(int argc, char *argv[])
> >      ENGINE_NODE(template_vars, CLEAR_TRACKED_DATA);
> >      ENGINE_NODE(ct_zones, CLEAR_TRACKED_DATA, IS_VALID);
> >      ENGINE_NODE(ovs_interface_shadow, CLEAR_TRACKED_DATA);
> > -    ENGINE_NODE(runtime_data, CLEAR_TRACKED_DATA);
> > +    ENGINE_NODE(runtime_data, CLEAR_TRACKED_DATA, SB_WRITE);
> >      ENGINE_NODE(non_vif_data);
> >      ENGINE_NODE(mff_ovn_geneve);
> >      ENGINE_NODE(ofctrl_is_connected);
> > @@ -6779,9 +6779,9 @@ main(int argc, char *argv[])
> >      ENGINE_NODE(acl_id, IS_VALID);
> >      ENGINE_NODE(route);
> >      ENGINE_NODE(route_table_notify);
> > -    ENGINE_NODE(route_exchange);
> > +    ENGINE_NODE(route_exchange, SB_WRITE);
> >      ENGINE_NODE(route_exchange_status);
> > -    ENGINE_NODE(garp_rarp);
> > +    ENGINE_NODE(garp_rarp, SB_WRITE);
> >      ENGINE_NODE(host_if_monitor);
> >      ENGINE_NODE(neighbor);
> >      ENGINE_NODE(neighbor_table_notify);
> > diff --git a/lib/inc-proc-eng.c b/lib/inc-proc-eng.c
> > index 78187c8fd..bd366cb90 100644
> > --- a/lib/inc-proc-eng.c
> > +++ b/lib/inc-proc-eng.c
> > @@ -27,6 +27,7 @@
> >  #include "openvswitch/hmap.h"
> >  #include "openvswitch/poll-loop.h"
> >  #include "openvswitch/vlog.h"
> > +#include "ovsdb-idl.h"
> >  #include "inc-proc-eng.h"
> >  #include "timeval.h"
> >  #include "unixctl.h"
> > @@ -426,7 +427,7 @@ engine_recompute(struct engine_node *node, bool allowed,
> >      reason = xvasprintf(reason_fmt, reason_args);
> >      va_end(reason_args);
> >
> > -    if (!allowed) {
> > +    if (node->sb_write && !allowed) {
> >          VLOG_DBG("node: %s, recompute (%s) canceled", node->name, reason);
> >          engine_set_node_state(node, EN_CANCELED, "recompute not allowed");
> >          goto done;
> > @@ -565,10 +566,14 @@ engine_run(bool recompute_allowed)
> >          return;
> >      }
> >
> > +    struct ovsdb_idl_txn *sb_txn = engine_get_context()->ovnsb_idl_txn;
> > +
> >      engine_run_canceled = false;
> >      struct engine_node *node;
> >      VECTOR_FOR_EACH (&engine_nodes, node) {
> > +        ovsdb_idl_txn_assert_read_only(sb_txn, !node->sb_write);
> >          engine_run_node(node, recompute_allowed);
> > +        ovsdb_idl_txn_assert_read_only(sb_txn, false);
> >
> >          if (node->state == EN_CANCELED) {
> >              node->stats.cancel++;
> > diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
> > index d9174effa..9d39045b0 100644
> > --- a/lib/inc-proc-eng.h
> > +++ b/lib/inc-proc-eng.h
> > @@ -272,6 +272,9 @@ struct engine_node {
> >
> >      /* Engine stats. */
> >      struct engine_stats stats;
> > +
> > +    /* Indication if the node writes to SB DB. */
> > +    bool sb_write;
> >  };
> >
> >  /* Initialize the data for the engine nodes. It calls each node's
> > @@ -435,6 +438,9 @@ void engine_ovsdb_node_add_index(struct engine_node *, 
> > const char *name,
> >  #define COMPUTE_FAIL_INFO(NAME) \
> >          .get_compute_failure_info = en_##NAME##_compute_failure_info,
> >
> > +#define SB_WRITE(NAME) \
> > +    .sb_write = true
> > +
> >  #define ENGINE_NODE2(NAME, ARG1) \
> >      ENGINE_NODE_DEF_START(NAME, #NAME) \
> >      ARG1(NAME), \
> > diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
> > index ff9515be7..a5e4f9a53 100644
> > --- a/northd/inc-proc-northd.c
> > +++ b/northd/inc-proc-northd.c
> > @@ -153,22 +153,22 @@ enum sb_engine_node {
> >
> >  /* Define engine nodes for other nodes. They should be defined as static to
> >   * avoid sparse errors. */
> > -static ENGINE_NODE(northd, CLEAR_TRACKED_DATA);
> > -static ENGINE_NODE(sync_from_sb);
> > +static ENGINE_NODE(northd, CLEAR_TRACKED_DATA, SB_WRITE);
> > +static ENGINE_NODE(sync_from_sb, SB_WRITE);
> >  static ENGINE_NODE(sampling_app);
> > -static ENGINE_NODE(lflow);
> > -static ENGINE_NODE(mac_binding_aging);
> > +static ENGINE_NODE(lflow, SB_WRITE);
> > +static ENGINE_NODE(mac_binding_aging, SB_WRITE);
> >  static ENGINE_NODE(mac_binding_aging_waker);
> >  static ENGINE_NODE(northd_output);
> > -static ENGINE_NODE(sync_meters);
> > -static ENGINE_NODE(sync_to_sb);
> > -static ENGINE_NODE(sync_to_sb_addr_set);
> > -static ENGINE_NODE(port_group, CLEAR_TRACKED_DATA);
> > -static ENGINE_NODE(fdb_aging);
> > +static ENGINE_NODE(sync_meters, SB_WRITE);
> > +static ENGINE_NODE(sync_to_sb, SB_WRITE);
> > +static ENGINE_NODE(sync_to_sb_addr_set, SB_WRITE);
> > +static ENGINE_NODE(port_group, CLEAR_TRACKED_DATA, SB_WRITE);
> > +static ENGINE_NODE(fdb_aging, SB_WRITE);
> >  static ENGINE_NODE(fdb_aging_waker);
> > -static ENGINE_NODE(sync_to_sb_lb);
> > -static ENGINE_NODE(sync_to_sb_pb);
> > -static ENGINE_NODE(global_config, CLEAR_TRACKED_DATA);
> > +static ENGINE_NODE(sync_to_sb_lb, SB_WRITE);
> > +static ENGINE_NODE(sync_to_sb_pb, SB_WRITE);
> > +static ENGINE_NODE(global_config, CLEAR_TRACKED_DATA, SB_WRITE);
> >  static ENGINE_NODE(lb_data, CLEAR_TRACKED_DATA);
> >  static ENGINE_NODE(lr_nat, CLEAR_TRACKED_DATA);
> >  static ENGINE_NODE(lr_stateful, CLEAR_TRACKED_DATA);
> > @@ -176,20 +176,20 @@ static ENGINE_NODE(ls_stateful, CLEAR_TRACKED_DATA);
> >  static ENGINE_NODE(route_policies);
> >  static ENGINE_NODE(routes);
> >  static ENGINE_NODE(bfd);
> > -static ENGINE_NODE(bfd_sync);
> > -static ENGINE_NODE(ecmp_nexthop);
> > -static ENGINE_NODE(multicast_igmp);
> > -static ENGINE_NODE(acl_id);
> > -static ENGINE_NODE(advertised_route_sync);
> > -static ENGINE_NODE(learned_route_sync, CLEAR_TRACKED_DATA);
> > +static ENGINE_NODE(bfd_sync, SB_WRITE);
> > +static ENGINE_NODE(ecmp_nexthop, SB_WRITE);
> > +static ENGINE_NODE(multicast_igmp, SB_WRITE);
> > +static ENGINE_NODE(acl_id, SB_WRITE);
> > +static ENGINE_NODE(advertised_route_sync, SB_WRITE);
> > +static ENGINE_NODE(learned_route_sync, CLEAR_TRACKED_DATA, SB_WRITE);
> >  static ENGINE_NODE(dynamic_routes);
> >  static ENGINE_NODE(group_ecmp_route, CLEAR_TRACKED_DATA);
> >  static ENGINE_NODE(datapath_logical_router, CLEAR_TRACKED_DATA);
> >  static ENGINE_NODE(datapath_logical_switch, CLEAR_TRACKED_DATA);
> > -static ENGINE_NODE(datapath_sync, CLEAR_TRACKED_DATA);
> > +static ENGINE_NODE(datapath_sync, CLEAR_TRACKED_DATA, SB_WRITE);
> >  static ENGINE_NODE(datapath_synced_logical_router, CLEAR_TRACKED_DATA);
> >  static ENGINE_NODE(datapath_synced_logical_switch, CLEAR_TRACKED_DATA);
> > -static ENGINE_NODE(ic_learned_svc_monitors);
> > +static ENGINE_NODE(ic_learned_svc_monitors, SB_WRITE);
> >
> >  void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
> >                            struct ovsdb_idl_loop *sb)
> > diff --git a/ovs b/ovs
> > index 852f07e52..42d99eb38 160000
> > --- a/ovs
> > +++ b/ovs
> > @@ -1 +1 @@
> > -Subproject commit 852f07e5251c6a0c0d5c43dc980d12a4f1bcd370
> > +Subproject commit 42d99eb38725c6e6c99d558d6130db7ade9ba121
> > --
> > 2.51.0
> >
> > _______________________________________________
> > dev mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to