Hi Dumitru.

(Patch as attachment.) This version makes the following changes:

* Schema version 21.8.1 -> 21.9.0
* Added a NEWS entry
* E2E tests added:
    * nb_cfg_timestamp is correctly set on the hypervisor with
      options:enable_chassis_nb_cfg_update set to true.
    * nb_cfg_timestamp is correctly set on the hypervisor with
      options:enable_chassis_nb_cfg_update set to false.
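
For reference, here is a rough sketch of how a collector could derive the
per-chassis propagation delta from the two br-int external_ids values
(ovn-nb-cfg-ts minus ovn-nb-cfg-sb-ts).  The helper names parse_ts_ms() and
nb_cfg_propagation_ms() are made up for illustration and are not part of the
patch or of any exporter.  It assumes both values are decimal millisecond
timestamps; since they come from different hosts, the result is only as good
as the clock sync between northd and the hypervisor, and it can be negative.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parses a decimal millisecond timestamp as stored in external_ids.
 * Returns false on NULL, empty, negative, out-of-range input, or input
 * with trailing non-digit characters. */
static bool
parse_ts_ms(const char *s, int64_t *out)
{
    if (!s || !*s) {
        return false;
    }

    char *end = NULL;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (errno || *end != '\0' || v < 0) {
        return false;
    }
    *out = v;
    return true;
}

/* Hypothetical collector helper: stores
 *     ovn-nb-cfg-ts - ovn-nb-cfg-sb-ts
 * in '*delta_ms'.  Returns false if either value is missing or cannot be
 * parsed, in which case '*delta_ms' is left untouched. */
static bool
nb_cfg_propagation_ms(const char *nb_cfg_ts, const char *nb_cfg_sb_ts,
                      int64_t *delta_ms)
{
    int64_t local_ts, sb_ts;

    if (!parse_ts_ms(nb_cfg_ts, &local_ts)
        || !parse_ts_ms(nb_cfg_sb_ts, &sb_ts)) {
        return false;
    }
    *delta_ms = local_ts - sb_ts;
    return true;
}
```

A collector would feed the resulting delta into its histogram and skip the
sample when the helper returns false, for example before the first nb_cfg
bump has been acked.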



On Thursday, 11 June 2026 at 15:43, Dumitru Ceara <[email protected]> wrote:

> On 6/8/26 4:42 PM, [email protected] wrote:
> > From: Loke Berne <[email protected]>
> >
> > Large scale OVN deployments commonly disable the per-chassis nb_cfg
> > write-back mechanism by setting options:enable_chassis_nb_cfg_update
> > to false.  With thousands of hypervisors each writing their nb_cfg
> > completion back to Chassis_Private on every generation, the resulting
> > write amplification can overload the southbound OVSDB cluster.
> > Disabling write-back eliminates this pressure but also removes the
> > only existing signal for measuring how long a northbound change takes
> > to reach each hypervisor.
> >
> > OVN_Northbound already records nb_cfg_timestamp in NB_Global when
> > ovn-northd advances nb_cfg, but hypervisors connect to the southbound
> > database only.  This patch adds the same timestamp to SB_Global,
> > written atomically with each nb_cfg update.  ovn-controller reads
> > this value and stores it in the local OVS bridge external_ids as
> > ovn-nb-cfg-sb-ts alongside the existing ovn-nb-cfg-ts (local
> > completion time).  An external collector such as ovs_exporter can
> > read both values from the bridge and compute per-chassis propagation
> > latency histograms without any writes to the southbound database,
> > keeping measurement overhead independent of fleet size.
> >
> > Placing the timestamp in SB_Global rather than requiring collectors
> > to reach the northbound database means it travels transparently
> > through any relay or VPN between the southbound cluster and the
> > hypervisor, naturally including that transit in the measurement.
> >
> > Testing: confirmed in OVN sandbox and a two-container central/HV
> > setup that nb_cfg_timestamp is written to SB_Global on each nb_cfg
> > advance, propagated to br-int external_ids as ovn-nb-cfg-sb-ts, and
> > continues to update correctly when enable_chassis_nb_cfg_update is
> > set to false.
> >
> > Signed-off-by: Loke Berne <[email protected]>
> > Assisted-by: Claude Sonnet 4.6
> > Submitted-at: https://github.com/ovn-org/ovn/pull/306
> > Signed-off-by: Numan Siddique <[email protected]>
> > ---
>
> Hi Loke,
>
> Thanks for the patch!
>
> On top of Ilya's comment about the schema version, I have some of my own
> too.
>
> Let me know if you need help with getting v2 posted for review.  If
> needed I can also do that for you if you point me to your dev branch or
> PR once you have it.
>
> >  br-controller/ovn-br-controller.c |  2 +-
> >  controller/if-status.c            |  2 +-
> >  controller/ovn-controller.c       | 83 +++++++++++++++++++++++++------
> >  lib/ofctrl-seqno.c                | 21 +++++++-
> >  lib/ofctrl-seqno.h                | 11 +++-
> >  lib/test-ofctrl-seqno.c           |  2 +-
> >  northd/ovn-northd.c               |  3 ++
> >  ovn-sb.ovsschema                  |  5 +-
> >  ovn-sb.xml                        |  9 ++++
>
> I think we need to add a NEWS entry item for this user visible change.
>
> Also, would it be possible to extend the "nb_cfg timestamp" test in our
> testsuite (or add a new one) to cover this new feature?
>
> https://github.com/ovn-org/ovn/blob/main/tests/ovn.at#L31397
>
> >  9 files changed, 115 insertions(+), 23 deletions(-)
> >
> > diff --git a/br-controller/ovn-br-controller.c b/br-controller/ovn-br-controller.c
> > index 93526a2f6d..e20a110513 100644
> > --- a/br-controller/ovn-br-controller.c
> > +++ b/br-controller/ovn-br-controller.c
> > @@ -299,7 +299,7 @@ main(int argc OVS_UNUSED, char *argv[] OVS_UNUSED)
> >              ofctrl_seqno_update_create(
> >                  ofctrl_seq_type_br_cfg,
> >                  get_ovnbr_cfg(ovnbrrec_br_global_table_get(ovnbr_idl_loop.idl),
> > -                              ovnbr_cond_seqno, ovnbr_expected_cond_seqno));
> > +                              ovnbr_cond_seqno, ovnbr_expected_cond_seqno), 0);
> >
> >              br_ofctrls_put(ofctrl_seqno_get_req_cfg(),
> >                             engine_node_changed(&en_lflow_output),
> > diff --git a/controller/if-status.c b/controller/if-status.c
> > index 6c6e9b27b1..65e745e250 100644
> > --- a/controller/if-status.c
> > +++ b/controller/if-status.c
> > @@ -712,7 +712,7 @@ if_status_mgr_update(struct if_status_mgr *mgr,
> >      if (new_ifaces) {
> >          mgr->iface_seqno++;
> >          ofctrl_seqno_update_create(mgr->iface_seq_type_pb_cfg,
> > -                                   mgr->iface_seqno);
> > +                                   mgr->iface_seqno, 0);
> >          VLOG_DBG("Seqno requested: %"PRIu32, mgr->iface_seqno);
> >      }
> >  }
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index ad094a4543..765948442c 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -141,6 +141,7 @@ static unixctl_cb_func debug_delay_nb_cfg_report;
> >
> >  #define OVS_NB_CFG_NAME "ovn-nb-cfg"
> >  #define OVS_NB_CFG_TS_NAME "ovn-nb-cfg-ts"
> > +#define OVS_NB_CFG_SB_TS_NAME "ovn-nb-cfg-sb-ts"
>
> We should probably document this in ovn-controller.8.xml like we do with
> the others.
>
> >  #define OVS_STARTUP_TS_NAME "ovn-startup-ts"
> >
> >  struct br_int_remote {
> > @@ -825,28 +826,62 @@ struct ed_type_ct_zones {
> >  };
> >
> >
> > +/* Returns the current SB_Global.nb_cfg and, if 'ts_out' is non-NULL, also
> > + * the matching SB_Global.nb_cfg_timestamp.  The pair is always read from
> > + * the same SB_Global snapshot so callers can rely on (nb_cfg, ts) being
> > + * consistent.
> > + *
> > + * 'nb_cfg_timestamp' is the wall-clock time northd wrote nb_cfg to SB.
> > + * The delta between that and the local completion time is the per-chassis
> > + * end-to-end propagation latency (northd compile + SB write + relay
> > + * fan-out + ovn-controller engine + ofctrl barrier ack).
> > + *
> > + * If a monitor condition change is in flight the cached pair from the
> > + * previous call is returned, because updates received between the request
> > + * and the cond ack could be from before the SB_Global value we're trying
> > + * to read.
> > + */
> >  static uint64_t
> >  get_nb_cfg(const struct sbrec_sb_global_table *sb_global_table,
> > -           unsigned int cond_seqno, unsigned int expected_cond_seqno)
> > +           unsigned int cond_seqno, unsigned int expected_cond_seqno,
> > +           int64_t *ts_out)
>
> Should we return a struct here instead of returning half of the output
> through the ts_out arg?
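
A struct return could look roughly like the sketch below.  It is a
standalone illustration, not the v2 code: struct sb_global_stub stands in
for struct sbrec_sb_global, and the names nb_cfg_snapshot and
get_nb_cfg_snapshot() are placeholders.  Both fields are stored as uint64_t
here, which would also make the cast at the call site unnecessary.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two SB_Global columns used here.  The real code reads
 * them from struct sbrec_sb_global. */
struct sb_global_stub {
    int64_t nb_cfg;
    int64_t nb_cfg_timestamp;
};

/* Hypothetical return type: one consistent (nb_cfg, timestamp) pair. */
struct nb_cfg_snapshot {
    uint64_t nb_cfg;
    uint64_t nb_cfg_ts;
};

/* Returns the (nb_cfg, nb_cfg_timestamp) pair from 'sb'.  While a monitor
 * condition change is in flight (the two seqnos differ), the pair cached
 * by the previous call is returned instead. */
static struct nb_cfg_snapshot
get_nb_cfg_snapshot(const struct sb_global_stub *sb,
                    unsigned int cond_seqno,
                    unsigned int expected_cond_seqno)
{
    static struct nb_cfg_snapshot cached = { 0, 0 };

    if (cond_seqno == expected_cond_seqno) {
        cached.nb_cfg = sb ? sb->nb_cfg : 0;
        cached.nb_cfg_ts = sb ? sb->nb_cfg_timestamp : 0;
    }
    return cached;
}
```

The caller then passes snapshot.nb_cfg and snapshot.nb_cfg_ts on to the
seqno update without an output argument.
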
>
> >  {
> >      static uint64_t nb_cfg = 0;
> > +    static int64_t nb_cfg_ts = 0;
>
> Let's store nb_cfg_ts as uint64_t too?  Deep down, in the seqno
> structures we store unsigned ints anyway.
>
> >
> > -    /* Delay getting nb_cfg if there are monitor condition changes
> > -     * in flight.  It might be that those changes would instruct the
> > -     * server to send updates that happened before SB_Global.nb_cfg.
> > -     */
> > -    if (cond_seqno != expected_cond_seqno) {
> > -        return nb_cfg;
> > +    if (cond_seqno == expected_cond_seqno) {
> > +        const struct sbrec_sb_global *sb
> > +          = sbrec_sb_global_table_first(sb_global_table);
>
> Nit: indentation is off here.
>
> > +        nb_cfg = sb ? sb->nb_cfg : 0;
> > +        nb_cfg_ts = sb ? sb->nb_cfg_timestamp : 0;
> > +    }
> > +
> > +    if (ts_out) {
> > +        *ts_out = nb_cfg_ts;
> >      }
> >
> > -    const struct sbrec_sb_global *sb
> > -        = sbrec_sb_global_table_first(sb_global_table);
> > -    nb_cfg = sb ? sb->nb_cfg : 0;
> >      return nb_cfg;
> >  }
> >
> >  /* Propagates the local cfg seqno, 'cur_cfg', to the chassis_private record
> >   * and to the local OVS DB.
> > + *
> > + * The br-int external_ids triplet (ovn-nb-cfg, ovn-nb-cfg-ts,
> > + * ovn-nb-cfg-sb-ts) is stamped unconditionally, independent of
> > + * 'enable_ch_nb_cfg_update'.  The SB chassis_private writeback remains
> > + * gated by 'enable_ch_nb_cfg_update' for deployments that want to suppress
> > + * the per-bump SB write load.  An external exporter watching br-int can
> > + * compute the per-chassis propagation delta as
> > + *     (ovn-nb-cfg-ts - ovn-nb-cfg-sb-ts)
> > + * regardless of the writeback setting.
> > + *
> > + * 'ovn-nb-cfg-sb-ts' is the SB_Global.nb_cfg_timestamp that was paired
> > + * with this cur_cfg at the moment the barrier was queued (snapshotted via
> > + * ofctrl-seqno's req_ts).  This avoids the pitfall of pairing the just-
> > + * acked cur_cfg with whatever SB_Global timestamp happens to be current
> > + * now -- on a fast-churning fleet SB_Global may already have advanced
> > + * past cur_cfg by the time the barrier acks, which would under-report
> > + * the delta.
> >   */
> >  static void
> >  store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn,
> > @@ -858,6 +893,7 @@ store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn,
> >      struct ofctrl_acked_seqnos *acked_nb_cfg_seqnos =
> >          ofctrl_acked_seqnos_get(ofctrl_seq_type_nb_cfg);
> >      uint64_t cur_cfg = acked_nb_cfg_seqnos->last_acked;
> > +    uint64_t cur_cfg_sb_ts = acked_nb_cfg_seqnos->last_acked_req_ts;
> >      int64_t startup_ts = daemon_startup_ts();
> >
> >      if (ovs_txn && br_int
> > @@ -894,6 +930,13 @@ store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn,
> >                                                   cur_cfg_str);
> >          ovsrec_bridge_update_external_ids_setkey(br_int, OVS_NB_CFG_TS_NAME,
> >                                                   cur_cfg_ts_str);
> > +        if (cur_cfg_sb_ts) {
>
> Why do we skip the 0 case?  Should we clear if cur_cfg_sb_ts is 0 instead?
>
> > +            char *sb_ts_str = xasprintf("%"PRIu64, cur_cfg_sb_ts);
> > +            ovsrec_bridge_update_external_ids_setkey(br_int,
> > +                                                     OVS_NB_CFG_SB_TS_NAME,
> > +                                                     sb_ts_str);
> > +            free(sb_ts_str);
> > +        }
> >          free(cur_cfg_ts_str);
> >          free(cur_cfg_str);
> >      }
> > @@ -8200,12 +8243,20 @@ main(int argc, char *argv[])
> >                                       chassis, mac_cache_data);
> >                      }
> >
> > -                    ofctrl_seqno_update_create(
> > -                        ofctrl_seq_type_nb_cfg,
> > -                        get_nb_cfg(sbrec_sb_global_table_get(
> > -                                                       ovnsb_idl_loop.idl),
> > -                                              ovnsb_cond_seqno,
> > -                                              ovnsb_expected_cond_seqno));
> > +                    /* Snapshot (nb_cfg, sb_ts) atomically from SB_Global
> > +                     * and pair them through the barrier ack so the
> > +                     * eventual completion can be attributed to the
> > +                     * timestamp that corresponded to this exact nb_cfg
> > +                     * generation -- not whatever SB_Global value has
> > +                     * moved on to by the time the barrier acks. */
> > +                    int64_t sb_nb_cfg_ts = 0;
> > +                    uint64_t sb_nb_cfg = get_nb_cfg(
> > +                        sbrec_sb_global_table_get(ovnsb_idl_loop.idl),
> > +                        ovnsb_cond_seqno, ovnsb_expected_cond_seqno,
> > +                        &sb_nb_cfg_ts);
> > +                    ofctrl_seqno_update_create(ofctrl_seq_type_nb_cfg,
> > +                                               sb_nb_cfg,
> > +                                               (uint64_t) sb_nb_cfg_ts);
>
> Is this cast really needed?
>
> >
> >                      struct local_binding_data *binding_data =
> >                          runtime_data ? &runtime_data->lbinding_data : NULL;
> > diff --git a/lib/ofctrl-seqno.c b/lib/ofctrl-seqno.c
> > index 83c17c0e52..7c613dc0a4 100644
> > --- a/lib/ofctrl-seqno.c
> > +++ b/lib/ofctrl-seqno.c
> > @@ -36,6 +36,11 @@ struct ofctrl_seqno_update {
> >                                  * application.
> >                                  */
> >      uint64_t req_cfg;          /* Application specific seqno. */
> > +    uint64_t req_ts;           /* Opaque per-request timestamp captured by
> > +                                * the caller at the moment the update was
> > +                                * queued.  Carried through to the acked
> > +                                * state so consumers can pair the acked
> > +                                * seqno with the input that produced it. */
> >  };
> >
> >  /* List of in flight sequence number updates. */
> > @@ -51,6 +56,9 @@ struct ofctrl_seqno_state {
> >                                   * application consumed acked requests.
> >                                   */
> >      uint64_t cur_cfg;           /* Last acked application seqno. */
> > +    uint64_t cur_cfg_req_ts;    /* req_ts that was paired with cur_cfg when
> > +                                 * the update was queued.  0 if the caller
> > +                                 * didn't supply a timestamp. */
> >      uint64_t req_cfg;           /* Last requested application seqno. */
> >  };
> >
> > @@ -73,6 +81,7 @@ ofctrl_acked_seqnos_get(size_t seqno_type)
> >      struct ofctrl_acked_seqnos *acked_seqnos = xmalloc(sizeof *acked_seqnos);
> >      acked_seqnos->acked = vector_clone(&state->acked_cfgs);
> >      acked_seqnos->last_acked = state->cur_cfg;
> > +    acked_seqnos->last_acked_req_ts = state->cur_cfg_req_ts;
> >
> >      vector_clear(&state->acked_cfgs);
> >      if (vector_capacity(&state->acked_cfgs) >= VECTOR_THRESHOLD) {
> > @@ -140,6 +149,7 @@ ofctrl_seqno_add_type(void)
> >      struct ofctrl_seqno_state state = (struct ofctrl_seqno_state) {
> >          .acked_cfgs = VECTOR_EMPTY_INITIALIZER(uint64_t),
> >          .cur_cfg = 0,
> > +        .cur_cfg_req_ts = 0,
> >          .req_cfg = 0,
> >      };
> >      vector_push(&ofctrl_seqno_states, &state);
> > @@ -149,9 +159,16 @@ ofctrl_seqno_add_type(void)
> >
> >  /* Creates a new seqno update request for an application specific
> >   * 'seqno_type'.
> > + *
> > + * 'req_ts' is an opaque per-request timestamp captured by the caller (for
> > + * example, the SB_Global nb_cfg_timestamp at the moment we read 'new_cfg').
> > + * It is carried unchanged to ofctrl_acked_seqnos_get() so consumers can
> > + * pair the eventual ack with the input state that produced it.  Callers
> > + * that don't need this pairing should pass 0.
> >   */
> >  void
> > -ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg)
> > +ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg,
> > +                           uint64_t req_ts)
>
> Maybe we should add an ofctrl_stamped_seqno_update_create() wrapper?  We
> call ofctrl_seqno_update_create() with a non-zero req_ts only in one
> place in our code base.
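
Something along these lines, as a sketch under a few assumptions: the
recording stub seqno_update_create__() stands in for the real queueing code
in lib/ofctrl-seqno.c and is a made-up name, and only the wrapper name
ofctrl_stamped_seqno_update_create() comes from the suggestion above.  The
point is that the existing two-argument callers stay untouched.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the update queue: it only records the last request so the
 * sketch compiles and can be checked on its own. */
struct recorded_update {
    size_t seqno_type;
    uint64_t req_cfg;
    uint64_t req_ts;
};
static struct recorded_update last_update;

/* Hypothetical common helper holding the actual queueing logic. */
static void
seqno_update_create__(size_t seqno_type, uint64_t new_cfg, uint64_t req_ts)
{
    last_update = (struct recorded_update) { seqno_type, new_cfg, req_ts };
}

/* Existing signature, unchanged for current callers.  No timestamp is
 * associated with the request, so 0 is recorded. */
static void
ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg)
{
    seqno_update_create__(seqno_type, new_cfg, 0);
}

/* Stamped variant for the single caller that pairs a request with the
 * SB_Global nb_cfg_timestamp. */
static void
ofctrl_stamped_seqno_update_create(size_t seqno_type, uint64_t new_cfg,
                                   uint64_t req_ts)
{
    seqno_update_create__(seqno_type, new_cfg, req_ts);
}
```

With that split, br-controller, if-status and the unit test would not need
the extra trailing 0 argument.
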
>
> >  {
> >      struct ofctrl_seqno_state *state = ofctrl_seqno_state_get(seqno_type);
> >
> > @@ -169,6 +186,7 @@ ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg)
> >          .seqno_type = seqno_type,
> >          .flow_cfg = ofctrl_req_seqno,
> >          .req_cfg = new_cfg,
> > +        .req_ts = req_ts,
> >      };
> >      vector_push(&ofctrl_seqno_updates, &update);
> >  }
> > @@ -190,6 +208,7 @@ ofctrl_seqno_run(uint64_t flow_cfg)
> >          struct ofctrl_seqno_state *state =
> >              ofctrl_seqno_state_get(update->seqno_type);
> >          state->cur_cfg = update->req_cfg;
> > +        state->cur_cfg_req_ts = update->req_ts;
> >          vector_push(&state->acked_cfgs, &update->req_cfg);
> >
> >          index++;
> > diff --git a/lib/ofctrl-seqno.h b/lib/ofctrl-seqno.h
> > index faa97cc535..66b666fe4d 100644
> > --- a/lib/ofctrl-seqno.h
> > +++ b/lib/ofctrl-seqno.h
> > @@ -23,10 +23,18 @@
> >
> >  /* Collection of acked ofctrl_seqno_update requests and the most recent
> >   * 'last_acked' value.
> > + *
> > + * 'last_acked_req_ts' carries the opaque timestamp that was associated with
> > + * 'last_acked' at the time the seqno was requested.  Consumers that don't
> > + * pass a timestamp at create time will see 0 here.  The timestamp is
> > + * preserved across the barrier so that applications can pair the acked
> > + * config seqno with the input state that produced it (e.g. SB_Global
> > + * nb_cfg_timestamp at the moment we asked OVS to barrier).
> >   */
> >  struct ofctrl_acked_seqnos {
> >      struct vector acked;
> >      uint64_t last_acked;
> > +    uint64_t last_acked_req_ts;
> >  };
> >
> >  struct ofctrl_acked_seqnos *ofctrl_acked_seqnos_get(size_t seqno_type);
> > @@ -35,7 +43,8 @@ bool ofctrl_acked_seqnos_contains(const struct ofctrl_acked_seqnos *seqnos,
> >                                    uint64_t val);
> >
> >  size_t ofctrl_seqno_add_type(void);
> > -void ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg);
> > +void ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg,
> > +                                uint64_t req_ts);
> >  void ofctrl_seqno_run(uint64_t flow_cfg);
> >  uint64_t ofctrl_seqno_get_req_cfg(void);
> >  void ofctrl_seqno_flush(void);
> > diff --git a/lib/test-ofctrl-seqno.c b/lib/test-ofctrl-seqno.c
> > index 7d478c033e..b31a50ecdc 100644
> > --- a/lib/test-ofctrl-seqno.c
> > +++ b/lib/test-ofctrl-seqno.c
> > @@ -123,7 +123,7 @@ test_ofctrl_seqno_ack_seqnos(struct ovs_cmdl_context *ctx)
> >                                          &app_seqno)) {
> >                  return;
> >              }
> > -            ofctrl_seqno_update_create(i, app_seqno);
> > +            ofctrl_seqno_update_create(i, app_seqno, 0);
> >          }
> >      }
> >      printf("ofctrl-seqno-req-cfg: %u\n", n_reqs);
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index c3c198f2f3..0eaef6de44 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -541,6 +541,7 @@ update_sequence_numbers(int64_t loop_start_time,
> >       * Also set up to update sb_cfg once our southbound transaction commits. */
> >      if (nb->nb_cfg != sb->nb_cfg) {
> >          sbrec_sb_global_set_nb_cfg(sb, nb->nb_cfg);
> > +        sbrec_sb_global_set_nb_cfg_timestamp(sb, loop_start_time);
> >          nbrec_nb_global_set_nb_cfg_timestamp(nb, loop_start_time);
> >      }
> >      sb_loop->next_cfg = nb->nb_cfg;
> > @@ -944,6 +945,8 @@ main(int argc, char *argv[])
> >
> >      /* Disable alerting for pure write-only columns. */
> >      ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_sb_global_col_nb_cfg);
> > +    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> > +                         &sbrec_sb_global_col_nb_cfg_timestamp);
> >      ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_address_set_col_name);
> >      ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_address_set_col_addresses);
> >      for (size_t i = 0; i < SBREC_LOGICAL_FLOW_N_COLUMNS; i++) {
> > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> > index d9a91739cc..afe691a0d3 100644
> > --- a/ovn-sb.ovsschema
> > +++ b/ovn-sb.ovsschema
> > @@ -1,11 +1,12 @@
> >  {
> >      "name": "OVN_Southbound",
> > -    "version": "21.8.0",
> > -    "cksum": "614397313 36713",
> > +    "version": "21.8.1",
> > +    "cksum": "3241242866 36779",
> >      "tables": {
> >          "SB_Global": {
> >              "columns": {
> >                  "nb_cfg": {"type": {"key": "integer"}},
> > +                "nb_cfg_timestamp": {"type": {"key": "integer"}},
> >                  "external_ids": {
> >                      "type": {"key": "string", "value": "string",
> >                               "min": 0, "max": "unlimited"}},
> > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > index e45b63d73f..c1883db2e3 100644
> > --- a/ovn-sb.xml
> > +++ b/ovn-sb.xml
> > @@ -156,6 +156,15 @@
> >          the southbound database to bring it up to date with these changes, it
> >          updates this column to the same value.
> >        </column>
> > +
> > +      <column name="nb_cfg_timestamp">
> > +        The time at which <code>ovn-northd</code> last wrote
> > +        <ref column="nb_cfg"/> to the southbound database, in milliseconds
> > +        since the Unix epoch.  Set atomically with each update to
> > +        <ref column="nb_cfg"/>.  Hypervisors read this value to measure
> > +        end-to-end propagation latency from northbound commit to local
> > +        datapath programming completion.
> > +      </column>
> >      </group>
> >
> >      <group title="Common Columns">
>
> Regards,
> Dumitru
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
