Hi Dumitru.
(patch as attachment) This updates the following
* Schema version 21.8.1 -> 21.9.0
* Added NEWS section
* E2E tests added
* nb_cfg_timestamp is correctly set from on hv with
options:enable_chassis_nb_cfg_update set to true.
* nb_cfg_timestamp is correctly set from on hv with
options:enable_chassis_nb_cfg_update set to false.
On Thursday, 11 June 2026 at 15:43, Dumitru Ceara <[email protected]> wrote:
> On 6/8/26 4:42 PM, [email protected] wrote:
> > From: Loke Berne <[email protected]>
> >
> > Large scale OVN deployments commonly disable the per-chassis nb_cfg
> > write-back mechanism by setting options:enable_chassis_nb_cfg_update
> > to false. With thousands of hypervisors each writing their nb_cfg
> > completion back to Chassis_Private on every generation, the resulting
> > write amplification can overload the southbound OVSDB cluster.
> > Disabling write-back eliminates this pressure but also removes the
> > only existing signal for measuring how long a northbound change takes
> > to reach each hypervisor.
> >
> > OVN_Northbound already records nb_cfg_timestamp in NB_Global when
> > ovn-northd advances nb_cfg, but hypervisors connect to the southbound
> > database only. This patch adds the same timestamp to SB_Global,
> > written atomically with each nb_cfg update. ovn-controller reads
> > this value and stores it in the local OVS bridge external_ids as
> > ovn-nb-cfg-sb-ts alongside the existing ovn-nb-cfg-ts (local
> > completion time). An external collector such as ovs_exporter can
> > read both values from the bridge and compute per-chassis propagation
> > latency histograms without any writes to the southbound database,
> > keeping measurement overhead independent of fleet size.
> >
> > Placing the timestamp in SB_Global rather than requiring collectors
> > to reach the northbound database means it travels transparently
> > through any relay or VPN between the southbound cluster and the
> > hypervisor, naturally including that transit in the measurement.
> >
> > Testing: confirmed in OVN sandbox and a two-container central/HV
> > setup that nb_cfg_timestamp is written to SB_Global on each nb_cfg
> > advance, propagated to br-int external_ids as ovn-nb-cfg-sb-ts, and
> > continues to update correctly when enable_chassis_nb_cfg_update is
> > set to false.
> >
> > Signed-off-by: Loke Berne <[email protected]>
> > Assisted-by: Claude Sonnet 4.6
> > Submitted-at: https://github.com/ovn-org/ovn/pull/306
> > Signed-off-by: Numan Siddique <[email protected]>
> > ---
>
> Hi Loke,
>
> Thanks for the patch!
>
> On top of Ilya's comment about the schema version, I have some of my own
> too.
>
> Let me know if you need help with getting v2 posted for review. If
> needed I can also do that for you if you point me to your dev branch or
> PR once you have it.
>
> > br-controller/ovn-br-controller.c | 2 +-
> > controller/if-status.c | 2 +-
> > controller/ovn-controller.c | 83 +++++++++++++++++++++++++------
> > lib/ofctrl-seqno.c | 21 +++++++-
> > lib/ofctrl-seqno.h | 11 +++-
> > lib/test-ofctrl-seqno.c | 2 +-
> > northd/ovn-northd.c | 3 ++
> > ovn-sb.ovsschema | 5 +-
> > ovn-sb.xml | 9 ++++
>
> I think we need to add a NEWS entry item for this user visible change.
>
> Also, would it be possible to extend (or add a new) the "nb_cfg
> timestamp" test in our testsuite to cover this new feture?
>
> https://github.com/ovn-org/ovn/blob/main/tests/ovn.at#L31397
>
> > 9 files changed, 115 insertions(+), 23 deletions(-)
> >
> > diff --git a/br-controller/ovn-br-controller.c
> > b/br-controller/ovn-br-controller.c
> > index 93526a2f6d..e20a110513 100644
> > --- a/br-controller/ovn-br-controller.c
> > +++ b/br-controller/ovn-br-controller.c
> > @@ -299,7 +299,7 @@ main(int argc OVS_UNUSED, char *argv[] OVS_UNUSED)
> > ofctrl_seqno_update_create(
> > ofctrl_seq_type_br_cfg,
> >
> > get_ovnbr_cfg(ovnbrrec_br_global_table_get(ovnbr_idl_loop.idl),
> > - ovnbr_cond_seqno,
> > ovnbr_expected_cond_seqno));
> > + ovnbr_cond_seqno,
> > ovnbr_expected_cond_seqno), 0);
> >
> > br_ofctrls_put(ofctrl_seqno_get_req_cfg(),
> > engine_node_changed(&en_lflow_output),
> > diff --git a/controller/if-status.c b/controller/if-status.c
> > index 6c6e9b27b1..65e745e250 100644
> > --- a/controller/if-status.c
> > +++ b/controller/if-status.c
> > @@ -712,7 +712,7 @@ if_status_mgr_update(struct if_status_mgr *mgr,
> > if (new_ifaces) {
> > mgr->iface_seqno++;
> > ofctrl_seqno_update_create(mgr->iface_seq_type_pb_cfg,
> > - mgr->iface_seqno);
> > + mgr->iface_seqno, 0);
> > VLOG_DBG("Seqno requested: %"PRIu32, mgr->iface_seqno);
> > }
> > }
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index ad094a4543..765948442c 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -141,6 +141,7 @@ static unixctl_cb_func debug_delay_nb_cfg_report;
> >
> > #define OVS_NB_CFG_NAME "ovn-nb-cfg"
> > #define OVS_NB_CFG_TS_NAME "ovn-nb-cfg-ts"
> > +#define OVS_NB_CFG_SB_TS_NAME "ovn-nb-cfg-sb-ts"
>
> We should probably document this in ovn-controller.8.xml like we do with
> the others.
>
> > #define OVS_STARTUP_TS_NAME "ovn-startup-ts"
> >
> > struct br_int_remote {
> > @@ -825,28 +826,62 @@ struct ed_type_ct_zones {
> > };
> >
> >
> > +/* Returns the current SB_Global.nb_cfg and, if 'ts_out' is non-NULL, also
> > + * the matching SB_Global.nb_cfg_timestamp. The pair is always read from
> > + * the same SB_Global snapshot so callers can rely on (nb_cfg, ts) being
> > + * consistent.
> > + *
> > + * 'nb_cfg_timestamp' is the wall-clock time northd wrote nb_cfg to SB.
> > + * The delta between that and the local completion time is the per-chassis
> > + * end-to-end propagation latency (northd compile + SB write + relay
> > + * fan-out + ovn-controller engine + ofctrl barrier ack).
> > + *
> > + * If a monitor condition change is in flight the cached pair from the
> > + * previous call is returned, because updates received between the request
> > + * and the cond ack could be from before the SB_Global value we're trying
> > + * to read.
> > + */
> > static uint64_t
> > get_nb_cfg(const struct sbrec_sb_global_table *sb_global_table,
> > - unsigned int cond_seqno, unsigned int expected_cond_seqno)
> > + unsigned int cond_seqno, unsigned int expected_cond_seqno,
> > + int64_t *ts_out)
>
> Should we return a struct here instead of returning half of the output
> through the ts_out arg?
>
> > {
> > static uint64_t nb_cfg = 0;
> > + static int64_t nb_cfg_ts = 0;
>
> Let's store nb_cfg_ts as uint64_t too? Deep down, in the seqno
> structures we store unsigned ints anyway.
>
> >
> > - /* Delay getting nb_cfg if there are monitor condition changes
> > - * in flight. It might be that those changes would instruct the
> > - * server to send updates that happened before SB_Global.nb_cfg.
> > - */
> > - if (cond_seqno != expected_cond_seqno) {
> > - return nb_cfg;
> > + if (cond_seqno == expected_cond_seqno) {
> > + const struct sbrec_sb_global *sb
> > + = sbrec_sb_global_table_first(sb_global_table);
>
> Nit: indentation is off here.
>
> > + nb_cfg = sb ? sb->nb_cfg : 0;
> > + nb_cfg_ts = sb ? sb->nb_cfg_timestamp : 0;
> > + }
> > +
> > + if (ts_out) {
> > + *ts_out = nb_cfg_ts;
> > }
> >
> > - const struct sbrec_sb_global *sb
> > - = sbrec_sb_global_table_first(sb_global_table);
> > - nb_cfg = sb ? sb->nb_cfg : 0;
> > return nb_cfg;
> > }
> >
> > /* Propagates the local cfg seqno, 'cur_cfg', to the chassis_private record
> > * and to the local OVS DB.
> > + *
> > + * The br-int external_ids triplet (ovn-nb-cfg, ovn-nb-cfg-ts,
> > + * ovn-nb-cfg-sb-ts) is stamped unconditionally, independent of
> > + * 'enable_ch_nb_cfg_update'. The SB chassis_private writeback remains
> > + * gated by 'enable_ch_nb_cfg_update' for deployments that want to suppress
> > + * the per-bump SB write load. An external exporter watching br-int can
> > + * compute the per-chassis propagation delta as
> > + * (ovn-nb-cfg-ts - ovn-nb-cfg-sb-ts)
> > + * regardless of the writeback setting.
> > + *
> > + * 'ovn-nb-cfg-sb-ts' is the SB_Global.nb_cfg_timestamp that was paired
> > + * with this cur_cfg at the moment the barrier was queued (snapshotted via
> > + * ofctrl-seqno's req_ts). This avoids the pitfall of pairing the just-
> > + * acked cur_cfg with whatever SB_Global timestamp happens to be current
> > + * now -- on a fast-churning fleet SB_Global may already have advanced
> > + * past cur_cfg by the time the barrier acks, which would under-report
> > + * the delta.
> > */
> > static void
> > store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn,
> > @@ -858,6 +893,7 @@ store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct
> > ovsdb_idl_txn *ovs_txn,
> > struct ofctrl_acked_seqnos *acked_nb_cfg_seqnos =
> > ofctrl_acked_seqnos_get(ofctrl_seq_type_nb_cfg);
> > uint64_t cur_cfg = acked_nb_cfg_seqnos->last_acked;
> > + uint64_t cur_cfg_sb_ts = acked_nb_cfg_seqnos->last_acked_req_ts;
> > int64_t startup_ts = daemon_startup_ts();
> >
> > if (ovs_txn && br_int
> > @@ -894,6 +930,13 @@ store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct
> > ovsdb_idl_txn *ovs_txn,
> > cur_cfg_str);
> > ovsrec_bridge_update_external_ids_setkey(br_int,
> > OVS_NB_CFG_TS_NAME,
> > cur_cfg_ts_str);
> > + if (cur_cfg_sb_ts) {
>
> Why do we skip the 0 case? Should we clear if cur_cfg_sb_ts is 0 instead?
>
> > + char *sb_ts_str = xasprintf("%"PRIu64, cur_cfg_sb_ts);
> > + ovsrec_bridge_update_external_ids_setkey(br_int,
> > + OVS_NB_CFG_SB_TS_NAME,
> > + sb_ts_str);
> > + free(sb_ts_str);
> > + }
> > free(cur_cfg_ts_str);
> > free(cur_cfg_str);
> > }
> > @@ -8200,12 +8243,20 @@ main(int argc, char *argv[])
> > chassis, mac_cache_data);
> > }
> >
> > - ofctrl_seqno_update_create(
> > - ofctrl_seq_type_nb_cfg,
> > - get_nb_cfg(sbrec_sb_global_table_get(
> > - ovnsb_idl_loop.idl),
> > - ovnsb_cond_seqno,
> > - ovnsb_expected_cond_seqno));
> > + /* Snapshot (nb_cfg, sb_ts) atomically from SB_Global
> > + * and pair them through the barrier ack so the
> > + * eventual completion can be attributed to the
> > + * timestamp that corresponded to this exact nb_cfg
> > + * generation -- not whatever SB_Global value has
> > + * moved on to by the time the barrier acks. */
> > + int64_t sb_nb_cfg_ts = 0;
> > + uint64_t sb_nb_cfg = get_nb_cfg(
> > + sbrec_sb_global_table_get(ovnsb_idl_loop.idl),
> > + ovnsb_cond_seqno, ovnsb_expected_cond_seqno,
> > + &sb_nb_cfg_ts);
> > + ofctrl_seqno_update_create(ofctrl_seq_type_nb_cfg,
> > + sb_nb_cfg,
> > + (uint64_t) sb_nb_cfg_ts);
>
> Is this cast really needed?
>
> >
> > struct local_binding_data *binding_data =
> > runtime_data ? &runtime_data->lbinding_data : NULL;
> > diff --git a/lib/ofctrl-seqno.c b/lib/ofctrl-seqno.c
> > index 83c17c0e52..7c613dc0a4 100644
> > --- a/lib/ofctrl-seqno.c
> > +++ b/lib/ofctrl-seqno.c
> > @@ -36,6 +36,11 @@ struct ofctrl_seqno_update {
> > * application.
> > */
> > uint64_t req_cfg; /* Application specific seqno. */
> > + uint64_t req_ts; /* Opaque per-request timestamp captured by
> > + * the caller at the moment the update was
> > + * queued. Carried through to the acked
> > + * state so consumers can pair the acked
> > + * seqno with the input that produced it. */
> > };
> >
> > /* List of in flight sequence number updates. */
> > @@ -51,6 +56,9 @@ struct ofctrl_seqno_state {
> > * application consumed acked requests.
> > */
> > uint64_t cur_cfg; /* Last acked application seqno. */
> > + uint64_t cur_cfg_req_ts; /* req_ts that was paired with cur_cfg when
> > + * the update was queued. 0 if the caller
> > + * didn't supply a timestamp. */
> > uint64_t req_cfg; /* Last requested application seqno. */
> > };
> >
> > @@ -73,6 +81,7 @@ ofctrl_acked_seqnos_get(size_t seqno_type)
> > struct ofctrl_acked_seqnos *acked_seqnos = xmalloc(sizeof
> > *acked_seqnos);
> > acked_seqnos->acked = vector_clone(&state->acked_cfgs);
> > acked_seqnos->last_acked = state->cur_cfg;
> > + acked_seqnos->last_acked_req_ts = state->cur_cfg_req_ts;
> >
> > vector_clear(&state->acked_cfgs);
> > if (vector_capacity(&state->acked_cfgs) >= VECTOR_THRESHOLD) {
> > @@ -140,6 +149,7 @@ ofctrl_seqno_add_type(void)
> > struct ofctrl_seqno_state state = (struct ofctrl_seqno_state) {
> > .acked_cfgs = VECTOR_EMPTY_INITIALIZER(uint64_t),
> > .cur_cfg = 0,
> > + .cur_cfg_req_ts = 0,
> > .req_cfg = 0,
> > };
> > vector_push(&ofctrl_seqno_states, &state);
> > @@ -149,9 +159,16 @@ ofctrl_seqno_add_type(void)
> >
> > /* Creates a new seqno update request for an application specific
> > * 'seqno_type'.
> > + *
> > + * 'req_ts' is an opaque per-request timestamp captured by the caller (for
> > + * example, the SB_Global nb_cfg_timestamp at the moment we read
> > 'new_cfg').
> > + * It is carried unchanged to ofctrl_acked_seqnos_get() so consumers can
> > + * pair the eventual ack with the input state that produced it. Callers
> > + * that don't need this pairing should pass 0.
> > */
> > void
> > -ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg)
> > +ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg,
> > + uint64_t req_ts)
>
> Maybe we should add a ofctrl_stamped_seqno_update_create() wrapper? We
> call ofctrl_seqno_update_create() with a non-zero req_ts only in one
> place in our code base.
>
> > {
> > struct ofctrl_seqno_state *state = ofctrl_seqno_state_get(seqno_type);
> >
> > @@ -169,6 +186,7 @@ ofctrl_seqno_update_create(size_t seqno_type, uint64_t
> > new_cfg)
> > .seqno_type = seqno_type,
> > .flow_cfg = ofctrl_req_seqno,
> > .req_cfg = new_cfg,
> > + .req_ts = req_ts,
> > };
> > vector_push(&ofctrl_seqno_updates, &update);
> > }
> > @@ -190,6 +208,7 @@ ofctrl_seqno_run(uint64_t flow_cfg)
> > struct ofctrl_seqno_state *state =
> > ofctrl_seqno_state_get(update->seqno_type);
> > state->cur_cfg = update->req_cfg;
> > + state->cur_cfg_req_ts = update->req_ts;
> > vector_push(&state->acked_cfgs, &update->req_cfg);
> >
> > index++;
> > diff --git a/lib/ofctrl-seqno.h b/lib/ofctrl-seqno.h
> > index faa97cc535..66b666fe4d 100644
> > --- a/lib/ofctrl-seqno.h
> > +++ b/lib/ofctrl-seqno.h
> > @@ -23,10 +23,18 @@
> >
> > /* Collection of acked ofctrl_seqno_update requests and the most recent
> > * 'last_acked' value.
> > + *
> > + * 'last_acked_req_ts' carries the opaque timestamp that was associated
> > with
> > + * 'last_acked' at the time the seqno was requested. Consumers that don't
> > + * pass a timestamp at create time will see 0 here. The timestamp is
> > + * preserved across the barrier so that applications can pair the acked
> > + * config seqno with the input state that produced it (e.g. SB_Global
> > + * nb_cfg_timestamp at the moment we asked OVS to barrier).
> > */
> > struct ofctrl_acked_seqnos {
> > struct vector acked;
> > uint64_t last_acked;
> > + uint64_t last_acked_req_ts;
> > };
> >
> > struct ofctrl_acked_seqnos *ofctrl_acked_seqnos_get(size_t seqno_type);
> > @@ -35,7 +43,8 @@ bool ofctrl_acked_seqnos_contains(const struct
> > ofctrl_acked_seqnos *seqnos,
> > uint64_t val);
> >
> > size_t ofctrl_seqno_add_type(void);
> > -void ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg);
> > +void ofctrl_seqno_update_create(size_t seqno_type, uint64_t new_cfg,
> > + uint64_t req_ts);
> > void ofctrl_seqno_run(uint64_t flow_cfg);
> > uint64_t ofctrl_seqno_get_req_cfg(void);
> > void ofctrl_seqno_flush(void);
> > diff --git a/lib/test-ofctrl-seqno.c b/lib/test-ofctrl-seqno.c
> > index 7d478c033e..b31a50ecdc 100644
> > --- a/lib/test-ofctrl-seqno.c
> > +++ b/lib/test-ofctrl-seqno.c
> > @@ -123,7 +123,7 @@ test_ofctrl_seqno_ack_seqnos(struct ovs_cmdl_context
> > *ctx)
> > &app_seqno)) {
> > return;
> > }
> > - ofctrl_seqno_update_create(i, app_seqno);
> > + ofctrl_seqno_update_create(i, app_seqno, 0);
> > }
> > }
> > printf("ofctrl-seqno-req-cfg: %u\n", n_reqs);
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index c3c198f2f3..0eaef6de44 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -541,6 +541,7 @@ update_sequence_numbers(int64_t loop_start_time,
> > * Also set up to update sb_cfg once our southbound transaction
> > commits. */
> > if (nb->nb_cfg != sb->nb_cfg) {
> > sbrec_sb_global_set_nb_cfg(sb, nb->nb_cfg);
> > + sbrec_sb_global_set_nb_cfg_timestamp(sb, loop_start_time);
> > nbrec_nb_global_set_nb_cfg_timestamp(nb, loop_start_time);
> > }
> > sb_loop->next_cfg = nb->nb_cfg;
> > @@ -944,6 +945,8 @@ main(int argc, char *argv[])
> >
> > /* Disable alerting for pure write-only columns. */
> > ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_sb_global_col_nb_cfg);
> > + ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> > + &sbrec_sb_global_col_nb_cfg_timestamp);
> > ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_address_set_col_name);
> > ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> > &sbrec_address_set_col_addresses);
> > for (size_t i = 0; i < SBREC_LOGICAL_FLOW_N_COLUMNS; i++) {
> > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> > index d9a91739cc..afe691a0d3 100644
> > --- a/ovn-sb.ovsschema
> > +++ b/ovn-sb.ovsschema
> > @@ -1,11 +1,12 @@
> > {
> > "name": "OVN_Southbound",
> > - "version": "21.8.0",
> > - "cksum": "614397313 36713",
> > + "version": "21.8.1",
> > + "cksum": "3241242866 36779",
> > "tables": {
> > "SB_Global": {
> > "columns": {
> > "nb_cfg": {"type": {"key": "integer"}},
> > + "nb_cfg_timestamp": {"type": {"key": "integer"}},
> > "external_ids": {
> > "type": {"key": "string", "value": "string",
> > "min": 0, "max": "unlimited"}},
> > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > index e45b63d73f..c1883db2e3 100644
> > --- a/ovn-sb.xml
> > +++ b/ovn-sb.xml
> > @@ -156,6 +156,15 @@
> > the southbound database to bring it up to date with these changes,
> > it
> > updates this column to the same value.
> > </column>
> > +
> > + <column name="nb_cfg_timestamp">
> > + The time at which <code>ovn-northd</code> last wrote
> > + <ref column="nb_cfg"/> to the southbound database, in milliseconds
> > + since the Unix epoch. Set atomically with each update to
> > + <ref column="nb_cfg"/>. Hypervisors read this value to measure
> > + end-to-end propagation latency from northbound commit to local
> > + datapath programming completion.
> > + </column>
> > </group>
> >
> > <group title="Common Columns">
>
> Regards,
> Dumitru
>
>_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev