On Tue, Aug 15, 2023 at 5:17 PM Dumitru Ceara <[email protected]> wrote:
> On 8/15/23 17:01, Ales Musil wrote: > > Add config option called "northd-backoff-interval-ms" that allows > > to delay northd engine runs capped by the config option. > > When the config option is set to 0 or unspecified, the engine > > will run without any restrictions. If the value >0 we will delay > > northd engine run by the previous run time capped by the > > config option. > > > > The reason to delay northd is to prevent it from consuming 100% > > CPU all the time, secondary to that is the batch of NB updates > > that could be processed in single northd run. With the recent > > changes to I-P the engine processing updates faster, but not > > quite fast enough for processing to be faster than NB changes, > > which in turn can result into northd never going to sleep. With > > the backoff period enabled northd can sleep and process the DB > > updates in bigger batch. > > > > In addition to process the updates as fast as possible wake the > > northd immediately if there are changes accumulated over period > > of 500 ms or bigger. > > > > The results are very noticeable during scale testing. > > Run without any backoff period: > > northd aggregate CPU 9810% avg / 12765% max > > northd was spinning at 100% CPU the entire second half of the test. > > > > Run with 200 ms max backoff period: > > northd aggregate CPU 6066% avg / 7689% max > > northd was around 60% for the second half of the test. > > > > One thing to note is that the overall latency was slightly > > increased from P99 4s to P994.1s. > > > > Signed-off-by: Ales Musil <[email protected]> > > --- > > NEWS | 2 ++ > > northd/inc-proc-northd.c | 23 ++++++++++++++++++++++- > > northd/inc-proc-northd.h | 3 ++- > > northd/ovn-northd.c | 15 +++++++++++++-- > > ovn-nb.xml | 9 +++++++++ > > 5 files changed, 48 insertions(+), 4 deletions(-) > > > > diff --git a/NEWS b/NEWS > > index 8275877f9..6109f13a2 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -10,6 +10,8 @@ Post v23.06.0 > > - To allow optimizing ovn-controller's monitor conditions for the > regular > > VIF case, ovn-controller now unconditionally monitors all sub-ports > > (ports with parent_port set). > > + - Add "northd-backoff-interval-ms" config option to delay northd > engine > > + runs capped at the set value. > > > > OVN v23.06.0 - 01 Jun 2023 > > -------------------------- > > diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c > > index d328deb22..87db50ad1 100644 > > --- a/northd/inc-proc-northd.c > > +++ b/northd/inc-proc-northd.c > > @@ -42,6 +42,8 @@ VLOG_DEFINE_THIS_MODULE(inc_proc_northd); > > > > static unixctl_cb_func chassis_features_list; > > > > +static int64_t next_northd_run_ms = 0; > > + > > #define NB_NODES \ > > NB_NODE(nb_global, "nb_global") \ > > NB_NODE(logical_switch, "logical_switch") \ > > @@ -295,8 +297,10 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb, > > /* Returns true if the incremental processing ended up updating nodes. > */ > > bool inc_proc_northd_run(struct ovsdb_idl_txn *ovnnb_txn, > > struct ovsdb_idl_txn *ovnsb_txn, > > - bool recompute) { > > + bool recompute, uint32_t backoff_ms) { > > ovs_assert(ovnnb_txn && ovnsb_txn); > > + > > + int64_t start = time_msec(); > > engine_init_run(); > > > > /* Force a full recompute if instructed to, for example, after a > NB/SB > > @@ -330,6 +334,12 @@ bool inc_proc_northd_run(struct ovsdb_idl_txn > *ovnnb_txn, > > } else { > > engine_set_force_recompute(false); > > } > > + > > + int64_t now = time_msec(); > > + /* Postpone the next run by length of current run with maximum > capped > > + * by "northd-backoff-interval-ms" interval. */ > > + next_northd_run_ms = now + MIN(now - start, backoff_ms); > > + > > return engine_has_updated(); > > } > > > > @@ -339,6 +349,17 @@ void inc_proc_northd_cleanup(void) > > engine_set_context(NULL); > > } > > > > +bool > > +inc_proc_northd_can_run(bool recompute) > > +{ > > + if (recompute || time_msec() >= next_northd_run_ms) { > > + return true; > > + } > > + > > + poll_timer_wait_until(next_northd_run_ms); > > + return false; > > +} > > + > > static void > > chassis_features_list(struct unixctl_conn *conn, int argc OVS_UNUSED, > > const char *argv[] OVS_UNUSED, void *features_) > > diff --git a/northd/inc-proc-northd.h b/northd/inc-proc-northd.h > > index 9b81c7ee0..af418d7d7 100644 > > --- a/northd/inc-proc-northd.h > > +++ b/northd/inc-proc-northd.h > > @@ -10,7 +10,8 @@ void inc_proc_northd_init(struct ovsdb_idl_loop *nb, > > struct ovsdb_idl_loop *sb); > > bool inc_proc_northd_run(struct ovsdb_idl_txn *ovnnb_txn, > > struct ovsdb_idl_txn *ovnsb_txn, > > - bool recompute); > > + bool recompute, uint32_t backoff_ms); > > void inc_proc_northd_cleanup(void); > > +bool inc_proc_northd_can_run(bool recompute); > > > > #endif /* INC_PROC_NORTHD */ > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > > index 4fa1b039e..2da06715f 100644 > > --- a/northd/ovn-northd.c > > +++ b/northd/ovn-northd.c > > @@ -710,6 +710,12 @@ run_idl_loop(struct ovsdb_idl_loop *idl_loop, const > char *name) > > VLOG(duration > 500 ? VLL_INFO : VLL_DBG, > > "%s IDL run: %d iterations in %lld ms", name, n + 1, > duration); > > } > > + > > + /* Wake the northd immediately if the duration exceeds 500 ms. */ > > + if (duration >= 500) { > > + poll_immediate_wake(); > > + } > > + > > Is this good enough or, more precisely, will this allow northd to run > the I-P engine next time it wakes up? On a closer look I think not, right? > You are right I misunderstood it will be fixed in v4. > > > return txn; > > } > > > > @@ -868,6 +874,7 @@ main(int argc, char *argv[]) > > /* Main loop. */ > > exiting = false; > > > > + uint32_t northd_backoff_ms = 0; > > bool recompute = false; > > while (!exiting) { > > update_ssl_config(); > > @@ -932,10 +939,12 @@ main(int argc, char *argv[]) > > > > if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) { > > bool activity = false; > > - if (ovnnb_txn && ovnsb_txn) { > > + if (ovnnb_txn && ovnsb_txn && > > + inc_proc_northd_can_run(recompute)) { > > int64_t loop_start_time = time_wall_msec(); > > activity = inc_proc_northd_run(ovnnb_txn, ovnsb_txn, > > - recompute); > > + recompute, > > + northd_backoff_ms); > > recompute = false; > > check_and_add_supported_dhcp_opts_to_sb_db( > > ovnsb_txn, ovnsb_idl_loop.idl); > > @@ -1019,6 +1028,8 @@ main(int argc, char *argv[]) > > if (nb) { > > interval = smap_get_int(&nb->options, > "northd_probe_interval", > > interval); > > + northd_backoff_ms = smap_get_uint(&nb->options, > > + > "northd-backoff-interval-ms", 0); > > } > > set_idl_probe_interval(ovnnb_idl_loop.idl, ovnnb_db, interval); > > set_idl_probe_interval(ovnsb_idl_loop.idl, ovnsb_db, interval); > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > index 4fbf4f7e5..bca280367 100644 > > --- a/ovn-nb.xml > > +++ b/ovn-nb.xml > > @@ -349,6 +349,15 @@ > > of HWOL compatibility with GDP. > > </column> > > > > + <column name="options" key="northd-backoff-interval-ms"> > > + Maximum interval that the northd incremental engine is delayed > by > > + in milliseconds. Setting the value to nonzero delays the next > northd > > + engine run by the previous run time, capped by the specified > value. > > + If the value is zero the engine won't be delayed at all. > > + The recommended period is smaller than 500 ms, beyond that the > latency > > + of SB changes would be very noticeable. > > + </column> > > + > > <group title="Options for configuring interconnection route > advertisement"> > > <p> > > These options control how routes are advertised between OVN > > Thanks, Ales -- Ales Musil Senior Software Engineer - OVN Core Red Hat EMEA <https://www.redhat.com> [email protected] IM: amusil <https://red.ht/sig> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
