On 6/2/25 1:33 PM, Felix Huettner via dev wrote: > When starting the ovn-controller without monitor_all we need a few > iterations of the engine node and updating monitoring conditions until > we have loaded all data. > Previously this was handled by the number of main loop iterations of > ovn-controller and an arbitrary timeout. > However this is was always best effort and there was no guarantee that > we actually had loaded all data. > > To solve this we now track the monitoring requests we sent out to > southbound and when that data becomes available. > > For monitor_all we just need to have the initial monitoring conditions > fullfilled and are afterwards sure that we have all required data. > > For non monitor_all we need at least one more iteration of the engine > node and updating sb monitoring conditions. We limit this to 20 rounds > for now, which should be sufficient and is at least the value of what > the previous implementation provided. > However if we notice that we do not update monitoring condition anymore > after a engine run we can also assume that we have finished loading all > data. > > Signed-off-by: Felix Huettner <felix.huettner@stackit.cloud> > --- > v9->v10: countdown only if we had actually a change to compute changes > v8->v9: addressed reviews > v7->v8: introduced >
Hi Felix, Thanks for the patch, it looks good to me. I think it addresses Ilya's and Han's concerns from other similar patches: https://patchwork.ozlabs.org/project/ovn/patch/20220822091918.2804852-1-i.maxim...@ovn.org/ https://patchwork.ozlabs.org/comment/3510940/ I went ahead and applied it with the following minor changes (removed the debug appctl as we don't use it anymore and refactored a bit the code to make it slightly more readable - in my opinion). I also took care of Ales' comment and added his ack. diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 1671baa189..37b88664a9 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -113,7 +113,6 @@ static unixctl_cb_func debug_dump_lflow_conj_ids; static unixctl_cb_func lflow_cache_flush_cmd; static unixctl_cb_func lflow_cache_show_stats_cmd; static unixctl_cb_func debug_delay_nb_cfg_report; -static unixctl_cb_func debug_ignore_startup_delay; #define DEFAULT_BRIDGE_NAME "br-int" #define DEFAULT_DATAPATH "system" @@ -6066,9 +6065,6 @@ main(int argc, char *argv[]) unixctl_command_register("debug/dump-mac-bindings", "", 0, 0, debug_dump_local_mac_bindings, &mac_cache_data->mac_bindings); - - unixctl_command_register("debug/ignore-startup-delay", "", 0, 0, - debug_ignore_startup_delay, NULL); ovn_debug_commands_register(); unsigned int ovs_cond_seqno = UINT_MAX; @@ -6153,8 +6149,8 @@ main(int argc, char *argv[]) * If we have sb_monitor_all that means we have all data that we would * ever need. * In other cases we depend on engine runs. This is handled below. */ - if (ovnsb_cond_seqno == ovnsb_expected_cond_seqno && - ovnsb_expected_cond_seqno != UINT_MAX && sb_monitor_all) { + if (ovnsb_cond_seqno == ovnsb_expected_cond_seqno + && ovnsb_expected_cond_seqno != UINT_MAX && sb_monitor_all) { daemon_started_recently_ignore(); } @@ -6443,14 +6439,17 @@ main(int argc, char *argv[]) daemon_started_recently_countdown(); } } - /* If there is no new expected seqno we have - * finished loading all needed data from - * southbound. We then need to run one more time since - * we might behave differently. */ - if (daemon_started_recently() - && ovnsb_cond_seqno == ovnsb_expected_cond_seqno) { - daemon_started_recently_ignore(); - poll_immediate_wake(); + /* If there is no new expected seqno we have finished + * loading all needed data from southbound. We then + * need to run one more time since we might behave + * differently. */ + if (daemon_started_recently()) { + bool condition_changed = ovnsb_cond_seqno != + ovnsb_expected_cond_seqno; + if (!condition_changed) { + daemon_started_recently_ignore(); + poll_immediate_wake(); + } } if (ovs_idl_txn) { update_qos(sbrec_port_binding_by_name, ovs_idl_txn, @@ -7133,11 +7132,3 @@ debug_dump_local_mac_bindings(struct unixctl_conn *conn, int argc OVS_UNUSED, unixctl_command_reply(conn, ds_cstr(&mb_str)); ds_destroy(&mb_str); } - -static void -debug_ignore_startup_delay(struct unixctl_conn *conn, int argc OVS_UNUSED, - const char *argv[] OVS_UNUSED, void *arg OVS_UNUSED) -{ - daemon_started_recently_ignore(); - unixctl_command_reply(conn, NULL); -} -- Regards, Dumitru > controller/ovn-controller.c | 33 ++++++++++++++++++++++++++++++--- > lib/ovn-util.c | 19 ++++++++++++++----- > tests/ovn-controller.at | 1 - > tests/ovn-performance.at | 3 --- > tests/ovn.at | 10 ---------- > 5 files changed, 44 insertions(+), 22 deletions(-) > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 10c7ffa26..1671baa18 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -6148,6 +6148,16 @@ main(int argc, char *argv[]) > ovnsb_cond_seqno = new_ovnsb_cond_seqno; > } > > + /* Check if we have received all initial dumps of the southbound > + * based on the monitor condtions we set. > + * If we have sb_monitor_all that means we have all data that we > would > + * ever need. > + * In other cases we depend on engine runs. This is handled below. */ > + if (ovnsb_cond_seqno == ovnsb_expected_cond_seqno && > + ovnsb_expected_cond_seqno != UINT_MAX && sb_monitor_all) { > + daemon_started_recently_ignore(); > + } > + > struct engine_context eng_ctx = { > .ovs_idl_txn = ovs_idl_txn, > .ovnsb_idl_txn = ovnsb_idl_txn, > @@ -6305,9 +6315,6 @@ main(int argc, char *argv[]) > > stopwatch_stop(CONTROLLER_LOOP_STOPWATCH_NAME, > time_msec()); > - if (engine_has_updated()) { > - daemon_started_recently_countdown(); > - } > > ct_zones_data = engine_get_data(&en_ct_zones); > bfd_chassis_data = engine_get_data(&en_bfd_chassis); > @@ -6417,6 +6424,8 @@ main(int argc, char *argv[]) > * logical datapath goups changed. */ > if (engine_node_changed(&en_runtime_data) > || engine_node_changed(&en_sb_logical_dp_group)) > { > + bool had_all_data = ovnsb_cond_seqno == > + ovnsb_expected_cond_seqno; > ovnsb_expected_cond_seqno = > update_sb_monitors( > ovnsb_idl_loop.idl, chassis, > @@ -6424,6 +6433,24 @@ main(int argc, char *argv[]) > &runtime_data->lbinding_data.bindings, > &runtime_data->local_datapaths, > sb_monitor_all); > + bool condition_changed = ovnsb_cond_seqno != > + > ovnsb_expected_cond_seqno; > + if (had_all_data && condition_changed) { > + /* We limit the amount of condition updates > + * that we treat as daemon_started_recently. > + * This allows us to proceed even if there is > + * a continuous reason for monitor updates. > */ > + daemon_started_recently_countdown(); > + } > + } > + /* If there is no new expected seqno we have > + * finished loading all needed data from > + * southbound. We then need to run one more time > since > + * we might behave differently. */ > + if (daemon_started_recently() > + && ovnsb_cond_seqno == > ovnsb_expected_cond_seqno) { > + daemon_started_recently_ignore(); > + poll_immediate_wake(); > } > if (ovs_idl_txn) { > update_qos(sbrec_port_binding_by_name, > ovs_idl_txn, > diff --git a/lib/ovn-util.c b/lib/ovn-util.c > index b3d2d5dd7..9f0bc99ba 100644 > --- a/lib/ovn-util.c > +++ b/lib/ovn-util.c > @@ -937,13 +937,24 @@ get_bridge(const struct ovsrec_bridge_table > *bridge_table, const char *br_name) > return NULL; > } > > +/* This counts the amount of iterations of monitoring condition updates and > + * their respective update towards us. > + * This only needs to handle the case of non monitor_all since that will use > + * the ignore feature below. > + * In this case we need at least 2 iterations: > + * 1. the initial iteration where we pull the sb content based on the few > + * conditions we now have. > + * 2. after the first engine run we have enough information to update the > + * monitoring conditions to their final values. > + * However based on the structure of the datapaths we might need more. The > + * value below is just a hard limit of iterations. We detect if we are done > + * earlier and then skip further iterations. */ > #define DAEMON_STARTUP_DELAY_SEED 20 > -#define DAEMON_STARTUP_DELAY_MS 10000 > > static int64_t startup_ts; > static int startup_delay = DAEMON_STARTUP_DELAY_SEED; > > -/* Used by debug command only, for tests. */ > +/* Used if we do not need the startup delay (e.g. when using monitor_all). */ > static bool ignore_startup_delay = false; > > OVS_CONSTRUCTOR(startup_ts_initializer) { > @@ -984,9 +995,7 @@ daemon_started_recently(void) > if (startup_delay) { > return true; > } > - > - /* Ensure that at least an amount of time has passed. */ > - return time_wall_msec() - startup_ts <= DAEMON_STARTUP_DELAY_MS; > + return false; > } > > /* Builds a unique address set compatible name ([a-zA-Z_.][a-zA-Z_.0-9]*) > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at > index ad2f4e3ab..ae7eb6f31 100644 > --- a/tests/ovn-controller.at > +++ b/tests/ovn-controller.at > @@ -161,7 +161,6 @@ check_patches \ > 'br-eth0 patch-localnet2-to-br-int patch-br-int-to-localnet2' > > # Delete the mapping and the ovn-bridge-mapping patch ports should go away. > -check ovn-appctl -t ovn-controller debug/ignore-startup-delay > AT_CHECK([ovs-vsctl remove Open_vSwitch . external-ids ovn-bridge-mappings]) > check_bridge_mappings > check_patches > diff --git a/tests/ovn-performance.at b/tests/ovn-performance.at > index 44bd4f7e4..4ddb1bd81 100644 > --- a/tests/ovn-performance.at > +++ b/tests/ovn-performance.at > @@ -238,9 +238,6 @@ for i in `seq 1 5`; do > as hv$i > ovs-vsctl add-br br-phys > ovn_attach n1 br-phys 192.168.0.$i > - # Do not postpone patch ports related changes, as they may occur more or > less anywere within the test, > - # hence causing unexpected recomputes. > - check ovn-appctl -t ovn-controller debug/ignore-startup-delay > if [[ $i -ge 3 ]] ; then > ovs-vsctl add-br br-ex > ovs-vsctl set open . external_ids:ovn-bridge-mappings="public:br-ex" > diff --git a/tests/ovn.at b/tests/ovn.at > index 46a0f09bc..bc58f8650 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -35309,7 +35309,6 @@ OVS_WAIT_UNTIL([ > test x430 = x$(as hv1 ovs-vsctl get Interface ${iface1_uuid} mtu_request) > ]) > > -as hv1 check ovn-appctl -t ovn-controller debug/ignore-startup-delay > # Check that pointing requested-chassis somewhere else will unplug the port > check ovn-nbctl --wait=hv set Logical_Switch_Port lsp1 \ > options:requested-chassis=non-existent-chassis > @@ -35317,7 +35316,6 @@ OVS_WAIT_UNTIL([ > ! as hv1 ovs-vsctl get Interface ${iface1_uuid} _uuid > ]) > > -as hv2 check ovn-appctl -t ovn-controller debug/ignore-startup-delay > # Check that removing an lport will unplug it > AT_CHECK([test x${iface2_uuid} = x$(as hv2 ovs-vsctl get Interface > ${iface2_uuid} _uuid)], [0], []) > check ovn-nbctl --wait=hv lsp-del ${lsp2_uuid} > @@ -38081,9 +38079,6 @@ ovs-vsctl -- add-port br-phys-2 fake-phys-2-to-int-2 > -- \ > > ovn_attach n1 br-phys 192.168.1.1 24 > > -# allow controller to immediately clean patch ports up > -check ovn-appctl -t ovn-controller debug/ignore-startup-delay > - > # check that patch port on br-int is cleaned up (and also its peer) > OVS_WAIT_UNTIL([ > test 0 = $(ovs-vsctl --columns _uuid --bare find Port > name=fake-int-to-phys | wc -l) > @@ -38149,9 +38144,6 @@ check ovn-nbctl lsp-set-options ln_port > network_name=phys-1 > wait_for_ports_up > check ovn-nbctl --wait=hv sync > > -# Allow controller to immediately clean patch ports up if any. > -check ovn-appctl -t ovn-controller debug/ignore-startup-delay > - > # check that patch port created on br-int and br-phys-1. > OVS_WAIT_UNTIL([ > test 1 = $(ovs-vsctl --columns _uuid --bare find Port > name=patch-br-int-to-ln_port | wc -l) > @@ -41416,8 +41408,6 @@ check ovn-nbctl lsp-add sw0 sw0p1 > > check ovn-nbctl --wait=hv sync > > -check ovn-appctl -t ovn-controller debug/ignore-startup-delay > - > # Waits until the OVS database contains exactly the specified patch ports. > # Each argument should be of the form BRIDGE PORT PEER. > check_patch_ports () { _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev