Hi Mark, Thank you for your valuable comments! Could you please answer my main question that I originally raised to Dumitru :
Do you want to have this fix in main at all? I mean this is a real problem in the branch 22 but mostly is not affected the main branch. So, I will provide this fix for our running environment (which is v22) but not sure you want to have it in main. best regards, Alexander On 2/7/25 9:19 PM, Mark Michelson wrote: > Hi Aleksandr, thanks for the patch. > > This patch needs a rebase because it conflicts with > tests/ovn-northd.at in current main. > > See below for my findings. > > On 2/5/25 10:21, Aleksandr Smirnov wrote: >> According to the ovn-architecture(7) sb_cfg counter should be >> propagated at the moment northd completed transaction of related changes >> to the southbound db. However, a processing engine call happened right >> between two events, a sb transaction commitment, and initiating >> the sb_cfg write. Normally, that processing engine call has nothing >> to do, because it has only update from sb db that fires back result >> of recent transaction from northd. But if, by some reason, engine >> change handler falls into full recompute there will be big delay >> before sb_cfg propagated to nb db, and in fact it really happened >> in old release, f.e. 22.09 >> >> The fix proposed defers engine run for one iteration (of main loop) >> if we have sb_cfg ready to propagate. >> >> Signed-off-by: Aleksandr Smirnov <alekssmirnov@k2.cloud> >> --- >> northd/inc-proc-northd.c | 1 + >> northd/ovn-northd.c | 22 ++++++++++++++++++++-- >> tests/ovn-northd.at | 23 +++++++++++++++++++++++ >> 3 files changed, 44 insertions(+), 2 deletions(-) >> >> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c >> index 4a1d4eac9..6f348e610 100644 >> --- a/northd/inc-proc-northd.c >> +++ b/northd/inc-proc-northd.c >> @@ -480,6 +480,7 @@ bool inc_proc_northd_run(struct ovsdb_idl_txn >> *ovnnb_txn, >> VLOG_DBG("engine was canceled, force recompute next time."); >> engine_set_force_recompute_immediate(); >> } else { >> + VLOG_DBG("Engine has ran successfully."); > > I think you can test your change without the need for this new debug > message. > > In lib/inc-proc-eng.c, there is a debug message that prints for every > run of the engine: "Initializing new run". I think it makes more sense > to use the existing message since it prints for all engine runs, not > just successful ones. This is important since your test is trying to > ensure the engine does not attempt to run when we have a pending > change to NB_Global:sb_cfg . > >> engine_clear_force_recompute(); >> } >> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c >> index d3470d94c..e6adb76d7 100644 >> --- a/northd/ovn-northd.c >> +++ b/northd/ovn-northd.c >> @@ -798,6 +798,16 @@ run_memory_trimmer(struct ovsdb_idl *ovnnb_idl, >> bool activity) >> memory_trimmer_wait(mt); >> } >> +static bool >> +sb_cfg_is_updated(struct ovsdb_idl *ovnnb_idl, >> + struct ovsdb_idl_loop *sb_loop) >> +{ >> + const struct nbrec_nb_global *nb = >> nbrec_nb_global_first(ovnnb_idl); >> + >> + return (nb && sb_loop->cur_cfg && nb->sb_cfg != sb_loop->cur_cfg) >> + ? true : false; >> +} >> + >> int >> main(int argc, char *argv[]) >> { >> @@ -1052,8 +1062,16 @@ main(int argc, char *argv[]) >> if (ovnnb_txn && ovnsb_txn && >> inc_proc_northd_can_run(&eng_ctx)) { >> int64_t loop_start_time = time_wall_msec(); >> - activity = inc_proc_northd_run(ovnnb_txn, >> ovnsb_txn, >> - &eng_ctx); >> + >> + if (!sb_cfg_is_updated(ovnnb_idl_loop.idl, >> + &ovnsb_idl_loop)) { >> + activity = inc_proc_northd_run(ovnnb_txn, >> + ovnsb_txn, >> + &eng_ctx); >> + } else { >> + clear_idl_track = false; > > I think you should add a call to poll_immediate_wake() here. In > theory, we should write the new sb_cfg value to the northbound > database. Then that should trigger an update from the northbound > database that will wake up northd and process what was skipped in this > loop. However, since we also know that we definitely have data at hand > that the incremental engine needs to process, we should not rely on > the database transactions to work as expected before we process that > data. Calling poll_immediate_wake() will ensure the loop wakes up > immediately and performs an incremental engine run. > >> + } >> + >> check_and_add_supported_dhcp_opts_to_sb_db( >> ovnsb_txn, ovnsb_idl_loop.idl); >> check_and_add_supported_dhcpv6_opts_to_sb_db( >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >> index 7abd4d3f3..ae6c999f9 100644 >> --- a/tests/ovn-northd.at >> +++ b/tests/ovn-northd.at >> @@ -14579,3 +14579,26 @@ wait_row_count Multicast_Group 0 >> name=239.0.1.68 >> OVN_CLEANUP([hv1], [hv2]) >> AT_CLEANUP >> ]) >> + >> + >> + >> +###################################333 >> +#####################################33333 > > Please use a single blank line to separate test cases. There is no > need for custom separators. > >> +OVN_FOR_EACH_NORTHD_NO_HV([ >> +AT_SETUP([sb_cfg propagation]) >> + > > This test could use a comment explaining what its purpose is, because > the code itself is fairly esoteric without the accompanying commit > message. > >> +ovn_start >> + >> +rm -f northd/ovn-northd.log >> +check as northd ovn-appctl -t ovn-northd vlog/reopen >> +check as northd ovn-appctl -t ovn-northd vlog/set jsonrpc:dbg >> +check as northd ovn-appctl -t ovn-northd vlog/set inc_proc_northd:dbg >> + >> +check ovn-nbctl ls-add sw1 -- set nb_global . nb_cfg=1 >> +wait_column 1 nb:nb_global sb_cfg > > Instead of using wait_column here, you should be able to use the > --wait=sb option in the previous ovn-nbctl command. > > check ovn-nbctl --wait=sb ls-add sw1 -- set nb_global . nb_cfg=1 > > Also, out of curiosity, why are you adding a switch? > >> + >> +call_seq=`grep -E "(transact.*sb_cfg)|(transact.*Southbound)|(has >> ran)" northd/ovn-northd.log | cut -d "|" -f 5- | cut -b 1 | tr -d '\n'` >> +AT_CHECK([test x`echo $call_seq` = xEEuuE]) > > I'm worried that the "E's" in this string could end up changing due to > unrelated code changes that happen either in northd or in the > incremental engine. It's possible we could end up seeing more or fewer > engine runs than what the test is checking for. > > What's most important is that there are no "E's" between the two > "u's". So maybe instead of looking for an exact match of "EEuuE", you > could simply ensure that you see "uu" together in the output string? > >> + >> +AT_CLEANUP >> +]) > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev