On Mon, Feb 10, 2025 at 4:40 AM Aleksandr Smirnov (K2 Cloud) <AleksSmirnov@k2.cloud> wrote:
> Hi Mark, > > Thank you for your valuable comments! > Could you please answer my main question that I originally raised to > Dumitru : > > Do you want to have this fix in main at all? > > I mean this is a real problem in the branch 22 but mostly is not > affected the main branch. > So, I will provide this fix for our running environment (which is v22) > but not sure you want to have it in main. > I think to answer your question, I need to know why this is not a problem in the main branch, currently. If the reason is that engine runs tend to be quicker, then it may be worth puting in main anyway. If there are other mitigating circumstances, then I'm not as sure. > > best regards, > > Alexander > > On 2/7/25 9:19 PM, Mark Michelson wrote: > > Hi Aleksandr, thanks for the patch. > > > > This patch needs a rebase because it conflicts with > > tests/ovn-northd.at in current main. > > > > See below for my findings. > > > > On 2/5/25 10:21, Aleksandr Smirnov wrote: > >> According to the ovn-architecture(7) sb_cfg counter should be > >> propagated at the moment northd completed transaction of related changes > >> to the southbound db. However, a processing engine call happened right > >> between two events, a sb transaction commitment, and initiating > >> the sb_cfg write. Normally, that processing engine call has nothing > >> to do, because it has only update from sb db that fires back result > >> of recent transaction from northd. But if, by some reason, engine > >> change handler falls into full recompute there will be big delay > >> before sb_cfg propagated to nb db, and in fact it really happened > >> in old release, f.e. 22.09 > >> > >> The fix proposed defers engine run for one iteration (of main loop) > >> if we have sb_cfg ready to propagate. > >> > >> Signed-off-by: Aleksandr Smirnov <alekssmirnov@k2.cloud> > >> --- > >> northd/inc-proc-northd.c | 1 + > >> northd/ovn-northd.c | 22 ++++++++++++++++++++-- > >> tests/ovn-northd.at | 23 +++++++++++++++++++++++ > >> 3 files changed, 44 insertions(+), 2 deletions(-) > >> > >> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c > >> index 4a1d4eac9..6f348e610 100644 > >> --- a/northd/inc-proc-northd.c > >> +++ b/northd/inc-proc-northd.c > >> @@ -480,6 +480,7 @@ bool inc_proc_northd_run(struct ovsdb_idl_txn > >> *ovnnb_txn, > >> VLOG_DBG("engine was canceled, force recompute next time."); > >> engine_set_force_recompute_immediate(); > >> } else { > >> + VLOG_DBG("Engine has ran successfully."); > > > > I think you can test your change without the need for this new debug > > message. > > > > In lib/inc-proc-eng.c, there is a debug message that prints for every > > run of the engine: "Initializing new run". I think it makes more sense > > to use the existing message since it prints for all engine runs, not > > just successful ones. This is important since your test is trying to > > ensure the engine does not attempt to run when we have a pending > > change to NB_Global:sb_cfg . > > > >> engine_clear_force_recompute(); > >> } > >> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > >> index d3470d94c..e6adb76d7 100644 > >> --- a/northd/ovn-northd.c > >> +++ b/northd/ovn-northd.c > >> @@ -798,6 +798,16 @@ run_memory_trimmer(struct ovsdb_idl *ovnnb_idl, > >> bool activity) > >> memory_trimmer_wait(mt); > >> } > >> +static bool > >> +sb_cfg_is_updated(struct ovsdb_idl *ovnnb_idl, > >> + struct ovsdb_idl_loop *sb_loop) > >> +{ > >> + const struct nbrec_nb_global *nb = > >> nbrec_nb_global_first(ovnnb_idl); > >> + > >> + return (nb && sb_loop->cur_cfg && nb->sb_cfg != sb_loop->cur_cfg) > >> + ? true : false; > >> +} > >> + > >> int > >> main(int argc, char *argv[]) > >> { > >> @@ -1052,8 +1062,16 @@ main(int argc, char *argv[]) > >> if (ovnnb_txn && ovnsb_txn && > >> inc_proc_northd_can_run(&eng_ctx)) { > >> int64_t loop_start_time = time_wall_msec(); > >> - activity = inc_proc_northd_run(ovnnb_txn, > >> ovnsb_txn, > >> - &eng_ctx); > >> + > >> + if (!sb_cfg_is_updated(ovnnb_idl_loop.idl, > >> + &ovnsb_idl_loop)) { > >> + activity = inc_proc_northd_run(ovnnb_txn, > >> + ovnsb_txn, > >> + &eng_ctx); > >> + } else { > >> + clear_idl_track = false; > > > > I think you should add a call to poll_immediate_wake() here. In > > theory, we should write the new sb_cfg value to the northbound > > database. Then that should trigger an update from the northbound > > database that will wake up northd and process what was skipped in this > > loop. However, since we also know that we definitely have data at hand > > that the incremental engine needs to process, we should not rely on > > the database transactions to work as expected before we process that > > data. Calling poll_immediate_wake() will ensure the loop wakes up > > immediately and performs an incremental engine run. > > > >> + } > >> + > >> check_and_add_supported_dhcp_opts_to_sb_db( > >> ovnsb_txn, ovnsb_idl_loop.idl); > >> check_and_add_supported_dhcpv6_opts_to_sb_db( > >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > >> index 7abd4d3f3..ae6c999f9 100644 > >> --- a/tests/ovn-northd.at > >> +++ b/tests/ovn-northd.at > >> @@ -14579,3 +14579,26 @@ wait_row_count Multicast_Group 0 > >> name=239.0.1.68 > >> OVN_CLEANUP([hv1], [hv2]) > >> AT_CLEANUP > >> ]) > >> + > >> + > >> + > >> +###################################333 > >> +#####################################33333 > > > > Please use a single blank line to separate test cases. There is no > > need for custom separators. > > > >> +OVN_FOR_EACH_NORTHD_NO_HV([ > >> +AT_SETUP([sb_cfg propagation]) > >> + > > > > This test could use a comment explaining what its purpose is, because > > the code itself is fairly esoteric without the accompanying commit > > message. > > > >> +ovn_start > >> + > >> +rm -f northd/ovn-northd.log > >> +check as northd ovn-appctl -t ovn-northd vlog/reopen > >> +check as northd ovn-appctl -t ovn-northd vlog/set jsonrpc:dbg > >> +check as northd ovn-appctl -t ovn-northd vlog/set inc_proc_northd:dbg > >> + > >> +check ovn-nbctl ls-add sw1 -- set nb_global . nb_cfg=1 > >> +wait_column 1 nb:nb_global sb_cfg > > > > Instead of using wait_column here, you should be able to use the > > --wait=sb option in the previous ovn-nbctl command. > > > > check ovn-nbctl --wait=sb ls-add sw1 -- set nb_global . nb_cfg=1 > > > > Also, out of curiosity, why are you adding a switch? > > > >> + > >> +call_seq=`grep -E "(transact.*sb_cfg)|(transact.*Southbound)|(has > >> ran)" northd/ovn-northd.log | cut -d "|" -f 5- | cut -b 1 | tr -d '\n'` > >> +AT_CHECK([test x`echo $call_seq` = xEEuuE]) > > > > I'm worried that the "E's" in this string could end up changing due to > > unrelated code changes that happen either in northd or in the > > incremental engine. It's possible we could end up seeing more or fewer > > engine runs than what the test is checking for. > > > > What's most important is that there are no "E's" between the two > > "u's". So maybe instead of looking for an exact match of "EEuuE", you > > could simply ensure that you see "uu" together in the output string? > > > >> + > >> +AT_CLEANUP > >> +]) > > > > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev