Hi Mark,

Thank you for your valuable comments!
Could you please answer my main question that I originally raised to 
Dumitru :

Do you want to have this fix in main at all?

I mean this is a real problem in the branch 22 but mostly is not 
affected the main branch.
So, I will provide this fix for our running environment (which is v22) 
but not sure you want to have it in main.

best regards,

Alexander

On 2/7/25 9:19 PM, Mark Michelson wrote:
> Hi Aleksandr, thanks for the patch.
>
> This patch needs a rebase because it conflicts with 
> tests/ovn-northd.at in current main.
>
> See below for my findings.
>
> On 2/5/25 10:21, Aleksandr Smirnov wrote:
>> According to the ovn-architecture(7) sb_cfg counter should be
>> propagated at the moment northd completed transaction of related changes
>> to the southbound db. However, a processing engine call happened right
>> between two events, a sb transaction commitment, and initiating
>> the sb_cfg write. Normally, that processing engine call has nothing
>> to do, because it has only update from sb db that fires back result
>> of recent transaction from northd. But if, by some reason, engine
>> change handler falls into full recompute there will be big delay
>> before sb_cfg propagated to nb db, and in fact it really happened
>> in old release, f.e. 22.09
>>
>> The fix proposed defers engine run for one iteration (of main loop)
>> if we have sb_cfg ready to propagate.
>>
>> Signed-off-by: Aleksandr Smirnov <alekssmirnov@k2.cloud>
>> ---
>>   northd/inc-proc-northd.c |  1 +
>>   northd/ovn-northd.c      | 22 ++++++++++++++++++++--
>>   tests/ovn-northd.at      | 23 +++++++++++++++++++++++
>>   3 files changed, 44 insertions(+), 2 deletions(-)
>>
>> diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
>> index 4a1d4eac9..6f348e610 100644
>> --- a/northd/inc-proc-northd.c
>> +++ b/northd/inc-proc-northd.c
>> @@ -480,6 +480,7 @@ bool inc_proc_northd_run(struct ovsdb_idl_txn 
>> *ovnnb_txn,
>>           VLOG_DBG("engine was canceled, force recompute next time.");
>>           engine_set_force_recompute_immediate();
>>       } else {
>> +        VLOG_DBG("Engine has ran successfully.");
>
> I think you can test your change without the need for this new debug 
> message.
>
> In lib/inc-proc-eng.c, there is a debug message that prints for every 
> run of the engine: "Initializing new run". I think it makes more sense 
> to use the existing message since it prints for all engine runs, not 
> just successful ones. This is important since your test is trying to 
> ensure the engine does not attempt to run when we have a pending 
> change to NB_Global:sb_cfg .
>
>>           engine_clear_force_recompute();
>>       }
>>   diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>> index d3470d94c..e6adb76d7 100644
>> --- a/northd/ovn-northd.c
>> +++ b/northd/ovn-northd.c
>> @@ -798,6 +798,16 @@ run_memory_trimmer(struct ovsdb_idl *ovnnb_idl, 
>> bool activity)
>>       memory_trimmer_wait(mt);
>>   }
>>   +static bool
>> +sb_cfg_is_updated(struct ovsdb_idl *ovnnb_idl,
>> +                  struct ovsdb_idl_loop *sb_loop)
>> +{
>> +    const struct nbrec_nb_global *nb = 
>> nbrec_nb_global_first(ovnnb_idl);
>> +
>> +    return (nb && sb_loop->cur_cfg && nb->sb_cfg != sb_loop->cur_cfg)
>> +           ? true : false;
>> +}
>> +
>>   int
>>   main(int argc, char *argv[])
>>   {
>> @@ -1052,8 +1062,16 @@ main(int argc, char *argv[])
>>                   if (ovnnb_txn && ovnsb_txn &&
>>                       inc_proc_northd_can_run(&eng_ctx)) {
>>                       int64_t loop_start_time = time_wall_msec();
>> -                    activity = inc_proc_northd_run(ovnnb_txn, 
>> ovnsb_txn,
>> - &eng_ctx);
>> +
>> +                    if (!sb_cfg_is_updated(ovnnb_idl_loop.idl,
>> + &ovnsb_idl_loop)) {
>> +                        activity = inc_proc_northd_run(ovnnb_txn,
>> + ovnsb_txn,
>> + &eng_ctx);
>> +                    } else {
>> +                        clear_idl_track = false;
>
> I think you should add a call to poll_immediate_wake() here. In 
> theory, we should write the new sb_cfg value to the northbound 
> database. Then that should trigger an update from the northbound 
> database that will wake up northd and process what was skipped in this 
> loop. However, since we also know that we definitely have data at hand 
> that the incremental engine needs to process, we should not rely on 
> the database transactions to work as expected before we process that 
> data. Calling poll_immediate_wake() will ensure the loop wakes up 
> immediately and performs an incremental engine run.
>
>> +                    }
>> +
>> check_and_add_supported_dhcp_opts_to_sb_db(
>>                                    ovnsb_txn, ovnsb_idl_loop.idl);
>> check_and_add_supported_dhcpv6_opts_to_sb_db(
>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>> index 7abd4d3f3..ae6c999f9 100644
>> --- a/tests/ovn-northd.at
>> +++ b/tests/ovn-northd.at
>> @@ -14579,3 +14579,26 @@ wait_row_count Multicast_Group 0 
>> name=239.0.1.68
>>   OVN_CLEANUP([hv1], [hv2])
>>   AT_CLEANUP
>>   ])
>> +
>> +
>> +
>> +###################################333
>> +#####################################33333
>
> Please use a single blank line to separate test cases. There is no 
> need for custom separators.
>
>> +OVN_FOR_EACH_NORTHD_NO_HV([
>> +AT_SETUP([sb_cfg propagation])
>> +
>
> This test could use a comment explaining what its purpose is, because 
> the code itself is fairly esoteric without the accompanying commit 
> message.
>
>> +ovn_start
>> +
>> +rm -f northd/ovn-northd.log
>> +check as northd ovn-appctl -t ovn-northd vlog/reopen
>> +check as northd ovn-appctl -t ovn-northd vlog/set jsonrpc:dbg
>> +check as northd ovn-appctl -t ovn-northd vlog/set inc_proc_northd:dbg
>> +
>> +check ovn-nbctl ls-add sw1 -- set nb_global . nb_cfg=1
>> +wait_column 1 nb:nb_global sb_cfg
>
> Instead of using wait_column here, you should be able to use the 
> --wait=sb option in the previous ovn-nbctl command.
>
> check ovn-nbctl --wait=sb ls-add sw1 -- set nb_global . nb_cfg=1
>
> Also, out of curiosity, why are you adding a switch?
>
>> +
>> +call_seq=`grep -E "(transact.*sb_cfg)|(transact.*Southbound)|(has 
>> ran)" northd/ovn-northd.log | cut -d "|" -f 5- | cut -b 1 | tr -d '\n'`
>> +AT_CHECK([test x`echo $call_seq` = xEEuuE])
>
> I'm worried that the "E's" in this string could end up changing due to 
> unrelated code changes that happen either in northd or in the 
> incremental engine. It's possible we could end up seeing more or fewer 
> engine runs than what the test is checking for.
>
> What's most important is that there are no "E's" between the two 
> "u's". So maybe instead of looking for an exact match of "EEuuE", you 
> could simply ensure that you see "uu" together in the output string?
>
>> +
>> +AT_CLEANUP
>> +])
>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to