Re: [ovs-dev] [PATCH ovn, v3] northd: Remove delay in sb_cfg propagation.

Smirnov Aleksandr (K2 Cloud) Mon, 05 May 2025 02:26:54 -0700

Hello Mark,

Could you please review Dumitru's comment on poll_immediate_wake() usage?
I have no my own solid point of view on this.


Thank you,

Alexander


On 3/31/25 10:40 PM, Dumitru Ceara wrote:
> On 3/10/25 11:40 AM, Aleksandr Smirnov wrote:
>> According to the ovn-architecture(7) sb_cfg counter should be
>> propagated at the moment northd completed transaction of related changes
>> to the southbound db. However, a processing engine call happened right
>> between two events, a sb transaction commitment, and initiating
>> the sb_cfg write. Normally, that processing engine call has nothing
>> to do, because it has only update from sb db that fires back result
>> of recent transaction from northd. But if, by some reason, engine
>> change handler falls into full recompute there will be big delay
>> before sb_cfg propagated to nb db, and in fact it really happened
>> in old release, for example 22.09.
>>
>> The fix defers engine run for one iteration (of main loop)
>> if we have sb_cfg ready to propagate.
>>
>> Signed-off-by: Aleksandr Smirnov <alekssmirnov@k2.cloud>
>> ---
> Hi Aleksandr,
>
> Thanks for the patch and sorry for the delay in reviewing.
>
>> v2: Address Mark's comments.
>> v3: Address Vladislav's comments.
>> ---
>>   northd/ovn-northd.c | 22 ++++++++++++++++++++--
>>   tests/ovn-northd.at | 37 +++++++++++++++++++++++++++++++++++++
>>   2 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>> index 72eedbfdb..68a31d0b0 100644
>> --- a/northd/ovn-northd.c
>> +++ b/northd/ovn-northd.c
>> @@ -798,6 +798,15 @@ run_memory_trimmer(struct ovsdb_idl *ovnnb_idl, bool 
>> activity)
>>       memory_trimmer_wait(mt);
>>   }
>>   
>> +static bool
>> +sb_cfg_is_in_sync(struct ovsdb_idl *ovnnb_idl,
>> +                  struct ovsdb_idl_loop *sb_loop)
>> +{
>> +    const struct nbrec_nb_global *nb = nbrec_nb_global_first(ovnnb_idl);
>> +
>> +    return !(nb && sb_loop->cur_cfg && nb->sb_cfg != sb_loop->cur_cfg);
>> +}
>> +
>>   int
>>   main(int argc, char *argv[])
>>   {
>> @@ -1064,8 +1073,17 @@ main(int argc, char *argv[])
>>                   if (ovnnb_txn && ovnsb_txn &&
>>                       inc_proc_northd_can_run(&eng_ctx)) {
>>                       int64_t loop_start_time = time_wall_msec();
>> -                    activity = inc_proc_northd_run(ovnnb_txn, ovnsb_txn,
>> -                                                   &eng_ctx);
>> +
>> +                    if (sb_cfg_is_in_sync(ovnnb_idl_loop.idl,
>> +                                          &ovnsb_idl_loop)) {
>> +                        activity = inc_proc_northd_run(ovnnb_txn,
>> +                                                       ovnsb_txn,
>> +                                                       &eng_ctx);
>> +                    } else {
>> +                        poll_immediate_wake();
> I guess this call to poll_immediate_wake() was added after Mark's
> comment on v1 [0]:
>
>> I think you should add a call to poll_immediate_wake() here. In
>> theory, we should write the new sb_cfg value to the northbound
>> database. Then that should trigger an update from the northbound
>> database that will wake up northd and process what was skipped in this
>> loop. However, since we also know that we definitely have data at hand
>> that the incremental engine needs to process, we should not rely on
>> the database transactions to work as expected before we process that
>> data. Calling poll_immediate_wake() will ensure the loop wakes up
>> immediately and performs an incremental engine run.
> [0] https://mail.openvswitch.org/pipermail/ovs-dev/2025-February/420812.html
>
> However, even without it we should wake up for the next incoming event,
> whichever happens first; that is one of:
> - northd's SB transaction to bump SB_Global.nb_cfg (done by
> update_sequence_numbers() executed just below) completes
> - a different NB/SB update wakes us
> - a different scheduled poll wake timeout (e.g., mac binding refresh)
> expires
>
> Calling poll_immediate_wake() seems a little excessive but I might be
> wrong.  Am I missing a case here?
>
> Regards,
> Dumitru
>
>> +                        clear_idl_track = false;
>> +                    }
>> +
>>                       check_and_add_supported_dhcp_opts_to_sb_db(
>>                                    ovnsb_txn, ovnsb_idl_loop.idl);
>>                       check_and_add_supported_dhcpv6_opts_to_sb_db(
>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
>> index cfaba19bf..5db6d4984 100644
>> --- a/tests/ovn-northd.at
>> +++ b/tests/ovn-northd.at
>> @@ -16748,3 +16748,40 @@ check grep -q "Bad configuration: The peer of the 
>> switch port 'ls-lrp1' (LRP pee
>>   
>>   AT_CLEANUP
>>   ])
>> +
>> +OVN_FOR_EACH_NORTHD_NO_HV([
>> +AT_SETUP([sb_cfg propagation])
>> +ovn_start
>> +
>> +#
>> +# Test engine call does not happen between sb db transaction
>> +# commitment and sb_cfg write to the nb db (if have to)
>> +#
>> +> northd/ovn-northd.log
>> +check as northd ovn-appctl -t ovn-northd vlog/set jsonrpc:dbg
>> +check as northd ovn-appctl -t ovn-northd vlog/set inc_proc_eng:dbg
>> +# Any change that invoke engine processing, for example add logical switch.
>> +# nb_cfg bump must be present in order to get feedback as sb_cfg.
>> +check ovn-nbctl --wait=sb ls-add sw1
>> +#
>> +# Get following messages from log file:
>> +# 'unix... transact ... Southbound' --> Indicates the pack of updates has 
>> been
>> +#                                       committed to the sb db.
>> +# 'unix... transact ... sb_cfg'  --> Indicates the sb_cfg has been 
>> committed to
>> +#                                    the sb db.
>> +# 'Initializing new run' --> Indicates the engine has been called.
>> +#
>> +# Then, take first letter of messages, here 'u' or 'I' and form them into 
>> one
>> +# string.
>> +#
>> +# Finally, a 'good' string should have two 'u' conjuncted with several 'I'
>> +# both as prefix and suffix, while 'bad' string will have 'I' between two 
>> 'u'.
>> +#
>> +call_seq=`grep -E \
>> + "(transact.*sb_cfg)|(transact.*Southbound)|(Initializing new run)" \
>> + northd/ovn-northd.log | cut -d "|" -f 5- | cut -b 1 | tr -d '\n'`
>> +
>> +AT_CHECK([echo $call_seq | grep -qE "^I*uuI*$"], [0])
>> +
>> +AT_CLEANUP
>> +])


_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH ovn, v3] northd: Remove delay in sb_cfg propagation.

Reply via email to