Hi Mark, Aleksandr, On 5/5/25 3:55 PM, Mark Michelson wrote: > On 5/5/25 05:19, Smirnov Aleksandr (K2 Cloud) wrote: >> Hello Mark, >> >> Could you please review Dumitru's comment on poll_immediate_wake() usage? >> I have no my own solid point of view on this. >> >> Thank you, >> >> Alexander >> >> >> On 3/31/25 10:40 PM, Dumitru Ceara wrote: >>> On 3/10/25 11:40 AM, Aleksandr Smirnov wrote: >>>> According to the ovn-architecture(7) sb_cfg counter should be >>>> propagated at the moment northd completed transaction of related >>>> changes >>>> to the southbound db. However, a processing engine call happened right >>>> between two events, a sb transaction commitment, and initiating >>>> the sb_cfg write. Normally, that processing engine call has nothing >>>> to do, because it has only update from sb db that fires back result >>>> of recent transaction from northd. But if, by some reason, engine >>>> change handler falls into full recompute there will be big delay >>>> before sb_cfg propagated to nb db, and in fact it really happened >>>> in old release, for example 22.09. >>>> >>>> The fix defers engine run for one iteration (of main loop) >>>> if we have sb_cfg ready to propagate. >>>> >>>> Signed-off-by: Aleksandr Smirnov <alekssmirnov@k2.cloud> >>>> --- >>> Hi Aleksandr, >>> >>> Thanks for the patch and sorry for the delay in reviewing. >>> >>>> v2: Address Mark's comments. >>>> v3: Address Vladislav's comments. >>>> --- >>>> northd/ovn-northd.c | 22 ++++++++++++++++++++-- >>>> tests/ovn-northd.at | 37 +++++++++++++++++++++++++++++++++++++ >>>> 2 files changed, 57 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c >>>> index 72eedbfdb..68a31d0b0 100644 >>>> --- a/northd/ovn-northd.c >>>> +++ b/northd/ovn-northd.c >>>> @@ -798,6 +798,15 @@ run_memory_trimmer(struct ovsdb_idl *ovnnb_idl, >>>> bool activity) >>>> memory_trimmer_wait(mt); >>>> } >>>> +static bool >>>> +sb_cfg_is_in_sync(struct ovsdb_idl *ovnnb_idl, >>>> + struct ovsdb_idl_loop *sb_loop) >>>> +{ >>>> + const struct nbrec_nb_global *nb = >>>> nbrec_nb_global_first(ovnnb_idl); >>>> + >>>> + return !(nb && sb_loop->cur_cfg && nb->sb_cfg != sb_loop- >>>> >cur_cfg); >>>> +} >>>> + >>>> int >>>> main(int argc, char *argv[]) >>>> { >>>> @@ -1064,8 +1073,17 @@ main(int argc, char *argv[]) >>>> if (ovnnb_txn && ovnsb_txn && >>>> inc_proc_northd_can_run(&eng_ctx)) { >>>> int64_t loop_start_time = time_wall_msec(); >>>> - activity = inc_proc_northd_run(ovnnb_txn, >>>> ovnsb_txn, >>>> - &eng_ctx); >>>> + >>>> + if (sb_cfg_is_in_sync(ovnnb_idl_loop.idl, >>>> + &ovnsb_idl_loop)) { >>>> + activity = inc_proc_northd_run(ovnnb_txn, >>>> + ovnsb_txn, >>>> + &eng_ctx); >>>> + } else { >>>> + poll_immediate_wake(); >>> I guess this call to poll_immediate_wake() was added after Mark's >>> comment on v1 [0]: >>> >>>> I think you should add a call to poll_immediate_wake() here. In >>>> theory, we should write the new sb_cfg value to the northbound >>>> database. Then that should trigger an update from the northbound >>>> database that will wake up northd and process what was skipped in this >>>> loop. However, since we also know that we definitely have data at hand >>>> that the incremental engine needs to process, we should not rely on >>>> the database transactions to work as expected before we process that >>>> data. Calling poll_immediate_wake() will ensure the loop wakes up >>>> immediately and performs an incremental engine run. >>> [0] https://mail.openvswitch.org/pipermail/ovs-dev/2025- >>> February/420812.html >>> >>> However, even without it we should wake up for the next incoming event, >>> whichever happens first; that is one of: >>> - northd's SB transaction to bump SB_Global.nb_cfg (done by >>> update_sequence_numbers() executed just below) completes >>> - a different NB/SB update wakes us >>> - a different scheduled poll wake timeout (e.g., mac binding refresh) >>> expires >>> >>> Calling poll_immediate_wake() seems a little excessive but I might be >>> wrong. Am I missing a case here? > > I don't think you're missing a case here. My idea for calling > poll_immediate_wake() was couched in paranoia and principle. > > On the paranoia side, you've listed the expected ways that the loop > should be woken up based on current code. However, if any bugs should be > introduced to those systems, or if there are database connection issues > [1], or there is some change to how the mechanisms work, and they end up > not waking the loop up, then our reliance on them may cause problems. > > On the principle side, knowing that we have data that requires > processing, it only makes sense that we should tell the poll loop to > wake up to process that data. This makes the code work in the face of > unknown bugs or unintended consequences of changes. > > If there is active harm in calling poll_immediate_wake() here, then that > is a different matter, and we should avoid it. However, I don't think > that's the case. >
You're right Mark, it's better to not rely on waking up as side effects of other events. I applied the patch to main, thanks! Best regards, Dumitru > Thanks, > Mark > > [1] I realize that connection issues have more severe consequences than > simply not waking up the polling loop, but for the purposes of this > discussion, that is the sort of thing that could cause an expected wake- > up not to occur. > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev