On Fri, May 30, 2025 at 10:00:58AM +0000, Smirnov Aleksandr (K2 Cloud) wrote:
> On 5/27/25 5:54 PM, Felix Huettner wrote:
> 
> Hi Felix,
> 
> I am sorry, I misunderstood your plan with the countdown.
> If you want to trigger it after the first update is applied, everything is
> correct. So, I withdraw my review complaint and will use the suggested
> function in my code instead of directly comparing
> ovnsb_expected_cond_seqno vs ovnsb_cond_seqno.

Hi Alexander,

no worries, we still found an issue through this discussion, thanks a lot.

Felix

> 
> Thank you,
> 
> Alexander
> 
> > On Fri, May 23, 2025 at 10:02:59AM +0000, Smirnov Aleksandr (K2 Cloud) 
> > wrote:
> >> Hi Felix,
> >>
> >> To answer your proposal: yes, I can reuse this solution instead of
> >> directly checking seqno vs expected seqno.
> >> However, I have inspected what you are doing in your patch and want to
> >> point out one mistake (at least in my opinion).
> > Hi Alexander,
> >
> > thanks for the review.
> >
> >> When I tried to play with the seqno/expected seqno numbers I realized
> >> that they are only set to meaningful values BELOW the line
> >>
> >>     ovnsb_expected_cond_seqno =
> >>         update_sb_monitors(ovnsb_idl_loop.idl, chassis,
> >>                            &runtime_data->local_lports,
> >>                            &runtime_data->lbinding_data.bindings,
> >>                            &runtime_data->local_datapaths,
> >>                            sb_monitor_all);
> > So by default ovnsb_expected_cond_seqno is initialized to UINT_MAX.
> > Then I see two locations where it might be set to an actual value:
> >
> > 1. At the start of the main loop in update_sb_db, but only if
> >    monitor_all is set.
> > 2. The point you mentioned, which happens after the engine has
> >    computed completely.
> >
> >> So, if you want to make a meaningful comparison of the seqnos to answer
> >> the question 'are there upcoming updates', you have to do that check
> >> after the update_sb_monitors() call and before the next iteration of
> >> the main loop.
> > Or at the start of the main loop, if we are sure that
> > ovnsb_expected_cond_seqno is valid. I did this in my patch with an
> > `ovnsb_expected_cond_seqno != UINT_MAX` check.
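
(To make that concrete, the guarded comparison could be sketched like the
helper below. This is a minimal illustration with a made-up helper name, not
the actual ovn-controller code; only UINT_MAX as the "not yet valid"
sentinel comes from the discussion above.)

```c
#include <limits.h>
#include <stdbool.h>

/* Illustrative sketch, not the actual ovn-controller code:
 * ovnsb_expected_cond_seqno starts out as UINT_MAX, so comparing it
 * against the current condition seqno is only meaningful once
 * update_sb_monitors() has assigned a real value. */
static bool
sb_cond_updates_complete(unsigned int expected_cond_seqno,
                         unsigned int cur_cond_seqno)
{
    /* UINT_MAX means "no expected seqno defined yet". */
    return expected_cond_seqno != UINT_MAX
           && cur_cond_seqno == expected_cond_seqno;
}
```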
> >
> >> Otherwise (again, in my opinion) your code will always hit the guard
> >> checks (time, max number, etc.).
> > I am not sure if I got that point correctly. I want to share an example
> > case based on my understanding of what you mean. You can tell me if I
> > got it right:
> >
> > We assume a case where monitor-all=false.
> > When starting ovn-controller we will connect to the local ovsdb and
> > southbound. We then run the engine the first time and set the initial
> > monitoring conditions.
> > At this point ovnsb_expected_cond_seqno is valid and larger than
> > ovnsb_cond_seqno.
> >
> > Before we now get the initial monitor results from southbound something
> > happens that prevents the engine from running (e.g. northd_version_match
> > is false).
> > At some point the initial monitor updates arrive, which will cause
> > ovnsb_expected_cond_seqno == ovnsb_cond_seqno. At least in my patch, any
> > further main loop iteration would now cause
> > daemon_started_recently_countdown to be triggered, even though the
> > engine never had the chance to update and define a new
> > ovnsb_expected_cond_seqno.
> >
> > If the engine can then start running again, daemon_started_recently()
> > might return true, even though we did not yet allow for up to 20
> > initial condition changes.
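
That premature trigger can be simulated in isolation. Below is a simplified
sketch: the countdown value of 20 mirrors the discussion, but the function
name and the seqno values are made up for illustration and nothing here is
the actual ovn-controller main loop.

```c
/* Illustrative sketch, not the actual ovn-controller code.
 * Simulates main-loop iterations while the engine is blocked
 * (e.g. because northd_version_match is false): the current condition
 * seqno has already caught up with the expected one, so every
 * iteration decrements the countdown even though the engine never got
 * a chance to install new monitor conditions.
 * Returns the remaining countdown. */
static int
simulate_blocked_main_loop(int countdown, int iterations)
{
    unsigned int expected_cond_seqno = 10;
    unsigned int cur_cond_seqno = 10;  /* initial updates already arrived */

    for (int i = 0; i < iterations && countdown > 0; i++) {
        if (cur_cond_seqno == expected_cond_seqno) {
            countdown--;
        }
    }
    return countdown;
}
```

With an initial countdown of 20 and more than 20 blocked iterations, the
countdown is exhausted before the engine ever runs again, which is exactly
the situation where daemon_started_recently() would flip too early.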
> >
> >> This is still a dark area for me, but at least I need to share my
> >> understanding of the seqno life cycle.
> > Same for me, and I think you found a valid point.
> >
> > However, it would be great if you could confirm the above understanding.
> > If you meant a different point, then maybe I need to fix something else
> > as well.
> >
> > I will try to fix the above and send a new version of the patchset if
> > you confirm that this was the point you meant.
> >
> > So thank you very much for finding that.
> >
> > Thanks,
> > Felix
> >
> >> Thank you,
> >>
> >> Alexander
> >>
> >> On 5/7/25 5:26 PM, Felix Huettner wrote:
> >>> On Tue, Mar 18, 2025 at 01:08:12PM +0300, Aleksandr Smirnov wrote:
> >>>> Delay the initial flow upload to OVS until all monitored updates are
> >>>> received. This is a kind of replacement for the wait-before-clear
> >>>> parameter. Instead of waiting a fixed amount of time we wait until
> >>>> the final monitored update comes from the SB DB.
> >>>> The event we watch for in the controller's main loop is the current
> >>>> condition sequence number matching the expected condition sequence
> >>>> number. If they are not equal we still have to receive updates in
> >>>> response to the most recent monitor condition change request. This
> >>>> check only makes sense in code after the update_sb_monitors() call.
> >>>>
> >>>> Note: this change only takes effect if wait-before-clear == 0, i.e.
> >>>> you can still rely on the wait-before-clear behavior otherwise.
> >>>>
> >>>> Signed-off-by: Aleksandr Smirnov <alekssmirnov@k2.cloud>
> >>>> ---
> >>>>    controller/ofctrl.c         | 29 ++++++++++++++++-
> >>>>    controller/ofctrl.h         |  3 +-
> >>>>    controller/ovn-controller.c |  4 ++-
> >>>>    tests/ovn-controller.at     | 62 +++++++++++++++++++++++++++++++++++++
> >>>>    4 files changed, 95 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> >>>> index 4a3d35b97..304b9bbc8 100644
> >>>> --- a/controller/ofctrl.c
> >>>> +++ b/controller/ofctrl.c
> >>>> @@ -349,6 +349,9 @@ static uint64_t cur_cfg;
> >>>>    /* Current state. */
> >>>>    static enum ofctrl_state state;
> >>>>    
> >>>> +/* Whether to release the wait-before-clear stage. */
> >>>> +static bool wait_before_clear_proceed = false;
> >>>> +
> >>>>    /* The time (ms) to stay in the state S_WAIT_BEFORE_CLEAR. Read from
> >>>>     * external_ids: ovn-ofctrl-wait-before-clear. */
> >>>>    static unsigned int wait_before_clear_time = 0;
> >>>> @@ -446,6 +449,7 @@ run_S_NEW(void)
> >>>>        struct ofpbuf *buf = ofpraw_alloc(OFPRAW_NXT_TLV_TABLE_REQUEST,
> >>>>                                          rconn_get_version(swconn), 0);
> >>>>        xid = queue_msg(buf);
> >>>> +    wait_before_clear_proceed = false;
> >>>>        state = S_TLV_TABLE_REQUESTED;
> >>>>    }
> >>>>    
> >>>> @@ -638,6 +642,14 @@ error:
> >>>>    static void
> >>>>    run_S_WAIT_BEFORE_CLEAR(void)
> >>>>    {
> >>>> +    if (wait_before_clear_time == 0) {
> >>>> +        if (wait_before_clear_proceed) {
> >>>> +            state = S_CLEAR_FLOWS;
> >>>> +        }
> >>>> +
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>>        if (!wait_before_clear_time ||
> >>>>            (wait_before_clear_expire &&
> >>>>             time_msec() >= wait_before_clear_expire)) {
> >>>> @@ -2695,11 +2707,26 @@ ofctrl_put(struct ovn_desired_flow_table *lflow_table,
> >>>>               uint64_t req_cfg,
> >>>>               bool lflows_changed,
> >>>>               bool pflows_changed,
> >>>> -           struct tracked_acl_ids *tracked_acl_ids)
> >>>> +           struct tracked_acl_ids *tracked_acl_ids,
> >>>> +           bool monitor_cond_complete)
> >>>>    {
> >>>>        static bool skipped_last_time = false;
> >>>>        static uint64_t old_req_cfg = 0;
> >>>>        bool need_put = false;
> >>>> +
> >>>> +    if (state == S_WAIT_BEFORE_CLEAR) {
> >>>> +        /* If no more monitored condition changes are expected,
> >>>> +           release the wait-before-clear stage and skip over the
> >>>> +           poll wait. */
> >>>> +        if (monitor_cond_complete) {
> >>>> +            wait_before_clear_proceed = true;
> >>>> +            poll_immediate_wake();
> >>>> +        }
> >>>> +
> >>>> +        skipped_last_time = true;
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>>        if (lflows_changed || pflows_changed || skipped_last_time ||
> >>>>            ofctrl_initial_clear) {
> >>>>            need_put = true;
> >>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> >>>> index d1ee69cb0..76e2fbece 100644
> >>>> --- a/controller/ofctrl.h
> >>>> +++ b/controller/ofctrl.h
> >>>> @@ -68,7 +68,8 @@ void ofctrl_put(struct ovn_desired_flow_table *lflow_table,
> >>>>                    uint64_t nb_cfg,
> >>>>                    bool lflow_changed,
> >>>>                    bool pflow_changed,
> >>>> -                struct tracked_acl_ids *tracked_acl_ids);
> >>>> +                struct tracked_acl_ids *tracked_acl_ids,
> >>>> +                bool monitor_cond_complete);
> >>>>    bool ofctrl_has_backlog(void);
> >>>>    void ofctrl_wait(void);
> >>>>    void ofctrl_destroy(void);
> >>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >>>> index bc8acddf1..2623fc758 100644
> >>>> --- a/controller/ovn-controller.c
> >>>> +++ b/controller/ovn-controller.c
> >>>> @@ -6384,7 +6384,9 @@ main(int argc, char *argv[])
> >>>>                                       ofctrl_seqno_get_req_cfg(),
> >>>>                                      engine_node_changed(&en_lflow_output),
> >>>>                                      engine_node_changed(&en_pflow_output),
> >>>> -                                   tracked_acl_ids);
> >>>> +                                   tracked_acl_ids,
> >>>> +                                   ovnsb_cond_seqno
> >>>> +                                   == ovnsb_expected_cond_seqno);
> >>> Hi Aleksandr,
> >>>
> >>> i just wanted to let you know that we could probably combine this with
> >>> https://patchwork.ozlabs.org/project/ovn/patch/67010dca0a8dcc390e873349d662c9163e01050f.1746623091.git.felix.huettner@stackit.cloud/
> >>> then you could use "!daemon_started_recently()" here.
> >>>
> >>> But that is just for information.
> >>>
> >>> Thanks a lot,
> >>> Felix
> >>>
> >>>>                            stopwatch_stop(OFCTRL_PUT_STOPWATCH_NAME, time_msec());
> >>>>                        }
> >>>>                        stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME,
> >>>> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> >>>> index efb0a1741..812616711 100644
> >>>> --- a/tests/ovn-controller.at
> >>>> +++ b/tests/ovn-controller.at
> >>>> @@ -2463,6 +2463,68 @@ AT_CHECK([ovs-ofctl dump-flows br-int | grep group -c], [0], [3
> >>>>    OVN_CLEANUP([hv1])
> >>>>    AT_CLEANUP
> >>>>    
> >>>> +OVN_FOR_EACH_NORTHD([
> >>>> +AT_SETUP([ovn-controller - ofctrl delay until all monitored updates come])
> >>>> +
> >>>> +# Prepare testing configuration
> >>>> +ovn_start
> >>>> +
> >>>> +net_add n1
> >>>> +sim_add hv1
> >>>> +as hv1
> >>>> +check ovs-vsctl add-br br-phys
> >>>> +ovn_attach n1 br-phys 192.168.0.1
> >>>> +check ovs-vsctl -- add-port br-int hv1-vif1 -- \
> >>>> +    set interface hv1-vif1 external-ids:iface-id=ls1-lp1
> >>>> +
> >>>> +check ovn-nbctl ls-add ls1
> >>>> +
> >>>> +check ovn-nbctl lsp-add ls1 ls1-lp1 \
> >>>> +-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01 10.1.2.3"
> >>>> +
> >>>> +check ovn-nbctl lb-add lb1 1.1.1.1 10.1.2.3 \
> >>>> +-- ls-lb-add ls1 lb1
> >>>> +
> >>>> +check ovn-nbctl lb-add lb2 2.2.2.2 10.1.2.4 \
> >>>> +-- ls-lb-add ls1 lb2
> >>>> +
> >>>> +check ovn-nbctl --wait=hv sync
> >>>> +
> >>>> +# Stop ovn-controller
> >>>> +OVS_APP_EXIT_AND_WAIT([ovn-controller])
> >>>> +
> >>>> +# Since we test the initial flow upload, the controller is restarted.
> >>>> +# Clear the log file and start the controller.
> >>>> +rm -f hv1/ovn-controller.log
> >>>> +start_daemon ovn-controller -vfile:jsonrpc:dbg -vfile:ofctrl:dbg
> >>>> +
> >>>> +# Monitor the log file until the flows are finally uploaded to OVS
> >>>> +OVS_WAIT_UNTIL([grep -q 'Setting lport.*in OVS' hv1/ovn-controller.log])
> >>>> +
> >>>> +# Analyse log file, select records about:
> >>>> +# 1. monitor_cond changes made for SB DB (message class is 'jsonrpc')
> >>>> +# 2. 'clearing all flows' message which is issued after 'wait before
> >>>> +#    clear' stage released (message class is 'ofctrl')
> >>>> +#
> >>>> +# We expect here that all monitoring condition changes are made before
> >>>> +# the OVS flows are cleared / uploaded.
> >>>> +# For now all monitoring updates come in three iterations: initial,
> >>>> +# direct dps, indirect dps, which corresponds to three messages of
> >>>> +# type 1 followed by one message of type 2.
> >>>> +#
> >>>> +# For monitor-all=true, one message of type 1 is followed by one
> >>>> +# message of type 2.
> >>>> +#
> >>>> +# Then we extract the message class and take its first letter
> >>>> +# (j for jsonrpc and o for ofctrl).
> >>>> +#
> >>>> +call_seq=`grep -E \
> >>>> + "(clearing all flows)|(monitor_cond.*South)" \
> >>>> + hv1/ovn-controller.log | cut -d "|" -f 3- | cut -b 1 | tr -d '\n'`
> >>>> +AT_CHECK([echo $call_seq | grep -qE "^j+o$"], [0])
> >>>> +
> >>>> +OVN_CLEANUP([hv1])
> >>>> +AT_CLEANUP
> >>>> +])
> >>>>    
> >>>>    AT_SETUP([ovn-controller - check ovn-chassis-mac-mappings])
> >>>>    
> >>>> -- 
> >>>> 2.47.0
> >>>>
> >>>> _______________________________________________
> >>>> dev mailing list
> >>>> d...@openvswitch.org
> >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> 