On Mon, Mar 18, 2024 at 11:27 AM Mark Michelson <[email protected]> wrote:
>
> Hi Han,
>
> I have a comment below
>
> On 3/5/24 01:27, Han Zhou wrote:
> > The ovn-ofctrl-wait-before-clear setting is designed to minimize
> > downtime during the initial start-up of the ovn-controller. For this
> > purpose, the ovn-controller should wait only once upon entering the
> > S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections
> > to the OVS, such as those occurring during an OVS restart/upgrade,
> > should not trigger this wait. However, the current implemention always
> > waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which
> > can inadvertently delay flow installations during OVS restart/upgrade,
> > potentially causing more harm than good. (The extent of the impact
> > varies based on the method used to restart OVS, including whether flow
> > save/restore tools and the flow-restore-wait feature are employed.)
> >
> > This patch avoids the unnecessary wait after the initial one.
> >
> > Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to
reduce down time during upgrade.")
> > Signed-off-by: Han Zhou <[email protected]>
> > ---
> >   controller/ofctrl.c     | 1 -
> >   tests/ovn-controller.at | 9 +++++++--
> >   2 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > index f14cd79a8dbb..0d72ecbaa167 100644
> > --- a/controller/ofctrl.c
> > +++ b/controller/ofctrl.c
> > @@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
> >       if (!wait_before_clear_time ||
> >           (wait_before_clear_expire &&
> >            time_msec() >= wait_before_clear_expire)) {
> > -        wait_before_clear_expire = 0;
> >           state = S_CLEAR_FLOWS;
> >           return;
> >       }
> > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > index 37f1ded1bd26..b65e11722cbb 100644
> > --- a/tests/ovn-controller.at
> > +++ b/tests/ovn-controller.at
> > @@ -2284,10 +2284,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller
coverage/read-counter lflow_run)
> >   AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
> >   ])
> >
> > -# Restart OVS this time, and wait until flows are reinstalled
> > +# Restart OVS this time. Flows should be reinstalled without waiting.
> >   OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> >   start_daemon ovs-vswitchd --enable-dummy=system -vvconn
-vofproto_dpif -vunixctl
> > -OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep
-vF 2.2.2.2])
> > +
> > +# Sleep for 3s, which is long enough for the flows to be installed, but
> > +# shorter than the wait-before-clear (5s), to make sure the flows are
installed
> > +# without waiting.
> > +sleep 3
>
> This change makes me nervous. The comment makes sense. However, I worry
> that on slow or loaded systems, relying on the flows to be written
> within 3 seconds may not always work out.
>
> If there were a way to peek into the ofctrl state machine and check that
> we have moved off of S_WAIT_BEFORE_CLEAR by this point, that might work
> better. But that is something that is hard to justify exposing.
>
> I came up with this possible idea:
>   * set wait-before-clear to a time longer than OVS_CTL_TIMEOUT (e.g. 60
> seconds)
>   * Restart ovs
>   * Use OVS_WAIT_UNTIL(...), just like the test used to do.
>
> This way, we get plenty of opportunities to ensure the flows were
> written. In most cases, this probably will actually be quicker than the
> 3 second sleep added in this patch. However, if it takes longer than 3
> seconds, then the test can still pass. If the flows get written
> properly, then we know ovn-controller did not wait for the
> wait-before-clear time.
>

Hi Mark, thanks for your comment! I agree with you that sleep for 3s is not
very reliable. Your suggestion looks better, but I think there is still a
potential problem. The approach assumes that ovn-controller will always
apply the new settings of ofctrl-wait-before-clear. It is true for the
current implementation, but there is nothing preventing us from removing
this logic, so that ovn-controller ignores any ofctrl-wait-before-clear
setting changes after startup. In fact, it is more reasonable to ignore it.
So, let's assume ovn-controller doesn't take care of the changes of the
settings. In this test case, the setting is initially 5s when
ovn-controller starts, and later changing it to 60s doesn't take effect and
ovn-controller still uses the 5s value. And then let's assume
ovn-controller still waits for the 5s after OVS is restarted, and the test
case will pass because OVS_WAIT_UNTIL can wait for much longer than 5s. So
the test will be incorrect.

To avoid this potential problem, I tried another approach. I figured that
simply adding a "ovn-nbctl --wait=hv sync" is sufficient to ensure
ovn-controller had the chance to install flows, without the need to sleep.
So please see if the below diff looks good:

---------------------------------------------------------------------------------------------------------------------------
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 0da6570c5882..3202f0beff46 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -2328,10 +2328,10 @@ AT_CHECK_UNQUOTED([echo $lflow_run_1], [0],
[$lflow_run_2
 OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
 start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif
-vunixctl

-# Sleep for 3s, which is long enough for the flows to be installed, but
-# shorter than the wait-before-clear (5s), to make sure the flows are
installed
-# without waiting.
-sleep 3
+# Sync to make sure ovn-controller is given enough time to install the
flows.
+check ovn-nbctl --wait=hv sync
+
+# Flow should be installed without any extra waiting.
 AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF
2.2.2.2], [0], [ignore])

 check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
---------------------------------------------------------------------------------------------------------------------------

Thanks,
Han

> > +AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF
2.2.2.2], [0], [ignore])
> >
> >   check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
> >   -- ls-lb-add ls1 lb3
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to