Hi Han,

I have a comment below

On 3/5/24 01:27, Han Zhou wrote:
The ovn-ofctrl-wait-before-clear setting is designed to minimize
downtime during the initial start-up of the ovn-controller. For this
purpose, the ovn-controller should wait only once upon entering the
S_WAIT_BEFORE_CLEAR state for the first time. Subsequent reconnections
to the OVS, such as those occurring during an OVS restart/upgrade,
should not trigger this wait. However, the current implemention always
waits for the configured time in the S_WAIT_BEFORE_CLEAR state, which
can inadvertently delay flow installations during OVS restart/upgrade,
potentially causing more harm than good. (The extent of the impact
varies based on the method used to restart OVS, including whether flow
save/restore tools and the flow-restore-wait feature are employed.)

This patch avoids the unnecessary wait after the initial one.

Fixes: 896adfd2d8b3 ("ofctrl: Support ovn-ofctrl-wait-before-clear to reduce down 
time during upgrade.")
Signed-off-by: Han Zhou <[email protected]>
---
  controller/ofctrl.c     | 1 -
  tests/ovn-controller.at | 9 +++++++--
  2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index f14cd79a8dbb..0d72ecbaa167 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -634,7 +634,6 @@ run_S_WAIT_BEFORE_CLEAR(void)
      if (!wait_before_clear_time ||
          (wait_before_clear_expire &&
           time_msec() >= wait_before_clear_expire)) {
-        wait_before_clear_expire = 0;
          state = S_CLEAR_FLOWS;
          return;
      }
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 37f1ded1bd26..b65e11722cbb 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -2284,10 +2284,15 @@ lflow_run_2=$(ovn-appctl -t ovn-controller 
coverage/read-counter lflow_run)
  AT_CHECK_UNQUOTED([echo $lflow_run_1], [0], [$lflow_run_2
  ])
-# Restart OVS this time, and wait until flows are reinstalled
+# Restart OVS this time. Flows should be reinstalled without waiting.
  OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
  start_daemon ovs-vswitchd --enable-dummy=system -vvconn -vofproto_dpif 
-vunixctl
-OVS_WAIT_UNTIL([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 
2.2.2.2])
+
+# Sleep for 3s, which is long enough for the flows to be installed, but
+# shorter than the wait-before-clear (5s), to make sure the flows are installed
+# without waiting.
+sleep 3

This change makes me nervous. The comment makes sense. However, I worry that on slow or loaded systems, relying on the flows to be written within 3 seconds may not always work out.

If there were a way to peek into the ofctrl state machine and check that we have moved off of S_WAIT_BEFORE_CLEAR by this point, that might work better. But that is something that is hard to justify exposing.

I came up with this possible idea:
* set wait-before-clear to a time longer than OVS_CTL_TIMEOUT (e.g. 60 seconds)
 * Restart ovs
 * Use OVS_WAIT_UNTIL(...), just like the test used to do.

This way, we get plenty of opportunities to ensure the flows were written. In most cases, this probably will actually be quicker than the 3 second sleep added in this patch. However, if it takes longer than 3 seconds, then the test can still pass. If the flows get written properly, then we know ovn-controller did not wait for the wait-before-clear time.

+AT_CHECK([ovs-ofctl dump-flows br-int | grep -F 10.1.2.4 | grep -vF 2.2.2.2], 
[0], [ignore])
check ovn-nbctl --wait=hv lb-add lb3 3.3.3.3 10.1.2.5 \
  -- ls-lb-add ls1 lb3

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to