The test incorrectly assumed that "ovn-nbctl --wait=hv sync" will always
send an OpenFlow barrier to ovs-vswitchd.  That doesn't happen unless
there are other OpenFlow (rule or group) changes that need to be
programmed in the datapath.

An initial attempt to fix this [0] uncovered the fact that there's no
easy way to cover all possible scenarios [1].  It seems that the effort
to fix the test is not justified for the case it tests for (that
inactivity probes are handled by the features module - that is quite
obviously handled in the code).  It's therefore more reasonable to just
skip the test (to avoid noise in CI).

Spotted in CI, mostly on oversubscribed systems.  A log sample that
shows that ovn-controller didn't generate any barrier request for the
nbctl sync request:

  2023-11-15T12:12:22.937Z|00084|vconn|DBG|unix#4: received: 
OFPT_BARRIER_REQUEST (OF1.5) (xid=0x13):
  2023-11-15T12:12:22.937Z|00085|vconn|DBG|unix#4: sent (Success): 
OFPT_BARRIER_REPLY (OF1.5) (xid=0x13):
  ...
  2023-11-15T12:12:23.032Z|00090|unixctl|DBG|received request 
time/warp["60000"], id=0
  2023-11-15T12:12:23.032Z|00091|unixctl|DBG|replying with success, id=0: 
"warped"
  2023-11-15T12:12:23.042Z|00094|vconn|DBG|unix#3: sent (Success): 
OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
  2023-11-15T12:12:23.042Z|00095|vconn|DBG|unix#4: sent (Success): 
OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
  2023-11-15T12:12:23.042Z|00097|vconn|DBG|unix#5: sent (Success): 
OFPT_ECHO_REQUEST (OF1.5) (xid=0x0): 0 bytes of payload
  2023-11-15T12:12:23.042Z|00098|unixctl|DBG|received request 
time/warp["60000"], id=0
  2023-11-15T12:12:23.042Z|00099|unixctl|DBG|replying with success, id=0: 
"warped"
  2023-11-15T12:12:23.052Z|00100|rconn|ERR|br-int<->unix#3: no response to 
inactivity probe after 60 seconds, disconnecting
  2023-11-15T12:12:23.052Z|00101|rconn|ERR|br-int<->unix#4: no response to 
inactivity probe after 60 seconds, disconnecting
  2023-11-15T12:12:23.052Z|00102|rconn|ERR|br-int<->unix#5: no response to 
inactivity probe after 60 seconds, disconnecting
  2023-11-15T12:12:23.052Z|00103|unixctl|DBG|received request 
time/warp["60000"], id=0

[0] https://mail.openvswitch.org/pipermail/ovs-dev/2023-November/409473.html
[1] https://mail.openvswitch.org/pipermail/ovs-dev/2023-November/409530.html

Fixes: ff00142808dc ("controller: Fixed ovs/ovn(features) connection lost when 
running more than 120 seconds")
Signed-off-by: Dumitru Ceara <[email protected]>
---
V2:
- removed the test instead of trying to fix it (after discussion with
Xavier).
---
 tests/ovn.at | 31 -------------------------------
 1 file changed, 31 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index b8c61f87fb..198387d93e 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -35342,37 +35342,6 @@ OVN_CLEANUP([hv1],[hv2])
 AT_CLEANUP
 ])
 
-OVN_FOR_EACH_NORTHD([
-AT_SETUP([feature inactivity probe])
-ovn_start
-net_add n1
-
-sim_add hv1
-as hv1
-check ovs-vsctl add-br br-phys
-ovn_attach n1 br-phys 192.168.0.1
-
-dnl Ensure that there are 4 openflow connections.
-OVS_WAIT_UNTIL([test "$(grep -c 'negotiated OpenFlow version' 
hv1/ovs-vswitchd.log)" -eq "4"])
-
-dnl "Wait" 3 times 60 seconds and ensure ovn-controller writes to the
-dnl openflow connections in the meantime.  This should allow ovs-vswitchd
-dnl to probe the openflow connections at least twice.
-
-as hv1 ovs-appctl time/warp 60000
-check ovn-nbctl --wait=hv sync
-
-as hv1 ovs-appctl time/warp 60000
-check ovn-nbctl --wait=hv sync
-
-as hv1 ovs-appctl time/warp 60000
-check ovn-nbctl --wait=hv sync
-
-AT_CHECK([test -z "`grep disconnecting hv1/ovs-vswitchd.log`"])
-OVN_CLEANUP([hv1])
-AT_CLEANUP
-])
-
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([Logical flows with Chassis_Template_Var references])
 AT_KEYWORDS([templates])
-- 
2.39.3

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to