On 11/23/22 09:10, Ales Musil wrote:
> On Tue, Nov 22, 2022 at 9:24 PM Xavier Simonart <[email protected]> wrote:
> 
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2144084
>>
>> Signed-off-by: Xavier Simonart <[email protected]>
>> ---
>>  lib/features.c | 16 ++++++++++------
>>  tests/ovn.at   | 18 ++++++++++++++++++
>>  2 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/features.c b/lib/features.c
>> index f15ec42bb..571e24ded 100644
>> --- a/lib/features.c
>> +++ b/lib/features.c
>> @@ -26,6 +26,7 @@
>>  #include "openvswitch/rconn.h"
>>  #include "openvswitch/ofp-msgs.h"
>>  #include "openvswitch/ofp-meter.h"
>> +#include "openvswitch/ofp-util.h"
>>  #include "ovn/features.h"
>>
>>  VLOG_DEFINE_THIS_MODULE(features);
>> @@ -90,6 +91,8 @@ ovs_feature_rconn_setup(const char *br_name)
>>  static bool
>>  ovs_feature_get_openflow_cap(const char *br_name)
>>  {
>> +    struct ofpbuf *msg;
>> +
>>      if (!br_name) {
>>          return false;
>>      }
>> @@ -102,15 +105,14 @@ ovs_feature_get_openflow_cap(const char *br_name)
>>      }
>>
>>      /* send new requests just after reconnect. */
>> -    if (conn_seq_no == rconn_get_connection_seqno(swconn)) {
>> -        return false;
>> +    if (conn_seq_no != rconn_get_connection_seqno(swconn)) {
>> +        /* dump datapath meter capabilities. */
>> +        msg = ofpraw_alloc(OFPRAW_OFPST13_METER_FEATURES_REQUEST,
>> +                           rconn_get_version(swconn), 0);
>> +        rconn_send(swconn, msg, NULL);
>>      }
>>
>>      bool ret = false;
>> -    /* dump datapath meter capabilities. */
>> -    struct ofpbuf *msg =
>> ofpraw_alloc(OFPRAW_OFPST13_METER_FEATURES_REQUEST,
>> -                                      rconn_get_version(swconn), 0);
>> -    rconn_send(swconn, msg, NULL);
>>      for (int i = 0; i < 50; i++) {
>>          msg = rconn_recv(swconn);
>>          if (!msg) {
>> @@ -137,6 +139,8 @@ ovs_feature_get_openflow_cap(const char *br_name)
>>                  }
>>              }
>>              conn_seq_no = rconn_get_connection_seqno(swconn);
>> +        } else if (type == OFPTYPE_ECHO_REQUEST) {
>> +            rconn_send(swconn, ofputil_encode_echo_reply(oh), NULL);
>>          }
>>          ofpbuf_delete(msg);
>>      }
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index cf1ea991d..3a1c231f5 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -33069,3 +33069,21 @@ check ovn-nbctl --wait=hv sync
>>  OVN_CLEANUP([hv1])
>>  AT_CLEANUP
>>  ])
>> +
>> +OVN_FOR_EACH_NORTHD([
>> +AT_SETUP([feature inactivity probe])
>> +ovn_start
>> +net_add n1
>> +
>> +sim_add hv1
>> +as hv1
>> +check ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.1
>> +
>> +# Wait for more than 2x 60 seconds
>> +sleep 125
>> +
>> +AT_CHECK([test -z "`grep disconnecting hv1/ovs-vswitchd.log`"])
>> +OVN_CLEANUP([hv1])
>> +AT_CLEANUP
>> +])
>> --
>> 2.31.1
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Hi Xavier,
> 

Hi Xavier, Ales,

> the change itself looks good, however I'm very concerned about the test.
> I wonder if we can roll with confirmation that this just works, without
> actually having
> test that sleeps for more than 2 minutes.
> 
> Dumitru, Mark, Han any thoughts?
> 

I agree, this is too long.  I played a bit with it and this is what I
came up with:

diff --git a/tests/ovn.at b/tests/ovn.at
index fc5cf257ef..6d21446ddd 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -33253,8 +33253,21 @@ as hv1
 check ovs-vsctl add-br br-phys
 ovn_attach n1 br-phys 192.168.0.1
 
-# Wait for more than 2x 60 seconds
-sleep 125
+dnl Ensure that there are at least 3 openflow connections.
+OVS_WAIT_UNTIL([test "$(grep -c 'negotiated OpenFlow version' 
hv1/ovs-vswitchd.log)" -eq "3"])
+
+dnl "Wait" 3 times 60 seconds and ensure ovn-controller writes to the
+dnl openflow connections in the meantime.  This should allow ovs-vswitchd
+dnl to probe the openflow connections at least twice.
+
+as hv1 ovs-appctl time/warp 60000
+check ovn-nbctl --wait=hv sync
+
+as hv1 ovs-appctl time/warp 60000
+check ovn-nbctl --wait=hv sync
+
+as hv1 ovs-appctl time/warp 60000
+check ovn-nbctl --wait=hv sync
 
 AT_CHECK([test -z "`grep disconnecting hv1/ovs-vswitchd.log`"])
 OVN_CLEANUP([hv1])
---

It's still not 100% bullet proof but I think it should be ok.

However, this brings me to the next issue: if the features
openflow connection goes down we don't reconnect from
ovn-controller.

I think that's because on the vswitchd side, closing the punix
stream doesn't do anything.  And on the ovn-controller side we
have no chance of detecting the connection drop either because
probes are disabled and because we only request meter features
after a reconnect.

So I made this change to explicit request probes on the connection
(like we do for ofctrl):

diff --git a/lib/features.c b/lib/features.c
index a1e2e6dc1c..898a0cade5 100644
--- a/lib/features.c
+++ b/lib/features.c
@@ -31,6 +31,8 @@
 
 VLOG_DEFINE_THIS_MODULE(features);
 
+#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
+
 struct ovs_feature {
     enum ovs_feature_value value;
     const char *name;
@@ -75,7 +77,8 @@ static void
 ovs_feature_rconn_setup(const char *br_name)
 {
     if (!swconn) {
-        swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
+        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0,
+                              DSCP_DEFAULT, 1 << OFP15_VERSION);
     }
 
     if (!rconn_is_connected(swconn)) {
@@ -86,6 +89,7 @@ ovs_feature_rconn_setup(const char *br_name)
         }
         free(target);
     }
+    rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
 }
 
 static bool
---

What do you think?

Thanks,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to