On 11/23/22 09:10, Ales Musil wrote:
> On Tue, Nov 22, 2022 at 9:24 PM Xavier Simonart <[email protected]> wrote:
>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2144084
>>
>> Signed-off-by: Xavier Simonart <[email protected]>
>> ---
>> lib/features.c | 16 ++++++++++------
>> tests/ovn.at | 18 ++++++++++++++++++
>> 2 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/lib/features.c b/lib/features.c
>> index f15ec42bb..571e24ded 100644
>> --- a/lib/features.c
>> +++ b/lib/features.c
>> @@ -26,6 +26,7 @@
>> #include "openvswitch/rconn.h"
>> #include "openvswitch/ofp-msgs.h"
>> #include "openvswitch/ofp-meter.h"
>> +#include "openvswitch/ofp-util.h"
>> #include "ovn/features.h"
>>
>> VLOG_DEFINE_THIS_MODULE(features);
>> @@ -90,6 +91,8 @@ ovs_feature_rconn_setup(const char *br_name)
>> static bool
>> ovs_feature_get_openflow_cap(const char *br_name)
>> {
>> + struct ofpbuf *msg;
>> +
>> if (!br_name) {
>> return false;
>> }
>> @@ -102,15 +105,14 @@ ovs_feature_get_openflow_cap(const char *br_name)
>> }
>>
>> /* send new requests just after reconnect. */
>> - if (conn_seq_no == rconn_get_connection_seqno(swconn)) {
>> - return false;
>> + if (conn_seq_no != rconn_get_connection_seqno(swconn)) {
>> + /* dump datapath meter capabilities. */
>> + msg = ofpraw_alloc(OFPRAW_OFPST13_METER_FEATURES_REQUEST,
>> + rconn_get_version(swconn), 0);
>> + rconn_send(swconn, msg, NULL);
>> }
>>
>> bool ret = false;
>> - /* dump datapath meter capabilities. */
>> - struct ofpbuf *msg =
>> ofpraw_alloc(OFPRAW_OFPST13_METER_FEATURES_REQUEST,
>> - rconn_get_version(swconn), 0);
>> - rconn_send(swconn, msg, NULL);
>> for (int i = 0; i < 50; i++) {
>> msg = rconn_recv(swconn);
>> if (!msg) {
>> @@ -137,6 +139,8 @@ ovs_feature_get_openflow_cap(const char *br_name)
>> }
>> }
>> conn_seq_no = rconn_get_connection_seqno(swconn);
>> + } else if (type == OFPTYPE_ECHO_REQUEST) {
>> + rconn_send(swconn, ofputil_encode_echo_reply(oh), NULL);
>> }
>> ofpbuf_delete(msg);
>> }
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index cf1ea991d..3a1c231f5 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -33069,3 +33069,21 @@ check ovn-nbctl --wait=hv sync
>> OVN_CLEANUP([hv1])
>> AT_CLEANUP
>> ])
>> +
>> +OVN_FOR_EACH_NORTHD([
>> +AT_SETUP([feature inactivity probe])
>> +ovn_start
>> +net_add n1
>> +
>> +sim_add hv1
>> +as hv1
>> +check ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.1
>> +
>> +# Wait for more than 2x 60 seconds
>> +sleep 125
>> +
>> +AT_CHECK([test -z "`grep disconnecting hv1/ovs-vswitchd.log`"])
>> +OVN_CLEANUP([hv1])
>> +AT_CLEANUP
>> +])
>> --
>> 2.31.1
>>
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
> Hi Xavier,
>
Hi Xavier, Ales,
> the change itself looks good, however I'm very concerned about the test.
> I wonder if we can roll with confirmation that this just works, without
> actually having
> test that sleeps for more than 2 minutes.
>
> Dumitru, Mark, Han any thoughts?
>
I agree, this is too long. I played a bit with it and this is what I
came up with:
diff --git a/tests/ovn.at b/tests/ovn.at
index fc5cf257ef..6d21446ddd 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -33253,8 +33253,21 @@ as hv1
check ovs-vsctl add-br br-phys
ovn_attach n1 br-phys 192.168.0.1
-# Wait for more than 2x 60 seconds
-sleep 125
+dnl Ensure that there are at least 3 openflow connections.
+OVS_WAIT_UNTIL([test "$(grep -c 'negotiated OpenFlow version'
hv1/ovs-vswitchd.log)" -eq "3"])
+
+dnl "Wait" 3 times 60 seconds and ensure ovn-controller writes to the
+dnl openflow connections in the meantime. This should allow ovs-vswitchd
+dnl to probe the openflow connections at least twice.
+
+as hv1 ovs-appctl time/warp 60000
+check ovn-nbctl --wait=hv sync
+
+as hv1 ovs-appctl time/warp 60000
+check ovn-nbctl --wait=hv sync
+
+as hv1 ovs-appctl time/warp 60000
+check ovn-nbctl --wait=hv sync
AT_CHECK([test -z "`grep disconnecting hv1/ovs-vswitchd.log`"])
OVN_CLEANUP([hv1])
---
It's still not 100% bullet proof but I think it should be ok.
However, this brings me to the next issue: if the features
openflow connection goes down we don't reconnect from
ovn-controller.
I think that's because on the vswitchd side, closing the punix
stream doesn't do anything. And on the ovn-controller side we
have no chance of detecting the connection drop either because
probes are disabled and because we only request meter features
after a reconnect.
So I made this change to explicit request probes on the connection
(like we do for ofctrl):
diff --git a/lib/features.c b/lib/features.c
index a1e2e6dc1c..898a0cade5 100644
--- a/lib/features.c
+++ b/lib/features.c
@@ -31,6 +31,8 @@
VLOG_DEFINE_THIS_MODULE(features);
+#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
+
struct ovs_feature {
enum ovs_feature_value value;
const char *name;
@@ -75,7 +77,8 @@ static void
ovs_feature_rconn_setup(const char *br_name)
{
if (!swconn) {
- swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
+ swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0,
+ DSCP_DEFAULT, 1 << OFP15_VERSION);
}
if (!rconn_is_connected(swconn)) {
@@ -86,6 +89,7 @@ ovs_feature_rconn_setup(const char *br_name)
}
free(target);
}
+ rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
}
static bool
---
What do you think?
Thanks,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev