Hi Mark, thanks for taking time to look on this!
Your point with a hung OVS is really an interesting case. From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover). At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port). And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this. Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set. I hope this makes sense. Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out. regards, Vladislav Odintsov > On 9 Jun 2023, at 18:28, Mark Michelson <[email protected]> wrote: > > Hi, > > Thanks for linking the email thread, since I had the exact same concern that > Numan did about detecting if ovs-vswitchd goes down. It sounds like because > of the nature of unix sockets, this disconnection is still detected by OVN > and failover can occur as expected. > > But what about if ovs-vswitchd is still running but in a "compromised" state. > For instance, what if a coding error in ovs-vswitchd results in it running an > infinite loop? In such a case, the unix connection would still be alive, but > OVN would not be able to communicate with the ovs-vswitchd process. > > Does this situation eventually lead to OVN disconnecting anyway after it > fills up the unix socket's buffer? If so, how long does that take? Or does it > lead to OVN slowly growing in memory usage while it waits forever to be able > to write to the socket? > > I think removing the hard-coded 5 second probe intervals from pinctrl.c and > features.c is a good idea. But I think we should leave the configurable probe > interval used by ofctrl.c for the case I described before. Since Han's patch > to OVS should disable probe intervals from the OVS side by default, we would > not trigger bad behavior simply because OVN has a lot of work to do during a > recompute. What do you think? >> On 6/8/23 10:15, Vladislav Odintsov wrote: >> OpenFlow connection is established over unix socket, which is a reliable >> connection and doesn't require additional probing. >> With this patch openflow probing in ovn-controller -> ovs-vswitchd >> direction is disabled for all three connections: >> - OF flows management connection, >> - OF features negotiation connection, >> - pinctrl connection. >> ovn-controller external_ids:ovn-openflow-probe-interval is removed as >> non-needed anymore. >> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will >> be done in the next patch. >> Reported-at: >> https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html >> Signed-off-by: Vladislav Odintsov <[email protected]> >> --- >> NEWS | 5 +++++ >> controller/ofctrl.c | 14 ++------------ >> controller/ofctrl.h | 4 +--- >> controller/ovn-controller.8.xml | 14 -------------- >> controller/ovn-controller.c | 21 +-------------------- >> controller/pinctrl.c | 2 +- >> lib/features.c | 7 ++----- >> 7 files changed, 12 insertions(+), 55 deletions(-) >> diff --git a/NEWS b/NEWS >> index 645acea1f..bd63b187b 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -1,5 +1,10 @@ >> Post v23.06.0 >> ------------- >> + - Disable OpenFlow inactivity probing between ovn-controller and OVS. >> + OF connection is established over unix socket, which is a reliable >> + connection method and doesn't require additional probing. >> + external_ids:ovn-openflow-probe-interval configuration option for >> + ovn-controller is no longer matter. >> OVN v23.06.0 - 01 Jun 2023 >> -------------------------- >> diff --git a/controller/ofctrl.c b/controller/ofctrl.c >> index 64a444ff6..788634494 100644 >> --- a/controller/ofctrl.c >> +++ b/controller/ofctrl.c >> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum >> ofptype); >> void >> ofctrl_init(struct ovn_extend_table *group_table, >> - struct ovn_extend_table *meter_table, >> - int inactivity_probe_interval) >> + struct ovn_extend_table *meter_table) >> { >> - swconn = rconn_create(inactivity_probe_interval, 0, >> - DSCP_DEFAULT, 1 << OFP15_VERSION); >> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >> tx_counter = rconn_packet_counter_create(); >> hmap_init(&installed_lflows); >> hmap_init(&installed_pflows); >> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) >> return rconn_is_connected(swconn); >> } >> -void >> -ofctrl_set_probe_interval(int probe_interval) >> -{ >> - if (swconn) { >> - rconn_set_probe_interval(swconn, probe_interval); >> - } >> -} >> - >> void >> ofctrl_get_memory_usage(struct simap *usage) >> { >> diff --git a/controller/ofctrl.h b/controller/ofctrl.h >> index 105f9370b..46bfccd85 100644 >> --- a/controller/ofctrl.h >> +++ b/controller/ofctrl.h >> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { >> /* Interface for OVN main loop. */ >> void ofctrl_init(struct ovn_extend_table *group_table, >> - struct ovn_extend_table *meter_table, >> - int inactivity_probe_interval); >> + struct ovn_extend_table *meter_table); >> bool ofctrl_run(const struct ovsrec_bridge *br_int, >> const struct ovsrec_open_vswitch_table *, >> struct shash *pending_ct_zones); >> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct >> ovn_desired_flow_table *, >> bool ofctrl_is_connected(void); >> -void ofctrl_set_probe_interval(int probe_interval); >> void ofctrl_get_memory_usage(struct simap *usage); >> #endif /* controller/ofctrl.h */ >> diff --git a/controller/ovn-controller.8.xml >> b/controller/ovn-controller.8.xml >> index f61f43008..52eb137d3 100644 >> --- a/controller/ovn-controller.8.xml >> +++ b/controller/ovn-controller.8.xml >> @@ -147,20 +147,6 @@ >> </p> >> </dd> >> - <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> >> - <dd> >> - <p> >> - The inactivity probe interval of the OpenFlow connection to the >> - OpenvSwitch integration bridge, in seconds. >> - If the value is zero, it disables the connection keepalive >> feature. >> - </p> >> - >> - <p> >> - If the value is nonzero, then it will be forced to a value of >> - at least 5s. >> - </p> >> - </dd> >> - >> <dt><code>external_ids:ovn-encap-type</code></dt> >> <dd> >> <p> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> index 3a81a13fb..732e7a690 100644 >> --- a/controller/ovn-controller.c >> +++ b/controller/ovn-controller.c >> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay; >> #define DEFAULT_BRIDGE_NAME "br-int" >> #define DEFAULT_DATAPATH "system" >> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 >> #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" >> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" >> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table >> *ssl_table) >> } >> } >> -static int >> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) >> -{ >> - const struct ovsrec_open_vswitch *cfg = >> ovsrec_open_vswitch_first(ovs_idl); >> - if (!cfg) { >> - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; >> - } >> - >> - const struct ovsrec_open_vswitch_table *ovs_table = >> - ovsrec_open_vswitch_table_get(ovs_idl); >> - const char *chassis_id = get_ovs_chassis_id(ovs_table); >> - return get_chassis_external_id_value_int( >> - &cfg->external_ids, chassis_id, >> - "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); >> -} >> - >> /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and >> * updates 'sbdb_idl' with that pointer. */ >> static void >> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) >> engine_get_internal_data(&en_lb_data); >> ofctrl_init(&lflow_output_data->group_table, >> - &lflow_output_data->meter_table, >> - get_ofctrl_probe_interval(ovs_idl_loop.idl)); >> + &lflow_output_data->meter_table); >> ofctrl_seqno_init(); >> unixctl_command_register("group-table-list", "", 0, 0, >> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) >> &reset_ovnsb_idl_min_index, >> &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); >> update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); >> - >> ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); >> struct ovsdb_idl_txn *ovnsb_idl_txn >> = ovsdb_idl_loop_run(&ovnsb_idl_loop); >> diff --git a/controller/pinctrl.c b/controller/pinctrl.c >> index c396ad4c2..a86db3f32 100644 >> --- a/controller/pinctrl.c >> +++ b/controller/pinctrl.c >> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) >> static long long int svc_monitors_next_run_time = LLONG_MAX; >> static long long int send_prefixd_time = LLONG_MAX; >> - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >> while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { >> long long int bfd_time = LLONG_MAX; >> diff --git a/lib/features.c b/lib/features.c >> index 97c9c86f0..ad3357570 100644 >> --- a/lib/features.c >> +++ b/lib/features.c >> @@ -28,11 +28,10 @@ >> #include "openvswitch/ofp-meter.h" >> #include "openvswitch/ofp-util.h" >> #include "ovn/features.h" >> +#include "controller/ofctrl.h" >> VLOG_DEFINE_THIS_MODULE(features); >> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 >> - >> struct ovs_feature { >> enum ovs_feature_value value; >> const char *name; >> @@ -82,8 +81,7 @@ static void >> ovs_feature_rconn_setup(const char *br_name) >> { >> if (!swconn) { >> - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0, >> - DSCP_DEFAULT, 1 << OFP15_VERSION); >> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >> } >> if (!rconn_is_connected(swconn)) { >> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) >> } >> free(target); >> } >> - rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC); >> } >> static bool > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
