Re: [ovs-dev] [PATCH] ovn-controller: Fix chassisredirect port flapping when ovs-vswitchd crashes

2019-02-04 Thread Mark Michelson

Acked-by: Mark Michelson 

On 2/4/19 5:31 AM, nusid...@redhat.com wrote:

From: Numan Siddique 

On a chassis when ovs-vswitchd crashes for some reason, the BFD status doesn't
get updated in the ovs db. ovn-controller will be reading the old BFD status
even though ovs-vswitchd is crashed. This results in the chassiredirect port
claim flapping between the master chassis and the chasiss with the next higher
priority if ovs-vswitchd crashes in the master chassis.

All the other chassis notices the BFD status down with the master chassis
and hence the next higher priority claims the port. But according to
the master chassis, the BFD status is fine and it again claims back the
chassisredirect port. And this results in flapping. The issue gets resolved
when ovs-vswitchd comes back but until then it leads to lot of SB DB
transactions and high CPU usage in ovn-controller's.

This patch fixes the issue by checking the OF connection status of the
ovn-controller with ovs-vswitchd and calculates the active bfd tunnels
only if it's connected.

Signed-off-by: Numan Siddique 
---
  ovn/controller/ofctrl.c |  6 ++
  ovn/controller/ofctrl.h |  3 +++
  ovn/controller/ovn-controller.c | 13 -
  tests/ovn.at| 30 +-
  4 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
index 218612787..95b95b607 100644
--- a/ovn/controller/ofctrl.c
+++ b/ovn/controller/ofctrl.c
@@ -1265,3 +1265,9 @@ ofctrl_inject_pkt(const struct ovsrec_bridge *br_int, 
const char *flow_s,
  
  return NULL;

  }
+
+bool
+ofctrl_is_connected(void)
+{
+return rconn_is_connected(swconn);
+}
diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h
index ae0cfa513..f7521801b 100644
--- a/ovn/controller/ofctrl.h
+++ b/ovn/controller/ofctrl.h
@@ -60,4 +60,7 @@ void ofctrl_check_and_add_flow(struct hmap *desired_flows, 
uint8_t table_id,
 const struct match *,
 const struct ofpbuf *ofpacts,
 bool log_duplicate_flow);
+
+bool ofctrl_is_connected(void);
+
  #endif /* ovn/ofctrl.h */
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 4e9a5865f..b0f55a870 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -689,7 +689,18 @@ main(int argc, char *argv[])
  ovsrec_bridge_table_get(ovs_idl_loop.idl), br_int,
  sbrec_chassis_table_get(ovnsb_idl_loop.idl), chassis_id,
  sbrec_sb_global_first(ovnsb_idl_loop.idl));
-bfd_calculate_active_tunnels(br_int, _tunnels);
+
+if (ofctrl_is_connected()) {
+/* Calculate the active tunnels only if have an an active
+ * OpenFlow connection to br-int.
+ * If we don't have a connection to br-int, it could mean
+ * ovs-vswitchd is down for some reason and the BFD status
+ * in the Interface rows could be stale. So its better to
+ * consider 'active_tunnels' set to be empty.
+ * */
+bfd_calculate_active_tunnels(br_int, _tunnels);
+}
+
  binding_run(ovnsb_idl_txn, ovs_idl_txn, sbrec_chassis_by_name,
  sbrec_datapath_binding_by_key,
  sbrec_port_binding_by_datapath,
diff --git a/tests/ovn.at b/tests/ovn.at
index f54f24c74..feafe1f00 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -8055,7 +8055,35 @@ ovn-nbctl --timeout=3 --wait=hv \
  
  test_ip_packet gw2 gw1
  
-OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])

+# Both gw1 and gw2 at this point should have claimed the cr-alice
+# once each.
+gw1_claim_ct=`grep "cr-alice: Claiming" gw1/ovn-controller.log | wc -l`
+gw2_claim_ct=`grep "cr-alice: Claiming" gw2/ovn-controller.log | wc -l`
+
+# Kill ovs-vswitchd in gw2. gw1 should claim the gateway port.
+# So gw1 should claim twice and gw1 only once.
+
+kill `cat gw2/ovs-vswitchd.pid`
+
+# gw1 should claim the cr-alice
+gw1_claim_ct=$((gw1_claim_ct+1))
+
+OVS_WAIT_UNTIL([test $gw1_claim_ct = `cat gw1/ovn-controller.log \
+| grep -c "cr-alice: Claiming"`])
+
+AT_CHECK([test $gw2_claim_ct = `cat gw2/ovn-controller.log | \
+grep -c "cr-alice: Claiming"`])
+
+ovn-sbctl show
+test_ip_packet gw1 gw2
+
+as gw2
+ovs-appctl -t ovn-controller exit
+ovs-appctl -t ovs-vswitchd exit
+ps -aef | grep ovn-c
+
+OVN_CLEANUP([hv1],[gw1],[ext1])
+
  AT_CLEANUP
  
  AT_SETUP([ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port])




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ovn-controller: Fix chassisredirect port flapping when ovs-vswitchd crashes

2019-02-04 Thread nusiddiq
From: Numan Siddique 

On a chassis when ovs-vswitchd crashes for some reason, the BFD status doesn't
get updated in the ovs db. ovn-controller will be reading the old BFD status
even though ovs-vswitchd is crashed. This results in the chassiredirect port
claim flapping between the master chassis and the chasiss with the next higher
priority if ovs-vswitchd crashes in the master chassis.

All the other chassis notices the BFD status down with the master chassis
and hence the next higher priority claims the port. But according to
the master chassis, the BFD status is fine and it again claims back the
chassisredirect port. And this results in flapping. The issue gets resolved
when ovs-vswitchd comes back but until then it leads to lot of SB DB
transactions and high CPU usage in ovn-controller's.

This patch fixes the issue by checking the OF connection status of the
ovn-controller with ovs-vswitchd and calculates the active bfd tunnels
only if it's connected.

Signed-off-by: Numan Siddique 
---
 ovn/controller/ofctrl.c |  6 ++
 ovn/controller/ofctrl.h |  3 +++
 ovn/controller/ovn-controller.c | 13 -
 tests/ovn.at| 30 +-
 4 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c
index 218612787..95b95b607 100644
--- a/ovn/controller/ofctrl.c
+++ b/ovn/controller/ofctrl.c
@@ -1265,3 +1265,9 @@ ofctrl_inject_pkt(const struct ovsrec_bridge *br_int, 
const char *flow_s,
 
 return NULL;
 }
+
+bool
+ofctrl_is_connected(void)
+{
+return rconn_is_connected(swconn);
+}
diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h
index ae0cfa513..f7521801b 100644
--- a/ovn/controller/ofctrl.h
+++ b/ovn/controller/ofctrl.h
@@ -60,4 +60,7 @@ void ofctrl_check_and_add_flow(struct hmap *desired_flows, 
uint8_t table_id,
const struct match *,
const struct ofpbuf *ofpacts,
bool log_duplicate_flow);
+
+bool ofctrl_is_connected(void);
+
 #endif /* ovn/ofctrl.h */
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 4e9a5865f..b0f55a870 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -689,7 +689,18 @@ main(int argc, char *argv[])
 ovsrec_bridge_table_get(ovs_idl_loop.idl), br_int,
 sbrec_chassis_table_get(ovnsb_idl_loop.idl), chassis_id,
 sbrec_sb_global_first(ovnsb_idl_loop.idl));
-bfd_calculate_active_tunnels(br_int, _tunnels);
+
+if (ofctrl_is_connected()) {
+/* Calculate the active tunnels only if have an an active
+ * OpenFlow connection to br-int.
+ * If we don't have a connection to br-int, it could mean
+ * ovs-vswitchd is down for some reason and the BFD status
+ * in the Interface rows could be stale. So its better to
+ * consider 'active_tunnels' set to be empty.
+ * */
+bfd_calculate_active_tunnels(br_int, _tunnels);
+}
+
 binding_run(ovnsb_idl_txn, ovs_idl_txn, sbrec_chassis_by_name,
 sbrec_datapath_binding_by_key,
 sbrec_port_binding_by_datapath,
diff --git a/tests/ovn.at b/tests/ovn.at
index f54f24c74..feafe1f00 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -8055,7 +8055,35 @@ ovn-nbctl --timeout=3 --wait=hv \
 
 test_ip_packet gw2 gw1
 
-OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
+# Both gw1 and gw2 at this point should have claimed the cr-alice
+# once each.
+gw1_claim_ct=`grep "cr-alice: Claiming" gw1/ovn-controller.log | wc -l`
+gw2_claim_ct=`grep "cr-alice: Claiming" gw2/ovn-controller.log | wc -l`
+
+# Kill ovs-vswitchd in gw2. gw1 should claim the gateway port.
+# So gw1 should claim twice and gw1 only once.
+
+kill `cat gw2/ovs-vswitchd.pid`
+
+# gw1 should claim the cr-alice
+gw1_claim_ct=$((gw1_claim_ct+1))
+
+OVS_WAIT_UNTIL([test $gw1_claim_ct = `cat gw1/ovn-controller.log \
+| grep -c "cr-alice: Claiming"`])
+
+AT_CHECK([test $gw2_claim_ct = `cat gw2/ovn-controller.log | \
+grep -c "cr-alice: Claiming"`])
+
+ovn-sbctl show
+test_ip_packet gw1 gw2
+
+as gw2
+ovs-appctl -t ovn-controller exit
+ovs-appctl -t ovs-vswitchd exit
+ps -aef | grep ovn-c
+
+OVN_CLEANUP([hv1],[gw1],[ext1])
+
 AT_CLEANUP
 
 AT_SETUP([ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router 
gateway port])
-- 
2.20.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev