Re: [ovs-dev] [PATCH] ovn-controller: Fix chassisredirect port flapping when ovs-vswitchd crashes
Acked-by: Mark Michelson On 2/4/19 5:31 AM, nusid...@redhat.com wrote: From: Numan Siddique On a chassis when ovs-vswitchd crashes for some reason, the BFD status doesn't get updated in the ovs db. ovn-controller will be reading the old BFD status even though ovs-vswitchd is crashed. This results in the chassiredirect port claim flapping between the master chassis and the chasiss with the next higher priority if ovs-vswitchd crashes in the master chassis. All the other chassis notices the BFD status down with the master chassis and hence the next higher priority claims the port. But according to the master chassis, the BFD status is fine and it again claims back the chassisredirect port. And this results in flapping. The issue gets resolved when ovs-vswitchd comes back but until then it leads to lot of SB DB transactions and high CPU usage in ovn-controller's. This patch fixes the issue by checking the OF connection status of the ovn-controller with ovs-vswitchd and calculates the active bfd tunnels only if it's connected. Signed-off-by: Numan Siddique --- ovn/controller/ofctrl.c | 6 ++ ovn/controller/ofctrl.h | 3 +++ ovn/controller/ovn-controller.c | 13 - tests/ovn.at| 30 +- 4 files changed, 50 insertions(+), 2 deletions(-) diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c index 218612787..95b95b607 100644 --- a/ovn/controller/ofctrl.c +++ b/ovn/controller/ofctrl.c @@ -1265,3 +1265,9 @@ ofctrl_inject_pkt(const struct ovsrec_bridge *br_int, const char *flow_s, return NULL; } + +bool +ofctrl_is_connected(void) +{ +return rconn_is_connected(swconn); +} diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h index ae0cfa513..f7521801b 100644 --- a/ovn/controller/ofctrl.h +++ b/ovn/controller/ofctrl.h @@ -60,4 +60,7 @@ void ofctrl_check_and_add_flow(struct hmap *desired_flows, uint8_t table_id, const struct match *, const struct ofpbuf *ofpacts, bool log_duplicate_flow); + +bool ofctrl_is_connected(void); + #endif /* ovn/ofctrl.h */ diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index 4e9a5865f..b0f55a870 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -689,7 +689,18 @@ main(int argc, char *argv[]) ovsrec_bridge_table_get(ovs_idl_loop.idl), br_int, sbrec_chassis_table_get(ovnsb_idl_loop.idl), chassis_id, sbrec_sb_global_first(ovnsb_idl_loop.idl)); -bfd_calculate_active_tunnels(br_int, _tunnels); + +if (ofctrl_is_connected()) { +/* Calculate the active tunnels only if have an an active + * OpenFlow connection to br-int. + * If we don't have a connection to br-int, it could mean + * ovs-vswitchd is down for some reason and the BFD status + * in the Interface rows could be stale. So its better to + * consider 'active_tunnels' set to be empty. + * */ +bfd_calculate_active_tunnels(br_int, _tunnels); +} + binding_run(ovnsb_idl_txn, ovs_idl_txn, sbrec_chassis_by_name, sbrec_datapath_binding_by_key, sbrec_port_binding_by_datapath, diff --git a/tests/ovn.at b/tests/ovn.at index f54f24c74..feafe1f00 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -8055,7 +8055,35 @@ ovn-nbctl --timeout=3 --wait=hv \ test_ip_packet gw2 gw1 -OVN_CLEANUP([hv1],[gw1],[gw2],[ext1]) +# Both gw1 and gw2 at this point should have claimed the cr-alice +# once each. +gw1_claim_ct=`grep "cr-alice: Claiming" gw1/ovn-controller.log | wc -l` +gw2_claim_ct=`grep "cr-alice: Claiming" gw2/ovn-controller.log | wc -l` + +# Kill ovs-vswitchd in gw2. gw1 should claim the gateway port. +# So gw1 should claim twice and gw1 only once. + +kill `cat gw2/ovs-vswitchd.pid` + +# gw1 should claim the cr-alice +gw1_claim_ct=$((gw1_claim_ct+1)) + +OVS_WAIT_UNTIL([test $gw1_claim_ct = `cat gw1/ovn-controller.log \ +| grep -c "cr-alice: Claiming"`]) + +AT_CHECK([test $gw2_claim_ct = `cat gw2/ovn-controller.log | \ +grep -c "cr-alice: Claiming"`]) + +ovn-sbctl show +test_ip_packet gw1 gw2 + +as gw2 +ovs-appctl -t ovn-controller exit +ovs-appctl -t ovs-vswitchd exit +ps -aef | grep ovn-c + +OVN_CLEANUP([hv1],[gw1],[ext1]) + AT_CLEANUP AT_SETUP([ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port]) ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH] ovn-controller: Fix chassisredirect port flapping when ovs-vswitchd crashes
From: Numan Siddique On a chassis when ovs-vswitchd crashes for some reason, the BFD status doesn't get updated in the ovs db. ovn-controller will be reading the old BFD status even though ovs-vswitchd is crashed. This results in the chassiredirect port claim flapping between the master chassis and the chasiss with the next higher priority if ovs-vswitchd crashes in the master chassis. All the other chassis notices the BFD status down with the master chassis and hence the next higher priority claims the port. But according to the master chassis, the BFD status is fine and it again claims back the chassisredirect port. And this results in flapping. The issue gets resolved when ovs-vswitchd comes back but until then it leads to lot of SB DB transactions and high CPU usage in ovn-controller's. This patch fixes the issue by checking the OF connection status of the ovn-controller with ovs-vswitchd and calculates the active bfd tunnels only if it's connected. Signed-off-by: Numan Siddique --- ovn/controller/ofctrl.c | 6 ++ ovn/controller/ofctrl.h | 3 +++ ovn/controller/ovn-controller.c | 13 - tests/ovn.at| 30 +- 4 files changed, 50 insertions(+), 2 deletions(-) diff --git a/ovn/controller/ofctrl.c b/ovn/controller/ofctrl.c index 218612787..95b95b607 100644 --- a/ovn/controller/ofctrl.c +++ b/ovn/controller/ofctrl.c @@ -1265,3 +1265,9 @@ ofctrl_inject_pkt(const struct ovsrec_bridge *br_int, const char *flow_s, return NULL; } + +bool +ofctrl_is_connected(void) +{ +return rconn_is_connected(swconn); +} diff --git a/ovn/controller/ofctrl.h b/ovn/controller/ofctrl.h index ae0cfa513..f7521801b 100644 --- a/ovn/controller/ofctrl.h +++ b/ovn/controller/ofctrl.h @@ -60,4 +60,7 @@ void ofctrl_check_and_add_flow(struct hmap *desired_flows, uint8_t table_id, const struct match *, const struct ofpbuf *ofpacts, bool log_duplicate_flow); + +bool ofctrl_is_connected(void); + #endif /* ovn/ofctrl.h */ diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c index 4e9a5865f..b0f55a870 100644 --- a/ovn/controller/ovn-controller.c +++ b/ovn/controller/ovn-controller.c @@ -689,7 +689,18 @@ main(int argc, char *argv[]) ovsrec_bridge_table_get(ovs_idl_loop.idl), br_int, sbrec_chassis_table_get(ovnsb_idl_loop.idl), chassis_id, sbrec_sb_global_first(ovnsb_idl_loop.idl)); -bfd_calculate_active_tunnels(br_int, _tunnels); + +if (ofctrl_is_connected()) { +/* Calculate the active tunnels only if have an an active + * OpenFlow connection to br-int. + * If we don't have a connection to br-int, it could mean + * ovs-vswitchd is down for some reason and the BFD status + * in the Interface rows could be stale. So its better to + * consider 'active_tunnels' set to be empty. + * */ +bfd_calculate_active_tunnels(br_int, _tunnels); +} + binding_run(ovnsb_idl_txn, ovs_idl_txn, sbrec_chassis_by_name, sbrec_datapath_binding_by_key, sbrec_port_binding_by_datapath, diff --git a/tests/ovn.at b/tests/ovn.at index f54f24c74..feafe1f00 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -8055,7 +8055,35 @@ ovn-nbctl --timeout=3 --wait=hv \ test_ip_packet gw2 gw1 -OVN_CLEANUP([hv1],[gw1],[gw2],[ext1]) +# Both gw1 and gw2 at this point should have claimed the cr-alice +# once each. +gw1_claim_ct=`grep "cr-alice: Claiming" gw1/ovn-controller.log | wc -l` +gw2_claim_ct=`grep "cr-alice: Claiming" gw2/ovn-controller.log | wc -l` + +# Kill ovs-vswitchd in gw2. gw1 should claim the gateway port. +# So gw1 should claim twice and gw1 only once. + +kill `cat gw2/ovs-vswitchd.pid` + +# gw1 should claim the cr-alice +gw1_claim_ct=$((gw1_claim_ct+1)) + +OVS_WAIT_UNTIL([test $gw1_claim_ct = `cat gw1/ovn-controller.log \ +| grep -c "cr-alice: Claiming"`]) + +AT_CHECK([test $gw2_claim_ct = `cat gw2/ovn-controller.log | \ +grep -c "cr-alice: Claiming"`]) + +ovn-sbctl show +test_ip_packet gw1 gw2 + +as gw2 +ovs-appctl -t ovn-controller exit +ovs-appctl -t ovs-vswitchd exit +ps -aef | grep ovn-c + +OVN_CLEANUP([hv1],[gw1],[ext1]) + AT_CLEANUP AT_SETUP([ovn -- 4 HV, 3 LS, 2 LR, packet test with HA distributed router gateway port]) -- 2.20.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev