Currently, during port migration, two chassis (source and destination)
can try to claim the same logical switch port simultaneously for a short
period of time, until the tap is deleted on the source hypervisor. The
ovn-controllers on these two hosts constantly receive port-binding
updates about the other chassis claiming the port, and as a result each
tries to claim the port again (because its chassis has a tap interface
referencing the LSP). This flapping ends once the CMS cleans up the tap
interface from the source chassis.

The following steps occur during a single iteration of inc-proc-eng
during flapping:
1. PB update received on ovn-controller about the other chassis owning
   the port.
2. ovn-controller tries to claim the port.
3. It installs the OVS flows for the port and updates the runtime_data
   to include this port in locally relevant ports.
4. If some change to runtime data happens as part of 3, port-groups
   containing the affected ports are recomputed. It uses related_lports
   runtime data to compute the port-groups.

Finally, ovn-controller sends a port-binding update to SB changing the
chassis to itself. At a later point in time, SB notifies ovn-controller
that this port-binding update has completed.

Once the CMS deletes the tap interface, ovn-controller receives the
notification and updates the runtime data accordingly.

Issue: OVS flows are (sometimes) not cleaned up upon port migration.

If the notification of OVS interface deletion is received before SB acks
the PortBinding update, then ovn-controller does not clean up
related_lports, leading to incorrect port-groups computation, i.e. if
the order of events is as follows:
1. PB update received on ovn-controller about the other chassis owning
   the port.
2. ovn-controller claims the port, installs OVS flows and sends the
   PortBinding update to SB.
3. OVS interface deletion notification received by ovn-controller.
4. SB ack received for the step 2 PB update.

This commit fixes the issue by removing the logical port from
related_lports even when there is no binding available locally.

Signed-off-by: Priyankar Jain <[email protected]>
---
 controller/binding.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/controller/binding.c b/controller/binding.c
index 9b0647b70..9889be5c7 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1568,6 +1568,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb,
             || is_additional_chassis(pb, b_ctx_in->chassis_rec)) {
             /* Release the lport if there is no lbinding. */
             if (!lbinding_set || !can_bind) {
+                remove_related_lport(pb, b_ctx_out);
                 return release_lport(pb, b_ctx_in->chassis_rec,
                                      !b_ctx_in->ovnsb_idl_txn,
                                      b_ctx_out->tracked_dp_bindings,
-- 
2.37.1 (Apple Git-137.1)
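
For readers who want to reason about the ordering described in the
commit message, here is a minimal toy model of the race. It is only a
sketch under stated assumptions, not OVN code: struct toy_chassis and
every toy_* function are hypothetical names, and the model assumes that
interface deletion cleans up related_lports only when the port binding
already names this chassis. The with_fix flag stands in for the added
remove_related_lport() call.

```c
#include <stdbool.h>

/* Toy model of one chassis and one logical port. Every name here is
 * hypothetical; none of this is OVN code. */
struct toy_chassis {
    bool has_iface;   /* tap interface (local binding) is present */
    bool pb_is_ours;  /* SB Port_Binding, as last acked, names us */
    bool related;     /* port is in the toy "related_lports" set */
};

/* Steps 1-2: another chassis owns the port, so we claim it again.
 * The SB update is only sent here; pb_is_ours stays false until ack. */
static void
toy_claim(struct toy_chassis *c)
{
    c->related = true;
}

/* OVS interface deletion. Assumption: cleanup happens on this path
 * only if the binding is already ours. */
static void
toy_iface_deleted(struct toy_chassis *c)
{
    c->has_iface = false;
    if (c->pb_is_ours) {
        c->related = false;
        c->pb_is_ours = false;
    }
}

/* SB ack for the claim. With no local binding left, the port is
 * released; with_fix models the extra remove_related_lport() call. */
static void
toy_sb_ack(struct toy_chassis *c, bool with_fix)
{
    c->pb_is_ours = true;
    if (!c->has_iface) {
        if (with_fix) {
            c->related = false;
        }
        c->pb_is_ours = false;   /* models release_lport() */
    }
}

/* Racy order: claim, interface deleted, then SB ack.
 * Returns true if the port is left stale in the related set. */
bool
toy_stale_after_race(bool with_fix)
{
    struct toy_chassis c = { true, false, false };
    toy_claim(&c);
    toy_iface_deleted(&c);
    toy_sb_ack(&c, with_fix);
    return c.related;
}

/* Expected order: claim, SB ack, then interface deleted. */
bool
toy_stale_in_order(bool with_fix)
{
    struct toy_chassis c = { true, false, false };
    toy_claim(&c);
    toy_sb_ack(&c, with_fix);
    toy_iface_deleted(&c);
    return c.related;
}
```

In this model, toy_stale_after_race(false) leaves the port in the
related set, while toy_stale_after_race(true) and both calls to
toy_stale_in_order() leave it empty, which matches the "sometimes"
nature of the reported issue.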
