Currently, during port migration, two chassis (source and destination)
can try to claim the same logical switch port (LSP) simultaneously for
a short period of time, until the tap is deleted on the source
hypervisor. The ovn-controller on each of these two hosts constantly
receives port-binding updates about the other chassis claiming the
port and, as a result, tries to claim the port again (because its
chassis has a tap interface referencing the LSP). This flapping ends
once the CMS cleans up the tap interface from the source chassis.

The following steps occur during a single iteration of inc-proc-eng
while the port is flapping:

1. PB update received on OVN controller about other chassis owning the
   port.
2. ovn-controller tries to claim the port.
3. It installs the OVS flows for the port and updates the runtime_data
   to include this port in locally relevant ports.
4. If step 3 changes the runtime data, the port-groups containing the
   affected ports are recomputed. The recomputation uses the
   related_lports runtime data.

Finally, ovn-controller sends a port-binding update to SB, changing the
chassis to itself.
At a later point in time, SB notifies ovn-controller that this
port-binding update has been completed.

Once CMS deletes the tap interface, ovn-controller receives the
notification and updates the runtime data accordingly.

Issue: OVS flows are sometimes not cleaned up after port migration.

If the notification of OVS interface deletion is received before SB
acks the PortBinding update, then ovn-controller does not clean up
related_lports, leading to incorrect port-group computation.

That is, the issue occurs if the order of events is as follows:

1. PB update received on OVN controller about other chassis owning the
   port.
2. ovn-controller claims the port, installs OVS flows and sends the
   PortBinding update to SB.
3. OVS interface deletion notification received by ovn-controller.
4. SB ack received for step-2 PB update.

This commit fixes the issue by removing the logical port from
related_lports even when no binding is available locally.

Signed-off-by: Priyankar Jain <[email protected]>
---
 controller/binding.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/controller/binding.c b/controller/binding.c
index 9b0647b70..9889be5c7 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1568,6 +1568,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb,
             || is_additional_chassis(pb, b_ctx_in->chassis_rec)) {
         /* Release the lport if there is no lbinding. */
         if (!lbinding_set || !can_bind) {
+            remove_related_lport(pb, b_ctx_out);
             return release_lport(pb, b_ctx_in->chassis_rec,
                                  !b_ctx_in->ovnsb_idl_txn,
                                  b_ctx_out->tracked_dp_bindings,
-- 
2.37.1 (Apple Git-137.1)
