On 3/7/24 5:39 AM, Ilya Maximets wrote:
On 2/27/24 20:14, Han Zhou wrote:
For kernel datapath, when a new tunnel port is created in the same
transaction in which an old tunnel port with the same tunnel
configuration is deleted, the new tunnel port creation will fail and
left in an error state. This can be easily reproduced in OVN by
triggering a chassis deletion and addition with the same encap in the
same SB DB transaction, such as:

ovn-sbctl chassis-add aaaaaa geneve 1.2.3.4
ovn-sbctl chassis-del aaaaaa -- chassis-add bbbbbb 1.2.3.4

ovs-vsctl show | grep error
error: "could not add network device ovn-bbbbbb-0 to ofproto (File exists)"

Related logs in OVS:
—
2024-02-23T05:41:49.978Z|405933|bridge|INFO|bridge br-int: deleted interface 
ovn-aaaaaa-0 on port 113
2024-02-23T05:41:49.989Z|405935|tunnel|WARN|ovn-bbbbbb-0: attempting to add tunnel 
port with same config as port 'ovn-aaaaaa-0' (::->1.2.3.4, key=flow, legacy_l2, 
dp port=9)
2024-02-23T05:41:49.989Z|405936|ofproto|WARN|br-int: could not add port 
ovn-bbbbbb-0 (File exists)
2024-02-23T05:41:49.989Z|405937|bridge|WARN|could not add network device 
ovn-bbbbbb-0 to ofproto (File exists)
—

Hi, Han.  Thanks for the patch!


Depending on when there are other OVSDB changes, it may take a long time
for the device to be added successfully, triggered by the next OVS
iteration.

(note: native tunnel ports do not have this problem)

I don't think this is correct.  The code path is common for both system
and native tunnels.  I can reproduce the issues in a sandbox with:

$ make -j8 sandbox SANDBOXFLAGS="\-\-dummy='system'"
[tutorial]$ ovs-vsctl add-port br0 tunnel_port \
                 -- set Interface tunnel_port \
                        type=geneve options:remote_ip=flow options:key=123
[tutorial]$ ovs-vsctl del-port tunnel_port \
                 -- add-port br0 tunnel_port2 \
                 -- set Interface tunnel_port2 \
                        type=geneve options:remote_ip=flow options:key=123
ovs-vsctl: Error detected while setting up 'tunnel_port2':
could not add network device tunnel_port2 to ofproto (File exists).
See ovs-vswitchd log for details.

The same should work in a testsuite as well, i.e. we should be able to
create a test for this scenario.

Note: The --dummy=system prevents OVS from replacing tunnel ports with
       dummy ones.


The problem is how the tunnel port deletion and creation are handled. In
bridge_reconfigure(), port deletion is handled before creation, to avoid
such resource conflict. However, for kernel tunnel ports, the real clean
up is performed at the end of the bridge_reconfigure() in the:
bridge_run__()->ofproto_run()->...->ofproto_dpif:port_destruct()

We cannot call bridge_run__() at an earlier point before all
reconfigurations are done, so this patch tries a generic approach to
just re-run the bridge_reconfigure() when there are any port creations
encountered "File exists" error, which indicates a possible resource
conflict may have happened due to a later deleted port, and retry may
succeed.

Signed-off-by: Han Zhou <[email protected]>
---
This is RFC because I am not sure if there is a better way to solve the problem
more specifically by executing the port_destruct for the old port before trying
to create the new port. The fix may be more complex though.

I don't think re-trying is a good approach in general.  We should likely
just destroy the tnl_port structure right away, similarly to how we clean
up stp, lldp and bundles in ofproto_port_unregister().  Maybe we can add
a new callback similar to bundle_remove() and call tnl_port_del() from it?
(I didn't try, so I'm not 100% sure this will not cause any issues.)

What do you think?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Hi Ilya and Han,

We hit this issue some days ago. As our analysis, it was introduced by
commit fe83f81df977 ("netdev: Remove netdev from global shash when the user is changing interface configuration.")

Reproduce:
  ovs-vsctl add-port br-int vxlan0 \
    -- set interface vxlan0 type=vxlan options:remote_ip=10.10.10.1

  sleep 2

  ovs-vsctl --if-exists del-port vxlan0 \
    -- add-port br-int vxlan1 \
    -- set interface vxlan1 type=vxlan options:remote_ip=10.10.10.1
  ovs-vsctl: Error detected while setting up 'vxlan1': could not add
  network device vxlan1 to ofproto (File exists).

vswitchd log:
  bridge|INFO|bridge br-int: added interface vxlan0 on port 1106
  bridge|INFO|bridge br-int: deleted interface vxlan0 on port 1106
tunnel|WARN|vxlan1: attempting to add tunnel port with same config as port 'vxlan0' (::->10.10.10.1, key=0, legacy_l2, dp port=122)
  ofproto|WARN|br-int: could not add port vxlan1 (File exists)
  bridge|WARN|could not add network device vxlan1 to ofproto (File exists)

CallTrace:
  bridge_reconfigure
    bridge_del_ports
      port_destroy
        iface_destroy__
          netdev_remove         <------ netdev vxlan0 removed
    bridge_delete_or_reconfigure_ports
      OFPROTO_PORT_FOR_EACH
        ofproto_port_dump_next
          port_dump_next
          port_query_by_name    <------ netdev_shash do not contain vxlan0
        ofproto_port_del        <------ vxlan0 do not del in ofproto
    bridge_add_ports
      bridge_add_ports__
        iface_create
          iface_do_create
            ofproto_port_add    <------ vxlan1 add failed


We tried to fix this by the following diff:

---
 ofproto/ofproto-dpif.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index d7544ef..654468e 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -3907,14 +3907,16 @@ port_query_by_name(const struct ofproto *ofproto_, const char *devname,

     if (sset_contains(&ofproto->ghost_ports, devname)) {
         const char *type = netdev_get_type_from_name(devname);
+        const struct ofport *ofport =
+ shash_find_data(&ofproto->up.port_by_name, devname);
+        if (!type && ofport && ofport->netdev) {
+            type = netdev_get_type(ofport->netdev);
+        }

/* We may be called before ofproto->up.port_by_name is populated with * the appropriate ofport. For this reason, we must get the name and
          * type from the netdev layer directly. */
         if (type) {
-            const struct ofport *ofport;
-
-            ofport = shash_find_data(&ofproto->up.port_by_name, devname);
ofproto_port->ofp_port = ofport ? ofport->ofp_port : OFPP_NONE;
             ofproto_port->name = xstrdup(devname);
             ofproto_port->type = xstrdup(type);

Best regards, Tao

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to