A main thread executing the 'revalidator/purge' command could race with
revalidator threads that can be dumping/sweeping the purged flows at the
same time.

This race can be reproduced (with dpif debug logs) by running the
conntrack - ICMP related unit tests with the userspace datapath:

2023-10-09T14:11:55.242Z|00177|unixctl|DBG|received request
        revalidator/purge[], id=0
2023-10-09T14:11:55.242Z|00044|dpif(revalidator17)|DBG|netdev@ovs-netdev:
        flow_dump ufid:68ff6817-fb3b-4b30-8412-9cf175318294 <empty>,
        packets:0, bytes:0, used:never
2023-10-09T14:11:55.242Z|00178|dpif|DBG|netdev@ovs-netdev: flow_del
        ufid:07046e91-30a6-4862-9048-1a76b5a88a5b
        recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0),
        ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),
        packet_type(ns=0,id=0),
        eth(src=a6:0a:bf:e2:f3:f2,dst=62:23:0f:f6:2c:75),
        eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=17,tos=0,
        ttl=64,frag=no),udp(src=37380,dst=10000), packets:0, bytes:0,
        used:never
...
2023-10-09T14:11:55.242Z|00049|dpif(revalidator17)|WARN|netdev@ovs-netdev:
        failed to flow_get (No such file or directory)
        ufid:07046e91-30a6-4862-9048-1a76b5a88a5b <empty>, packets:0,
        bytes:0, used:never
2023-10-09T14:11:55.242Z|00050|ofproto_dpif_upcall(revalidator17)|WARN|
        Failed to acquire udpif_key corresponding to unexpected flow
        (No such file or directory):
        ufid:07046e91-30a6-4862-9048-1a76b5a88a5b
...
2023-10-09T14:11:55.242Z|00183|unixctl|DBG|replying with success, id=0: ""

To avoid this race, a first part of the fix is to pause (if not already
paused) the revalidators while the main thread is purging the datapath
flows.

Then a second issue is observed by running the same unit test with the
kernel datapath. Its dpif implementation dumps flows via a netlink request
(see dpif_flow_dump_create(), dpif_netlink_flow_dump_create(),
nl_dump_start(), nl_sock_send__()) in the leader revalidator thread,
before pausing revalidators:

2023-10-09T14:44:28.742Z|00122|unixctl|DBG|received request
        revalidator/purge[], id=0
...
2023-10-09T14:44:28.742Z|00125|dpif|DBG|system@ovs-system: flow_del
        ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 recirc_id(0),dp_hash(0),
        skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0),
        ct_mark(0),ct_label(0),eth(src=a6:0a:bf:e2:f3:f2,
        dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.1.1.1,
        tip=10.1.1.2,op=1,sha=a6:0a:bf:e2:f3:f2,tha=00:00:00:00:00:00),
        packets:0, bytes:0, used:never
...
2023-10-09T14:44:28.742Z|00129|unixctl|DBG|replying with success, id=0: ""
...
2023-10-09T14:44:28.742Z|00006|dpif(revalidator21)|DBG|system@ovs-system:
        flow_dump ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>,
        packets:0, bytes:0, used:never
...
2023-10-09T14:44:28.742Z|00012|dpif(revalidator21)|WARN|system@ovs-system:
        failed to flow_get (No such file or directory)
        ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>, packets:0,
        bytes:0, used:never
2023-10-09T14:44:28.742Z|00013|ofproto_dpif_upcall(revalidator21)|WARN|
        Failed to acquire udpif_key corresponding to unexpected flow
        (No such file or directory):
        ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9

To avoid evaluating already deleted flows, the second part of the fix is
to ensure that dumping from the leader revalidator thread is done out of
any pause request.

Fixes: 98bb4286970d ("tests: Add command to purge revalidators of flows.")
Signed-off-by: David Marchand <david.march...@redhat.com>
---
Note: with this fix, I can make most of system-traffic tests run fine
under make check-dpdk check-kernel check-offloads.
"Most", because a unit test "datapath - basic truncate action" fails for
me though it predates this patch, and I did not understand why it fails
yet.

The check-offloads target can run fine if removing the exception on
"failed to flow_get" and "failed to acquire ukey" warning logs.

---
 ofproto/ofproto-dpif-upcall.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index cde03abc6d..e49e9f7e50 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -990,7 +990,7 @@ udpif_revalidator(void *arg)
             udpif->reval_exit = latch_is_set(&udpif->exit_latch);
 
             start_time = time_msec();
-            if (!udpif->reval_exit) {
+            if (!udpif->reval_exit && !udpif->pause) {
                 bool terse_dump;
 
                 terse_dump = udpif_use_ufid(udpif);
@@ -1000,10 +1000,15 @@ udpif_revalidator(void *arg)
             }
         }
 
-        /* Wait for the leader to start the flow dump. */
+        /* Wait for the leader to reach this point. */
         ovs_barrier_block(&udpif->reval_barrier);
         if (udpif->pause) {
             revalidator_pause(revalidator);
+            if (!udpif->reval_exit) {
+                /* The main thread resumed all validators, but the leader
+                 * dumped nothing, go to next iteration. */
+                continue;
+            }
         }
 
         if (udpif->reval_exit) {
@@ -3217,11 +3222,19 @@ upcall_unixctl_purge(struct unixctl_conn *conn, int 
argc OVS_UNUSED,
     struct udpif *udpif;
 
     LIST_FOR_EACH (udpif, list_node, &all_udpifs) {
+        bool wake_up = false;
         int n;
 
+        if (!latch_is_set(&udpif->pause_latch)) {
+            udpif_pause_revalidators(udpif);
+            wake_up = true;
+        }
         for (n = 0; n < udpif->n_revalidators; n++) {
             revalidator_purge(&udpif->revalidators[n]);
         }
+        if (wake_up) {
+            udpif_resume_revalidators(udpif);
+        }
     }
     unixctl_command_reply(conn, "");
 }
-- 
2.41.0

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to