On 9 Oct 2023, at 17:06, David Marchand wrote:

> A main thread executing the 'revalidator/purge' command could race with
> revalidator threads that can be dumping/sweeping the purged flows at the
> same time.
>
> This race can be reproduced (with dpif debug logs) by running the
> conntrack - ICMP related unit tests with the userspace datapath:
>
> 2023-10-09T14:11:55.242Z|00177|unixctl|DBG|received request
>       revalidator/purge[], id=0
> 2023-10-09T14:11:55.242Z|00044|dpif(revalidator17)|DBG|netdev@ovs-netdev:
>       flow_dump ufid:68ff6817-fb3b-4b30-8412-9cf175318294 <empty>,
>       packets:0, bytes:0, used:never
> 2023-10-09T14:11:55.242Z|00178|dpif|DBG|netdev@ovs-netdev: flow_del
>       ufid:07046e91-30a6-4862-9048-1a76b5a88a5b
>       recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0),
>       ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),
>       packet_type(ns=0,id=0),
>       eth(src=a6:0a:bf:e2:f3:f2,dst=62:23:0f:f6:2c:75),
>       eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=17,tos=0,
>       ttl=64,frag=no),udp(src=37380,dst=10000), packets:0, bytes:0,
>       used:never
> ...
> 2023-10-09T14:11:55.242Z|00049|dpif(revalidator17)|WARN|netdev@ovs-netdev:
>       failed to flow_get (No such file or directory)
>       ufid:07046e91-30a6-4862-9048-1a76b5a88a5b <empty>, packets:0,
>       bytes:0, used:never
> 2023-10-09T14:11:55.242Z|00050|ofproto_dpif_upcall(revalidator17)|WARN|
>       Failed to acquire udpif_key corresponding to unexpected flow
>       (No such file or directory):
>       ufid:07046e91-30a6-4862-9048-1a76b5a88a5b
> ...
> 2023-10-09T14:11:55.242Z|00183|unixctl|DBG|replying with success, id=0: ""
>
> To avoid this race, a first part of the fix is to pause (if not already
> paused) the revalidators while the main thread is purging the datapath
> flows.
>
> Then a second issue is observed by running the same unit test with the
> kernel datapath. Its dpif implementation dumps flows via a netlink request
> (see dpif_flow_dump_create(), dpif_netlink_flow_dump_create(),
> nl_dump_start(), nl_sock_send__()) in the leader revalidator thread,
> before pausing revalidators:
>
> 2023-10-09T14:44:28.742Z|00122|unixctl|DBG|received request
>       revalidator/purge[], id=0
> ...
> 2023-10-09T14:44:28.742Z|00125|dpif|DBG|system@ovs-system: flow_del
>       ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 recirc_id(0),dp_hash(0),
>       skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0),
>       ct_mark(0),ct_label(0),eth(src=a6:0a:bf:e2:f3:f2,
>       dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.1.1.1,
>       tip=10.1.1.2,op=1,sha=a6:0a:bf:e2:f3:f2,tha=00:00:00:00:00:00),
>       packets:0, bytes:0, used:never
> ...
> 2023-10-09T14:44:28.742Z|00129|unixctl|DBG|replying with success, id=0: ""
> ...
> 2023-10-09T14:44:28.742Z|00006|dpif(revalidator21)|DBG|system@ovs-system:
>       flow_dump ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>,
>       packets:0, bytes:0, used:never
> ...
> 2023-10-09T14:44:28.742Z|00012|dpif(revalidator21)|WARN|system@ovs-system:
>       failed to flow_get (No such file or directory)
>       ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9 <empty>, packets:0,
>       bytes:0, used:never
> 2023-10-09T14:44:28.742Z|00013|ofproto_dpif_upcall(revalidator21)|WARN|
>       Failed to acquire udpif_key corresponding to unexpected flow
>       (No such file or directory):
>       ufid:70102d81-30a1-44b9-aa76-3d02a9ffd2c9
>
> To avoid evaluating already deleted flows, the second part of the fix is
> to ensure that dumping from the leader revalidator thread is done out of
> any pause request.
>
> Fixes: 98bb4286970d ("tests: Add command to purge revalidators of flows.")
> Signed-off-by: David Marchand <david.march...@redhat.com>
> ---
> Note: with this fix, I can make most of system-traffic tests run fine
> under make check-dpdk check-kernel check-offloads.
> "Most", because a unit test "datapath - basic truncate action" fails for
> me though it predates this patch, and I did not understand why it fails
> yet.

I’m not getting this error in my test environment. I tested all datapaths and 
did not get any errors.


> The check-offloads target can run fine if removing the exception on
> "failed to flow_get" and "failed to acquire ukey" warning logs.

Would it be good to add another patch in the series, to remove these?


This patch looks good to me!

Acked-by: Eelco Chaudron <echau...@redhat.com>

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to