On 7/8/25 1:34 PM, Ilya Maximets wrote:
> When a packet enters the kernel datapath and there is no flow to handle
> it, the packet goes to userspace through a MISS upcall.  With the per-CPU
> upcall dispatch mechanism, we're using the current CPU id to select the
> Netlink PID on which to send this packet.  This allows us to send
> packets from the same traffic flow through the same handler.
> 
> The handler will process the packet, install the required flow into the
> kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE.
> 
> While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a
> recirculation action that will pass the (likely modified) packet
> through the flow lookup again.  And if the flow is not found, the
> packet will be sent to userspace again through another MISS upcall.
> 
> However, the handler thread in userspace is likely running on a
> different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled
> in the syscall context of that thread.  So, when the time comes to
> send the packet through another upcall, the per-CPU dispatch will
> choose a different Netlink PID, and this packet will end up processed
> by a different handler thread on a different CPU.
> 
> The process continues as long as there are new recirculations.  Each
> time, the packet goes to a different handler thread before it is sent
> out of the OVS datapath to the destination port.  In real setups the
> number of recirculations can go up to 4 or 5, sometimes more.
> 
> There is always a chance to re-order packets while processing upcalls,
> because userspace will first install the flow and then re-inject the
> original packet.  So, there is a race window when the flow is already
> installed and the second packet can match it inside the kernel and be
> forwarded to the destination before the first packet is re-injected.
> But the fact that packets are going through multiple upcalls handled
> by different userspace threads makes the reordering noticeably more
> likely, because we not only have a race between the kernel and a
> userspace handler (which is hard to avoid), but also between multiple
> userspace handlers.
> 
> For example, let's assume that 10 packets got enqueued through a MISS
> upcall for handler-1.  It will start processing them, install the
> flow into the kernel and start re-injecting packets back, from where
> they will go through another MISS to handler-2.  Handler-2 will install
> the flow into the kernel and start re-injecting the packets.  While
> handler-1 continues to re-inject the last of the 10 packets, they will
> hit the flow installed by handler-2 and be forwarded without going to
> handler-2, while handler-2 is still re-injecting the first of these 10
> packets.  Given multiple recirculations and misses, these 10 packets
> may end up completely mixed up on the output from the datapath.
> 
> Let's provide the original upcall PID via the new netlink attribute
> OVS_PACKET_ATTR_UPCALL_PID.  This way the upcall triggered during the
> execution will go to the same handler.  Packets will be enqueued to
> the same socket and re-injected in the same order.  This doesn't
> eliminate re-ordering as stated above, since we still have a race
> between the kernel and the handler thread, but it does eliminate
> races between multiple handlers.
> 
> The openvswitch kernel module ignores unknown attributes for the
> OVS_PACKET_CMD_EXECUTE, so it's safe to provide it even on older
> kernels.
> 
> Reported-at: https://issues.redhat.com/browse/FDP-1479
> Link: https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maxim...@ovn.org/
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
> ---
> 
> Version 1:
>   * No changes since RFC.  The kernel change got merged into net-next:
>       https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maxim...@ovn.org/
>     Normally, we would wait for it to be in Linus' tree, but it will
>     not happen before the branching.
> 
>  include/linux/openvswitch.h   | 6 ++++++
>  lib/dpif-netlink.c            | 7 +++++++
>  lib/dpif.h                    | 3 +++
>  ofproto/ofproto-dpif-upcall.c | 5 +++++
>  4 files changed, 21 insertions(+)
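
For anyone following the thread, the handler hopping described in the
commit message can be illustrated with a toy model.  This is only a
sketch in Python, not the kernel or OVS code: select_upcall_pid(),
handlers_visited(), the PID values and the CPU layout are all made up
for illustration, and the modulo lookup is an assumption of the model.

```python
from typing import Dict, List, Optional


def select_upcall_pid(cpu_id: int, pids: List[int],
                      pinned_pid: Optional[int] = None) -> int:
    """Pick the Netlink PID for a MISS upcall (hypothetical model).

    If the execute request carried the original upcall PID, reuse it;
    otherwise fall back to per-CPU dispatch on the current CPU.
    """
    if pinned_pid is not None:
        return pinned_pid
    return pids[cpu_id % len(pids)]


def handlers_visited(first_cpu: int, handler_cpu: Dict[int, int],
                     pids: List[int], recircs: int,
                     forward_pid: bool) -> List[int]:
    """Return the PIDs that receive upcalls for one packet.

    handler_cpu maps a PID to the CPU its handler thread runs on.
    Each recirculation is assumed to miss and trigger a new upcall
    from the syscall context of the handler that re-injected it.
    """
    pid = select_upcall_pid(first_cpu, pids)
    visited = [pid]
    for _ in range(recircs):
        cpu = handler_cpu[pid]  # execute runs on the handler's CPU
        pid = select_upcall_pid(cpu, pids, pid if forward_pid else None)
        visited.append(pid)
    return visited


# Assumed layout: 4 CPUs, each handler runs on a different CPU than
# the one whose upcalls it serves.
PIDS = [100, 101, 102, 103]
HANDLER_CPU = {100: 1, 101: 2, 102: 3, 103: 0}

without_attr = handlers_visited(0, HANDLER_CPU, PIDS, 3, forward_pid=False)
with_attr = handlers_visited(0, HANDLER_CPU, PIDS, 3, forward_pid=True)
```

In this model the packet visits a new handler on every recirculation
when the PID is not forwarded, and stays on the first handler when it
is, which is the behavior the patch is after.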

Fedora's mirrors are not feeling good today.

Recheck-request: github-robot