On 28 Jun 2025, at 0:07, Ilya Maximets wrote:
> When a packet enters the kernel datapath and there is no flow to handle
> it, the packet goes to userspace through a MISS upcall.  With the
> per-CPU upcall dispatch mechanism, we're using the current CPU id to
> select the Netlink PID on which to send this packet.  This allows us to
> send packets from the same traffic flow through the same handler.
>
> The handler will process the packet, install the required flow into the
> kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE.
>
> While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a
> recirculation action that will pass the (likely modified) packet
> through the flow lookup again.  And if the flow is not found, the
> packet will be sent to userspace again through another MISS upcall.
>
> However, the handler thread in userspace is likely running on a
> different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled
> in the syscall context of that thread.  So, when the time comes to
> send the packet through another upcall, the per-CPU dispatch will
> choose a different Netlink PID, and this packet will end up processed
> by a different handler thread on a different CPU.
>
> The process continues as long as there are new recirculations; each
> time the packet goes to a different handler thread before it is sent
> out of the OVS datapath to the destination port.  In real setups the
> number of recirculations can go up to 4 or 5, sometimes more.
>
> There is always a chance to re-order packets while processing upcalls,
> because userspace will first install the flow and then re-inject the
> original packet.  So, there is a race window when the flow is already
> installed and a second packet can match it inside the kernel and be
> forwarded to the destination before the first packet is re-injected.
> But the fact that packets are going through multiple upcalls handled
> by different userspace threads makes the reordering noticeably more
> likely, because we not only have a race between the kernel and a
> userspace handler (which is hard to avoid), but also between multiple
> userspace handlers.
>
> For example, let's assume that 10 packets got enqueued through a MISS
> upcall for handler-1.  It will start processing them, install the flow
> into the kernel and start re-injecting the packets, from where they
> will go through another MISS to handler-2.  Handler-2 will install the
> flow into the kernel and start re-injecting the packets.  While
> handler-1 continues to re-inject the last of the 10 packets, they will
> hit the flow installed by handler-2 and be forwarded without going to
> handler-2, while handler-2 is still re-injecting the first of these 10
> packets.  Given multiple recirculations and misses, these 10 packets
> may end up completely mixed up on the output from the datapath.
>
> Let's provide the original upcall PID via the new netlink attribute
> OVS_PACKET_ATTR_UPCALL_PID.  This way the upcall triggered during the
> execution will go to the same handler.  Packets will be enqueued to
> the same socket and re-injected in the same order.  This doesn't
> eliminate the re-ordering described above, since we still have a race
> between the kernel and the handler thread, but it allows us to
> eliminate races between multiple handlers.
>
> The openvswitch kernel module ignores unknown attributes for
> OVS_PACKET_CMD_EXECUTE, so it's safe to provide it even on older
> kernels.
>
> Reported-at: https://issues.redhat.com/browse/FDP-1479
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>

This change looks good to me.
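For anyone following along, here is a rough, self-contained sketch of what
the userspace side could look like when building the OVS_PACKET_CMD_EXECUTE
request.  It uses libmnl purely for illustration (OVS userspace has its own
netlink helpers), the function name and parameters are made up, and
OVS_PACKET_ATTR_UPCALL_PID is the attribute proposed here, so it is not in
released uapi headers yet:

/*
 * Illustrative only: encode an OVS_PACKET_CMD_EXECUTE request and attach
 * the PID of the socket the original upcall was received on.  The caller
 * is assumed to have initialized nlh with mnl_nlmsg_put_header() and to
 * have resolved the "ovs_packet" generic netlink family id.
 */
#include <stdint.h>
#include <stddef.h>
#include <libmnl/libmnl.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>
#include <linux/openvswitch.h>

static void encode_execute(struct nlmsghdr *nlh, uint16_t ovs_packet_family,
                           int dp_ifindex,
                           const void *packet, size_t packet_len,
                           const void *actions, size_t actions_len,
                           uint32_t handler_upcall_pid)
{
    struct genlmsghdr *genl;
    struct ovs_header *ovs_hdr;

    nlh->nlmsg_type = ovs_packet_family;   /* Resolved via genl ctrl. */
    nlh->nlmsg_flags = NLM_F_REQUEST;

    genl = mnl_nlmsg_put_extra_header(nlh, sizeof *genl);
    genl->cmd = OVS_PACKET_CMD_EXECUTE;
    genl->version = OVS_PACKET_VERSION;

    ovs_hdr = mnl_nlmsg_put_extra_header(nlh, sizeof *ovs_hdr);
    ovs_hdr->dp_ifindex = dp_ifindex;

    /* The packet to re-inject and the actions to execute on it. */
    mnl_attr_put(nlh, OVS_PACKET_ATTR_PACKET, packet_len, packet);
    mnl_attr_put(nlh, OVS_PACKET_ATTR_ACTIONS, actions_len, actions);

    /* The new part: echo back the Netlink PID the original upcall was
     * delivered on, so that a MISS triggered while executing the actions
     * (e.g. after recirculation) goes to the same handler socket.  Older
     * kernels ignore unknown OVS_PACKET_ATTR_* attributes, so this can be
     * sent unconditionally. */
    mnl_attr_put_u32(nlh, OVS_PACKET_ATTR_UPCALL_PID, handler_upcall_pid);
}

The interesting bit is just the last attribute; everything else is the
existing EXECUTE encoding.  The kernel side (in the related kernel patch)
is then expected to prefer this PID over the per-CPU lookup when queuing
the nested MISS upcall.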
Did minimal testing with and without the related kernel patch applied.

So I guess I will (re)review and ack the non-RFC one, once it's available.

Cheers,

Eelco