On 7/9/25 6:35 PM, Flavio Leitner wrote:
> On Tue,  8 Jul 2025 13:34:02 +0200
> Ilya Maximets <i.maxim...@ovn.org> wrote:
> 
>> When a packet enters the kernel datapath and there is no flow to
>> handle it, the packet goes to userspace through a MISS upcall.  With
>> the per-CPU upcall dispatch mechanism, we use the current CPU id to
>> select the Netlink PID to which to send this packet.  This keeps
>> packets from the same traffic flow going through the same handler.
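>>
>> For illustration, a minimal C sketch of this selection (the names
>> and structure are assumptions, not the exact kernel code):
>>
>>     #include <stdint.h>
>>
>>     /* One handler thread, and hence one Netlink socket, per CPU:
>>      * the PID is picked by indexing a per-datapath array with the
>>      * id of the CPU currently processing the packet. */
>>     static uint32_t select_upcall_pid(const uint32_t *upcall_pids,
>>                                       unsigned int n_pids,
>>                                       unsigned int cpu_id)
>>     {
>>         return upcall_pids[cpu_id % n_pids];
>>     }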
>>
>> The handler will process the packet, install required flow into the
>> kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE.
>>
>> While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a
>> recirculation action that will pass the (likely modified) packet
>> through the flow lookup again.  And if the flow is not found, the
>> packet will be sent to userspace again through another MISS upcall.
>>
>> However, the handler thread in userspace is likely running on a
>> different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled
>> in the syscall context of that thread.  So, when the time comes to
>> send the packet through another upcall, the per-CPU dispatch will
>> choose a different Netlink PID, and this packet will end up processed
>> by a different handler thread on a different CPU.
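>>
>> Continuing the illustrative sketch above (rx_cpu, handler_cpu,
>> pids and n_pids are stand-ins), the two lookups disagree whenever
>> the handler runs on a different core:
>>
>>     /* First MISS: dispatched from the CPU that received the
>>      * packet. */
>>     uint32_t pid_rx = select_upcall_pid(pids, n_pids, rx_cpu);
>>
>>     /* MISS after recirculation: OVS_PACKET_CMD_EXECUTE runs in the
>>      * handler's syscall context, so the "current CPU" is now the
>>      * handler's, and a different PID is chosen. */
>>     uint32_t pid_exec = select_upcall_pid(pids, n_pids, handler_cpu);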
>>
>> The process continues for as long as there are new recirculations:
>> each time, the packet goes to a different handler thread before it
>> is finally sent out of the OVS datapath to the destination port.  In
>> real setups the number of recirculations can go up to 4 or 5,
>> sometimes more.
>>
>> There is always a chance of re-ordering packets while processing
>> upcalls, because userspace first installs the flow and then
>> re-injects the original packet.  So, there is a race window when the
>> flow is already installed and a second packet can match it inside
>> the kernel and be forwarded to the destination before the first
>> packet is re-injected.  But the fact that packets go through
>> multiple upcalls handled by different userspace threads makes the
>> reordering noticeably more likely, because we have not only a race
>> between the kernel and a userspace handler (which is hard to avoid),
>> but also races between multiple userspace handlers.
>>
>> For example, assume that 10 packets get enqueued through a MISS
>> upcall for handler-1.  It starts processing them, installs the flow
>> into the kernel and begins re-injecting the packets, from where they
>> go through another MISS to handler-2.  Handler-2 installs the flow
>> into the kernel and starts re-injecting the packets.  While
>> handler-1 is still re-injecting the last of the 10 packets, those
>> hit the flow installed by handler-2 and are forwarded without going
>> to handler-2 at all, even though handler-2 is still re-injecting the
>> first of these 10 packets.  Given multiple recirculations and
>> misses, these 10 packets may end up completely mixed up on the
>> output from the datapath.
>>
>> Let's provide the original upcall PID via the new netlink attribute
>> OVS_PACKET_ATTR_UPCALL_PID.  This way, an upcall triggered during
>> the execution will go to the same handler.  Packets will be enqueued
>> on the same socket and re-injected in the same order.  This doesn't
>> eliminate re-ordering entirely, as stated above, since we still have
>> a race between the kernel and the handler thread, but it does
>> eliminate the races between multiple handlers.
>>
>> The openvswitch kernel module ignores unknown attributes in
>> OVS_PACKET_CMD_EXECUTE requests, so it is safe to provide the new
>> attribute even on older kernels.
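>>
>> A hedged sketch of the userspace side (nl_msg_put_u32() is OVS's
>> existing Netlink message helper; the surrounding names are
>> assumptions): when building the OVS_PACKET_CMD_EXECUTE request, echo
>> back the PID the original upcall arrived on.
>>
>>     /* 'request' is the OVS_PACKET_CMD_EXECUTE message being built;
>>      * 'upcall_pid' is the Netlink PID of the socket on which the
>>      * original MISS upcall was received. */
>>     nl_msg_put_u32(&request, OVS_PACKET_ATTR_UPCALL_PID, upcall_pid);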
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-1479
>> Link: https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maxim...@ovn.org/
>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> 
>>
>> ---
>>
>> Version 1:
>>   * No changes since RFC.  The kernel change got merged into net-next:
>>       https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maxim...@ovn.org/
>>     Normally, we would wait for it to be in Linus' tree, but that
>>     will not happen before the branching.
> 
> 
> Acked-by: Flavio Leitner <f...@sysclose.org>

Thanks, Eelco and Flavio!  Applied.

Best regards, Ilya Maximets.