On 7/9/25 6:35 PM, Flavio Leitner wrote:
> On Tue, 8 Jul 2025 13:34:02 +0200
> Ilya Maximets <i.maxim...@ovn.org> wrote:
>
>> When a packet enters the kernel datapath and there is no flow to
>> handle it, the packet goes to userspace through a MISS upcall. With
>> the per-CPU upcall dispatch mechanism, we're using the current CPU id
>> to select the Netlink PID on which to send this packet. This allows
>> us to send packets from the same traffic flow through the same
>> handler.
>>
>> The handler will process the packet, install the required flow into
>> the kernel and re-inject the original packet via
>> OVS_PACKET_CMD_EXECUTE.
>>
>> While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a
>> recirculation action that will pass the (likely modified) packet
>> through the flow lookup again. And if the flow is not found, the
>> packet will be sent to userspace again through another MISS upcall.
>>
>> However, the handler thread in userspace is likely running on a
>> different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled
>> in the syscall context of that thread. So, when the time comes to
>> send the packet through another upcall, the per-CPU dispatch will
>> choose a different Netlink PID, and this packet will end up processed
>> by a different handler thread on a different CPU.
>>
>> The process continues as long as there are new recirculations; each
>> time, the packet goes to a different handler thread before it is sent
>> out of the OVS datapath to the destination port. In real setups the
>> number of recirculations can go up to 4 or 5, sometimes more.
>>
>> There is always a chance of re-ordering packets while processing
>> upcalls, because userspace will first install the flow and then
>> re-inject the original packet. So, there is a race window when the
>> flow is already installed and a second packet can match it inside the
>> kernel and be forwarded to the destination before the first packet is
>> re-injected.
>> But the fact that packets are going through multiple upcalls handled
>> by different userspace threads makes the reordering noticeably more
>> likely, because we not only have a race between the kernel and a
>> userspace handler (which is hard to avoid), but also between multiple
>> userspace handlers.
>>
>> For example, let's assume that 10 packets got enqueued through a MISS
>> upcall for handler-1. It will start processing them, install the
>> flow into the kernel and start re-injecting the packets, from where
>> they will go through another MISS to handler-2. Handler-2 will
>> install the flow into the kernel and start re-injecting the packets.
>> While handler-1 continues to re-inject the last of the 10 packets,
>> those packets will hit the flow installed by handler-2 and be
>> forwarded without going to handler-2, while handler-2 is still
>> re-injecting the first of these 10 packets. Given multiple
>> recirculations and misses, these 10 packets may end up completely
>> mixed up on the output from the datapath.
>>
>> Let's provide the original upcall PID via the new netlink attribute
>> OVS_PACKET_ATTR_UPCALL_PID. This way, an upcall triggered during the
>> execution will go to the same handler. Packets will be enqueued to
>> the same socket and re-injected in the same order. This doesn't
>> eliminate re-ordering, as stated above, since we still have a race
>> between the kernel and the handler thread, but it does eliminate
>> races between multiple handlers.
>>
>> The openvswitch kernel module ignores unknown attributes for
>> OVS_PACKET_CMD_EXECUTE, so it's safe to provide it even on older
>> kernels.
>>
>> Reported-at: https://issues.redhat.com/browse/FDP-1479
>> Link: https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maxim...@ovn.org/
>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>
>> ---
>>
>> Version 1:
>>   * No changes since RFC.
>> The kernel change got merged into net-next:
>>
>> https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maxim...@ovn.org/
>>
>> Normally, we would wait for it to be in Linus' tree, but that will
>> not happen before the branching.
>
>
> Acked-by: Flavio Leitner <f...@sysclose.org>
Thanks, Eelco and Flavio!  Applied.

Best regards, Ilya Maximets.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
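[Editor's note: the dispatch behavior described in the patch message can be reduced to a toy model. This is plain Python, not OVS code; `dispatch` and `pick_handler` are invented for this sketch. It only illustrates the ordering argument: when per-CPU dispatch splits one flow's packets across two concurrently-draining handler queues, the flow's order is lost, while pinning follow-up upcalls to the original upcall PID keeps them on one socket in FIFO order.]

```python
from collections import deque

def dispatch(packets, pick_handler):
    """Toy model of upcall dispatch: enqueue each packet to a handler
    socket chosen by pick_handler, then drain the sockets round-robin
    to mimic handlers running concurrently on different CPUs."""
    queues = {}
    for seq, cpu in packets:
        pid = pick_handler(cpu)
        queues.setdefault(pid, deque()).append(seq)
    out = []
    while any(queues.values()):
        for q in queues.values():
            if q:
                out.append(q.popleft())
    return out

# Ten packets of one flow.  After recirculation, the second MISS fires
# in the syscall context of whichever handler re-injected the packet,
# so per-CPU dispatch can split the flow across two handlers (modeled
# here as: first half of the flow on CPU 0, the rest on CPU 1).
pkts = [(i, 0 if i < 5 else 1) for i in range(10)]

per_cpu = dispatch(pkts, pick_handler=lambda cpu: cpu)  # PID chosen by CPU
pinned = dispatch(pkts, pick_handler=lambda cpu: 0)     # original upcall PID

print(per_cpu)  # [0, 5, 1, 6, 2, 7, 3, 8, 4, 9] -- flow order lost
print(pinned)   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] -- FIFO order kept
```

As in the patch, pinning doesn't remove the kernel-vs-handler race (the model doesn't simulate in-kernel flow hits bypassing the queue); it only removes the handler-vs-handler interleaving.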