On 7/8/25 1:34 PM, Ilya Maximets wrote:
> When a packet enters the kernel datapath and there is no flow to
> handle it, the packet goes to userspace through a MISS upcall.  With
> the per-CPU upcall dispatch mechanism, we use the current CPU id to
> select the Netlink PID on which to send this packet.  This allows us
> to send packets from the same traffic flow through the same handler.
>
> The handler will process the packet, install the required flow into
> the kernel and re-inject the original packet via
> OVS_PACKET_CMD_EXECUTE.
>
> While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a
> recirculation action that will pass the (likely modified) packet
> through the flow lookup again.  And if the flow is not found, the
> packet will be sent to userspace again through another MISS upcall.
>
> However, the handler thread in userspace is likely running on a
> different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled
> in the syscall context of that thread.  So, when the time comes to
> send the packet through another upcall, the per-CPU dispatch will
> choose a different Netlink PID, and this packet will end up processed
> by a different handler thread on a different CPU.
>
> The process continues as long as there are new recirculations: each
> time the packet goes to a different handler thread before it is sent
> out of the OVS datapath to the destination port.  In real setups the
> number of recirculations can go up to 4 or 5, sometimes more.
>
> There is always a chance to re-order packets while processing
> upcalls, because userspace will first install the flow and then
> re-inject the original packet.  So, there is a race window when the
> flow is already installed and a second packet can match it inside the
> kernel and be forwarded to the destination before the first packet is
> re-injected.  But the fact that packets are going through multiple
> upcalls handled by different userspace threads makes the reordering
> noticeably more likely, because we not only have a race between the
> kernel and a userspace handler (which is hard to avoid), but also
> between multiple userspace handlers.
>
> For example, let's assume that 10 packets got enqueued through a MISS
> upcall for handler-1.  It will start processing them, install the
> flow into the kernel and start re-injecting the packets, from where
> they will go through another MISS to handler-2.  Handler-2 will
> install the flow into the kernel and start re-injecting the packets.
> While handler-1 is still re-injecting the last of the 10 packets,
> those will hit the flow installed by handler-2 and be forwarded
> without going to handler-2 at all, while handler-2 is still
> re-injecting the first of these 10 packets.  Given multiple
> recirculations and misses, these 10 packets may end up completely
> mixed up on the output from the datapath.
>
> Let's provide the original upcall PID via the new netlink attribute
> OVS_PACKET_ATTR_UPCALL_PID.  This way the upcall triggered during the
> execution will go to the same handler.  Packets will be enqueued to
> the same socket and re-injected in the same order.  This doesn't
> eliminate re-ordering entirely, as stated above, since we still have
> a race between the kernel and the handler thread, but it does
> eliminate races between multiple handlers.
>
> The openvswitch kernel module ignores unknown attributes for
> OVS_PACKET_CMD_EXECUTE, so it's safe to provide the new attribute
> even on older kernels.
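To make the dispatch change easier to picture, here is a minimal,
self-contained model of the PID selection described above.  This is
not the actual kernel code: the per-CPU table, select_upcall_pid() and
the upcall_pid_hint parameter are invented for the example; only the
idea of preferring a PID carried in OVS_PACKET_ATTR_UPCALL_PID over
the per-CPU choice comes from the patch description.

/* Simplified model only, not the openvswitch kernel module code. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define N_CPUS 4

/* Per-CPU dispatch table: one handler Netlink PID per CPU. */
static const uint32_t per_cpu_pids[N_CPUS] = { 101, 102, 103, 104 };

/* PID selection for a MISS upcall.  'upcall_pid_hint' would be non-zero
 * when the packet was re-injected via OVS_PACKET_CMD_EXECUTE carrying
 * OVS_PACKET_ATTR_UPCALL_PID. */
static uint32_t
select_upcall_pid(int current_cpu, uint32_t upcall_pid_hint)
{
    if (upcall_pid_hint) {
        /* Keep the packet on the handler that originally processed it,
         * so re-injected packets stay in order on one socket. */
        return upcall_pid_hint;
    }
    /* Default per-CPU dispatch based on the CPU handling the packet. */
    return per_cpu_pids[current_cpu % N_CPUS];
}

int
main(void)
{
    /* First MISS: packet received on CPU 1, no hint yet. */
    uint32_t pid = select_upcall_pid(1, 0);
    printf("first upcall goes to PID %" PRIu32 "\n", pid);

    /* Recirculation MISS during OVS_PACKET_CMD_EXECUTE: the syscall runs
     * on a different CPU (3), but the hint keeps the same handler. */
    printf("recirculated upcall goes to PID %" PRIu32 "\n",
           select_upcall_pid(3, pid));
    return 0;
}

Keeping the per-CPU table as the fallback preserves the existing
dispatch behavior for packets that did not arrive through
OVS_PACKET_CMD_EXECUTE.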
> Reported-at: https://issues.redhat.com/browse/FDP-1479
> Link: https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maxim...@ovn.org/
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
> ---
>
> Version 1:
>   * No changes since the RFC.  The kernel change got merged into net-next:
>     https://lore.kernel.org/netdev/20250702155043.2331772-1-i.maxim...@ovn.org/
>     Normally, we would wait for it to land in Linus' tree, but that will
>     not happen before the branching.
>
>  include/linux/openvswitch.h   | 6 ++++++
>  lib/dpif-netlink.c            | 7 +++++++
>  lib/dpif.h                    | 3 +++
>  ofproto/ofproto-dpif-upcall.c | 5 +++++
>  4 files changed, 21 insertions(+)
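Judging by the diffstat, the userspace side presumably records the
Netlink PID of the socket the MISS upcall arrived on and attaches it
to the OVS_PACKET_CMD_EXECUTE request.  The snippet below is only a
stand-alone illustration of encoding such a u32 attribute with the
generic netlink macros; put_u32_attr() and the attribute's numeric
value are placeholders, not the real OVS or uapi definitions.

/* Illustrative sketch only, not the actual dpif-netlink change. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <linux/netlink.h>

/* Placeholder: the real value is assigned by the updated
 * include/linux/openvswitch.h from the kernel patch. */
#define EXAMPLE_OVS_PACKET_ATTR_UPCALL_PID 12

/* Append a u32 attribute to a flat attribute buffer; returns new length. */
static size_t
put_u32_attr(uint8_t *buf, size_t len, uint16_t type, uint32_t value)
{
    struct nlattr nla = {
        .nla_len = NLA_HDRLEN + sizeof value,
        .nla_type = type,
    };

    memcpy(buf + len, &nla, sizeof nla);
    memcpy(buf + len + NLA_HDRLEN, &value, sizeof value);
    return len + NLA_ALIGN(nla.nla_len);
}

int
main(void)
{
    uint8_t request[64];
    size_t len = 0;

    /* PID of the socket the original MISS upcall arrived on; the handler
     * would pass it down so a recirculation upcall comes back to it. */
    uint32_t upcall_pid = 4242;

    /* The kernel ignores unknown OVS_PACKET_CMD_EXECUTE attributes, so
     * adding this one is safe even when running on an older kernel. */
    len = put_u32_attr(request, len,
                       EXAMPLE_OVS_PACKET_ATTR_UPCALL_PID, upcall_pid);

    printf("encoded %zu bytes of attributes\n", len);
    return 0;
}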
Fedora's mirrors are not feeling good today.

Recheck-request: github-robot