On 28 Jun 2025, at 0:07, Ilya Maximets wrote:
> When a packet enters the kernel datapath and there is no flow to handle
> it, the packet goes to userspace through a MISS upcall.  With the
> per-CPU upcall dispatch mechanism, we're using the current CPU id to
> select the Netlink PID on which to send this packet.  This allows us to
> send packets from the same traffic flow through the same handler.
>
> The handler will process the packet, install the required flow into the
> kernel and re-inject the original packet via OVS_PACKET_CMD_EXECUTE.
>
> While handling OVS_PACKET_CMD_EXECUTE, however, we may hit a
> recirculation action that will pass the (likely modified) packet
> through the flow lookup again.  And if the flow is not found, the
> packet will be sent to userspace again through another MISS upcall.
>
> However, the handler thread in userspace is likely running on a
> different CPU core, and the OVS_PACKET_CMD_EXECUTE request is handled
> in the syscall context of that thread.  So, when the time comes to
> send the packet through another upcall, the per-CPU dispatch will
> choose a different Netlink PID, and this packet will end up processed
> by a different handler thread on a different CPU.
>
> The process continues as long as there are new recirculations; each
> time the packet goes to a different handler thread before it is sent
> out of the OVS datapath to the destination port.  In real setups the
> number of recirculations can go up to 4 or 5, sometimes more.
>
> There is always a chance to re-order packets while processing upcalls,
> because userspace will first install the flow and then re-inject the
> original packet.  So, there is a race window when the flow is already
> installed and a second packet can match it inside the kernel and be
> forwarded to the destination before the first packet is re-injected.
> But the fact that packets are going through multiple upcalls handled
> by different userspace threads makes the reordering noticeably more
> likely, because we not only have a race between the kernel and a
> userspace handler (which is hard to avoid), but also between multiple
> userspace handlers.
>
> For example, let's assume that 10 packets got enqueued through a MISS
> upcall for handler-1.  It will start processing them, install the flow
> into the kernel and start re-injecting the packets, from where they
> will go through another MISS to handler-2.  Handler-2 will install the
> flow into the kernel and start re-injecting the packets.  While
> handler-1 continues to re-inject the last of the 10 packets, they will
> hit the flow installed by handler-2 and be forwarded without going to
> handler-2, while handler-2 is still re-injecting the first of these 10
> packets.  Given multiple recirculations and misses, these 10 packets
> may end up completely mixed up on the output from the datapath.
>
> Let's provide the original upcall PID via the new netlink attribute
> OVS_PACKET_ATTR_UPCALL_PID.  This way the upcall triggered during the
> execution will go to the same handler.  Packets will be enqueued to
> the same socket and re-injected in the same order.  This doesn't
> eliminate the re-ordering described above, since we still have a race
> between the kernel and the handler thread, but it allows us to
> eliminate races between multiple handlers.
>
> The openvswitch kernel module ignores unknown attributes for
> OVS_PACKET_CMD_EXECUTE, so it's safe to provide it even on older
> kernels.
>
> Reported-at: https://issues.redhat.com/browse/FDP-1479
> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>

This change looks good to me.
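For anyone following along, here is a rough, self-contained sketch of what
the userspace side could look like when building the OVS_PACKET_CMD_EXECUTE
request.  It uses libmnl purely for illustration (OVS userspace has its own
netlink helpers), the function name and parameters are made up, and
OVS_PACKET_ATTR_UPCALL_PID is the attribute proposed here, so it is not in
released uapi headers yet:

/*
 * Illustrative only: encode an OVS_PACKET_CMD_EXECUTE request and attach
 * the PID of the socket the original upcall was received on.  The caller
 * is assumed to have initialized nlh with mnl_nlmsg_put_header() and to
 * have resolved the "ovs_packet" generic netlink family id.
 */
#include <stdint.h>
#include <stddef.h>
#include <libmnl/libmnl.h>
#include <linux/netlink.h>
#include <linux/genetlink.h>
#include <linux/openvswitch.h>

static void encode_execute(struct nlmsghdr *nlh, uint16_t ovs_packet_family,
                           int dp_ifindex,
                           const void *packet, size_t packet_len,
                           const void *actions, size_t actions_len,
                           uint32_t handler_upcall_pid)
{
    struct genlmsghdr *genl;
    struct ovs_header *ovs_hdr;

    nlh->nlmsg_type = ovs_packet_family;   /* Resolved via genl ctrl. */
    nlh->nlmsg_flags = NLM_F_REQUEST;

    genl = mnl_nlmsg_put_extra_header(nlh, sizeof *genl);
    genl->cmd = OVS_PACKET_CMD_EXECUTE;
    genl->version = OVS_PACKET_VERSION;

    ovs_hdr = mnl_nlmsg_put_extra_header(nlh, sizeof *ovs_hdr);
    ovs_hdr->dp_ifindex = dp_ifindex;

    /* The packet to re-inject and the actions to execute on it. */
    mnl_attr_put(nlh, OVS_PACKET_ATTR_PACKET, packet_len, packet);
    mnl_attr_put(nlh, OVS_PACKET_ATTR_ACTIONS, actions_len, actions);

    /* The new part: echo back the Netlink PID the original upcall was
     * delivered on, so that a MISS triggered while executing the actions
     * (e.g. after recirculation) goes to the same handler socket.  Older
     * kernels ignore unknown OVS_PACKET_ATTR_* attributes, so this can be
     * sent unconditionally. */
    mnl_attr_put_u32(nlh, OVS_PACKET_ATTR_UPCALL_PID, handler_upcall_pid);
}

The interesting bit is just the last attribute; everything else is the
existing EXECUTE encoding.  The kernel side (in the related kernel patch)
is then expected to prefer this PID over the per-CPU lookup when queuing
the nested MISS upcall.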
Did minimal testing with and without the related kernel patch applied.

So I guess I will (re)review and ack the non-RFC one, once it's available.

Cheers,

Eelco