On Thu, 25 Jan 2024 10:48:07 +0200 Pavel Vazharov <[email protected]> wrote:
> Hi there,
>
> I'd like to ask for advice on a weird issue that I'm facing trying to run
> XDP on top of a bonding device (802.3ad) (and also on the physical
> interfaces behind the bond).
>
> I have a DPDK application which runs on top of XDP sockets, using the DPDK
> AF_XDP driver <https://doc.dpdk.org/guides/nics/af_xdp.html>. It was a pure
> DPDK application, but lately it was migrated to run on top of XDP sockets
> because we need to split the traffic entering the machine between the DPDK
> application and other "standard-Linux" applications running on the same
> machine.
> The application works fine when running on top of a single interface, but
> it has problems when it runs on top of a bonding interface. It needs to be
> able to run with multiple XDP sockets, where each socket (or group of XDP
> sockets) is handled in a separate thread. However, the bonding device is
> reported with a single queue, and thus the application can't open more
> than one XDP socket for it. So I've tried binding the XDP sockets to the
> queues of the physical interfaces. For example:
> - 3 interfaces, each one set to have 8 queues.
> - I've created 3 virtual af_xdp devices, each one with 8 queues, i.e. in
> total 24 XDP sockets, each bound to a separate queue (this functionality
> is provided by DPDK itself).
> - I've run the application on 2 threads, where the first thread handled the
> first 12 queues (XDP sockets) and the second thread handled the next 12
> queues (XDP sockets), i.e. the first thread worked with all 8 queues from
> af_xdp device 0 and the first 4 queues from af_xdp device 1. The second
> thread worked with the next 4 queues from af_xdp device 1 and all 8 queues
> from af_xdp device 2. I've also tried another distribution scheme (see
> below). The given threads just call the receive/transmit functions
> provided by DPDK for the assigned queues.
> - The problem is that with this scheme the network device on the other side
> reports: "The member of the LACP mode Eth-Trunk interface received an
> abnormal LACPDU, which may be caused by optical fiber misconnection". This
> error is always reported for the last device/interface in the bonding,
> and the bonding/LACP doesn't work.
> - Another thing is that if I run the DPDK application on a single thread,
> so that the sending/receiving on all queues is handled on a single thread,
> then the bonding seems to work correctly and the above error is not
> reported.
> - I've checked the code multiple times and I'm sure that each thread is
> accessing its own group of queues/sockets.
> - I've tried 2 different access schemes, but each one led to the same
> issue. For example (device_idx - queue_idx), I've tried these two orders
> of accessing:
>
>   Thread 1    Thread 2
>   (0 - 0)     (1 - 4)
>   (0 - 1)     (1 - 5)
>   ...         (1 - 6)
>   ...         (1 - 7)
>   (0 - 7)     (2 - 0)
>   (1 - 0)     (2 - 1)
>   (1 - 1)     ...
>   (1 - 2)     ...
>   (1 - 3)     (2 - 7)
>
>   Thread 1    Thread 2
>   (0 - 0)     (0 - 4)
>   (1 - 0)     (1 - 4)
>   (2 - 0)     (2 - 4)
>   (0 - 1)     (0 - 5)
>   (1 - 1)     (1 - 5)
>   (2 - 1)     (2 - 5)
>   ...         ...
>   (0 - 3)     (0 - 7)
>   (1 - 3)     (1 - 7)
>   (2 - 3)     (2 - 7)
>
> And here are my questions based on the above situation:
> 1. I assumed that it's not possible to run multiple XDP sockets on top of
> the bonding device itself and that I need to "bind" the XDP sockets to the
> physical interfaces behind the bonding device. Am I right about this or am
> I missing something?
> 2. Is the bonding logic (LACP management traffic) affected by the access
> pattern of the XDP sockets?
> 3. Is this scheme supposed to work, or is the design just wrong? I mean,
> maybe a group of queues/sockets shouldn't be handled on a given thread,
> but only a single queue should be handled on a given application thread.
> It's just that the physical devices have more queues set up on them than
> the number of threads in the DPDK application, and thus multiple queues
> need to be handled on a single application thread.
>
> Any ideas are appreciated!
>
> Regards,
> Pavel.

Look at the recent discussions on the netdev mailing list. The Linux bonding
device still needs more work to fully support XDP.
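
For anyone trying to reproduce this, the two access orders in the quoted
message can be written down as a small sketch. This is plain Python, not DPDK
code: the function names are hypothetical, and it assumes the queue counts
divide evenly among the threads (3 devices, 8 queues each, 2 threads, as in
the report). It only illustrates the (device_idx, queue_idx) layout and checks
that no queue is shared between threads.

```python
from typing import List, Tuple

Queue = Tuple[int, int]  # (device_idx, queue_idx)


def contiguous_split(num_devs: int, queues_per_dev: int,
                     num_threads: int) -> List[List[Queue]]:
    """First scheme: walk device by device and cut the flat list into
    equal contiguous chunks, one chunk per thread."""
    all_queues = [(d, q) for d in range(num_devs)
                  for q in range(queues_per_dev)]
    per_thread = len(all_queues) // num_threads  # assumes even division
    return [all_queues[t * per_thread:(t + 1) * per_thread]
            for t in range(num_threads)]


def interleaved_split(num_devs: int, queues_per_dev: int,
                      num_threads: int) -> List[List[Queue]]:
    """Second scheme: each thread owns a range of queue indices on every
    device and visits the devices round-robin for each queue index."""
    per_thread = queues_per_dev // num_threads  # assumes even division
    return [[(d, q)
             for q in range(t * per_thread, (t + 1) * per_thread)
             for d in range(num_devs)]
            for t in range(num_threads)]


def is_disjoint(assignment: List[List[Queue]]) -> bool:
    """True if no (device, queue) pair is assigned to more than one thread."""
    flat = [q for thread_queues in assignment for q in thread_queues]
    return len(flat) == len(set(flat))


if __name__ == "__main__":
    for name, split in (("contiguous", contiguous_split),
                        ("interleaved", interleaved_split)):
        assignment = split(3, 8, 2)
        print(name, "disjoint:", is_disjoint(assignment))
        for t, queues in enumerate(assignment, start=1):
            print(f"  thread {t}: {queues}")
```

Both layouts give each thread 12 distinct queues, so per the report the
failure does not depend on which of the two orders is used.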
