2025-01-04 11:22 (UTC-0500), Alan Beadle:
> Hi everyone,
>
> I'm still stuck on this. Most likely I am doing something wrong in the
> initialization phase. I am trying to follow the standard code example
> for symmetric multi-process, but since my code is doing very different
> things from this example I cannot even begin to guess where I am going
> wrong. I do not even know if what I am trying to do is permissible in
> the DPDK API.
>
> It would be very helpful if someone could provide an initialization
> checklist for my use case (below).
>
> As explained previously, I have several separately launched processes.
> These processes already share a memory region for local communication.
> I want all of these processes to have equal ability to read incoming
> packets, place pointers to the mbufs in shared memory, and wake each
> other up when packets destined for a particular one of these processes
> arrive. I have one X550-T2 NIC and I am only using one of the
> physical ports. It connects to a second machine which is doing
> essentially the same thing, running the same DPDK code.
>
> In summary, each of my processes should be able to receive packets
> on behalf of the others, and leave pointers to rx'ed mbufs for each
> other in shared memory according to which process the mbuf was
> destined for. Outbound packets may also be shared with local peer
> processes for reading. To allow this I am also bumping the mbuf
> refcount until the peer process has read the mbuf.
>
> I had thought all of this was working fine, but it turns out that the
> processes were all taking turns on the same physical core, and
> everything breaks when they run concurrently on separate cores. I
> have seen conflicting information in online threads about the thread
> safety of the various DPDK functions I am using. I tried adding
> synchronization around DPDK allocation and tx/rx bursts, to no avail.
> My code detects weird errors where either mbufs contain unexpected
> things (invalid reuse?) or tx bursts start to fail in one of the
> processes.
>
> Frankly, I also feel very confused about how ports, queues, mempools,
> etc. work, and I suspect that a lot of what I have been reading is
> outdated or faulty information.
>
> Any guidance at all would be greatly appreciated!
> -Alan
>
> On Tue, Dec 31, 2024 at 12:49 PM Alan Beadle <ab.bea...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I am working on a multi-process DPDK application. It uses one NIC and
> > one port; the separate processes both send and receive, and they
> > share memory for synchronization and IPC.
> >
> > I had previously made a mistake in setting up the lcores, and all of
> > the processes were assigned to the same physical core. This seems to
> > have concealed some DPDK thread safety issues which I am now dealing
> > with.
> >
> > I understand that rte_eth_tx_burst() and rte_eth_rx_burst() are not
> > thread safe. Previously I did not have any synchronization around
> > these functions. Now that I am successfully using separate cores, I
> > have added a shared spinlock around all invocations of these
> > functions, as well as around all mbuf frees and allocations.
> >
> > However, when my code sends a packet, it checks the return value of
> > rte_eth_tx_burst() to ensure that the packet was actually sent. If it
> > fails to send, my app exits with an error. This was not happening
> > before, but now it happens every time I run it. I thought this was
> > due to the lack of synchronization, but it is still happening after
> > I added the lock. Why would rte_eth_tx_burst() be failing now?
> >
> > Thank you,
> > -Alan
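For reference on the return-value check described in the quoted message: rte_eth_tx_burst() returning fewer packets than requested is not necessarily an error — it commonly means the TX ring is full at that moment. A typical pattern retries the unsent tail rather than exiting. The sketch below assumes DPDK headers are available; the function name send_all and the retry bound are illustrative, not from the thread.

```c
/* Sketch: retry the unsent tail of a TX burst instead of treating a
 * short return as fatal. A short return usually means the TX ring is
 * temporarily full, not a hard failure. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static int
send_all(uint16_t port_id, uint16_t queue_id,
         struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t sent = 0;
	int retries = 1000; /* bound the spin; the value is illustrative */

	while (sent < nb_pkts && retries-- > 0)
		sent += rte_eth_tx_burst(port_id, queue_id,
					 pkts + sent, nb_pkts - sent);

	/* Free whatever could not be sent so mbufs are not leaked;
	 * rte_eth_tx_burst() only takes ownership of accepted mbufs. */
	for (uint16_t i = sent; i < nb_pkts; i++)
		rte_pktmbuf_free(pkts[i]);

	return sent == nb_pkts ? 0 : -1;
}
```

Note that this alone does not address the queue-sharing question raised in the reply below the quotes: even a correct retry loop is unsafe if two lcores call rte_eth_tx_burst() on the same queue without synchronization.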
Hi Alan,

A lot is still unclear, so let's start gradually.

It is queues that are thread-unsafe, not the calls to
rte_eth_rx/tx_burst() as such. You can call rte_eth_rx/tx_burst()
concurrently without synchronization as long as the calls operate on
different queues. Typically you assign each lcore one or more queues,
and no queue is operated on by multiple lcores. Otherwise you need to
synchronize access, which obviously hurts scaling. Does this hold in
your case?

An lcore is a thread to which DPDK can dispatch work. By default it is
pinned to one physical core, unless --lcores is used. What is the
lcore-to-CPU mapping in your case?

What is the design of your app regarding processes, lcores, and
queues? That is: which process runs which lcores, and which queues
does each of those lcores serve?

P.S. Please don't top-post.
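The one-queue-per-lcore arrangement described above can be sketched roughly as follows. This is a minimal outline under stated assumptions, not Alan's code: a single port (port 0), a primary process launching workers on its own lcores, and illustrative names/values (NB_QUEUES, RX_DESC, TX_DESC, BURST, pool sizes). Error handling is abbreviated.

```c
/* Sketch: one RX/TX queue pair per lcore on a single port, so no
 * queue is ever touched by two lcores and no burst-level locking is
 * needed. Names and sizes are illustrative assumptions. */
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define NB_QUEUES 4   /* one queue pair per worker lcore (assumed) */
#define RX_DESC   1024
#define TX_DESC   1024
#define BURST     32

static struct rte_mempool *pool;

static int
lcore_main(void *arg)
{
	/* Each worker polls only its own queue index. */
	uint16_t queue_id = (uint16_t)(uintptr_t)arg;
	struct rte_mbuf *pkts[BURST];

	for (;;) {
		uint16_t n = rte_eth_rx_burst(0, queue_id, pkts, BURST);
		for (uint16_t i = 0; i < n; i++) {
			/* ... hand pkts[i] to the owning process,
			 * bump refcnt if peers will read it, etc. ... */
			rte_pktmbuf_free(pkts[i]);
		}
	}
	return 0;
}

int
main(int argc, char **argv)
{
	struct rte_eth_conf conf = {0};

	if (rte_eal_init(argc, argv) < 0)
		rte_exit(EXIT_FAILURE, "EAL init failed\n");

	pool = rte_pktmbuf_pool_create("mbufs", 8192, 256, 0,
				       RTE_MBUF_DEFAULT_BUF_SIZE,
				       rte_socket_id());
	if (pool == NULL)
		rte_exit(EXIT_FAILURE, "mempool creation failed\n");

	/* NB_QUEUES RX queues and NB_QUEUES TX queues on port 0. */
	if (rte_eth_dev_configure(0, NB_QUEUES, NB_QUEUES, &conf) != 0)
		rte_exit(EXIT_FAILURE, "port configure failed\n");

	for (uint16_t q = 0; q < NB_QUEUES; q++) {
		if (rte_eth_rx_queue_setup(0, q, RX_DESC, rte_socket_id(),
					   NULL, pool) != 0 ||
		    rte_eth_tx_queue_setup(0, q, TX_DESC, rte_socket_id(),
					   NULL) != 0)
			rte_exit(EXIT_FAILURE, "queue setup failed\n");
	}
	if (rte_eth_dev_start(0) != 0)
		rte_exit(EXIT_FAILURE, "port start failed\n");

	/* Launch one worker per lcore, each bound to its own queue. */
	uint16_t q = 0;
	unsigned int lcore;
	RTE_LCORE_FOREACH_WORKER(lcore)
		rte_eal_remote_launch(lcore_main,
				      (void *)(uintptr_t)q++, lcore);
	rte_eal_mp_wait_lcore();
	return 0;
}
```

In a multi-process deployment the queue setup is done once by the primary process; secondaries attach with --proc-type=secondary and must still respect the rule that each queue has exactly one user at a time.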