> Thread-unsafe are queues, not calls to rte_eth_rx/tx_burst().
> You can call rte_eth_rx/tx_burst() concurrently without synchronization
> if they operate on different queues.
> Typically you assign each lcore to operate on one or more queues,
> but no queue to be operated by multiple lcores.
> Otherwise you need to synchronize access, which obviously hurts scaling.
> Does this hold in your case?
>
> Lcore is a thread to which DPDK can dispatch work.
> By default it is pinned to one physical core unless --lcores is used.
> What is lcore-to-CPU mapping in your case?
>
> What is the design of your app regarding processes, lcores, and queues?
> That is: which process runs which lcores and which queues do the latter serve?
>
> P.S. Please don't top-post.

Hi Dmitry,

On one machine I have two processes and on the other there are three
processes. For simplicity we can focus on the former but the only
difference is the number of secondary processes. It is also worth noting
that I am developing a shared library which uses DPDK internally, rather
than the app code directly using DPDK. Therefore, all of my DPDK command
line args for rte_eal_init() are actually kept in char arrays inside of
the library code.

I am setting up one tx queue and one rx queue via the primary process
init code. The code here resembles the "basic forwarding" sample
application (in the skeleton/ subdir). Please let me know whether it
would be possible for each process to use entirely separate queues and
still pass mbuf pointers around.

Before I worry about scaling though, I want correctness first. My
application is more likely to be CPU-bound than network-bound but for
other reasons (this is a research project) I must use user-mode
networking, which is why I am using DPDK.

I will explain the processes and lcores below. First of all though, my
application uses several types of packets. There are handshake packets
(acknacks and the like) and data packets. The data packets are addressed
to specific subsets of my secondary processes (for now just 1 or 2
secondary processes exist per machine, but support for even more is in
principle part of my design). Sometimes the data should also be read by
other peer processes on the same machine (including the daemon/primary
process) so I chose to make the mbufs readable instead of allocating a
duplicate local buffer.
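
To make the queue-ownership question concrete, the rule from the quoted
reply (each queue served by at most one lcore unless access is
synchronized) can be sketched in plain C. This is an illustration only,
not code from the library described here: struct queue_map,
queue_map_init(), queue_map_assign(), queue_map_may_use(), MAX_QUEUES and
NO_OWNER are all hypothetical names, and no DPDK calls are made.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUES 8      /* hypothetical upper bound, not a DPDK limit */
#define NO_OWNER   (-1)

/* Hypothetical bookkeeping: which lcore owns each rx/tx queue index. */
struct queue_map {
    int owner[MAX_QUEUES];   /* lcore id, or NO_OWNER */
};

/* Mark every queue as unowned. */
static void queue_map_init(struct queue_map *m)
{
    for (size_t q = 0; q < MAX_QUEUES; q++)
        m->owner[q] = NO_OWNER;
}

/* Give queue q to lcore; refuse if a different lcore already owns it. */
static bool queue_map_assign(struct queue_map *m, unsigned q, int lcore)
{
    if (q >= MAX_QUEUES || lcore < 0)
        return false;
    if (m->owner[q] != NO_OWNER && m->owner[q] != lcore)
        return false;
    m->owner[q] = lcore;
    return true;
}

/* True if lcore is the sole owner of queue q, i.e. it could poll or
 * transmit on that queue without additional locking. */
static bool queue_map_may_use(const struct queue_map *m, unsigned q, int lcore)
{
    return q < MAX_QUEUES && lcore >= 0 && m->owner[q] == lcore;
}
```

Under this rule, a single rx/tx queue pair can have only one owner, so
with processes on lcores 2 and 4 the second one would need either its own
queue index or explicit synchronization around the shared queue.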

It is important that mbuf pointers from one process will work in the
others. Otherwise all of my data would need to be duplicated into
non-DPDK shared buffers too.

The first process is the "daemon". This is the primary process. It uses
DPDK through my shared library (which uses DPDK internally, as explained
above). The daemon just polls the NIC and periodically cleans up my
non-DPDK data structures in shared memory. The intent is to rely on the
daemon to watch for packets during periods of low activity and avoid
unnecessary CPU usage. When a packet arrives it can wake the correct
secondary process by finding a futex in shared memory for that process.
On both machines the daemon is mapped to core 2 with the parameter
"-l 2".

The second process is the "server". It uses separate app code from the
daemon but calls into the same library. Like the daemon, it receives and
parses packets. The server can originate new data packets, and can also
reply to inbound data packets with more data packets to be sent back to
processes on the other machine. It sleeps on a shared futex during
periods of inactivity. If there were additional secondary processes (as
is the case on the other machine) it could wake them when packets arrive
for those other processes, again using futexes in shared memory. On both
machines this second process is mapped to core 4 with the parameter
"-l 4".

The other machine has another secondary process (a third process) which
is on core 6 with "-l 6". For the purposes of this discussion, it behaves
similarly to the server process above (sends, receives, and sometimes
sleeps).

Thank you,
-Alan