Hi all, I've got a question about mbuf pool and ring sizes - DPDK 17.02 PMD.
I've got a pipelined application running with RSS on a Cavium CN83XX: 40GE, 4 RSS queues wide and a pipeline 3 stages deep, using isolcpus so that only DPDK runs on each of the 12 worker cores. There are two RTE SP/SC rings per RSS queue for communication between the pipeline stages. The rings are 1024 deep, with a cache size of 512 and an mbuf pool of 16K-1.

Performance is generally good - 40G in and 40G out with 1M flows of 512 byte packets, EXCEPT for intermittent drops on the order of a few dozen to a few hundred packets/second. I did some timing measurements and found that a packet can sometimes take much longer to get through the pipeline, two to three orders of magnitude longer, even though the packets are identical (except for destination address) and take an identical(ish) code path. I tried measuring where the extra time was going, but pretty much everything I tried perturbed the system, so I wasn't able to get a clear answer.

One of my suspicions is the per-lcore mbuf cache flush/fill, since the rx and tx are being done by different cores. Is there a more efficient way to manage the mbuf pool in this case than rte_pktmbuf_pool_create? Some cores don't allocate or free mbufs, so I'm also curious whether I'm losing mbufs to the caches on those cores.

Since I have memory to burn, I figured I could absorb any glitches by increasing the RX/TX descriptor pool, mbuf pool, and ring sizes, allowing more packets to be buffered during the glitches. This didn't help, which I guess makes sense if my issue is lock contention on the mbuf cache, which I can't make larger.

Almost all of the DPDK examples and applications I could find use roughly the same parameters - 128-512 buffer descriptors, 4-16K mbuf pool, 1K ring sizes, etc. It seems that there are diminishing returns for increasing much beyond these values. Why is that?
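
To put rough numbers on the mbuf accounting question, here is a back-of-the-envelope sketch in plain C (no DPDK calls) that adds up the places mbufs can sit at once. Several things in it are assumptions rather than facts from my setup as described: that the "512 cache" is the per-lcore mempool cache size, that a per-lcore cache can grow to about 1.5x its nominal size before it flushes back to the pool, that a ring of size N holds N-1 entries, and that only lcores which actually allocate or free mbufs ever hold anything in their cache. The struct and function names are made up for illustration, and the descriptor counts and the number of lcores touching the pool are placeholders to be replaced with real values.

```c
#include <stdio.h>

/* Hypothetical description of where mbufs can be parked in the pipeline. */
struct mbuf_budget {
    unsigned pool_size;          /* mbufs in the pool, e.g. 16K-1 */
    unsigned cache_size;         /* per-lcore mempool cache size (assumed) */
    unsigned lcores_using_cache; /* lcores that allocate or free mbufs */
    unsigned rx_queues;          /* number of RX queues */
    unsigned rx_desc;            /* descriptors per RX queue (placeholder) */
    unsigned tx_queues;          /* number of TX queues */
    unsigned tx_desc;            /* descriptors per TX queue (placeholder) */
    unsigned rings;              /* number of inter-stage rings */
    unsigned ring_size;          /* configured ring size, e.g. 1024 */
};

/* Upper bound on mbufs held in per-lcore caches, assuming each active
 * cache can grow to 1.5x cache_size before it is flushed to the pool. */
unsigned cache_worst_case(const struct mbuf_budget *b)
{
    return b->lcores_using_cache * (b->cache_size * 3 / 2);
}

/* Upper bound on mbufs in use at once: caches + RX and TX descriptor
 * rings + inter-stage rings (assumed usable capacity: ring_size - 1). */
unsigned worst_case_in_flight(const struct mbuf_budget *b)
{
    return cache_worst_case(b)
         + b->rx_queues * b->rx_desc
         + b->tx_queues * b->tx_desc
         + b->rings * (b->ring_size - 1);
}

/* Print the budget and whether the pool covers the worst case. */
void report_budget(const struct mbuf_budget *b)
{
    unsigned need = worst_case_in_flight(b);

    printf("caches (worst case): %u\n", cache_worst_case(b));
    printf("total worst case:    %u of %u mbufs\n", need, b->pool_size);
    printf("pool is %s\n",
           need > b->pool_size ? "overcommitted" : "sufficient");
}
```

With placeholder values of 8 lcores touching the pool and 512 descriptors on each of 4 RX and 4 TX queues, plus the 8 rings of 1024 and a 16K-1 pool, `worst_case_in_flight()` comes out at 18424 against 16383 available. If those assumptions are anywhere near right, the pool could in principle run dry under a stall, and growing the rings and descriptor counts without growing the pool in step would make that worse rather than better.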