Hello,

We have systems with, say, 10 cores: 8 of them are dedicated to DPDK workers, while the other two are shared by about 40 regular pthreads (running with default priority, SCHED_OTHER) and the OS. Each worker and each thread has an rte_ring for IPC, configured as single-consumer (SC) with the classical multi-producer (MP) mode.
When the system is under heavy load, one of the preemptible threads sometimes gets stuck in rte_ring_enqueue() for over 10 seconds and is aborted by our healthcheck thread. In the core dump I can see that other threads and workers are also enqueuing on the same ring at that time (specifically, they are all in rte_wait_until_equal_32()).

The env_abstraction_layer documentation describes the “non-preemptive” constraint of rte_ring. Although it doesn't sound like our case should deadlock because of this, could the performance penalty, given the many threads and workers, add up to such a long hang?

The rte_ring documentation describes alternative MP modes, RTS and HTS, that help to avoid the Lock-Waiter-Preemption (LWP) problem. Does that mean they would also avoid the problem described in the env_abstraction_layer documentation above?

Since we have non-preemptible DPDK workers and preemptible pthreads, can we mix MP modes while enqueuing on the same ring? That is, configure the ring in RTS/HTS mode and have the DPDK workers use rte_ring_mp_enqueue() while the other threads use the generic rte_ring_enqueue(), which would result in an RTS/HTS enqueue?

Also, since both RTS and HTS help to avoid the LWP problem, and according to the doc HTS does it with a single CAS while RTS requires two, is there any reason to use RTS over HTS?

Thanks a lot!
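P.S. To make the suspected stall concrete, here is a minimal sketch of the two-step producer protocol as I understand it from the ring documentation: reserve a slot by moving the head with a CAS, then publish by moving the tail once all earlier producers have published. This is a simplified model, not DPDK source. All names (model_ring, model_reserve, model_try_publish, MODEL_RING_SIZE) are hypothetical, the consumer is frozen, and the publish step returns false at the point where I assume the real code would spin in rte_wait_until_equal_32().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a multi-producer ring's producer side (not DPDK code). */
#define MODEL_RING_SIZE 8u
static_assert((MODEL_RING_SIZE & (MODEL_RING_SIZE - 1)) == 0,
              "ring size must be a power of two");

struct model_ring {
    _Atomic uint32_t prod_head; /* next slot index to reserve */
    _Atomic uint32_t prod_tail; /* slots below this index are published */
    uint32_t cons_tail;         /* consumer position, frozen in this sketch */
    void *slots[MODEL_RING_SIZE];
};

/* Step 1: reserve one slot by advancing prod_head with a CAS.
 * Returns false if the ring is full. */
static bool model_reserve(struct model_ring *r, uint32_t *old_head)
{
    uint32_t head = atomic_load(&r->prod_head);
    do {
        if (head - r->cons_tail >= MODEL_RING_SIZE)
            return false;
        /* On CAS failure, head is reloaded and the fullness check repeats. */
    } while (!atomic_compare_exchange_weak(&r->prod_head, &head, head + 1));
    *old_head = head;
    return true;
}

/* Step 2: store the object, then publish it by advancing prod_tail.
 * Publishing is only allowed once every earlier reservation has been
 * published, i.e. prod_tail == old_head. This sketch returns false in
 * that situation instead of spinning. */
static bool model_try_publish(struct model_ring *r, uint32_t old_head, void *obj)
{
    r->slots[old_head & (MODEL_RING_SIZE - 1)] = obj;
    if (atomic_load(&r->prod_tail) != old_head)
        return false; /* an earlier producer has not published yet */
    atomic_store(&r->prod_tail, old_head + 1);
    return true;
}
```

In this model, if producer A reserves slot 0 and is then descheduled, producer B (slot 1) and every later producer cannot publish until A runs again. With about 40 SCHED_OTHER threads sharing two cores, that is the kind of wait I suspect we are hitting.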
