> From: NAGENDRA BALAGANI <nagendra.balag...@oracle.com>
> Sent: Friday, February 14, 2025 8:43 AM
> To: users@dpdk.org <users@dpdk.org>
> Subject: Query Regarding Race Condition Between Packet Reception and Device 
> Stop in DPDK
>
> Hi Team,

Hi Nagendra,

> We are facing a race condition in our DPDK application where one thread is 
> reading packets from queue using rte_eth_rx_burst() , while another thread is 
> attempting to stop the device using rte_eth_dev_stop(). This is causing 
> instability, as the reading thread may still be accessing queues while the 
> device is being stopped.

This is as expected - it is not valid to stop a device while other cores are 
using it.

> Could you please suggest the best way to mitigate this race condition without 
> impacting fast path performance? We want to ensure safe synchronization while 
> maintaining high throughput.

There are many implementations possible, but the end result of them all is 
"ensure that the dataplane core is NOT polling a device that is stopping".

1) One implementation is using a "force_quit" boolean value (see 
dpdk/examples/l2fwd/main.c for an example). This approach changes the lcore's 
"while (1)" polling loop into "while (!force_quit)". (Note there is some nuance 
around the "volatile" keyword for the boolean, to ensure it is reloaded on each 
iteration, but that's off topic.) A minimal sketch is included below, after 
option 2).

2) Another, more flexible/powerful implementation could be some form of message 
passing. For example, imagine the dataplane thread and the control-plane 
(ethdev-stopping) thread are capable of communicating by sending an "event" to 
each other. When a "stop polling" event is received by the dataplane thread, it 
disables polling of just that eth device/queue, and responds with a "stopped 
polling" reply. On receiving the "stopped polling" event, the thread that wants 
to stop the eth device can now safely do so. This is the second sketch below.
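
Here is a minimal sketch of option 1), loosely modelled on 
examples/l2fwd/main.c. The port/queue ids, burst size and function names are 
just placeholders for illustration, and return values are not checked for 
brevity:

    #include <stdbool.h>
    #include <rte_ethdev.h>
    #include <rte_launch.h>
    #include <rte_mbuf.h>

    static volatile bool force_quit; /* set by the control thread */

    static int
    lcore_rx_loop(void *arg)
    {
            uint16_t port_id = 0, queue_id = 0; /* placeholder ids */
            struct rte_mbuf *bufs[32];
            (void)arg;

            while (!force_quit) {
                    uint16_t nb = rte_eth_rx_burst(port_id, queue_id,
                                                   bufs, 32);
                    for (uint16_t i = 0; i < nb; i++)
                            rte_pktmbuf_free(bufs[i]); /* real processing here */
            }
            return 0;
    }

    /* Control side: raise the flag, wait for the lcore to exit, then stop. */
    static void
    stop_port(uint16_t port_id, unsigned int rx_lcore_id)
    {
            force_quit = true;
            rte_eal_wait_lcore(rx_lcore_id); /* polling loop has returned */
            rte_eth_dev_stop(port_id);
    }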
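
And a minimal sketch of option 2), here using two single-producer/ 
single-consumer rte_rings as the event channel. The ring names, the event 
encoding (integers cast to pointers) and the control flow are illustrative 
assumptions, and error checking is omitted; any message-passing mechanism with 
the same request/ack semantics works:

    #include <stdbool.h>
    #include <stdint.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>
    #include <rte_pause.h>
    #include <rte_ring.h>

    /* Illustrative event encoding: plain integers cast to pointers. */
    #define EV_STOP_POLLING ((void *)(uintptr_t)1)
    #define EV_POLL_STOPPED ((void *)(uintptr_t)2)

    /* Two SPSC rings, created at init time with e.g.
     * rte_ring_create("ctrl2data", 64, rte_socket_id(),
     *                 RING_F_SP_ENQ | RING_F_SC_DEQ);
     */
    static struct rte_ring *ctrl2data; /* control -> dataplane */
    static struct rte_ring *data2ctrl; /* dataplane -> control */

    static int
    lcore_rx_loop(void *arg)
    {
            uint16_t port_id = 0, queue_id = 0; /* placeholder ids */
            struct rte_mbuf *bufs[32];
            bool polling = true;
            void *ev = NULL;
            (void)arg;

            for (;;) {
                    /* Dequeue on an empty ring is cheap: the state stays
                     * in this core's cache. */
                    if (rte_ring_dequeue(ctrl2data, &ev) == 0 &&
                                    ev == EV_STOP_POLLING) {
                            polling = false;
                            rte_ring_enqueue(data2ctrl, EV_POLL_STOPPED);
                    }
                    if (!polling)
                            continue; /* or poll other ports/queues */

                    uint16_t nb = rte_eth_rx_burst(port_id, queue_id,
                                                   bufs, 32);
                    for (uint16_t i = 0; i < nb; i++)
                            rte_pktmbuf_free(bufs[i]); /* real processing here */
            }
            return 0;
    }

    /* Control side: request, wait for the ack, then stop safely. */
    static void
    stop_port(uint16_t port_id)
    {
            void *ev = NULL;

            rte_ring_enqueue(ctrl2data, EV_STOP_POLLING);
            while (rte_ring_dequeue(data2ctrl, &ev) != 0 ||
                            ev != EV_POLL_STOPPED)
                    rte_pause();
            rte_eth_dev_stop(port_id);
    }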

Both of these implementations will have no datapath performance impact:
1) a single boolean value check (shared-state cache line, likely in the core's 
cache) per polling iteration is super lightweight
2) an "event ringbuffer" check (when empty, also shared state, likely in cache) 
per iteration is also very lightweight.

General notes on the above:
There's even an option to only check the boolean/event-ringbuffer once every N 
iterations: this causes even less overhead, but increases the latency of the 
event action/reply on the datapath thread (sketched below). As almost always, 
it depends on what's important for your use-case!
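
As an illustration, the polling loop from the first sketch could be adjusted 
like this (N = 64 is an arbitrary choice; the includes and the force_quit 
declaration are as in the first sketch):

    static int
    lcore_rx_loop_batched(void *arg)
    {
            uint16_t port_id = 0, queue_id = 0; /* placeholder ids */
            struct rte_mbuf *bufs[32];
            unsigned int iter = 0;
            bool quit = false;
            (void)arg;

            while (!quit) {
                    /* Only touch the shared flag every 64 iterations. */
                    if ((iter++ & 63) == 0)
                            quit = force_quit;

                    uint16_t nb = rte_eth_rx_burst(port_id, queue_id,
                                                   bufs, 32);
                    for (uint16_t i = 0; i < nb; i++)
                            rte_pktmbuf_free(bufs[i]); /* real processing here */
            }
            return 0;
    }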

The main difference between implementations 1) and 2) above can be captured by 
this phrase: "Do not communicate by sharing memory; instead, share memory by 
communicating.", which I read in the Rust docs here: 
https://doc.rust-lang.org/book/ch16-02-message-passing.html. Approach 1) 
literally shares memory (both threads access the force_quit value directly). 
Approach 2) focuses on communicating, which avoids the race condition in a more 
powerful/elegant way (and is future-proof too: it allows adding new event types 
cleanly, which the force_quit bool value does not). I like this design 
mentality, as it is a good, high-performance, scalable way for threads to 
interact, and it scales to future needs too: so I recommend approach 2.

> Looking forward to your insights.
>
> Regards,
> Nagendra

Regards, -Harry
