Hi Tom,

On Wed, 23 Jul 2025, Tom Barbette wrote:


Hi all,

 

As Ivan mentioned, this is exactly what we did in RSS++.

 

For the concern about reprogramming RSS "live", it depends on the NIC. I remember the Intel card we used could use the "global" API just fine. For the Mellanox cards we had to use the rte_flow RSS action, as reprogramming the global RETA table would lead to a (partial?) device restart and to the loss of many packets.

Valid point, indeed. So some drivers, just like with the MTU update in started state, may need an internal port restart. Thanks for clarifying this.

We had to play with priorities and prefixes, but rte_flow and mlx5 support has evolved since then; it might be a bit simpler now, just using priorities and groups maybe.
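
Not from the thread, just to make the idea concrete: a minimal sketch of steering IPv4 traffic into an RSS action over a chosen queue set with the current rte_flow API. Rebalancing then means creating a rule with a new queue list (and/or a different priority) and destroying the old one, instead of rewriting the global RETA. port_id, the group/priority values and the queue list are placeholders.

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_flow.h>

/* Sketch: spread matching IPv4 traffic over a chosen set of Rx queues. */
static struct rte_flow *
install_rss_rule(uint16_t port_id, struct rte_flow_error *err)
{
    uint16_t queues[] = { 0, 1, 2, 3 };              /* placeholder queue set */
    struct rte_flow_attr attr = { .group = 1, .priority = 0, .ingress = 1 };
    struct rte_flow_item pattern[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },
        { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };
    struct rte_flow_action_rss rss = {
        .func = RTE_ETH_HASH_FUNCTION_TOEPLITZ,
        .types = RTE_ETH_RSS_IP,
        .key = NULL, .key_len = 0,                   /* keep the default RSS key */
        .queue = queues, .queue_num = 4,
    };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_RSS, .conf = &rss },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    return rte_flow_create(port_id, &attr, pattern, actions, err);
}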

 

The biggest challenge was the state, as written in the paper. We ended up using rte_flow rules anyway, so we could attach an epoch "mark" action that records the version of the distribution table and allows flow state to be handed over efficiently when flows move from one core to another.
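
Again only a sketch (not RSS++'s actual code; the names are made up): the same kind of rule can carry a MARK action with the current table version, and the receiving core reads it back from the mbuf, so a packet dispatched by an old table can be told apart from one dispatched by the new one when per-flow state is handed over.

#include <stdint.h>
#include <rte_flow.h>
#include <rte_mbuf.h>

/* Tag an RSS rule with the distribution-table version ("epoch") via MARK. */
static struct rte_flow *
install_rss_rule_with_epoch(uint16_t port_id,
                            const struct rte_flow_attr *attr,
                            const struct rte_flow_item *pattern,
                            const struct rte_flow_action_rss *rss,
                            uint32_t epoch,
                            struct rte_flow_error *err)
{
    struct rte_flow_action_mark mark = { .id = epoch };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
        { .type = RTE_FLOW_ACTION_TYPE_RSS,  .conf = rss },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };
    return rte_flow_create(port_id, attr, pattern, actions, err);
}

/* On receive, the mark (if present) is delivered in the mbuf. */
static uint32_t
pkt_epoch(const struct rte_mbuf *m, uint32_t fallback)
{
    return (m->ol_flags & RTE_MBUF_F_RX_FDIR_ID) ? m->hash.fdir.hi : fallback;
}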

The code of RSS++ is still a bit coupled to FastClick, but it was already mostly separated out here:
https://github.com/tbarbette/fastclick/tree/main/vendor/nicscheduler

We also had a version for the Linux Kernel with XDP for counting.
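
For anyone curious, the kernel-side counting can be as small as a generic per-queue XDP counter like the sketch below (this is not the RSS++ kernel code, just an illustration; map size and section names are placeholders):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Per-Rx-queue packet counters, read from user space to measure load. */
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 256);
    __type(key, __u32);
    __type(value, __u64);
} rx_counts SEC(".maps");

SEC("xdp")
int count_rx(struct xdp_md *ctx)
{
    __u32 key = ctx->rx_queue_index;
    __u64 *val = bpf_map_lookup_elem(&rx_counts, &key);

    if (val)
        (*val)++;            /* per-CPU map, a plain increment is enough */
    return XDP_PASS;
}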

 

We can chat about that if you want.

 

NB: my address has changed, I'm not at KTH anymore.

I apologise for the mix-up; I had found that address at the top of https://github.com/rsspp .

Thank you.


 

Cheers,

Tom

 

 

From: Stephen Hemminger <step...@networkplumber.org>
Date: Tuesday, 15 July 2025 at 23:40
To: Scott Wasson <swas...@microsoft.com>
Cc: users@dpdk.org <users@dpdk.org>
Subject: Re: rte_eth_dev_rss_reta_update() locking considerations?

On Tue, 15 Jul 2025 16:15:22 +0000
Scott Wasson <swas...@microsoft.com> wrote:

> Hi,
>
> We're using multiqueue, and RSS doesn't always balance the load very well. I had a clever idea to periodically measure the load distribution (CPU load on the IO cores) in the background pthread, and use rte_eth_dev_rss_reta_update() to adjust the redirection table dynamically if the imbalance exceeds a given threshold. In practice it seems to work nicely. But I'm concerned about:
>
> https://doc.dpdk.org/api/rte__ethdev_8h.html#a3c1540852c9cf1e576a883902c2e310d
>
> Which states:
>
> By default, all the functions of the Ethernet Device API exported by a PMD are lock-free functions which assume to not be invoked in parallel on different logical cores to work on the same target object. For instance, the receive function of a PMD cannot be invoked in parallel on two logical cores to poll the same Rx queue [of the same port]. Of course, this function can be invoked in parallel by different logical cores on different Rx queues. It is the responsibility of the upper level application to enforce this rule.
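
(Illustration, not from the thread: the rule above in code form. Each IO lcore polls only its own Rx queue, so the data path needs no locks; anything shared, including the control path, has to be serialised by the application. queue_by_lcore[] is a made-up application mapping.)

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

extern uint16_t queue_by_lcore[RTE_MAX_LCORE];   /* filled by the application */

/* lcore_function_t: each IO lcore polls exactly one Rx queue of the port. */
static int
io_loop(void *arg)
{
    uint16_t port_id = *(uint16_t *)arg;
    uint16_t queue_id = queue_by_lcore[rte_lcore_id()];
    struct rte_mbuf *burst[32];

    for (;;) {
        uint16_t n = rte_eth_rx_burst(port_id, queue_id, burst, 32);
        for (uint16_t i = 0; i < n; i++)
            rte_pktmbuf_free(burst[i]);          /* placeholder for real work */
    }
    return 0;
}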
>
> In this context, what is the "target object"? The queue_id of the port? Or the port itself? Would I need to add port-level spinlocks around every invocation of rte_eth_dev_*()? That's a hard no, it would destroy performance.
>
> Alternatively, if I were to periodically call rte_eth_dev_rss_reta_update() from the IO cores instead of the background core, as the above paragraph suggests, that doesn't seem correct either. The function takes a reta_conf[] array that affects all RETA entries for that port and maps them to a queue_id. Is it safe to remap RETA entries for a given port on one IO core while another IO core is potentially reading from its Rx queue for that same port? That problem seems not much different from remapping in the background core as I am now.
>
> I'm starting to suspect this function was intended to be called once on startup before rte_eth_dev_start(), and/or the ports must be stopped before calling it. If that's the case, then I'll call this idea too clever by half and give it up now.
>
> Thanks in advance for your help!
>
> -Scott
>

There is no locking in the driver path for control. It is expected that the application will manage access to the control path (RSS being one example) so that only one thread modifies the PMD.
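
A minimal sketch of what "only one thread modifies the PMD" could look like for the RETA case, assuming a single rebalancing thread; rebalance_reta() and new_queue[] are made-up names, and RETA sizes above 512 entries are not handled:

#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <rte_ethdev.h>
#include <rte_spinlock.h>

/* Serialise control-path access; the data-path (Rx/Tx) threads never take it. */
static rte_spinlock_t ctrl_lock = RTE_SPINLOCK_INITIALIZER;

/* new_queue[] maps each RETA entry to its new Rx queue, computed elsewhere
 * from the measured per-core load. */
static int
rebalance_reta(uint16_t port_id, const uint16_t *new_queue)
{
    struct rte_eth_rss_reta_entry64
        reta_conf[RTE_ETH_RSS_RETA_SIZE_512 / RTE_ETH_RETA_GROUP_SIZE];
    struct rte_eth_dev_info info;
    int ret = rte_eth_dev_info_get(port_id, &info);

    if (ret != 0)
        return ret;
    if (info.reta_size == 0 || info.reta_size > RTE_ETH_RSS_RETA_SIZE_512)
        return -ENOTSUP;                 /* keep the fixed-size array simple */

    memset(reta_conf, 0, sizeof(reta_conf));
    for (uint16_t i = 0; i < info.reta_size; i++) {
        reta_conf[i / RTE_ETH_RETA_GROUP_SIZE].mask |=
            UINT64_C(1) << (i % RTE_ETH_RETA_GROUP_SIZE);
        reta_conf[i / RTE_ETH_RETA_GROUP_SIZE].reta[i % RTE_ETH_RETA_GROUP_SIZE] =
            new_queue[i];
    }

    rte_spinlock_lock(&ctrl_lock);       /* one control-path caller at a time */
    ret = rte_eth_dev_rss_reta_update(port_id, reta_conf, info.reta_size);
    rte_spinlock_unlock(&ctrl_lock);
    return ret;
}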

