>-----Original Message-----
>From: Ilya Maximets <[email protected]>
>Sent: Thursday, 23 October 2025 15:30
>To: Eelco Chaudron <[email protected]>; [email protected]; Kevin
>Traynor <[email protected]>
>Cc: Eli Britstein <[email protected]>; Simon Horman <[email protected]>;
>Maor Dickman <[email protected]>; [email protected]
>Subject: Re: [ovs-dev] PMD Scheduling: Grouping RX Queues from Related Ports
>
>
>On 10/23/25 1:48 PM, Eelco Chaudron via dev wrote:
>> Hi all,
>>
>> We’d like to bring a design discussion to the community regarding a
>> requirement for RX queues from different ports to be grouped on the same
>PMD.
>> We’ve had some initial talks with the NVIDIA team (who are CC’d), and
>> I think this discussion will benefit from upstream feedback and involvement.
>>
>> Here is the background and context:
>>
>> The goal is to automatically (i.e., without user configuration) group
>> together the same queue IDs from different, but related, ports. A key
>> use case is an E-Switch manager (e.g., p0) and its VF representatives (e.g.,
>pf0vf0, pf0vf1).
>
>Could you explain why this is a requirement to poll the same queue ID of
>different, though related, ports by the same thread?  It's not obvious.
>I suspect, in a typical setup with hardware offload most of the ports will be
>related this way.
[Eli Britstein] 

With DOCA ports, we call rx-burst only for the ESW manager port (for a specific
queue). In the same burst we get packets from this port (e.g. p0) as well as
from all its representors (pf0vf0, pf0vf1, etc.).
The HW is configured to set the mark field, as metadata, to the port-id of
each packet.
We then go over the burst and classify the packets into a per-port data
structure (one per port for that queue #).
The OVS model calls "input" per port, so for each port we return the burst
held in its data structure.
Since this data structure is not thread safe, it works for us if we force a
specific queue of all those ports to be processed in the same PMD thread.
That PMD thread will loop over all of them (by its poll_list). For each it 
calls netdev_rxq_recv().
Under the hood, we do the above (reading a burst from HW only for the ESW 
manager, classifying and returning the classified burst).
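
A minimal sketch of the receive path described above, only to illustrate the
idea. It is not the DOCA or OVS code: EswRxQueue, hw_rx_burst and the
(mark, payload) tuples are hypothetical names, and it assumes the HW read is
triggered when the ESW manager port itself is polled.

```python
from collections import defaultdict, deque


class EswRxQueue:
    """One RX queue ID shared by an ESW manager and its representors.

    Hypothetical model: 'hw_rx_burst' stands in for the single HW rx-burst
    call and returns (mark, payload) pairs, where mark is the port-id.
    """

    def __init__(self, manager_port, hw_rx_burst):
        self.manager_port = manager_port
        self.hw_rx_burst = hw_rx_burst
        # port-id -> packets already classified for that port.
        # Not thread safe: all ports of this queue must share one PMD.
        self.pending = defaultdict(deque)

    def recv(self, port_id):
        """Per-port "input" call, as a PMD would issue for each poll_list entry."""
        if port_id == self.manager_port:
            # Assumption: only the manager's poll touches the HW.
            for mark, payload in self.hw_rx_burst():
                self.pending[mark].append(payload)
        burst = list(self.pending[port_id])
        self.pending[port_id].clear()
        return burst


def fake_hw_burst():
    # One mixed burst: packets for p0 and for two of its representors.
    return [("p0", "a"), ("pf0vf0", "b"), ("p0", "c"), ("pf0vf1", "d")]


rxq = EswRxQueue("p0", fake_hw_burst)
poll_list = ["p0", "pf0vf0", "pf0vf1"]
bursts = {port: rxq.recv(port) for port in poll_list}
```

If pf0vf0 were polled from a different thread than p0, both would touch
'pending' concurrently, which is why the same-PMD grouping is needed.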

For the scheduling (just for reference, in our downstream code), we schedule in
2 phases (a change in sched_numa_list_schedule):
The first iteration skips representor ports. Only ESW manager ports are
scheduled (summing up cycles, if needed, for the port itself and its
representors). The scheduled RXQs are kept in a list.
The 2nd iteration schedules the representor ports. They are not scheduled
according to any algorithm; they simply get the PMD already assigned to their
ESW manager (with the help of the list from the first iteration).
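
A rough sketch of the two phases, under stated assumptions: the rxq records,
the 'manager' field and schedule_two_phase are hypothetical, and the
"heaviest group to the least-loaded PMD" placement is only a stand-in for
whichever assignment algorithm is configured. It is not the downstream change
to sched_numa_list_schedule.

```python
def schedule_two_phase(rxqs, pmds):
    """rxqs: dicts with 'port', 'queue', 'cycles', 'manager'.

    'manager' is None for an ESW manager port, otherwise the name of the
    ESW manager the representor belongs to. Returns (port, queue) -> pmd.
    """
    # Sum cycles per group: the manager's rxq plus its representors' rxqs
    # with the same queue ID.
    group_cycles = {}
    for r in rxqs:
        key = (r["manager"] or r["port"], r["queue"])
        group_cycles[key] = group_cycles.get(key, 0) + r["cycles"]

    load = {pmd: 0 for pmd in pmds}
    assignment = {}

    # Phase 1: skip representors; place each manager rxq using the
    # aggregated cycles of its group (stand-in greedy placement).
    managers = [r for r in rxqs if r["manager"] is None]
    managers.sort(key=lambda r: group_cycles[(r["port"], r["queue"])],
                  reverse=True)
    for r in managers:
        key = (r["port"], r["queue"])
        pmd = min(pmds, key=lambda p: load[p])
        assignment[key] = pmd
        load[pmd] += group_cycles[key]

    # Phase 2: representors run no algorithm; they take their manager's PMD.
    for r in rxqs:
        if r["manager"] is not None:
            assignment[(r["port"], r["queue"])] = \
                assignment[(r["manager"], r["queue"])]
    return assignment


rxqs = [
    {"port": "p0", "queue": 0, "cycles": 10, "manager": None},
    {"port": "pf0vf0", "queue": 0, "cycles": 50, "manager": "p0"},
    {"port": "p0", "queue": 1, "cycles": 30, "manager": None},
    {"port": "pf0vf0", "queue": 1, "cycles": 5, "manager": "p0"},
]
assignment = schedule_two_phase(rxqs, pmds=[0, 1])
```

Here queue 0 of p0 and pf0vf0 land on one PMD and queue 1 of both on the
other, with the decision based on the summed cycles of each group.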

This is tailored for DOCA mode. As part of the effort to upstream DOCA support,
we want a more generic solution.

>
>>
>> This new grouping logic must also respect existing scheduling
>> algorithms like ‘cycles’. For example, if ‘cycles’ is used, the
>> scheduler would need to base its decision on the sum of cycles for all RX
>queues within that group.
>>
>> For this, we think we need some kind of netdev API that tells the
>> rxq_scheduling() function which port-queues belong to a group. Once
>> this group is known, the algorithm can perform the proper calculation on the
>aggregated group.
>>
>> Does this approach sound reasonable? We are very open to other ideas
>> on how to discover these related queues.
>>
>> Kevin, I’ve copied you in, as you did most of the existing
>> implementation, so any feedback is appreciated.
>>
>> Cheers,
>> Eelco
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev