On 27 May 2026, at 16:37, Gaetan Rivet wrote:
> On Thu Apr 2, 2026 at 12:41 PM CEST, Kevin Traynor via dev wrote:
>> On 4/1/26 1:03 PM, Eelco Chaudron via dev wrote:
>>>
>>>
>>> On 1 Apr 2026, at 13:57, Eelco Chaudron via dev wrote:
>>>
>>>> This patch adds support for specific PMD thread initialization,
>>>> deinitialization, and a callback execution to perform work as
>>>> part of the PMD thread loop. This allows hardware offload
>>>> providers to handle any specific asynchronous or batching work.
>>>>
>>>> This patch also adds cycle statistics for the provider-specific
>>>> callbacks to the 'ovs-appctl dpif-netdev/pmd-perf-show' command.
>>>
>>> Bringing back the discussion on the earlier patch between Ilya and Gaetan
>>> to this revision :)
>>>
>>> Ilya:
>>> Hi, Eelco. As we talked before, this infrastructure resembles the async
>>> work infra that was proposed in the past for the use case of async vhost
>>> processing. And I don't see any real use case proposed for it here nor
>>> in the RFC, where the question was asked but not answered.
>>>
>>> Gaetan:
>>>
>>
>> Hi Gaetan,
>>
>> A few questions below. I'm not so clear on the DOCA threading
>> requirements, so questions may be broad.
>>
>>> Hi Ilya, Eelco,
>>>
>>> Thanks for the patch and for the review.
>>>
>>> The use-case on our side is distributed data-structures in DOCA that
>>> require each participating thread to do maintenance work periodically.
>>>
>>> Specifically, offload threads will insert offload objects.
>>> Those will reserve entries in a map that can be resized. The DOCA
>>> implementation requires any thread that owns an entry to perform the
>>> work of moving it to the new bucket / space after resize is initiated.
>>>
>>> This is a pervasive design choice in DOCA: they write most of their APIs
>>> assuming participating threads are periodically calling into these
>>> maintenance functions.
>>>
>>
>> What is a "participating thread" ? IIUC, the pmd thread passes down the
>> flow pattern/action and the offload thread inserts the offload into the NIC.
>>
>> In that case, is it the offload thread that owns the entry ?
>>
>
> Participating threads are any threads that registered to DOCA-flow as
> offloading threads. In our case, it means:
>
> * The main thread
>   --> When probing a port, starting it requires installing
>       DOCA offloads to execute RSS in particular, and a few other
>       'admin' offloads (optional rate-limiting on VF to avoid
>       noisy-neighbors, etc).
>
> * The offload thread(s) (in the OVS sense)
>   A thread in OVS managing dp-flow offloads asynchronously.
>
> * The polling thread(s)
>   CT-offload is much simpler and faster than dp-flow offload.
>   Executing offload insertion synchronously from the fastpath
>   is beneficial.
>
> In our case, 'participating threads' are any thread owning an offload
> queue in DOCA-flow.
>
> We have a few exceptions for the main thread, mainly that we force all
> offload operations to be fully synchronous there: we do not want to
> publish a new netdev if its 'admin' offloads have not yet been received
> and successfully acknowledged by the hardware, so we force waiting
> operations for it: it does not need to do regular upkeep etc.
>
>>> Some of such work is also time-sensitive, for example the current
>>> implementation requires a CT offload thread to receive completions after
>>> some hardware initialization. Until this completion is done, the CT
>>> offload entry is not fully usable (cannot be queried for activity /
>>> counters). We cannot leave batches of CT offload entry waiting for
>>> completion, assuming that at some later point, we will eventually
>>> re-execute something in our offload provider: it leaves a few stranded
>>> connection objects incomplete.
>>>
>>> This has the result of having hardware execution of a flow with CT
>>> actions, but no activity counters: the software datapath then deletes
>>> the connection and/or flow due to inactivity.
>>>
>>
>> Can this periodic work be done by the offload thread ? If it is fast
>> enough for inserting the offload, then maybe it is fast enough for this.
>>
>
> The PMD thread owns the offload queue. If another thread has to execute
> its upkeep work, it means sharing the queue between threads.
>
>> Some DPDK PMDs use alarms for periodic maintenance work, could they be
>> used inside DOCA for this?
>>
>
> Those upkeep functions are exposed by DOCA and part of the DOCA-flow
> API. DOCA does not expose an event framework to schedule this kind of
> work, it requires DOCA applications to explicitly call those functions.
>
>> If it needs to be on the PMD thread, is the work significant (i.e. more
>> than a few % cpu) and how variable is it ? Could it be added inside the
>> call to rte_eth_rx_burst polling ?
>>
>
> It can be significant.
> The work is anything requiring the use of the offload queue owned by
> this thread. The principle is that the owning thread must execute it.
>
> Currently, with CT offloads we have:
>
> * offload queue polling for HW completion (requests have been
>   executed: add / mod / del were executed)
>
> * CT-del: A conn was offloaded by PMD 1. The connection either expired
>   or another PMD 2 closed it: ct-clean or PMD-2 sends a CT-del
>   request to PMD-1: PMD-1 must poll for CT-del requests and
>   execute them locally.
>
> * Offload flush: when a port is deleted, all owning threads must
>   process a blocking flush request from the main thread. The main
>   thread only proceeds once all participating threads have completed
>   their flush.
>
> Completion is very lightweight work, but we must execute it.
> Generally we do only completion polling as needed: we only clear enough
> room in the offload queue for the current batch of requests we want to
> enqueue, but we have an issue on idle: some stray completion can
> be left in the queue and won't be processed if we rely only on activity.
> Currently DOCA-flow does not support leaving the completions until the
> port is deleted: they need to be processed.
>
> CT-del can be significant in some cases. We have a 'rolling-window' case
> of constant open + close of short connections, and in this worst case,
> CT-del takes ~30% (both local and distant). Some portion of it comes from
> CT-del messages, in particular in case of multiple PMDs.
>
> Offload flush is generally quick, but we must answer the flush message
> quickly to block the main thread as little as possible.
>
> Some of the messages must be handled even if there is no RX-burst: a PMD
> that is waiting for reload will need to execute a flush message that it
> has received.

Hi Gaetan,

I guess Kevin is suggesting hiding this work in netdev_doca_rxq_recv(),
as it will always be called as long as DOCA ports are present on the
PMD. Or are there cases where it would not be called?

  dp_netdev_process_rxq_port()
    netdev_rxq_recv()
      netdev_doca_rxq_recv()

Kevin, please confirm.

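
To make the question concrete, here is a minimal sketch of what piggybacking
the upkeep on the receive path could look like. Every name in it
(fake_doca_rxq_recv, doca_thread_upkeep, struct fake_rxq, ...) is a
hypothetical stand-in and not the real OVS or DOCA signature. It only models
the call pattern: the upkeep hook runs once per polled rxq, so a thread that
polls no rxq in an iteration never reaches it.

```c
#include <stddef.h>

/* Hypothetical stand-ins; not the real OVS or DOCA types. */
struct doca_thread_ctx {
    unsigned int upkeep_calls;   /* Number of times the upkeep hook ran. */
};

struct fake_rxq {
    struct doca_thread_ctx *ctx; /* Context of the thread polling this rxq. */
    int packets_ready;           /* Packets waiting to be received. */
};

/* Placeholder for the provider's maintenance work. */
static void
doca_thread_upkeep(struct doca_thread_ctx *ctx)
{
    ctx->upkeep_calls++;
}

/* Receive wrapper: upkeep runs on every poll, with or without traffic. */
static int
fake_doca_rxq_recv(struct fake_rxq *rxq)
{
    int n = rxq->packets_ready;

    doca_thread_upkeep(rxq->ctx);
    rxq->packets_ready = 0;
    return n;
}

/* One polling-loop iteration over 'n_rxqs' queues.  Returns the number of
 * packets received.  With n_rxqs == 0 the upkeep hook is never called. */
static int
pmd_iteration(struct fake_rxq *rxqs, size_t n_rxqs)
{
    int total = 0;

    for (size_t i = 0; i < n_rxqs; i++) {
        total += fake_doca_rxq_recv(&rxqs[i]);
    }
    return total;
}
```

In this model an idle rxq still triggers the hook, but an iteration with no
rxq to poll does not, which is the case I am asking about.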
> I think completions and flushes would be the main issues with the
> rx-burst approach.
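
For reference, the three kinds of work described in this thread (completion
polling, CT-del requests from other threads, and blocking flush requests
from the main thread) could be modelled as one per-thread routine. This is
only a sketch under assumptions: the struct, the counters and the function
names are hypothetical, not DOCA-flow or OVS APIs, and treating a flush as
"drain everything, then acknowledge" is my simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread offload state.  The counters stand in for the
 * real queue contents. */
struct offload_queue {
    unsigned int pending_completions; /* Executed by HW, not yet reaped. */
    unsigned int pending_ct_del;      /* CT-del requests from other threads. */
    bool flush_requested;             /* Set by the main thread, which waits. */
    bool flush_done;                  /* Set by the owner to release it. */
};

/* Consume up to 'budget' items from '*pending'; returns how many. */
static unsigned int
take(unsigned int *pending, unsigned int budget)
{
    unsigned int n = *pending < budget ? *pending : budget;

    *pending -= n;
    return n;
}

/* One upkeep pass, run by the thread owning 'q'.  Returns the number of
 * items processed. */
static unsigned int
offload_upkeep(struct offload_queue *q, unsigned int budget)
{
    unsigned int done = 0;

    assert(q != NULL);

    if (q->flush_requested) {
        /* Assumption: a flush ignores the budget, since the main thread
         * is blocked until it is acknowledged. */
        done += take(&q->pending_completions, q->pending_completions);
        done += take(&q->pending_ct_del, q->pending_ct_del);
        q->flush_requested = false;
        q->flush_done = true;
        return done;
    }

    /* Normal case: completions first, then CT-del with what is left. */
    done += take(&q->pending_completions, budget);
    done += take(&q->pending_ct_del, budget - done);
    return done;
}
```

The point of the sketch is that the flush branch has to run even when the
thread has nothing to receive, so whatever calls this routine cannot depend
on traffic.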
[...]