On Fri May 29, 2026 at 6:26 PM CEST, Kevin Traynor wrote:
> On 5/28/26 10:29 AM, Eelco Chaudron wrote:
> > 
> > 
> > On 27 May 2026, at 16:37, Gaetan Rivet wrote:
> > 
> >> On Thu Apr 2, 2026 at 12:41 PM CEST, Kevin Traynor via dev wrote:
> >>> On 4/1/26 1:03 PM, Eelco Chaudron via dev wrote:
> >>>>
> >>>>
> >>>> On 1 Apr 2026, at 13:57, Eelco Chaudron via dev wrote:
> >>>>
> >>>>> This patch adds support for specific PMD thread initialization,
> >>>>> deinitialization, and a callback execution to perform work as
> >>>>> part of the PMD thread loop. This allows hardware offload
> >>>>> providers to handle any specific asynchronous or batching work.
> >>>>>
> >>>>> This patch also adds cycle statistics for the provider-specific
> >>>>> callbacks to the 'ovs-appctl dpif-netdev/pmd-perf-show' command.
> >>>>
> >>>> Bringing back the discussion on the earlier patch between Ilya and 
> >>>> Gaetan to this revision :)
> >>>>
> >>>> Ilya:
> >>>>   Hi, Eelco.  As we talked before, this infrastructure resembles the
> >>>>   async work infra that was proposed in the past for the use case of
> >>>>   async vhost processing.  And I don't see any real use case proposed
> >>>>   for it here nor in the RFC, where the question was asked, but not
> >>>>   replied.
> >>>>
> >>>> Gaetan:
> >>>>
> >>>
> >>> Hi Gaetan,
> >>>
> >>> A few questions below. I'm not so clear on the DOCA threading
> >>> requirements, so questions may be broad.
> >>>
> >>>>   Hi Ilya, Eelco,
> >>>>
> >>>>   Thanks for the patch and for the review.
> >>>>
> >>>>   The use-case on our side is distributed data-structures in DOCA that
> >>>>   require each participating thread to do maintenance work
> >>>>   periodically.
> >>>>
> >>>>   Specifically, offload threads will insert offload objects.
> >>>>   Those will reserve entries in a map that can be resized. The DOCA
> >>>>   implementation requires any thread that owns an entry to perform the
> >>>>   work of moving it to the new bucket / space after resize is initiated.
> >>>>
> >>>>   This is a pervasive design choice in DOCA, they write most of their
> >>>>   APIs assuming participating threads are periodically calling into
> >>>>   these maintenance functions.
> >>>>
> >>>
> >>> What is a "participating thread" ? IIUC, the pmd thread passes down the
> >>> flow pattern/action and the offload thread inserts the offload into the
> >>> NIC.
> >>>
> >>> In that case, is it the offload thread that owns the entry ?
> >>>
> >>
> >> Participating threads are any threads that registered to DOCA-flow as
> >> offloading threads. In our case, it means:
> >>
> >>   * The main thread
> >>       --> When probing a port, starting it requires installing
> >>           DOCA offloads to execute RSS in particular, and a few other
> >>           'admin' offloads (optional rate-limiting on VF to avoid
> >>           noisy-neighbors, etc).
> >>
> >>   * The offload thread(s) (in the OVS sense)
> >>       A thread in OVS managing dp-flow offloads asynchronously.
> >>
> >>   * The polling thread(s)
> >>       CT-offload is much simpler and faster than dp-flow offload.
> >>       Executing offload insertion synchronously from the fastpath
> >>       is beneficial.
> >>
> >> In our case, 'participating threads' are any thread owning an offload
> >> queue in DOCA-flow.
> >>
> >> We have a few exceptions for the main thread, mainly that we force all
> >> offload operations to be fully synchronous there: we do not want to
> >> publish a new netdev if its 'admin' offloads have not yet been received
> >> and successfully acknowledged by the hardware, so we force waiting
> >> operations for it: it does not need to do regular upkeep etc.
> >>
> >>>>   Some of such work is also time-sensitive, for example the current
> >>>>   implementation requires a CT offload thread to receive completions
> >>>>   after some hardware initialization. Until this completion is done, the CT
> >>>>   offload entry is not fully usable (cannot be queried for activity /
> >>>>   counters). We cannot leave batches of CT offload entry waiting for
> >>>>   completion, assuming that at some later point, we will eventually
> >>>>   re-execute something in our offload provider: it leaves a few stranded
> >>>>   connection objects incomplete.
> >>>>
> >>>>   This has the result of having hardware execution of a flow with CT
> >>>>   actions, but no activity counters: the software datapath then deletes
> >>>>   the connection and/or flow due to inactivity.
> >>>>
> >>>
> >>> Can this periodic work be done by the offload thread ? If it is fast
> >>> enough for inserting the offload, then maybe it is fast enough for this.
> >>>
> >>
> >> The PMD thread owns the offload queue. If another thread has to execute
> >> its upkeep work, it means sharing the queue between threads.
> >>
> >>> Some DPDK PMDs use alarms for periodic maintenance work, could they be
> >>> used inside DOCA for this?
> >>>
> >>
> >> Those upkeep functions are exposed by DOCA and part of the DOCA-flow
> >> API. DOCA does not expose an event framework to schedule this kind of
> >> work, it requires DOCA applications to explicitly call those functions.
> >>
> >>> If it needs to be on the PMD thread, is the work significant (i.e. more
> >>> than a few % cpu) and how variable is it ? Could it be added inside the
> >>> call to rte_eth_rx_burst polling ?
> >>>
> >>
> >> It can be significant.
> >> The work is anything requiring the use of the offload queue owned by
> >> this thread. The principle is that the owning thread must execute it.
> >>
> >> Currently, with CT offloads we have:
> >>
> >>   * offload queue polling for HW completion (requests have been
> >>     executed: add / mod / del were executed)
> >>
> >>   * CT-del: A conn was offloaded by PMD-1. The connection either expired
> >>     or another PMD-2 closed it: ct-clean or PMD-2 sends a CT-del
> >>     request to PMD-1: PMD-1 must poll for CT-del requests and
> >>     execute them locally.
> >>
> >>   * Offload flush: when a port is deleted, all owning threads must
> >>     process a blocking flush request from the main thread. The main
> >>     thread only proceeds once all participating threads have completed
> >>     their flush.
> >>
> >> Completion is very lightweight work, but we must execute it.
> >> Generally we do only completion polling as needed: we only clear enough
> >> room in the offload queue for the current batch of requests we want to
> >> enqueue, but we have an issue on idle: some stray completion can
> >> be left in the queue and won't be processed if we rely only on activity.
> >> Currently DOCA-flow does not support leaving the completions until the
> >> port is deleted: they need to be processed.
> >>
> >> CT-del can be significant in some cases. We have a 'rolling-window' case
> >> of constant open + close of short connections, and in this worst case,
> >> CT-del takes ~30% (both local and distant). Some portion of it comes from
> >> CT-del messages, in particular in case of multiple PMDs.
> >>
> >> Offload flush is generally quick, but we must answer the flush message
> >> quickly to block the main thread as little as possible.
> >>
> >> Some of the messages must be handled even if there is no RX-burst: a PMD
> >> that is waiting for reload will need to execute a flush message that it
> >> has received.
> > 
> > Hi Gaetan,
> > 
> > I guess Kevin is suggesting to hide this work in netdev_doca_rxq_recv(),
> > as it will always be called as long as DOCA ports are present on the
> > PMD. Or are there cases where this is not the case?
> > 
> > dp_netdev_process_rxq_port()
> >   netdev_rxq_recv()
> >     netdev_doca_rxq_recv()
> > 
> > Kevin, please confirm.
>
> Yes, that's what I was suggesting. The work is rxq specific and we
> already have an rxq specific call that is called in a loop so why not do
> it there and include the cycles needed for the maintenance work in the
> measured cycles needed for that rxq.
>
> > 
> >> I think completions and flushes would be the main issues with the
> >> rx-burst approach.

Hi,

We had an issue with this kind of approach with flush commands.
A PMD can be registered as a DOCA offload thread, in which case it
will receive a blocking flush request on port deletion.
This happens even if that port is not scheduled on that PMD.

The issue arises when the PMD has no netdev-doca rxq scheduled: it
is registered as a DOCA offload thread, but will never process its flush
requests. A typical example is a multi-NUMA system, where by default 1
PMD is created per NUMA node, and ports are configured with 1 rxq. With
a single NIC, its rxq is scheduled on the closest PMD, leaving the other
one idle. The idle PMD is still registered as a DOCA offload thread, as
nothing forbids the user from adding a port on its NUMA node at a future
time.

In this case, the idle PMD would never enter the right rxq-burst command
to process its offload messages.
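
To make the failure mode concrete, here is a minimal sketch in C of a
polling loop where the upkeep only runs from inside the rxq receive call.
It is a toy model, not OVS or DOCA code: struct pmd, pmd_offload_upkeep(),
doca_rxq_recv(), pmd_iteration() and flush_port() are all hypothetical
names, and the blocking flush request is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a PMD thread; all names here are hypothetical. */
struct pmd {
    int n_doca_rxq;       /* netdev-doca rxqs scheduled on this PMD. */
    bool offload_thread;  /* Registered as a DOCA offload thread. */
    bool flush_pending;   /* Blocking flush request not yet handled. */
};

/* Stand-in for the per-thread upkeep: answer a pending flush request. */
static void
pmd_offload_upkeep(struct pmd *pmd)
{
    assert(pmd->offload_thread);
    pmd->flush_pending = false;
}

/* Stand-in for an rxq receive call with the upkeep folded in. */
static void
doca_rxq_recv(struct pmd *pmd)
{
    pmd_offload_upkeep(pmd);
    /* ... the packet burst would be received here ... */
}

/* One iteration of the polling loop: upkeep only runs via an rxq. */
static void
pmd_iteration(struct pmd *pmd)
{
    for (int i = 0; i < pmd->n_doca_rxq; i++) {
        doca_rxq_recv(pmd);
    }
}

/* Main thread deletes a port: post a flush to every offload thread, let
 * each PMD poll 'n_iter' times, and return how many flushes remain
 * unanswered. */
int
flush_port(struct pmd *pmds, size_t n_pmd, int n_iter)
{
    int stuck = 0;

    for (size_t i = 0; i < n_pmd; i++) {
        pmds[i].flush_pending = pmds[i].offload_thread;
    }
    for (int it = 0; it < n_iter; it++) {
        for (size_t i = 0; i < n_pmd; i++) {
            pmd_iteration(&pmds[i]);
        }
    }
    for (size_t i = 0; i < n_pmd; i++) {
        stuck += pmds[i].flush_pending;
    }
    return stuck;
}
```

With two PMDs registered as offload threads, one with a single rxq and
one with none, flush_port() reports one unanswered flush no matter how
many iterations are run: the one posted to the idle PMD.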

All other cases seem fine, so I think this approach almost works.
I just don't have a solid solution for this flush issue.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
