On 1/15/26 10:19 AM, Eelco Chaudron wrote:
> This patch series introduces a major architectural
> refactoring of Open vSwitch's hardware offload
> infrastructure. It replaces the tightly coupled
> `netdev-offload` implementation with a new, modular
> `dpif-offload-provider` framework.
> 
> MOTIVATION
> -------------------------------------------------------------
> The existing `netdev-offload` API tightly couples datapath
> implementations (like `dpif-netdev`) with specific offload
> technologies (DPDK's rte_flow). This design has several
> limitations:
> 
> - Rigid Architecture: It creates complex dependencies,
>   making the code difficult to maintain and extend.
> 
> - Limited Flexibility: Supporting multiple offload backends
>   simultaneously or adding new ones is cumbersome.
> 
> - Inconsistent APIs: The logic for handling different
>   offload types is scattered, leading to an inconsistent
>   and hard-to-follow API surface.
> 
> This refactoring aims to resolve these issues by creating a
> clean separation of concerns, improving modularity, and
> establishing a clear path for future hardware offload
> integrations.
> 
> PROPOSED SOLUTION: THE `DPIF-OFFLOAD-PROVIDER` FRAMEWORK
> -------------------------------------------------------------
> This series introduces the `dpif-offload-provider`
> framework, which functions similarly to the existing
> `dpif-provider` pattern. It treats hardware offload as a
> distinct layer with multiple, dynamically selectable
> backends.
> 
> Key features of the new framework include:
> 
> 1. Modular Architecture: A clean separation between the
>    generic datapath interface and specific offload
>    provider implementations (e.g., `dpif-offload-tc`,
>    `dpif-offload-dpdk`). `dpif` layers are now generic
>    clients of the offload API.
> 
> 2. Provider-based System: Allows multiple offload backends
>    to coexist.
> 
> 3. Unified and Asynchronous API: Establishes a consistent
>    API across all offload providers. For userspace
>    datapaths, the API is extended to support asynchronous
>    flow operations with callbacks, making `dpif-netdev` a
>    more efficient client.
> 
> 4. Enhanced Configuration: Provides granular control over
>    offload provider selection through a global and per-port
>    priority system (`hw-offload-priority`), allowing
>    fine-tuned policies for different hardware.
> 
> 5. Improved Testing: Includes a new test framework
>    specifically for validating DPDKs rte_flow offloads,
>    enhancing long-term maintainability.
> 
> PATCH SERIES ORGANIZATION
> -------------------------------------------------------------
> This large series is organized logically to facilitate
> review:
> 
> 1. Framework Foundation: The initial patches establish the
>    core `dpif-offload-provider` framework, including the
>    necessary APIs for port management, flow mark
>    allocation, configuration, and a dummy provider for
>    testing.
> 
> 2. Provider Implementation: These patches introduce the new
>    `dpif-offload-tc` and `dpif-offload-dpdk` providers,
>    building out their specific implementations on top of
>    the new framework.
> 
> 3. API Migration and Decoupling: The bulk of the series
>    systematically migrates functionality from the legacy
>    `netdev-offload` layer to the new providers. Key
>    commits here decouple `dpif-netdev` and, crucially,
>    `dpif-netlink` from their hardware offload
>    entanglements.
> 
> 4. Cleanup: The final patches remove the now-redundant
>    global APIs and structures from `netdev-offload`,
>    completing the transition.
> 
> BACKWARD COMPATIBILITY
> -------------------------------------------------------------
> This refactoring maintains full API compatibility from a
> user's perspective. All existing `ovs-vsctl` and
> `ovs-appctl` commands continue to function as before. The
> changes are primarily internal architectural improvements
> designed to make OVS more robust and extensible.
> 
> REQUEST FOR COMMENTS
> -------------------------------------------------------------
> This is a significant architectural change that affects
> core OVS infrastructure. We welcome feedback on:
> 
> - The overall architectural approach and the
>   `dpif-offload-provider` concept.
> - The API design, particularly the new asynchronous model
>   for `dpif-netdev`.
> - The migration strategy and any potential backward
>   compatibility concerns.
> - Performance implications of the new framework.
> 
> -------------------------------------------------------------
> Changes from v1:
>   - Fixed issues reported by Aaron and my AI experiments.
>   - See individual patches for specific changes.
> 
> Changes from v2:
>   - See individual patches for specific changes.
> 
> Changes from v3:
>   - In patch 7, removed leftover netdev_close(port->netdev)
>     causing netdev reference count issues.
>   - Changed naming for dpif_offload_impl_type enum entries.
>   - Merged previous patch 36, 'dpif-netdev: Add full name
>     to the dp_netdev structure', with the next patch.
> 
> Changes from v4:
>   - Applied review comments from Aaron, see individual patches.
>   - Added Eli's ACK to the individual patches.
> 
> Changes from v5:
>   - Addressed Ilya's comments, see upstream discussion for
>     details.
> 
> Eelco Chaudron (40):
>   dpif-offload-provider: Add dpif-offload-provider implementation.
>   dpif-offload: Add provider for tc offload.
>   dpif-offload: Add provider for dpdk (rte_flow).
>   dpif-offload: Allow configuration of offload provider priority.
>   dpif-offload: Move hw-offload configuration to dpif-offload.
>   dpif-offload: Add offload provider set_config API.
>   dpif-offload: Add port registration and management APIs.
>   dpif-offload-tc: Add port management framework.
>   dpif-offload-dpdk: Add port management framework.
>   dpif-offload: Validate mandatory port class callbacks on registration.
>   dpif-offload: Allow per-port offload provider priority config.
>   dpif-offload: Introduce provider debug information API.
>   dpif-offload: Call flow-flush netdev-offload APIs via dpif-offload.
>   dpif-offload: Call meter netdev-offload APIs via dpif-offload.
>   dpif-offload: Move the flow_get_n_flows() netdev-offload API to dpif.
>   dpif-offload: Move hw_post_process netdev API to dpif.
>   dpif-offload: Add flow dump APIs to dpif-offload.
>   dpif-offload: Move the tc flow dump netdev APIs to dpif-offload.
>   dpif-netlink: Remove netlink-offload integration.
>   dpif-netlink: Add API to get offloaded netdev from port_id.
>   dpif-offload: Add API to find offload implementation type.
>   dpif-offload: Add operate implementation to dpif-offload.
>   netdev-offload: Temporarily move thread-related APIs to dpif-netdev.
>   dpif-offload: Add port dump APIs to dpif-offload.
>   dpif-netdev: Remove indirect DPDK netdev offload API calls.
>   dpif: Add dpif_get_features() API.
>   dpif-offload: Add flow operations to dpif-offload-tc.
>   dpif-netlink: Remove entangled hardware offload.
>   dpif-offload-tc: Remove netdev-offload dependency.
>   netdev_dummy: Remove hardware offload override.
>   dpif-offload: Move the netdev_any_oor() API to dpif-offload.
>   netdev-offload: Remove the global netdev-offload API.
>   dpif-offload: Add inline flow APIs for userspace datapaths.
>   dpif_netdev: Fix nullable memcpy in queue_netdev_flow_put().
>   dpif-offload: Move offload_stats_get() API to dpif-offload.
>   dpif-offload-dpdk: Abstract rte_flow implementation from dpif-netdev.
>   dpif-offload-dummy: Add flow add/del/get APIs.
>   netdev-offload: Fold netdev-offload APIs and files into dpif-offload.
>   tests: Fix NSH decap header test for real Ethernet devices.
>   tests: Add a simple DPDK rte_flow test framework.
> 
>  Documentation/topics/testing.rst              |   19 +
>  include/openvswitch/json.h                    |    1 +
>  include/openvswitch/netdev.h                  |    1 +
>  include/openvswitch/thread.h                  |    6 +
>  lib/automake.mk                               |   17 +-
>  lib/dp-packet.h                               |    1 +
>  lib/dpctl.c                                   |   50 +-
>  lib/dpdk.c                                    |    2 -
>  lib/dpif-netdev-avx512.c                      |    4 +-
>  lib/dpif-netdev-private-flow.h                |    9 +-
>  lib/dpif-netdev.c                             | 1243 +++---------
>  lib/dpif-netlink.c                            |  557 +-----
>  ...load-dpdk.c => dpif-offload-dpdk-netdev.c} |  592 ++++--
>  lib/dpif-offload-dpdk-private.h               |   73 +
>  lib/dpif-offload-dpdk.c                       | 1114 +++++++++++
>  lib/dpif-offload-dummy.c                      |  890 +++++++++
>  lib/dpif-offload-provider.h                   |  412 ++++
>  ...-offload-tc.c => dpif-offload-tc-netdev.c} |  229 ++-
>  lib/dpif-offload-tc-private.h                 |   73 +
>  lib/dpif-offload-tc.c                         |  810 ++++++++
>  lib/dpif-offload.c                            | 1773 +++++++++++++++++
>  lib/dpif-offload.h                            |  227 +++
>  lib/dpif-provider.h                           |   66 +-
>  lib/dpif.c                                    |  164 +-
>  lib/dpif.h                                    |   14 +-
>  lib/dummy.h                                   |    9 +
>  lib/json.c                                    |    7 +
>  lib/netdev-dpdk.c                             |    9 +-
>  lib/netdev-dpdk.h                             |    2 +-
>  lib/netdev-dummy.c                            |  199 +-
>  lib/netdev-linux.c                            |    3 +-
>  lib/netdev-offload-provider.h                 |  148 --
>  lib/netdev-offload.c                          |  910 ---------
>  lib/netdev-offload.h                          |  169 --
>  lib/netdev-provider.h                         |   10 +-
>  lib/netdev.c                                  |   71 +-
>  lib/netdev.h                                  |   22 +
>  lib/tc.c                                      |    2 +-
>  ofproto/ofproto-dpif-upcall.c                 |   50 +-
>  ofproto/ofproto-dpif.c                        |   82 +-
>  tests/.gitignore                              |    3 +
>  tests/automake.mk                             |   24 +
>  tests/dpif-netdev.at                          |   37 +-
>  tests/ofproto-dpif.at                         |  170 ++
>  tests/ofproto-macros.at                       |   17 +-
>  tests/sendpkt.py                              |   12 +-
>  tests/system-dpdk-offloads-macros.at          |  236 +++
>  tests/system-dpdk-offloads-testsuite.at       |   28 +
>  tests/system-dpdk-offloads.at                 |  223 +++
>  tests/system-dpdk.at                          |   35 +
>  tests/system-kmod-macros.at                   |    5 +
>  tests/system-offloads-testsuite-macros.at     |    5 +
>  tests/system-offloads-traffic.at              |   46 +
>  tests/system-traffic.at                       |    9 +-
>  tests/system-userspace-macros.at              |    5 +
>  vswitchd/bridge.c                             |    7 +-
>  vswitchd/vswitch.xml                          |   44 +
>  57 files changed, 7601 insertions(+), 3345 deletions(-)
>  rename lib/{netdev-offload-dpdk.c => dpif-offload-dpdk-netdev.c} (83%)
>  create mode 100644 lib/dpif-offload-dpdk-private.h
>  create mode 100644 lib/dpif-offload-dpdk.c
>  create mode 100644 lib/dpif-offload-dummy.c
>  create mode 100644 lib/dpif-offload-provider.h
>  rename lib/{netdev-offload-tc.c => dpif-offload-tc-netdev.c} (95%)
>  create mode 100644 lib/dpif-offload-tc-private.h
>  create mode 100644 lib/dpif-offload-tc.c
>  create mode 100644 lib/dpif-offload.c
>  create mode 100644 lib/dpif-offload.h
>  delete mode 100644 lib/netdev-offload-provider.h
>  delete mode 100644 lib/netdev-offload.c
>  delete mode 100644 lib/netdev-offload.h
>  create mode 100644 tests/system-dpdk-offloads-macros.at
>  create mode 100644 tests/system-dpdk-offloads-testsuite.at
>  create mode 100644 tests/system-dpdk-offloads.at
> 

Thanks, Eelco for the update!  I read through the set again and it
reads much easier after the renames.  There are still a few small
style issues throughout the set, but they are all minor like an
extra empty line at the EOF of dpif-offload.c in patch 7, a few
more places with a missing double space or underscore or the thread
safety annotations not on a separate line.  There is also a missed
rename of dpif_offload_flow_get_n_offloaded() into something like
dpif_ofload_flow_count(), since the underlying callback was renamed.
And the provider_collection_add() should return a positive errno
after all, since it's passed directly into ovs_strerror().

As discussed off-list, instead of me writing all of these nits down
in the emails and then you fixing them, it's simpler if I just
handle the merge and fix the nits on commit.

Also, found a couple more issues that we should address:

1. In the first patch, the dpif_offload_dump_next() function is
   describing a case that should be impossible, but still tries
   to handle it.  We shouldn't do that.  And instead we need to
   just take the collection reference in dpif_offload_dump_start()
   to make sure the collection doesn't go away during the dump.
   And then we can fully rely on the LIST_FOR_EACH_CONTINUE.

2. The per-port priority configuration should be per-interface
   instead as datapath doesn't deal with ports, it only works
   with interfaces.  And some ports, like bonds can have multiple
   interfaces.  It may not be a big deal as in a normal case
   I don't think we would need different offload for interfaces
   in the same port, but it's not really correct to configure
   ports, when it is applied on the interface.

And there are two things that I flagged in v5, but we decided to
handle separately:

3. Mass-rename of offload provider structures and functions to
   make them coherent and shorter than they are.  Most of them
   are not exported and don't need extensive prefixes.  And these
   prefixes need unification after moving stuff between netdev*
   and dpif* modules.

4. Move of the port manager instances from provider-specific
   structures to the common struct dpif_offload.  This should
   eliminate some duplication and make the module boundaries
   more clear.  This should also eliminate the need for the
   port dump API or at least significantly simplify it, e.g.
   by just iterating over port cmap with a cmap_cursor.


So, the situation is the following: We have branching planned for
tomorrow.  Full CI run takes about 24 hours.  So, we don't really
have time for v7.  We could postpone branching, but it also seems
unreasonable as we only really need changes in about 4 patches
out of 40, and none of the issues listed above are critical or
breaking any functionality.  They are mostly internal code movements.
It would be much easier to get the set merged and then work on the
4 items above next week and backport to the new branch.

As mentioned earlier, to save some time, I can handle the merging
and fix all the small nits on commit instead of wasting a lot of
time on writing them down on the mailing list and then Eelco
fixing them.

Eelco, Aaron, does that sound like a good plan?  Or did I miss
anything in my summary?

Best regards, Ilya Maximets.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to