On 2/26/26 6:24 PM, David Marchand via dev wrote:
> By default, DPDK probes all available resources (like PCI devices) and
> partially initialises them (i.e. takes them over).
> OVS has relied on this behavior since the introduction of netdev-dpdk.
> It has not been needed since DPDK device hotplug support was added, and
> hotplug has been in use for some time now.
> 
> Besides, this initial probing may not be desirable:
> - for PCI devices bound to vfio-pci, the first application taking over
>   them "wins", meaning that OVS would prevent qemu from using some VF
>   devices,
> - for mlx5 devices,
>   - the driver maintains link status and liveness of all ports
>     (taking some kernel lock) even when OVS only uses a subset of them,
>   - if some driver feature needs to be enabled for one port via a devargs,
>     this would have to be set in dpdk-extra.
> 
> Change this behavior and disable the initial PCI probing by passing
> a specially crafted allow list: this implementation is not elegant
> but it has been successfully used (for the PCI part) in a number of
> setups I know, and there is no better DPDK API to achieve the same
> at the moment.
> 
> This behavior change breaks setups that were using the
> class=eth,mac=XX:XX:XX:XX:XX:XX syntax because OVS was relying on the
> (fragile) assumption that all DPDK ports were probed at init once and
> for all.
> Add a warning for users of this syntax, update the documentation and
> add an option to restore the original behavior via
> 'dpdk-probe-at-init=true'.
> 
> This option also helps for unexpected cases like https://xkcd.com/1172/.
> 
> Signed-off-by: David Marchand <[email protected]>
> Acked-by: Eli Britstein <[email protected]>
> Acked-by: Eelco Chaudron <[email protected]>
> ---
> Changes since RFC v2:
> - updated descriptions and comments,
> 
> Changes since RFC v1:
> - updated commitlog (mentioning devargs),
> - handled other DPDK buses,
> 
> ---
>  Documentation/howto/dpdk.rst         |  6 +++++
>  Documentation/intro/install/dpdk.rst |  8 ++++++
>  NEWS                                 |  5 ++++
>  lib/dpdk.c                           | 28 +++++++++++++++++++
>  lib/netdev-dpdk.c                    |  2 +-
>  tests/system-dpdk-macros.at          |  2 +-
>  tests/system-dpdk.at                 | 40 ++++++++++++++--------------
>  vswitchd/vswitch.xml                 | 15 +++++++++++
>  8 files changed, 84 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 73e630b07f..5d6bf94cdb 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -62,6 +62,12 @@ is suggested::
>  
>  .. important::
>  
> +    Using this syntax requires that DPDK probes the device that owns those
> +    multiple ports. This can be achieved by either setting an allowlist
> +    of PCI devices in the ``dpdk-extra`` configuration, or by requesting that
> +    all available devices (including PCI devices) be probed at initialization
> +    (setting ``dpdk-probe-at-init`` to true).
> +
>      Hotplugging physical interfaces is not supported using the above syntax.
>      This is expected to change with the release of DPDK v18.05. For information
>      on hotplugging physical interfaces, you should instead refer to
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index 6f4687bdea..eabca63a83 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -297,6 +297,14 @@ listed below. Defaults will be provided for all values not explicitly set.
>    sockets. If not specified, this option will not be set by default. DPDK
>    default will be used instead.
>  
> +``dpdk-probe-at-init``
> +  Let DPDK EAL probe all available devices at initialization.

We're duplicating the text between here and the man page, which doesn't seem
right.  I'd cut the text from here...

> +  This consumes more resources, as OVS may not use all probed devices, and it
> +  may cause undesired side effects (such as taking the RTNL lock frequently for
> +  maintaining link status (and other states, etc.) of mlx5 netdevs that OVS
> +  does not care about). However,

...to here, and add more info to the man page if necessary.

nit: Should generally avoid doubly nested parentheses, if possible.
nit: The word "netdev" should not be used in the user-facing documentation.
     We talk about ports and interfaces, or DPDK devices.

>     this option is needed when using the
> +  ``class=eth,mac=XX:XX:XX:XX:XX:XX`` syntax for DPDK ports.
> +
>  ``dpdk-hugepage-dir``
>    Directory where hugetlbfs is mounted
>  
> diff --git a/NEWS b/NEWS
> index d5642f9857..392e3ed1f0 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,5 +1,10 @@
>  Post-v3.7.0
>  --------------------
> +   - DPDK:
> +     * Probing of devices at DPDK initialization has been disabled to avoid
> +       wasting resources on unused devices. This breaks DPDK netdev ports
> +       using the "class=eth,mac=" syntax (though it can be restored, see
> +       Documentation/howto/dpdk.rst).

We need to mention the new config option and point to the man page for
more details.

>  
>  
>  v3.7.0 - 16 Feb 2026
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index d27b95cd9a..794ffbe599 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -430,6 +430,34 @@ dpdk_init__(const struct smap *ovs_other_config)
>          svec_add_nocopy(&args, xasprintf("0@%d", cpu));
>      }
>  
> +    if (!args_contains(&args, "-a") && !args_contains(&args, "-b")

What about full versions of these flags?
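For reference, a minimal sketch of a more thorough check, written as standalone
C over a plain argv array instead of OVS's svec. The helper names
arg_matches_eal_opt() and args_restrict_probing() are hypothetical. It assumes
the EAL long option names --allow and --block (the long forms of -a and -b),
and accepts both the separate ("-a X", "--allow X") and attached ("-aX",
"--allow=X") spellings. It does not handle getopt_long() abbreviations such as
"--al", which EAL may also accept.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if 'arg' is the EAL option whose short
 * form is 'short_opt' (e.g. "-a") or whose long form is 'long_opt'
 * (e.g. "--allow"), with or without an attached value. */
static bool
arg_matches_eal_opt(const char *arg, const char *short_opt,
                    const char *long_opt)
{
    size_t long_len = strlen(long_opt);

    /* "-a" alone, or "-a0000:01:00.0" with the value attached. */
    if (!strncmp(arg, short_opt, strlen(short_opt))) {
        return true;
    }
    /* "--allow" alone, or "--allow=0000:01:00.0"; rejects "--allowx". */
    if (!strncmp(arg, long_opt, long_len)
        && (arg[long_len] == '\0' || arg[long_len] == '=')) {
        return true;
    }
    return false;
}

/* Hypothetical stand-in for the two args_contains() calls: returns true if
 * any argument passes an allow or block list to EAL. */
static bool
args_restrict_probing(const char *const *argv, size_t argc)
{
    for (size_t i = 0; i < argc; i++) {
        if (arg_matches_eal_opt(argv[i], "-a", "--allow")
            || arg_matches_eal_opt(argv[i], "-b", "--block")) {
            return true;
        }
    }
    return false;
}
```

The exact-length check on the long form keeps an unrelated option that merely
starts with "--allow" from being treated as an allow list.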

> +        && !smap_get_bool(ovs_other_config, "dpdk-probe-at-init", false)) {
> +#ifdef RTE_BUS_AUXILIARY
> +        svec_add(&args, "-a");
> +        svec_add(&args, "auxiliary:");
> +#endif
> +#ifdef RTE_BUS_CDX
> +        svec_add(&args, "-a");
> +        svec_add(&args, "cdx:cdx-");
> +#endif
> +#ifdef RTE_BUS_FSLMC
> +        svec_add(&args, "-a");
> +        svec_add(&args, "fslmc:dpni.65535");
> +#endif
> +#ifdef RTE_BUS_PCI
> +        svec_add(&args, "-a");
> +        svec_add(&args, "pci:0000:00:00.0");
> +#endif
> +#ifdef RTE_BUS_UACCE
> +        svec_add(&args, "-a");
> +        svec_add(&args, "uacce:");
> +#endif
> +#ifdef RTE_BUS_VMBUS
> +        svec_add(&args, "-a");
> +        svec_add(&args, "vmbus:00000000-0000-0000-0000-000000000000");
> +#endif

What about the dpaa, ifpga and platform buses?

> +    }
> +
>      svec_terminate(&args);
>  
>      optind = 1;
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c51fe7c258..8115223277 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2050,7 +2050,7 @@ netdev_dpdk_get_port_by_mac(const char *mac_str, char **extra_err)
>          }
>      }
>  
> -    *extra_err = xstrdup("unknown mac");
> +    *extra_err = xstrdup("unknown mac (use dpdk-probe-at-init=true?)");

nit: s/use/need/ ?  "use" sounds a bit strange as a question.

>      return DPDK_ETH_PORT_ID_INVALID;
>  }
>  
> diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
> index f8ba766739..716d8a357d 100644
> --- a/tests/system-dpdk-macros.at
> +++ b/tests/system-dpdk-macros.at
> @@ -139,7 +139,7 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
>     OVS_DPDK_PRE_CHECK()
>     OVS_WAIT_WHILE([ip link show ovs-netdev])
>     dnl For functional tests, no need for DPDK PCI probing.
> -   OVS_DPDK_START([--no-pci], [--disable-system], [$3])
> +   OVS_DPDK_START([], [--disable-system], [$3])
>     dnl Add bridges, ports, etc.
>     OVS_WAIT_WHILE([ip link show br0])
>     AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
> diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
> index 17d3d25955..bd1bead661 100644
> --- a/tests/system-dpdk.at
> +++ b/tests/system-dpdk.at
> @@ -43,7 +43,7 @@ dnl Check if EAL init is successful
>  AT_SETUP([OVS-DPDK - EAL init])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  AT_CHECK([grep "DPDK Enabled - initializing..." ovs-vswitchd.log], [], [stdout])
>  AT_CHECK([grep "EAL" ovs-vswitchd.log], [], [stdout])
>  AT_CHECK([grep "DPDK Enabled - initialized" ovs-vswitchd.log], [], [stdout])
> @@ -59,7 +59,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - single])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  CHECK_CPU_DISCOVERED()
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -77,7 +77,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - multi])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  CHECK_CPU_DISCOVERED(4)
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xf])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -95,7 +95,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - non-contig])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  CHECK_CPU_DISCOVERED(8)
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xca])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -113,7 +113,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - zeromask])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x0])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
>  OVS_WAIT_UNTIL([grep "Ignoring database defined option 'dpdk-lcore-mask' due to invalid value '0x0'" ovs-vswitchd.log])
> @@ -152,7 +152,7 @@ dnl Add vhost-user-client port
>  AT_SETUP([OVS-DPDK - add vhost-user-client port])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -181,7 +181,7 @@ AT_SETUP([OVS-DPDK - ping vhost-user ports])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_CHECK_TESTPMD()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -237,7 +237,7 @@ AT_SETUP([OVS-DPDK - ping vhost-user-client ports])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_CHECK_TESTPMD()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -350,7 +350,7 @@ AT_SETUP([OVS-DPDK - Ingress policing create delete vport port])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add ingress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -387,7 +387,7 @@ AT_SETUP([OVS-DPDK - Ingress policing no policing rate])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add ingress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -421,7 +421,7 @@ AT_SETUP([OVS-DPDK - Ingress policing no policing burst])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add ingress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -487,7 +487,7 @@ AT_SETUP([OVS-DPDK - QoS create delete vport port])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add egress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -522,7 +522,7 @@ AT_SETUP([OVS-DPDK - QoS no cir])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add egress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -551,7 +551,7 @@ AT_SETUP([OVS-DPDK - QoS no cbs])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add egress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -657,7 +657,7 @@ AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_CHECK_TESTPMD()
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS with default MTU value
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -698,7 +698,7 @@ AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_CHECK_TESTPMD()
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and modify MTU value
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -816,7 +816,7 @@ AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_CHECK_TESTPMD()
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and set MTU value to max upper bound
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -858,7 +858,7 @@ AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_CHECK_TESTPMD()
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and set MTU value to min lower bound
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -897,7 +897,7 @@ AT_SETUP([OVS-DPDK - user configured mempool])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:shared-mempool-config=8000,6000,1500])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -946,7 +946,7 @@ dnl --------------------------------------------------------------------------
>  AT_SETUP([OVS-DPDK - ovs-appctl dpif/offload/show])
>  AT_KEYWORDS([dpdk dpif-offload])
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
>  AT_CHECK([ovs-vsctl add-port br0 p1 \
>    -- set Interface p1 type=dpdk options:dpdk-devargs=net_null0,no-rx=1],
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index b7a5afc0a5..458b88870c 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -453,6 +453,21 @@
>          </p>
>        </column>
>  
> +      <column name="other_config" key="dpdk-probe-at-init"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Specifies whether DPDK should probe all devices available at the
>           time DPDK is initialized. This is required when declaring DPDK ports
> +          using the "class=eth,mac=XX:XX:XX:XX:XX:XX" syntax, but beware that

nit: Use the <code> tags.

> +          it implies higher resource consumption and may cause undesired side
> +          effects with some devices (such as mlx5).

May need to expand on what these side effects could be.

> +        </p>
> +        <p>
>           If not specified, DPDK will not probe any devices at initialization,
> +          which should be fine in most cases.

We also need to specify that this option has no effect when dpdk-extra contains
allowed or blocked devices, even if it is set to true.

It might also be better to say what it does when set to false, with the caveat
about its dependency on other dpdk-extra flags, and then say that the default
is 'false'.
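To make the requested wording concrete, a minimal sketch of the resulting
behavior, derived from the condition in dpdk_init__() above. The enum and
function names are hypothetical, and "none" only holds for buses that get a
dummy allow-list entry.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum probe_mode {
    PROBE_PER_DPDK_EXTRA, /* -a/-b in dpdk-extra: EAL applies those lists. */
    PROBE_ALL,            /* dpdk-probe-at-init=true: EAL default behavior. */
    PROBE_NONE,           /* Dummy allow list: nothing probed on covered buses. */
};

static enum probe_mode
effective_probe_mode(bool extra_has_allow_or_block, bool probe_at_init)
{
    if (extra_has_allow_or_block) {
        /* User-provided lists win; dpdk-probe-at-init is ignored. */
        return PROBE_PER_DPDK_EXTRA;
    }
    /* Default (false) adds the dummy allow list. */
    return probe_at_init ? PROBE_ALL : PROBE_NONE;
}
```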

> +        </p>
> +      </column>
> +
>        <column name="other_config" key="dpdk-extra"
>                type='{"type": "string"}'>
>          <p>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev