On 2/26/26 5:24 PM, David Marchand wrote:
> By default, DPDK probes all available resources (like PCI devices) and
> partially initialises them (/ takes over them).
> This behavior has been relied on by OVS, since netdev-dpdk introduction.
> It is not needed since DPDK device hotplug has been supported and used
> for some time now.
> 
> Besides, this initial probing may not be desirable:
> - for PCI devices bound to vfio-pci, the first application taking over
>   them "wins", meaning that OVS would prevent qemu from using some VF
>   devices,
> - for mlx5 devices,
>   - the driver maintains link status and liveness of all ports
>     (taking some kernel lock) even when OVS only uses a subset of them,
>   - if some driver feature needs to be enabled for one port via a devargs,
>     this would have to be set in dpdk-extra,
> 
> Change this behavior and disable the initial PCI probing by passing
> a specially crafted allow list: this implementation is not elegant
> but it has been successfully used (for the PCI part) in a number of
> setups I know, and there is no better DPDK API to achieve the same
> at the moment.
> 
> This behavior change breaks setups that were using the
> class=eth,mac=XX:XX:XX:XX:XX:XX syntax because OVS was relying on the
> (fragile) assumption that all DPDK ports were probed at init once and
> for all.
> Add a warning for users of this syntax, update the documentation and
> add an option to restore the original behavior via
> 'dpdk-probe-at-init=true'.
> 
> This option also helps for unexpected cases like https://xkcd.com/1172/.
> 
> Signed-off-by: David Marchand <[email protected]>
> Acked-by: Eli Britstein <[email protected]>
> Acked-by: Eelco Chaudron <[email protected]>

Hi David. Thanks for this. A few comments below.

thanks,
Kevin.

> ---
> Changes since RFC v2:
> - updated descriptions and comments,
> 
> Changes since RFC v1:
> - updated commitlog (mentioning devargs),
> - handled other DPDK buses,
> 
> ---
>  Documentation/howto/dpdk.rst         |  6 +++++
>  Documentation/intro/install/dpdk.rst |  8 ++++++
>  NEWS                                 |  5 ++++
>  lib/dpdk.c                           | 28 +++++++++++++++++++
>  lib/netdev-dpdk.c                    |  2 +-
>  tests/system-dpdk-macros.at          |  2 +-
>  tests/system-dpdk.at                 | 40 ++++++++++++++--------------
>  vswitchd/vswitch.xml                 | 15 +++++++++++
>  8 files changed, 84 insertions(+), 22 deletions(-)
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 73e630b07f..5d6bf94cdb 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -62,6 +62,12 @@ is suggested::
>  
>  .. important::
>  
> +    Using this syntax requires that DPDK probes the device that owns those
> +    multiple ports. This can be achieved by either setting an allowlist
> +    of PCI devices in the ``dpdk-extra`` configuration, or by requesting that
> +    all available devices (including PCI devices) be probed at initialization
> +    (setting ``dpdk-probe-at-init`` to true).
> +
>      Hotplugging physical interfaces is not supported using the above syntax.
>      This is expected to change with the release of DPDK v18.05. For information
>      on hotplugging physical interfaces, you should instead refer to
> diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
> index 6f4687bdea..eabca63a83 100644
> --- a/Documentation/intro/install/dpdk.rst
> +++ b/Documentation/intro/install/dpdk.rst
> @@ -297,6 +297,14 @@ listed below. Defaults will be provided for all values not explicitly set.
>    sockets. If not specified, this option will not be set by default. DPDK
>    default will be used instead.
>  
> +``dpdk-probe-at-init``
> +  Let DPDK EAL probe all available devices at initialization.
> +  This consumes more resources, as OVS may not use all probed devices, and it
> +  may cause undesired side effects (such as taking the RTNL lock frequently for
> +  maintaining link status (and other states, etc.) of mlx5 netdevs that OVS
> +  does not care about). However, this option is needed when using the
> +  ``class=eth,mac=XX:XX:XX:XX:XX:XX`` syntax for DPDK ports.
> +
>  ``dpdk-hugepage-dir``
>    Directory where hugetlbfs is mounted
>  
> diff --git a/NEWS b/NEWS
> index d5642f9857..392e3ed1f0 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,5 +1,10 @@
>  Post-v3.7.0
>  --------------------
> +   - DPDK:
> +     * Probing of devices at DPDK initialization has been disabled to avoid
> +       wasting resources on unused devices. This breaks DPDK netdev ports
> +       using the "class=eth,mac=" syntax (though it can be restored, see
> +       Documentation/howto/dpdk.rst).
>  
>  
>  v3.7.0 - 16 Feb 2026
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index d27b95cd9a..794ffbe599 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -430,6 +430,34 @@ dpdk_init__(const struct smap *ovs_other_config)
>          svec_add_nocopy(&args, xasprintf("0@%d", cpu));
>      }
>  
> +    if (!args_contains(&args, "-a") && !args_contains(&args, "-b")

This should also check for the long forms, --allow and --block.
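
Something along these lines is what I had in mind. It is only a standalone
sketch: args_contains_opt() and user_set_probe_list() are hypothetical names,
and a plain string array stands in for the svec used in lib/dpdk.c. It also
treats "--allow=<dev>" as a match, and does not try to cover every spelling
that getopt accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an existing OVS function: returns true if 'args'
 * holds the option in its short form ("-a"), its long form ("--allow"), or
 * its long form with an attached value ("--allow=0000:01:00.0"). */
static bool
args_contains_opt(const char *const *args, size_t n_args,
                  const char *short_opt, const char *long_opt)
{
    size_t long_len = strlen(long_opt);

    for (size_t i = 0; i < n_args; i++) {
        const char *arg = args[i];

        if (!strcmp(arg, short_opt) || !strcmp(arg, long_opt)) {
            return true;
        }
        if (!strncmp(arg, long_opt, long_len) && arg[long_len] == '=') {
            return true;
        }
    }
    return false;
}

/* True if the user already supplied an allow or block list via dpdk-extra. */
static bool
user_set_probe_list(const char *const *args, size_t n_args)
{
    return args_contains_opt(args, n_args, "-a", "--allow")
           || args_contains_opt(args, n_args, "-b", "--block");
}
```

The condition in the patch would then skip the dummy allow list whenever
user_set_probe_list() (or an equivalent built on args_contains()) is true.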

> +        && !smap_get_bool(ovs_other_config, "dpdk-probe-at-init", false)) {
> +#ifdef RTE_BUS_AUXILIARY
> +        svec_add(&args, "-a");
> +        svec_add(&args, "auxiliary:");
> +#endif
> +#ifdef RTE_BUS_CDX
> +        svec_add(&args, "-a");
> +        svec_add(&args, "cdx:cdx-");
> +#endif
> +#ifdef RTE_BUS_FSLMC
> +        svec_add(&args, "-a");
> +        svec_add(&args, "fslmc:dpni.65535");
> +#endif
> +#ifdef RTE_BUS_PCI
> +        svec_add(&args, "-a");
> +        svec_add(&args, "pci:0000:00:00.0");

I tried adding dpdk-extra="--no-pci". It is handled correctly in
rte_eal_init(), but you could consider adding a check here to avoid having
both "no pci" and "allow dummy pci device" in the args?
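
A rough sketch of such a check, under the assumption that an exact match on
"--no-pci" is enough. args_has() is a stand-in for the exact-match lookup the
patch already does with args_contains(), need_pci_dummy_allow() is a
hypothetical name, and a plain string array replaces the svec:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the exact-match lookup that the patch performs with
 * args_contains(); a plain string array is used here instead of a svec. */
static bool
args_has(const char *const *args, size_t n_args, const char *value)
{
    for (size_t i = 0; i < n_args; i++) {
        if (!strcmp(args[i], value)) {
            return true;
        }
    }
    return false;
}

/* Hypothetical helper: the dummy PCI allow entry is only useful when the
 * PCI bus is scanned at all, i.e. when the user did not pass --no-pci. */
static bool
need_pci_dummy_allow(const char *const *args, size_t n_args)
{
    return !args_has(args, n_args, "--no-pci");
}
```

The RTE_BUS_PCI block could be guarded by a condition like this one, so the
two options never end up in the args together.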

> +#endif
> +#ifdef RTE_BUS_UACCE
> +        svec_add(&args, "-a");
> +        svec_add(&args, "uacce:");
> +#endif
> +#ifdef RTE_BUS_VMBUS
> +        svec_add(&args, "-a");
> +        svec_add(&args, "vmbus:00000000-0000-0000-0000-000000000000");
> +#endif
> +    }
> +
>      svec_terminate(&args);
>  
>      optind = 1;
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index c51fe7c258..8115223277 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -2050,7 +2050,7 @@ netdev_dpdk_get_port_by_mac(const char *mac_str, char **extra_err)
>          }
>      }
>  
> -    *extra_err = xstrdup("unknown mac");
> +    *extra_err = xstrdup("unknown mac (use dpdk-probe-at-init=true?)");

nit: I really hope no-one would take you literally but just in case,
might be worth putting a space between "true" and "?"

>      return DPDK_ETH_PORT_ID_INVALID;
>  }
>  
> diff --git a/tests/system-dpdk-macros.at b/tests/system-dpdk-macros.at
> index f8ba766739..716d8a357d 100644
> --- a/tests/system-dpdk-macros.at
> +++ b/tests/system-dpdk-macros.at
> @@ -139,7 +139,7 @@ m4_define([OVS_TRAFFIC_VSWITCHD_START],
>     OVS_DPDK_PRE_CHECK()
>     OVS_WAIT_WHILE([ip link show ovs-netdev])
>     dnl For functional tests, no need for DPDK PCI probing.
> -   OVS_DPDK_START([--no-pci], [--disable-system], [$3])
> +   OVS_DPDK_START([], [--disable-system], [$3])
>     dnl Add bridges, ports, etc.
>     OVS_WAIT_WHILE([ip link show br0])
>     AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
> diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
> index 17d3d25955..bd1bead661 100644
> --- a/tests/system-dpdk.at
> +++ b/tests/system-dpdk.at
> @@ -43,7 +43,7 @@ dnl Check if EAL init is successful
>  AT_SETUP([OVS-DPDK - EAL init])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  AT_CHECK([grep "DPDK Enabled - initializing..." ovs-vswitchd.log], [], [stdout])
>  AT_CHECK([grep "EAL" ovs-vswitchd.log], [], [stdout])
>  AT_CHECK([grep "DPDK Enabled - initialized" ovs-vswitchd.log], [], [stdout])
> @@ -59,7 +59,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - single])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  CHECK_CPU_DISCOVERED()
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -77,7 +77,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - multi])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  CHECK_CPU_DISCOVERED(4)
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xf])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -95,7 +95,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - non-contig])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  CHECK_CPU_DISCOVERED(8)
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0xca])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -113,7 +113,7 @@ AT_SETUP([OVS-DPDK - dpdk-lcore-mask conversion - zeromask])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x0])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
>  OVS_WAIT_UNTIL([grep "Ignoring database defined option 'dpdk-lcore-mask' due to invalid value '0x0'" ovs-vswitchd.log])
> @@ -152,7 +152,7 @@ dnl Add vhost-user-client port
>  AT_SETUP([OVS-DPDK - add vhost-user-client port])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -181,7 +181,7 @@ AT_SETUP([OVS-DPDK - ping vhost-user ports])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_CHECK_TESTPMD()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -237,7 +237,7 @@ AT_SETUP([OVS-DPDK - ping vhost-user-client ports])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_CHECK_TESTPMD()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -350,7 +350,7 @@ AT_SETUP([OVS-DPDK - Ingress policing create delete vport port])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add ingress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -387,7 +387,7 @@ AT_SETUP([OVS-DPDK - Ingress policing no policing rate])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add ingress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -421,7 +421,7 @@ AT_SETUP([OVS-DPDK - Ingress policing no policing burst])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add ingress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -487,7 +487,7 @@ AT_SETUP([OVS-DPDK - QoS create delete vport port])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add egress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -522,7 +522,7 @@ AT_SETUP([OVS-DPDK - QoS no cir])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add egress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -551,7 +551,7 @@ AT_SETUP([OVS-DPDK - QoS no cbs])
>  AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and add egress policer
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -657,7 +657,7 @@ AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_CHECK_TESTPMD()
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS with default MTU value
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -698,7 +698,7 @@ AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_CHECK_TESTPMD()
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and modify MTU value
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -816,7 +816,7 @@ AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_CHECK_TESTPMD()
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and set MTU value to max upper bound
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -858,7 +858,7 @@ AT_KEYWORDS([dpdk])
>  
>  OVS_DPDK_CHECK_TESTPMD()
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  
>  dnl Add userspace bridge and attach it to OVS and set MTU value to min lower bound
>  AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
> @@ -897,7 +897,7 @@ AT_SETUP([OVS-DPDK - user configured mempool])
>  AT_KEYWORDS([dpdk])
>  OVS_DPDK_PRE_CHECK()
>  OVS_DPDK_START_OVSDB()
> -OVS_DPDK_START_VSWITCHD([--no-pci])
> +OVS_DPDK_START_VSWITCHD()
>  
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:shared-mempool-config=8000,6000,1500])
>  AT_CHECK([ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true])
> @@ -946,7 +946,7 @@ dnl --------------------------------------------------------------------------
>  AT_SETUP([OVS-DPDK - ovs-appctl dpif/offload/show])
>  AT_KEYWORDS([dpdk dpif-offload])
>  OVS_DPDK_PRE_CHECK()
> -OVS_DPDK_START([--no-pci])
> +OVS_DPDK_START()
>  AT_CHECK([ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev])
>  AT_CHECK([ovs-vsctl add-port br0 p1 \
>    -- set Interface p1 type=dpdk options:dpdk-devargs=net_null0,no-rx=1],
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index b7a5afc0a5..458b88870c 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -453,6 +453,21 @@
>          </p>
>        </column>
>  
> +      <column name="other_config" key="dpdk-probe-at-init"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Specifies whether DPDK should probe all devices available at the
> +          time DPDK is initialized. This is required when declaring DPDK ports
> +          using the "class=eth,mac=XX:XX:XX:XX:XX:XX" syntax, but beware that
> +          it implies higher resource consumption and may cause undesired side
> +          effects with some devices (such as mlx5).

Ilya already commented, but +1 for expanding on the undesired side
effects, e.g. "undesired side effects, such as additional interrupt
handling and link status checks for unused devices".

> +        </p>
> +        <p>
> +          If not specified, DPDK will not probe any devices at initialization,
> +          which should be fine in most cases.
> +        </p>
> +      </column>
> +
>        <column name="other_config" key="dpdk-extra"
>                type='{"type": "string"}'>
>          <p>

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
