'dpdk' ports no longer have naming restrictions. Instead of encoding the
DPDK port ID in the port name, the PCI address of the device must now be
specified via the 'dpdk-devargs' option, e.g.:

    ovs-vsctl add-port br0 my-port
    ovs-vsctl set Interface my-port type=dpdk
    ovs-vsctl set Interface my-port options:dpdk-devargs=0000:06:00.3

The user must no longer hotplug attach DPDK ports by issuing the specific
ovs-appctl netdev-dpdk/port-attach command. The hotplug is now automatically
invoked when a valid PCI address is set in the dpdk-devargs. The format for
ovs-appctl netdev-dpdk/port-detach command has changed in that the user now
must specify the relevant PCI address as input instead of the port name.

Signed-off-by: Ciara Loftus <[email protected]>
---
Changelog:
* Keep port-detach appctl function - use PCI as input arg
* Add requires_mutex to devargs processing functions
* use reconfigure infrastructure for devargs changes
* process devargs even if valid portid ie. device already configured
* report err if dpdk-devargs is not specified

 Documentation/intro/install/dpdk-advanced.rst |   7 +-
 Documentation/intro/install/dpdk.rst          |  14 ++-
 NEWS                                          |   2 +
 lib/netdev-dpdk.c                             | 157 +++++++++++++++++---------
 vswitchd/vswitch.xml                          |   8 ++
 5 files changed, 123 insertions(+), 65 deletions(-)

diff --git a/Documentation/intro/install/dpdk-advanced.rst b/Documentation/intro/install/dpdk-advanced.rst
index 42a4af6..858ff98 100644
--- a/Documentation/intro/install/dpdk-advanced.rst
+++ b/Documentation/intro/install/dpdk-advanced.rst
@@ -944,14 +944,13 @@ dpdk_nic_bind.py script:
 
 Then it can be attached to OVS:
 
-    $ ovs-appctl netdev-dpdk/port-attach 0000:01:00.0
-
-At this point, the user can create a ovs port using the add-port command.
+    $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
+        options:dpdk-devargs=0000:01:00.0
 
 It is also possible to detach a port from ovs, the user has to remove the
 port using the del-port command, then it can be detached using:
 
-    $ ovs-appctl netdev-dpdk/port-detach dpdk0
+    $ ovs-appctl netdev-dpdk/port-detach 0000:01:00.0
 
 This feature is not supported with VFIO and could not work with some NICs.
 For more information please refer to the `DPDK Port Hotplug Framework
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index 7724c8a..40b5b67 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -245,12 +245,14 @@ Bridges should be created with a ``datapath_type=netdev``::
 
     $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
 
-Now you can add DPDK devices. OVS expects DPDK device names to start with
-``dpdk`` and end with a portid. ovs-vswitchd should print the number of dpdk
-devices found in the log file::
-
-    $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
-    $ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
+Now you can add dpdk devices. The PCI address of the device needs to be
+set using the 'dpdk-devargs' option. DPDK will print the PCI addresses of
+eligible devices found during initialisation.
+
+    $ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk \
+        options:dpdk-devargs=0000:06:00.0
+    $ ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk \
+        options:dpdk-devargs=0000:06:00.1
 
 After the DPDK ports get added to switch, a polling thread continuously polls
 DPDK devices and consumes 100% of the core, as can be checked from 'top' and
diff --git a/NEWS b/NEWS
index b596cf3..a62a7a4 100644
--- a/NEWS
+++ b/NEWS
@@ -42,6 +42,8 @@ Post-v2.6.0
        which set the number of rx and tx descriptors to use for the given port.
      * Support for DPDK v16.11.
      * Port Hotplug is now supported.
+     * DPDK physical ports can now have arbitrary names. The PCI address of
+       the device must be set using the 'dpdk-devargs' option.
    - Fedora packaging:
      * A package upgrade does not automatically restart OVS service.
    - ovs-vswitchd/ovs-vsctl:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 677a10c..07a99c7 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -143,6 +143,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
 #define VHOST_ENQ_RETRY_NUM 8
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 
+#define DEVARGS_MAX PCI_PRI_STR_SIZE
+
 static const struct rte_eth_conf port_conf = {
     .rxmode = {
         .mq_mode = ETH_MQ_RX_RSS,
@@ -352,6 +354,9 @@ struct netdev_dpdk {
     /* Identifier used to distinguish vhost devices from each other. */
     char vhost_id[PATH_MAX];
 
+    /* Device arguments for dpdk ports */
+    char dpdk_devargs[DEVARGS_MAX];
+
     /* In dpdk_list. */
     struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
 
@@ -366,6 +371,7 @@ struct netdev_dpdk {
     int requested_n_rxq;
     int requested_rxq_size;
     int requested_txq_size;
+    char requested_devargs[DEVARGS_MAX];
 
     /* Number of rx/tx descriptors for physical devices */
     int rxq_size;
@@ -813,7 +819,7 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
     /* If the 'sid' is negative, it means that the kernel fails
      * to obtain the pci numa info. In that situation, always
      * use 'SOCKET0'. */
-    if (type == DPDK_DEV_ETH) {
+    if (type == DPDK_DEV_ETH && rte_eth_dev_is_valid_port(dev->port_id)) {
         sid = rte_eth_dev_socket_id(port_no);
     } else {
         sid = rte_lcore_to_socket_id(rte_get_master_lcore());
@@ -852,9 +858,11 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
     /* Initialize the flow control to NULL */
     memset(&dev->fc_conf, 0, sizeof dev->fc_conf);
     if (type == DPDK_DEV_ETH) {
-        err = dpdk_eth_dev_init(dev);
-        if (err) {
-            goto unlock;
+        if (rte_eth_dev_is_valid_port(dev->port_id)) {
+            err = dpdk_eth_dev_init(dev);
+            if (err) {
+                goto unlock;
+            }
         }
         dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
     } else {
@@ -950,17 +958,10 @@ netdev_dpdk_vhost_client_construct(struct netdev *netdev)
 static int
 netdev_dpdk_construct(struct netdev *netdev)
 {
-    unsigned int port_no;
     int err;
 
-    /* Names always start with "dpdk" */
-    err = dpdk_dev_parse_name(netdev->name, "dpdk", &port_no);
-    if (err) {
-        return err;
-    }
-
     ovs_mutex_lock(&dpdk_mutex);
-    err = netdev_dpdk_init(netdev, port_no, DPDK_DEV_ETH);
+    err = netdev_dpdk_init(netdev, -1, DPDK_DEV_ETH);
     ovs_mutex_unlock(&dpdk_mutex);
     return err;
 }
@@ -1078,6 +1079,68 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
     return 0;
 }
 
+static uint8_t
+dpdk_get_port_id_from_pci(struct rte_pci_addr *addr)
+{
+    uint8_t i = 0;
+    struct rte_eth_dev_info info;
+
+    /* Search for PCI device in DPDK */
+    for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
+        if (!rte_eth_dev_is_valid_port(i)) {
+            continue;
+        }
+        rte_eth_dev_info_get(i, &info);
+        if (!rte_eal_compare_pci_addr(&info.pci_dev->addr, addr)) {
+            return i;
+        }
+    }
+
+    return -1;
+}
+
+static int
+netdev_dpdk_process_pdevargs(struct netdev_dpdk *dev, const char *devargs,
+                             struct rte_pci_addr *addr)
+{
+    /* Search for PCI device in DPDK */
+    dev->port_id = dpdk_get_port_id_from_pci(addr);
+
+    if (!rte_eth_dev_is_valid_port(dev->port_id)) {
+        /* PCI device not found in DPDK, attempt to attach it */
+        uint8_t new_port_id;
+
+        if (!rte_eth_dev_attach(devargs, &new_port_id)) {
+            /* Attach successful */
+            VLOG_INFO("Device "PCI_PRI_FMT" has been attached to DPDK",
+                      addr->domain, addr->bus, addr->devid, addr->function);
+            dev->port_id = new_port_id;
+        } else {
+            /* Attach unsuccessful */
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
+static void
+netdev_dpdk_process_devargs(struct netdev_dpdk *dev)
+    OVS_REQUIRES(dev->mutex)
+{
+    struct rte_pci_addr addr;
+    int err = -1;
+
+    if (!eal_parse_pci_DomBDF(dev->requested_devargs, &addr)) {
+        /* Valid PCI address format detected - configure physical device */
+        err = netdev_dpdk_process_pdevargs(dev, dev->requested_devargs, &addr);
+    }
+
+    if (!err) {
+        strcpy(dev->dpdk_devargs, dev->requested_devargs);
+    }
+}
+
 static void
 dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
     OVS_REQUIRES(dev->mutex)
@@ -1118,6 +1181,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
         {RTE_FC_NONE,     RTE_FC_TX_PAUSE},
         {RTE_FC_RX_PAUSE, RTE_FC_FULL    }
     };
+    const char *new_devargs;
+    int err = 0;
 
     ovs_mutex_lock(&dev->mutex);
 
@@ -1141,9 +1206,18 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
         dpdk_eth_flow_ctrl_setup(dev);
     }
 
+    new_devargs = smap_get(args, "dpdk-devargs");
+    if (new_devargs && strlen(new_devargs)) {
+        strcpy(dev->requested_devargs, new_devargs);
+        netdev_request_reconfigure(&dev->up);
+    } else {
+        /* dpdk-devargs is required for device configuration */
+        err = ENODEV;
+    }
+
     ovs_mutex_unlock(&dev->mutex);
 
-    return 0;
+    return err;
 }
 
 static int
@@ -2308,55 +2382,28 @@ netdev_dpdk_set_admin_state(struct unixctl_conn *conn, int argc,
 }
 
 static void
-netdev_dpdk_port_attach(struct unixctl_conn *conn, int argc OVS_UNUSED,
-                        const char *argv[], void *aux OVS_UNUSED)
-{
-    int ret;
-    char *response;
-    uint8_t port_id;
-
-    ovs_mutex_lock(&dpdk_mutex);
-
-    ret = rte_eth_dev_attach(argv[1], &port_id);
-    if (ret < 0) {
-        response = xasprintf("Error attaching device '%s'", argv[1]);
-        ovs_mutex_unlock(&dpdk_mutex);
-        unixctl_command_reply_error(conn, response);
-        return;
-    }
-
-    response = xasprintf("Device '%s' has been attached as 'dpdk%d'",
-                         argv[1], port_id);
-
-    ovs_mutex_unlock(&dpdk_mutex);
-    unixctl_command_reply(conn, response);
-}
-
-static void
 netdev_dpdk_port_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
                         const char *argv[], void *aux OVS_UNUSED)
 {
     int ret;
     char *response;
-    unsigned int parsed_port;
-    uint8_t port_id;
+    int8_t port_id = -1;
     char devname[RTE_ETH_NAME_MAX_LEN];
+    struct rte_pci_addr addr;
 
     ovs_mutex_lock(&dpdk_mutex);
 
-    ret = dpdk_dev_parse_name(argv[1], "dpdk", &parsed_port);
-    if (ret) {
-        response = xasprintf("'%s' is not a valid port", argv[1]);
+    if (eal_parse_pci_DomBDF(argv[1], &addr)) {
+        response = xasprintf("Invalid PCI address '%s'. Cannot detach.",
+                             argv[1]);
         goto error;
     }
 
-    port_id = parsed_port;
+    /* Search for the address in DPDK and retrieve corresponding port ID. */
+    port_id = dpdk_get_port_id_from_pci(&addr);
 
-    struct netdev *netdev = netdev_from_name(argv[1]);
-    if (netdev) {
-        netdev_close(netdev);
-        response = xasprintf("Port '%s' is being used. Remove it before"
-                             "detaching", argv[1]);
+    if (port_id == -1) {
+        response = xasprintf("Device '%s' not found in DPDK", argv[1]);
         goto error;
     }
 
@@ -2364,11 +2411,11 @@ netdev_dpdk_port_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
 
     ret = rte_eth_dev_detach(port_id, devname);
     if (ret < 0) {
-        response = xasprintf("Port '%s' can not be detached", argv[1]);
+        response = xasprintf("Device '%s' can not be detached", argv[1]);
         goto error;
     }
 
-    response = xasprintf("Port '%s' has been detached", argv[1]);
+    response = xasprintf("Device '%s' has been detached", argv[1]);
 
     ovs_mutex_unlock(&dpdk_mutex);
     unixctl_command_reply(conn, response);
@@ -2654,9 +2701,6 @@ netdev_dpdk_class_init(void)
         unixctl_command_register("netdev-dpdk/set-admin-state",
                                  "[netdev] up|down", 1, 2,
                                  netdev_dpdk_set_admin_state, NULL);
-        unixctl_command_register("netdev-dpdk/port-attach",
-                                 "pci address of device", 1, 1,
-                                 netdev_dpdk_port_attach, NULL);
         unixctl_command_register("netdev-dpdk/port-detach",
                                  "port", 1, 1,
                                  netdev_dpdk_port_detach, NULL);
@@ -3029,7 +3073,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
         && netdev->n_rxq == dev->requested_n_rxq
         && dev->mtu == dev->requested_mtu
         && dev->rxq_size == dev->requested_rxq_size
-        && dev->txq_size == dev->requested_txq_size) {
+        && dev->txq_size == dev->requested_txq_size
+        && !(strcmp(dev->dpdk_devargs, dev->requested_devargs))) {
         /* Reconfiguration is unnecessary */
 
         goto out;
@@ -3047,6 +3092,8 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
     dev->rxq_size = dev->requested_rxq_size;
     dev->txq_size = dev->requested_txq_size;
 
+    netdev_dpdk_process_devargs(dev);
+
     rte_free(dev->tx_q);
     err = dpdk_eth_dev_init(dev);
     dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index b4af5a5..23ab469 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2303,6 +2303,14 @@
         </p>
       </column>
 
+      <column name="options" key="dpdk-devargs"
+              type='{"type": "string"}'>
+        <p>
+          Specifies the PCI address of a physical dpdk device.
+          Only supported by 'dpdk' devices.
+        </p>
+      </column>
+
       <column name="other_config" key="pmd-rxq-affinity">
         <p>Specifies mapping of RX queues of this interface to CPU cores.</p>
         <p>Value should be set in the following form:</p>
-- 
2.4.3
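
For readers unfamiliar with the address format that 'dpdk-devargs' expects,
here is a minimal standalone C sketch of how a string such as 0000:06:00.3
breaks down into domain, bus, device and function. It is not DPDK's
eal_parse_pci_DomBDF(): struct pci_addr_sketch and parse_pci_dombdf_sketch()
are hypothetical names used only for illustration, and the range checks are
assumptions based on the usual PCI limits (5-bit device, 3-bit function).
Like the helper used in the patch, it returns zero on success.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for DPDK's struct rte_pci_addr. */
struct pci_addr_sketch {
    uint16_t domain;
    uint8_t bus;
    uint8_t devid;
    uint8_t function;
};

/* Parses a "DDDD:BB:DD.F" string of hexadecimal fields.
 * Returns 0 on success and -1 on any format or range error. */
static int
parse_pci_dombdf_sketch(const char *s, struct pci_addr_sketch *addr)
{
    unsigned int domain, bus, devid, function;
    int consumed = 0;

    /* %n records how many characters were consumed so that trailing
     * garbage after the function digit can be rejected. */
    if (sscanf(s, "%4x:%2x:%2x.%1x%n",
               &domain, &bus, &devid, &function, &consumed) != 4) {
        return -1;
    }
    if (s[consumed] != '\0') {
        return -1;
    }
    /* Assumed limits: device is 5 bits wide, function is 3 bits wide. */
    if (devid > 0x1f || function > 7) {
        return -1;
    }

    addr->domain = (uint16_t) domain;
    addr->bus = (uint8_t) bus;
    addr->devid = (uint8_t) devid;
    addr->function = (uint8_t) function;
    return 0;
}
```

With this sketch, "0000:06:00.3" parses to domain 0, bus 6, device 0,
function 3, while an old-style port name such as "dpdk0" is rejected, which
is the distinction netdev_dpdk_process_devargs() depends on.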
