Hi Ilya,
Thank you for your review.
Please find comments inline.

> -----Original Message-----
> From: Ilya Maximets <[email protected]>
> Sent: Thursday, January 10, 2019 1:32 PM
> To: Ophir Munk <[email protected]>; [email protected]
> Cc: Asaf Penso <[email protected]>; Ian Stokes <[email protected]>;
> Shahaf Shuler <[email protected]>; Thomas Monjalon
> <[email protected]>; Olga Shern <[email protected]>; Kevin
> Traynor <[email protected]>
> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> 
> On 08.01.2019 12:31, Ophir Munk wrote:
> > Dpdk port representors were introduced in dpdk versions 18.xx.
> > Prior to port representors there was a one-to-one relationship between
> > an rte device (e.g. PCI bus) and an eth device (referenced as dpdk
> > port id in OVS). With port representors the relationship becomes
> > one-to-many rte device to eth devices.
> > For example in [3] there are two devices (representors) using the same
> > PCI physical address 0000:08:00.0: "0000:08:00.0,representor=[3]" and
> > "0000:08:00.0,representor=[5]".
> > This commit handles the new one-to-many relationship. For example,
> > when one of the device port representors in [3] is closed - the PCI
> > bus cannot be detached until the other device port representor is
> > closed as well. OVS remains backward compatible by supporting dpdk
> > legacy PCI ports which do not include port representors.
> > Dpdk port representors related commits are listed in [1]. Dpdk port
> > representors documentation appears in [2]. A sample configuration
> > which uses two representors ports (the output of "ovs-vsctl show"
> > command) is shown in [3].
> > +
> >  static int
> >  vhost_common_construct(struct netdev *netdev)
> >      OVS_REQUIRES(dpdk_mutex)
> > @@ -1353,20 +1373,25 @@ static void
> >  netdev_dpdk_destruct(struct netdev *netdev)  {
> >      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> > -    struct rte_eth_dev_info dev_info;
> > +    struct rte_device *rte_dev;
> >
> >      ovs_mutex_lock(&dpdk_mutex);
> >
> >      rte_eth_dev_stop(dev->port_id);
> >      dev->started = false;
> > -
> 
> Please, keep the empty line.
> 

I will update in v5.

> >      if (dev->attached) {
> > +        /* Remove the port eth device. */
> >          rte_eth_dev_close(dev->port_id);
> > -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> > -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> > -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> 
> Please keep this log message for the successful case.

Please note the line below: 
   VLOG_INFO("Device '%s' has been removed", dev->devargs);

This is the log message for the successful case. With the introduction of 
representors several eth devices can use one PCI device.
If you remove the first representor - its eth device is closed - but you do not 
detach the PCI device till the second representor eth device is closed as well.
Would you suggest replacing "removed" with "closed" or "detached"?

> 
> > -        } else {
> > -            VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +        VLOG_INFO("Device '%s' has been removed", dev->devargs);
> > +        /* If this is the last port_id using this rte device
> > +         * remove this rte device and all its eth devices.
> > +         * FIXME: avoid direct access to DPDK internal array 
> > rte_eth_devices.
> > +         */
> > +        rte_dev = rte_eth_devices[dev->port_id].device;
> > +        if (netdev_dpdk_get_num_ports(rte_dev) == 1) {
> 
> rte_eth_dev_close() sets the RTE_ETH_DEV_UNUSED state for PMDs with
> RTE_ETH_DEV_CLOSE_REMOVE flag.
> The function netdev_dpdk_get_num_ports() skips unused ports.
> The result should not be equal to 1. We need the other way to detect the
> siblings or close the port after calculating them.
> In this case we'll probably do not need to have check for UNUSED state inside
> the _get_num_ports().
> 

I will reply to this in the exchange of emails that followed with Thomas.

> > +            if (rte_dev_remove(rte_dev) < 0) {
> > +                VLOG_ERR("Device '%s' can not be detached", dev->devargs);
> > +            }
> >          }
> >      }
> >

> > @@ -1632,8 +1657,37 @@ netdev_dpdk_get_port_by_mac(const char
> *mac_str)
> >      return DPDK_ETH_PORT_ID_INVALID;
> >  }
> >
> > +/* Return the first DPDK port_id matching the devargs pattern. */
> > +static dpdk_port_t netdev_dpdk_get_port_by_devargs(const char
> > +*devargs)
> 
> OVS_REQUIRES(dpdk_mutex)

I will update in v5

> 
> I'm not sure if 'netdev_dpdk_get_port_by_devargs' is the correct name
> because it not only finds (gets) the port, but probes (attaches) it.
> 

I think that the function main purpose is to find the port. Probing is optional 
for that purpose.
I am good with the current name but I am open to any new name you would like to 
suggest.
Can you please suggest?

> > +{
> > +    struct rte_dev_iterator iterator;
> > +    dpdk_port_t port_id;
> > +
> > +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> 

> Invalid spacing around the FOREACH loop. 
> Same in all the other places.

I do not understand this comment. Can you please explain or demo the fix? 

> > +        /* If a break is done - must call rte_eth_iterator_cleanup. */
> > +        rte_eth_iterator_cleanup(&iterator);
> > +        break;
> > +    }
> > +    /* The port may be invalid if it was not probed */
> > +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
> > +        if (rte_dev_probe(devargs)) {
> > +            port_id = DPDK_ETH_PORT_ID_INVALID;
> > +        } else {
> > +            RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> > +                /* If a break is done - must call 
> > rte_eth_iterator_cleanup. */
> > +                rte_eth_iterator_cleanup(&iterator);
> > +                break;
> > +            }
> > +        }
> > +    }
> > +
> > +    return port_id;
> > +}
> > +
> >  /*
> > - * Normally, a PCI id is enough for identifying a specific DPDK port.
> > + * Normally, a PCI id (optionally followed by a representor number)
> > + * is enough for identifying a specific DPDK port.
> >   * However, for some NICs having multiple ports sharing the same PCI
> >   * id, using PCI id won't work then.
> >   *
> > @@ -1641,34 +1695,42 @@ netdev_dpdk_get_port_by_mac(const char
> *mac_str)
> >   *
> >   * Note that the compatibility is fully kept: user can still use the
> >   * PCI id for adding ports (when it's enough for them).
> > + *
> > + * In order to avoid clang errors add OVS_REQUIRES(dpdk_mutex) to
> > + this function
> > + * since it is required by netdev_dpdk_lookup_by_port_id() while the
> > + dpdk_mutex
> > + * is taken outside of this function.
> >   */
> >  static dpdk_port_t
> >  netdev_dpdk_process_devargs(struct netdev_dpdk *dev,
> >                              const char *devargs, char **errp)
> > +    OVS_REQUIRES(dpdk_mutex)
> >  {
> > -    char *name;
> >      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
> >
> >      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
> >          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
> >      } else {
> > -        name = xmemdup0(devargs, strcspn(devargs, ","));
> > -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
> > -                || !rte_eth_dev_is_valid_port(new_port_id)) {
> > -            /* Device not found in DPDK, attempt to attach it */
> > -            if (!rte_dev_probe(devargs)
> > -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
> > -                /* Attach successful */
> > -                dev->attached = true;
> > -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
> > -            } else {
> > -                /* Attach unsuccessful */
> > +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> > +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
> > +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
> > +        } else {
> > +            struct netdev_dpdk *dup_dev;
> > +
> > +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> > +            if (dup_dev) {
> > +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> > +                               "which is already in use by '%s'",
> > +                          netdev_get_name(&dev->up), devargs,
> > +                          netdev_get_name(&dup_dev->up));
> 
> Why you duplicating this part of code ?
> Can we just replace 'rte_eth_dev_get_port_by_name' with
> 'netdev_dpdk_get_port_by_devargs' and keep all the other logic ?
> 
> netdev_dpdk_get_port_by_devargs should look like this in this case:
> 
> /* Return the first DPDK port_id matching the devargs pattern. */ static bool
> netdev_dpdk_get_port_by_devargs(const char *devargs, dpdk_port_t
> *port_id)
>     OVS_REQUIRES(dpdk_mutex)
> {
>     struct rte_dev_iterator iterator;
> 
>     RTE_ETH_FOREACH_MATCHING_DEV (*port_id, devargs, &iterator) {
>         /* If a break is done - must call rte_eth_iterator_cleanup. */
>         rte_eth_iterator_cleanup(&iterator);
>         break;
>     }
>     return *port_id != DPDK_ETH_PORT_ID_INVALID; }
> 
> netdev_dpdk_process_devargs:
> 
>         if (netdev_dpdk_get_port_by_devargs(devagrs, &new_port_id)
>                 || !rte_eth_dev_is_valid_port(new_port_id)) {
>             /* Device not found in DPDK, attempt to attach it */
>             if (!rte_dev_probe(devargs)
>                 && !netdev_dpdk_get_port_by_devargs(devargs, &new_port_id)) {
>                 /* Attach successful */
>                 dev->attached = true;
>                 VLOG_INFO("Device '%s' attached to DPDK", devargs);
>             } else {
>                 /* Attach unsuccessful */
>                 new_port_id = DPDK_ETH_PORT_ID_INVALID;
>             }
>         }
> 
> If it works, I'd prefer this variant.
> 

I don't think it works in case you add two different ports with the same 
representor devargs (or the same PCI address with a PMD supporting 
representors).
The extra code is required to avoid this misconfiguration (using the same 
device for two different ports). 
For example, the following configuration must fail:

ovs-vsctl --no-wait add-port br1 port-one -- set Interface port-one type=dpdk 
options:dpdk-devargs=0000:08:00.0,representor=[1]
ovs-vsctl --no-wait add-port br1 port-two -- set Interface port-two type=dpdk 
options:dpdk-devargs=0000:08:00.0,representor=[1]

In such misconfiguration please note the waring you get:
                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
                               "which is already in use by '%s'",
                          netdev_get_name(&dev->up), devargs,
                          netdev_get_name(&dup_dev->up));

An interesting question is why didn't we need this extra code in previous OVS 
versions.
The reason is that if we called rte_dev_probe() a second time for the same PCI 
device - we got an error. So we were immune against this misconfiguration.
This was the behavior before the introduction of representors. With representor 
- calling rte_dev_probe() twice for the same devargs will return successfully, 
hence the need for an additional check (extra code).

> Also, do we still need netdev_dpdk_get_port_by_mac() if we're using
> devargs iterator ?
> 

I will reply to this in the exchange of emails that followed with Thomas.
In short - the devargs iterator seems broken in parsing mac addresses.

> >                  new_port_id = DPDK_ETH_PORT_ID_INVALID;
> > +            } else {
> > +                /* Device successfully found. */
> > +                dev->attached = true;
> > +                VLOG_INFO("Device '%s' attached to DPDK port %d",
> > +                          devargs, new_port_id);
> >              }
> >          }
> > -        free(name);
> >      }
> > -
> 
> Please, keep the empty line.

I will update in v5

> 
> >      if (new_port_id == DPDK_ETH_PORT_ID_INVALID) {
> >          VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK",
> devargs);
> >      }
> > @@ -1761,7 +1823,7 @@ netdev_dpdk_set_config(struct netdev *netdev,
> const struct smap *args,
> >                  dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> >                  if (dup_dev) {
> >                      VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' 
> > "
> > -                                        "which is already in use by '%s'",
> > +                                  "which is already in use by '%s'",
> >                                    netdev_get_name(netdev), new_devargs,
> >                                    netdev_get_name(&dup_dev->up));
> >                      err = EADDRINUSE; @@ -3236,11 +3298,17 @@
> > netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED,
> >      char *response;
> >      dpdk_port_t port_id;
> >      struct netdev_dpdk *dev;
> > -    struct rte_eth_dev_info dev_info;
> > +    struct rte_device *rte_dev;
> > +    struct rte_dev_iterator iterator;
> >
> >      ovs_mutex_lock(&dpdk_mutex);
> >
> > -    if (rte_eth_dev_get_port_by_name(argv[1], &port_id)) {
> > +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, argv[1], &iterator) {
> > +        /* If a break is done - must call rte_eth_iterator_cleanup. */
> > +        rte_eth_iterator_cleanup(&iterator);
> > +        break;
> > +    }
> > +    if (port_id == DPDK_ETH_PORT_ID_INVALID) {
> >          response = xasprintf("Device '%s' not found in DPDK", argv[1]);
> >          goto error;
> >      }
> > @@ -3253,15 +3321,23 @@ netdev_dpdk_detach(struct unixctl_conn
> *conn, int argc OVS_UNUSED,
> >          goto error;
> >      }
> >
> > -    rte_eth_dev_close(port_id);
> > +    /* FIXME: avoid direct access to DPDK internal array rte_eth_devices. 
> > */
> > +    rte_dev = rte_eth_devices[port_id].device;
> > +    if (netdev_dpdk_get_num_ports(rte_dev)) {
> 
> Probably same here.
> DPDK does not set RTE_ETH_DEV_UNUSED state for PMDs without
> RTE_ETH_DEV_CLOSE_REMOVE flag.

Same here: I will reply to this in the exchange of emails that followed with 
Thomas.

> 
> > +        response = xasprintf("Device '%s' is being shared with other "
> > +                             "interfaces. Remove them before detaching.",
> > +                             argv[1]);
> > +        goto error;
> > +    }
> >
> > -    rte_eth_dev_info_get(port_id, &dev_info);
> > -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> > -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> > +    rte_eth_dev_close(port_id);
> > +    if (rte_dev_remove(rte_dev) < 0) {
> > +        response = xasprintf("Device '%s' can not be removed",
> > + argv[1]);
> 
> /removed/detached/
> You're using different words for equal situations. This is misleading.
> 

I can change the wording to "detached'. However I think it is misleading in a 
different way:
Assuming there are two representors using the same PCI bus - then "detaching" 
one representor will only
close the eth device, but will not detach the PCI device until the second 
representor is closed as well.
When all eth devices are closed - please note the message below:
   response = xasprintf("All devices shared with device '%s' "
                         "have been detached", argv[1]);

Having said that - do you still suggest replacing "removed" with "detached"?

> >
> > -    response = xasprintf("Device '%s' has been detached", argv[1]);
> > +    response = xasprintf("All devices shared with device '%s' "
> > +                          "have been detached", argv[1]);
> >
> >      ovs_mutex_unlock(&dpdk_mutex);
> >      unixctl_command_reply(conn, response);
> >
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to