Re: [ovs-dev] [PATCH v4] netdev-dpdk: support port representors

Ophir Munk Tue, 15 Jan 2019 00:59:16 -0800

Hi Ilya,
Please find comments inline

> -----Original Message-----
> From: Ilya Maximets <[email protected]>
> Sent: Monday, January 14, 2019 12:11 PM
> To: Ophir Munk <[email protected]>; [email protected]
> Cc: Asaf Penso <[email protected]>; Ian Stokes <[email protected]>;
> Shahaf Shuler <[email protected]>; Thomas Monjalon
> <[email protected]>; Olga Shern <[email protected]>; Kevin
> Traynor <[email protected]>
> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> 
> On 13.01.2019 18:13, Ophir Munk wrote:
> > Hi Ilya,
> > Thank you for your review.
> > Please find comments inline.
> >
> >> -----Original Message-----
> >> From: Ilya Maximets <[email protected]>
> >> Sent: Thursday, January 10, 2019 1:32 PM
> >> To: Ophir Munk <[email protected]>; [email protected]
> >> Cc: Asaf Penso <[email protected]>; Ian Stokes
> >> <[email protected]>; Shahaf Shuler <[email protected]>;
> Thomas
> >> Monjalon <[email protected]>; Olga Shern <[email protected]>;
> >> Kevin Traynor <[email protected]>
> >> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
> >>
> >> On 08.01.2019 12:31, Ophir Munk wrote:
> >>> Dpdk port representors were introduced in dpdk versions 18.xx.
> >>> Prior to port representors there was a one-to-one relationship
> >>> between an rte device (e.g. PCI bus) and an eth device (referenced
> >>> as dpdk port id in OVS). With port representors the relationship
> >>> becomes one-to-many rte device to eth devices.
> >>> For example in [3] there are two devices (representors) using the
> >>> same PCI physical address 0000:08:00.0:
> >>> "0000:08:00.0,representor=[3]" and "0000:08:00.0,representor=[5]".
> >>> This commit handles the new one-to-many relationship. For example,
> >>> when one of the device port representors in [3] is closed - the PCI
> >>> bus cannot be detached until the other device port representor is
> >>> closed as well. OVS remains backward compatible by supporting dpdk
> >>> legacy PCI ports which do not include port representors.
> >>> Dpdk port representors related commits are listed in [1]. Dpdk port
> >>> representors documentation appears in [2]. A sample configuration
> >>> which uses two representors ports (the output of "ovs-vsctl show"
> >>> command) is shown in [3].
> >>> +
> >>>  static int
> >>>  vhost_common_construct(struct netdev *netdev)
> >>>      OVS_REQUIRES(dpdk_mutex)
> >>> @@ -1353,20 +1373,25 @@ static void
> >>>  netdev_dpdk_destruct(struct netdev *netdev)  {
> >>>      struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> >>> -    struct rte_eth_dev_info dev_info;
> >>> +    struct rte_device *rte_dev;
> >>>
> >>>      ovs_mutex_lock(&dpdk_mutex);
> >>>
> >>>      rte_eth_dev_stop(dev->port_id);
> >>>      dev->started = false;
> >>> -
> >>
> >> Please, keep the empty line.
> >>
> >
> > I will update in v5.
> >
> >>>      if (dev->attached) {
> >>> +        /* Remove the port eth device. */
> >>>          rte_eth_dev_close(dev->port_id);
> >>> -        rte_eth_dev_info_get(dev->port_id, &dev_info);
> >>> -        if (dev_info.device && !rte_dev_remove(dev_info.device)) {
> >>> -            VLOG_INFO("Device '%s' has been detached", dev->devargs);
> >>
> >> Please keep this log message for the successful case.
> >
> > Please note the line below:
> >    VLOG_INFO("Device '%s' has been removed", dev->devargs);
> >
> > This is the log message for the successful case. With the introduction of
> representors several eth devices can use one PCI device.
> > If you remove the first representor - its eth device is closed - but you do
> not detach the PCI device till the second representor eth device is closed as
> well.
> > Would you suggest replacing "removed" with "closed" or "detached"?
> 
> I understand what you're trying to do here. What I'm trying to say is that in
> current version, message "has been removed" states that device was closed,
> but we do not know if we tried to detach it.
> We now have two levels of getting port out of OVS. First is "removing"
> (rte_eth_dev_close) and the second is "detaching" (rte_dev_remove, kind
> of confusing).
> So, in current code we do not know if we detached the port or we skipped
> detaching because of existence of other ports sharing the same device.
>


In v5 I will inform if a device was removed (e.g. closed) or detached.

> >>
> >> I'm not sure if 'netdev_dpdk_get_port_by_devargs' is the correct name
> >> because it not only finds (gets) the port, but probes (attaches) it.
> >>
> >
> > I think that the function main purpose is to find the port. Probing is 
> > optional
> for that purpose.
> > I am good with the current name but I am open to any new name you
> would like to suggest.
> > Can you please suggest?
> 
> IMHO, the main purpose of this function is to probe the port because
> searching will succeed only if we're trying to add duplicated port.
> 'netdev_dpdk_probe_port_by_devargs' should be better, IMHO.
> 

In v5 I will update the function name per your suggestion.

> >
> >>> +{
> >>> +    struct rte_dev_iterator iterator;
> >>> +    dpdk_port_t port_id;
> >>> +
> >>> +    RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
> >>
> >
> >> Invalid spacing around the FOREACH loop.
> >> Same in all the other places.
> >
> > I do not understand this comment. Can you please explain or demo the fix?
> 
> You just need an extra space before the '(' like this:
>       'RTE_ETH_FOREACH_MATCHING_DEV ('
> 

I will update in v5

> I'll send a patch for checkpatch script to catch this. Right now it supports 
> only
> 'FOR_EACH' loops, not the 'FOREACH'.
> 
> >
> >>>      dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
> >>>
> >>>      if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
> >>>          new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
> >>>      } else {
> >>> -        name = xmemdup0(devargs, strcspn(devargs, ","));
> >>> -        if (rte_eth_dev_get_port_by_name(name, &new_port_id)
> >>> -                || !rte_eth_dev_is_valid_port(new_port_id)) {
> >>> -            /* Device not found in DPDK, attempt to attach it */
> >>> -            if (!rte_dev_probe(devargs)
> >>> -                && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
> >>> -                /* Attach successful */
> >>> -                dev->attached = true;
> >>> -                VLOG_INFO("Device '%s' attached to DPDK", devargs);
> >>> -            } else {
> >>> -                /* Attach unsuccessful */
> >>> +        new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
> >>> +        if (!rte_eth_dev_is_valid_port(new_port_id)) {
> >>> +            new_port_id = DPDK_ETH_PORT_ID_INVALID;
> >>> +        } else {
> >>> +            struct netdev_dpdk *dup_dev;
> >>> +
> >>> +            dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
> >>> +            if (dup_dev) {
> >>> +                VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> >>> +                               "which is already in use by '%s'",
> >>> +                          netdev_get_name(&dev->up), devargs,
> >>> +                          netdev_get_name(&dup_dev->up));
> >>
> >> Why you duplicating this part of code ?
> >> Can we just replace 'rte_eth_dev_get_port_by_name' with
> >> 'netdev_dpdk_get_port_by_devargs' and keep all the other logic ?
> >>
> >> netdev_dpdk_get_port_by_devargs should look like this in this case:
> >>
> >> /* Return the first DPDK port_id matching the devargs pattern. */
> >> static bool netdev_dpdk_get_port_by_devargs(const char *devargs,
> >> dpdk_port_t
> >> *port_id)
> >>     OVS_REQUIRES(dpdk_mutex)
> >> {
> >>     struct rte_dev_iterator iterator;
> >>
> >>     RTE_ETH_FOREACH_MATCHING_DEV (*port_id, devargs, &iterator) {
> >>         /* If a break is done - must call rte_eth_iterator_cleanup. */
> >>         rte_eth_iterator_cleanup(&iterator);
> >>         break;
> >>     }
> >>     return *port_id != DPDK_ETH_PORT_ID_INVALID; }
> >>
> >> netdev_dpdk_process_devargs:
> >>
> >>         if (netdev_dpdk_get_port_by_devargs(devagrs, &new_port_id)
> >>                 || !rte_eth_dev_is_valid_port(new_port_id)) {
> >>             /* Device not found in DPDK, attempt to attach it */
> >>             if (!rte_dev_probe(devargs)
> >>                 && !netdev_dpdk_get_port_by_devargs(devargs,
> &new_port_id)) {
> >>                 /* Attach successful */
> >>                 dev->attached = true;
> >>                 VLOG_INFO("Device '%s' attached to DPDK", devargs);
> >>             } else {
> >>                 /* Attach unsuccessful */
> >>                 new_port_id = DPDK_ETH_PORT_ID_INVALID;
> >>             }
> >>         }
> >>
> >> If it works, I'd prefer this variant.
> >>
> >
> > I don't think it works in case you add two different ports with the same
> representor devargs (or the same PCI address with a PMD supporting
> representors).
> > The extra code is required to avoid this misconfiguration (using the same
> device for two different ports).
> > For example, the following configuration must fail:
> >
> > ovs-vsctl --no-wait add-port br1 port-one -- set Interface port-one
> > type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
> > ovs-vsctl --no-wait add-port br1 port-two -- set Interface port-two
> > type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
> >
> > In such misconfiguration please note the waring you get:
> >                 VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
> >                                "which is already in use by '%s'",
> >                           netdev_get_name(&dev->up), devargs,
> >                           netdev_get_name(&dup_dev->up));
> >
> > An interesting question is why didn't we need this extra code in previous
> OVS versions.
> > The reason is that if we called rte_dev_probe() a second time for the same
> PCI device - we got an error. So we were immune against this
> misconfiguration.
> > This was the behavior before the introduction of representors. With
> representor - calling rte_dev_probe() twice for the same devargs will return
> successfully, hence the need for an additional check (extra code).
> 
> But we have the same check on the upper level. If you will not check here,
> the new_port_id will go to netdev_dpdk_set_config(), where we have equal
> check for duplicated ports. Why this doesn't work for you?
> 

I will explain why this does not work. Please note the following code snippet 
from 
netdev_dpdk_set_config():

1    dpdk_port_t new_port_id = netdev_dpdk_process_devargs(dev, new_devargs, 
errp)
2   if (!rte_eth_dev_is_valid_port(new_port_id)) {
3       err = EINVAL;
4   } else if (new_port_id == dev->port_id) {
5       /* Already configured, do not reconfigure again */
6       err = 0;
7   } else {
8       err = netdev_dpdk_check_dup_dev(new_port_id, netdev,
9                 new_devargs, errp);
10       if (!err) {
11          int sid = rte_eth_dev_socket_id(new_port_id);
12
13          dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
14          dev->devargs = xstrdup(new_devargs);
15          dev->port_id = new_port_id;
16          netdev_request_reconfigure(&dev->up);
17          netdev_dpdk_clear_xstats(dev);
18      }
19 }

Line 1: Before representors were introduced the check for a duplicated device 
was implicitly done inside netdev_dpdk_process_devarags which returned an error 
(reason: rte_dev_probe() was called twice on the same devargs and failed).
In such a case err is set to EINVAL in lines 2-3 and we do not reach lines 
4-19. With the introduction of representors netdev_dpdk_process_devarags() may 
return the same 'new_port_id' if called again with the same devargs, or may 
fail as before for PMDs which do not support representors.
In case netdev_dpdk_process_devarags() is successful for the same devargs we 
end with no errors (err = 0) in lines 4-6 .... hence we ignore the dup dev case 
which results in accepting the same device for 2 different ports .... which 
ends in seg fault.
In order to prevent a change in flow logic (dup dev should fail in 
netdev_dpdk_process_devarags() as always has been) I moved the check of dup_dev 
inside the call of netdev_dpdk_process_devarags(). 
The check for dup dev in line 8 is redundant (probably has always been so).

> >
> >>
> >>> +        response = xasprintf("Device '%s' is being shared with other "
> >>> +                             "interfaces. Remove them before detaching.",
> >>> +                             argv[1]);
> >>> +        goto error;
> >>> +    }
> >>>
> >>> -    rte_eth_dev_info_get(port_id, &dev_info);
> >>> -    if (!dev_info.device || rte_dev_remove(dev_info.device)) {
> >>> -        response = xasprintf("Device '%s' can not be detached", argv[1]);
> >>> +    rte_eth_dev_close(port_id);
> >>> +    if (rte_dev_remove(rte_dev) < 0) {
> >>> +        response = xasprintf("Device '%s' can not be removed",
> >>> + argv[1]);
> >>
> >> /removed/detached/
> >> You're using different words for equal situations. This is misleading.
> >>
> >
> > I can change the wording to "detached'. However I think it is misleading in 
> > a
> different way:
> > Assuming there are two representors using the same PCI bus - then
> > "detaching" one representor will only close the eth device, but will not
> detach the PCI device until the second representor is closed as well.
> > When all eth devices are closed - please note the message below:
> >    response = xasprintf("All devices shared with device '%s' "
> >                          "have been detached", argv[1]);
> >
> > Having said that - do you still suggest replacing "removed" with
> "detached"?
> 
> I'm suggesting to separate two cases:
> 
>  1. rte_eth_dev_close() executed.
>  2. rte_dev_remove() executed.
> 
> The first one we could call "Port '%s' closed", the second "All devices shared
> with device '%s' detached". And as we have two places where this could
> happen, in both places we should have equal messages for the equal events.
> You may vary the wording, but, IMHO, it should be clear from the log which
> of above events occurred. And we should log every our action (1 and 2) for
> this purpose.
> Otherwise it'll be hard to tell what devices are still attached to OVS.
> 
> >

In function netdev_dpdk_detach() you have one of 2 cases:

1. If you try to detach a device whose PCI is shared with other devides you 
return an error
response = xasprintf("Device '%s' is being shared with other "
                             "interfaces. Remove them before detaching.",
                             argv[1]);

2. If you detach the last device using the PCI bus - you can really detach. In 
this case you execute both 
rte_eth_dev_close() and rte_dev_remove() at once and there is no separation 
between the two.

In this case you  inform.
    response = xasprintf("All devices shared with device '%s' "
                          "have been detached", argv[1]);

It seems IMHO the correct way of informing.


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v4] netdev-dpdk: support port representors

Reply via email to