On 15.01.2019 11:39, Ophir Munk wrote:
> Hi Ilya,
> Please find comments inline
>
>> -----Original Message-----
>> From: Ilya Maximets <[email protected]>
>> Sent: Monday, January 14, 2019 12:11 PM
>> To: Ophir Munk <[email protected]>; [email protected]
>> Cc: Asaf Penso <[email protected]>; Ian Stokes <[email protected]>;
>> Shahaf Shuler <[email protected]>; Thomas Monjalon
>> <[email protected]>; Olga Shern <[email protected]>; Kevin
>> Traynor <[email protected]>
>> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
>>
>> On 13.01.2019 18:13, Ophir Munk wrote:
>>> Hi Ilya,
>>> Thank you for your review.
>>> Please find comments inline.
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets <[email protected]>
>>>> Sent: Thursday, January 10, 2019 1:32 PM
>>>> To: Ophir Munk <[email protected]>; [email protected]
>>>> Cc: Asaf Penso <[email protected]>; Ian Stokes
>>>> <[email protected]>; Shahaf Shuler <[email protected]>;
>> Thomas
>>>> Monjalon <[email protected]>; Olga Shern <[email protected]>;
>>>> Kevin Traynor <[email protected]>
>>>> Subject: Re: [PATCH v4] netdev-dpdk: support port representors
>>>>
>>>> On 08.01.2019 12:31, Ophir Munk wrote:
>>>>> Dpdk port representors were introduced in dpdk versions 18.xx.
>>>>> Prior to port representors there was a one-to-one relationship
>>>>> between an rte device (e.g. PCI bus) and an eth device (referenced
>>>>> as dpdk port id in OVS). With port representors the relationship
>>>>> becomes one-to-many rte device to eth devices.
>>>>> For example in [3] there are two devices (representors) using the
>>>>> same PCI physical address 0000:08:00.0:
>>>>> "0000:08:00.0,representor=[3]" and "0000:08:00.0,representor=[5]".
>>>>> This commit handles the new one-to-many relationship. For example,
>>>>> when one of the device port representors in [3] is closed - the PCI
>>>>> bus cannot be detached until the other device port representor is
>>>>> closed as well. OVS remains backward compatible by supporting dpdk
>>>>> legacy PCI ports which do not include port representors.
>>>>> Dpdk port representors related commits are listed in [1]. Dpdk port
>>>>> representors documentation appears in [2]. A sample configuration
>>>>> which uses two representors ports (the output of "ovs-vsctl show"
>>>>> command) is shown in [3].
>>>>> +
>>>>> static int
>>>>> vhost_common_construct(struct netdev *netdev)
>>>>> OVS_REQUIRES(dpdk_mutex)
>>>>> @@ -1353,20 +1373,25 @@ static void
>>>>> netdev_dpdk_destruct(struct netdev *netdev) {
>>>>> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>>> - struct rte_eth_dev_info dev_info;
>>>>> + struct rte_device *rte_dev;
>>>>>
>>>>> ovs_mutex_lock(&dpdk_mutex);
>>>>>
>>>>> rte_eth_dev_stop(dev->port_id);
>>>>> dev->started = false;
>>>>> -
>>>>
>>>> Please, keep the empty line.
>>>>
>>>
>>> I will update in v5.
>>>
>>>>> if (dev->attached) {
>>>>> + /* Remove the port eth device. */
>>>>> rte_eth_dev_close(dev->port_id);
>>>>> - rte_eth_dev_info_get(dev->port_id, &dev_info);
>>>>> - if (dev_info.device && !rte_dev_remove(dev_info.device)) {
>>>>> - VLOG_INFO("Device '%s' has been detached", dev->devargs);
>>>>
>>>> Please keep this log message for the successful case.
>>>
>>> Please note the line below:
>>> VLOG_INFO("Device '%s' has been removed", dev->devargs);
>>>
>>> This is the log message for the successful case. With the introduction of
>> representors several eth devices can use one PCI device.
>>> If you remove the first representor - its eth device is closed - but you do
>> not detach the PCI device till the second representor eth device is closed as
>> well.
>>> Would you suggest replacing "removed" with "closed" or "detached"?
>>
>> I understand what you're trying to do here. What I'm trying to say is that in
>> current version, message "has been removed" states that device was closed,
>> but we do not know if we tried to detach it.
>> We now have two levels of getting port out of OVS. First is "removing"
>> (rte_eth_dev_close) and the second is "detaching" (rte_dev_remove, kind
>> of confusing).
>> So, in current code we do not know if we detached the port or we skipped
>> detaching because of existence of other ports sharing the same device.
>>
>
> In v5 I will inform if a device was removed (e.g. closed) or detached.
>
>>>>
>>>> I'm not sure if 'netdev_dpdk_get_port_by_devargs' is the correct name
>>>> because it not only finds (gets) the port, but probes (attaches) it.
>>>>
>>>
>>> I think that the function main purpose is to find the port. Probing is
>>> optional
>> for that purpose.
>>> I am good with the current name but I am open to any new name you
>> would like to suggest.
>>> Can you please suggest?
>>
>> IMHO, the main purpose of this function is to probe the port because
>> searching will succeed only if we're trying to add duplicated port.
>> 'netdev_dpdk_probe_port_by_devargs' should be better, IMHO.
>>
>
> In v5 I will update the function name per your suggestion.
>
>>>
>>>>> +{
>>>>> + struct rte_dev_iterator iterator;
>>>>> + dpdk_port_t port_id;
>>>>> +
>>>>> + RTE_ETH_FOREACH_MATCHING_DEV(port_id, devargs, &iterator) {
>>>>
>>>
>>>> Invalid spacing around the FOREACH loop.
>>>> Same in all the other places.
>>>
>>> I do not understand this comment. Can you please explain or demo the fix?
>>
>> You just need an extra space before the '(' like this:
>> 'RTE_ETH_FOREACH_MATCHING_DEV ('
>>
>
> I will update in v5
>
>> I'll send a patch for checkpatch script to catch this. Right now it supports
>> only
>> 'FOR_EACH' loops, not the 'FOREACH'.
>>
>>>
>>>>> dpdk_port_t new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>>>>
>>>>> if (strncmp(devargs, "class=eth,mac=", 14) == 0) {
>>>>> new_port_id = netdev_dpdk_get_port_by_mac(&devargs[14]);
>>>>> } else {
>>>>> - name = xmemdup0(devargs, strcspn(devargs, ","));
>>>>> - if (rte_eth_dev_get_port_by_name(name, &new_port_id)
>>>>> - || !rte_eth_dev_is_valid_port(new_port_id)) {
>>>>> - /* Device not found in DPDK, attempt to attach it */
>>>>> - if (!rte_dev_probe(devargs)
>>>>> - && !rte_eth_dev_get_port_by_name(name, &new_port_id)) {
>>>>> - /* Attach successful */
>>>>> - dev->attached = true;
>>>>> - VLOG_INFO("Device '%s' attached to DPDK", devargs);
>>>>> - } else {
>>>>> - /* Attach unsuccessful */
>>>>> + new_port_id = netdev_dpdk_get_port_by_devargs(devargs);
>>>>> + if (!rte_eth_dev_is_valid_port(new_port_id)) {
>>>>> + new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>>>> + } else {
>>>>> + struct netdev_dpdk *dup_dev;
>>>>> +
>>>>> + dup_dev = netdev_dpdk_lookup_by_port_id(new_port_id);
>>>>> + if (dup_dev) {
>>>>> + VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
>>>>> + "which is already in use by '%s'",
>>>>> + netdev_get_name(&dev->up), devargs,
>>>>> + netdev_get_name(&dup_dev->up));
>>>>
>>>> Why you duplicating this part of code ?
>>>> Can we just replace 'rte_eth_dev_get_port_by_name' with
>>>> 'netdev_dpdk_get_port_by_devargs' and keep all the other logic ?
>>>>
>>>> netdev_dpdk_get_port_by_devargs should look like this in this case:
>>>>
>>>> /* Return the first DPDK port_id matching the devargs pattern. */
>>>> static bool netdev_dpdk_get_port_by_devargs(const char *devargs,
>>>> dpdk_port_t
>>>> *port_id)
>>>> OVS_REQUIRES(dpdk_mutex)
>>>> {
>>>> struct rte_dev_iterator iterator;
>>>>
>>>> RTE_ETH_FOREACH_MATCHING_DEV (*port_id, devargs, &iterator) {
>>>> /* If a break is done - must call rte_eth_iterator_cleanup. */
>>>> rte_eth_iterator_cleanup(&iterator);
>>>> break;
>>>> }
>>>> return *port_id != DPDK_ETH_PORT_ID_INVALID; }
>>>>
>>>> netdev_dpdk_process_devargs:
>>>>
>>>> if (netdev_dpdk_get_port_by_devargs(devagrs, &new_port_id)
>>>> || !rte_eth_dev_is_valid_port(new_port_id)) {
>>>> /* Device not found in DPDK, attempt to attach it */
>>>> if (!rte_dev_probe(devargs)
>>>> && !netdev_dpdk_get_port_by_devargs(devargs,
>> &new_port_id)) {
>>>> /* Attach successful */
>>>> dev->attached = true;
>>>> VLOG_INFO("Device '%s' attached to DPDK", devargs);
>>>> } else {
>>>> /* Attach unsuccessful */
>>>> new_port_id = DPDK_ETH_PORT_ID_INVALID;
>>>> }
>>>> }
>>>>
>>>> If it works, I'd prefer this variant.
>>>>
>>>
>>> I don't think it works in case you add two different ports with the same
>> representor devargs (or the same PCI address with a PMD supporting
>> representors).
>>> The extra code is required to avoid this misconfiguration (using the same
>> device for two different ports).
>>> For example, the following configuration must fail:
>>>
>>> ovs-vsctl --no-wait add-port br1 port-one -- set Interface port-one
>>> type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
>>> ovs-vsctl --no-wait add-port br1 port-two -- set Interface port-two
>>> type=dpdk options:dpdk-devargs=0000:08:00.0,representor=[1]
>>>
>>> In such misconfiguration please note the waring you get:
>>> VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' "
>>> "which is already in use by '%s'",
>>> netdev_get_name(&dev->up), devargs,
>>> netdev_get_name(&dup_dev->up));
>>>
>>> An interesting question is why didn't we need this extra code in previous
>> OVS versions.
>>> The reason is that if we called rte_dev_probe() a second time for the same
>> PCI device - we got an error. So we were immune against this
>> misconfiguration.
>>> This was the behavior before the introduction of representors. With
>> representor - calling rte_dev_probe() twice for the same devargs will return
>> successfully, hence the need for an additional check (extra code).
>>
>> But we have the same check on the upper level. If you will not check here,
>> the new_port_id will go to netdev_dpdk_set_config(), where we have equal
>> check for duplicated ports. Why this doesn't work for you?
>>
>
> I will explain why this does not work. Please note the following code snippet
> from
> netdev_dpdk_set_config():
>
> 1 dpdk_port_t new_port_id = netdev_dpdk_process_devargs(dev, new_devargs,
> errp)
> 2 if (!rte_eth_dev_is_valid_port(new_port_id)) {
> 3 err = EINVAL;
> 4 } else if (new_port_id == dev->port_id) {
> 5 /* Already configured, do not reconfigure again */
> 6 err = 0;
> 7 } else {
> 8 err = netdev_dpdk_check_dup_dev(new_port_id, netdev,
> 9 new_devargs, errp);
> 10 if (!err) {
> 11 int sid = rte_eth_dev_socket_id(new_port_id);
> 12
> 13 dev->requested_socket_id = sid < 0 ? SOCKET0 : sid;
> 14 dev->devargs = xstrdup(new_devargs);
> 15 dev->port_id = new_port_id;
> 16 netdev_request_reconfigure(&dev->up);
> 17 netdev_dpdk_clear_xstats(dev);
> 18 }
> 19 }
>
> Line 1: Before representors were introduced the check for a duplicated device
> was implicitly done inside netdev_dpdk_process_devarags which returned an
> error (reason: rte_dev_probe() was called twice on the same devargs and
> failed).
> In such a case err is set to EINVAL in lines 2-3 and we do not reach lines
> 4-19. With the introduction of representors netdev_dpdk_process_devarags()
> may return the same 'new_port_id' if called again with the same devargs, or
> may fail as before for PMDs which do not support representors.
> In case netdev_dpdk_process_devarags() is successful for the same devargs we
> end with no errors (err = 0) in lines 4-6 .... hence we ignore the dup dev
> case which results in accepting the same device for 2 different ports ....
> which ends in seg fault.
But 'dev->port_id' should not be initialized yet at this point.
And the (new_port_id == dev->port_id) should not be true and we
should execute code in 'else' branch (8-18).
> In order to prevent a change in flow logic (dup dev should fail in
> netdev_dpdk_process_devarags() as always has been) I moved the check of
> dup_dev inside the call of netdev_dpdk_process_devarags().
> The check for dup dev in line 8 is redundant (probably has always been so).
>
>>>
>>>>
>>>>> + response = xasprintf("Device '%s' is being shared with other "
>>>>> + "interfaces. Remove them before detaching.",
>>>>> + argv[1]);
>>>>> + goto error;
>>>>> + }
>>>>>
>>>>> - rte_eth_dev_info_get(port_id, &dev_info);
>>>>> - if (!dev_info.device || rte_dev_remove(dev_info.device)) {
>>>>> - response = xasprintf("Device '%s' can not be detached", argv[1]);
>>>>> + rte_eth_dev_close(port_id);
>>>>> + if (rte_dev_remove(rte_dev) < 0) {
>>>>> + response = xasprintf("Device '%s' can not be removed",
>>>>> + argv[1]);
>>>>
>>>> /removed/detached/
>>>> You're using different words for equal situations. This is misleading.
>>>>
>>>
>>> I can change the wording to "detached'. However I think it is misleading in
>>> a
>> different way:
>>> Assuming there are two representors using the same PCI bus - then
>>> "detaching" one representor will only close the eth device, but will not
>> detach the PCI device until the second representor is closed as well.
>>> When all eth devices are closed - please note the message below:
>>> response = xasprintf("All devices shared with device '%s' "
>>> "have been detached", argv[1]);
>>>
>>> Having said that - do you still suggest replacing "removed" with
>> "detached"?
>>
>> I'm suggesting to separate two cases:
>>
>> 1. rte_eth_dev_close() executed.
>> 2. rte_dev_remove() executed.
>>
>> The first one we could call "Port '%s' closed", the second "All devices
>> shared
>> with device '%s' detached". And as we have two places where this could
>> happen, in both places we should have equal messages for the equal events.
>> You may vary the wording, but, IMHO, it should be clear from the log which
>> of above events occurred. And we should log every our action (1 and 2) for
>> this purpose.
>> Otherwise it'll be hard to tell what devices are still attached to OVS.
>>
>>>
>
> In function netdev_dpdk_detach() you have one of 2 cases:
>
> 1. If you try to detach a device whose PCI is shared with other devides you
> return an error
> response = xasprintf("Device '%s' is being shared with other "
> "interfaces. Remove them before detaching.",
> argv[1]);
>
> 2. If you detach the last device using the PCI bus - you can really detach.
> In this case you execute both
> rte_eth_dev_close() and rte_dev_remove() at once and there is no separation
> between the two.
>
> In this case you inform.
> response = xasprintf("All devices shared with device '%s' "
> "have been detached", argv[1]);
>
> It seems IMHO the correct way of informing.
>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev