On 25.04.2019 10:38, Ilya Maximets wrote:
> On 25.04.2019 0:58, Roni Bar Yanai wrote:
>> Hi Ilya,
>>
>> Please see comment/clarification inline.
>> Thanks,
>> Roni
>>
>>> -----Original Message-----
>>> From: Ilya Maximets <[email protected]>
>>> Sent: Wednesday, April 24, 2019 6:12 PM
>>> To: Ophir Munk <[email protected]>; [email protected]
>>> Cc: Ian Stokes <[email protected]>; Flavio Leitner <[email protected]>; 
>>> Kevin
>>> Traynor <[email protected]>; Roni Bar Yanai <[email protected]>; Finn
>>> Christensen <[email protected]>; Ben Pfaff <[email protected]>; Roi Dayan
>>> <[email protected]>; Simon Horman <[email protected]>; Olga
>>> Shern <[email protected]>; Asaf Penso <[email protected]>; Oz Shlomo
>>> <[email protected]>; Paul Blakey <[email protected]>
>>> Subject: Re: [PATCH] netdev: Dynamic per-port Flow API.
>>>
>>> On 24.04.2019 17:26, Ophir Munk wrote:
>>>> Hi Ilya,
>>>> Please find comments inline and at the end of this mail.
>>>
>>> Hi. Thanks for review.
>>> Comments inline.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>>>
>>>> Regards,
>>>> Ophir
>>>>
>>>>> -----Original Message-----
>>>>> From: Ilya Maximets <[email protected]>
>>>>> Sent: Tuesday, April 23, 2019 7:19 PM
>>>>> To: [email protected]
>>>>> Cc: Ian Stokes <[email protected]>; Flavio Leitner <[email protected]>;
>>>>> Ophir Munk <[email protected]>; Kevin Traynor
>>>>> <[email protected]>; Roni Bar Yanai <[email protected]>; Finn
>>>>> Christensen <[email protected]>; Ben Pfaff <[email protected]>; Roi Dayan
>>>>> <[email protected]>; Simon Horman <[email protected]>;
>>>>> Ilya Maximets <[email protected]>
>>>>> Subject: [PATCH] netdev: Dynamic per-port Flow API.
>>>>>
>>>>> Current issues with Flow API:
>>>>>
>>>>> * OVS calls offloading functions regardless of successful
>>>>>   flow API initialization. (ex. on init_flow_api failure)
>>>>> * Static initilaization of Flow API for a netdev_class forbids
>>>>>   having different offloading types for different instances
>>>>>   of netdev with the same netdev_class. (ex. different vports in
>>>>>   'system' and 'netdev' datapaths at the same time)
>>>>>
>>>>> Solution:
>>>>>
>>>>> * Move Flow API from the netdev_class to netdev instance.
>>>>> * Make Flow API dynamic, i.e. probe the APIs and choose the
>>>>>   suitable one.
>>>>>
>>>>
>>>> I suggest distinguishing between initialization and probing.
>>>> The probing you mention is implemented by testing device initialization:
>>> init_flow_api().
>>>> I suggest adding a new probe() API just for the sake of probing. It will 
>>>> check the
>>> netdev struct.
>>>> I suggest  leaving the init_flow_api() API for HW/drivers initialization.
>>>> The HW/driver may fail at initialization, but it does not mean we need to 
>>>> probe
>>> for a new API in that case.
>>>
>>> I also thought about separate 'probe()' api, but it seems that probing
>>> will be equal to 'init()' in most cases. Checking the 'netdev struct'
>>> fro probing is a bad solution as instances of same netdev class could
>>> require different offloading apis. Probing a new api on init failure
>>> could be useful exactly for this case. Anyway we'll be able to improve
>>> the logic of assigning in the future if it will be needed. This is
>>> the single point in code where we have information about all the available
>>> offloading providers.
>>>
>>>>
>>>>> Side effects:
>>>>>
>>>>> * Flow API providers localized as possible in their modules.
>>>>> * Now we have an ability to make runtime checks. For example,
>>>>>   we could check if particular device supports features we
>>>>>   need, like if dpdk device supports RSS+MARK action.
>>>>>
>>>>> Signed-off-by: Ilya Maximets <[email protected]>
>>>>> ---
>>>>>
>>>>> Since RFC:
>>>>>   * Fixed sparce build.
>>>>>   * Some logs turned from INFO to DBG.
>>>>>   * Enabled HW offloading on non-Linux systems
>>>>>     (For testing with dummy provider).
>>>>>
>>>>>  lib/automake.mk               |   2 +-
>>>>>  lib/dpdk.c                    |   2 +
>>>>>  lib/netdev-dpdk.c             |  24 +++-
>>>>>  lib/netdev-dpdk.h             |   3 +
>>>>>  lib/netdev-dummy.c            |  24 +++-
>>>>>  lib/netdev-linux.c            |   3 -
>>>>>  lib/netdev-linux.h            |  10 --
>>>>>  lib/netdev-offload-provider.h |  99 +++++++++++++++
>>>>>  lib/netdev-provider.h         |  67 +----------
>>>>>  lib/netdev-rte-offloads.c     |  40 +++++-
>>>>>  lib/netdev-rte-offloads.h     |  41 +------
>>>>>  lib/netdev-tc-offloads.c      |  39 ++++--
>>>>>  lib/netdev-tc-offloads.h      |  44 -------
>>>>>  lib/netdev-vport.c            |   6 +-
>>>>>  lib/netdev.c                  | 221 +++++++++++++++++++++++++++-------
>>>>>  tests/dpif-netdev.at          |   4 +-
>>>>>  tests/ofproto-macros.at       |   1 -
>>>>>  17 files changed, 398 insertions(+), 232 deletions(-)  create mode 100644
>>>>> lib/netdev-offload-provider.h  delete mode 100644 lib/netdev-tc-offloads.h
>>>>>
>>>>> diff --git a/lib/automake.mk b/lib/automake.mk index cc5dccf39..b9f3b69e7
>>>>> 100644
>>>>> --- a/lib/automake.mk
>>>>> +++ b/lib/automake.mk
>>>>> @@ -138,6 +138,7 @@ lib_libopenvswitch_la_SOURCES = \
>>>>>   lib/netdev-dpdk.h \
>>>>>   lib/netdev-dummy.c \
>>>>>   lib/netdev-provider.h \
>>>>> + lib/netdev-offload-provider.h \
>>>>
>>>> Please keep alphabetical order (switch between the last 2 lines)
>>>>
>>>>>   lib/netdev-rte-offloads.h \
>>>>>   lib/netdev-vport.c \
>>>>>   lib/netdev-vport.h \
>>>>> @@ -393,7 +394,6 @@ lib_libopenvswitch_la_SOURCES += \
>>>>>   lib/netdev-linux.c \
>>>>>   lib/netdev-linux.h \
>>>>>   lib/netdev-tc-offloads.c \
>>>>> - lib/netdev-tc-offloads.h \
>>>>>   lib/netlink-conntrack.c \
>>>>>   lib/netlink-conntrack.h \
>>>>>   lib/netlink-notifier.c \
>>>>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>>>>> index dc6171546..6c6298635 100644
>>>>> --- a/lib/dpdk.c
>>>>> +++ b/lib/dpdk.c
>>>>> @@ -34,6 +34,7 @@
>>>>>  #include "dirs.h"
>>>>>  #include "fatal-signal.h"
>>>>>  #include "netdev-dpdk.h"
>>>>> +#include "netdev-rte-offloads.h"
>>>>>  #include "openvswitch/dynamic-string.h"
>>>>>  #include "openvswitch/vlog.h"
>>>>>  #include "ovs-numa.h"
>>>>> @@ -442,6 +443,7 @@ dpdk_init__(const struct smap *ovs_other_config)
>>>>>
>>>>>      /* Finally, register the dpdk classes */
>>>>>      netdev_dpdk_register();
>>>>> +    netdev_dpdk_flow_api_register();
>>>>>      return true;
>>>>>  }
>>>>>
>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>>>>> 47153dc60..c06c9ef81 100644
>>>>> --- a/lib/netdev-dpdk.c
>>>>> +++ b/lib/netdev-dpdk.c
>>>>> @@ -4204,6 +4204,27 @@ unlock:
>>>>>      return err;
>>>>>  }
>>>>>
>>>>> +bool
>>>>> +netdev_dpdk_flow_api_supported(struct netdev *netdev) {
>>>>> +    struct netdev_dpdk *dev;
>>>>> +    bool ret = false;
>>>>> +
>>>>> +    if (!is_dpdk_class(netdev->netdev_class)) {
>>>>> +        goto out;
>>>>> +    }
>>>>> +
>>>>> +    dev = netdev_dpdk_cast(netdev);
>>>>> +    ovs_mutex_lock(&dev->mutex);
>>>>> +    if (dev->type == DPDK_DEV_ETH) {
>>>>> +        /* TODO: Check if we able to offload some minimal flow. */
>>>>> +        ret = true;
>>>>> +    }
>>>>> +    ovs_mutex_unlock(&dev->mutex);
>>>>> +out:
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>>  int
>>>>>  netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>>>                               struct rte_flow *rte_flow, @@ -4268,8 
>>>>> +4289,7 @@
>>>>> netdev_dpdk_rte_flow_create(struct netdev *netdev,
>>>>>      .get_features = netdev_dpdk_get_features,           \
>>>>>      .get_status = netdev_dpdk_get_status,               \
>>>>>      .reconfigure = netdev_dpdk_reconfigure,             \
>>>>> -    .rxq_recv = netdev_dpdk_rxq_recv,                   \
>>>>> -    DPDK_FLOW_OFFLOAD_API
>>>>> +    .rxq_recv = netdev_dpdk_rxq_recv
>>>>>
>>>>>  static const struct netdev_class dpdk_class = {
>>>>>      .type = "dpdk",
>>>>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index
>>>>> 9bbb8d8d6..60631c4f0 100644
>>>>> --- a/lib/netdev-dpdk.h
>>>>> +++ b/lib/netdev-dpdk.h
>>>>> @@ -34,6 +34,9 @@ struct rte_flow_action;
>>>>>
>>>>>  void netdev_dpdk_register(void);
>>>>>  void free_dpdk_buf(struct dp_packet *);
>>>>> +
>>>>> +bool netdev_dpdk_flow_api_supported(struct netdev *);
>>>>> +
>>>>>  int
>>>>>  netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>>>                               struct rte_flow *rte_flow, diff --git 
>>>>> a/lib/netdev-dummy.c
>>>>> b/lib/netdev-dummy.c index 3f90ffa09..2e2e0c2ab 100644
>>>>> --- a/lib/netdev-dummy.c
>>>>> +++ b/lib/netdev-dummy.c
>>>>> @@ -24,6 +24,7 @@
>>>>>  #include "dp-packet.h"
>>>>>  #include "dpif-netdev.h"
>>>>>  #include "flow.h"
>>>>> +#include "netdev-offload-provider.h"
>>>>>  #include "netdev-provider.h"
>>>>>  #include "netdev-vport.h"
>>>>>  #include "odp-util.h"
>>>>> @@ -1523,10 +1524,6 @@ exit:
>>>>>      return error ? -1 : 0;
>>>>>  }
>>>>>
>>>>> -#define DUMMY_FLOW_OFFLOAD_API                          \
>>>>> -    .flow_put = netdev_dummy_flow_put,                  \
>>>>> -    .flow_del = netdev_dummy_flow_del
>>>>> -
>>>>>  #define NETDEV_DUMMY_CLASS_COMMON                       \
>>>>>      .run = netdev_dummy_run,                            \
>>>>>      .wait = netdev_dummy_wait,                          \
>>>>> @@ -1559,8 +1556,7 @@ exit:
>>>>>      .rxq_dealloc = netdev_dummy_rxq_dealloc,            \
>>>>>      .rxq_recv = netdev_dummy_rxq_recv,                  \
>>>>>      .rxq_wait = netdev_dummy_rxq_wait,                  \
>>>>> -    .rxq_drain = netdev_dummy_rxq_drain,                \
>>>>> -    DUMMY_FLOW_OFFLOAD_API
>>>>> +    .rxq_drain = netdev_dummy_rxq_drain
>>>>>
>>>>>  static const struct netdev_class dummy_class = {
>>>>>      NETDEV_DUMMY_CLASS_COMMON,
>>>>> @@ -1578,6 +1574,20 @@ static const struct netdev_class
>>>>> dummy_pmd_class = {
>>>>>      .is_pmd = true,
>>>>>      .reconfigure = netdev_dummy_reconfigure  };
>>>>> +
>>>>> +static int
>>>>> +netdev_dummy_offloads_init_flow_api(struct netdev *netdev) {
>>>>> +    return is_dummy_class(netdev->netdev_class) ? 0 : -1; }
>>>>> +
>>>>> +static const struct netdev_flow_api netdev_dummy_offloads = {
>>>>> +    .type = "dummy",
>>>>> +    .flow_put = netdev_dummy_flow_put,
>>>>> +    .flow_del = netdev_dummy_flow_del,
>>>>> +    .init_flow_api = netdev_dummy_offloads_init_flow_api,
>>>>> +};
>>>>> +
>>>>>
>>>>
>>>>>  /* Helper functions. */
>>>>>
>>>>> @@ -2024,5 +2034,7 @@ netdev_dummy_register(enum dummy_level
>>>>> level)
>>>>>      netdev_register_provider(&dummy_internal_class);
>>>>>      netdev_register_provider(&dummy_pmd_class);
>>>>>
>>>>> +    netdev_register_flow_api_provider(&netdev_dummy_offloads);
>>>>> +
>>>>>      netdev_vport_tunnel_register();
>>>>>  }
>>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 
>>>>> f75d73fd3..e4ea94cf9
>>>>> 100644
>>>>> --- a/lib/netdev-linux.c
>>>>> +++ b/lib/netdev-linux.c
>>>>> @@ -55,7 +55,6 @@
>>>>>  #include "hash.h"
>>>>>  #include "openvswitch/hmap.h"
>>>>>  #include "netdev-provider.h"
>>>>> -#include "netdev-tc-offloads.h"
>>>>>  #include "netdev-vport.h"
>>>>>  #include "netlink-notifier.h"
>>>>>  #include "netlink-socket.h"
>>>>> @@ -3321,7 +3320,6 @@ exit:
>>>>>
>>>>>  const struct netdev_class netdev_linux_class = {
>>>>>      NETDEV_LINUX_CLASS_COMMON,
>>>>> -    LINUX_FLOW_OFFLOAD_API,
>>>>>      .type = "system",
>>>>
>>>> Please note that before this patch type="system" and type="internal" had
>>> LINUX_FLOW_OFFLOAD_API, but type="tap" did not.
>>>> With this patch type="tap" is going to be tested with init_flow_api(). If 
>>>> the test
>>> passes - type="tap" is going to have flow_api which it did not have before.
>>>> Was is verified that type="tap" will fail with init_flow_api()?
>>>
>>> If 'init_flow_api' will be able to create tc qdisc than offloading
>>> is supported. If not - init will fail. It's simple. When linux
>>> will support tc offloading for tap devices we'll have offloading
>>> for free without changing the OVS code.
>>>
>>>>
>>>>>      .construct = netdev_linux_construct,
>>>>>      .get_stats = netdev_linux_get_stats, @@ -3341,7 +3339,6 @@ const 
>>>>> struct
>>>>> netdev_class netdev_tap_class = {
>>>>>
>>>>>  const struct netdev_class netdev_internal_class = {
>>>>>      NETDEV_LINUX_CLASS_COMMON,
>>>>> -    LINUX_FLOW_OFFLOAD_API,
>>>>>      .type = "internal",
>>>>>      .construct = netdev_linux_construct,
>>>>>      .get_stats = netdev_internal_get_stats, diff --git 
>>>>> a/lib/netdev-linux.h
>>>> ...
>>>>> +static int
>>>>> +netdev_assign_flow_api(struct netdev *netdev) {
>>>>> +    struct netdev_registered_flow_api *rfa;
>>>>> +
>>>>> +    CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) {
>>>>> +        if (!rfa->flow_api->init_flow_api(netdev)) {
>>>>> +            ovs_refcount_ref(&rfa->refcnt);
>>>>> +            ovsrcu_set(&netdev->flow_api, rfa->flow_api);
>>>>> +            VLOG_INFO("%s: Assigned flow API '%s'.",
>>>>> +                      netdev_get_name(netdev), rfa->flow_api->type);
>>>>> +            return 0;
>>>>> +        }
>>>>
>>>> By this logic the first API which successfully passes init_flow_api() 
>>>> becomes the
>>> flow_api for this netdev.
>>>> Assuming there are 2 vport APIs: 1. system datapath vport API 2. netdev
>>> datapath vport API (planned to be added soon).
>>>> Assuming both vport APIs successfully pass init_flow_api(), then the first 
>>>> one to
>>> be tested will always be chosen and the second one will never be used.
>>>> Are we sure that the system datapath vport will not always be chosen with 
>>>> the
>>> current netdev_tc_init_flow_api() implementation.
>>>> How can we distinguish between a vport created under different datapaths?
>>>> This issue must be resolved for this patch.
>>>
>>> netdev-vport that works in userspace datapath has no backing linux
>>> interface --> get_ifindex failure --> no offloading API configured.
>>> When the planned offloading in dpdk flow api provider will be
>>> implemented it will be chosen.
>>>
>>> Right now we don't have and don't plan to have natdevs that suitable
>>> for more than one flow api provider. If they appear in the future, we
>>> could solve this by choosing most suitable provider instead of first
>>> one. But this is over-engineering for now and even for a near future.
>>>
>> This is more a request for a clarification. 
>> If I understand correctly,  you expect that there will be one flow api 
>> provider for dpdk.
> 
> Yes.
> 
>> When the api is called we must test the netdev class and then do the needed 
>> tests (maybe HW capabilities) so we can tell if we should return true or 
>> false.
> 
> Correct in general, but you need to test netdev instance, not the netdev 
> class.
> 
>> For vport we have multiple subtypes (vxlan,gre..etc), so if we only support 
>> sub group we can test the sub type by its name.
> 
> All the vport subtypes are different netdev classes.
> 
>> What happens if we have both system and netdev datapath (corner case, but 
>> still)? If the system is called first and there is no backing linux port 
>> then we are OK. 
>> What will happen if the netdev offload provider will be called first? do you 
>> expect that we do the same test (linux backing port?) in the dpdk 
>> implementation? Is this type of test being correct for all vport types? That 
>> is, if we don't have a backing linux port (ifindex) it means that the port 
>> is part of a netdev bridge?
> 
> Yes, you could add simple check:
> 
>     if (netdev_vport_is_vxlan(netdev) && netdev_get_ifindex(netdev) < 0) {
>         return 0;
>     }
> 
> This check should be done inside init_flow_api() (not in 
> netdev_dpdk_flow_api_supported()).
> It's allowed to call functions of different netdev provides inside the
> netdev offload provider code.
> 
> 'netdev_vport_is_vxlan' should be implemented inside netdev-vport.c.
> 
> 
> Note that you'll need an additional change to allow offloading for ports
> without ifindex:
> 
> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> index e0731959d..3c474b18c 100644
> --- a/lib/netdev-offload.c
> +++ b/lib/netdev-offload.c
> @@ -527,10 +527,6 @@ netdev_ports_insert(struct netdev *netdev, const struct 
> dpif_class *dpif_class,
>      struct port_to_netdev_data *data;
>      int ifindex = netdev_get_ifindex(netdev);
>  
> -    if (ifindex < 0) {
> -        return ENODEV;
> -    }
> -
>      ovs_mutex_lock(&netdev_hmap_mutex);
>      if (netdev_ports_lookup(dpif_port->port_no, dpif_class)) {
>          ovs_mutex_unlock(&netdev_hmap_mutex);
> @@ -541,11 +537,16 @@ netdev_ports_insert(struct netdev *netdev, const struct 
> dpif_class *dpif_class,
>      data->netdev = netdev_ref(netdev);
>      data->dpif_class = dpif_class;
>      dpif_port_clone(&data->dpif_port, dpif_port);
> -    data->ifindex = ifindex;
> +
> +    if (ifindex >= 0) {
> +        data->ifindex = ifindex;
> +        hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex);
> +    } else {
> +        data->ifindex = -1;
> +    }
>  
>      hmap_insert(&port_to_netdev, &data->portno_node,
>                  netdev_ports_hash(dpif_port->port_no, dpif_class));
> -    hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex);
>      ovs_mutex_unlock(&netdev_hmap_mutex);
>  
>      netdev_init_flow_api(netdev);

One more hunk:

@@ -582,7 +583,9 @@ netdev_ports_remove(odp_port_t port_no, const struct 
dpif_class *dpif_class)
         dpif_port_destroy(&data->dpif_port);
         netdev_close(data->netdev); /* unref and possibly close */
         hmap_remove(&port_to_netdev, &data->portno_node);
-        hmap_remove(&ifindex_to_port, &data->ifindex_node);
+        if (data->ifindex >= 0) {
+            hmap_remove(&ifindex_to_port, &data->ifindex_node);
+        }
         free(data);
         ret = 0;
     }

> ---
> 
> It's unrelated to current patch, so, I'll send it separately.
> 
>> Basically, there is nothing in the init flow parameters that can tell us if 
>> the port should be netdev or system (ifindex is implementation related) 
>> because we don't have the context of the caller.
>>
>>>>
>>>>> +        VLOG_DBG("%s: flow API '%s' is not suitable.",
>>>>> +                 netdev_get_name(netdev), rfa->flow_api->type);
>>>>> +    }
>>>>> +    VLOG_INFO("%s: No suitable flow API found.",
>>>>> + netdev_get_name(netdev));
>>>>> +
>>>>> +    return -1;
>>>>
>>>> Please return an enumerated positive error.
>>>
>>> It's an internal helper function. Not the public API.
>>> All the functions that visible outside this module returns proper
>>> positive error codes.
>>>
>>>>
>>>>> +}
>>>>> +
>>>>>  int
>>>>>  netdev_flow_flush(struct netdev *netdev)  {
>>>>> -    const struct netdev_class *class = netdev->netdev_class;
>>>>
>>>> A few more comments.
>>>> 1. This patch changes code related to OVS offload implementation. OVS 
>>>> offload
>>> must be confirmed with this patch before it is accepted.
>>>
>>> Sure. And I'll appreciate any help with testing as I'm limited with
>>> offloading capable hardware.
>>>
>>>>
>>>> 2. We may suggest to offload "dpdkvhostclient" port type and "vxlan" (under
>>> netdev datapath)  port type.
>>>> Please confirm the following understanding of the required steps.
>>>> 2.1 Need to register two new flow_apis for the two ports types.
>>>> 2.2 Need to write the corresponding init_flow_api() to test the relevant 
>>>> netdev
>>> instances.
>>>
>>> No, you don't need to implement new flow api provider. I assume that
>>> offloading for these netdevs will be implemented via DPDK. So, you
>>> need to add the functionality to netdev_rte_offload provider and
>>> teach it to return success for these netdevs from the init_flow_api().
>>>
>>>> 2.3 The question how to distinguish between a vport under system datapath
>>> versus vport under netdev datapath is still an open question (would be 
>>> happy to
>>> learn that it is already resolved).
>>>
>>> Solved. See above.
>>>
>>>>
>>>> 3. Could you please add inline documentation for the newly added code? It
>>> seems the logic is spread is several places so any comments will be helpful.
>>>
>>> Provider registering API has comments. 'netdev_assign_flow_api' is
>>> quiet simple and mostly documented by the log messages inside.
>>> I'd like to add some comments if you'll point me functions that needs them.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to