On 25.04.2019 10:38, Ilya Maximets wrote:
> On 25.04.2019 0:58, Roni Bar Yanai wrote:
>> Hi Ilya,
>>
>> Please see comment/clarification inline.
>> Thanks,
>> Roni
>>
>>> -----Original Message-----
>>> From: Ilya Maximets <[email protected]>
>>> Sent: Wednesday, April 24, 2019 6:12 PM
>>> To: Ophir Munk <[email protected]>; [email protected]
>>> Cc: Ian Stokes <[email protected]>; Flavio Leitner <[email protected]>;
>>> Kevin
>>> Traynor <[email protected]>; Roni Bar Yanai <[email protected]>; Finn
>>> Christensen <[email protected]>; Ben Pfaff <[email protected]>; Roi Dayan
>>> <[email protected]>; Simon Horman <[email protected]>; Olga
>>> Shern <[email protected]>; Asaf Penso <[email protected]>; Oz Shlomo
>>> <[email protected]>; Paul Blakey <[email protected]>
>>> Subject: Re: [PATCH] netdev: Dynamic per-port Flow API.
>>>
>>> On 24.04.2019 17:26, Ophir Munk wrote:
>>>> Hi Ilya,
>>>> Please find comments inline and at the end of this mail.
>>>
>>> Hi. Thanks for review.
>>> Comments inline.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>>>
>>>> Regards,
>>>> Ophir
>>>>
>>>>> -----Original Message-----
>>>>> From: Ilya Maximets <[email protected]>
>>>>> Sent: Tuesday, April 23, 2019 7:19 PM
>>>>> To: [email protected]
>>>>> Cc: Ian Stokes <[email protected]>; Flavio Leitner <[email protected]>;
>>>>> Ophir Munk <[email protected]>; Kevin Traynor
>>>>> <[email protected]>; Roni Bar Yanai <[email protected]>; Finn
>>>>> Christensen <[email protected]>; Ben Pfaff <[email protected]>; Roi Dayan
>>>>> <[email protected]>; Simon Horman <[email protected]>;
>>>>> Ilya Maximets <[email protected]>
>>>>> Subject: [PATCH] netdev: Dynamic per-port Flow API.
>>>>>
>>>>> Current issues with Flow API:
>>>>>
>>>>> * OVS calls offloading functions regardless of successful
>>>>> flow API initialization. (ex. on init_flow_api failure)
>>>>> * Static initilaization of Flow API for a netdev_class forbids
>>>>> having different offloading types for different instances
>>>>> of netdev with the same netdev_class. (ex. different vports in
>>>>> 'system' and 'netdev' datapaths at the same time)
>>>>>
>>>>> Solution:
>>>>>
>>>>> * Move Flow API from the netdev_class to netdev instance.
>>>>> * Make Flow API dynamic, i.e. probe the APIs and choose the
>>>>> suitable one.
>>>>>
>>>>
>>>> I suggest distinguishing between initialization and probing.
>>>> The probing you mention is implemented by testing device initialization:
>>> init_flow_api().
>>>> I suggest adding a new probe() API just for the sake of probing. It will
>>>> check the
>>> netdev struct.
>>>> I suggest leaving the init_flow_api() API for HW/drivers initialization.
>>>> The HW/driver may fail at initialization, but it does not mean we need to
>>>> probe
>>> for a new API in that case.
>>>
>>> I also thought about separate 'probe()' api, but it seems that probing
>>> will be equal to 'init()' in most cases. Checking the 'netdev struct'
>>> fro probing is a bad solution as instances of same netdev class could
>>> require different offloading apis. Probing a new api on init failure
>>> could be useful exactly for this case. Anyway we'll be able to improve
>>> the logic of assigning in the future if it will be needed. This is
>>> the single point in code where we have information about all the available
>>> offloading providers.
>>>
>>>>
>>>>> Side effects:
>>>>>
>>>>> * Flow API providers localized as possible in their modules.
>>>>> * Now we have an ability to make runtime checks. For example,
>>>>> we could check if particular device supports features we
>>>>> need, like if dpdk device supports RSS+MARK action.
>>>>>
>>>>> Signed-off-by: Ilya Maximets <[email protected]>
>>>>> ---
>>>>>
>>>>> Since RFC:
>>>>> * Fixed sparce build.
>>>>> * Some logs turned from INFO to DBG.
>>>>> * Enabled HW offloading on non-Linux systems
>>>>> (For testing with dummy provider).
>>>>>
>>>>> lib/automake.mk | 2 +-
>>>>> lib/dpdk.c | 2 +
>>>>> lib/netdev-dpdk.c | 24 +++-
>>>>> lib/netdev-dpdk.h | 3 +
>>>>> lib/netdev-dummy.c | 24 +++-
>>>>> lib/netdev-linux.c | 3 -
>>>>> lib/netdev-linux.h | 10 --
>>>>> lib/netdev-offload-provider.h | 99 +++++++++++++++
>>>>> lib/netdev-provider.h | 67 +----------
>>>>> lib/netdev-rte-offloads.c | 40 +++++-
>>>>> lib/netdev-rte-offloads.h | 41 +------
>>>>> lib/netdev-tc-offloads.c | 39 ++++--
>>>>> lib/netdev-tc-offloads.h | 44 -------
>>>>> lib/netdev-vport.c | 6 +-
>>>>> lib/netdev.c | 221 +++++++++++++++++++++++++++-------
>>>>> tests/dpif-netdev.at | 4 +-
>>>>> tests/ofproto-macros.at | 1 -
>>>>> 17 files changed, 398 insertions(+), 232 deletions(-) create mode 100644
>>>>> lib/netdev-offload-provider.h delete mode 100644 lib/netdev-tc-offloads.h
>>>>>
>>>>> diff --git a/lib/automake.mk b/lib/automake.mk index cc5dccf39..b9f3b69e7
>>>>> 100644
>>>>> --- a/lib/automake.mk
>>>>> +++ b/lib/automake.mk
>>>>> @@ -138,6 +138,7 @@ lib_libopenvswitch_la_SOURCES = \
>>>>> lib/netdev-dpdk.h \
>>>>> lib/netdev-dummy.c \
>>>>> lib/netdev-provider.h \
>>>>> + lib/netdev-offload-provider.h \
>>>>
>>>> Please keep alphabetical order (switch between the last 2 lines)
>>>>
>>>>> lib/netdev-rte-offloads.h \
>>>>> lib/netdev-vport.c \
>>>>> lib/netdev-vport.h \
>>>>> @@ -393,7 +394,6 @@ lib_libopenvswitch_la_SOURCES += \
>>>>> lib/netdev-linux.c \
>>>>> lib/netdev-linux.h \
>>>>> lib/netdev-tc-offloads.c \
>>>>> - lib/netdev-tc-offloads.h \
>>>>> lib/netlink-conntrack.c \
>>>>> lib/netlink-conntrack.h \
>>>>> lib/netlink-notifier.c \
>>>>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>>>>> index dc6171546..6c6298635 100644
>>>>> --- a/lib/dpdk.c
>>>>> +++ b/lib/dpdk.c
>>>>> @@ -34,6 +34,7 @@
>>>>> #include "dirs.h"
>>>>> #include "fatal-signal.h"
>>>>> #include "netdev-dpdk.h"
>>>>> +#include "netdev-rte-offloads.h"
>>>>> #include "openvswitch/dynamic-string.h"
>>>>> #include "openvswitch/vlog.h"
>>>>> #include "ovs-numa.h"
>>>>> @@ -442,6 +443,7 @@ dpdk_init__(const struct smap *ovs_other_config)
>>>>>
>>>>> /* Finally, register the dpdk classes */
>>>>> netdev_dpdk_register();
>>>>> + netdev_dpdk_flow_api_register();
>>>>> return true;
>>>>> }
>>>>>
>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>>>>> 47153dc60..c06c9ef81 100644
>>>>> --- a/lib/netdev-dpdk.c
>>>>> +++ b/lib/netdev-dpdk.c
>>>>> @@ -4204,6 +4204,27 @@ unlock:
>>>>> return err;
>>>>> }
>>>>>
>>>>> +bool
>>>>> +netdev_dpdk_flow_api_supported(struct netdev *netdev) {
>>>>> + struct netdev_dpdk *dev;
>>>>> + bool ret = false;
>>>>> +
>>>>> + if (!is_dpdk_class(netdev->netdev_class)) {
>>>>> + goto out;
>>>>> + }
>>>>> +
>>>>> + dev = netdev_dpdk_cast(netdev);
>>>>> + ovs_mutex_lock(&dev->mutex);
>>>>> + if (dev->type == DPDK_DEV_ETH) {
>>>>> + /* TODO: Check if we able to offload some minimal flow. */
>>>>> + ret = true;
>>>>> + }
>>>>> + ovs_mutex_unlock(&dev->mutex);
>>>>> +out:
>>>>> + return ret;
>>>>> +}
>>>>> +
>>>>> int
>>>>> netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>>> struct rte_flow *rte_flow, @@ -4268,8
>>>>> +4289,7 @@
>>>>> netdev_dpdk_rte_flow_create(struct netdev *netdev,
>>>>> .get_features = netdev_dpdk_get_features, \
>>>>> .get_status = netdev_dpdk_get_status, \
>>>>> .reconfigure = netdev_dpdk_reconfigure, \
>>>>> - .rxq_recv = netdev_dpdk_rxq_recv, \
>>>>> - DPDK_FLOW_OFFLOAD_API
>>>>> + .rxq_recv = netdev_dpdk_rxq_recv
>>>>>
>>>>> static const struct netdev_class dpdk_class = {
>>>>> .type = "dpdk",
>>>>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index
>>>>> 9bbb8d8d6..60631c4f0 100644
>>>>> --- a/lib/netdev-dpdk.h
>>>>> +++ b/lib/netdev-dpdk.h
>>>>> @@ -34,6 +34,9 @@ struct rte_flow_action;
>>>>>
>>>>> void netdev_dpdk_register(void);
>>>>> void free_dpdk_buf(struct dp_packet *);
>>>>> +
>>>>> +bool netdev_dpdk_flow_api_supported(struct netdev *);
>>>>> +
>>>>> int
>>>>> netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>>> struct rte_flow *rte_flow, diff --git
>>>>> a/lib/netdev-dummy.c
>>>>> b/lib/netdev-dummy.c index 3f90ffa09..2e2e0c2ab 100644
>>>>> --- a/lib/netdev-dummy.c
>>>>> +++ b/lib/netdev-dummy.c
>>>>> @@ -24,6 +24,7 @@
>>>>> #include "dp-packet.h"
>>>>> #include "dpif-netdev.h"
>>>>> #include "flow.h"
>>>>> +#include "netdev-offload-provider.h"
>>>>> #include "netdev-provider.h"
>>>>> #include "netdev-vport.h"
>>>>> #include "odp-util.h"
>>>>> @@ -1523,10 +1524,6 @@ exit:
>>>>> return error ? -1 : 0;
>>>>> }
>>>>>
>>>>> -#define DUMMY_FLOW_OFFLOAD_API \
>>>>> - .flow_put = netdev_dummy_flow_put, \
>>>>> - .flow_del = netdev_dummy_flow_del
>>>>> -
>>>>> #define NETDEV_DUMMY_CLASS_COMMON \
>>>>> .run = netdev_dummy_run, \
>>>>> .wait = netdev_dummy_wait, \
>>>>> @@ -1559,8 +1556,7 @@ exit:
>>>>> .rxq_dealloc = netdev_dummy_rxq_dealloc, \
>>>>> .rxq_recv = netdev_dummy_rxq_recv, \
>>>>> .rxq_wait = netdev_dummy_rxq_wait, \
>>>>> - .rxq_drain = netdev_dummy_rxq_drain, \
>>>>> - DUMMY_FLOW_OFFLOAD_API
>>>>> + .rxq_drain = netdev_dummy_rxq_drain
>>>>>
>>>>> static const struct netdev_class dummy_class = {
>>>>> NETDEV_DUMMY_CLASS_COMMON,
>>>>> @@ -1578,6 +1574,20 @@ static const struct netdev_class
>>>>> dummy_pmd_class = {
>>>>> .is_pmd = true,
>>>>> .reconfigure = netdev_dummy_reconfigure };
>>>>> +
>>>>> +static int
>>>>> +netdev_dummy_offloads_init_flow_api(struct netdev *netdev) {
>>>>> + return is_dummy_class(netdev->netdev_class) ? 0 : -1; }
>>>>> +
>>>>> +static const struct netdev_flow_api netdev_dummy_offloads = {
>>>>> + .type = "dummy",
>>>>> + .flow_put = netdev_dummy_flow_put,
>>>>> + .flow_del = netdev_dummy_flow_del,
>>>>> + .init_flow_api = netdev_dummy_offloads_init_flow_api,
>>>>> +};
>>>>> +
>>>>>
>>>>
>>>>> /* Helper functions. */
>>>>>
>>>>> @@ -2024,5 +2034,7 @@ netdev_dummy_register(enum dummy_level
>>>>> level)
>>>>> netdev_register_provider(&dummy_internal_class);
>>>>> netdev_register_provider(&dummy_pmd_class);
>>>>>
>>>>> + netdev_register_flow_api_provider(&netdev_dummy_offloads);
>>>>> +
>>>>> netdev_vport_tunnel_register();
>>>>> }
>>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index
>>>>> f75d73fd3..e4ea94cf9
>>>>> 100644
>>>>> --- a/lib/netdev-linux.c
>>>>> +++ b/lib/netdev-linux.c
>>>>> @@ -55,7 +55,6 @@
>>>>> #include "hash.h"
>>>>> #include "openvswitch/hmap.h"
>>>>> #include "netdev-provider.h"
>>>>> -#include "netdev-tc-offloads.h"
>>>>> #include "netdev-vport.h"
>>>>> #include "netlink-notifier.h"
>>>>> #include "netlink-socket.h"
>>>>> @@ -3321,7 +3320,6 @@ exit:
>>>>>
>>>>> const struct netdev_class netdev_linux_class = {
>>>>> NETDEV_LINUX_CLASS_COMMON,
>>>>> - LINUX_FLOW_OFFLOAD_API,
>>>>> .type = "system",
>>>>
>>>> Please note that before this patch type="system" and type="internal" had
>>> LINUX_FLOW_OFFLOAD_API, but type="tap" did not.
>>>> With this patch type="tap" is going to be tested with init_flow_api(). If
>>>> the test
>>> passes - type="tap" is going to have flow_api which it did not have before.
>>>> Was is verified that type="tap" will fail with init_flow_api()?
>>>
>>> If 'init_flow_api' will be able to create tc qdisc than offloading
>>> is supported. If not - init will fail. It's simple. When linux
>>> will support tc offloading for tap devices we'll have offloading
>>> for free without changing the OVS code.
>>>
>>>>
>>>>> .construct = netdev_linux_construct,
>>>>> .get_stats = netdev_linux_get_stats, @@ -3341,7 +3339,6 @@ const
>>>>> struct
>>>>> netdev_class netdev_tap_class = {
>>>>>
>>>>> const struct netdev_class netdev_internal_class = {
>>>>> NETDEV_LINUX_CLASS_COMMON,
>>>>> - LINUX_FLOW_OFFLOAD_API,
>>>>> .type = "internal",
>>>>> .construct = netdev_linux_construct,
>>>>> .get_stats = netdev_internal_get_stats, diff --git
>>>>> a/lib/netdev-linux.h
>>>> ...
>>>>> +static int
>>>>> +netdev_assign_flow_api(struct netdev *netdev) {
>>>>> + struct netdev_registered_flow_api *rfa;
>>>>> +
>>>>> + CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) {
>>>>> + if (!rfa->flow_api->init_flow_api(netdev)) {
>>>>> + ovs_refcount_ref(&rfa->refcnt);
>>>>> + ovsrcu_set(&netdev->flow_api, rfa->flow_api);
>>>>> + VLOG_INFO("%s: Assigned flow API '%s'.",
>>>>> + netdev_get_name(netdev), rfa->flow_api->type);
>>>>> + return 0;
>>>>> + }
>>>>
>>>> By this logic the first API which successfully passes init_flow_api()
>>>> becomes the
>>> flow_api for this netdev.
>>>> Assuming there are 2 vport APIs: 1. system datapath vport API 2. netdev
>>> datapath vport API (planned to be added soon).
>>>> Assuming both vport APIs successfully pass init_flow_api(), then the first
>>>> one to
>>> be tested will always be chosen and the second one will never be used.
>>>> Are we sure that the system datapath vport will not always be chosen with
>>>> the
>>> current netdev_tc_init_flow_api() implementation.
>>>> How can we distinguish between a vport created under different datapaths?
>>>> This issue must be resolved for this patch.
>>>
>>> netdev-vport that works in userspace datapath has no backing linux
>>> interface --> get_ifindex failure --> no offloading API configured.
>>> When the planned offloading in dpdk flow api provider will be
>>> implemented it will be chosen.
>>>
>>> Right now we don't have and don't plan to have natdevs that suitable
>>> for more than one flow api provider. If they appear in the future, we
>>> could solve this by choosing most suitable provider instead of first
>>> one. But this is over-engineering for now and even for a near future.
>>>
>> This is more a request for a clarification.
>> If I understand correctly, you expect that there will be one flow api
>> provider for dpdk.
>
> Yes.
>
>> When the api is called we must test the netdev class and then do the needed
>> tests (maybe HW capabilities) so we can tell if we should return true or
>> false.
>
> Correct in general, but you need to test netdev instance, not the netdev
> class.
>
>> For vport we have multiple subtypes (vxlan,gre..etc), so if we only support
>> sub group we can test the sub type by its name.
>
> All the vport subtypes are different netdev classes.
>
>> What happens if we have both system and netdev datapath (corner case, but
>> still)? If the system is called first and there is no backing linux port
>> then we are OK.
>> What will happen if the netdev offload provider will be called first? do you
>> expect that we do the same test (linux backing port?) in the dpdk
>> implementation? Is this type of test being correct for all vport types? That
>> is, if we don't have a backing linux port (ifindex) it means that the port
>> is part of a netdev bridge?
>
> Yes, you could add simple check:
>
> if (netdev_vport_is_vxlan(netdev) && netdev_get_ifindex(netdev) < 0) {
> return 0;
> }
>
> This check should be done inside init_flow_api() (not in
> netdev_dpdk_flow_api_supported()).
> It's allowed to call functions of different netdev provides inside the
> netdev offload provider code.
>
> 'netdev_vport_is_vxlan' should be implemented inside netdev-vport.c.
>
>
> Note that you'll need an additional change to allow offloading for ports
> without ifindex:
>
> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> index e0731959d..3c474b18c 100644
> --- a/lib/netdev-offload.c
> +++ b/lib/netdev-offload.c
> @@ -527,10 +527,6 @@ netdev_ports_insert(struct netdev *netdev, const struct
> dpif_class *dpif_class,
> struct port_to_netdev_data *data;
> int ifindex = netdev_get_ifindex(netdev);
>
> - if (ifindex < 0) {
> - return ENODEV;
> - }
> -
> ovs_mutex_lock(&netdev_hmap_mutex);
> if (netdev_ports_lookup(dpif_port->port_no, dpif_class)) {
> ovs_mutex_unlock(&netdev_hmap_mutex);
> @@ -541,11 +537,16 @@ netdev_ports_insert(struct netdev *netdev, const struct
> dpif_class *dpif_class,
> data->netdev = netdev_ref(netdev);
> data->dpif_class = dpif_class;
> dpif_port_clone(&data->dpif_port, dpif_port);
> - data->ifindex = ifindex;
> +
> + if (ifindex >= 0) {
> + data->ifindex = ifindex;
> + hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex);
> + } else {
> + data->ifindex = -1;
> + }
>
> hmap_insert(&port_to_netdev, &data->portno_node,
> netdev_ports_hash(dpif_port->port_no, dpif_class));
> - hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex);
> ovs_mutex_unlock(&netdev_hmap_mutex);
>
> netdev_init_flow_api(netdev);
One more hunk:
@@ -582,7 +583,9 @@ netdev_ports_remove(odp_port_t port_no, const struct
dpif_class *dpif_class)
dpif_port_destroy(&data->dpif_port);
netdev_close(data->netdev); /* unref and possibly close */
hmap_remove(&port_to_netdev, &data->portno_node);
- hmap_remove(&ifindex_to_port, &data->ifindex_node);
+ if (data->ifindex >= 0) {
+ hmap_remove(&ifindex_to_port, &data->ifindex_node);
+ }
free(data);
ret = 0;
}
> ---
>
> It's unrelated to current patch, so, I'll send it separately.
>
>> Basically, there is nothing in the init flow parameters that can tell us if
>> the port should be netdev or system (ifindex is implementation related)
>> because we don't have the context of the caller.
>>
>>>>
>>>>> + VLOG_DBG("%s: flow API '%s' is not suitable.",
>>>>> + netdev_get_name(netdev), rfa->flow_api->type);
>>>>> + }
>>>>> + VLOG_INFO("%s: No suitable flow API found.",
>>>>> + netdev_get_name(netdev));
>>>>> +
>>>>> + return -1;
>>>>
>>>> Please return an enumerated positive error.
>>>
>>> It's an internal helper function. Not the public API.
>>> All the functions that visible outside this module returns proper
>>> positive error codes.
>>>
>>>>
>>>>> +}
>>>>> +
>>>>> int
>>>>> netdev_flow_flush(struct netdev *netdev) {
>>>>> - const struct netdev_class *class = netdev->netdev_class;
>>>>
>>>> A few more comments.
>>>> 1. This patch changes code related to OVS offload implementation. OVS
>>>> offload
>>> must be confirmed with this patch before it is accepted.
>>>
>>> Sure. And I'll appreciate any help with testing as I'm limited with
>>> offloading capable hardware.
>>>
>>>>
>>>> 2. We may suggest to offload "dpdkvhostclient" port type and "vxlan" (under
>>> netdev datapath) port type.
>>>> Please confirm the following understanding of the required steps.
>>>> 2.1 Need to register two new flow_apis for the two ports types.
>>>> 2.2 Need to write the corresponding init_flow_api() to test the relevant
>>>> netdev
>>> instances.
>>>
>>> No, you don't need to implement new flow api provider. I assume that
>>> offloading for these netdevs will be implemented via DPDK. So, you
>>> need to add the functionality to netdev_rte_offload provider and
>>> teach it to return success for these netdevs from the init_flow_api().
>>>
>>>> 2.3 The question how to distinguish between a vport under system datapath
>>> versus vport under netdev datapath is still an open question (would be
>>> happy to
>>> learn that it is already resolved).
>>>
>>> Solved. See above.
>>>
>>>>
>>>> 3. Could you please add inline documentation for the newly added code? It
>>> seems the logic is spread is several places so any comments will be helpful.
>>>
>>> Provider registering API has comments. 'netdev_assign_flow_api' is
>>> quiet simple and mostly documented by the log messages inside.
>>> I'd like to add some comments if you'll point me functions that needs them.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev