On 25.04.2019 0:58, Roni Bar Yanai wrote:
> Hi Ilya,
>
> Please see comment/clarification inline.
> Thanks,
> Roni
>
>> -----Original Message-----
>> From: Ilya Maximets <[email protected]>
>> Sent: Wednesday, April 24, 2019 6:12 PM
>> To: Ophir Munk <[email protected]>; [email protected]
>> Cc: Ian Stokes <[email protected]>; Flavio Leitner <[email protected]>;
>> Kevin
>> Traynor <[email protected]>; Roni Bar Yanai <[email protected]>; Finn
>> Christensen <[email protected]>; Ben Pfaff <[email protected]>; Roi Dayan
>> <[email protected]>; Simon Horman <[email protected]>; Olga
>> Shern <[email protected]>; Asaf Penso <[email protected]>; Oz Shlomo
>> <[email protected]>; Paul Blakey <[email protected]>
>> Subject: Re: [PATCH] netdev: Dynamic per-port Flow API.
>>
>> On 24.04.2019 17:26, Ophir Munk wrote:
>>> Hi Ilya,
>>> Please find comments inline and at the end of this mail.
>>
>> Hi. Thanks for review.
>> Comments inline.
>>
>> Best regards, Ilya Maximets.
>>
>>>
>>> Regards,
>>> Ophir
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets <[email protected]>
>>>> Sent: Tuesday, April 23, 2019 7:19 PM
>>>> To: [email protected]
>>>> Cc: Ian Stokes <[email protected]>; Flavio Leitner <[email protected]>;
>>>> Ophir Munk <[email protected]>; Kevin Traynor
>>>> <[email protected]>; Roni Bar Yanai <[email protected]>; Finn
>>>> Christensen <[email protected]>; Ben Pfaff <[email protected]>; Roi Dayan
>>>> <[email protected]>; Simon Horman <[email protected]>;
>>>> Ilya Maximets <[email protected]>
>>>> Subject: [PATCH] netdev: Dynamic per-port Flow API.
>>>>
>>>> Current issues with Flow API:
>>>>
>>>> * OVS calls offloading functions regardless of successful
>>>> flow API initialization. (ex. on init_flow_api failure)
>>>> * Static initilaization of Flow API for a netdev_class forbids
>>>> having different offloading types for different instances
>>>> of netdev with the same netdev_class. (ex. different vports in
>>>> 'system' and 'netdev' datapaths at the same time)
>>>>
>>>> Solution:
>>>>
>>>> * Move Flow API from the netdev_class to netdev instance.
>>>> * Make Flow API dynamic, i.e. probe the APIs and choose the
>>>> suitable one.
>>>>
>>>
>>> I suggest distinguishing between initialization and probing.
>>> The probing you mention is implemented by testing device initialization:
>> init_flow_api().
>>> I suggest adding a new probe() API just for the sake of probing. It will
>>> check the
>> netdev struct.
>>> I suggest leaving the init_flow_api() API for HW/drivers initialization.
>>> The HW/driver may fail at initialization, but it does not mean we need to
>>> probe
>> for a new API in that case.
>>
>> I also thought about separate 'probe()' api, but it seems that probing
>> will be equal to 'init()' in most cases. Checking the 'netdev struct'
>> fro probing is a bad solution as instances of same netdev class could
>> require different offloading apis. Probing a new api on init failure
>> could be useful exactly for this case. Anyway we'll be able to improve
>> the logic of assigning in the future if it will be needed. This is
>> the single point in code where we have information about all the available
>> offloading providers.
>>
>>>
>>>> Side effects:
>>>>
>>>> * Flow API providers localized as possible in their modules.
>>>> * Now we have an ability to make runtime checks. For example,
>>>> we could check if particular device supports features we
>>>> need, like if dpdk device supports RSS+MARK action.
>>>>
>>>> Signed-off-by: Ilya Maximets <[email protected]>
>>>> ---
>>>>
>>>> Since RFC:
>>>> * Fixed sparce build.
>>>> * Some logs turned from INFO to DBG.
>>>> * Enabled HW offloading on non-Linux systems
>>>> (For testing with dummy provider).
>>>>
>>>> lib/automake.mk | 2 +-
>>>> lib/dpdk.c | 2 +
>>>> lib/netdev-dpdk.c | 24 +++-
>>>> lib/netdev-dpdk.h | 3 +
>>>> lib/netdev-dummy.c | 24 +++-
>>>> lib/netdev-linux.c | 3 -
>>>> lib/netdev-linux.h | 10 --
>>>> lib/netdev-offload-provider.h | 99 +++++++++++++++
>>>> lib/netdev-provider.h | 67 +----------
>>>> lib/netdev-rte-offloads.c | 40 +++++-
>>>> lib/netdev-rte-offloads.h | 41 +------
>>>> lib/netdev-tc-offloads.c | 39 ++++--
>>>> lib/netdev-tc-offloads.h | 44 -------
>>>> lib/netdev-vport.c | 6 +-
>>>> lib/netdev.c | 221 +++++++++++++++++++++++++++-------
>>>> tests/dpif-netdev.at | 4 +-
>>>> tests/ofproto-macros.at | 1 -
>>>> 17 files changed, 398 insertions(+), 232 deletions(-) create mode 100644
>>>> lib/netdev-offload-provider.h delete mode 100644 lib/netdev-tc-offloads.h
>>>>
>>>> diff --git a/lib/automake.mk b/lib/automake.mk index cc5dccf39..b9f3b69e7
>>>> 100644
>>>> --- a/lib/automake.mk
>>>> +++ b/lib/automake.mk
>>>> @@ -138,6 +138,7 @@ lib_libopenvswitch_la_SOURCES = \
>>>> lib/netdev-dpdk.h \
>>>> lib/netdev-dummy.c \
>>>> lib/netdev-provider.h \
>>>> + lib/netdev-offload-provider.h \
>>>
>>> Please keep alphabetical order (switch between the last 2 lines)
>>>
>>>> lib/netdev-rte-offloads.h \
>>>> lib/netdev-vport.c \
>>>> lib/netdev-vport.h \
>>>> @@ -393,7 +394,6 @@ lib_libopenvswitch_la_SOURCES += \
>>>> lib/netdev-linux.c \
>>>> lib/netdev-linux.h \
>>>> lib/netdev-tc-offloads.c \
>>>> - lib/netdev-tc-offloads.h \
>>>> lib/netlink-conntrack.c \
>>>> lib/netlink-conntrack.h \
>>>> lib/netlink-notifier.c \
>>>> diff --git a/lib/dpdk.c b/lib/dpdk.c
>>>> index dc6171546..6c6298635 100644
>>>> --- a/lib/dpdk.c
>>>> +++ b/lib/dpdk.c
>>>> @@ -34,6 +34,7 @@
>>>> #include "dirs.h"
>>>> #include "fatal-signal.h"
>>>> #include "netdev-dpdk.h"
>>>> +#include "netdev-rte-offloads.h"
>>>> #include "openvswitch/dynamic-string.h"
>>>> #include "openvswitch/vlog.h"
>>>> #include "ovs-numa.h"
>>>> @@ -442,6 +443,7 @@ dpdk_init__(const struct smap *ovs_other_config)
>>>>
>>>> /* Finally, register the dpdk classes */
>>>> netdev_dpdk_register();
>>>> + netdev_dpdk_flow_api_register();
>>>> return true;
>>>> }
>>>>
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index
>>>> 47153dc60..c06c9ef81 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -4204,6 +4204,27 @@ unlock:
>>>> return err;
>>>> }
>>>>
>>>> +bool
>>>> +netdev_dpdk_flow_api_supported(struct netdev *netdev) {
>>>> + struct netdev_dpdk *dev;
>>>> + bool ret = false;
>>>> +
>>>> + if (!is_dpdk_class(netdev->netdev_class)) {
>>>> + goto out;
>>>> + }
>>>> +
>>>> + dev = netdev_dpdk_cast(netdev);
>>>> + ovs_mutex_lock(&dev->mutex);
>>>> + if (dev->type == DPDK_DEV_ETH) {
>>>> + /* TODO: Check if we able to offload some minimal flow. */
>>>> + ret = true;
>>>> + }
>>>> + ovs_mutex_unlock(&dev->mutex);
>>>> +out:
>>>> + return ret;
>>>> +}
>>>> +
>>>> int
>>>> netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>> struct rte_flow *rte_flow, @@ -4268,8
>>>> +4289,7 @@
>>>> netdev_dpdk_rte_flow_create(struct netdev *netdev,
>>>> .get_features = netdev_dpdk_get_features, \
>>>> .get_status = netdev_dpdk_get_status, \
>>>> .reconfigure = netdev_dpdk_reconfigure, \
>>>> - .rxq_recv = netdev_dpdk_rxq_recv, \
>>>> - DPDK_FLOW_OFFLOAD_API
>>>> + .rxq_recv = netdev_dpdk_rxq_recv
>>>>
>>>> static const struct netdev_class dpdk_class = {
>>>> .type = "dpdk",
>>>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h index
>>>> 9bbb8d8d6..60631c4f0 100644
>>>> --- a/lib/netdev-dpdk.h
>>>> +++ b/lib/netdev-dpdk.h
>>>> @@ -34,6 +34,9 @@ struct rte_flow_action;
>>>>
>>>> void netdev_dpdk_register(void);
>>>> void free_dpdk_buf(struct dp_packet *);
>>>> +
>>>> +bool netdev_dpdk_flow_api_supported(struct netdev *);
>>>> +
>>>> int
>>>> netdev_dpdk_rte_flow_destroy(struct netdev *netdev,
>>>> struct rte_flow *rte_flow, diff --git
>>>> a/lib/netdev-dummy.c
>>>> b/lib/netdev-dummy.c index 3f90ffa09..2e2e0c2ab 100644
>>>> --- a/lib/netdev-dummy.c
>>>> +++ b/lib/netdev-dummy.c
>>>> @@ -24,6 +24,7 @@
>>>> #include "dp-packet.h"
>>>> #include "dpif-netdev.h"
>>>> #include "flow.h"
>>>> +#include "netdev-offload-provider.h"
>>>> #include "netdev-provider.h"
>>>> #include "netdev-vport.h"
>>>> #include "odp-util.h"
>>>> @@ -1523,10 +1524,6 @@ exit:
>>>> return error ? -1 : 0;
>>>> }
>>>>
>>>> -#define DUMMY_FLOW_OFFLOAD_API \
>>>> - .flow_put = netdev_dummy_flow_put, \
>>>> - .flow_del = netdev_dummy_flow_del
>>>> -
>>>> #define NETDEV_DUMMY_CLASS_COMMON \
>>>> .run = netdev_dummy_run, \
>>>> .wait = netdev_dummy_wait, \
>>>> @@ -1559,8 +1556,7 @@ exit:
>>>> .rxq_dealloc = netdev_dummy_rxq_dealloc, \
>>>> .rxq_recv = netdev_dummy_rxq_recv, \
>>>> .rxq_wait = netdev_dummy_rxq_wait, \
>>>> - .rxq_drain = netdev_dummy_rxq_drain, \
>>>> - DUMMY_FLOW_OFFLOAD_API
>>>> + .rxq_drain = netdev_dummy_rxq_drain
>>>>
>>>> static const struct netdev_class dummy_class = {
>>>> NETDEV_DUMMY_CLASS_COMMON,
>>>> @@ -1578,6 +1574,20 @@ static const struct netdev_class
>>>> dummy_pmd_class = {
>>>> .is_pmd = true,
>>>> .reconfigure = netdev_dummy_reconfigure };
>>>> +
>>>> +static int
>>>> +netdev_dummy_offloads_init_flow_api(struct netdev *netdev) {
>>>> + return is_dummy_class(netdev->netdev_class) ? 0 : -1; }
>>>> +
>>>> +static const struct netdev_flow_api netdev_dummy_offloads = {
>>>> + .type = "dummy",
>>>> + .flow_put = netdev_dummy_flow_put,
>>>> + .flow_del = netdev_dummy_flow_del,
>>>> + .init_flow_api = netdev_dummy_offloads_init_flow_api,
>>>> +};
>>>> +
>>>>
>>>
>>>> /* Helper functions. */
>>>>
>>>> @@ -2024,5 +2034,7 @@ netdev_dummy_register(enum dummy_level
>>>> level)
>>>> netdev_register_provider(&dummy_internal_class);
>>>> netdev_register_provider(&dummy_pmd_class);
>>>>
>>>> + netdev_register_flow_api_provider(&netdev_dummy_offloads);
>>>> +
>>>> netdev_vport_tunnel_register();
>>>> }
>>>> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index
>>>> f75d73fd3..e4ea94cf9
>>>> 100644
>>>> --- a/lib/netdev-linux.c
>>>> +++ b/lib/netdev-linux.c
>>>> @@ -55,7 +55,6 @@
>>>> #include "hash.h"
>>>> #include "openvswitch/hmap.h"
>>>> #include "netdev-provider.h"
>>>> -#include "netdev-tc-offloads.h"
>>>> #include "netdev-vport.h"
>>>> #include "netlink-notifier.h"
>>>> #include "netlink-socket.h"
>>>> @@ -3321,7 +3320,6 @@ exit:
>>>>
>>>> const struct netdev_class netdev_linux_class = {
>>>> NETDEV_LINUX_CLASS_COMMON,
>>>> - LINUX_FLOW_OFFLOAD_API,
>>>> .type = "system",
>>>
>>> Please note that before this patch type="system" and type="internal" had
>> LINUX_FLOW_OFFLOAD_API, but type="tap" did not.
>>> With this patch type="tap" is going to be tested with init_flow_api(). If
>>> the test
>> passes - type="tap" is going to have flow_api which it did not have before.
>>> Was is verified that type="tap" will fail with init_flow_api()?
>>
>> If 'init_flow_api' will be able to create tc qdisc than offloading
>> is supported. If not - init will fail. It's simple. When linux
>> will support tc offloading for tap devices we'll have offloading
>> for free without changing the OVS code.
>>
>>>
>>>> .construct = netdev_linux_construct,
>>>> .get_stats = netdev_linux_get_stats, @@ -3341,7 +3339,6 @@ const
>>>> struct
>>>> netdev_class netdev_tap_class = {
>>>>
>>>> const struct netdev_class netdev_internal_class = {
>>>> NETDEV_LINUX_CLASS_COMMON,
>>>> - LINUX_FLOW_OFFLOAD_API,
>>>> .type = "internal",
>>>> .construct = netdev_linux_construct,
>>>> .get_stats = netdev_internal_get_stats, diff --git
>>>> a/lib/netdev-linux.h
>>> ...
>>>> +static int
>>>> +netdev_assign_flow_api(struct netdev *netdev) {
>>>> + struct netdev_registered_flow_api *rfa;
>>>> +
>>>> + CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) {
>>>> + if (!rfa->flow_api->init_flow_api(netdev)) {
>>>> + ovs_refcount_ref(&rfa->refcnt);
>>>> + ovsrcu_set(&netdev->flow_api, rfa->flow_api);
>>>> + VLOG_INFO("%s: Assigned flow API '%s'.",
>>>> + netdev_get_name(netdev), rfa->flow_api->type);
>>>> + return 0;
>>>> + }
>>>
>>> By this logic the first API which successfully passes init_flow_api()
>>> becomes the
>> flow_api for this netdev.
>>> Assuming there are 2 vport APIs: 1. system datapath vport API 2. netdev
>> datapath vport API (planned to be added soon).
>>> Assuming both vport APIs successfully pass init_flow_api(), then the first
>>> one to
>> be tested will always be chosen and the second one will never be used.
>>> Are we sure that the system datapath vport will not always be chosen with
>>> the
>> current netdev_tc_init_flow_api() implementation.
>>> How can we distinguish between a vport created under different datapaths?
>>> This issue must be resolved for this patch.
>>
>> netdev-vport that works in userspace datapath has no backing linux
>> interface --> get_ifindex failure --> no offloading API configured.
>> When the planned offloading in dpdk flow api provider will be
>> implemented it will be chosen.
>>
>> Right now we don't have and don't plan to have natdevs that suitable
>> for more than one flow api provider. If they appear in the future, we
>> could solve this by choosing most suitable provider instead of first
>> one. But this is over-engineering for now and even for a near future.
>>
> This is more a request for a clarification.
> If I understand correctly, you expect that there will be one flow api
> provider for dpdk.
Yes.
> When the api is called we must test the netdev class and then do the needed
> tests (maybe HW capabilities) so we can tell if we should return true or
> false.
Correct in general, but you need to test netdev instance, not the netdev class.
> For vport we have multiple subtypes (vxlan,gre..etc), so if we only support
> sub group we can test the sub type by its name.
All the vport subtypes are different netdev classes.
> What happens if we have both system and netdev datapath (corner case, but
> still)? If the system is called first and there is no backing linux port then
> we are OK.
> What will happen if the netdev offload provider will be called first? do you
> expect that we do the same test (linux backing port?) in the dpdk
> implementation? Is this type of test being correct for all vport types? That
> is, if we don't have a backing linux port (ifindex) it means that the port is
> part of a netdev bridge?
Yes, you could add simple check:
if (netdev_vport_is_vxlan(netdev) && netdev_get_ifindex(netdev) < 0) {
return 0;
}
This check should be done inside init_flow_api() (not in
netdev_dpdk_flow_api_supported()).
It's allowed to call functions of different netdev provides inside the
netdev offload provider code.
'netdev_vport_is_vxlan' should be implemented inside netdev-vport.c.
Note that you'll need an additional change to allow offloading for ports
without ifindex:
diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index e0731959d..3c474b18c 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -527,10 +527,6 @@ netdev_ports_insert(struct netdev *netdev, const struct
dpif_class *dpif_class,
struct port_to_netdev_data *data;
int ifindex = netdev_get_ifindex(netdev);
- if (ifindex < 0) {
- return ENODEV;
- }
-
ovs_mutex_lock(&netdev_hmap_mutex);
if (netdev_ports_lookup(dpif_port->port_no, dpif_class)) {
ovs_mutex_unlock(&netdev_hmap_mutex);
@@ -541,11 +537,16 @@ netdev_ports_insert(struct netdev *netdev, const struct
dpif_class *dpif_class,
data->netdev = netdev_ref(netdev);
data->dpif_class = dpif_class;
dpif_port_clone(&data->dpif_port, dpif_port);
- data->ifindex = ifindex;
+
+ if (ifindex >= 0) {
+ data->ifindex = ifindex;
+ hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex);
+ } else {
+ data->ifindex = -1;
+ }
hmap_insert(&port_to_netdev, &data->portno_node,
netdev_ports_hash(dpif_port->port_no, dpif_class));
- hmap_insert(&ifindex_to_port, &data->ifindex_node, ifindex);
ovs_mutex_unlock(&netdev_hmap_mutex);
netdev_init_flow_api(netdev);
---
It's unrelated to current patch, so, I'll send it separately.
> Basically, there is nothing in the init flow parameters that can tell us if
> the port should be netdev or system (ifindex is implementation related)
> because we don't have the context of the caller.
>
>>>
>>>> + VLOG_DBG("%s: flow API '%s' is not suitable.",
>>>> + netdev_get_name(netdev), rfa->flow_api->type);
>>>> + }
>>>> + VLOG_INFO("%s: No suitable flow API found.",
>>>> + netdev_get_name(netdev));
>>>> +
>>>> + return -1;
>>>
>>> Please return an enumerated positive error.
>>
>> It's an internal helper function. Not the public API.
>> All the functions that visible outside this module returns proper
>> positive error codes.
>>
>>>
>>>> +}
>>>> +
>>>> int
>>>> netdev_flow_flush(struct netdev *netdev) {
>>>> - const struct netdev_class *class = netdev->netdev_class;
>>>
>>> A few more comments.
>>> 1. This patch changes code related to OVS offload implementation. OVS
>>> offload
>> must be confirmed with this patch before it is accepted.
>>
>> Sure. And I'll appreciate any help with testing as I'm limited with
>> offloading capable hardware.
>>
>>>
>>> 2. We may suggest to offload "dpdkvhostclient" port type and "vxlan" (under
>> netdev datapath) port type.
>>> Please confirm the following understanding of the required steps.
>>> 2.1 Need to register two new flow_apis for the two ports types.
>>> 2.2 Need to write the corresponding init_flow_api() to test the relevant
>>> netdev
>> instances.
>>
>> No, you don't need to implement new flow api provider. I assume that
>> offloading for these netdevs will be implemented via DPDK. So, you
>> need to add the functionality to netdev_rte_offload provider and
>> teach it to return success for these netdevs from the init_flow_api().
>>
>>> 2.3 The question how to distinguish between a vport under system datapath
>> versus vport under netdev datapath is still an open question (would be happy
>> to
>> learn that it is already resolved).
>>
>> Solved. See above.
>>
>>>
>>> 3. Could you please add inline documentation for the newly added code? It
>> seems the logic is spread is several places so any comments will be helpful.
>>
>> Provider registering API has comments. 'netdev_assign_flow_api' is
>> quiet simple and mostly documented by the log messages inside.
>> I'd like to add some comments if you'll point me functions that needs them.
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev