Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch

2016-10-19 Thread Paul Blakey



On 12/10/2016 23:36, Pravin Shelar wrote:

Sorry for jumping in a bit late. I have a couple of high-level comments below.

On Thu, Oct 6, 2016 at 10:10 AM, Rony Efraim <ro...@mellanox.com> wrote:

From: Joe Stringer [mailto:j...@ovn.org]  Sent: Thursday, October 06, 2016 5:06 
AM

Subject: Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for
openvswitch

On 27 September 2016 at 21:45, Paul Blakey <pa...@mellanox.com> wrote:

Openvswitch currently configures the kernel datapath via netlink over an
internal ovs protocol.

This patch series offers a new provider, dpif-netlink-tc, that uses the
tc flower protocol to offload ovs rules into the HW data-path through
netdevices that e.g. represent NIC e-switch ports.

The user can create a bridge with type: datapath_type=dpif-hw-netlink in
order to use this provider.

This provider can be used to pass the tc flower rules to the HW for HW
offloads.

Also introduced in this patch series is a policy module in which the
user can program a HW-offload policy. The policy module accepts an ovs
flow and returns a policy decision for each flow: NO_OFFLOAD or HW_ONLY --
currently the policy is to HW offload all rules.

If the HW_OFFLOAD rule assignment fails, the provider will fall back to the
system datapath.
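
As a rough illustration of the policy hook and fallback just described,
here is a minimal C sketch; the type and function names (hw_offload_flow,
tc_flower_put, system_dpif_flow_put) are hypothetical placeholders rather
than the series' actual code:

/* Policy decision per flow, as described in the cover letter. */
enum offload_policy {
    NO_OFFLOAD,   /* install only in the system (kernel) datapath */
    HW_ONLY       /* install via tc flower into the HW datapath */
};

struct hw_offload_flow;   /* placeholder for the dpif flow representation */

/* Stubs standing in for the real tc and system-datapath calls. */
static int tc_flower_put(struct hw_offload_flow *flow) { (void) flow; return 0; }
static int system_dpif_flow_put(struct hw_offload_flow *flow) { (void) flow; return 0; }

/* Current policy: HW-offload every rule. */
static enum offload_policy
hw_offload_policy_decide(const struct hw_offload_flow *flow)
{
    (void) flow;
    return HW_ONLY;
}

/* If the HW rule assignment fails, fall back to the system datapath. */
static int
hw_netlink_flow_put(struct hw_offload_flow *flow)
{
    if (hw_offload_policy_decide(flow) == HW_ONLY && tc_flower_put(flow) == 0) {
        return 0;
    }
    return system_dpif_flow_put(flow);
}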

Flower was chosen because it's sort of natural to state OVS DP rules for
this classifier. However, the code can be extended to support other
classifiers such as U32 and eBPF, which have HW offloads as well.

The use case we are currently addressing is the SRIOV switchdev mode newly
introduced in Linux kernel 4.8 [1][2]. This series was tested against SRIOV
VF vport representors of the Mellanox 100G ConnectX-4 series, exposed by
the mlx5 kernel driver.

Paul and Shahar.

[1] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=513334e18a74f70c0be58c2eb73af1715325b870
[2] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=53d94892e27409bb2b48140207c0273b2ba65f61

Thanks for submitting the series. Clearly this is a topic of interest for
multiple parties, and it's a good starting point to discuss.

A few of us also discussed this topic today at netdev, so I'll list a few
points that we talked about and hopefully others can fill in the bits I miss.

Thanks for summarizing our meeting today.
Attached is a link to the PDF picture that shows the idea (a picture <= 1,000 words):
https://drive.google.com/file/d/0B2Yjm5a810FsZEoxOUJHU0l3c01OODUwMzVseXBFOE5MSGxr/view?usp=sharing


Positives
* Hardware offload decision is made in a module in userspace
* Layered dpif approach means that the tc-based hardware offload could sit in
front of kernel or userspace datapaths
* Separate dpif means that if you don't enable it, it doesn't affect you.
Doesn't litter another dpif implementation with offload logic.


I like this approach because of its better modularity and its use of
existing kernel interfaces for flow offload.
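
To make the layering point above concrete, here is a minimal C sketch of the
idea (hypothetical types, not OVS's real dpif provider structures): the
tc-based layer keeps a pointer to a lower dpif, so the same offload logic can
sit in front of either the kernel or a userspace datapath.

struct flow_spec;                       /* placeholder for match + actions */

/* One layer in the stack; "lower" is the kernel or userspace datapath. */
struct dpif_layer {
    const char *name;
    int (*flow_put)(struct dpif_layer *, const struct flow_spec *);
    struct dpif_layer *lower;
};

/* Stub standing in for the tc flower insertion; returns -1 on failure. */
static int try_tc_flower_offload(const struct flow_spec *flow)
{
    (void) flow;
    return -1;
}

/* The hw-netlink layer tries tc first, then delegates to whatever sits below. */
static int
hw_netlink_layer_flow_put(struct dpif_layer *self, const struct flow_spec *flow)
{
    if (try_tc_flower_offload(flow) == 0) {
        return 0;
    }
    return self->lower->flow_put(self->lower, flow);
}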


Drawbacks
* Additional dpif to maintain. Another implementation to change when
modifying the dpif interface. Maybe this doesn't change too often, but there
have been some discussions recently about whether the flow_{put,get,del}
should be converted to use internal flow structures rather than the OVS
netlink representation. This is one example of potential impact on
development.

[RONY] You are right, but I don't think we can add it any other way. I think
that the approach of using dpif_netlink will save us a lot of maintenance.
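
To make the maintenance concern above concrete, here is an illustration of
the kind of interface change being discussed, moving from netlink-attribute
arguments to parsed internal structures; the prototypes below are
illustrative only and are not the actual OVS dpif signatures:

#include <stddef.h>

struct nlattr;        /* OVS netlink attribute blob */
struct match;         /* parsed internal flow match (illustrative) */
struct dp_actions;    /* parsed internal action list (illustrative) */

/* Today-style: key/mask/actions passed as OVS netlink attributes. */
int example_flow_put_nl(const struct nlattr *key, size_t key_len,
                        const struct nlattr *mask, size_t mask_len,
                        const struct nlattr *actions, size_t actions_len);

/* Possible future-style: internal flow structures instead of netlink blobs;
 * every dpif implementation, including a tc-based one, would need updating. */
int example_flow_put_struct(const struct match *match,
                            const struct dp_actions *actions);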

* Fairly limited support for OVS matches and actions. For instance, it is
not yet useful for an OVN-style pipeline. But that's not a limitation of the
design, just the current implementation.

[RONY] Sure, we intend to support OVN and connection tracking; we are
starting with the simple case.

Other considerations
* Is tc flower filter setup rate and stats dump fast enough? How does it
compare to existing kernel datapath flow setup rate? Multiple threads
inserting at once? How many filters can be dumped per second? etc.

[RONY] We will test it, and will try to improve TC if needed.


I think there are two parts in flow offloading.
1. Time spent to add the flow to TC.
2. Time spent on pushing the flow to hardware.

It would be interesting to know which one is dominant in this case.


We achieve about 1K rule insertions per second; we will be looking into
the time distribution.
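
One way to split that figure into time spent in the tc netlink call versus
everything else is to time each insertion with CLOCK_MONOTONIC and compare
it to the wall time for the whole batch. The tc_flower_put_stub() below is a
placeholder for the real insert; time spent pushing the rule to hardware
inside the driver would still need kernel-side tracing (e.g. perf/ftrace on
the mlx5 offload path). A minimal sketch:

#include <stdio.h>
#include <time.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Placeholder for the real tc flower rule insertion. */
static int tc_flower_put_stub(int rule) { (void) rule; return 0; }

int main(void)
{
    const int n = 1000;
    double in_tc = 0.0;
    double start = now_sec();

    for (int i = 0; i < n; i++) {
        double t0 = now_sec();
        tc_flower_put_stub(i);          /* would be the real tc insert */
        in_tc += now_sec() - t0;
    }

    double total = now_sec() - start;
    printf("total %.3f s (%.0f rules/s), inside tc calls %.3f s\n",
           total, n / total, in_tc);
    return 0;
}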



* Currently for a given flow, it will exist in either the offloaded
implementation or the kernel datapath. Statistics are only drawn from one
location. This is consistent with how ofproto-dpif-upcall will insert flows -
one flow_put operation and one flow is inserted into the datapath.
Correspondingly there is one udpif_key which reflects the most recently used
stats for this datapath flow. There may be situations where flows need to be
in both datapaths, in which case there needs to be either one udpif_key per
datapath representation of the flow, or the dpif must hide the second flow
and aggregate stats.
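
For the case where a flow ends up in both datapaths, here is a minimal sketch
of the aggregation option (hypothetical types, not OVS's real udpif_key or
stats structures): keep one stats record per datapath representation of the
flow and combine them before reporting upward.

#include <stdint.h>

/* One stats record per datapath representation of the flow. */
struct flow_stats {
    uint64_t n_packets;
    uint64_t n_bytes;
    long long used_msec;    /* last-used timestamp; take the most recent */
};

/* Combine hardware (tc) and software (kernel datapath) stats for one flow. */
static struct flow_stats
aggregate_flow_stats(struct flow_stats hw, struct flow_stats sw)
{
    struct flow_stats total = {
        .n_packets = hw.n_packets + sw.n_packets,
        .n_bytes   = hw.n_bytes + sw.n_bytes,
        .used_msec = hw.used_msec > sw.used_msec ? hw.used_msec : sw.used_msec,
    };
    return total;
}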

Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch

2016-10-13 Thread Simon Horman
On Wed, Oct 12, 2016 at 01:36:44PM -0700, Pravin Shelar wrote:
> Sorry for jumping in a bit late. I have a couple of high-level comments below.
> 
> On Thu, Oct 6, 2016 at 10:10 AM, Rony Efraim  wrote:
> > From: Joe Stringer [mailto:j...@ovn.org]  Sent: Thursday, October 06, 2016 
> > 5:06 AM

...

> >> Other considerations
> >> * Is tc flower filter setup rate and stats dump fast enough? How does it
> >> compare to existing kernel datapath flow setup rate? Multiple threads 
> >> inserting
> >> at once? How many filters can be dumped per second?
> >> etc.
> > [RONY] We will test it, and will try to improve TC if needed.
> >
> I think there are two parts in flow offloading.
> 1. Time spent to add the flow to TC.
> 2. Time spent on pushing the flow to hardware.
> 
> It would be interesting to know which one is dominant in this case.

I agree that the problem should be quantified but I expect the answer will
depend on the hardware in use. And I entirely expect there are worthwhile
gains to be had on the software side.

...


Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch

2016-10-12 Thread Pravin Shelar
Sorry for jumping in a bit late. I have a couple of high-level comments below.

On Thu, Oct 6, 2016 at 10:10 AM, Rony Efraim <ro...@mellanox.com> wrote:
> From: Joe Stringer [mailto:j...@ovn.org]  Sent: Thursday, October 06, 2016 
> 5:06 AM
>>
>> Subject: Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for
>> openvswitch
>>
>> On 27 September 2016 at 21:45, Paul Blakey <pa...@mellanox.com> wrote:
>> > Openvswitch currently configures the kernel datapath via netlink over an
>> internal ovs protocol.
>> >
>> > This patch series offers a new provider: dpif-netlink-tc that uses the
>> > tc flower protocol to offload ovs rules into HW data-path through 
>> > netdevices
>> that e.g. represent NIC e-switch ports.
>> >
>> > The user can create a bridge with type: datapath_type=dpif-hw-netlink in
>> order to use this provider.
>> > This provider can be used to pass the tc flower rules to the HW for HW
>> offloads.
>> >
>> > Also introducing in this patch series a policy module in which the
>> > user can program a HW-offload policy. The policy module accepts an ovs
>> > flow and returns a policy decision for each flow: NO_OFFLOAD or HW_ONLY --
>> currently the policy is to HW offload all rules.
>> >
>> > If the HW_OFFLOAD rule assignment fails the provider will fall back to the
>> system datapath.
>> >
>> > Flower was chosen because it's sort of natural to state OVS DP rules for
>> > this classifier. However, the code can be extended to support other
>> > classifiers such as U32, eBPF, etc which have HW offloads as well.
>> >
>> > The use-case we are currently addressing is the newly introduced SRIOV
>> > switchdev mode in the Linux kernel which is introduced in version 4.8
>> > [1][2]. This series was tested against SRIOV VFs vports representors of the
>> Mellanox 100G ConnectX-4 series exposed by the mlx5 kernel driver.
>> >
>> > Paul and Shahar.
>> >
>> > [1]
>> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=5
>> > 13334e18a74f70c0be58c2eb73af1715325b870
>> > [2]
>> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=5
>> > 3d94892e27409bb2b48140207c0273b2ba65f61
>>
>> Thanks for submitting the series. Clearly this is a topic of interest for 
>> multiple
>> parties, and it's a good starting point to discuss.
>>
>> A few of us also discussed this topic today at netdev, so I'll list a few 
>> points that
>> we talked about and hopefully others can fill in the bits I miss.
> Thanks for summarizing our meeting today.
> Attached is a link to the PDF picture that shows the idea (a picture <= 1,000 words)
> https://drive.google.com/file/d/0B2Yjm5a810FsZEoxOUJHU0l3c01OODUwMzVseXBFOE5MSGxr/view?usp=sharing
>
>>
>> Positives
>> * Hardware offload decision is made in a module in userspace
>> * Layered dpif approach means that the tc-based hardware offload could sit in
>> front of kernel or userspace datapaths
>> * Separate dpif means that if you don't enable it, it doesn't affect you. 
>> Doesn't
>> litter another dpif implementation with offload logic.
>>
I like this approach because of its better modularity and its use of
existing kernel interfaces for flow offload.

>> Drawbacks
>> * Additional dpif to maintain. Another implementation to change when
>> modifying dpif interface. Maybe this doesn't change too often, but there have
>> been some discussions recently about whether the flow_{put,get,del} should be
>> converted to use internal flow structures rather than OVS netlink
>> representation. This is one example of potential impact on development.
> [RONY] You are right, but I don't think we can add it any other way. I think
> that the approach of using dpif_netlink will save us a lot of maintenance.
>> * Fairly limited support for OVS matches and actions. For instance, it is 
>> not yet
>> useful for OVN-style pipeline. But that's not a limitation of the design, 
>> just the
>> current implementation.
> [RONY] Sure, we intend to support OVN and connection tracking; we are starting
> with the simple case.
>>
>> Other considerations
>> * Is tc flower filter setup rate and stats dump fast enough? How does it
>> compare to existing kernel datapath flow setup rate? Multiple threads 
>> inserting
>> at once? How many filters can be dumped per second?
>> etc.
> [RONY] We will test it, and will try to improve TC if needed.
>
I think there are two parts in flow offloading.
1. Time spent to add the flow to TC.
2. Time spent on pushing the flow to hardware.

It would be interesting to know which one is dominant in this case.

Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch

2016-10-06 Thread Rony Efraim
From: Joe Stringer [mailto:j...@ovn.org]  Sent: Thursday, October 06, 2016 5:06 
AM
> 
> Subject: Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for
> openvswitch
> 
> On 27 September 2016 at 21:45, Paul Blakey <pa...@mellanox.com> wrote:
> > Openvswitch currently configures the kernel datapath via netlink over an
> internal ovs protocol.
> >
> > This patch series offers a new provider: dpif-netlink-tc that uses the
> > tc flower protocol to offload ovs rules into HW data-path through netdevices
> that e.g. represent NIC e-switch ports.
> >
> > The user can create a bridge with type: datapath_type=dpif-hw-netlink in
> order to use this provider.
> > This provider can be used to pass the tc flower rules to the HW for HW
> offloads.
> >
> > Also introducing in this patch series a policy module in which the
> > user can program a HW-offload policy. The policy module accepts an ovs
> > flow and returns a policy decision for each flow: NO_OFFLOAD or HW_ONLY --
> currently the policy is to HW offload all rules.
> >
> > If the HW_OFFLOAD rule assignment fails the provider will fall back to the
> system datapath.
> >
> > Flower was chosen because it's sort of natural to state OVS DP rules for
> > this classifier. However, the code can be extended to support other
> > classifiers such as U32, eBPF, etc which have HW offloads as well.
> >
> > The use-case we are currently addressing is the newly introduced SRIOV
> > switchdev mode in the Linux kernel which is introduced in version 4.8
> > [1][2]. This series was tested against SRIOV VFs vports representors of the
> Mellanox 100G ConnectX-4 series exposed by the mlx5 kernel driver.
> >
> > Paul and Shahar.
> >
> > [1]
> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=5
> > 13334e18a74f70c0be58c2eb73af1715325b870
> > [2]
> > http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=5
> > 3d94892e27409bb2b48140207c0273b2ba65f61
> 
> Thanks for submitting the series. Clearly this is a topic of interest for 
> multiple
> parties, and it's a good starting point to discuss.
> 
> A few of us also discussed this topic today at netdev, so I'll list a few 
> points that
> we talked about and hopefully others can fill in the bits I miss.
Thanks for summarizing our meeting today.
Attached is a link to the PDF picture that shows the idea (a picture <= 1,000 words)
https://drive.google.com/file/d/0B2Yjm5a810FsZEoxOUJHU0l3c01OODUwMzVseXBFOE5MSGxr/view?usp=sharing

> 
> Positives
> * Hardware offload decision is made in a module in userspace
> * Layered dpif approach means that the tc-based hardware offload could sit in
> front of kernel or userspace datapaths
> * Separate dpif means that if you don't enable it, it doesn't affect you. 
> Doesn't
> litter another dpif implementation with offload logic.
> 
> Drawbacks
> * Additional dpif to maintain. Another implementation to change when
> modifying dpif interface. Maybe this doesn't change too often, but there have
> been some discussions recently about whether the flow_{put,get,del} should be
> converted to use internal flow structures rather than OVS netlink
> representation. This is one example of potential impact on development.
[RONY] You are right, but I don't think we can add it any other way. I think that
the approach of using dpif_netlink will save us a lot of maintenance.
> * Fairly limited support for OVS matches and actions. For instance, it is not 
> yet
> useful for OVN-style pipeline. But that's not a limitation of the design, 
> just the
> current implementation.
[RONY] Sure, we intend to support OVN and connection tracking; we are starting with
the simple case.
> 
> Other considerations
> * Is tc flower filter setup rate and stats dump fast enough? How does it
> compare to existing kernel datapath flow setup rate? Multiple threads 
> inserting
> at once? How many filters can be dumped per second?
> etc.
[RONY] We will test it, and will try to improve TC if needed.

> * Currently for a given flow, it will exist in either the offloaded 
> implementation
> or the kernel datapath. Statistics are only drawn from one location. This is
> consistent with how ofproto-dpif-upcall will insert flows - one flow_put
> operation and one flow is inserted into the datapath. Correspondingly there is
> one udpif_key which reflects the most recently used stats for this datapath
> flow. There may be situations where flows need to be in both datapaths, in
> which case there needs to be either one udpif_key per datapath
> representation of the flow, or the dpif must hide the second flow and 
> aggregate
> stats.
[RONY] as you wrote the d

Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch

2016-10-05 Thread Joe Stringer
On 27 September 2016 at 21:45, Paul Blakey  wrote:
> Openvswitch currently configures the kernel datapath via netlink over an 
> internal ovs protocol.
>
> This patch series offers a new provider: dpif-netlink-tc that uses the tc 
> flower protocol
> to offload ovs rules into HW data-path through netdevices that e.g. represent 
> NIC e-switch ports.
>
> The user can create a bridge with type: datapath_type=dpif-hw-netlink in 
> order to use this provider.
> This provider can be used to pass the tc flower rules to the HW for HW 
> offloads.
>
> Also introducing in this patch series a policy module in which the user can 
> program a HW-offload
> policy. The policy module accepts an ovs flow and returns a policy decision for 
> each
> flow:NO_OFFLOAD or HW_ONLY -- currently the policy is to HW offload all rules.
>
> If the HW_OFFLOAD rule assignment fails the provider will fall back to the 
> system datapath.
>
> Flower was chosen because it's sort of natural to state OVS DP rules for this 
> classifier. However,
> the code can be extended to support other classifiers such as U32, eBPF, etc 
> which have
> HW offloads as well.
>
> The use-case we are currently addressing is the newly introduced SRIOV 
> switchdev mode in the
> Linux kernel which is introduced in version 4.8 [1][2]. This series was 
> tested against SRIOV VFs
> vports representors of the Mellanox 100G ConnectX-4 series exposed by the 
> mlx5 kernel driver.
>
> Paul and Shahar.
>
> [1] 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=513334e18a74f70c0be58c2eb73af1715325b870
> [2] 
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=53d94892e27409bb2b48140207c0273b2ba65f61

Thanks for submitting the series. Clearly this is a topic of interest
for multiple parties, and it's a good starting point to discuss.

A few of us also discussed this topic today at netdev, so I'll list a
few points that we talked about and hopefully others can fill in the
bits I miss.

Positives
* Hardware offload decision is made in a module in userspace
* Layered dpif approach means that the tc-based hardware offload could
sit in front of kernel or userspace datapaths
* Separate dpif means that if you don't enable it, it doesn't affect
you. Doesn't litter another dpif implementation with offload logic.

Drawbacks
* Additional dpif to maintain. Another implementation to change when
modifying dpif interface. Maybe this doesn't change too often, but
there have been some discussions recently about whether the
flow_{put,get,del} should be converted to use internal flow structures
rather than OVS netlink representation. This is one example of
potential impact on development.
* Fairly limited support for OVS matches and actions. For instance, it
is not yet useful for an OVN-style pipeline. But that's not a limitation
of the design, just the current implementation.

Other considerations
* Is tc flower filter setup rate and stats dump fast enough? How does
it compare to existing kernel datapath flow setup rate? Multiple
threads inserting at once? How many filters can be dumped per second?
etc.
* Currently for a given flow, it will exist in either the offloaded
implementation or the kernel datapath. Statistics are only drawn from
one location. This is consistent with how ofproto-dpif-upcall will
insert flows - one flow_put operation and one flow is inserted into
the datapath. Correspondingly there is one udpif_key which reflects
the most recently used stats for this datapath flow. There may be
situations where flows need to be in both datapaths, in which case
there needs to be either one udpif_key per datapath
representation of the flow, or the dpif must hide the second flow and
aggregate stats.

Extra, not previously discussed
* Testing - we may want a mode where tc flower is used in software
mode, to test the tc netlink interface. It would be good to see
an extension of the kernel module testsuite to at least test some basics of
the interface, perhaps also the flower behaviour (though that may be
out of scope of the testsuite in the OVS tree).
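
A minimal sketch of what such a software-only check could look like, driving
the tc CLI from C; the device names (ovs-p0, ovs-p1) and the specific match
are placeholders, and the skip_hw flag asks the kernel to keep the flower
filter in software, which is what a test of the tc interface would want:

#include <stdio.h>
#include <stdlib.h>

static int run(const char *cmd)
{
    printf("+ %s\n", cmd);
    return system(cmd);
}

int main(void)
{
    int err = 0;

    /* Attach an ingress qdisc and add a flower filter in software only. */
    err |= run("tc qdisc add dev ovs-p0 ingress");
    err |= run("tc filter add dev ovs-p0 ingress protocol ip flower skip_hw "
               "ip_proto tcp dst_port 80 "
               "action mirred egress redirect dev ovs-p1");

    /* Dump the filter back with stats to exercise the dump path as well. */
    err |= run("tc -s filter show dev ovs-p0 ingress");

    return err ? EXIT_FAILURE : EXIT_SUCCESS;
}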

Thanks,
Joe