Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch
On 12/10/2016 23:36, Pravin Shelar wrote:
> Sorry for jumping in a bit late. I have a couple of high-level comments
> below.
>
> On Thu, Oct 6, 2016 at 10:10 AM, Rony Efraim <ro...@mellanox.com> wrote:
> > From: Joe Stringer [mailto:j...@ovn.org]
> > Sent: Thursday, October 06, 2016 5:06 AM
...
> > > Other considerations
> > > * Is the tc flower filter setup rate and stats dump fast enough? How
> > >   does it compare to the existing kernel datapath flow setup rate? What
> > >   about multiple threads inserting at once? How many filters can be
> > >   dumped per second? etc.
> > [RONY] We will test it, and will try to improve TC if needed.
>
> I think there are two parts to flow offloading:
> 1. Time spent adding the flow to TC.
> 2. Time spent pushing the flow to hardware.
>
> It would be interesting to know which one is dominant in this case.

We achieve about 1K rule insertions per second; we will be looking into the
time distribution.

...
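To put numbers like the ~1K insertions per second above on a firmer footing,
a minimal timing harness can measure the average latency of a batch of rule
insertions. The sketch below is illustrative only: insert_one_flow() is a
hypothetical placeholder for whatever actually installs a tc flower rule, and
a userspace measurement like this cannot by itself separate time spent in TC
from time spent pushing the rule to hardware (that split needs kernel-side
instrumentation).

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical placeholder: in a real test this would add one tc flower
     * rule (e.g. via a netlink request) and return 0 on success. */
    static int insert_one_flow(int i) { (void) i; return 0; }

    int main(void)
    {
        const int n = 1000;
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < n; i++) {
            insert_one_flow(i);
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        double secs = (end.tv_sec - start.tv_sec)
                      + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%d insertions in %.3f s (%.0f rules/s)\n", n, secs, n / secs);
        return 0;
    }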
Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch
On Wed, Oct 12, 2016 at 01:36:44PM -0700, Pravin Shelar wrote:
> Sorry for jumping in a bit late. I have a couple of high-level comments
> below.
>
> On Thu, Oct 6, 2016 at 10:10 AM, Rony Efraim wrote:
> > From: Joe Stringer [mailto:j...@ovn.org]
> > Sent: Thursday, October 06, 2016 5:06 AM
...
> >> Other considerations
> >> * Is the tc flower filter setup rate and stats dump fast enough? How
> >>   does it compare to the existing kernel datapath flow setup rate? What
> >>   about multiple threads inserting at once? How many filters can be
> >>   dumped per second? etc.
> > [RONY] We will test it, and will try to improve TC if needed.
>
> I think there are two parts to flow offloading:
> 1. Time spent adding the flow to TC.
> 2. Time spent pushing the flow to hardware.
>
> It would be interesting to know which one is dominant in this case.

I agree that the problem should be quantified, but I expect the answer will
depend on the hardware in use. And I entirely expect there are worthwhile
gains to be had on the software side.

...
Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch
Sorry for jumping in a bit late. I have a couple of high-level comments below.

On Thu, Oct 6, 2016 at 10:10 AM, Rony Efraim <ro...@mellanox.com> wrote:
> From: Joe Stringer [mailto:j...@ovn.org]
> Sent: Thursday, October 06, 2016 5:06 AM
>>
>> Subject: Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support
>> for openvswitch
>>
>> On 27 September 2016 at 21:45, Paul Blakey <pa...@mellanox.com> wrote:
>> > Open vSwitch currently configures the kernel datapath via netlink over
>> > an internal ovs protocol.
...
>> Thanks for submitting the series. Clearly this is a topic of interest for
>> multiple parties, and it's a good starting point for discussion.
>>
>> A few of us also discussed this topic today at netdev, so I'll list a few
>> points that we talked about and hopefully others can fill in the bits I
>> miss.
> Thanks for summarizing our meeting today.
> Here is a link to a PDF picture that shows the idea (a picture <= 1,000
> words):
> https://drive.google.com/file/d/0B2Yjm5a810FsZEoxOUJHU0l3c01OODUwMzVseXBFOE5MSGxr/view?usp=sharing
>
>> Positives
>> * The hardware offload decision is made in a userspace module.
>> * The layered dpif approach means that the tc-based hardware offload could
>>   sit in front of the kernel or userspace datapaths.
>> * A separate dpif means that if you don't enable it, it doesn't affect
>>   you, and it doesn't litter another dpif implementation with offload
>>   logic.

Because of the better modularity and the use of existing kernel interfaces
for flow offload, I like this approach.

>> Drawbacks
>> * An additional dpif to maintain, and another implementation to change
>>   when modifying the dpif interface. Maybe this doesn't change too often,
>>   but there have been some discussions recently about whether
>>   flow_{put,get,del} should be converted to use internal flow structures
>>   rather than the OVS netlink representation. This is one example of
>>   potential impact on development.
> [RONY] You are right, but I don't think we can add it any other way. I
> think that the approach of using dpif_netlink will save us a lot of
> maintenance.
>> * Fairly limited support for OVS matches and actions. For instance, it is
>>   not yet useful for an OVN-style pipeline. But that's not a limitation of
>>   the design, just the current implementation.
> [RONY] Sure, we intend to support OVN and connection tracking; we started
> with the simple case.
>>
>> Other considerations
>> * Is the tc flower filter setup rate and stats dump fast enough? How does
>>   it compare to the existing kernel datapath flow setup rate? What about
>>   multiple threads inserting at once? How many filters can be dumped per
>>   second? etc.
> [RONY] We will test it, and will try to improve TC if needed.
>
I think there are two parts to flow offloading:
1. Time spent adding the flow to TC.
2. Time spent pushing the flow to hardware.

It would be interesting to know which one is dominant in this case.
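The "layered dpif" positive quoted above, together with the series' fallback
to the system datapath when a hardware insertion fails, boils down to a
try-hardware-then-fall-back flow_put. The sketch below only illustrates that
control flow; all type and function names are hypothetical placeholders, not
the interfaces from the patch series.

    #include <stdio.h>

    struct flow_spec { int dummy; };   /* placeholder for match + actions */

    /* Hypothetical stand-ins for the tc flower offload path and for the
     * ordinary kernel (system) datapath path. */
    static int hw_offload_put(const struct flow_spec *flow) { (void) flow; return -1; }
    static int system_dp_put(const struct flow_spec *flow)  { (void) flow; return 0; }

    static int layered_flow_put(const struct flow_spec *flow)
    {
        if (hw_offload_put(flow) == 0) {
            return 0;                 /* rule now lives in hardware */
        }
        /* Hardware insertion failed: fall back to the system datapath. */
        return system_dp_put(flow);
    }

    int main(void)
    {
        struct flow_spec f = { 0 };
        printf("flow_put -> %d\n", layered_flow_put(&f));
        return 0;
    }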
Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch
From: Joe Stringer [mailto:j...@ovn.org]
Sent: Thursday, October 06, 2016 5:06 AM
>
> Subject: Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support
> for openvswitch
>
> On 27 September 2016 at 21:45, Paul Blakey <pa...@mellanox.com> wrote:
> > Open vSwitch currently configures the kernel datapath via netlink over
> > an internal ovs protocol.
...
> Thanks for submitting the series. Clearly this is a topic of interest for
> multiple parties, and it's a good starting point for discussion.
>
> A few of us also discussed this topic today at netdev, so I'll list a few
> points that we talked about and hopefully others can fill in the bits I
> miss.

Thanks for summarizing our meeting today.
Here is a link to a PDF picture that shows the idea (a picture <= 1,000
words):
https://drive.google.com/file/d/0B2Yjm5a810FsZEoxOUJHU0l3c01OODUwMzVseXBFOE5MSGxr/view?usp=sharing

> Positives
> * The hardware offload decision is made in a userspace module.
> * The layered dpif approach means that the tc-based hardware offload could
>   sit in front of the kernel or userspace datapaths.
> * A separate dpif means that if you don't enable it, it doesn't affect you,
>   and it doesn't litter another dpif implementation with offload logic.
>
> Drawbacks
> * An additional dpif to maintain, and another implementation to change when
>   modifying the dpif interface. Maybe this doesn't change too often, but
>   there have been some discussions recently about whether
>   flow_{put,get,del} should be converted to use internal flow structures
>   rather than the OVS netlink representation. This is one example of
>   potential impact on development.

[RONY] You are right, but I don't think we can add it any other way. I think
that the approach of using dpif_netlink will save us a lot of maintenance.

> * Fairly limited support for OVS matches and actions. For instance, it is
>   not yet useful for an OVN-style pipeline. But that's not a limitation of
>   the design, just the current implementation.

[RONY] Sure, we intend to support OVN and connection tracking; we started
with the simple case.

> Other considerations
> * Is the tc flower filter setup rate and stats dump fast enough? How does
>   it compare to the existing kernel datapath flow setup rate? What about
>   multiple threads inserting at once? How many filters can be dumped per
>   second? etc.

[RONY] We will test it, and will try to improve TC if needed.

> * Currently, a given flow will exist in either the offloaded implementation
>   or the kernel datapath, and statistics are drawn from only one location.
>   This is consistent with how ofproto-dpif-upcall inserts flows: one
>   flow_put operation, and one flow is inserted into the datapath.
>   Correspondingly, there is one udpif_key which reflects the most recently
>   used stats for this datapath flow. There may be situations where flows
>   need to be in both datapaths, in which case there needs to be either one
>   udpif_key per datapath representation of the flow, or the dpif must hide
>   the second flow and aggregate the stats.

[RONY] as you wrote the d
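If a flow ever does live in both datapaths, the "dpif hides the second flow
and aggregates the stats" option quoted above amounts to summing the
per-datapath counters before reporting them. A minimal sketch of that idea
(field and function names are illustrative, not OVS's actual structures):

    #include <stdint.h>
    #include <stdio.h>

    struct dp_flow_stats {
        uint64_t n_packets;
        uint64_t n_bytes;
    };

    /* Sum the counters of the hardware-offloaded and software copies of the
     * same flow so that callers see a single set of statistics. */
    static struct dp_flow_stats
    aggregate_stats(const struct dp_flow_stats *hw,
                    const struct dp_flow_stats *sw)
    {
        struct dp_flow_stats total = {
            .n_packets = hw->n_packets + sw->n_packets,
            .n_bytes   = hw->n_bytes + sw->n_bytes,
        };
        return total;
    }

    int main(void)
    {
        struct dp_flow_stats hw = { .n_packets = 900, .n_bytes = 90000 };
        struct dp_flow_stats sw = { .n_packets = 100, .n_bytes = 10000 };
        struct dp_flow_stats all = aggregate_stats(&hw, &sw);

        printf("packets=%llu bytes=%llu\n",
               (unsigned long long) all.n_packets,
               (unsigned long long) all.n_bytes);
        return 0;
    }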
Re: [ovs-dev] [PATCH ovs RFC 0/9] Introducing HW offload support for openvswitch
On 27 September 2016 at 21:45, Paul Blakey wrote:
> Open vSwitch currently configures the kernel datapath via netlink over an
> internal ovs protocol.
>
> This patch series offers a new provider, dpif-netlink-tc, that uses the tc
> flower protocol to offload OVS rules into the HW datapath through
> netdevices that e.g. represent NIC e-switch ports.
>
> The user can create a bridge with datapath_type=dpif-hw-netlink in order to
> use this provider. This provider can be used to pass tc flower rules to the
> HW for HW offloads.
>
> This patch series also introduces a policy module in which the user can
> program a HW-offload policy. The policy module accepts an OVS flow and
> returns a policy decision for each flow: NO_OFFLOAD or HW_ONLY -- currently
> the policy is to HW offload all rules.
>
> If the HW offload rule assignment fails, the provider falls back to the
> system datapath.
>
> Flower was chosen because it is fairly natural to state OVS DP rules for
> this classifier. However, the code can be extended to support other
> classifiers such as u32, eBPF, etc., which have HW offloads as well.
>
> The use case we are currently addressing is the newly introduced SR-IOV
> switchdev mode in the Linux kernel, introduced in version 4.8 [1][2]. This
> series was tested against SR-IOV VF vport representors of the Mellanox 100G
> ConnectX-4 series, exposed by the mlx5 kernel driver.
>
> Paul and Shahar.
>
> [1] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=513334e18a74f70c0be58c2eb73af1715325b870
> [2] http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=53d94892e27409bb2b48140207c0273b2ba65f61

Thanks for submitting the series. Clearly this is a topic of interest for
multiple parties, and it's a good starting point for discussion.

A few of us also discussed this topic today at netdev, so I'll list a few
points that we talked about and hopefully others can fill in the bits I miss.

Positives
* The hardware offload decision is made in a userspace module.
* The layered dpif approach means that the tc-based hardware offload could
  sit in front of the kernel or userspace datapaths.
* A separate dpif means that if you don't enable it, it doesn't affect you,
  and it doesn't litter another dpif implementation with offload logic.

Drawbacks
* An additional dpif to maintain, and another implementation to change when
  modifying the dpif interface. Maybe this doesn't change too often, but
  there have been some discussions recently about whether flow_{put,get,del}
  should be converted to use internal flow structures rather than the OVS
  netlink representation. This is one example of potential impact on
  development.
* Fairly limited support for OVS matches and actions. For instance, it is
  not yet useful for an OVN-style pipeline. But that's not a limitation of
  the design, just the current implementation.

Other considerations
* Is the tc flower filter setup rate and stats dump fast enough? How does it
  compare to the existing kernel datapath flow setup rate? What about
  multiple threads inserting at once? How many filters can be dumped per
  second? etc.
* Currently, a given flow will exist in either the offloaded implementation
  or the kernel datapath, and statistics are drawn from only one location.
  This is consistent with how ofproto-dpif-upcall inserts flows: one
  flow_put operation, and one flow is inserted into the datapath.
  Correspondingly, there is one udpif_key which reflects the most recently
  used stats for this datapath flow. There may be situations where flows
  need to be in both datapaths, in which case there needs to be either one
  udpif_key per datapath representation of the flow, or the dpif must hide
  the second flow and aggregate the stats.

Extra, not previously discussed
* Testing: we may want a mode where tc flower is used in software mode, to
  test the tc netlink interface. It would be good to see the kernel module
  testsuite extended to at least test some basics of the interface, perhaps
  also the flower behaviour (though that may be out of scope for the
  testsuite in the OVS tree).

Thanks,
Joe
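On the testing point above: the kernel's flower classifier already takes
per-filter flags that control offload, so a "software mode" could be as
simple as setting TCA_CLS_FLAGS_SKIP_HW on the filters the dpif installs
(assuming uapi headers from roughly 4.6 or later, which define these flags).
The sketch below only shows choosing the flag value; building and sending
the actual tc netlink request is omitted.

    #include <stdio.h>
    #include <stdint.h>
    #include <linux/pkt_cls.h>  /* TCA_CLS_FLAGS_SKIP_HW, TCA_CLS_FLAGS_SKIP_SW */

    /* Pick the flower filter flags: in a software-only test mode, ask the
     * kernel to skip hardware so the tc netlink plumbing is exercised even
     * without a capable NIC.  (TCA_CLS_FLAGS_SKIP_SW would do the opposite
     * and force hardware-only behaviour.) */
    static uint32_t flower_flags(int sw_only_test_mode)
    {
        return sw_only_test_mode ? TCA_CLS_FLAGS_SKIP_HW : 0;
    }

    int main(void)
    {
        printf("flags (software test mode): 0x%x\n", (unsigned) flower_flags(1));
        printf("flags (normal):             0x%x\n", (unsigned) flower_flags(0));
        return 0;
    }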