Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On 9/5/17, 2:01 AM, "Yuanhan Liu" wrote:

On Fri, Sep 01, 2017 at 10:35:51PM +, Darrell Ball wrote:
>
> On 8/31/17, 3:13 AM, "Yuanhan Liu" wrote:
>
> On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> > > [Finn]
> > > I think we should not further intermix the rxqs distributed to different pmd's, other than initially configured, when setting up hw-offload. If we make a round-robin distribution of the rxqs, a different pmd will most likely receive the hw-offloaded packets - not the same pmd that ran the slow-path originally creating the flow.
> > > It is usual to optimize caches etc. per pmd and that would not work then. Maybe the further processing of the hw-offloaded packets does not need these optimizations at the moment; however, IMHO I think we would be better off using the first proposal above (use the same rxq as the one creating the flow).
> >
> > [Darrell] Several ideas have some validity.
> > However, this sounds reasonable and simple and we could revisit as needed.
> > What do you think, Yuanhan?
>
> Just want to make sure we are on the same page: do you mean the original solution/workaround I mentioned in the cover letter: record the rxq at recv and pass it down to flow creation?
>
> If so, I'm okay with it.
>
> [Darrell]
> This is the relevant part from the cover letter:
>
> "One possible solution is to record the rxq and pass it down to the flow creation stage. It would be much better, but it's still far away from being perfect. Because it might have changed the steering rules stealthily, which may break the default RSS setup by OVS-DPDK."
>
> This is a reasonable first cut.
> However, the flows installed are masked flows, but the associated packets would 'normally' end up on multiple PMDs due to RSS, right?

Why is it "multiple PMDs due to RSS"? Isn't RSS for distributing packets to multiple RX queues inside the NIC?

[Darrell] I was referring to the general case of using the distribution across queues to distribute work to different PMDs.

--yliu

> But for HWOL, we specify 'the queue' to be the one we receive the first packet from.
> This is what I was getting at before. So, future workarounds would be 'auto-splitting flows' across queues, user-specified flow->queue associations, etc.

___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On Fri, Sep 01, 2017 at 10:35:51PM +, Darrell Ball wrote:
>
> On 8/31/17, 3:13 AM, "Yuanhan Liu" wrote:
>
> On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> > > [Finn]
> > > I think we should not further intermix the rxqs distributed to different pmd's, other than initially configured, when setting up hw-offload. If we make a round-robin distribution of the rxqs, a different pmd will most likely receive the hw-offloaded packets - not the same pmd that ran the slow-path originally creating the flow.
> > > It is usual to optimize caches etc. per pmd and that would not work then. Maybe the further processing of the hw-offloaded packets does not need these optimizations at the moment; however, IMHO I think we would be better off using the first proposal above (use the same rxq as the one creating the flow).
> >
> > [Darrell] Several ideas have some validity.
> > However, this sounds reasonable and simple and we could revisit as needed.
> > What do you think, Yuanhan?
>
> Just want to make sure we are on the same page: do you mean the original solution/workaround I mentioned in the cover letter: record the rxq at recv and pass it down to flow creation?
>
> If so, I'm okay with it.
>
> [Darrell]
> This is the relevant part from the cover letter:
>
> "One possible solution is to record the rxq and pass it down to the flow creation stage. It would be much better, but it's still far away from being perfect. Because it might have changed the steering rules stealthily, which may break the default RSS setup by OVS-DPDK."
>
> This is a reasonable first cut.
> However, the flows installed are masked flows, but the associated packets would 'normally' end up on multiple PMDs due to RSS, right?

Why is it "multiple PMDs due to RSS"? Isn't RSS for distributing packets to multiple RX queues inside the NIC?

--yliu

> But for HWOL, we specify 'the queue' to be the one we receive the first packet from.
> This is what I was getting at before. So, future workarounds would be 'auto-splitting flows' across queues, user-specified flow->queue associations, etc.
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On 8/31/17, 3:16 AM, "Yuanhan Liu" wrote:

On Wed, Aug 30, 2017 at 07:39:35PM +, Darrell Ball wrote:
> > > Note that it's disabled by default, which can be enabled by:
> > >
> > > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> > >
> > > Maybe per in-port configuration would alleviate the issue to a certain degree.
> >
> > Yes, it could be done. I chose it for the following reasons:
> >
> > - the option is already there, used by tc offloads.
> > - it also simplifies the (first) patchset a bit, IMO.
> >
> > Of course, I understand.
> >
> > However, I'm okay with making it per port. What's your suggestion for this? Making "hw-offload" be per port, or introducing another one? If so, what's your suggestion on the naming?
>
> I am not suggesting to drop the global configuration.
> Mainly we 'consider' additional per-interface configuration because of the restriction with the queue action we discuss. This reduces the scope of the queue remapping from what RSS would yield with HWOL.
> I would expect that when such configuration is done (if it were done), typically multiple ports would be configured, since traffic flows bi-directionally at least.
>
> If we were to do this, one of the possibilities would be something like:
> ovs-vsctl set Interface dpdk0 other_config:hw-offload=true

What's the scope between the global one and this one then? We simply ignore the global one for OVS-DPDK?

> [Darrell]
> If we were to have such an option (and it is a pretty big 'if'), then the more specific scope (i.e. interface scope here) would eclipse the global scope, I think.
> Maybe initially, we try to keep it as simple as possible,

Like the one I have already done, re-use the "hw-offload" option? :) This is the simplest I could think of.

[Darrell] At the interface scope, the name hw-offload could be reused and documented that it is only used by dpdk, 'if' this route were to be followed.

--yliu

> but I could see some other use cases for interface-level config., like HW resource contention for multiport NICs?
>
> > Thanks for the review. BTW, would you please add me in the 'to' or 'cc' list while replying to me? Otherwise, it's easy to get missed: too many emails :/
>
> of course

Thank you!

--yliu
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On 8/31/17, 3:13 AM, "Yuanhan Liu" wrote:

On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> > [Finn]
> > I think we should not further intermix the rxqs distributed to different pmd's, other than initially configured, when setting up hw-offload. If we make a round-robin distribution of the rxqs, a different pmd will most likely receive the hw-offloaded packets - not the same pmd that ran the slow-path originally creating the flow.
> > It is usual to optimize caches etc. per pmd and that would not work then. Maybe the further processing of the hw-offloaded packets does not need these optimizations at the moment; however, IMHO I think we would be better off using the first proposal above (use the same rxq as the one creating the flow).
>
> [Darrell] Several ideas have some validity.
> However, this sounds reasonable and simple and we could revisit as needed.
> What do you think, Yuanhan?

Just want to make sure we are on the same page: do you mean the original solution/workaround I mentioned in the cover letter: record the rxq at recv and pass it down to flow creation?

If so, I'm okay with it.

[Darrell]
This is the relevant part from the cover letter:

"One possible solution is to record the rxq and pass it down to the flow creation stage. It would be much better, but it's still far away from being perfect. Because it might have changed the steering rules stealthily, which may break the default RSS setup by OVS-DPDK."

This is a reasonable first cut.
However, the flows installed are masked flows, but the associated packets would 'normally' end up on multiple PMDs due to RSS, right?
But for HWOL, we specify 'the queue' to be the one we receive the first packet from.
This is what I was getting at before. So, future workarounds would be 'auto-splitting flows' across queues, user-specified flow->queue associations, etc.

--yliu
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On Wed, Aug 30, 2017 at 07:39:35PM +, Darrell Ball wrote:
> > > Note that it's disabled by default, which can be enabled by:
> > >
> > > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> > >
> > > Maybe per in-port configuration would alleviate the issue to a certain degree.
> >
> > Yes, it could be done. I chose it for the following reasons:
> >
> > - the option is already there, used by tc offloads.
> > - it also simplifies the (first) patchset a bit, IMO.
> >
> > Of course, I understand.
> >
> > However, I'm okay with making it per port. What's your suggestion for this? Making "hw-offload" be per port, or introducing another one? If so, what's your suggestion on the naming?
>
> I am not suggesting to drop the global configuration.
> Mainly we 'consider' additional per-interface configuration because of the restriction with the queue action we discuss. This reduces the scope of the queue remapping from what RSS would yield with HWOL.
> I would expect that when such configuration is done (if it were done), typically multiple ports would be configured, since traffic flows bi-directionally at least.
>
> If we were to do this, one of the possibilities would be something like:
> ovs-vsctl set Interface dpdk0 other_config:hw-offload=true

What's the scope between the global one and this one then? We simply ignore the global one for OVS-DPDK?

> [Darrell]
> If we were to have such an option (and it is a pretty big 'if'), then the more specific scope (i.e. interface scope here) would eclipse the global scope, I think.
> Maybe initially, we try to keep it as simple as possible,

Like the one I have already done, re-use the "hw-offload" option? :) This is the simplest I could think of.

--yliu

> but I could see some other use cases for interface-level config., like HW resource contention for multiport NICs?
>
> > Thanks for the review. BTW, would you please add me in the 'to' or 'cc' list while replying to me? Otherwise, it's easy to get missed: too many emails :/
>
> of course

Thank you!

--yliu
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> > [Finn]
> > I think we should not further intermix the rxqs distributed to different pmd's, other than initially configured, when setting up hw-offload. If we make a round-robin distribution of the rxqs, a different pmd will most likely receive the hw-offloaded packets - not the same pmd that ran the slow-path originally creating the flow.
> > It is usual to optimize caches etc. per pmd and that would not work then. Maybe the further processing of the hw-offloaded packets does not need these optimizations at the moment; however, IMHO I think we would be better off using the first proposal above (use the same rxq as the one creating the flow).
>
> [Darrell] Several ideas have some validity.
> However, this sounds reasonable and simple and we could revisit as needed.
> What do you think, Yuanhan?

Just want to make sure we are on the same page: do you mean the original solution/workaround I mentioned in the cover letter: record the rxq at recv and pass it down to flow creation?

If so, I'm okay with it.

--yliu
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On Wed, Aug 30, 2017 at 04:23:53PM +, Darrell Ball wrote:
> One other general comment about the series:
>
> It is not a 'big deal' at this point since we are discussing high-level concepts, but we would need to address these at some point anyway.
>
> Could you run
>
> ./utilities/checkpatch.py
>
> before the next iteration?

Sure.

--yliu

> Thanks, Darrell
>
> On 8/22/17, 11:24 PM, "Yuanhan Liu" wrote:
>
> Hi,
>
> Here is a joint work from Mellanox and Napatech, to enable the flow hw offload with the DPDK generic flow interface (rte_flow).
>
> The basic idea is to associate the flow with a mark id (a uint32_t number). Later, we then get the flow directly from the mark id, bypassing the heavy emc processing, including miniflow_extract.
>
> The association is done with CMAP in patch 1. It also reuses the flow APIs introduced while adding the tc offloads. The emc bypassing is done in patch 2. The flow offload is done in patch 4, which mainly does two things:
>
> - translate the ovs match to DPDK rte_flow patterns
> - bind those patterns with a MARK action.
>
> Afterwards, the NIC will set the mark id in every pkt's mbuf when it matches the flow. That's basically how we could get the flow directly from the received mbuf.
>
> While testing PHY-PHY forwarding with one core and one queue, I got almost an 80% performance boost. For PHY-vhost forwarding, I got about a 50% performance boost.
>
> Though that being said, this patchset still has unresolved issues. The major issue is that maybe most NICs (for instance, Mellanox and Intel) cannot support a pure MARK action. It has to be used together with a QUEUE action, which in turn needs a queue index. That comes to the issue: the queue index is not given in the flow context. To make it work, patch 5 just sets the queue index to 0, which is obviously wrong. One possible solution is to record the rxq and pass it down to the flow creation stage. It would be much better, but it's still far away from being perfect. Because it might have changed the steering rules stealthily, which may break the default RSS setup by OVS-DPDK.
>
> The reason I still want to send it out is to get more comments/thoughts from the community on this whole patchset. Meanwhile, I will try to resolve the QUEUE action issue.
>
> Note that it's disabled by default, which can be enabled by:
>
> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>
> Thanks.
>
> --yliu
>
> ---
> Finn Christensen (3):
>   netdev-dpdk: implement flow put with rte flow
>   netdev-dpdk: retry with queue action
>   netdev-dpdk: set FDIR config
>
> Yuanhan Liu (4):
>   dpif-netdev: associate flow with a mark id
>   dpif-netdev: retrieve flow directly from the flow mark
>   netdev-dpdk: convert ufid to dpdk flow
>   netdev-dpdk: remove offloaded flow on deletion
>
>  lib/dp-packet.h   |  13 ++
>  lib/dpif-netdev.c | 105 +++
>  lib/netdev-dpdk.c | 511 +-
>  lib/netdev.h      |   6 +
>  4 files changed, 634 insertions(+), 1 deletion(-)
>
> --
> 2.7.4
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On 8/29/17, 7:25 PM, "Yuanhan Liu" wrote:

On Wed, Aug 30, 2017 at 01:32:10AM +, Darrell Ball wrote:
>
> > Though that being said, this patchset still has issues unresolved. The major issue is that maybe most NICs (for instance, Mellanox and Intel) cannot support a pure MARK action. It has to be used together with a QUEUE action, which in turn needs a queue index. That comes to the issue: the queue index is not given in the flow context. To make it work, patch 5 just sets the queue index to 0, which is obviously wrong. One possible solution is to record the rxq and pass it down to the flow creation stage. It would be much better, but it's still far away from being perfect. Because it might have changed the steering rules stealthily, which may break the default RSS setup by OVS-DPDK.
> >
> > If this cannot be solved by removing this restriction, I guess another alternative is to actively manage flow-queue associations.
>
> do you mean let the user provide the set_queue action?
>
> I mean in the worst case, if the restriction cannot be lifted, we might need to do flow distribution across queues with additional added logic/tracking,

Like the way mentioned above: record the rxq at recv stage and pass it down to flow_put?

> because we would not really want to keep this restriction without a workaround.

For sure.

> This would be a fair bit of work, however.

Indeed. Another reason I don't really like the above solution is that it changes the core logic of OVS (just for the OVS-DPDK use case). What's worse, it doesn't even solve the issue perfectly: it's still a workaround!

That being said, if we have to provide a workaround, I think maybe a simple round-robin queue distribution is better, given it's our first attempt at hw offload? Later, we could either improve the workaround, or remove the restriction from the DPDK side (if it's possible). Good to you?

> Alternatively, pushing this to the user in general may be too much overhead; what do you think?
> As a user specification, it could be optional, however?

Yes, that's what I was thinking. For the users that know their NIC supports MARK with a QUEUE action, they need not add such an option. For others, it's suggested to add it. Or, it's a must (if we do not do a workaround like the above). Makes sense?

Hi Yuanhan

I responded to the last e-mail, since there is some overlap in the discussion on some points.

> > The reason I still want to send it out is to get more comments/thoughts from the community on this whole patchset. Meanwhile, I will try to resolve the QUEUE action issue.
> >
> > Note that it's disabled by default, which can be enabled by:
> >
> > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> >
> > Maybe per in-port configuration would alleviate the issue to a certain degree.
>
> Yes, it could be done. I chose it for the following reasons:
>
> - the option is already there, used by tc offloads.
> - it also simplifies the (first) patchset a bit, IMO.
>
> Of course, I understand.
>
> However, I'm okay with making it per port. What's your suggestion for this? Making "hw-offload" be per port, or introducing another one? If so, what's your suggestion on the naming?
>
> I am not suggesting to drop the global configuration.
> Mainly we 'consider' additional per-interface configuration because of the restriction with the queue action we discuss. This reduces the scope of the queue remapping from what RSS would yield with HWOL.
> I would expect that when such configuration is done (if it were done), typically multiple ports would be configured, since traffic flows bi-directionally at least.
>
> If we were to do this, one of the possibilities would be something like:
> ovs-vsctl set Interface dpdk0 other_config:hw-offload=true

What's the scope between the global one and this one then? We simply ignore the global one for OVS-DPDK?

[Darrell]
If we were to have such an option (and it is a pretty big 'if'), then the more specific scope (i.e. interface scope here) would eclipse the global scope, I think.
Maybe initially, we try to keep it as simple as possible, but I could see some other use cases for interface-level config., like HW resource contention for multiport NICs?

> Thanks for the review. BTW, would you please add me in 'to'
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On 8/30/17, 12:19 AM, "Finn Christensen" <f...@napatech.com> wrote:

Hi,

Just wanted to add one comment to this correspondence. See below.

Thanks,
Finn Christensen

-----Original Message-----
From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Yuanhan Liu
Sent: 30. august 2017 04:25
To: Darrell Ball <db...@vmware.com>
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

On Wed, Aug 30, 2017 at 01:32:10AM +, Darrell Ball wrote:
>
> > Though that being said, this patchset still has issues unresolved. The major issue is that maybe most NICs (for instance, Mellanox and Intel) cannot support a pure MARK action. It has to be used together with a QUEUE action, which in turn needs a queue index. That comes to the issue: the queue index is not given in the flow context. To make it work, patch 5 just sets the queue index to 0, which is obviously wrong. One possible solution is to record the rxq and pass it down to the flow creation stage. It would be much better, but it's still far away from being perfect. Because it might have changed the steering rules stealthily, which may break the default RSS setup by OVS-DPDK.
> >
> > If this cannot be solved by removing this restriction, I guess another alternative is to actively manage flow-queue associations.
>
> do you mean let the user provide the set_queue action?
>
> I mean in the worst case, if the restriction cannot be lifted, we might need to do flow distribution across queues with additional added logic/tracking,

Like the way mentioned above: record the rxq at recv stage and pass it down to flow_put?

> because we would not really want to keep this restriction without a workaround.

For sure.

> This would be a fair bit of work, however.

Indeed. Another reason I don't really like the above solution is that it changes the core logic of OVS (just for the OVS-DPDK use case). What's worse, it doesn't even solve the issue perfectly: it's still a workaround!

That being said, if we have to provide a workaround, I think maybe a simple round-robin queue distribution is better, given it's our first attempt at hw offload? Later, we could either improve the workaround, or remove the restriction from the DPDK side (if it's possible). Good to you?

[Finn] I think we should not further intermix the rxqs distributed to different pmd's, other than initially configured, when setting up hw-offload. If we make a round-robin distribution of the rxqs, a different pmd will most likely receive the hw-offloaded packets - not the same pmd that ran the slow-path originally creating the flow.
It is usual to optimize caches etc. per pmd and that would not work then. Maybe the further processing of the hw-offloaded packets does not need these optimizations at the moment; however, IMHO I think we would be better off using the first proposal above (use the same rxq as the one creating the flow).

[Darrell] Several ideas have some validity.
However, this sounds reasonable and simple and we could revisit as needed.
What do you think, Yuanhan?

We will not be able to do RSS on a specific hw-offloaded flow anyway. That would require a larger rework of the RSS configuration in OVS, as Darrell points out.

> Alternatively, pushing this to the user in general may be too much overhead; what do you think?
> As a user specification, it could be optional, however?

Yes, that's what I was thinking. For the users that know their NIC supports MARK with a QUEUE action, they need not add such an option. For others, it's suggested to add it. Or, it's a must (if we do not do a workaround like the above). Makes sense?

> > The reason I still want to send it out is to get more comments/thoughts from the community on this whole patchset. Meanwhile, I will try to resolve the QUEUE action issue.
> >
> > Note that it's disabled by default, which can be enabled by:
> >
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
One other general comment about the series:

It is not a 'big deal' at this point since we are discussing high-level concepts, but we would need to address these at some point anyway.

Could you run

./utilities/checkpatch.py

before the next iteration?

Thanks,
Darrell

On 8/22/17, 11:24 PM, "Yuanhan Liu" wrote:

Hi,

Here is a joint work from Mellanox and Napatech, to enable the flow hw offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t number). Later, we then get the flow directly from the mark id, bypassing the heavy emc processing, including miniflow_extract.

The association is done with CMAP in patch 1. It also reuses the flow APIs introduced while adding the tc offloads. The emc bypassing is done in patch 2. The flow offload is done in patch 4, which mainly does two things:

- translate the ovs match to DPDK rte_flow patterns
- bind those patterns with a MARK action.

Afterwards, the NIC will set the mark id in every pkt's mbuf when it matches the flow. That's basically how we could get the flow directly from the received mbuf.

While testing PHY-PHY forwarding with one core and one queue, I got almost an 80% performance boost. For PHY-vhost forwarding, I got about a 50% performance boost.

Though that being said, this patchset still has unresolved issues. The major issue is that maybe most NICs (for instance, Mellanox and Intel) cannot support a pure MARK action. It has to be used together with a QUEUE action, which in turn needs a queue index. That comes to the issue: the queue index is not given in the flow context. To make it work, patch 5 just sets the queue index to 0, which is obviously wrong. One possible solution is to record the rxq and pass it down to the flow creation stage. It would be much better, but it's still far away from being perfect. Because it might have changed the steering rules stealthily, which may break the default RSS setup by OVS-DPDK.

The reason I still want to send it out is to get more comments/thoughts from the community on this whole patchset. Meanwhile, I will try to resolve the QUEUE action issue.

Note that it's disabled by default, which can be enabled by:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

Thanks.

--yliu

---
Finn Christensen (3):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action
  netdev-dpdk: set FDIR config

Yuanhan Liu (4):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion

 lib/dp-packet.h   |  13 ++
 lib/dpif-netdev.c | 105 +++
 lib/netdev-dpdk.c | 511 +-
 lib/netdev.h      |   6 +
 4 files changed, 634 insertions(+), 1 deletion(-)

--
2.7.4
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
Hi,

Just wanted to add one comment to this correspondence. See below.

Thanks,
Finn Christensen

-----Original Message-----
From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Yuanhan Liu
Sent: 30. august 2017 04:25
To: Darrell Ball <db...@vmware.com>
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

On Wed, Aug 30, 2017 at 01:32:10AM +, Darrell Ball wrote:
>
> > Though that being said, this patchset still has issues unresolved. The major issue is that maybe most NICs (for instance, Mellanox and Intel) cannot support a pure MARK action. It has to be used together with a QUEUE action, which in turn needs a queue index. That comes to the issue: the queue index is not given in the flow context. To make it work, patch 5 just sets the queue index to 0, which is obviously wrong. One possible solution is to record the rxq and pass it down to the flow creation stage. It would be much better, but it's still far away from being perfect. Because it might have changed the steering rules stealthily, which may break the default RSS setup by OVS-DPDK.
> >
> > If this cannot be solved by removing this restriction, I guess another alternative is to actively manage flow-queue associations.
>
> do you mean let the user provide the set_queue action?
>
> I mean in the worst case, if the restriction cannot be lifted, we might need to do flow distribution across queues with additional added logic/tracking,

Like the way mentioned above: record the rxq at recv stage and pass it down to flow_put?

> because we would not really want to keep this restriction without a workaround.

For sure.

> This would be a fair bit of work, however.

Indeed. Another reason I don't really like the above solution is that it changes the core logic of OVS (just for the OVS-DPDK use case). What's worse, it doesn't even solve the issue perfectly: it's still a workaround!

That being said, if we have to provide a workaround, I think maybe a simple round-robin queue distribution is better, given it's our first attempt at hw offload? Later, we could either improve the workaround, or remove the restriction from the DPDK side (if it's possible). Good to you?

[Finn] I think we should not further intermix the rxqs distributed to different pmd's, other than initially configured, when setting up hw-offload. If we make a round-robin distribution of the rxqs, a different pmd will most likely receive the hw-offloaded packets - not the same pmd that ran the slow-path originally creating the flow.
It is usual to optimize caches etc. per pmd and that would not work then. Maybe the further processing of the hw-offloaded packets does not need these optimizations at the moment; however, IMHO I think we would be better off using the first proposal above (use the same rxq as the one creating the flow).
We will not be able to do RSS on a specific hw-offloaded flow anyway. That would require a larger rework of the RSS configuration in OVS, as Darrell points out.

> Alternatively, pushing this to the user in general may be too much overhead; what do you think?
> As a user specification, it could be optional, however?

Yes, that's what I was thinking. For the users that know their NIC supports MARK with a QUEUE action, they need not add such an option. For others, it's suggested to add it. Or, it's a must (if we do not do a workaround like the above). Makes sense?

> > The reason I still want to send it out is to get more comments/thoughts from the community on this whole patchset. Meanwhile, I will try to resolve the QUEUE action issue.
> >
> > Note that it's disabled by default, which can be enabled by:
> >
> > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> >
> > Maybe per in-port configuration would alleviate the issue to a certain degree.
>
> Yes, it could be done. I chose it for the following reasons:
>
> - the option is already there, used by tc offloads.
> - it also simplifies the (first) patchset a bit, IMO.
>
> Of course, I understand.
>
> However, I'm okay with making it per port. What's your suggestion for this? Making "hw-offload" be per port, or introducing another one? If so, what's your suggestion on the naming?
>
> I am not suggesting to drop the global configuration
> Mainly w
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On Wed, Aug 30, 2017 at 01:32:10AM +, Darrell Ball wrote: > > > Though that being said, this patchset still has issues > unresolved. The > > major issue is that maybe most NIC (for instance, Mellanox and > Intel) > > can not support a pure MARK action. It has to be used together > with a > > QUEUE action, which in turn needs a queue index. That comes to > the issue: > > the queue index is not given in the flow context. To make it > work, patch > > 5 just set the queue index to 0, which is obviously wrong. One > possible > > solution is to record the rxq and pass it down to the flow > creation > > stage. It would be much better, but it's still far away from > being perfect. > > Because it might have changed the steering rules stealthily, > which may > > break the default RSS setup by OVS-DPDK. > > > > If this cannot be solved by removing this restriction, I guess another > alternative is to actively > > manage flow-queue associations. > > do you mean let user provide the set_queue action? > > I mean in the worst case, if the restriction cannot be lifted, we might to > need to do flow distribution > across queues with additional added logic/tracking, Like the way mentioned above: record the rxq at recv stage and pass it down to flow_put? > because we would not really want to keep this restriction > without a workaround. For sure. > This would be a fair bit of work, however. Indeed. Another reason I don't really like above solution is it changes the core logic of OVS (just for OVS-DPDK use case). What's worse, it doesn't even solve the issue perfectly: it's still a workaround! That being said, if we have to provide a workaround, I think maybe a simple round-robin queue distribution is better, judging it's our first attempt to have hw offload? Later, we could either improve the workaround, or remove the restriction from DPDK side (if it's possible). Good to you? > Alternatively, pushing this to the user in the general may be too much > overhead; what do you think ?. 
> As a user specification, it could be optional, however?

Yes, that's what I was thinking. Users who know their NIC supports MARK
without a QUEUE action need not add such an option. For others, it's
suggested to add it. Or it's a must (if we do not do a workaround like the
above). Makes sense?

> > > > The reason I still want to send it out is to get more comments/thoughts
> > > > from the community on this whole patchset. Meanwhile, I will try to
> > > > resolve the QUEUE action issue.
> > > >
> > > > Note that it's disabled by default, which can be enabled by:
> > > >
> > > > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> > >
> > > Maybe per in-port configuration would alleviate the issue to a certain
> > > degree.
> >
> > Yes, it could be done. I chose it for the following reasons:
> >
> > - the option is already there, used by the tc offloads.
> > - it also simplifies the (first) patchset a bit, IMO.
>
> Of course, I understand.
>
> > However, I'm okay with making it per port. What's your suggestion for
> > this? Making "hw-offload" be per port, or introducing another one? If so,
> > what's your suggestion on the naming?
>
> I am not suggesting to drop the global configuration.
> Mainly, we 'consider' additional per-interface configuration because of
> the restriction with the queue action we discussed. This reduces the scope
> of the queue remapping from what RSS would yield with HWOL.
> I would expect that when such configuration is done (if it were done),
> typically multiple ports would be configured, since traffic flows
> bi-directionally at least.
>
> If we were to do this, one of the possibilities would be something like:
> ovs-vsctl set Interface dpdk0 other_config:hw-offload=true

What's the scope between the global one and this one, then? Do we simply
ignore the global one for OVS-DPDK?

> > Thanks for the review. BTW, would you please add me in the 'to' or 'cc'
> > list while replying to me? Otherwise, it's easy to get missed: too many
> > emails :/
>
> Of course.

Thank you!
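Putting the two knobs side by side (note: only the global `hw-offload`
option exists at this point; the per-interface form is a proposal in this
thread, not an implemented option):

```shell
# Existing global switch, shared with the tc offload path:
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

# Proposed (hypothetical) per-interface override discussed above;
# typically both directions of a traffic path would be configured:
ovs-vsctl set Interface dpdk0 other_config:hw-offload=true
ovs-vsctl set Interface dpdk1 other_config:hw-offload=true
```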
	--yliu
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On Tue, Aug 29, 2017 at 07:11:42AM +, Darrell Ball wrote:
> On 8/22/17, 11:24 PM, "Yuanhan Liu" wrote:
> >
> > Hi,
> >
> > Here is a joint work from Mellanox and Napatech to enable flow hw
> > offload with the DPDK generic flow interface (rte_flow).
> >
> > The basic idea is to associate the flow with a mark id (a uint32_t
> > number). Later, we then get the flow directly from the mark id,
> > bypassing the heavy emc processing, including miniflow_extract.
> >
> > The association is done with CMAP in patch 1. It also reuses the flow
> > APIs introduced while adding the tc offloads. The emc bypassing is done
> > in patch 2. The flow offload is done in patch 4, which mainly does two
> > things:
> >
> > - translate the ovs match to DPDK rte_flow patterns
> > - bind those patterns with a MARK action.
> >
> > Afterwards, the NIC will set the mark id in every pkt's mbuf when it
> > matches the flow. That's basically how we could get the flow directly
> > from the received mbuf.
> >
> > While testing PHY-PHY forwarding with one core and one queue, I got
> > almost an 80% performance boost. For PHY-vhost forwarding, I got about
> > a 50% performance boost.
> >
> > Though that being said, this patchset still has unresolved issues. The
> > major issue is that maybe most NICs (for instance, Mellanox and Intel)
> > can not support a pure MARK action. It has to be used together with a
> > QUEUE action, which in turn needs a queue index. That comes to the
> > issue: the queue index is not given in the flow context. To make it
> > work, patch 5 just sets the queue index to 0, which is obviously wrong.
> > One possible solution is to record the rxq and pass it down to the flow
> > creation stage. It would be much better, but it's still far away from
> > being perfect, because it might have changed the steering rules
> > stealthily, which may break the default RSS setup by OVS-DPDK.
>
> If this cannot be solved by removing this restriction, I guess another
> alternative is to actively manage flow-queue associations.
Do you mean letting the user provide the set_queue action?

> > The reason I still want to send it out is to get more comments/thoughts
> > from the community on this whole patchset. Meanwhile, I will try to
> > resolve the QUEUE action issue.
> >
> > Note that it's disabled by default, which can be enabled by:
> >
> > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>
> Maybe per in-port configuration would alleviate the issue to a certain
> degree.

Yes, it could be done. I chose it for the following reasons:

- the option is already there, used by the tc offloads.
- it also simplifies the (first) patchset a bit, IMO.

However, I'm okay with making it per port. What's your suggestion for this?
Making "hw-offload" be per port, or introducing another one? If so, what's
your suggestion on the naming?

Thanks for the review. BTW, would you please add me in the 'to' or 'cc'
list while replying to me? Otherwise, it's easy to get missed: too many
emails :/

	--yliu
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
On 8/22/17, 11:24 PM, "Yuanhan Liu" wrote:
> Hi,
>
> Here is a joint work from Mellanox and Napatech to enable flow hw
> offload with the DPDK generic flow interface (rte_flow).
>
> The basic idea is to associate the flow with a mark id (a uint32_t
> number). Later, we then get the flow directly from the mark id,
> bypassing the heavy emc processing, including miniflow_extract.
>
> The association is done with CMAP in patch 1. It also reuses the flow
> APIs introduced while adding the tc offloads. The emc bypassing is done
> in patch 2. The flow offload is done in patch 4, which mainly does two
> things:
>
> - translate the ovs match to DPDK rte_flow patterns
> - bind those patterns with a MARK action.
>
> Afterwards, the NIC will set the mark id in every pkt's mbuf when it
> matches the flow. That's basically how we could get the flow directly
> from the received mbuf.
>
> While testing PHY-PHY forwarding with one core and one queue, I got
> almost an 80% performance boost. For PHY-vhost forwarding, I got about
> a 50% performance boost.
>
> Though that being said, this patchset still has unresolved issues. The
> major issue is that maybe most NICs (for instance, Mellanox and Intel)
> can not support a pure MARK action. It has to be used together with a
> QUEUE action, which in turn needs a queue index. That comes to the
> issue: the queue index is not given in the flow context. To make it
> work, patch 5 just sets the queue index to 0, which is obviously wrong.
> One possible solution is to record the rxq and pass it down to the flow
> creation stage. It would be much better, but it's still far away from
> being perfect, because it might have changed the steering rules
> stealthily, which may break the default RSS setup by OVS-DPDK.

If this cannot be solved by removing this restriction, I guess another
alternative is to actively manage flow-queue associations.

> The reason I still want to send it out is to get more comments/thoughts
> from the community on this whole patchset. Meanwhile, I will try to
> resolve the QUEUE action issue.
> Note that it's disabled by default, which can be enabled by:
>
> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

Maybe per in-port configuration would alleviate the issue to a certain
degree.

> Thanks.
>
> 	--yliu
>
> ---
> Finn Christensen (3):
>   netdev-dpdk: implement flow put with rte flow
>   netdev-dpdk: retry with queue action
>   netdev-dpdk: set FDIR config
>
> Yuanhan Liu (4):
>   dpif-netdev: associate flow with a mark id
>   dpif-netdev: retrieve flow directly from the flow mark
>   netdev-dpdk: convert ufid to dpdk flow
>   netdev-dpdk: remove offloaded flow on deletion
>
>  lib/dp-packet.h   |  13 ++
>  lib/dpif-netdev.c | 105 +++
>  lib/netdev-dpdk.c | 511 +-
>  lib/netdev.h      |   6 +
>  4 files changed, 634 insertions(+), 1 deletion(-)
> --
> 2.7.4
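The mark/flow association that patch 1 implements with a CMAP can be
pictured with a toy table like the following. This is an illustrative C
sketch only: the real code uses OVS's concurrent cmap keyed by mark id, and
all of these names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the patchset's association between a
 * hw-assigned mark id and a datapath flow.  A fixed-size direct-mapped
 * table replaces the real concurrent cmap purely for clarity. */
#define MARK_TABLE_SIZE 1024

struct dp_flow;   /* opaque datapath flow */

static struct dp_flow *mark_table[MARK_TABLE_SIZE];

/* Called at flow offload time, once the NIC accepted the MARK action. */
void
mark_to_flow_associate(uint32_t mark, struct dp_flow *flow)
{
    mark_table[mark % MARK_TABLE_SIZE] = flow;
}

/* Called at receive time with the mark id read back from the mbuf;
 * returns NULL when no offloaded flow carries that mark. */
struct dp_flow *
mark_to_flow_find(uint32_t mark)
{
    return mark_table[mark % MARK_TABLE_SIZE];
}
```

The lookup on the receive path is what lets a marked packet skip the EMC
entirely, which is where the reported performance boost comes from.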
Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
Thank you for this contribution. It is a big step forward for HWOL; even
the partial offload in these patches has a significant benefit.
I had done a first pass on Wednesday when the patches came out, but I'll
send comments out on Monday.

On 8/22/17, 11:24 PM, "Yuanhan Liu" wrote:
> Hi,
>
> Here is a joint work from Mellanox and Napatech to enable flow hw
> offload with the DPDK generic flow interface (rte_flow).
>
> The basic idea is to associate the flow with a mark id (a uint32_t
> number). Later, we then get the flow directly from the mark id,
> bypassing the heavy emc processing, including miniflow_extract.
>
> The association is done with CMAP in patch 1. It also reuses the flow
> APIs introduced while adding the tc offloads. The emc bypassing is done
> in patch 2. The flow offload is done in patch 4, which mainly does two
> things:
>
> - translate the ovs match to DPDK rte_flow patterns
> - bind those patterns with a MARK action.
>
> Afterwards, the NIC will set the mark id in every pkt's mbuf when it
> matches the flow. That's basically how we could get the flow directly
> from the received mbuf.
>
> While testing PHY-PHY forwarding with one core and one queue, I got
> almost an 80% performance boost. For PHY-vhost forwarding, I got about
> a 50% performance boost.
>
> Though that being said, this patchset still has unresolved issues. The
> major issue is that maybe most NICs (for instance, Mellanox and Intel)
> can not support a pure MARK action. It has to be used together with a
> QUEUE action, which in turn needs a queue index. That comes to the
> issue: the queue index is not given in the flow context. To make it
> work, patch 5 just sets the queue index to 0, which is obviously wrong.
> One possible solution is to record the rxq and pass it down to the flow
> creation stage. It would be much better, but it's still far away from
> being perfect, because it might have changed the steering rules
> stealthily, which may break the default RSS setup by OVS-DPDK.
> The reason I still want to send it out is to get more comments/thoughts
> from the community on this whole patchset. Meanwhile, I will try to
> resolve the QUEUE action issue.
>
> Note that it's disabled by default, which can be enabled by:
>
> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>
> Thanks.
>
> 	--yliu
>
> ---
> Finn Christensen (3):
>   netdev-dpdk: implement flow put with rte flow
>   netdev-dpdk: retry with queue action
>   netdev-dpdk: set FDIR config
>
> Yuanhan Liu (4):
>   dpif-netdev: associate flow with a mark id
>   dpif-netdev: retrieve flow directly from the flow mark
>   netdev-dpdk: convert ufid to dpdk flow
>   netdev-dpdk: remove offloaded flow on deletion
>
>  lib/dp-packet.h   |  13 ++
>  lib/dpif-netdev.c | 105 +++
>  lib/netdev-dpdk.c | 511 +-
>  lib/netdev.h      |   6 +
>  4 files changed, 634 insertions(+), 1 deletion(-)
> --
> 2.7.4
[ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow
Hi,

Here is a joint work from Mellanox and Napatech to enable flow hw offload
with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t number).
Later, we then get the flow directly from the mark id, bypassing the heavy
emc processing, including miniflow_extract.

The association is done with CMAP in patch 1. It also reuses the flow APIs
introduced while adding the tc offloads. The emc bypassing is done in patch
2. The flow offload is done in patch 4, which mainly does two things:

- translate the ovs match to DPDK rte_flow patterns
- bind those patterns with a MARK action.

Afterwards, the NIC will set the mark id in every pkt's mbuf when it
matches the flow. That's basically how we could get the flow directly from
the received mbuf.

While testing PHY-PHY forwarding with one core and one queue, I got almost
an 80% performance boost. For PHY-vhost forwarding, I got about a 50%
performance boost.

Though that being said, this patchset still has unresolved issues. The
major issue is that maybe most NICs (for instance, Mellanox and Intel) can
not support a pure MARK action. It has to be used together with a QUEUE
action, which in turn needs a queue index. That comes to the issue: the
queue index is not given in the flow context. To make it work, patch 5 just
sets the queue index to 0, which is obviously wrong. One possible solution
is to record the rxq and pass it down to the flow creation stage. It would
be much better, but it's still far away from being perfect, because it
might have changed the steering rules stealthily, which may break the
default RSS setup by OVS-DPDK.

The reason I still want to send it out is to get more comments/thoughts
from the community on this whole patchset. Meanwhile, I will try to resolve
the QUEUE action issue.

Note that it's disabled by default, which can be enabled by:

    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

Thanks.
	--yliu

---
Finn Christensen (3):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action
  netdev-dpdk: set FDIR config

Yuanhan Liu (4):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion

 lib/dp-packet.h   |  13 ++
 lib/dpif-netdev.c | 105 +++
 lib/netdev-dpdk.c | 511 +-
 lib/netdev.h      |   6 +
 4 files changed, 634 insertions(+), 1 deletion(-)
--
2.7.4
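The receive-side shortcut described in the cover letter (patch 2) can be
sketched as follows. This is a self-contained illustration only: `struct
pkt`, `struct dp_flow`, and the stub lookups are hypothetical stand-ins for
OVS's dp_packet/mbuf and flow-cache code, not the real API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dp_flow { uint32_t id; };

/* Hypothetical packet: carries what the driver copied out of the mbuf. */
struct pkt {
    bool has_mark;      /* e.g. PKT_RX_FDIR_ID set in mbuf ol_flags */
    uint32_t mark;      /* e.g. mbuf hash.fdir.hi */
};

static struct dp_flow hw_flow = { 1 };   /* stub: the one offloaded flow */
static struct dp_flow sw_flow = { 2 };   /* stub: result of the full path */

/* Stands in for the cmap lookup keyed by mark id (patch 1). */
static struct dp_flow *
flow_from_mark(uint32_t mark)
{
    return mark == 42 ? &hw_flow : NULL;
}

/* Stands in for miniflow_extract + the EMC lookup. */
static struct dp_flow *
emc_classify(const struct pkt *pkt)
{
    (void) pkt;
    return &sw_flow;
}

struct dp_flow *
lookup_flow(const struct pkt *pkt)
{
    if (pkt->has_mark) {
        struct dp_flow *flow = flow_from_mark(pkt->mark);
        if (flow) {
            return flow;    /* offload hit: no miniflow_extract, no EMC */
        }
    }
    return emc_classify(pkt);  /* fall back to the normal datapath */
}
```

The early return on a mark hit is the whole optimization: the per-packet
cost drops to a single map lookup whenever the NIC matched the flow.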