Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-09-05 Thread Darrell Ball


On 9/5/17, 2:01 AM, "Yuanhan Liu"  wrote:

On Fri, Sep 01, 2017 at 10:35:51PM +, Darrell Ball wrote:
> 
> 
> On 8/31/17, 3:13 AM, "Yuanhan Liu"  wrote:
> 
> On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> > 
> > [Finn]
> > 
> > I think we should not further intermix the rxqs distributed to 
different pmd's, other than initially configured, when setting up hw-offload. 
If we make a round-robin distribution of the rxqs, a different pmd will most 
likely receive the hw-offloaded packets - not the same pmd that ran the 
slow-path originally creating the flow.
> > 
> > It is usual to optimize caches etc. per pmd and that would not 
work then. Maybe the further processing of the hw-offloaded packets does not 
need these optimizations at the moment, however, IMHO I think we would be 
better off using the first proposal above (use the same rxq as the one creating 
the flow).
> > 
> > [Darrell] Several ideas have some validity.
> >  However, this sounds reasonable and simple and we 
could revisit as needed.
> >  What do you think Yuanhan ?
> 
> Just want to make sure we are on the same page: do you mean the 
original
> solution/workaround I mentioned in the cover letter: record the rxq at
> recv and pass it down to flow creation?
> 
> If so, I'm okay with it.
> 
> [Darrell]
> This is the relevant part from the cover letter:
> 
> “One possible
>  solution is to record the rxq and pass it down to the flow creation
>  stage. It would be much better, but it's still far away from being 
perfect.
>  Because it might have changed the steering rules stealthily, which may
>  break the default RSS setup by OVS-DPDK.”
> 
> This is a reasonable first cut.
> However, the flows installed are masked flows but the associated packets 
would ‘normally’ end up on multiple
> PMDs due to RSS, right ?

Why is it "multiple PMDs due to RSS"? Isn't RSS for distributing packets
to multiple RX queues inside the NIC?

[Darrell] I was referring to the general case of using the distribution across
queues to distribute work to different PMDs.


--yliu

> But for HWOL, we specify ‘the queue’ to be the one we receive the first
> packet from.
> This is what I was getting at before. So, future workarounds would be
> ‘auto-splitting flows’ across queues, user-specified flow->queue
> associations, etc.
> 
> 
> 
> 
> 
> 
>   --yliu
> 
> 
> 
> 
> 
> 




Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-09-05 Thread Yuanhan Liu
On Fri, Sep 01, 2017 at 10:35:51PM +, Darrell Ball wrote:
> 
> 
> On 8/31/17, 3:13 AM, "Yuanhan Liu"  wrote:
> 
> On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> > 
> > [Finn]
> > 
> > I think we should not further intermix the rxqs distributed to 
> different pmd's, other than initially configured, when setting up hw-offload. 
> If we make a round-robin distribution of the rxqs, a different pmd will most 
> likely receive the hw-offloaded packets - not the same pmd that ran the 
> slow-path originally creating the flow.
> > 
> > It is usual to optimize caches etc. per pmd and that would not work 
> then. Maybe the further processing of the hw-offloaded packets does not need 
> these optimizations at the moment, however, IMHO I think we would be better 
> off using the first proposal above (use the same rxq as the one creating the 
> flow).
> > 
> > [Darrell] Several ideas have some validity.
> >  However, this sounds reasonable and simple and we 
> could revisit as needed.
> >  What do you think Yuanhan ?
> 
> Just want to make sure we are on the same page: do you mean the original
> solution/workaround I mentioned in the cover letter: record the rxq at
> recv and pass it down to flow creation?
> 
> If so, I'm okay with it.
> 
> [Darrell]
> This is the relevant part from the cover letter:
> 
> “One possible
>  solution is to record the rxq and pass it down to the flow creation
>  stage. It would be much better, but it's still far away from being perfect.
>  Because it might have changed the steering rules stealthily, which may
>  break the default RSS setup by OVS-DPDK.”
> 
> This is a reasonable first cut.
> However, the flows installed are masked flows but the associated packets 
> would ‘normally’ end up on multiple
> PMDs due to RSS, right ?

Why is it "multiple PMDs due to RSS"? Isn't RSS for distributing packets
to multiple RX queues inside the NIC?

--yliu

> But for HWOL, we specify ‘the queue’ to be the one we receive the first
> packet from.
> This is what I was getting at before. So, future workarounds would be
> ‘auto-splitting flows’ across queues, user-specified flow->queue
> associations, etc.
> 
> 
> 
> 
> 
> 
>   --yliu
> 
> 
> 
> 
> 
> 


Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-09-01 Thread Darrell Ball


On 8/31/17, 3:16 AM, "Yuanhan Liu"  wrote:

On Wed, Aug 30, 2017 at 07:39:35PM +, Darrell Ball wrote:
> > > Note that it's disabled by default, which can be 
enabled by:
> > > 
> > > $ ovs-vsctl set Open_vSwitch . 
other_config:hw-offload=true
> > > 
> > > Maybe per in-port configuration would alleviate the issue to 
a certain degree.
> > 
> > Yes, it could be done. I chose it for the following reasons:
> > 
> > - the option is already there, used by tc offloads.
> > - it also simplifies the (first) patchset a bit, IMO.
> > 
> > Of course, I understand.
> > 
> > However, I'm okay with making it per port. What's your 
suggestion for
> > this? Making "hw-offload" be port, or introducing another one? 
If so,
> > what's your suggestion on the naming?
> > 
> > I am not suggesting to drop the global configuration
> > Mainly we ‘consider’ additional per interface configuration because 
of the restriction with
> > queue action we discuss. This reduces the scope of the queue 
remapping from what RSS would yield
> > with HWOL.
> > I would expect that when such configuration is done (if it were 
done), that
> > typically multiple ports would be configured, since traffic flows 
bi-directionally at least.
> > 
> > If we were to do this, one of the possibilities would be something 
like:
> > ovs-vsctl set Interface dpdk0 other_config:hw-offload=true
> 
> What's the scope between the global one and this one then? We simply
> ignore the global one for OVS-DPDK?
> 
> [Darrell]
> If we were to have such an option (and it is a pretty big ‘if’), then more
> specific scope (i.e. interface scope here) would eclipse global scope I 
think.
> Maybe initially, we try to keep it as simple as possible,

Like the one I have already done, re-use "hw-offload" option? :)
This is the simplest I could think of.

[Darrell]
At the interface scope, the name hw-offload could be reused and documented as
being used only by DPDK, ‘if’ this route were to be followed.


--yliu

> but I could see some other use cases
> for interface level config., like HW resource contention for multiport 
NICs ?
> 
> > Thanks for the review. BTW, would you please add me in 'to' or 
'cc'
> > list while replying to me?  Otherwise, it's easy to get missed: 
too
> > many emails :/
> > 
> > of course
> 
> Thank you!
> 
>   --yliu
> 
> 






Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-09-01 Thread Darrell Ball


On 8/31/17, 3:13 AM, "Yuanhan Liu"  wrote:

On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> 
> [Finn]
> 
> I think we should not further intermix the rxqs distributed to 
different pmd's, other than initially configured, when setting up hw-offload. 
If we make a round-robin distribution of the rxqs, a different pmd will most 
likely receive the hw-offloaded packets - not the same pmd that ran the 
slow-path originally creating the flow.
> 
> It is usual to optimize caches etc. per pmd and that would not work 
then. Maybe the further processing of the hw-offloaded packets does not need 
these optimizations at the moment, however, IMHO I think we would be better off 
using the first proposal above (use the same rxq as the one creating the flow).
> 
> [Darrell] Several ideas have some validity.
>  However, this sounds reasonable and simple and we could 
revisit as needed.
>  What do you think Yuanhan ?

Just want to make sure we are on the same page: do you mean the original
solution/workaround I mentioned in the cover letter: record the rxq at
recv and pass it down to flow creation?

If so, I'm okay with it.

[Darrell]
This is the relevant part from the cover letter:

“One possible
 solution is to record the rxq and pass it down to the flow creation
 stage. It would be much better, but it's still far away from being perfect.
 Because it might have changed the steering rules stealthily, which may
 break the default RSS setup by OVS-DPDK.”

This is a reasonable first cut.
However, the flows installed are masked flows, but the associated packets would
‘normally’ end up on multiple PMDs due to RSS, right? But for HWOL, we specify
‘the queue’ to be the one we receive the first packet from.
This is what I was getting at before. So, future workarounds would be
‘auto-splitting flows’ across queues, user-specified flow->queue
associations, etc.






--yliu








Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-31 Thread Yuanhan Liu
On Wed, Aug 30, 2017 at 07:39:35PM +, Darrell Ball wrote:
> > > Note that it's disabled by default, which can be enabled 
> by:
> > > 
> > > $ ovs-vsctl set Open_vSwitch . 
> other_config:hw-offload=true
> > > 
> > > Maybe per in-port configuration would alleviate the issue to a 
> certain degree.
> > 
> > Yes, it could be done. I chose it for the following reasons:
> > 
> > - the option is already there, used by tc offloads.
> > - it also simplifies the (first) patchset a bit, IMO.
> > 
> > Of course, I understand.
> > 
> > However, I'm okay with making it per port. What's your suggestion 
> for
> > this? Making "hw-offload" be port, or introducing another one? If 
> so,
> > what's your suggestion on the naming?
> > 
> > I am not suggesting to drop the global configuration
> > Mainly we ‘consider’ additional per interface configuration because of 
> the restriction with
> > queue action we discuss. This reduces the scope of the queue remapping 
> from what RSS would yield
> > with HWOL.
> > I would expect that when such configuration is done (if it were done), 
> that
> > typically multiple ports would be configured, since traffic flows 
> bi-directionally at least.
> > 
> > If we were to do this, one of the possibilities would be something like:
> > ovs-vsctl set Interface dpdk0 other_config:hw-offload=true
> 
> What's the scope between the global one and this one then? We simply
> ignore the global one for OVS-DPDK?
> 
> [Darrell]
> If we were to have such an option (and it is a pretty big ‘if’), then more
> specific scope (i.e. interface scope here) would eclipse global scope I think.
> Maybe initially, we try to keep it as simple as possible,

Like the one I have already done, re-use "hw-offload" option? :)
This is the simplest I could think of.

--yliu

> but I could see some other use cases
> for interface level config., like HW resource contention for multiport NICs ?
> 
> > Thanks for the review. BTW, would you please add me in 'to' or 'cc'
> > list while replying to me?  Otherwise, it's easy to get missed: too
> > many emails :/
> > 
> > of course
> 
> Thank you!
> 
>   --yliu
> 
> 


Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-31 Thread Yuanhan Liu
On Wed, Aug 30, 2017 at 07:28:01PM +, Darrell Ball wrote:
> 
> [Finn]
> 
> I think we should not further intermix the rxqs distributed to different 
> pmd's, other than initially configured, when setting up hw-offload. If we 
> make a round-robin distribution of the rxqs, a different pmd will most likely 
> receive the hw-offloaded packets - not the same pmd that ran the slow-path 
> originally creating the flow.
> 
> It is usual to optimize caches etc. per pmd and that would not work then. 
> Maybe the further processing of the hw-offloaded packets does not need these 
> optimizations at the moment, however, IMHO I think we would be better off 
> using the first proposal above (use the same rxq as the one creating the 
> flow).
> 
> [Darrell] Several ideas have some validity.
>  However, this sounds reasonable and simple and we could 
> revisit as needed.
>  What do you think Yuanhan ?

Just want to make sure we are on the same page: do you mean the original
solution/workaround I mentioned in the cover letter: record the rxq at
recv and pass it down to flow creation?

If so, I'm okay with it.

--yliu


Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-31 Thread Yuanhan Liu
On Wed, Aug 30, 2017 at 04:23:53PM +, Darrell Ball wrote:
> One other general comment about the series:
> 
> It is not a ‘big deal’ at this point since we are discussing high level 
> concepts, but
> we would need to address these at some point anyways.
> 
> Could you run 
> 
> ./utilities/checkpatch.py 
> 
> before the next iteration

Sure.

--yliu
> 
> Thanks Darrell
> 
> 
> On 8/22/17, 11:24 PM, "Yuanhan Liu"  wrote:
> 
> Hi,
> 
> Here is a joint work from Mellanox and Napatech, to enable the flow hw
> offload with the DPDK generic flow interface (rte_flow).
> 
> The basic idea is to associate the flow with a mark id (a uint32_t
> number).
> Later, we then get the flow directly from the mark id, bypassing the 
> heavy
> emc processing, including miniflow_extract.
> 
> The association is done with CMAP in patch 1. It also reuses the flow
> APIs introduced while adding the tc offloads. The emc bypassing is 
> done
> in patch 2. The flow offload is done in patch 4, which mainly does two
> things:
> 
> - translate the ovs match to DPDK rte flow patterns
> - bind those patterns with a MARK action.
> 
> Afterwards, the NIC will set the mark id in every pkt's mbuf when it
> matches the flow. That's basically how we could get the flow directly
> from the received mbuf.
> 
> While testing with PHY-PHY forwarding with one core and one queue, I 
> got
> almost 80% performance boost. For PHY-vhost forwarding, I got about 
> 50%
> performance boost.
> 
> 
> That being said, this patchset still has unresolved issues. The
> major issue is that maybe most NICs (for instance, Mellanox and Intel)
> cannot support a pure MARK action. It has to be used together with a
> QUEUE action, which in turn needs a queue index. That comes to the 
> issue:
> the queue index is not given in the flow context. To make it work, 
> patch
> 5 just sets the queue index to 0, which is obviously wrong. One
> possible
> solution is to record the rxq and pass it down to the flow creation
> stage. It would be much better, but it's still far away from being 
> perfect.
> Because it might have changed the steering rules stealthily, which may
> break the default RSS setup by OVS-DPDK.
> 
> The reason I still want to send it out is to get more 
> comments/thoughts
> from community on this whole patchset. Meanwhile, I will try to 
> resolve
> the QUEUE action issue.
> 
> Note that it's disabled by default, which can be enabled by:
> 
> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> 
> Thanks.
> 
> --yliu
> 
> ---
> Finn Christensen (3):
>   netdev-dpdk: implement flow put with rte flow
>   netdev-dpdk: retry with queue action
>   netdev-dpdk: set FDIR config
> 
> Yuanhan Liu (4):
>   dpif-netdev: associate flow with a mark id
>   dpif-netdev: retrieve flow directly from the flow mark
>   netdev-dpdk: convert ufid to dpdk flow
>   netdev-dpdk: remove offloaded flow on deletion
> 
>  lib/dp-packet.h   |  13 ++
>  lib/dpif-netdev.c | 105 +++
>  lib/netdev-dpdk.c | 511 +-
>  lib/netdev.h  |   6 +
>  4 files changed, 634 insertions(+), 1 deletion(-)
> 
> -- 
> 2.7.4
> 
> 
> 
> 
> 


Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-30 Thread Darrell Ball


On 8/29/17, 7:25 PM, "Yuanhan Liu"  wrote:

On Wed, Aug 30, 2017 at 01:32:10AM +, Darrell Ball wrote:
> 
> > Though that being said, this patchset still has issues 
unresolved. The
> > major issue is that maybe most NIC (for instance, Mellanox 
and Intel)
> > can not support a pure MARK action. It has to be used 
together with a
> > QUEUE action, which in turn needs a queue index. That comes 
to the issue:
> > the queue index is not given in the flow context. To make 
it work, patch
> > 5 just set the queue index to 0, which is obviously wrong. 
One possible
> > solution is to record the rxq and pass it down to the flow 
creation
> > stage. It would be much better, but it's still far away 
from being perfect.
> > Because it might have changed the steering rules 
stealthily, which may
> > break the default RSS setup by OVS-DPDK.
> > 
> > If this cannot be solved by removing this restriction, I guess 
another alternative is to actively
> > manage flow-queue associations.
> 
> do you mean let user provide the set_queue action?
> 
> I mean in the worst case, if the restriction cannot be lifted, we might
> need to do flow distribution
> across queues with additional logic/tracking,

Like the way mentioned above: record the rxq at recv stage and pass it
down to flow_put?

> because we would not really want to keep this restriction
> without a workaround.

For sure.

> This would be a fair bit of work, however.

Indeed. Another reason I don't really like the above solution is that it changes
the core logic of OVS (just for the OVS-DPDK use case). What's worse, it
doesn't even solve the issue perfectly: it's still a workaround!

That being said, if we have to provide a workaround, I think maybe a
simple round-robin queue distribution is better, given that it's our first
attempt at hw offload. Later, we could either improve the workaround,
or remove the restriction from the DPDK side (if that's possible). Does that work for you?

> Alternatively, pushing this to the user in the general case may be too much
> overhead; what do you think?
> As a user specification, it could be optional, however?

Yes, that's what I was thinking. For the users that know their NIC supports
MARK with QUEUE action, they need not add such an option. For others, it's
suggested to add it. Or, it's a must (if we do not do a workaround like the
above). Makes sense?

Hi Yuanhan

I responded to the last e-mail since there is some overlap in the discussion on 
some
points.



> > 
> > The reason I still want to send it out is to get more 
comments/thoughts
> > from community on this whole patchset. Meanwhile, I will 
try to resolve
> > the QUEUE action issue.
> > 
> > Note that it's disabled by default, which can be enabled by:
> > 
> > $ ovs-vsctl set Open_vSwitch . 
other_config:hw-offload=true
> > 
> > Maybe per in-port configuration would alleviate the issue to a 
certain degree.
> 
> Yes, it could be done. I chose it for the following reasons:
> 
> - the option is already there, used by tc offloads.
> - it also simplifies the (first) patchset a bit, IMO.
> 
> Of course, I understand.
> 
> However, I'm okay with making it per port. What's your suggestion for
> this? Making "hw-offload" be port, or introducing another one? If so,
> what's your suggestion on the naming?
> 
> I am not suggesting to drop the global configuration.
> Mainly we ‘consider’ additional per-interface configuration because of the
> restriction with the queue action we discussed. This reduces the scope of the
> queue remapping from what RSS would yield with HWOL.
> I would expect that when such configuration is done (if it were done), 
that
> typically multiple ports would be configured, since traffic flows 
bi-directionally at least.
> 
> If we were to do this, one of the possibilities would be something like:
> ovs-vsctl set Interface dpdk0 other_config:hw-offload=true

What's the scope between the global one and this one then? We simply
ignore the global one for OVS-DPDK?

[Darrell]
If we were to have such an option (and it is a pretty big ‘if’), then the more
specific scope (i.e. interface scope here) would eclipse the global scope, I think.
Maybe initially we try to keep it as simple as possible, but I could see some
other use cases for interface-level config, like HW resource contention for
multiport NICs?

> Thanks for the review. BTW, would you please add me in 'to' 

Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-30 Thread Darrell Ball


On 8/30/17, 12:19 AM, "Finn Christensen" <f...@napatech.com> wrote:

Hi,

Just wanted to add one comment to this correspondence. See below.



Thanks,

Finn Christensen



-Original Message-

From: ovs-dev-boun...@openvswitch.org 
[mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Yuanhan Liu

Sent: 30. august 2017 04:25

To: Darrell Ball <db...@vmware.com>

Cc: d...@openvswitch.org
    
    Subject: Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow



On Wed, Aug 30, 2017 at 01:32:10AM +, Darrell Ball wrote:

>

> > Though that being said, this patchset still has issues 
unresolved. The

> > major issue is that maybe most NIC (for instance, Mellanox 
and Intel)

> > can not support a pure MARK action. It has to be used 
together with a

> > QUEUE action, which in turn needs a queue index. That comes 
to the issue:

> > the queue index is not given in the flow context. To make 
it work, patch

> > 5 just set the queue index to 0, which is obviously wrong. 
One possible

> > solution is to record the rxq and pass it down to the flow 
creation

> > stage. It would be much better, but it's still far away 
from being perfect.

> > Because it might have changed the steering rules 
stealthily, which may

> > break the default RSS setup by OVS-DPDK.

> >

> > If this cannot be solved by removing this restriction, I guess 
another alternative is to actively

> > manage flow-queue associations.

>

> do you mean let user provide the set_queue action?

>

> I mean in the worst case, if the restriction cannot be lifted, we

> might need to do flow distribution across queues with additional

> logic/tracking,



Like the way mentioned above: record the rxq at recv stage and pass it down 
to flow_put?



> because we would not really want to keep this restriction without a

> workaround.



For sure.



> This would be a fair bit of work, however.



Indeed. Another reason I don't really like the above solution is that it changes the
core logic of OVS (just for the OVS-DPDK use case). What's worse, it doesn't even
solve the issue perfectly: it's still a workaround!



That being said, if we have to provide a workaround, I think maybe a simple
round-robin queue distribution is better, given that it's our first attempt at
hw offload. Later, we could either improve the workaround, or remove the
restriction from the DPDK side (if that's possible). Does that work for you?



[Finn]

I think we should not further intermix the rxqs distributed to different 
pmd's, other than initially configured, when setting up hw-offload. If we make 
a round-robin distribution of the rxqs, a different pmd will most likely 
receive the hw-offloaded packets - not the same pmd that ran the slow-path 
originally creating the flow.

It is usual to optimize caches etc. per pmd and that would not work then. 
Maybe the further processing of the hw-offloaded packets does not need these 
optimizations at the moment, however, IMHO I think we would be better off using 
the first proposal above (use the same rxq as the one creating the flow).

[Darrell] Several ideas have some validity.
 However, this sounds reasonable and simple and we could 
revisit as needed.
 What do you think Yuanhan ?


We will not be able to do RSS on a specific hw-offloaded flow anyway. That 
would require a larger rework of the RSS configuration in OVS, as Darrell points 
out.





> Alternatively, pushing this to the user in the general may be too much 
overhead; what do you think ?.

> As a user specification, it could be optional, however ?



Yes, that's what I was thinking. For the users that know their NIC supports 
MARK with QUEUE action, they need not add such an option. For others, it's
suggested to add it. Or, it's a must (if we do not do a workaround like the above).
Makes sense?



> >

> > The reason I still want to send it out is to get more 
comments/thoughts

> > from community on this whole patchset. Meanwhile, I will 
try to resolve

> > the QUEUE action issue.

> >

> > Note that it's disabled by default, which can be enabled by:

> >
   

Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-30 Thread Darrell Ball
One other general comment about the series:

It is not a ‘big deal’ at this point since we are discussing high level 
concepts, but
we would need to address these at some point anyways.

Could you run 

./utilities/checkpatch.py 

before the next iteration

Thanks Darrell


On 8/22/17, 11:24 PM, "Yuanhan Liu"  wrote:

Hi,

Here is a joint work from Mellanox and Napatech, to enable the flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t
number).
Later, we then get the flow directly from the mark id, bypassing the 
heavy
emc processing, including miniflow_extract.

The association is done with CMAP in patch 1. It also reuses the flow
APIs introduced while adding the tc offloads. The emc bypassing is done
in patch 2. The flow offload is done in patch 4, which mainly does two
things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action.

Afterwards, the NIC will set the mark id in every pkt's mbuf when it
matches the flow. That's basically how we could get the flow directly
from the received mbuf.

While testing with PHY-PHY forwarding with one core and one queue, I got
almost 80% performance boost. For PHY-vhost forwarding, I got about 50%
performance boost.


That being said, this patchset still has unresolved issues. The
major issue is that maybe most NICs (for instance, Mellanox and Intel)
cannot support a pure MARK action. It has to be used together with a
QUEUE action, which in turn needs a queue index. That comes to the 
issue:
the queue index is not given in the flow context. To make it work, patch
5 just sets the queue index to 0, which is obviously wrong. One possible
solution is to record the rxq and pass it down to the flow creation
stage. It would be much better, but it's still far away from being 
perfect.
Because it might have changed the steering rules stealthily, which may
break the default RSS setup by OVS-DPDK.

The reason I still want to send it out is to get more comments/thoughts
from community on this whole patchset. Meanwhile, I will try to resolve
the QUEUE action issue.

Note that it's disabled by default, which can be enabled by:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

Thanks.

--yliu

---
Finn Christensen (3):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action
  netdev-dpdk: set FDIR config

Yuanhan Liu (4):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion

 lib/dp-packet.h   |  13 ++
 lib/dpif-netdev.c | 105 +++
 lib/netdev-dpdk.c | 511 +-
 lib/netdev.h  |   6 +
 4 files changed, 634 insertions(+), 1 deletion(-)

-- 
2.7.4







Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-30 Thread Finn Christensen
Hi,
Just wanted to add one comment to this correspondence. See below.

Thanks,
Finn Christensen

-Original Message-
From: ovs-dev-boun...@openvswitch.org [mailto:ovs-dev-boun...@openvswitch.org] 
On Behalf Of Yuanhan Liu
Sent: 30. august 2017 04:25
To: Darrell Ball <db...@vmware.com>
Cc: d...@openvswitch.org
Subject: Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

On Wed, Aug 30, 2017 at 01:32:10AM +, Darrell Ball wrote:
> 
> > Though that being said, this patchset still has issues 
> unresolved. The
> > major issue is that maybe most NIC (for instance, Mellanox and 
> Intel)
> > can not support a pure MARK action. It has to be used together 
> with a
> > QUEUE action, which in turn needs a queue index. That comes to 
> the issue:
> > the queue index is not given in the flow context. To make it 
> work, patch
> > 5 just set the queue index to 0, which is obviously wrong. One 
> possible
> > solution is to record the rxq and pass it down to the flow 
> creation
> > stage. It would be much better, but it's still far away from 
> being perfect.
> > Because it might have changed the steering rules stealthily, 
> which may
> > break the default RSS setup by OVS-DPDK.
> > 
> > If this cannot be solved by removing this restriction, I guess another 
> alternative is to actively
> > manage flow-queue associations.
> 
> do you mean let user provide the set_queue action?
> 
> I mean in the worst case, if the restriction cannot be lifted, we 
> might need to do flow distribution across queues with additional
> logic/tracking,

Like the way mentioned above: record the rxq at recv stage and pass it down to 
flow_put?

> because we would not really want to keep this restriction without a 
> workaround.

For sure.

> This would be a fair bit of work, however.

Indeed. Another reason I don't really like the above solution is that it changes the
core logic of OVS (just for the OVS-DPDK use case). What's worse, it doesn't even
solve the issue perfectly: it's still a workaround!

That being said, if we have to provide a workaround, I think maybe a simple
round-robin queue distribution is better, given that it's our first attempt at
hw offload. Later, we could either improve the workaround, or remove the
restriction from the DPDK side (if that's possible). Does that work for you?

[Finn]
I think we should not further intermix the rxqs distributed to different PMDs,
beyond what was initially configured, when setting up hw-offload. If we make a
round-robin distribution of the rxqs, a different PMD will most likely receive
the hw-offloaded packets - not the same PMD that ran the slow path that
originally created the flow.
It is usual to optimize caches etc. per PMD, and that would not work then. Maybe
the further processing of the hw-offloaded packets does not need these
optimizations at the moment; however, IMHO we would be better off using
the first proposal above (use the same rxq as the one creating the flow).
We will not be able to do RSS on a specific hw-offloaded flow anyway. That
would require a larger rework of the RSS configuration in OVS, as Darrell points
out.


> Alternatively, pushing this to the user in the general may be too much 
> overhead; what do you think ?.
> As a user specification, it could be optional, however ?

Yes, that's what I was thinking. For the users that know their NIC supports 
MARK with QUEUE action, they need not add such an option. For others, it's
suggested to add it. Or, it's a must (if we do not do a workaround like the above).
Makes sense?

> > 
> > The reason I still want to send it out is to get more 
> comments/thoughts
> > from community on this whole patchset. Meanwhile, I will try to 
> resolve
> > the QUEUE action issue.
> > 
> > Note that it's disabled by default, which can be enabled by:
> > 
> > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> > 
> > Maybe per in-port configuration would alleviate the issue to a certain 
> degree.
> 
> Yes, it could be done. I chose it for the following reasons:
> 
> - the option is already there, used by tc offloads.
> - it also simplifies the (first) patchset a bit, IMO.
> 
> Of course, I understand.
> 
> However, I'm okay with making it per port. What's your suggestion for
> this? Making "hw-offload" be port, or introducing another one? If so,
> what's your suggestion on the naming?
> 
> I am not suggesting to drop the global configuration Mainly w

Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-29 Thread Yuanhan Liu
On Wed, Aug 30, 2017 at 01:32:10AM +, Darrell Ball wrote:
> 
> > Though that being said, this patchset still has issues 
> unresolved. The
> > major issue is that maybe most NIC (for instance, Mellanox and 
> Intel)
> > can not support a pure MARK action. It has to be used together 
> with a
> > QUEUE action, which in turn needs a queue index. That comes to 
> the issue:
> > the queue index is not given in the flow context. To make it 
> work, patch
> > 5 just set the queue index to 0, which is obviously wrong. One 
> possible
> > solution is to record the rxq and pass it down to the flow 
> creation
> > stage. It would be much better, but it's still far away from 
> being perfect.
> > Because it might have changed the steering rules stealthily, 
> which may
> > break the default RSS setup by OVS-DPDK.
> > 
> > If this cannot be solved by removing this restriction, I guess another 
> alternative is to actively
> > manage flow-queue associations.
> 
> do you mean let user provide the set_queue action?
> 
> I mean in the worst case, if the restriction cannot be lifted, we might
> need to do flow distribution
> across queues with additional logic/tracking,

Like the way mentioned above: record the rxq at recv stage and pass it
down to flow_put?

> because we would not really want to keep this restriction
> without a workaround.

For sure.

> This would be a fair bit of work, however.

Indeed. Another reason I don't really like the above solution is that it changes
the core logic of OVS (just for the OVS-DPDK use case). What's worse, it
doesn't even solve the issue perfectly: it's still a workaround!

That being said, if we have to provide a workaround, I think maybe a
simple round-robin queue distribution is better, given that it's our first
attempt at hw offload. Later, we could either improve the workaround,
or remove the restriction from the DPDK side (if that's possible). Does that work for you?
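
Just to make the two candidate queue choices concrete, a sketch (illustrative
only; neither helper exists in the patchset, and thread safety is ignored):

    /* Option A: reuse the rxq the flow's first packet arrived on, so the
     * offloaded packets keep landing on the PMD that installed the flow. */
    static uint16_t
    offload_queue_same_rxq(uint16_t recv_rxq)
    {
        return recv_rxq;
    }

    /* Option B: simple round-robin over the port's configured rx queues. */
    static uint16_t
    offload_queue_round_robin(uint16_t n_rxq)
    {
        static uint16_t next;

        return n_rxq ? next++ % n_rxq : 0;
    }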

> Alternatively, pushing this to the user in the general case may be too much
> overhead; what do you think?
> As a user specification, it could be optional, however?

Yes, that's what I was thinking. For the users that know their NIC supports
> MARK with QUEUE action, they need not add such an option. For others, it's
> suggested to add it. Or, it's a must (if we do not do a workaround like the
above). Makes sense?

> > 
> > The reason I still want to send it out is to get more 
> comments/thoughts
> > from community on this whole patchset. Meanwhile, I will try to 
> resolve
> > the QUEUE action issue.
> > 
> > Note that it's disabled by default, which can be enabled by:
> > 
> > $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> > 
> > Maybe per in-port configuration would alleviate the issue to a certain 
> degree.
> 
> Yes, it could be done. I chose it for the following reasons:
> 
> - the option is already there, used by tc offloads.
> - it also simplifies the (first) patchset a bit, IMO.
> 
> Of course, I understand.
> 
> However, I'm okay with making it per port. What's your suggestion for
> this? Making "hw-offload" be port, or introducing another one? If so,
> what's your suggestion on the naming?
> 
> I am not suggesting to drop the global configuration.
> Mainly we ‘consider’ additional per-interface configuration because of the
> restriction with the queue action we discussed. This reduces the scope of the
> queue remapping from what RSS would yield with HWOL.
> I would expect that when such configuration is done (if it were done), that
> typically multiple ports would be configured, since traffic flows 
> bi-directionally at least.
> 
> If we were to do this, one of the possibilities would be something like:
> ovs-vsctl set Interface dpdk0 other_config:hw-offload=true

What's the scope between the global one and this one then? We simply
ignore the global one for OVS-DPDK?

> Thanks for the review. BTW, would you please add me in 'to' or 'cc'
> list while replying to me?  Otherwise, it's easy to get missed: too
> many emails :/
> 
> of course

Thank you!

--yliu


Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-29 Thread Yuanhan Liu
On Tue, Aug 29, 2017 at 07:11:42AM +, Darrell Ball wrote:
> 
> On 8/22/17, 11:24 PM, "Yuanhan Liu"  wrote:
> 
> Hi,
> 
> Here is a joint work from Mellanox and Napatech, to enable the flow hw
> offload with the DPDK generic flow interface (rte_flow).
> 
> The basic idea is to associate the flow with a mark id (a uint32_t
> number).
> Later, we then get the flow directly from the mark id, bypassing the 
> heavy
> emc processing, including miniflow_extract.
> 
> The association is done with CMAP in patch 1. It also reuses the flow
> APIs introduced while adding the tc offloads. The emc bypassing is 
> done
> in patch 2. The flow offload is done in patch 4, which mainly does two
> things:
> 
> - translate the ovs match to DPDK rte flow patterns
> - bind those patterns with a MARK action.
> 
> Afterwards, the NIC will set the mark id in every pkt's mbuf when it
> matches the flow. That's basically how we could get the flow directly
> from the received mbuf.
> 
> While testing with PHY-PHY forwarding with one core and one queue, I 
> got
> almost 80% performance boost. For PHY-vhost forwarding, I got about 
> 50%
> performance boost.
> 
> 
> That being said, this patchset still has unresolved issues. The
> major issue is that maybe most NICs (for instance, Mellanox and Intel)
> cannot support a pure MARK action. It has to be used together with a
> QUEUE action, which in turn needs a queue index. That comes to the 
> issue:
> the queue index is not given in the flow context. To make it work, 
> patch
> 5 just sets the queue index to 0, which is obviously wrong. One
> possible
> solution is to record the rxq and pass it down to the flow creation
> stage. It would be much better, but it's still far away from being 
> perfect.
> Because it might have changed the steering rules stealthily, which may
> break the default RSS setup by OVS-DPDK.
> 
> If this cannot be solved by removing this restriction, I guess another 
> alternative is to actively
> manage flow-queue associations.

do you mean let user provide the set_queue action?

> 
> The reason I still want to send it out is to get more 
> comments/thoughts
> from community on this whole patchset. Meanwhile, I will try to 
> resolve
> the QUEUE action issue.
> 
> Note that it's disabled by default, which can be enabled by:
> 
> $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> 
> Maybe per in-port configuration would alleviate the issue to a certain degree.

Yes, it could be done. I chose it for the following reasons:

- the option is already there, used by tc offloads.
- it also simplifies the (first) patchset a bit, IMO.

However, I'm okay with making it per port. What's your suggestion for
this? Making "hw-offload" a per-port option, or introducing another one? If so,
what's your suggestion on the naming?

Thanks for the review. BTW, would you please add me in 'to' or 'cc'
list while replying to me?  Otherwise, it's easy to get missed: too
many emails :/

--yliu


Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-29 Thread Darrell Ball

On 8/22/17, 11:24 PM, "Yuanhan Liu"  wrote:

Hi,

Here is a joint work from Mellanox and Napatech, to enable the flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t
number).
Later, we then get the flow directly from the mark id, bypassing the 
heavy
emc processing, including miniflow_extract.

The association is done with CMAP in patch 1. It also reuses the flow
APIs introduced while adding the tc offloads. The emc bypassing is done
in patch 2. The flow offload is done in patch 4, which mainly does two
things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action.

Afterwards, the NIC will set the mark id in every pkt's mbuf when it
matches the flow. That's basically how we could get the flow directly
from the received mbuf.

While testing with PHY-PHY forwarding with one core and one queue, I got
almost 80% performance boost. For PHY-vhost forwarding, I got about 50%
performance boost.


That being said, this patchset still has unresolved issues. The
major issue is that maybe most NICs (for instance, Mellanox and Intel)
cannot support a pure MARK action. It has to be used together with a
QUEUE action, which in turn needs a queue index. That comes to the 
issue:
the queue index is not given in the flow context. To make it work, patch
5 just sets the queue index to 0, which is obviously wrong. One possible
solution is to record the rxq and pass it down to the flow creation
stage. It would be much better, but it's still far away from being 
perfect.
Because it might have changed the steering rules stealthily, which may
break the default RSS setup by OVS-DPDK.

If this cannot be solved by removing this restriction, I guess another 
alternative is to actively
manage flow-queue associations.


The reason I still want to send it out is to get more comments/thoughts
from community on this whole patchset. Meanwhile, I will try to resolve
the QUEUE action issue.

Note that it's disabled by default, which can be enabled by:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

Maybe per in-port configuration would alleviate the issue to a certain degree.


Thanks.

--yliu

---
Finn Christensen (3):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action
  netdev-dpdk: set FDIR config

Yuanhan Liu (4):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion

 lib/dp-packet.h   |  13 ++
 lib/dpif-netdev.c | 105 +++
 lib/netdev-dpdk.c | 511 +-
 lib/netdev.h  |   6 +
 4 files changed, 634 insertions(+), 1 deletion(-)

-- 
2.7.4







Re: [ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-25 Thread Darrell Ball
Thank you for this contribution.
It is a big step forward for HWOL; even the partial offload in these patches has
a significant benefit.
I did a first pass on Wednesday when the patches came out, but I’ll send
comments out on Monday.

On 8/22/17, 11:24 PM, "Yuanhan Liu"  wrote:

Hi,

Here is a joint work from Mellanox and Napatech, to enable the flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t number).
Later, we then get the flow directly from the mark id, bypassing the heavy
emc processing, including miniflow_extract.

The association is done with CMAP in patch 1. It also reuses the flow
APIs introduced while adding the tc offloads. The emc bypassing is done
in patch 2. The flow offload is done in patch 4, which mainly does two
things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action.

Afterwards, the NIC will set the mark id in every pkt's mbuf when it
matches the flow. That's basically how we could get the flow directly
from the received mbuf.

While testing with PHY-PHY forwarding with one core and one queue, I got
almost 80% performance boost. For PHY-vhost forwarding, I got about 50%
performance boost.


That being said, this patchset still has unresolved issues. The
major issue is that maybe most NICs (for instance, Mellanox and Intel)
cannot support a pure MARK action. It has to be used together with a
QUEUE action, which in turn needs a queue index. That comes to the issue:
the queue index is not given in the flow context. To make it work, patch
5 just sets the queue index to 0, which is obviously wrong. One possible
solution is to record the rxq and pass it down to the flow creation
stage. It would be much better, but it's still far away from being perfect.
Because it might have changed the steering rules stealthily, which may
break the default RSS setup by OVS-DPDK.


The reason I still want to send it out is to get more comments/thoughts
from community on this whole patchset. Meanwhile, I will try to resolve
the QUEUE action issue.

Note that it's disabled by default, which can be enabled by:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

Thanks.

--yliu

---
Finn Christensen (3):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action
  netdev-dpdk: set FDIR config

Yuanhan Liu (4):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion

 lib/dp-packet.h   |  13 ++
 lib/dpif-netdev.c | 105 +++
 lib/netdev-dpdk.c | 511 +-
 lib/netdev.h  |   6 +
 4 files changed, 634 insertions(+), 1 deletion(-)

-- 
2.7.4





[ovs-dev] [PATCH 0/7 RFC] OVS-DPDK flow offload with rte_flow

2017-08-23 Thread Yuanhan Liu
Hi,

Here is a joint work from Mellanox and Napatech, to enable the flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t number).
Later, we then get the flow directly from the mark id, bypassing the heavy
emc processing, including miniflow_extract.

The association is done with CMAP in patch 1. It also reuses the flow
APIs introduced while adding the tc offloads. The emc bypassing is done
in patch 2. The flow offload is done in patch 4, which mainly does two
things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action.

Afterwards, the NIC will set the mark id in every pkt's mbuf when it
matches the flow. That's basically how we could get the flow directly
from the received mbuf.
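
To make that concrete, here is a minimal sketch of the mark-to-flow shortcut.
It is not the patchset's actual code and the table/helper names are
hypothetical; the association itself is kept in a CMAP as patch 1 does, and
with the DPDK of this era a matched MARK action is reported through the
PKT_RX_FDIR_ID flag, with the value in mbuf->hash.fdir.hi:

    #include <rte_mbuf.h>
    #include "cmap.h"
    #include "hash.h"

    struct dp_netdev_flow;              /* Opaque here; lives in dpif-netdev.c. */

    struct mark_to_flow {
        struct cmap_node node;          /* Node in 'mark_to_flow_map'. */
        uint32_t mark;                  /* Mark id programmed via rte_flow. */
        struct dp_netdev_flow *flow;    /* Datapath flow that owns the mark. */
    };

    static struct cmap mark_to_flow_map = CMAP_INITIALIZER;

    static struct dp_netdev_flow *
    mark_to_flow_find(uint32_t mark)
    {
        struct mark_to_flow *e;

        CMAP_FOR_EACH_WITH_HASH (e, node, hash_int(mark, 0), &mark_to_flow_map) {
            if (e->mark == mark) {
                return e->flow;
            }
        }
        return NULL;
    }

    /* Receive path: if the NIC stamped a mark, skip miniflow_extract() and the
     * EMC lookup and go straight to the flow. */
    static struct dp_netdev_flow *
    flow_from_mbuf(const struct rte_mbuf *m)
    {
        return (m->ol_flags & PKT_RX_FDIR_ID)
               ? mark_to_flow_find(m->hash.fdir.hi)
               : NULL;
    }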

While testing with PHY-PHY forwarding with one core and one queue, I got
almost 80% performance boost. For PHY-vhost forwarding, I got about 50%
performance boost.


That being said, this patchset still has unresolved issues. The
major issue is that maybe most NICs (for instance, Mellanox and Intel)
cannot support a pure MARK action. It has to be used together with a
QUEUE action, which in turn needs a queue index. That comes to the issue:
the queue index is not given in the flow context. To make it work, patch
5 just sets the queue index to 0, which is obviously wrong. One possible
solution is to record the rxq and pass it down to the flow creation
stage. It would be much better, but it's still far away from being perfect.
Because it might have changed the steering rules stealthily, which may
break the default RSS setup by OVS-DPDK.
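
For reference, a MARK + QUEUE rule expressed through the rte_flow API would
look roughly like the sketch below. This is illustrative only: the helper name
is hypothetical, the pattern is a deliberately trivial "any IPv4 over Ethernet"
match rather than the real translation done in patch 4, and the 'rxq' argument
is exactly the queue index whose choice is the open issue above.

    #include <rte_flow.h>

    /* Install a rule that marks matching packets with 'mark_id' and, because
     * some PMDs reject a bare MARK, also steers them to queue 'rxq'. */
    static struct rte_flow *
    add_mark_queue_rule(uint16_t port_id, uint32_t mark_id, uint16_t rxq,
                        struct rte_flow_error *error)
    {
        struct rte_flow_attr attr = { .ingress = 1 };

        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };

        struct rte_flow_action_mark mark = { .id = mark_id };
        struct rte_flow_action_queue queue = { .index = rxq };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_MARK,  .conf = &mark },
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, error);
    }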

The reason I still want to send it out is to get more comments/thoughts
from community on this whole patchset. Meanwhile, I will try to resolve
the QUEUE action issue.

Note that it's disabled by default, which can be enabled by:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

Thanks.

--yliu

---
Finn Christensen (3):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action
  netdev-dpdk: set FDIR config

Yuanhan Liu (4):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion

 lib/dp-packet.h   |  13 ++
 lib/dpif-netdev.c | 105 +++
 lib/netdev-dpdk.c | 511 +-
 lib/netdev.h  |   6 +
 4 files changed, 634 insertions(+), 1 deletion(-)

-- 
2.7.4
