Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-11-01 Thread Ben Pfaff
On Wed, Nov 01, 2017 at 11:24:30PM +0800, Yuanhan Liu wrote:
> On Tue, Oct 31, 2017 at 02:19:20PM -0700, Ben Pfaff wrote:
> > I see that this series appears to have stalled.  Should we expect a new
> > version sometime soon?  Do you need more reviews on the current version?
> 
> My bad. As I stated in another email, I meant to send v4 earlier, but
> then I found a bug. Meanwhile, I have been heavily occupied by something
> else, so I haven't had time to shape it up.
> 
> I will likely be able to start working on v4 in the middle of this month.

There's no hurry from my perspective; my main goal was to make sure that
I/we hadn't dropped the ball.  I look forward to the next version.


Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-11-01 Thread Yuanhan Liu
On Tue, Oct 31, 2017 at 02:19:20PM -0700, Ben Pfaff wrote:
> I see that this series appears to have stalled.  Should we expect a new
> version sometime soon?  Do you need more reviews on the current version?

My bad. As I stated in another email, I meant to send v4 earlier, but
then I found a bug. Meanwhile, I have been heavily occupied by something
else, so I haven't had time to shape it up.

I will likely be able to start working on v4 in the middle of this month.

--yliu


Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-10-31 Thread Ben Pfaff
I see that this series appears to have stalled.  Should we expect a new
version sometime soon?  Do you need more reviews on the current version?


Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-10-18 Thread Yuanhan Liu
On Wed, Sep 27, 2017 at 10:09:20AM +0800, Yuanhan Liu wrote:
> On Wed, Sep 27, 2017 at 09:24:54AM +0800, Yuanhan Liu wrote:
> > > The support needed would be in some drivers and seems reasonably doable.
> > > Moreover, this was discussed in the last dpdk meeting and the support was
> > > indicated as existing?, although I only verified the MLX code, myself.
> > 
> > From the code point of view, yes, the code was there months ago.
> > 
> > > I had seen the MLX code supporting the _RSS action and there are some
> > > checks for supported cases; when you say “it didn't work”, what was the
> > > issue?
> > 
> > I actually hit the error months ago; I even debugged it. IIRC, the error
> > came from libibverbs, when trying to create the hw flow. I need to
> > double-check it with our colleague who developed this feature.
> 
> Hmm, I think I was wrong; apparently I did something stupid in that last
> try. I double-checked it just now, and at least the flow creation works.
> I just need to take some time to integrate it into OVS-DPDK and see how
> it goes. I will get back to you when I have done that. Currently, I am
> occupied by something else.

Here is a short update I should have sent last week (apologies for being
late).

The RSS action should work. I have also integrated it into ovs-dpdk. I
intended to send v4 last week, but then I noticed a fatal bug (regarding
MQ).

I am still heavily occupied by something else, so I haven't had time to
fix it. I have tried to work out a solution in my spare time, though, and
I may have found one; I just won't know the result until I actually try it.

I will post an update when I have made some progress. Please also note that
it's very likely I won't have time for this before the end of this month :/

--yliu


Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-27 Thread Darrell Ball


On 9/27/17, 3:41 AM, "Finn Christensen"  wrote:

-Original Message-
From: Yuanhan Liu [mailto:y...@fridaylinux.org]
Sent: 27. september 2017 11:47
To: Finn Christensen 
Cc: Darrell Ball ; d...@openvswitch.org; Chandran Sugesh ; Simon Horman 
Subject: Re: [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

On Wed, Sep 27, 2017 at 09:12:49AM +, Finn Christensen wrote:
>
> [Darrell] It looks fine; of course, if we could drop dependencies on cap
> probe, it would be ideal (.
>
>
> [Finn]
> From a Napatech point of view, we would definitely want to use
> RTE_FLOW_ACTION_TYPE_RSS if it were implemented. It definitely makes
> good sense to me. I have two reasons for this:
> 1. It would fit nicely into the way Napatech does flow programming in
> the NIC.
> 2. We would be fully RSS controlled through OVS. Furthermore, future
> RSS splitting on a per-megaflow basis would be possible.
> IMO the "pure_mark_cap_probed" function is a temporary workaround to
> make it work. The RSS action seems to be a solution to the queue problem.

Finn, that's really good to know! I do agree that without this probe, the
code is simpler and cleaner.

A few questions though. Has Napatech already implemented the RSS action?
If not, what's the plan?

[Finn]
Our NIC handles RSS on a per-flow basis, but we have not yet used the rte
RSS action for OVS. In OVS we simply handle the RSS config outside of it.
Full control of RSS through OVS is better, though.

And are you okay with the following code?

add_mark_action();
add_rss_action_unconditionally();

flow = rte_flow_create(...);
if (!flow)
    return -1;

That is, no probes, no feedback. If it fails, it just fails (the flow
offload is then simply skipped).

[Finn]
Yes, we can easily make this work. Good suggestion!

But I do think some feedback is necessary. If we knew the rte_flow
capabilities at the beginning, we could simply disable flow offload for a
specific port if we knew it didn't have offload ability. This would avoid
repeatedly attempting hw offload that cannot succeed.

[Finn]
This seems to be a good idea.

[Darrell] Regarding the general question of capability checking, it is fine
to have this support in general, and we already identified some cases where
it would be best to use it if we can (the question mostly comes down to
support by RTE and drivers).

Probing is different: we usually use the term for trial-and-error checking,
where we configure something during initialization, see whether it works,
and then mark the associated capability as enabled or disabled. We could
also use a different kind of probing here in a dynamic fashion, where we
try to do HWOL and, if it fails X times, don't try again unless we detect
that the port has been reconfigured, for example, in case the capability
check is not possible or not implemented yet.
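
As a concrete sketch of that dynamic-probing idea (in C against the
rte_flow API; the per-port state, the failure threshold, and the helper
name are illustrative assumptions, not code from the patchset):

#include <stdbool.h>
#include <stdint.h>
#include <rte_flow.h>

/* Hypothetical dynamic HWOL probing: after HWOL_MAX_FAILURES consecutive
 * rte_flow_create() failures, stop trying until the port is reconfigured
 * (reconfiguration would reset this state). */
#define HWOL_MAX_FAILURES 3

struct hwol_port_state {
    int consecutive_failures;   /* reset when the port is reconfigured */
    bool offload_disabled;
};

static struct rte_flow *
try_hw_offload(struct hwol_port_state *state, uint16_t port_id,
               const struct rte_flow_attr *attr,
               const struct rte_flow_item *patterns,
               const struct rte_flow_action *actions,
               struct rte_flow_error *error)
{
    if (state->offload_disabled) {
        return NULL;                    /* give up: skip hw offload */
    }

    struct rte_flow *flow = rte_flow_create(port_id, attr, patterns,
                                            actions, error);
    if (flow) {
        state->consecutive_failures = 0;
    } else if (++state->consecutive_failures >= HWOL_MAX_FAILURES) {
        state->offload_disabled = true; /* stop retrying on this port */
    }
    return flow;
}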




--yliu




Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-27 Thread Finn Christensen
-Original Message-
From: Yuanhan Liu [mailto:y...@fridaylinux.org]
Sent: 27. september 2017 11:47
To: Finn Christensen 
Cc: Darrell Ball ; d...@openvswitch.org; Chandran Sugesh ; Simon Horman 
Subject: Re: [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

On Wed, Sep 27, 2017 at 09:12:49AM +, Finn Christensen wrote:
>
> [Darrell] It looks fine; of course, if we could drop dependencies on cap
> probe, it would be ideal (.
>
>
> [Finn]
> From a Napatech point of view, we would definitely want to use
> RTE_FLOW_ACTION_TYPE_RSS if it were implemented. It definitely makes
> good sense to me. I have two reasons for this:
> 1. It would fit nicely into the way Napatech does flow programming in
> the NIC.
> 2. We would be fully RSS controlled through OVS. Furthermore, future
> RSS splitting on a per-megaflow basis would be possible.
> IMO the "pure_mark_cap_probed" function is a temporary workaround to
> make it work. The RSS action seems to be a solution to the queue problem.

Finn, that's really good to know! I do agree that without this probe, the
code is simpler and cleaner.

A few questions though. Has Napatech already implemented the RSS action?
If not, what's the plan?

[Finn]
Our NIC handles RSS on a per-flow basis, but we have not yet used the rte
RSS action for OVS. In OVS we simply handle the RSS config outside of it.
Full control of RSS through OVS is better, though.

And are you okay with the following code?

add_mark_action();
add_rss_action_unconditionally();

flow = rte_flow_create(...);
if (!flow)
    return -1;

That is, no probes, no feedback. If it fails, it just fails (the flow
offload is then simply skipped).

[Finn]
Yes, we can easily make this work. Good suggestion!

But I do think some feedback is necessary. If we knew the rte_flow
capabilities at the beginning, we could simply disable flow offload for a
specific port if we knew it didn't have offload ability. This would avoid
repeatedly attempting hw offload that cannot succeed.

[Finn]
This seems to be a good idea.


--yliu


Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-27 Thread Yuanhan Liu
On Wed, Sep 27, 2017 at 09:12:49AM +, Finn Christensen wrote:
> 
> [Darrell] It looks fine; of course, if we could drop dependencies on cap
> probe, it would be ideal (.
> 
> 
> [Finn]
> From a Napatech point of view, we would definitely want to use
> RTE_FLOW_ACTION_TYPE_RSS if it were implemented. It definitely makes good
> sense to me. I have two reasons for this:
> 1. It would fit nicely into the way Napatech does flow programming in the NIC
> 2. We would be fully RSS controlled through OVS
> Furthermore, future RSS splitting on a per-megaflow basis would be possible.
> IMO the "pure_mark_cap_probed" function is a temporary workaround to make it work.
> The RSS action seems to be a solution to the queue problem.

Finn, that's really good to know! I do agree that without this probe,
the code is simpler and cleaner.

A few questions though. Has Napatech already implemented the RSS action?
If not, what's the plan? And are you okay with the following code?

add_mark_action();
add_rss_action_unconditionally();

flow = rte_flow_create(...);
if (!flow)
    return -1;

That is, no probes, no feedback. If it fails, it just fails (the flow
offload is then simply skipped).
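
For concreteness, a sketch of what that unconditional MARK + RSS
combination could look like against the 2017-era rte_flow API (the
rte_flow_action_rss layout changed in DPDK 18.05; the helper shape, the
queue numbering, and the mark id below are illustrative assumptions, not
the patchset's actual code):

#include <stdlib.h>
#include <rte_flow.h>

/* Sketch: attach a MARK action plus an RSS action spreading over nr_rxq
 * queues, then create the flow once with no retries. */
static struct rte_flow *
create_mark_rss_flow(uint16_t port_id, uint32_t mark_id,
                     const struct rte_flow_item *patterns,
                     uint16_t nr_rxq, struct rte_flow_error *error)
{
    struct rte_flow_attr attr = { .ingress = 1 };
    struct rte_flow_action_mark mark = { .id = mark_id };
    struct rte_flow *flow;

    /* RSS over all rx queues; the queue list is a flexible array member
     * in the pre-18.05 API, hence the over-allocation. */
    struct rte_flow_action_rss *rss =
        malloc(sizeof *rss + nr_rxq * sizeof rss->queue[0]);
    if (!rss) {
        return NULL;
    }
    rss->rss_conf = NULL;   /* NULL: let the PMD pick its default hash
                             * configuration (driver-dependent). */
    rss->num = nr_rxq;
    for (uint16_t q = 0; q < nr_rxq; q++) {
        rss->queue[q] = q;
    }

    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
        { .type = RTE_FLOW_ACTION_TYPE_RSS,  .conf = rss },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    /* No probe, no feedback: if creation fails, offload is just skipped. */
    flow = rte_flow_create(port_id, &attr, patterns, actions, error);
    free(rss);              /* the PMD copies what it needs at create time */
    return flow;
}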

But I do think some feedback is necessary. If we knew the rte_flow
capabilities at the beginning, we could simply disable flow offload for
a specific port if we knew it didn't have offload ability. This would
avoid repeatedly attempting hw offload that cannot succeed.

--yliu


Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-27 Thread Finn Christensen
-Original Message-
From: Darrell Ball [mailto:db...@vmware.com]
Sent: 27. september 2017 05:54
To: Yuanhan Liu 
Cc: d...@openvswitch.org; Finn Christensen ; Chandran Sugesh ; Simon Horman 
Subject: Re: [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow



On 9/26/17, 8:27 PM, "Yuanhan Liu"  wrote:

On Wed, Sep 27, 2017 at 03:13:33AM +, Darrell Ball wrote:
>
>
> On 9/26/17, 6:25 PM, "Yuanhan Liu"  wrote:
>
> On Tue, Sep 26, 2017 at 09:58:16PM +, Darrell Ball wrote:
> > The second major change is that a thread is introduced to do the real
> > flow offload. This is for diminishing the overhead of hw flow offload
> > installation/deletion at the data path. See patch 9 for more detailed info.
> >
> > [Darrell] There might be other options to handle this, such as dividing
> > time to HWOL in a given PMD.
> > Another option is PMD isolation.
>
> Good to know, though I'm not sure I understand it so far. But it can be
> discussed later. That's also the reason I put it in the last patch, so
> that we could easily turn it into another solution, or even simpler (just
> remove it).
>
> > In the last discussion, the RSS action was suggested for getting rid of
> > the QUEUE action workaround. Unfortunately, it didn't work. The flow
> > creation failed with the MLX5 PMD driver, and that's the only driver in
> > DPDK that supports the RSS action so far.
> >
> >
> > [Darrell]
> > I wonder if we should take a pause before jumping into the next
> > version of the code.
>
> I have no objections.
>
> > If workarounds are required in OVS code, then so be it, as long as they
> > are not overly disruptive to the existing code and hard to maintain.
> > In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable
> > option to avoid some unpleasant OVS workarounds.
> > This would make a significant difference in the code paths if we
> > supported it, so we need to be sure as early as possible.
>
> I agree.
>
> > The support needed would be in some drivers and seems reasonably doable.
> > Moreover, this was discussed in the last dpdk meeting and the support
> > was indicated as existing?, although I only verified the MLX code, myself.
>
> From the code point of view, yes, the code was there months ago.
>
> > I had seen the MLX code supporting the _RSS action and there are some
> > checks for supported cases; when you say “it didn't work”, what was the
> > issue?
>
> I actually hit the error months ago; I even debugged it. IIRC, the error
> came from libibverbs, when trying to create the hw flow. I need to
> double-check it with our colleague who developed this feature.
>
> > Let us have a discussion also about the Intel nic side and the Napatech
> > side.
>
> I think it's not necessary for Napatech, because they can support the
> MARK-only action.
>
> It is not necessary for Napatech; however, to avoid the need to detect
> the Napatech special (good) case, ‘ideally’ we do the exact same
> programming even in the Napatech case.

Will the following look okay to you (assuming we have the probe ability and
DPDK has some basic capability feedback)?

if (!pure_mark_cap_probed) {
    if (dpdk_rte_flow_has_rss_action_support) {
        append_rss_action();
    } else {
        return -1;   /* no RSS action support: give up on offload */
    }
}

/* create the flow once, with no retries; if it fails, let it fail */
flow = rte_flow_create(...);

I assume that's how the code will finally look. What do you think?


[Darrell] It looks fine; of course, if we could drop dependencies on cap
probe, it would be ideal (.


[Finn]
From a Napatech point of view, we would definitely want to use
RTE_FLOW_ACTION_TYPE_RSS if it were implemented. It definitely makes good
sense to me. I have two reasons for this:
1. It would fit nicely into the way Napatech does flow programming in the NIC
2. We would be fully RSS controlled through OVS
Furthermore, 

Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-26 Thread Darrell Ball


On 9/26/17, 8:27 PM, "Yuanhan Liu"  wrote:

On Wed, Sep 27, 2017 at 03:13:33AM +, Darrell Ball wrote:
> 
> 
> On 9/26/17, 6:25 PM, "Yuanhan Liu"  wrote:
> 
> On Tue, Sep 26, 2017 at 09:58:16PM +, Darrell Ball wrote:
> > The second major change is that a thread is introduced to do the real
> > flow offload. This is for diminishing the overhead of hw flow offload
> > installation/deletion at the data path. See patch 9 for more detailed info.
> > 
> > [Darrell] There might be other options to handle this, such as dividing
> > time to HWOL in a given PMD.
> > Another option is PMD isolation.
> 
> Good to know, though I'm not sure I understand it so far. But it can be
> discussed later. That's also the reason I put it in the last patch, so
> that we could easily turn it into another solution, or even simpler (just
> remove it).
> 
> > In the last discussion, the RSS action was suggested for getting rid of
> > the QUEUE action workaround. Unfortunately, it didn't work. The flow
> > creation failed with the MLX5 PMD driver, and that's the only driver in
> > DPDK that supports the RSS action so far.
> > 
> > 
> > [Darrell] 
> > I wonder if we should take a pause before jumping into the next
> > version of the code.
> 
> I have no objections.
> 
> > If workarounds are required in OVS code, then so be it, as long as they
> > are not overly disruptive to the existing code and hard to maintain.
> > In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable
> > option to avoid some unpleasant OVS workarounds.
> > This would make a significant difference in the code paths if we
> > supported it, so we need to be sure as early as possible.
> 
> I agree.
> 
> > The support needed would be in some drivers and seems reasonably doable.
> > Moreover, this was discussed in the last dpdk meeting and the support
> > was indicated as existing?, although I only verified the MLX code, myself.
> 
> From the code point of view, yes, the code was there months ago.
> 
> > I had seen the MLX code supporting the _RSS action and there are some
> > checks for supported cases; when you say “it didn't work”, what was the
> > issue?
> 
> I actually hit the error months ago; I even debugged it. IIRC, the error
> came from libibverbs, when trying to create the hw flow. I need to
> double-check it with our colleague who developed this feature.
> 
> > Let us have a discussion also about the Intel nic side and the Napatech
> > side.
> 
> I think it's not necessary for Napatech, because they can support the
> MARK-only action.
> 
> It is not necessary for Napatech; however, to avoid the need to detect
> the Napatech special (good) case, ‘ideally’ we do the exact same
> programming even in the Napatech case.

Will the following look okay to you (assuming we have the probe ability and
DPDK has some basic capability feedback)?

if (!pure_mark_cap_probed) {
    if (dpdk_rte_flow_has_rss_action_support) {
        append_rss_action();
    } else {
        return -1;   /* no RSS action support: give up on offload */
    }
}

/* create the flow once, with no retries; if it fails, let it fail */
flow = rte_flow_create(...);

I assume that's how the code will finally look. What do you think?


[Darrell] It looks fine; of course, if we could drop dependencies on cap probe, 
it would be ideal (.  


--yliu

> For Intel, yes, I think Intel folks could give some comments here.
> 
> > It seems reasonable to ask where the disconnect is and whether this support
> > can be added and then make a decision based on the answers.
> 
> No objections.
> 
>   --yliu
> 
> 




Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-26 Thread Yuanhan Liu
On Wed, Sep 27, 2017 at 03:13:33AM +, Darrell Ball wrote:
> 
> 
> On 9/26/17, 6:25 PM, "Yuanhan Liu"  wrote:
> 
> On Tue, Sep 26, 2017 at 09:58:16PM +, Darrell Ball wrote:
> > The second major change is that a thread is introduced to do the real
> > flow offload. This is for diminishing the overhead of hw flow offload
> > installation/deletion at the data path. See patch 9 for more detailed info.
> > 
> > [Darrell] There might be other options to handle this, such as dividing
> > time to HWOL in a given PMD.
> > Another option is PMD isolation.
> 
> Good to know, though I'm not sure I understand it so far. But it can be
> discussed later. That's also the reason I put it in the last patch, so
> that we could easily turn it into another solution, or even simpler (just
> remove it).
> 
> > In the last discussion, the RSS action was suggested for getting rid of
> > the QUEUE action workaround. Unfortunately, it didn't work. The flow
> > creation failed with the MLX5 PMD driver, and that's the only driver in
> > DPDK that supports the RSS action so far.
> > 
> > 
> > [Darrell] 
> > I wonder if we should take a pause before jumping into the next
> > version of the code.
> 
> I have no objections.
> 
> > If workarounds are required in OVS code, then so be it, as long as they
> > are not overly disruptive to the existing code and hard to maintain.
> > In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable
> > option to avoid some unpleasant OVS workarounds.
> > This would make a significant difference in the code paths if we
> > supported it, so we need to be sure as early as possible.
> 
> I agree.
> 
> > The support needed would be in some drivers and seems reasonably doable.
> > Moreover, this was discussed in the last dpdk meeting and the support
> > was indicated as existing?, although I only verified the MLX code, myself.
> 
> From the code point of view, yes, the code was there months ago.
> 
> > I had seen the MLX code supporting the _RSS action and there are some
> > checks for supported cases; when you say “it didn't work”, what was the
> > issue?
> 
> I actually hit the error months ago; I even debugged it. IIRC, the error
> came from libibverbs, when trying to create the hw flow. I need to
> double-check it with our colleague who developed this feature.
> 
> > Let us have a discussion also about the Intel nic side and the Napatech
> > side.
> 
> I think it's not necessary for Napatech, because they can support the
> MARK-only action.
> 
> It is not necessary for Napatech; however, to avoid the need to detect
> the Napatech special (good) case, ‘ideally’ we do the exact same
> programming even in the Napatech case.

Will the following look okay to you (assuming we have the probe ability and
DPDK has some basic capability feedback)?

if (!pure_mark_cap_probed) {
    if (dpdk_rte_flow_has_rss_action_support) {
        append_rss_action();
    } else {
        return -1;   /* no RSS action support: give up on offload */
    }
}

/* create the flow once, with no retries; if it fails, let it fail */
flow = rte_flow_create(...);

I assume that's how the code will finally look. What do you think?

--yliu

> For Intel, yes, I think Intel folks could give some comments here.
> 
> > It seems reasonable to ask where the disconnect is and whether this support
> > can be added and then make a decision based on the answers.
> 
> No objections.
> 
>   --yliu
> 
> 


Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-26 Thread Darrell Ball


On 9/26/17, 6:25 PM, "Yuanhan Liu"  wrote:

On Tue, Sep 26, 2017 at 09:58:16PM +, Darrell Ball wrote:
> The second major change is that a thread is introduced to do the real
> flow offload. This is for diminishing the overhead of hw flow offload
> installation/deletion at the data path. See patch 9 for more detailed info.
> 
> [Darrell] There might be other options to handle this, such as dividing
> time to HWOL in a given PMD.
> Another option is PMD isolation.

Good to know, though I'm not sure I understand it so far. But it can be
discussed later. That's also the reason I put it in the last patch, so
that we could easily turn it into another solution, or even simpler (just
remove it).

> In the last discussion, the RSS action was suggested for getting rid of
> the QUEUE action workaround. Unfortunately, it didn't work. The flow
> creation failed with the MLX5 PMD driver, and that's the only driver in
> DPDK that supports the RSS action so far.
> 
> 
> [Darrell] 
> I wonder if we should take a pause before jumping into the next version
> of the code.

I have no objections.

> If workarounds are required in OVS code, then so be it, as long as they
> are not overly disruptive to the existing code and hard to maintain.
> In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable
> option to avoid some unpleasant OVS workarounds.
> This would make a significant difference in the code paths if we
> supported it, so we need to be sure as early as possible.

I agree.

> The support needed would be in some drivers and seems reasonably doable.
> Moreover, this was discussed in the last dpdk meeting and the support
> was indicated as existing?, although I only verified the MLX code, myself.

From the code point of view, yes, the code was there months ago.

> I had seen the MLX code supporting the _RSS action and there are some
> checks for supported cases; when you say “it didn't work”, what was the
> issue?

I actually hit the error months ago; I even debugged it. IIRC, the error
came from libibverbs, when trying to create the hw flow. I need to
double-check it with our colleague who developed this feature.

> Let us have a discussion also about the Intel nic side and the Napatech
> side.

I think it's not necessary for Napatech, because they can support the
MARK-only action.

It is not necessary for Napatech; however, to avoid the need to detect
the Napatech special (good) case, ‘ideally’ we do the exact same
programming even in the Napatech case.


For Intel, yes, I think Intel folks could give some comments here.

> It seems reasonable to ask where the disconnect is and whether this support
> can be added and then make a decision based on the answers.

No objections.

--yliu




Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-26 Thread Yuanhan Liu
On Wed, Sep 27, 2017 at 09:24:54AM +0800, Yuanhan Liu wrote:
> > The support needed would be in some drivers and seems reasonably doable.
> > Moreover, this was discussed in the last dpdk meeting and the support was
> > indicated as existing?, although I only verified the MLX code, myself.
> 
> From the code point of view, yes, the code was there months ago.
> 
> > I had seen the MLX code supporting the _RSS action and there are some
> > checks for supported cases; when you say “it didn't work”, what was the
> > issue?
> 
> I actually hit the error months ago; I even debugged it. IIRC, the error
> came from libibverbs, when trying to create the hw flow. I need to
> double-check it with our colleague who developed this feature.

Hmm, I think I was wrong; apparently I did something stupid in that last
try. I double-checked it just now, and at least the flow creation works.
I just need to take some time to integrate it into OVS-DPDK and see how
it goes. I will get back to you when I have done that. Currently, I am
occupied by something else.

--yliu


Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-26 Thread Yuanhan Liu
On Tue, Sep 26, 2017 at 09:58:16PM +, Darrell Ball wrote:
> The second major change is that a thread is introduced to do the real
> flow offload. This is for diminishing the overhead of hw flow offload
> installation/deletion at the data path. See patch 9 for more detailed info.
> 
> [Darrell] There might be other options to handle this, such as dividing
> time to HWOL in a given PMD.
> Another option is PMD isolation.

Good to know, though I'm not sure I understand it so far. But it can be
discussed later. That's also the reason I put it in the last patch, so
that we could easily turn it into another solution, or even simpler (just
remove it).

> In the last discussion, the RSS action was suggested for getting rid of
> the QUEUE action workaround. Unfortunately, it didn't work. The flow
> creation failed with the MLX5 PMD driver, and that's the only driver in
> DPDK that supports the RSS action so far.
> 
> 
> [Darrell] 
> I wonder if we should take a pause before jumping into the next version
> of the code.

I have no objections.

> If workarounds are required in OVS code, then so be it, as long as they
> are not overly disruptive to the existing code and hard to maintain.
> In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable
> option to avoid some unpleasant OVS workarounds.
> This would make a significant difference in the code paths if we
> supported it, so we need to be sure as early as possible.

I agree.

> The support needed would be in some drivers and seems reasonably doable. 
> Moreover, this was discussed in the last dpdk meeting and the support was
> indicated as existing?, although I only verified the MLX code, myself.

From the code point of view, yes, the code was there months ago.

> I had seen the MLX code supporting the _RSS action and there are some
> checks for supported cases; when you say “it didn't work”, what was the
> issue?

I actually hit the error months ago; I even debugged it. IIRC, the error
came from libibverbs, when trying to create the hw flow. I need to
double-check it with our colleague who developed this feature.

> Let us have a discussion also about the Intel nic side and the Napatech side.

I think it's not necessary for Napatech, because they can support the
MARK-only action.

For Intel, yes, I think Intel folks could give some comments here.

> It seems reasonable to ask where the disconnect is and whether this support
> can be added and then make a decision based on the answers. 

No objections.

--yliu


Re: [ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-26 Thread Darrell Ball


On 9/25/17, 10:37 PM, "Yuanhan Liu"  wrote:

Some highlights in v3
==

Here is the v3, with two major changes and further testing (including
many flows). This took more effort than I thought, so the v3 publication
has been delayed for a while.

The first major change is that the mark and id association is now done
with an array instead of a CMAP. This gives us a further performance
gain: it can be up to 70% now (please see the exact numbers below).

This change also makes the code a bit more complex, though, due to
locking issues. RCU is used (though I'm not quite sure it's used
correctly). For now, RCU only protects the array base address update
(due to reallocation); it doesn't protect array item changes
(array[i] = xx). I think it's buggy and I need to rethink it.

The second major change is that a thread is introduced to do the real
flow offload. This is for diminishing the overhead of hw flow offload
installation/deletion at the data path. See patch 9 for more detailed info.

[Darrell] There might be other options to handle this, such as dividing
time to HWOL in a given PMD.
Another option is PMD isolation.

In the last discussion, the RSS action was suggested for getting rid of
the QUEUE action workaround. Unfortunately, it didn't work. The flow
creation failed with the MLX5 PMD driver, and that's the only driver in
DPDK that supports the RSS action so far.


[Darrell] 
I wonder if we should take a pause before jumping into the next version of
the code.

If workarounds are required in OVS code, then so be it, as long as they are
not overly disruptive to the existing code and hard to maintain.
In the case of RTE_FLOW_ACTION_TYPE_RSS, we might have a reasonable option
to avoid some unpleasant OVS workarounds.
This would make a significant difference in the code paths if we supported
it, so we need to be sure as early as possible.
The support needed would be in some drivers and seems reasonably doable.
Moreover, this was discussed in the last dpdk meeting and the support was
indicated as existing?, although I only verified the MLX code, myself.

I had seen the MLX code supporting the _RSS action and there are some
checks for supported cases; when you say “it didn't work”, what was the issue?
Let us have a discussion also about the Intel nic side and the Napatech side.
It seems reasonable to ask where the disconnect is and whether this support
can be added and then make a decision based on the answers.

What do you think?


I also tested many flows this time. The result is more exciting: it
can be up to a 267% boost, with 512 megaflows (each with another
512 exact-matching flows, thus 512*512=256K flows in total), one
core and one queue doing PHY-PHY forwarding. For the offload case, the
performance stays the same as with only one flow: the cost of
the mark-to-flow translation is constant, no matter how many flows
are inserted (as long as they are all offloaded). However, for
vanilla ovs-dpdk, the more flows, the worse the performance. In
other words, the more flows, the bigger the difference we will see.

There were many discussions on the last version. I'm sorry if I missed
some comments and didn't make the corresponding changes in v3. Please let
me know if I made such mistakes.

And below is the formal cover letter introduction, for anyone seeing
this patchset for the first time.

---
Hi,

Here is a joint work from Mellanox and Napatech, to enable flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t number).
Later, we can then get the flow directly from the mark id, bypassing the
heavy EMC processing, including miniflow_extract.

The association is done with an array in patch 1. It also reuses the flow
APIs introduced while adding the tc offloads. The EMC bypassing is done
in patch 2. The flow offload is done in patch 4, which mainly does two
things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action.

Afterwards, the NIC will set the mark id in every packet's mbuf when it
matches the flow. That's basically how we can get the flow directly
from the received mbuf.

While testing PHY-PHY forwarding with one core, one queue and one
flow, I got about a 70% performance boost. For PHY-vhost forwarding, I got
about a 50% performance boost. It's basically the performance I got with
v1 when tcp_flags was ignored. In summary, the CMAP-to-array change
gives yet another 16% performance boost.

The major issue mentioned in v1 is also worked around: the queue index
is no longer blindly set to 0, but set to the rxq that first
receives the 

[ovs-dev] [PATCH v3 0/9] OVS-DPDK flow offload with rte_flow

2017-09-25 Thread Yuanhan Liu
Some highlights in v3
==

Here is the v3, with two major changes and further testing (including
many flows). This took more effort than I thought, so the v3 publication
has been delayed for a while.

The first major change is that the mark and id association is now done
with an array instead of a CMAP. This gives us a further performance
gain: it can be up to 70% now (please see the exact numbers below).

This change also makes the code a bit more complex, though, due to
locking issues. RCU is used (though I'm not quite sure it's used
correctly). For now, RCU only protects the array base address update
(due to reallocation); it doesn't protect array item changes
(array[i] = xx). I think it's buggy and I need to rethink it.
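
For illustration, here is a minimal sketch of that base-address-only RCU
protection, using OVS's ovsrcu primitives. The struct and function names
are illustrative assumptions, not the patch's actual code, and, as noted
above, individual item updates would still need separate care:

#include <string.h>
#include "ovs-rcu.h"
#include "util.h"

struct dp_netdev_flow;              /* the datapath flow, used opaquely */

struct mark_to_flow_map {
    struct dp_netdev_flow **flows;  /* indexed by mark id */
    uint32_t max;                   /* capacity of 'flows' */
};

static OVSRCU_TYPE(struct mark_to_flow_map *) mark_map;

/* Reader side (datapath): safe against a concurrent reallocation. */
static struct dp_netdev_flow *
mark_to_flow_lookup(uint32_t mark)
{
    struct mark_to_flow_map *map = ovsrcu_get(struct mark_to_flow_map *,
                                              &mark_map);
    return map && mark < map->max ? map->flows[mark] : NULL;
}

/* Writer side (assumed to hold an external mutex): grow the array,
 * publish the new base address, and defer freeing the old one until all
 * current readers have quiesced. */
static void
mark_map_expand(uint32_t new_max)
{
    struct mark_to_flow_map *old_map, *new_map;

    old_map = ovsrcu_get_protected(struct mark_to_flow_map *, &mark_map);
    new_map = xmalloc(sizeof *new_map);
    new_map->flows = xcalloc(new_max, sizeof *new_map->flows);
    new_map->max = new_max;
    if (old_map) {
        memcpy(new_map->flows, old_map->flows,
               old_map->max * sizeof *old_map->flows);
    }
    ovsrcu_set(&mark_map, new_map);
    if (old_map) {
        ovsrcu_postpone(free, old_map->flows);
        ovsrcu_postpone(free, old_map);
    }
}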

The second major change is that a thread is introduced to do the real
flow offload. This is for diminishing the overhead of hw flow offload
installation/deletion at the data path. See patch 9 for more detailed info.
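
As a rough sketch of that dedicated-thread pattern (plain pthreads here
for brevity; patch 9 is the authoritative implementation, and the request
layout below is an illustrative assumption):

#include <pthread.h>
#include <stdlib.h>

enum offload_op { OFFLOAD_FLOW_PUT, OFFLOAD_FLOW_DEL };

struct offload_request {
    struct offload_request *next;
    enum offload_op op;
    void *flow;                     /* the flow to offload or remove */
};

static struct offload_request *queue_head;
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

/* Called from the datapath: cheap, never touches the NIC. */
static void
offload_request_enqueue(struct offload_request *req)
{
    pthread_mutex_lock(&queue_mutex);
    req->next = queue_head;         /* LIFO for brevity; a real queue
                                     * would preserve FIFO ordering */
    queue_head = req;
    pthread_cond_signal(&queue_cond);
    pthread_mutex_unlock(&queue_mutex);
}

/* The offload thread: drains the queue and performs the slow
 * rte_flow_create()/rte_flow_destroy() calls off the datapath. */
static void *
offload_thread_main(void *arg)
{
    for (;;) {
        pthread_mutex_lock(&queue_mutex);
        while (!queue_head) {
            pthread_cond_wait(&queue_cond, &queue_mutex);
        }
        struct offload_request *req = queue_head;
        queue_head = req->next;
        pthread_mutex_unlock(&queue_mutex);

        /* ... perform req->op via rte_flow_create()/rte_flow_destroy() ... */
        free(req);
    }
    return NULL;
}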

In the last discussion, the RSS action was suggested for getting rid of
the QUEUE action workaround. Unfortunately, it didn't work. The flow
creation failed with the MLX5 PMD driver, and that's the only driver in
DPDK that supports the RSS action so far.

I also tested many flows this time. The result is more exciting: it
can be up to a 267% boost, with 512 megaflows (each with another
512 exact-matching flows, thus 512*512=256K flows in total), one
core and one queue doing PHY-PHY forwarding. For the offload case, the
performance stays the same as with only one flow: the cost of
the mark-to-flow translation is constant, no matter how many flows
are inserted (as long as they are all offloaded). However, for
vanilla ovs-dpdk, the more flows, the worse the performance. In
other words, the more flows, the bigger the difference we will see.

There were many discussions on the last version. I'm sorry if I missed
some comments and didn't make the corresponding changes in v3. Please let
me know if I made such mistakes.

And below is the formal cover letter introduction, for anyone seeing
this patchset for the first time.

---
Hi,

Here is a joint work from Mellanox and Napatech, to enable flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t number).
Later, we can then get the flow directly from the mark id, bypassing the
heavy EMC processing, including miniflow_extract.

The association is done with an array in patch 1. It also reuses the flow
APIs introduced while adding the tc offloads. The EMC bypassing is done
in patch 2. The flow offload is done in patch 4, which mainly does two
things (a sketch follows the list):

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action.
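
As a rough illustration of those two steps, here is a hand-rolled eth/ipv4
example against the rte_flow API, rather than the patch's generic
translation; the helper and the matched field are illustrative assumptions
(and, as discussed elsewhere in the thread, drivers typically want a QUEUE
or RSS action next to MARK):

#include <rte_byteorder.h>
#include <rte_flow.h>

/* Sketch: match IPv4 traffic to one destination address and tag matching
 * packets with 'mark_id'. The real patch translates an arbitrary OVS
 * match; this hard-codes a single pattern purely for illustration. */
static struct rte_flow *
offload_ipv4_dst(uint16_t port_id, rte_be32_t ipv4_dst, uint32_t mark_id,
                 struct rte_flow_error *error)
{
    struct rte_flow_attr attr = { .ingress = 1 };

    struct rte_flow_item_ipv4 ip_spec = {
        .hdr = { .dst_addr = ipv4_dst },
    };
    struct rte_flow_item_ipv4 ip_mask = {
        .hdr = { .dst_addr = RTE_BE32(0xffffffff) },
    };

    struct rte_flow_item patterns[] = {
        { .type = RTE_FLOW_ITEM_TYPE_ETH },     /* any ethernet header */
        { .type = RTE_FLOW_ITEM_TYPE_IPV4,
          .spec = &ip_spec, .mask = &ip_mask },
        { .type = RTE_FLOW_ITEM_TYPE_END },
    };

    struct rte_flow_action_mark mark = { .id = mark_id };
    struct rte_flow_action actions[] = {
        { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
        { .type = RTE_FLOW_ACTION_TYPE_END },
    };

    return rte_flow_create(port_id, &attr, patterns, actions, error);
}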

Afterwards, the NIC will set the mark id in every packet's mbuf when it
matches the flow. That's basically how we can get the flow directly
from the received mbuf.
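
On the receive side, the mark comes back in the mbuf. A minimal sketch of
that lookup (field and flag names per the 2017-era DPDK mbuf API;
mark_to_flow_lookup() is an assumed helper standing in for the patch's
array lookup):

#include <rte_mbuf.h>

struct dp_netdev_flow;
struct dp_netdev_flow *mark_to_flow_lookup(uint32_t mark);

/* Sketch: recover the flow directly from a received mbuf's FDIR mark,
 * skipping miniflow_extract() and the EMC lookup entirely. */
static struct dp_netdev_flow *
flow_from_mbuf(struct rte_mbuf *mbuf)
{
    if (mbuf->ol_flags & PKT_RX_FDIR_ID) {
        return mark_to_flow_lookup(mbuf->hash.fdir.hi);
    }
    return NULL;   /* no mark: fall back to the normal lookup path */
}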

While testing PHY-PHY forwarding with one core, one queue and one
flow, I got about a 70% performance boost. For PHY-vhost forwarding, I got
about a 50% performance boost. It's basically the performance I got with
v1 when tcp_flags was ignored. In summary, the CMAP-to-array change
gives yet another 16% performance boost.

The major issue mentioned in v1 is also worked around: the queue index
is no longer blindly set to 0, but set to the rxq that first
receives the upcall packet.

Note that it's disabled by default; it can be enabled with:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

v3: - The mark and id association is done with an array instead of a CMAP.
    - Added a thread to do the hw offload operations.
    - Removed macros completely.
    - Dropped the patch to set FDIR_CONF, which is a workaround for some
      Intel NICs.
    - Added a debug patch to show all flow patterns we have created.
    - Misc fixes.

v2: - Worked around the queue action issue.
    - Fixed the tcp_flags being skipped issue, which also fixed the
      build warnings.
    - Fixed l2 patterns for Intel NICs.
    - Converted some macros to functions.
    - Did not hardcode the max number of flows/actions.
    - Rebased on top of the latest code.

Thanks.

--yliu

---
Finn Christensen (2):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action

Shachar Beiser (1):
  dpif-netdev: record rx queue id for the upcall

Yuanhan Liu (6):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion
  netdev-dpdk: add debug for rte flow patterns
  dpif-netdev: do hw flow offload in another thread

 lib/dp-packet.h   |  14 +
 lib/dpif-netdev.c | 421 -
 lib/flow.c| 155 ---
 lib/flow.h|   1 +
 lib/netdev-dpdk.c | 776 +-
 lib/netdev.c  |   1 +
 lib/netdev.h  |   7 +
 7 files changed, 1331