Re: [Patch net 0/5] net_sched: tc action fixes and updates

2016-08-14 Thread Cong Wang
On Sat, Aug 13, 2016 at 4:05 AM, Jamal Hadi Salim  wrote:
> I tested a lot more this time.
> Good news: performance regression now resolved.
> Some bad news - there's still one more oops:
>
> sudo $TC actions add action police rate 1kbit burst 90k drop index 1

Relax, Jamal, it is caused by other commit, not my previous one.

But I can definitely fix it since you are so busy. ;)


Re: [Patch net 0/5] net_sched: tc action fixes and updates

2016-08-13 Thread Jamal Hadi Salim

On 16-08-11 08:08 PM, Cong Wang wrote:

On Thu, Aug 11, 2016 at 9:20 AM, Jamal Hadi Salim  wrote:

On 16-08-10 04:06 PM, Cong Wang wrote:


On Wed, Aug 10, 2016 at 7:34 AM, Jamal Hadi Salim 
wrote:


On 16-08-08 04:46 PM, Cong Wang wrote:




tcf_exts_exec() is the culprit - and conversion to from flexarray
to linked list in the fast problem to be specific.



Ah, this reminds me that I don't have to use flex_array, initially
I thought the tcf_exts could hold as many actions as it wants,
but actually there is a upper bound, TCA_ACT_MAX_PRIO.
IOW, a regular dynamic array is just enough here.



Yes, a regular array would be enough.



I just replaced the flex_array with a regular one, it works fine
for me too, at least no crash with all of my test cases.






No problem Cong - except we have a kernel that crashes right now.
BTW: I just thought of another test which uses a different code
path
# add a policer rule
sudo $TC actions add action police rate 1kbit burst 90k drop
#dump rules..
sudo $TC -s actions ls action police





I tested a lot more this time.
Good news: performance regression now resolved.
Some bad news - there's still one more oops:

sudo $TC actions add action police rate 1kbit burst 90k drop index 1

note how i explicitly specified the index.
If i leave out the index, all works fine. I'll continue to see
if there are any other issue for the next while and will email.
I think you are close so  I will also make small comments on the
patches because you are going to make another update.

cheers,
jamal


Re: [Patch net 0/5] net_sched: tc action fixes and updates

2016-08-12 Thread Jamal Hadi Salim

On 16-08-11 08:08 PM, Cong Wang wrote:

On Thu, Aug 11, 2016 at 9:20 AM, Jamal Hadi Salim  wrote:

On 16-08-10 04:06 PM, Cong Wang wrote:



Ok, I did a quick look at your patch - I still see you converting
from array to list. I mean get tcf_exts_exec() to take an array
and walk it. That will restore the perf numbers to the same level.



Makes sense! I just did this change as you suggest.


Ok, took look at it. Looks promising! Thanks Cong. I will test
and get back to you (sorry, likely by end of day)

cheers,
jamal



Re: [Patch net 0/5] net_sched: tc action fixes and updates

2016-08-11 Thread Cong Wang
On Thu, Aug 11, 2016 at 9:20 AM, Jamal Hadi Salim  wrote:
> On 16-08-10 04:06 PM, Cong Wang wrote:
>>
>> On Wed, Aug 10, 2016 at 7:34 AM, Jamal Hadi Salim 
>> wrote:
>>>
>>> On 16-08-08 04:46 PM, Cong Wang wrote:
>
>
>>> tcf_exts_exec() is the culprit - and conversion to from flexarray
>>> to linked list in the fast problem to be specific.
>>
>>
>> Ah, this reminds me that I don't have to use flex_array, initially
>> I thought the tcf_exts could hold as many actions as it wants,
>> but actually there is a upper bound, TCA_ACT_MAX_PRIO.
>> IOW, a regular dynamic array is just enough here.
>>
>
> Yes, a regular array would be enough.
>
>
>> I just replaced the flex_array with a regular one, it works fine
>> for me too, at least no crash with all of my test cases.
>>
>
> Ok, I did a quick look at your patch - I still see you converting
> from array to list. I mean get tcf_exts_exec() to take an array
> and walk it. That will restore the perf numbers to the same level.


Makes sense! I just did this change as you suggest.


>
>> Please try v2, since you have more test cases that I do.
>> Or it would be great if you can share your test cases with
>> me or us.
>>
>> Be patient, every big change could have regression. :)
>>
>
> No problem Cong - except we have a kernel that crashes right now.
> BTW: I just thought of another test which uses a different code
> path
> # add a policer rule
> sudo $TC actions add action police rate 1kbit burst 90k drop
> #dump rules..
> sudo $TC -s actions ls action police
>

Passed:

[root@localhost ~]# tc actions add action police rate 1kbit burst 90k drop
[root@localhost ~]# tc actions ls action police

action order 0:  police 0x1 rate 1000bit burst 23440b mtu 2Kb action
drop overhead 0b
ref 1 bind 0
[root@localhost ~]# tc -s actions ls action police

action order 0:  police 0x1 rate 1000bit burst 23440b mtu 2Kb action
drop overhead 0b
ref 1 bind 0
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0


Thanks.


Re: [Patch net 0/5] net_sched: tc action fixes and updates

2016-08-11 Thread Jamal Hadi Salim

On 16-08-10 04:06 PM, Cong Wang wrote:

On Wed, Aug 10, 2016 at 7:34 AM, Jamal Hadi Salim  wrote:

On 16-08-08 04:46 PM, Cong Wang wrote:



tcf_exts_exec() is the culprit - and conversion to from flexarray
to linked list in the fast problem to be specific.


Ah, this reminds me that I don't have to use flex_array, initially
I thought the tcf_exts could hold as many actions as it wants,
but actually there is a upper bound, TCA_ACT_MAX_PRIO.
IOW, a regular dynamic array is just enough here.



Yes, a regular array would be enough.



I just replaced the flex_array with a regular one, it works fine
for me too, at least no crash with all of my test cases.



Ok, I did a quick look at your patch - I still see you converting
from array to list. I mean get tcf_exts_exec() to take an array
and walk it. That will restore the perf numbers to the same level.


Please try v2, since you have more test cases that I do.
Or it would be great if you can share your test cases with
me or us.

Be patient, every big change could have regression. :)



No problem Cong - except we have a kernel that crashes right now.
BTW: I just thought of another test which uses a different code
path
# add a policer rule
sudo $TC actions add action police rate 1kbit burst 90k drop
#dump rules..
sudo $TC -s actions ls action police

cheers,
jamal



Re: [Patch net 0/5] net_sched: tc action fixes and updates

2016-08-10 Thread Jamal Hadi Salim

On 16-08-08 04:46 PM, Cong Wang wrote:

This patchset fixes several regressions caused by the previous
code refactor. Thanks to Jamal for catching them!

Note, patch 3/5 and 4/5 are not strictly necessary, I just
want to carry them together.



Cong - there's good news and bad news.
The good news is that the oopses are fixed.
The bad news is you have now slowed down the system. It is noticeable
at high speed.
I narrowed it down to your use of flex arrays. In particular
tcf_exts_exec() call:
This is the fast path - was flexarray really necessary?
The conversion to list is slowing things down.

As hard as this is for me to say:
I am actually beginning to question this whole patch series.
Either you have a plan to fix this regression or lets just
pull this out for now to regain stability until we get our
act together. I think it would make a lot of sense to just pass
an array instead of a list.

cheers,
jamal




Re: [Patch net 0/5] net_sched: tc action fixes and updates

2016-08-10 Thread Cong Wang
On Wed, Aug 10, 2016 at 7:34 AM, Jamal Hadi Salim  wrote:
> On 16-08-08 04:46 PM, Cong Wang wrote:
>>
>> This patchset fixes several regressions caused by the previous
>> code refactor. Thanks to Jamal for catching them!
>>
>
> Cong,
>
> Good news: oops gone. I havent done more testing than I did
> before; but looks good so far.
>
> Bad news: You have introduced a performance regression which is
> noticeable at high speed.
>
> tcf_exts_exec() is the culprit - and conversion to from flexarray
> to linked list in the fast problem to be specific.

Ah, this reminds me that I don't have to use flex_array, initially
I thought the tcf_exts could hold as many actions as it wants,
but actually there is a upper bound, TCA_ACT_MAX_PRIO.
IOW, a regular dynamic array is just enough here.

I just replaced the flex_array with a regular one, it works fine
for me too, at least no crash with all of my test cases.

Please try v2, since you have more test cases that I do.
Or it would be great if you can share your test cases with
me or us.

Be patient, every big change could have regression. :)

Thanks.


Re: [Patch net 0/5] net_sched: tc action fixes and updates

2016-08-10 Thread Jamal Hadi Salim

On 16-08-08 04:46 PM, Cong Wang wrote:

This patchset fixes several regressions caused by the previous
code refactor. Thanks to Jamal for catching them!



Cong,

Good news: oops gone. I havent done more testing than I did
before; but looks good so far.

Bad news: You have introduced a performance regression which is
noticeable at high speed.

tcf_exts_exec() is the culprit - and conversion to from flexarray
to linked list in the fast problem to be specific.
The regression is problematic (and unacceptable). Two options:
a) You fix the regressions - which i think may require changing
what gets passed around an executed on as an array instead of a
list.
b) I am worried #a will take some work. So the second option is
to back out the patch since there are known stability options;
get regression issues resolved and then go back and submit.

cheers,
jamla


Re: [Patch net 0/5] net_sched: tc action fixes and updates

2016-08-10 Thread Jamal Hadi Salim

Sorry I have a really bad (expensive) connection and the SMTP
server seems to reject the first message and i retyped it and
sent again - and now notice both were sent.
Message didnt change ;->

cheers,
jamal