Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-22 Thread Toke Høiland-Jørgensen
Michal Kazior  writes:

> traffic-gen generates only BE traffic. Everything else runs UDP_RR
> which doesn't generate a lot of traffic.

Good point. Fixed that: the newest git version of traffic-gen supports a
-t parameter whose value is set as the TOS byte on outgoing traffic
(literally; no smart diffserv handling, so you can override the ECN bits
as well).

Added support for a burst-tos test parameter in the Flent burst test
configs which will use this new parameter if set.
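For anyone scripting their own marked traffic, here is a rough Python sketch of what setting the TOS byte "literally" means: DSCP in the upper 6 bits, ECN in the lower 2. It is not traffic-gen's code; `tos_byte` and `open_marked_udp_socket` are hypothetical helpers, and it assumes a Linux/BSD sockets API where `IP_TOS` is accepted on UDP sockets:

```python
import socket


def tos_byte(dscp: int, ecn: int = 0) -> int:
    """Build the 8-bit IPv4 TOS byte: DSCP in the upper 6 bits, ECN in the lower 2."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn < 4:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


def open_marked_udp_socket(tos: int) -> socket.socket:
    """Hypothetical helper: a UDP socket whose IPv4 packets carry `tos` as-is."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
    return sock


# EF (DSCP 46) with ECT(0) in the ECN field: the ECN bits are set too.
print(hex(tos_byte(46, ecn=0b10)))  # 0xba
```

Because the whole byte is passed through, a caller can deliberately set (or clear) the ECN bits, which is the "override" mentioned above.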

-Toke


Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-22 Thread Michal Kazior
On 21 March 2016 at 18:10, Dave Taht  wrote:
> thx.
>
> a lot to digest.
>
> A) quick notes on "flent-gui bursts_11e-2016-03-21T09*.gz"
>
> 1) the new bursts_11e test *should* have stuck stuff in the VI and VO
> queues, and there *should* have been some sort of difference shown on
> the plots with it. There wasn't.

traffic-gen generates only BE traffic. Everything else runs UDP_RR
which doesn't generate a lot of traffic.


> For diffserv markings I used BE=CS0, BK=CS1, VI=CS5, and VO=EF.
> CS6/CS7 should also land in VO (at least with the soft mac handler
> last I looked). Is there a way to check if you are indeed exercising
> all four 802.11e hardware queues in this test? in ath9k it is the
> "xmit" sysfs var

Hmm... there are no txq stats. I guess it would make sense to add them?

There is /sys/kernel/debug/ieee80211/phy*/fq, which dumps the state of all
queues; they will be mostly empty with UDP_RR. You can run a netperf
UDP_STREAM test with diffserv marking to see which tid it is mapped onto.
You can see tid-AC mappings here:
https://wireless.wiki.kernel.org/en/developers/documentation/mac80211/queues

I just checked, and EF ends up as tid5, which is VI. It's actually the
same as CS5. You can use CS7 to run on tid7, which is VO.
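A small Python model of that mapping, assuming the 2016-era default classification (TID = top 3 bits of the DSCP) and the default 802.1D-priority-to-AC table used by mac80211. It ignores skb->priority overrides, QoS maps and any later kernel changes, and the helper names are made up for illustration:

```python
# Default 802.1D user priority (TID 0-7) -> access category.
UP_TO_AC = ("BE", "BK", "BK", "BE", "VI", "VI", "VO", "VO")


def dscp_to_tid(dscp: int) -> int:
    """Assumed default: the top 3 bits of the 6-bit DSCP become the TID."""
    return (dscp & 0x3F) >> 3


def dscp_to_ac(dscp: int) -> str:
    return UP_TO_AC[dscp_to_tid(dscp)]


MARKINGS = {"CS0": 0, "CS1": 8, "CS5": 40, "EF": 46, "CS6": 48, "CS7": 56}
for name, dscp in MARKINGS.items():
    print(f"{name:>3} (DSCP {dscp:2d}) -> tid{dscp_to_tid(dscp)} ({dscp_to_ac(dscp)})")
```

Under these assumptions EF (46) and CS5 (40) both land on tid5/VI, CS1 on tid1/BK, and CS6/CS7 on VO, which matches what the fq debugfs output showed.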


> 2) In all the old cases the BE UDP_RR flow died on the first burst
> (why?), and the fullpatch preserved it.

I think it's related to my setup, which involves veth pairs. I use them
to simulate bridging/AP behavior, but maybe they're not doing the job
right, hmm..


> (I would have kind of hoped to
> have seen the BK flow die, actually, in the fullpatch)

There's no extra priority weighting for BK. The difference between BE
and BK in 802.11 is contention window access timing, so BK statistically
gets fewer txops. Both have the same txop limit, which is 5.484ms in most
cases.
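To put rough numbers on "access timing", a Python sketch computing AIFS and a no-collision mean first-access delay per AC. It assumes the standard's default EDCA parameter set for non-AP stations on an OFDM PHY (9us slot, 16us SIFS); a real AP advertises its own values. In that default set BE and BK share CWmin/CWmax and a TXOP limit of 0 and differ only in AIFSN (3 vs 7):

```python
from dataclasses import dataclass

SLOT_US = 9   # assumed: OFDM PHY slot time
SIFS_US = 16  # assumed: OFDM PHY SIFS


@dataclass(frozen=True)
class Edca:
    aifsn: int
    cw_min: int
    cw_max: int
    txop_us: int  # 0: special value, see the TXOP 0 discussion in this thread


# Default EDCA parameter set for non-AP stations on an OFDM PHY.
DEFAULT_EDCA = {
    "BK": Edca(aifsn=7, cw_min=15, cw_max=1023, txop_us=0),
    "BE": Edca(aifsn=3, cw_min=15, cw_max=1023, txop_us=0),
    "VI": Edca(aifsn=2, cw_min=7, cw_max=15, txop_us=3008),
    "VO": Edca(aifsn=2, cw_min=3, cw_max=7, txop_us=1504),
}


def aifs_us(p: Edca) -> int:
    """Idle time an AC must see before it may count down its backoff."""
    return SIFS_US + p.aifsn * SLOT_US


def mean_first_access_us(p: Edca) -> float:
    """AIFS plus the mean of a backoff drawn uniformly from [0, CWmin] slots,
    assuming an idle medium and no collisions."""
    return aifs_us(p) + p.cw_min / 2 * SLOT_US


for ac, p in DEFAULT_EDCA.items():
    print(f"{ac}: AIFS {aifs_us(p)}us, mean first access {mean_first_access_us(p)}us")
```

With these inputs BE waits about 110.5us on average and BK about 146.5us, so BK loses the race statistically without any explicit weighting.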


> 3) I am also confused on 802.11ac - can VO aggregate? (can't in 802.11n).

Yes, it should be able to, albeit VI and VO have shorter txops than
BE/BK: 3.008ms and 1.504ms respectively.

UDP_RR doesn't really create a lot of opportunities for aggregation.
If you want to see how different queues behave when loaded you'll need
to modify traffic-gen and add bursts across different ACs in the
bursts_11e test.
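A back-of-the-envelope Python sketch of how many full-size MPDUs fit in one txop. It assumes 1500-byte MPDUs and a 64-subframe cap, and uses 6 and 300 Mbit/s purely as illustrative PHY rates; preamble, delimiters, MAC headers and the block ack exchange are ignored, so real counts are lower:

```python
def mpdus_per_txop(txop_us: int, phy_rate_mbps: int, mpdu_bytes: int = 1500,
                   max_subframes: int = 64) -> int:
    """Crude upper bound on A-MPDU subframes that fit in one TXOP."""
    bits_per_txop = txop_us * phy_rate_mbps  # 1 Mbit/s == 1 bit per microsecond
    return min(bits_per_txop // (mpdu_bytes * 8), max_subframes)


for ac, txop in (("BE/BK", 5484), ("VI", 3008), ("VO", 1504)):
    print(ac, [mpdus_per_txop(txop, rate) for rate in (6, 300)])
# BE/BK [2, 64]
# VI [1, 64]
# VO [0, 37]
```

A 0 here only means a single 1500-byte frame (2ms at 6 Mbit/s) is already longer than VO's limit, so under these assumptions there is little to aggregate on VO at low rates regardless of traffic pattern.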


Michał


Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-20 Thread Michal Kazior
I've re-tested selected cases with wmm_enabled=0 set on the DUT AP.
I'm attaching results.

Naming:
 * "old-" is without mac/ath10k changes (referred to as kvalo-reverts
previously) and fq_codel on qdiscs,
 * "patched-" is all patches applied (both mac and ath),
 * "-be-bursts" is the stock "bursts" flent test,
 * "-all-bursts" is a modified "bursts" flent test that bursts on all 3
tids simultaneously: tid0(BE), tid1(BK), tid5(VI).


Michał

On 16 March 2016 at 19:36, Dave Taht  wrote:
> That is the sanest 802.11e queue behavior I have ever seen!  (at both
> 6 and 300mbit! in the ath10k patched mac test)
>
> It would be good to add a flow to this test that exercises the VI
> queue (CS5 diffserv marking?), and to repeat this test with wmm
> disabled for comparison.
>
>
> Dave Täht
> Let's go make home routers and wifi faster! With better software!
> https://www.gofundme.com/savewifi
>
>
> On Wed, Mar 16, 2016 at 8:37 AM, Dave Taht  wrote:
>> it is helpful to name the test files coherently in the flent tests, in
>> addition to using a directory structure and timestamp. It makes doing
>> comparison plots in data->add-other-open-data-files simpler. "-t
>> patched-mac-300mbps", for example.
>>
>> Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
>> after a packet loss in 250ms. Seeing a loss on UDP_RR and having it stop
>> for a while is "ok".
>>
>>
>> On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior  
>> wrote:
>>> On 16 March 2016 at 11:17, Michal Kazior  wrote:
>>>> Hi,
>>>>
>>>> Most notable changes:
>>> [...]
>>>>  * ath10k proof-of-concept that uses the new tx
>>>>    scheduling (will post results in separate
>>>>    email)
>>>
>>> I'm attaching a bunch of tests I've done using flent. They are all
>>> "burst" tests with burst-ports=1 and burst-length=2. The testing
>>> topology is:
>>>
>>>AP > STA
>>>AP )) (( STA
>>>  [veth]--[br]--[wlan] )) (( [wlan]
>>>
>>> You can notice that in some tests plot data gets cut off. There are 2
>>> problems I've identified:
>>>  - excess drops (not a problem with the patchset; they can be seen when
>>> there's no codel-in-mac or when scheduling isn't used)
>>>  - UDP_RR hangs (apparently the QCA99X0 I have sometimes hangs for a few
>>> hundred ms and doesn't Rx frames, causing UDP_RR to stop mid-way;
>>> confirmed with logs and sniffer; I haven't figured out *why* exactly,
>>> could be some hw/fw quirk)
>>>
>>> Let me know if you have questions or comments regarding my testing/results.
>>>
>>>
>>> Michał


bursts-2016-03-17T093033.443115.patched_all_bursts.flent.gz


bursts-2016-03-17T092946.721003.patched_be_bursts.flent.gz


bursts-2016-03-17T092445.132728.old_be_bursts.flent.gz


bursts-2016-03-17T091952.053950.old_all_bursts.flent.gz


Re: [Codel] [RFCv2 0/3] mac80211: implement fq codel

2016-03-19 Thread Rick Jones

On 03/17/2016 10:00 AM, Dave Taht wrote:

> netperf's udp_rr is not how much traffic conventionally behaves. It
> doesn't do tcp slow start or congestion control in particular...


Nor would one expect it to need to, unless one were using "burst mode" 
to have more than one transaction inflight at one time.


And unless one uses the test-specific -e option to provide a very crude 
retransmission mechanism based on a socket read timeout, neither does 
UDP_RR recover from lost datagrams.
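A minimal Python sketch of the idea behind such a read-timeout retransmission. This is not netperf's implementation; the helper names, the 50ms timeout and the drop-every-5th loss pattern are arbitrary assumptions. One transaction is in flight at a time, and a timeout triggers a resend instead of stalling the loop:

```python
import socket
import threading
import time


def lossy_echo_server(sock: socket.socket, drop_every: int) -> None:
    """Echo each datagram back, but silently drop every `drop_every`-th one."""
    seen = 0
    while True:
        data, addr = sock.recvfrom(2048)
        seen += 1
        if drop_every and seen % drop_every == 0:
            continue  # simulate a lost request or response
        sock.sendto(data, addr)


def udp_rr(server_addr, duration_s: float, timeout_s: float):
    """One transaction in flight at a time; resend when the read times out.

    Returns (completed transactions, retransmissions).
    """
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout_s)  # the crude "retransmission timer"
    done = retries = seq = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            client.sendto(seq.to_bytes(4, "big"), server_addr)
            try:
                while True:
                    data, _ = client.recvfrom(2048)
                    if int.from_bytes(data[:4], "big") == seq:
                        break  # matching reply; ignore stale ones
                done += 1
                seq += 1
            except socket.timeout:
                # Without a timeout, recvfrom() would block forever here and
                # the test would simply stop after the first lost datagram.
                retries += 1
    finally:
        client.close()
    return done, retries


server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
threading.Thread(target=lossy_echo_server, args=(server, 5), daemon=True).start()
result = udp_rr(server.getsockname(), duration_s=0.5, timeout_s=0.05)
print("completed, retransmitted:", result)
```

The sequence number matters: without it, a late reply to a retransmitted request could be counted as the answer to the next one.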


happy benchmarking,

rick jones
http://www.netperf.org/


Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-19 Thread Dave Taht
On Thu, Mar 17, 2016 at 1:55 AM, Michal Kazior  wrote:

> I suspect the BK/BE latency difference has to do with the fact that
> there's bulk traffic going on BE queues (this isn't reflected
> explicitly in the plots). The `bursts` flent test includes short
> bursts of traffic on tid0 (BE) which is shared with ICMP and BE UDP_RR
> (seen as green and blue lines on the plot). Due to (intended) limited
> outflow (6mbps) BE queues build up and don't drain for the duration of
> the entire test creating more opportunities for aggregating BE traffic
> while other queues are near-empty and very short (time wise as well).

I agree with your explanation. Access to the media and queue length
are the two variables at play here.

I just committed a new flent test that should exercise the vo,vi,be,
and bk queues, "bursts_11e". I dropped the conventional ping from it
and just rely on netperf's udp_rr for each queue. It seems to "do the
right thing" on the ath9k.

And while I'm all in favor of getting 802.11e's behaviors more right,
and this seems like a good way to get there...

netperf's udp_rr is not how much traffic conventionally behaves. It
doesn't do tcp slow start or congestion control in particular...

In the case of the VO queue, for example, the (2004) intended behavior
was 1 isochronous packet per 10ms per voice sending station and one
from the ap, not a "ping". And at the time, VI was intended to be
unicast video. TCP was an afterthought. (wifi's original (1993) mac
was actually designed for ipx/spx!)

I long for regular "rrul" and "rrul_be" tests against the new stuff to
blow it up thoroughly as references along the way
(tcp_upload, tcp_download, and several of the rtt_fair tests also
between stations). Will get formal about it here as soon as we end up
on the same kernel trees.

Furthermore, 802.11e is not widely used. At present, not much
internet-bound or internet-sourced traffic falls into anything other
than BE and BK, and in some cases it is weirder: Comcast, for example,
remarks a very large percentage of inbound-to-the-home traffic as CS1
(BK), while stations tend to use CS0. Data comes in on BK, acks go out
on BE.

I/we will try to come up with intermediate tests between the burst
tests and the rrul tests as we go along the way.

> If you consider Wi-Fi is half-duplex and latency in the entire stack

In the context of this test regime...


Saying wifi is "half"-duplex is a misleading way to think about it in
many respects. It is a shared medium, more like early, non-switched
ethernet, with a weird mac that governs which sorts of packets get
access to the medium (a txop) first, across all stations co-operating
within EDCA.

Half or full duplex is something that mostly applies to p2p serial
connections (or p2p wifi), not P2MP. Additionally, characteristics like
exponential backoff would make no sense if wifi were any form of duplex,
full or half.

Certainly much stuff within a txop (block acks for example) can be
considered half duplex in a microcosmic context.

I wish we actually had words that accurately described wifi's actual behavior.
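For reference, the binary exponential backoff mentioned above as a Python sketch: the contention window roughly doubles after each failed attempt until it reaches CWmax, and the backoff counter is drawn uniformly from [0, CW]. Retry limits and per-AC EDCA details are left out; the BE defaults (15/1023) are only used as example inputs:

```python
import random


def contention_window(cw_min: int, cw_max: int, retries: int) -> int:
    """CW after `retries` failed attempts: (CWmin + 1) * 2^retries - 1, capped at CWmax."""
    return min((cw_min + 1) * 2 ** retries - 1, cw_max)


def backoff_slots(cw_min: int, cw_max: int, retries: int, rng=random) -> int:
    """Backoff counter drawn uniformly from [0, CW] (randint is inclusive)."""
    return rng.randint(0, contention_window(cw_min, cw_max, retries))


print([contention_window(15, 1023, r) for r in range(8)])
# [15, 31, 63, 127, 255, 511, 1023, 1023]
```

The doubling only makes sense as a response to collisions among several contenders on one medium, which is the point about wifi not being "duplex" in any useful sense.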


> (for processing ICMP and UDP_RR) is greater than 11e contention window
> timings you can get your BE flow responses with extra delay (since
> other queues might have responses ready quicker).

Yes. Always having a request pending for each of the 802.11e queues is
actually not the best idea. It is better to take advantage of the better
aggregation afforded by 802.11n/ac by only having one or two of the
queues in use against any given station, and to promote or demote
traffic into a more appropriate queue.

A simple example of the damage done by having all 4 queues always
contending can be seen by running the rrul and rrul_be tests against
nearly any given AP.

>
> I've modified traffic-gen and re-run tests with bursts on all tested
> tids/ACs (tid0, tid1, tid5). I'm attaching the results.
>
> With bursts on all tids you can clearly see BK has much higher latency than BE.

The long term goal here, of course, is for BK (or the other queues) to
not have seconds of queuing latency but something more bounded to 2x
media access time...

> (Note: for this test I've changed my AP to a QCA988X with old firmware
> 10.1.467. It doesn't have the weird hiccups I was seeing on QCA99X0, and
> newer QCA988X firmware reports bogus expected throughput, which is most
> likely a result of my sloppy proof-of-concept change in ath10k.)

So I should avoid Ben Greear's firmware for now?

>
>
> Michał
>
> On 16 March 2016 at 20:48, Jasmine Strong  wrote:
>> BK usually has 0 txop, so it doesn't do aggregation.
>>
>> On Wed, Mar 16, 2016 at 11:55 AM, Bob Copeland  wrote:
>>>
>>> On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
>>> > That is the sanest 802.11e queue behavior I have ever seen!  (at both
>>> > 6 and 300mbit! in the ath10k patched mac test)
>>>
>>> Out of curiosity, why does BE have larger latency than BK in that chart?
>>> I'd have expected the opposite.

Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-19 Thread Dave Taht
it is helpful to name the test files coherently in the flent tests, in
addition to using a directory structure and timestamp. It makes doing
comparison plots in data->add-other-open-data-files simpler. "-t
patched-mac-300mbps", for example.

Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
after a packet loss in 250ms. Seeing a loss on UDP_RR and having it stop
for a while is "ok".
Dave Täht
Let's go make home routers and wifi faster! With better software!
https://www.gofundme.com/savewifi


On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior  wrote:
> On 16 March 2016 at 11:17, Michal Kazior  wrote:
>> Hi,
>>
>> Most notable changes:
> [...]
>>  * ath10k proof-of-concept that uses the new tx
>>    scheduling (will post results in separate
>>    email)
>
> I'm attaching a bunch of tests I've done using flent. They are all
> "burst" tests with burst-ports=1 and burst-length=2. The testing
> topology is:
>
>AP > STA
>AP )) (( STA
>  [veth]--[br]--[wlan] )) (( [wlan]
>
> You can notice that in some tests plot data gets cut off. There are 2
> problems I've identified:
>  - excess drops (not a problem with the patchset; they can be seen when
> there's no codel-in-mac or when scheduling isn't used)
>  - UDP_RR hangs (apparently the QCA99X0 I have sometimes hangs for a few
> hundred ms and doesn't Rx frames, causing UDP_RR to stop mid-way;
> confirmed with logs and sniffer; I haven't figured out *why* exactly,
> could be some hw/fw quirk)
>
> Let me know if you have questions or comments regarding my testing/results.
>
>
> Michał


Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-19 Thread Bob Copeland
On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
> That is the sanest 802.11e queue behavior I have ever seen!  (at both
> 6 and 300mbit! in the ath10k patched mac test)

Out of curiosity, why does BE have larger latency than BK in that chart?
I'd have expected the opposite.

-- 
Bob Copeland %% http://bobcopeland.com/


Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-19 Thread Michal Kazior
On 16 March 2016 at 16:37, Dave Taht  wrote:
> it is helpful to name the test files coherently in the flent tests, in
> addition to using a directory structure and timestamp. It makes doing
> comparison plots in data->add-other-open-data-files simpler. "-t
> patched-mac-300mbps", for example.

Sorry. I'm still trying to figure out what variables are worth
considering for comparison purposes.


> Also netperf from svn (maybe 2.7, don't remember) will restart udp_rr
> after a packet loss in 250ms. Seeing a loss on UDP_RR and having it stop
> for a while is "ok".

I'm using 2.6 straight out of the Debian repos, so yeah. I guess I'll try
a more recent netperf if I can't figure out the hiccups.


Michał


>
> On Wed, Mar 16, 2016 at 3:26 AM, Michal Kazior  
> wrote:
>> On 16 March 2016 at 11:17, Michal Kazior  wrote:
>>> Hi,
>>>
>>> Most notable changes:
>> [...]
>>>  * ath10k proof-of-concept that uses the new tx
>>>    scheduling (will post results in separate
>>>    email)
>>
>> I'm attaching a bunch of tests I've done using flent. They are all
>> "burst" tests with burst-ports=1 and burst-length=2. The testing
>> topology is:
>>
>>AP > STA
>>AP )) (( STA
>>  [veth]--[br]--[wlan] )) (( [wlan]
>>
>> You can notice that in some tests plot data gets cut off. There are 2
>> problems I've identified:
>>  - excess drops (not a problem with the patchset; they can be seen when
>> there's no codel-in-mac or when scheduling isn't used)
>>  - UDP_RR hangs (apparently the QCA99X0 I have sometimes hangs for a few
>> hundred ms and doesn't Rx frames, causing UDP_RR to stop mid-way;
>> confirmed with logs and sniffer; I haven't figured out *why* exactly,
>> could be some hw/fw quirk)
>>
>> Let me know if you have questions or comments regarding my testing/results.
>>
>>
>> Michał


Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-19 Thread Michal Kazior
TXOP 0 has a special meaning in the standard. For HT/VHT it means the
TXOP is actually limited to 5484us (mixed-mode) or 1us (greenfield).
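In the EDCA Parameter Set, the TXOP Limit field is expressed in 32us units, so the VO/VI limits of 1.504ms and 3.008ms correspond to field values 47 and 94. A Python sketch of the conversion, assuming HT mixed mode and using the 5484us figure from this message for a field value of 0 (greenfield is not modeled; the constant names are made up):

```python
TXOP_UNIT_US = 32  # the TXOP Limit field counts 32us units

# Effective limit for a TXOP Limit of 0 in HT mixed mode, as quoted in this thread.
HT_MIXED_MODE_TXOP0_US = 5484


def txop_limit_us(field: int) -> int:
    """Convert a TXOP Limit field value to microseconds (HT mixed mode assumed)."""
    if field == 0:
        # 0 is not "no airtime": roughly one frame (or one A-MPDU) per TXOP,
        # so the practical bound is the maximum PPDU duration.
        return HT_MIXED_MODE_TXOP0_US
    return field * TXOP_UNIT_US


print([txop_limit_us(f) for f in (0, 47, 94)])  # [5484, 1504, 3008]
```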

I suspect the BK/BE latency difference has to do with the fact that
there's bulk traffic going on BE queues (this isn't reflected
explicitly in the plots). The `bursts` flent test includes short
bursts of traffic on tid0 (BE), which is shared with ICMP and BE UDP_RR
(seen as green and blue lines on the plot). Due to the (intended) limited
outflow (6mbps), BE queues build up and don't drain for the duration of
the entire test, creating more opportunities for aggregating BE traffic
while other queues are near-empty and very short (time-wise as well).
If you consider that Wi-Fi is half-duplex, and that latency in the entire
stack (for processing ICMP and UDP_RR) is greater than 11e contention
window timings, you can get your BE flow responses with extra delay (since
other queues might have responses ready quicker).

I've modified traffic-gen and re-run tests with bursts on all tested
tids/ACs (tid0, tid1, tid5). I'm attaching the results.

With bursts on all tids you can clearly see BK has much higher latency than BE.

(Note: for this test I've changed my AP to a QCA988X with old firmware
10.1.467. It doesn't have the weird hiccups I was seeing on QCA99X0, and
newer QCA988X firmware reports bogus expected throughput, which is most
likely a result of my sloppy proof-of-concept change in ath10k.)


Michał

On 16 March 2016 at 20:48, Jasmine Strong  wrote:
> BK usually has 0 txop, so it doesn't do aggregation.
>
> On Wed, Mar 16, 2016 at 11:55 AM, Bob Copeland  wrote:
>>
>> On Wed, Mar 16, 2016 at 11:36:31AM -0700, Dave Taht wrote:
>> > That is the sanest 802.11e queue behavior I have ever seen!  (at both
>> > 6 and 300mbit! in the ath10k patched mac test)
>>
>> Out of curiosity, why does BE have larger latency than BK in that chart?
>> I'd have expected the opposite.
>>
>
>


bursts-2016-03-17T083932.549858.qca988x_10_1_467_fqmac_ath10k_with_tx_sched_6mbps_.flent.gz


bursts-2016-03-17T083803.348752.qca988x_10_1_467_fqmac_ath10k_with_tx_sched_6mbps_.flent.gz


Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-19 Thread Bob Copeland
On Thu, Mar 17, 2016 at 09:55:03AM +0100, Michal Kazior wrote:
> If you consider Wi-Fi is half-duplex and latency in the entire stack
> (for processing ICMP and UDP_RR) is greater than 11e contention window
> timings you can get your BE flow responses with extra delay (since
> other queues might have responses ready quicker).

Got it, that makes sense.  Thanks for the explanation!

-- 
Bob Copeland %% http://bobcopeland.com/


Re: [RFCv2 0/3] mac80211: implement fq codel

2016-03-16 Thread Michal Kazior
On 16 March 2016 at 11:17, Michal Kazior  wrote:
> Hi,
>
> Most notable changes:
[...]
>  * ath10k proof-of-concept that uses the new tx
>    scheduling (will post results in separate
>    email)

I'm attaching a bunch of tests I've done using flent. They are all
"burst" tests with burst-ports=1 and burst-length=2. The testing
topology is:

   AP > STA
   AP )) (( STA
 [veth]--[br]--[wlan] )) (( [wlan]

You can notice that in some tests plot data gets cut off. There are 2
problems I've identified:
 - excess drops (not a problem with the patchset; they can be seen when
there's no codel-in-mac or when scheduling isn't used)
 - UDP_RR hangs (apparently the QCA99X0 I have sometimes hangs for a few
hundred ms and doesn't Rx frames, causing UDP_RR to stop mid-way;
confirmed with logs and sniffer; I haven't figured out *why* exactly,
could be some hw/fw quirk)

Let me know if you have questions or comments regarding my testing/results.


Michał


fq.tar.gz


[RFCv2 0/3] mac80211: implement fq codel

2016-03-16 Thread Michal Kazior
Hi,

Most notable changes:
 * fixes (duh); fairness should work better now,
 * EWMA codel target based on estimated service
   time,
 * new tx scheduling helper with in-flight
   duration limiting (same idea Emmanuel
   had for iwlwifi),
 * added a few debugfs hooks.
 * ath10k proof-of-concept that uses the new tx
   scheduling (will post results in separate
   email)
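The cover letter doesn't show the actual code, so purely as an illustration of the two ideas (an EWMA of service time, and in-flight duration limiting), here is a hypothetical Python sketch. The class names, the weight of 8 and the 4000us limit are made up, and this is not how the patch implements them:

```python
from collections import deque


class Ewma:
    """avg += (sample - avg) / weight; the first sample seeds the average."""

    def __init__(self, weight: int) -> None:
        self.weight = weight
        self.avg = None

    def add(self, sample: float) -> float:
        if self.avg is None:
            self.avg = float(sample)
        else:
            self.avg += (sample - self.avg) / self.weight
        return self.avg


class InflightLimiter:
    """Hand frames to the 'hardware' only while estimated in-flight airtime
    stays under limit_us (one frame is always allowed if nothing is in flight)."""

    def __init__(self, limit_us: int) -> None:
        self.limit_us = limit_us
        self.inflight_us = 0

    def schedule(self, queue: deque) -> list:
        released = []
        # Each queued item is just its estimated airtime in microseconds.
        while queue and (self.inflight_us == 0 or
                         self.inflight_us + queue[0] <= self.limit_us):
            airtime = queue.popleft()
            self.inflight_us += airtime
            released.append(airtime)
        return released

    def tx_complete(self, airtime_us: int) -> None:
        self.inflight_us = max(0, self.inflight_us - airtime_us)


service = Ewma(weight=8)
service.add(100)
print(service.add(180))  # 110.0

limiter = InflightLimiter(limit_us=4000)
q = deque([1500, 1500, 1500])
print(limiter.schedule(q))  # [1500, 1500]; a third frame would exceed 4000us
limiter.tx_complete(1500)
print(limiter.schedule(q))  # [1500]
```

Keeping only a bounded amount of airtime queued in the driver/firmware leaves the rest in mac80211, where fq_codel can still reorder and drop.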

The patch grew pretty big and I plan on splitting
it before the next submission. Any suggestions?

The tx scheduling probably needs more work and
testing. I haven't yet evaluated how CPU-intensive
it is, nor how it influences things like peak
throughput (lab conditions et al.).

I've uploaded a branch for convenience:

  https://github.com/kazikcz/linux/tree/fqmac-rfc-v2

This is based on Kalle's ath tree.


Michal Kazior (3):
  mac80211: implement fq_codel for software queuing
  ath10k: report per-station tx/rate rates to mac80211
  ath10k: use ieee80211_tx_schedule()

 drivers/net/wireless/ath/ath10k/core.c  |   2 -
 drivers/net/wireless/ath/ath10k/core.h  |   8 +-
 drivers/net/wireless/ath/ath10k/debug.c |  61 ++-
 drivers/net/wireless/ath/ath10k/mac.c   | 126 +++---
 drivers/net/wireless/ath/ath10k/wmi.h   |   2 +-
 include/net/mac80211.h  |  96 -
 net/mac80211/agg-tx.c   |   8 +-
 net/mac80211/cfg.c  |   2 +-
 net/mac80211/codel.h| 264 +
 net/mac80211/codel_i.h  |  89 +
 net/mac80211/debugfs.c  | 267 +
 net/mac80211/ieee80211_i.h  |  45 ++-
 net/mac80211/iface.c|  25 +-
 net/mac80211/main.c |   9 +-
 net/mac80211/rx.c   |   2 +-
 net/mac80211/sta_info.c |  10 +-
 net/mac80211/sta_info.h |  27 ++
 net/mac80211/status.c   |  64 
 net/mac80211/tx.c   | 658 ++--
 net/mac80211/util.c |  21 +-
 20 files changed, 1629 insertions(+), 157 deletions(-)
 create mode 100644 net/mac80211/codel.h
 create mode 100644 net/mac80211/codel_i.h

-- 
2.1.4