Re: Avery's notes from LPC2016 wireless track (Santa Fe)

2016-11-03 Thread Avery Pennarun
On Thu, Nov 3, 2016 at 7:55 PM, Kathy Giori <kathy.gi...@gmail.com> wrote:
> On Thu, Nov 3, 2016 at 1:43 PM, Avery Pennarun <apenw...@gmail.com> wrote:
>> We talked about many topics at the Linux Plumbers' Conference in Santa
>> Fe on Tuesday.  I took fairly detailed notes, which you can find at
>> the links below.
>
> Great notes Avery. Did you happen to take note of how many
> participants there were? Or was the attendee list posted on the wiki
> beforehand fairly accurate?
> https://wireless.wiki.kernel.org/en/developers/summits/santa-fe-2016

There were quite a few people not on the list.  Google sent an
extra-large contingent this year.  People estimated about 50% of the
wireless meeting were Googlers, which comes to maybe 15 out of around
30-ish.  Google is doing a lot of (sometimes redundant :)) wireless
work lately.

> Was there any talk about having more frequent "live" discussions of
> these topics (via video conf or conf call or scheduled IRC), to help
> overall collaboration without having to wait for the next f2f summit?
> Mailing list interaction doesn't seem to elicit the same energy as
> occurs over live dialog. Quarterly?

It didn't come up.  Honestly, it feels to me like 6 months is the
right cadence.  A lot of work does happen in the background, such as
the fq_codel work (yay!) which was definitely launched by one or two
of these, but proceeded well afterwards.

Hope you're doing well at whatever you're doing now!

Have fun,

Avery


Re: Avery's notes from LPC2016 wireless track (Santa Fe)

2016-11-03 Thread Avery Pennarun
On Thu, Nov 3, 2016 at 5:55 PM, Barry Day  wrote:
> Thanks for that. Can I take this as meaning there won't be any videos?
> I would like to have seen Jes Sorensen's talk on rtl8xxxu

As far as I know, no talks at this LPC were recorded.


Avery's notes from LPC2016 wireless track (Santa Fe)

2016-11-03 Thread Avery Pennarun
Hi all,

We talked about many topics at the Linux Plumbers' Conference in Santa
Fe on Tuesday.  I took fairly detailed notes, which you can find at
the links below.

Fancy html commentable version:
https://docs.google.com/document/d/1Cr2bEf23wLkhiXXtyuBJtvvpb9xvCw7zsU-m1q1g4tA/edit

Less fancy version that (I think) will not ask for a Google account
(let me know if it gives you trouble):
https://docs.google.com/document/u/1/d/e/2PACX-1vQXbVQ-3zQt3Bcr3OWfwzbw_C49tTvf0ed8Hmf7b20E6tXc3a40tWZmPku49iGDE-OhgxNmO_lkkHEn/pub

Thanks, everyone, for the great discussion!

Have fun,

Avery


Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.

2016-04-18 Thread Avery Pennarun
On Sat, Apr 9, 2016 at 9:59 PM, bruce m beach  wrote:
>> If any people more familiar with ARM are reading this - does the value
>> 0x5b35da40 ring a bell?
>
> It could be anything. It depends on the chip implementation. For instance on
> an exynos that is an undefined region. What device are we talking about?

This is a mindspeed c2k CPU, in case that helps at all.  But I'm
guessing it really is just some pointer garbage.  The only way to
trigger the crash is to do (something) with an attached station at the
same moment as reading the agg_status file.  Some station types
trigger it more than others, but I'm not sure which.
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.

2016-04-08 Thread Avery Pennarun
On Fri, Apr 8, 2016 at 4:31 AM, Avery Pennarun <apenw...@gmail.com> wrote:
> On Fri, Apr 8, 2016 at 3:15 AM, Johannes Berg <johan...@sipsolutions.net> wrote:
>> On Fri, 2016-04-08 at 09:01 +0200, Johannes Berg wrote:
>>> On Fri, 2016-04-08 at 08:56 +0200, Johannes Berg wrote:
>>> > On Thu, 2016-04-07 at 21:32 -0400, Avery Pennarun wrote:
>>> > > Yes.  Here it is:
>>> > > http://apenwarr.ca/tmp/mac80211-agg-status-crash.ko
>>> > >
>>> > Unfortunately there are no debug symbols in this file, so it doesn't
>>> > help me much. I can't even seem to get objdump to disassemble it
>>> > correctly: looks like the file is in thumb, going from things
>>> > like R_ARM_THM_CALL relocations, but even -Mforce-thumb doesn't seem
>>> > to DRT; sta_agg_status_read+0xeb isn't even a valid instruction
>>> > offset in regular ARM mode.
>>> >
>>> It *seems* that it most likely crashes on the first access to tid_tx,
>>> which is consistent with the story of disabling TX aggregation
>>> timeouts reducing the chances.
>>>
>>> So I guess we have to look for some TX aggregation teardown RCU
>>> pointer problem?
>>
>> Can't find anything. The only other thing I saw now is that the TID
>> appears to be 7 (in r7), might be worth looking for whether that's a
>> common thing or not?
>
> Just to be clear, this crash is only from *reading* the agg_status
> files.  I don't know if the crashiness reduces when disabling the
> aggregation timeouts, since that's a separate bug (in which the queue
> gets stuck and the 'pending' column of this file just keeps
> increasing).
>
> I'll try twiddling some options again tomorrow and see if I can get
> one with proper debug symbols.  For what it's worth, this platform is
> "ARMv7 Processor rev 1 (v7l)" and the gcc build is made for Cortex A9.
> You can find an x86 build of our toolchain in the git repo at
> https://gfiber.googlesource.com/toolchains/mindspeed.

Updated .ko file that definitely has debug symbols this time:
http://apenwarr.ca/tmp/mac80211-agg-status-crash-debugsyms.ko
And a gdb compiled for x86-64 that can definitely read the above .ko file:
http://apenwarr.ca/tmp/arm-gdb

Thanks!


Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.

2016-04-08 Thread Avery Pennarun
On Fri, Apr 8, 2016 at 3:15 AM, Johannes Berg <johan...@sipsolutions.net> wrote:
> On Fri, 2016-04-08 at 09:01 +0200, Johannes Berg wrote:
>> On Fri, 2016-04-08 at 08:56 +0200, Johannes Berg wrote:
>> > On Thu, 2016-04-07 at 21:32 -0400, Avery Pennarun wrote:
>> > > Yes.  Here it is:
>> > > http://apenwarr.ca/tmp/mac80211-agg-status-crash.ko
>> > >
>> > Unfortunately there are no debug symbols in this file, so it doesn't
>> > help me much. I can't even seem to get objdump to disassemble it
>> > correctly: looks like the file is in thumb, going from things
>> > like R_ARM_THM_CALL relocations, but even -Mforce-thumb doesn't seem
>> > to DRT; sta_agg_status_read+0xeb isn't even a valid instruction
>> > offset in regular ARM mode.
>> >
>> It *seems* that it most likely crashes on the first access to tid_tx,
>> which is consistent with the story of disabling TX aggregation
>> timeouts reducing the chances.
>>
>> So I guess we have to look for some TX aggregation teardown RCU
>> pointer problem?
>
> Can't find anything. The only other thing I saw now is that the TID
> appears to be 7 (in r7), might be worth looking for whether that's a
> common thing or not?

Just to be clear, this crash is only from *reading* the agg_status
files.  I don't know if the crashiness reduces when disabling the
aggregation timeouts, since that's a separate bug (in which the queue
gets stuck and the 'pending' column of this file just keeps
increasing).

I'll try twiddling some options again tomorrow and see if I can get
one with proper debug symbols.  For what it's worth, this platform is
"ARMv7 Processor rev 1 (v7l)" and the gcc build is made for Cortex A9.
You can find an x86 build of our toolchain in the git repo at
https://gfiber.googlesource.com/toolchains/mindspeed.

Thanks for looking into it :)

Avery


Re: [PATCH 1/2] mac80211: implement fair queuing per txq

2016-04-07 Thread Avery Pennarun
On Fri, Mar 25, 2016 at 5:27 AM, Michal Kazior  wrote:
> mac80211's software queues were designed to work
> very closely with device tx queues. They are
> required to make use of 802.11 packet aggregation
> easily and efficiently.
>
> However the logic imposed a per-AC queue limit.
> With the limit too small mac80211 wasn't able
> to guarantee fairness across TIDs nor stations
> because single burst to a slow station could
> monopolize queues and reach per-AC limit
> preventing traffic from other stations being
> queued into mac80211's software queues. Having the
> limit too large would make smart qdiscs, e.g.
> fq_codel, a lot less efficient as they are
> designed on the premise that they are very close
> to the actual device tx queues.

As usual, I'm way behind on everything, but I have been testing this
patch series in the background (no clear results to report yet) and
wanted to comment at a very high level.  I think you are actually
doing several stages of improvements all at once here:

[0. Baseline: one big queue going into the driver]
1. Switch ath10k to mac80211 per-station queues.
2. Change per-station queues to use NO_QUEUE qdisc and *not* ever stop
the kernel netdev queue (since there no longer is one).
3. Actively manage per-station queues with fq_codel.
4. DQL-like control system for managing hardware queues.

Just to clarify what I mean by #2, if I understand correctly, before
this patch, the driver+mac80211 keeps track of the total number of
packets in all the mac80211 queues.  When the total exceeds a fixed
amount (or when one of the per-station queues gets full?) mac80211
tells the kernel to stop sending in new packets, so they sit around in
the qdisc instead.  The problem with this behaviour is we probably
have a lot of packets for one station, and not many packets for other
stations, even if the netdev qdisc has plenty of packets still waiting
for those other stations.  When you then go to drain the mac80211
queues in a round-robin fashion, only the fullest queue (corresponding
to the busiest stream to the fastest station) can get optimal results.
The driver can then either send out from the fullest queue (unfair but
fast) or round robin using the non-full queues (fair but non-optimal
speed).

Upon implementing #2, we would essentially never tell the kernel to
stop sending packets; instead, it just always forwards them to
mac80211, which needs to learn how to drop them instead of providing
backpressure.  This moves the entire qdisc functionality into
mac80211, hence the use of NO_QUEUE.

It's then clear that if you just did the obvious thing (tail drop),
you'd end up with high latency, so you added fq_codel to the mix.

However, as people on this thread have noticed, fq_codel is
complicated.  I'd like to be able to evaluate the performance impact
of each of the above steps separately.  In particular, my theory is
that if we implement #2 with just a simple FIFO queue per station,
then if we have two stations competing (one slow, one fast), and
dequeue aggregates using round robin, then we should get all of:

a) Full airtime utilization and max-length aggregates
and
b) High latency only on busy stations, but near-zero latency on idle
stations (because of round-robin servicing of the per-station queues).

Using just a tail drop implementation, it should be very easy for me
to test that (a) and (b) are true.  It should also be strictly equal
(one station) or better (multiple stations) than using mac80211 soft
queues with the pfifo_fast qdisc.  If that isn't what happens, then
we'll know something went wrong with that part of the code, and we can
debug that before moving on to a wifi-aware fq_codel.
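
A minimal sketch of the "per-station FIFO with tail drop, dequeued round
robin" baseline described above.  Everything here is hypothetical and made
up for illustration (the names sta_fifo, txq_set, sta_enqueue and
rr_dequeue, the two-station setup, the limit of 4); none of it is taken
from mac80211 or from the patch, and it hands out single packets rather
than aggregates.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_STA    2   /* hypothetical: two competing stations */
#define FIFO_LIMIT 4   /* hypothetical per-station packet limit */

/* One FIFO per station; packets are just integer ids here. */
struct sta_fifo {
    int pkts[FIFO_LIMIT];
    size_t head;       /* index of the oldest queued packet */
    size_t len;        /* number of queued packets */
};

struct txq_set {
    struct sta_fifo sta[NUM_STA];
    size_t next;       /* station to try first on the next dequeue */
};

/* Tail drop: returns 1 if queued, 0 if this station's FIFO was full. */
static int sta_enqueue(struct txq_set *q, size_t sta, int pkt)
{
    struct sta_fifo *f;

    assert(q != NULL && sta < NUM_STA);
    f = &q->sta[sta];
    if (f->len == FIFO_LIMIT)
        return 0;      /* drop instead of pushing back on the stack */
    f->pkts[(f->head + f->len) % FIFO_LIMIT] = pkt;
    f->len++;
    return 1;
}

/* Round robin: serve the next non-empty station.  Returns 1 and fills
 * *pkt and *sta on success, 0 if every FIFO is empty. */
static int rr_dequeue(struct txq_set *q, int *pkt, size_t *sta)
{
    size_t i;

    assert(q != NULL && pkt != NULL && sta != NULL);
    for (i = 0; i < NUM_STA; i++) {
        size_t s = (q->next + i) % NUM_STA;
        struct sta_fifo *f = &q->sta[s];

        if (f->len == 0)
            continue;
        *pkt = f->pkts[f->head];
        *sta = s;
        f->head = (f->head + 1) % FIFO_LIMIT;
        f->len--;
        q->next = (s + 1) % NUM_STA;
        return 1;
    }
    return 0;
}
```

In this toy model, with a full FIFO for a busy station 0 and a single
packet for an otherwise idle station 1, successive rr_dequeue() calls
serve station 0, then station 1, then station 0 again: the idle station's
packet waits behind at most one packet from the busy one, which is
property (b).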

So my request: do you mind splitting your patch into two patches, one
that implements just NO_QUEUE and per-station fifo tail drop, with a
second patch that converts the tail drop to fq_codel?

Another advantage of the split is that we could then test NO_QUEUE +
tail_drop + DQL.  Again, that should be strictly better than the
NO_QUEUE + tail_drop + fixed_driver_queue.  Then it might be easier to
debug the (much more fiddly) fq_codel on top.

Thoughts?

Thanks,

Avery


Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.

2016-04-05 Thread Avery Pennarun
On Tue, Feb 23, 2016 at 3:05 PM, Johannes Berg
<johan...@sipsolutions.net> wrote:
> On Tue, 2016-02-23 at 13:43 -0500, Avery Pennarun wrote:
>> We're putting my version of the patch into our devices in order to be
>> able to try different values and see how it changes the percentage of
>> devices with nonzero 'pending' field in agg_status.  I'm hoping using
>> zero here will result in total elimination of the pending problem,
>> but we'll see.
>
> :)
> I for one would be interested in the result. And, if you find mac80211
> is at fault, knowing what happens there.

Here's the promised update!  The news is not as good as I had hoped.

Across the GFiber fleet, the share of APs per day observing the problem
(i.e. the pending field > 0 for more than a minute for any station),
with the original aggregation timeout, is about 41% (yikes).  With the
aggregation timeout set to zero, the share of APs observing the
problem in a day drops to about 10%.

Obviously this is a huge improvement, but the problem isn't completely
eliminated.  In retrospect that's not totally surprising, as there are
reasons other than an AP-side aggregation timeout that an aggregation
would need to be negotiated, and a race condition in aggregation queue
setup could happen at any of those times.  I was just hoping that
those other cases would be much less frequent than they apparently
are.

This test was with backports-20150525 on ath9k.  (We have newer
versions in the queue, but they haven't rolled out to our customers
yet.  Anyway, earlier in this thread, I was able to trigger the race
condition on much newer backports.  Unfortunately the current fix
makes my reproducible test case go away, but I don't know any reason
to assume the race condition is fixed.)

While we're here, unfortunately it turns out that just observing the
agg_status file can cause crashes (though not very often... except for
a few unlucky customers), probably due to a different race condition.
Any suggestions about this one?  Stack trace attached below.  (I think
the stack trace suggests a mac80211 problem?)

Thanks!

Avery


03/30,133400.674 Unable to handle kernel paging request at virtual
address 5b35da9e
03/30,133400.675 pgd = ac238000
03/30,133400.675 [5b35da9e] *pgd=
03/30,133400.675 Internal error: Oops: 5 [#1] PREEMPT SMP
03/30,133400.680 Modules linked in: ccm nf_conntrack_netlink
auto_bridge(O) fci(O) nfnetlink pktgen ath9k_htc(O) mwifiex_usb(O)
mwifiex(O) ath10k_pci(O) ath10k_core(O) arc4 ath9k(O) mac80211(O)
ath9k_common(O) ath9k_hw(O) ath(O) cfg80211(O) compat(O) bmoca(O)
xt_connmark ip6table_mangle xt_CLASSIFY iptable_mangle xt_helper
nf_nat_sip nf_conntrack_sip ip6t_REJECT ip6t_LOG nf_conntrack_ipv6
nf_defrag_ipv6 ip6table_filter ip6_tables nf_nat_rtsp
nf_conntrack_rtsp nf_nat_h323 nf_conntrack_h323 nf_nat_irc
nf_conntrack_irc nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre
nf_nat_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_ftp
nf_conntrack_ftp ipt_MASQUERADE ipt_REJECT ipt_LOG xt_limit xt_pkttype
xt_conntrack xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4
nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables pfe(O)
03/30,133400.753 CPU: 0    Tainted: G   O  (3.2.26 #1)
03/30,133400.758 PC is at sta_agg_status_read+0xeb/0x170 [mac80211]
03/30,133400.764 LR is at sta_agg_status_read+0xd8/0x170 [mac80211]
03/30,133400.770 pc : [<838b4d0c>]    lr : [<838b4cf9>]    psr: 20010033
03/30,133400.770 sp : ac0c3c58  ip : 000f  fp : ac0c3c71
03/30,133400.782 r10: ac341800  r9 : af7f3b53  r8 : 0001
03/30,133400.787 r7 : 0007  r6 : 5b35da40  r5 : ac0c3f38  r4 : ac0c3d90
03/30,133400.794 r3 : ac0c3d8d  r2 : 838c6958  r1 : 01a8  r0 : ac0c3d90
03/30,133400.800 Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb
 Segment user
03/30,133400.807 Control: 50c53c7d  Table: 2c23804a  DAC: 0015
03/30,133400.813 Process psstat (pid: 25220, stack limit = 0xac0c22f0)
03/30,133400.819 Stack: (0xac0c3c58 to 0xac0c4000)
03/30,133400.824 3c40:
  0209 a6199050
03/30,133400.832 3c60: ac0c3d58 7e957143 0001 ac0c3f88 78656e00
69642074 676f6c61 6b6f745f
03/30,133400.840 3c80: 203a6e65 0a317830 09444954 09585209 4e4b5444
4e535309 58540909 4b544409
03/30,133400.848 3ca0: 6570094e 6e69646e 30300a67 09300909 30307830
30783009 09093030 78300930
03/30,133400.857 3cc0: 30093030 300a3030 30090931 30783009 78300930
09303030 30093009 09303078
03/30,133400.865 3ce0: 0a303030 09093230 78300930 30093030 30303078
09300909 30307830 30303009
03/30,133400.873 3d00: 0933300a 30093009 09303078 30307830 30090930
30783009 30300930 34300a30
03/30,133400.881 3d20: 09300909 30307830 30783009 09093030 78300930
30093030 300a3030 30090935
03/30,133400.889 3d40: 30783009 78300930 09303030 30093009 09303078
0a303030 09093630 78300931
03/30,133400.898 3d60: 30096632 32323678 31090966 38783009 32310933
30343230 3538 0937300a
03/30,133400.906 3d80: 30093109 09303578 31307830 31090961 3
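
A side note on reading the stack dump: starting at offset 3c70, the words
look like ASCII text stored little-endian.  Below is a small sketch for
unpacking them, assuming a little-endian ARM (decode_le_words is a
hypothetical helper name, not something from the kernel).  The first six
words come out as "next dialog_token: 0x1" with a leading NUL and a
trailing newline, which looks like the beginning of the agg_status output
sitting in a buffer on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack 32-bit words as little-endian bytes into out[], replacing
 * anything outside printable ASCII with '.'.
 * out must have room for 4*n + 1 bytes. */
static void decode_le_words(const uint32_t *words, size_t n, char *out)
{
    size_t i, b;

    assert(words != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        for (b = 0; b < 4; b++) {
            /* byte b of word i, least significant byte first */
            unsigned char c = (words[i] >> (8 * b)) & 0xff;

            out[4 * i + b] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
        }
    }
    out[4 * n] = '\0';
}
```

Feeding it the words 78656e00 69642074 676f6c61 6b6f745f 203a6e65
0a317830 from the dump gives ".next dialog_token: 0x1." (dots standing in
for the NUL and the newline).  Decoding further by hand, the text appears
to reach the row for TID 07 before the dump is cut off.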

Re: [RFC/RFT] mac80211: implement fq_codel for software queuing

2016-03-07 Thread Avery Pennarun
On Mon, Mar 7, 2016 at 10:09 AM, Felix Fietkau <n...@openwrt.org> wrote:
> On 2016-03-07 15:05, Avery Pennarun wrote:
>> On Fri, Mar 4, 2016 at 1:32 AM, Michal Kazior <michal.kaz...@tieto.com> wrote:
>>> On 4 March 2016 at 03:48, Tim Shepard <s...@alum.mit.edu> wrote:
>>> [...]
>>>> (I am interested in knowing what other mac80211 drivers have been
>>>>  modified to use the mac80211 intermediate software queues.   I know
>>>>  Michal mentioned he has patches for ath10k that are not yet released,
>>>>  and I know Felix is finishing up the mt76 driver which uses them.)
>>>
>>> Patches for ath10k are under review since quite some time now (but are
>>> not merged yet). The latest re-spin is:
>>>
>>>   http://lists.infradead.org/pipermail/ath10k/2016-March/006923.html
>>
>> Hi all, on Friday I had a chance to experiment with some of these
>> patches, specifically Tim's ath9k patch (to use intermediate queues),
>> plus Michal's patch to use fq_codel with the intermediate queues.  I
>> didn't attempt any fine tuning; I just slapped them together to see
>> what happens.  (I tried applying Michal's ath10k patches too, but got
>> stuck since they seem to be applied against the upstream v4.4 kernel
>> and didn't merge cleanly with the latest mac80211 branch.  Maybe I was
>> doing something wrong.)
>>
>> Test setup:
>>    AP (ath9k) -> 2x2 strong signal -> STA1 (mwifiex)
>>               -> attenuator (-40 dB) -> 1x1 weak signal -> STA2 (mwifiex)
>>
>> STA2 generally gets modulation levels around MCS0-2 and STA1 usually
>> gets something like MCS12-15.
>>
>> With or without this patch, results with TCP iperf were fishy - I
>> think packet loss patterns were particularly bad and caused 2-second
>> TCP retry timeouts occasionally - so I removed TCP from the test and
>> switched to UDP iperf instead.
>>
>> I ran isoping 
>> (https://gfiber.googlesource.com/vendor/google/platform/+/master/cmds/isoping.c)
>> from the AP to both stations to measure two-way latency during all
>> tests.  (I used -r2 for two packets/sec in each direction in order not
>> to affect the test results too much.)
>>
>> Overall results:
>>
>> - Running one iperf at a time, I saw ~45 Mbps to STA1 and ~7 Mbps to STA2.
>>
>> - Running both iperfs at once, without the patches, latencies got
>> extremely high (~600ms sometimes) and results were closer to
>> byte-fairness than airtime-fairness (ie. ~7 Mbps each).
>>
>> - Running both iperfs at once, with the patches, latencies were still
>> high (usually high 2-digit, sometimes low 3-digit latencies) but we
>> got closer to airtime-fairness than byte-fairness (~17 Mbps and ~2
>> Mbps).
>>
>> - With only one iperf running, without the patches, latencies were
>> high to both stations.  With the patches, latency was
>> mid-double-digits to the non-iperf station (pretty good!) while being
>> low-mid triple-digits to the busy iperf station.  This suggests that
>> we are getting per-station queuing (yay!) but does make me question
>> whether the fq_ in fq_codel was working.
>
> Please change the 'if (flow->txqi)' check in ieee80211_txq_enqueue to:
> if (flow->txqi && flow->txqi != txqi)
> This should hopefully fix the fq_ part ;)

Oops, I saw your message about that earlier and totally forgot to
apply the change.  But maybe that was for the best, because it doesn't
seem to uniformly make things better.

*Without* your change, I observe that my iperf3 session to STA1 (high
speed) seems to complain about a lot of out-of-order packets.  *With*
your change, the out-of-order complaints seem to go away, which is
nice.  The throughput measurements look about the same both ways.
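
One possible reading of why the one-line change affects reordering, as a
toy model.  This is an assumption built only from the quoted check, not
from the patch itself: the structures and the pick_flow_old/pick_flow_new
helpers are hypothetical, and the real code may track flow ownership
differently.  In this model, the original check treats any hashed flow
that already has an owner as a collision, even when the owner is the same
txq, so later packets of one stream get diverted to a per-txq fallback
flow and can be dequeued in a different order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the patch's structures. */
struct txq_info;

struct flow {
    struct txq_info *txqi;   /* txq currently using this flow, or NULL */
};

struct txq_info {
    struct flow fallback;    /* per-txq flow used on hash collisions */
};

/* Original check: any flow that already has an owner is a collision. */
static struct flow *pick_flow_old(struct flow *hashed, struct txq_info *txqi)
{
    assert(hashed != NULL && txqi != NULL);
    if (hashed->txqi)
        return &txqi->fallback;
    hashed->txqi = txqi;
    return hashed;
}

/* Suggested check: only a flow owned by a *different* txq collides. */
static struct flow *pick_flow_new(struct flow *hashed, struct txq_info *txqi)
{
    assert(hashed != NULL && txqi != NULL);
    if (hashed->txqi && hashed->txqi != txqi)
        return &txqi->fallback;
    hashed->txqi = txqi;
    return hashed;
}
```

With the old check, the second packet of a stream lands in the fallback
flow while the first is still in the hashed flow; with the new check both
land in the same flow, and only a different txq hashing to that flow is
pushed to its own fallback.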

However, *without* your change, isoping latency to STA2 (low speed)
seems to be pretty stable in the ~100ms range (although it fluctuates
a bit).  *With* your change, STA2 latency fluctuates wildly as low as
1.x ms (yay!) but as high as 800ms (boo).  STA1 latency is fairly low
in both cases.

I have to admit, I haven't read any of this code in enough detail to
have a guess as to why this might be. But I did switch back and forth
between the two versions a few times to confirm that it seems to be
repeatable.

Just to compare, I went back to a version that contains only Tim's
patch (intermediate queues) but not fq_codel.  That one seems to have
much less variability in the isoping times (~50-100ms under load).
The best case isn't as good, but the worst case is much less bad.
This suggests to me that maybe codel's per-station drop rate is
oscillating (perhaps it needs to ramp less quickly?).  I wonder if the
competing codels between stations also confuse each other: as one
ramps down, maybe the other one would be encouraged to ramp up?


Re: [RFC/RFT] mac80211: implement fq_codel for software queuing

2016-03-07 Thread Avery Pennarun
On Fri, Mar 4, 2016 at 1:32 AM, Michal Kazior  wrote:
> On 4 March 2016 at 03:48, Tim Shepard  wrote:
> [...]
>> (I am interested in knowing what other mac80211 drivers have been
>>  modified to use the mac80211 intermediate software queues.   I know
>>  Michal mentioned he has patches for ath10k that are not yet released,
>>  and I know Felix is finishing up the mt76 driver which uses them.)
>
> Patches for ath10k are under review since quite some time now (but are
> not merged yet). The latest re-spin is:
>
>   http://lists.infradead.org/pipermail/ath10k/2016-March/006923.html

Hi all, on Friday I had a chance to experiment with some of these
patches, specifically Tim's ath9k patch (to use intermediate queues),
plus Michal's patch to use fq_codel with the intermediate queues.  I
didn't attempt any fine tuning; I just slapped them together to see
what happens.  (I tried applying Michal's ath10k patches too, but got
stuck since they seem to be applied against the upstream v4.4 kernel
and didn't merge cleanly with the latest mac80211 branch.  Maybe I was
doing something wrong.)

Test setup:
   AP (ath9k) -> 2x2 strong signal -> STA1 (mwifiex)
              -> attenuator (-40 dB) -> 1x1 weak signal -> STA2 (mwifiex)

STA2 generally gets modulation levels around MCS0-2 and STA1 usually
gets something like MCS12-15.

With or without this patch, results with TCP iperf were fishy - I
think packet loss patterns were particularly bad and caused 2-second
TCP retry timeouts occasionally - so I removed TCP from the test and
switched to UDP iperf instead.

I ran isoping 
(https://gfiber.googlesource.com/vendor/google/platform/+/master/cmds/isoping.c)
from the AP to both stations to measure two-way latency during all
tests.  (I used -r2 for two packets/sec in each direction in order not
to affect the test results too much.)

Overall results:

- Running one iperf at a time, I saw ~45 Mbps to STA1 and ~7 Mbps to STA2.

- Running both iperfs at once, without the patches, latencies got
extremely high (~600ms sometimes) and results were closer to
byte-fairness than airtime-fairness (ie. ~7 Mbps each).

- Running both iperfs at once, with the patches, latencies were still
high (usually high 2-digit, sometimes low 3-digit latencies) but we
got closer to airtime-fairness than byte-fairness (~17 Mbps and ~2
Mbps).

- With only one iperf running, without the patches, latencies were
high to both stations.  With the patches, latency was
mid-double-digits to the non-iperf station (pretty good!) while being
low-mid triple-digits to the busy iperf station.  This suggests that
we are getting per-station queuing (yay!) but does make me question
whether the fq_ in fq_codel was working.

- More generally, the latencies were all still higher than expected.
I didn't investigate why this might be, but the obvious guess (which
Tim has agreed with) is that we need something BQL-like in addition to
the fq_codel layer.  The BQL-like thing is what Emmanuel's earlier
latency patch did with iwlwifi, with apparently good results.  If
someone wants to try making a similar patch for ath9k, I'd be happy to
help test it out.
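
To put numbers on "byte-fairness" versus "airtime-fairness" in the results
above, here is a rough back-of-the-envelope sketch.  It assumes an
idealized shared medium where each station's solo throughput is its
effective rate and there is no contention overhead; the function names
are hypothetical.

```c
#include <assert.h>

/* Byte fairness: both stations get the same throughput x, so the air is
 * split in inverse proportion to rate:
 *   x/r1 + x/r2 = 1  =>  x = r1*r2 / (r1 + r2). */
static double byte_fair_mbps(double r1, double r2)
{
    assert(r1 > 0.0 && r2 > 0.0);
    return (r1 * r2) / (r1 + r2);
}

/* Airtime fairness: each of n stations gets 1/n of the air, so its
 * throughput is its solo rate divided by n. */
static double airtime_fair_mbps(double solo_rate, int n_stations)
{
    assert(solo_rate > 0.0 && n_stations > 0);
    return solo_rate / n_stations;
}
```

With the solo rates measured above (about 45 and 7 Mbps), byte fairness
works out to roughly 6 Mbps each, and airtime fairness to about 22.5 and
3.5 Mbps.  The ~7/~7 result without the patches sits near the first, and
the ~17/~2 result with the patches leans toward the second.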

Although things aren't yet nearly as good as I'd like to see them,
I'll note that Tim's and Michal's patches don't seem to make things
*worse*, at least in my setup, and do improve results in my test.  So
if they pass code review, it may make sense to apply them as one small
step forward to reducing wifi latency under load.

Have fun,

Avery


Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.

2016-02-23 Thread Avery Pennarun
On Tue, Feb 23, 2016 at 5:14 AM, Johannes Berg
<johan...@sipsolutions.net> wrote:
> On Tue, 2016-02-16 at 16:28 -0500, Avery Pennarun wrote:
>> Since around the beginning of time, ath9k aggregates have timed out after
>> 5000 TU (around 5000ms) of inactivity, but nobody seems to be quite sure
>> why, and this magic number seems to have migrated around from one place to
>> another.  An openbsd mailing list recently had a patch to disable the
>> timeout completely, which they say matches some commercial routers:
>> https://www.mail-archive.com/tech@openbsd.org/msg29456.html
>>
>> Even in Linux, several non-ath9k drivers default to no timeout already.  I
>> think changing it directly to zero would be safe, but to allow a more
>> structured investigation, let's make it configurable for now.
>>
> Since we just made it zero, perhaps we don't need this?
>
> Although perhaps we still want it to be able to debug it?

We're putting my version of the patch into our devices in order to be
able to try different values and see how it changes the percentage of
devices with nonzero 'pending' field in agg_status.  I'm hoping using
zero here will result in total elimination of the pending problem, but
we'll see.

It probably makes sense not to apply this upstream if the default
value is zero now anyway.
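
For anyone trying different values: the timeout is expressed in 802.11
time units (TU), where one TU is 1024 microseconds, and a value of zero
means the session never times out.  A small sketch of the arithmetic
follows; agg_session_expired is a hypothetical helper written for
illustration, not the mac80211 implementation.

```c
#include <stdint.h>

#define TU_USEC 1024u   /* one 802.11 time unit is 1024 microseconds */

/* Convert a duration in TU to microseconds. */
static uint64_t tu_to_usec(uint32_t tu)
{
    return (uint64_t)tu * TU_USEC;
}

/* Hypothetical helper: has a session with this timeout been idle for
 * too long?  A timeout of 0 means "never time out". */
static int agg_session_expired(uint32_t timeout_tu, uint64_t idle_usec)
{
    if (timeout_tu == 0)
        return 0;
    return idle_usec >= tu_to_usec(timeout_tu);
}
```

So the old default of 5000 TU is 5,120,000 microseconds, or 5.12 seconds
of inactivity, slightly more than the "around 5000ms" shorthand.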

> Anyway - you shouldn't create a debugfs file and play with the extern
> stuff etc., let minstrel create the debugfs file in minstrel_ht_alloc()

Good point.  I had a feeling I was doing that in the wrong place :)

If people think this is important, I can respin the patch, otherwise
feel free to discard.

Have fun,

Avery


Re: [PATCH] mac80211: minstrel_ht: set default tx aggregation timeout to 0

2016-02-18 Thread Avery Pennarun
Acked-by: Avery Pennarun <apenw...@gmail.com>

This fixes serious problems on our platform, especially with iPhone 4
generation devices.


On Thu, Feb 18, 2016 at 1:49 PM, Felix Fietkau <n...@openwrt.org> wrote:
> The value 5000 was put here with the addition of the timeout field to
> ieee80211_start_tx_ba_session. It was originally added in mac80211 to
> save resources for drivers like iwlwifi, which only supports a limited
> number of concurrent aggregation sessions.
>
> Since iwlwifi does not use minstrel_ht and other drivers don't need
> this, 0 is a better default - especially since there have been
> recent reports of aggregation setup related issues reproduced with
> ath9k. This should improve stability without causing any adverse
> effects.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Felix Fietkau <n...@openwrt.org>
> ---
>  net/mac80211/rc80211_minstrel_ht.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/mac80211/rc80211_minstrel_ht.c b/net/mac80211/rc80211_minstrel_ht.c
> index 3928dbd..a7d9227 100644
> --- a/net/mac80211/rc80211_minstrel_ht.c
> +++ b/net/mac80211/rc80211_minstrel_ht.c
> @@ -691,7 +691,7 @@ minstrel_aggr_check(struct ieee80211_sta *pubsta, struct sk_buff *skb)
>         if (likely(sta->ampdu_mlme.tid_tx[tid]))
>                 return;
>
> -       ieee80211_start_tx_ba_session(pubsta, tid, 5000);
> +       ieee80211_start_tx_ba_session(pubsta, tid, 0);
>  }
>
>  static void
> --
> 2.2.2
>


Re: ath9k(?): AP stops sending traffic to iPhone 4S until another 802.11n-capable STA joins

2016-02-16 Thread Avery Pennarun
On Wed, Feb 17, 2016 at 1:23 AM, Krishna Chaitanya
 wrote:
> From a quick glance of symptoms, i think the below patch is worth a
> try, even though
> i don't see you are doing any background scans for which this applies.
>
> https://patchwork.kernel.org/patch/8015321/

Thanks, Krishna.  We are in fact doing background scans occasionally,
however, none was in progress around the time of the glitch, and the
problem was still reproducible with background scans disabled.  We
also aren't combining AP and STA on the same radio (in this particular
use case).


Re: ath9k(?): AP stops sending traffic to iPhone 4S until another 802.11n-capable STA joins

2016-02-16 Thread Avery Pennarun
On Tue, Feb 16, 2016 at 5:05 PM, Johannes Berg
<johan...@sipsolutions.net> wrote:
> On Tue, 2016-02-16 at 16:28 -0500, Avery Pennarun wrote:
>> Changing default_agg_timeout to zero (as it is on most non-ath9k
>> drivers) makes the problem pretty much go away.  However, I think
>> it's because I'm just dodging the code path that triggers a race
>> condition.
>
> That does seem likely. Perhaps you could reproduce it while running
> mac80211 tracing? There should be a fair amount of information about
> aggregation and queue stops in there, though as you note queue stops
> aren't really happening, only aggregation related things. Perhaps the
> tracepoints for that aren't quite sufficient.

So far that hasn't seemed to help, although maybe you can read traces
better than I can.  The big problem is that the actual queue doesn't
seem to have stopped; it might be an ath9k bug.

>> Notes:
>>
>> - I'm using exactly the same ath9k driver (currently 20150525, but
>> we've tried newer ones with no difference) on two totally different
>> platforms: a dual-core mindspeed c2k host CPU (ARMv7) with separate
>> ath9k, and a single-core QCA9531 (MIPS) with on-chip ath9k.
>>
>> - I've been unable to trigger the problem on the QCA9531, but I have
>> on MIPS.
>
> That's ... not what I would have expected, especially since the MIPS is
> single core. That makes the races stranger than expected.

Oops, typo.  The QCA9531 *is* MIPS.  The one where it triggers is the
dual-core ARM.

>> The aggregation code is... a little hairy.  Does anyone have any
>> guesses where I might look for the race condition?  Or better still,
>> a patch I can try?
>
> I'm not aware of any race conditions in the code right now :)

Aw.  That would have made it a lot easier!


[PATCH] mac80211: debugfs var for the default aggregation timeout.

2016-02-16 Thread Avery Pennarun
Since around the beginning of time, ath9k aggregates have timed out after
5000 TU (around 5000ms) of inactivity, but nobody seems to be quite sure
why, and this magic number seems to have migrated around from one place to
another.  An openbsd mailing list recently had a patch to disable the
timeout completely, which they say matches some commercial routers:
https://www.mail-archive.com/tech@openbsd.org/msg29456.html

Even in Linux, several non-ath9k drivers default to no timeout already.  I
think changing it directly to zero would be safe, but to allow a more
structured investigation, let's make it configurable for now.

Signed-off-by: Avery Pennarun <apenw...@gmail.com>
---
 net/mac80211/debugfs_netdev.c      | 4 ++++
 net/mac80211/rc80211_minstrel_ht.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)
 

diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index 37ea30e..5ae160b 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -715,6 +715,8 @@ static void add_mesh_config(struct ieee80211_sub_if_data *sdata)
 }
 #endif
 
+u32 default_agg_timeout = 5000;
+
 static void add_files(struct ieee80211_sub_if_data *sdata)
 {
 	if (!sdata->vif.debugfs_dir)
@@ -725,6 +727,8 @@ static void add_files(struct ieee80211_sub_if_data *sdata)
 	DEBUGFS_ADD(txpower);
 	DEBUGFS_ADD(user_power_level);
 	DEBUGFS_ADD(ap_power_level);
+	debugfs_create_u32("default_agg_timeout", 0600, sdata->vif.debugfs_dir,
+			   &default_agg_timeout);
 
 	if (sdata->vif.type != NL80211_IFTYPE_MONITOR)
 		add_common_files(sdata);
diff --git a/net/mac80211/rc80211_minstrel_ht.c b/net/mac80211/rc80211_minstrel_ht.c
index 3928dbd..028d9d4 100644
--- a/net/mac80211/rc80211_minstrel_ht.c
+++ b/net/mac80211/rc80211_minstrel_ht.c
@@ -671,6 +671,8 @@ minstrel_downgrade_rate(struct minstrel_ht_sta *mi, u16 *idx, bool primary)
 	}
 }
 
+extern u32 default_agg_timeout;
+
 static void
 minstrel_aggr_check(struct ieee80211_sta *pubsta, struct sk_buff *skb)
 {
@@ -691,7 +693,7 @@ minstrel_aggr_check(struct ieee80211_sta *pubsta, struct sk_buff *skb)
 	if (likely(sta->ampdu_mlme.tid_tx[tid]))
 		return;
 
-	ieee80211_start_tx_ba_session(pubsta, tid, 5000);
+	ieee80211_start_tx_ba_session(pubsta, tid, default_agg_timeout);
 }
 
 static void
-- 
2.7.0.rc3.207.g0ac5344
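
A note on units, since the commit message equates 5000 TU with roughly
5000ms: an 802.11 time unit (TU) is 1024 microseconds, so the two are
close but not identical.  A minimal sketch of the conversion follows;
the helper name tu_to_ms is made up for illustration and is not part of
mac80211.

```python
# One 802.11 time unit (TU) is defined as 1024 microseconds.
TU_MICROSECONDS = 1024


def tu_to_ms(tu: int) -> float:
    """Convert a duration in TU to milliseconds (hypothetical helper)."""
    return tu * TU_MICROSECONDS / 1000.0


# The default in the patch: 5000 TU is slightly more than 5 seconds.
print(tu_to_ms(5000))  # 5120.0
```

A value of zero is the special case discussed in the commit message (no
timeout), so it is not a duration to convert.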



Re: ath9k(?): AP stops sending traffic to iPhone 4S until another 802.11n-capable STA joins

2016-02-16 Thread Avery Pennarun
Okay, I've made much more progress on this old thread.  I haven't actually
fixed the bug, which I suspect is a race condition only on multicore
machines, but I at least have better reproduction steps and a workaround.

The bug seems to trigger when three things happen at once:
1) Background interference causes retries
2) AP wants to send data to the STA, which has been idle for a while
3) We want to negotiate a new BA session from AP to STA.

Sometimes, the background interference will cause the time between ADDBA
Request (from AP) and ADDBA Response (from STA) to be longer than usual.  In
my tests, it's usually <1ms, but in high-interference situations I've seen
it be >3ms.  Sometimes, when the delay is longer, I see the symptom that the
agg_status file for the station in question starts showing TID#0's "pending"
column increasing slowly, until it eventually reaches 64.  A wifi capture on
a separate sniffer indicates that no data is being transmitted to that
station, although traffic to other stations (and broadcast/multicast)
continues unabated.  I guess this means the device's queues are themselves
not stopped, but the station's per-TID aggregation queue is stuck.

Twiddling the agg_status of a different queue (in this case TID#1) unblocks
TID#0:
echo "tx start 1" >/sys/kernel/debug/ieee80211/phy0/.../agg_status

So does having another aggregation-capable device join the network.  Having
an 802.11g-only device join the network does *not* unblock the queue.

However, trying to stop TID#0 doesn't help (and it also doesn't successfully
stop the aggregation):
echo "tx stop 0" >/sys/kernel/debug/ieee80211/phy0/.../agg_status

The following patch makes the problem easier to reproduce by letting you
turn the aggregation timeout way down.  For myself, I used a
default_agg_timeout of 500ms and just pinged repeatedly once per second from
the AP to STA.  This causes the aggregation sessions to be repeatedly
brought up and torn down, which triggers the problem for me within a few
minutes (when run on a channel with fairly high noise).

Changing default_agg_timeout to zero (as it is on most non-ath9k drivers)
makes the problem pretty much go away.  However, I think it's because I'm
just dodging the code path that triggers a race condition.

Notes:

- I'm using exactly the same ath9k driver (currently 20150525, but we've
  tried newer ones with no difference) on two totally different platforms: a
  dual-core mindspeed c2k host CPU (ARMv7) with separate ath9k, and a
  single-core QCA9531 (MIPS) with on-chip ath9k.

- I've been unable to trigger the problem on the QCA9531, but I have on
  MIPS.

The aggregation code is... a little hairy.  Does anyone have any guesses
where I might look for the race condition?  Or better still, a patch I can
try?
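
To see why a 500ms timeout combined with one ping per second keeps the
session cycling, here is a toy model, not mac80211 code.  It assumes a
session is set up on the first packet, is torn down once the idle time
reaches the timeout, and is set up again by the next packet; a timeout
of zero means "never tear down".  The function name and the
millisecond units are choices made for this sketch only.

```python
def count_session_setups(packet_times_ms, timeout_ms):
    """Count aggregation session setups in a toy idle-timeout model.

    packet_times_ms: increasing transmit times in milliseconds.
    timeout_ms: idle time after which the session is torn down;
                0 means the session never times out.
    """
    setups = 0
    last_activity = None
    for t in packet_times_ms:
        expired = last_activity is None or (
            timeout_ms > 0 and t - last_activity >= timeout_ms
        )
        if expired:
            setups += 1  # a new session has to be negotiated
        last_activity = t
    return setups


# One ping per second for a minute.
pings = [i * 1000 for i in range(60)]
print(count_session_setups(pings, 500))   # 60: every ping renegotiates
print(count_session_setups(pings, 5000))  # 1: the session stays up
print(count_session_setups(pings, 0))     # 1: no timeout at all
```

In this model each renegotiation is another chance to hit the slow
ADDBA exchange described above, which would fit the observation that a
short timeout makes the problem easier to reproduce and a zero timeout
mostly hides it.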


Avery Pennarun (1):
  mac80211: add a debugfs var for the default aggregation timeout.

 net/mac80211/debugfs_netdev.c      | 4 ++++
 net/mac80211/rc80211_minstrel_ht.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.7.0.rc3.207.g0ac5344



ath9k(?): AP stops sending traffic to iPhone 4S until another 802.11n-capable STA joins

2015-11-03 Thread Avery Pennarun
[fixed ath9k list address.  sorry for the spam]

Hi all,

I have a pretty weird problem I've been chasing for a few weeks and
have narrowed it down, but not quite solved it.  It may be caused by
bugs in aggregation-related code.

Steps:
- Set up an ath9k-based Linux AP on an ARM processor (currently using
this version of backports, though I've tried older and newer versions
with no change: "backported from Linux (next-20150525-0-gc201847)
using backports backports-20150525-0-g49969bd")
- Join my iPhone 4S (running iOS 7.1.2) to the network
- Use it for a while
- Eventually it will stay connected, but Internet access doesn't work
- Wireless packet captures show that packets are received *from* the
iPhone, and ACKs are returned for those packets from the ath9k, and
those packets are correctly forwarded to the AP's br0 interface.
Outgoing packets, however, show up on br0 and wlan0 with tcpdump but
never make it onto the air.
- Putting the iPhone 4S into airplane mode and then letting it
reconnect will fix it for a few more seconds/minutes before it
stops again.

More details:
- It only seems to happen to my iPhone 4S client (never seen it with a
different client).
- It only seems to happen with my ath9k AP.
- It only seems to happen on my home network (another instance of the
same AP hardware on another network doesn't trigger the problem).
- It only seems to happen when no other 802.11n-capable devices are
connected to the same AP.
- The moment I join an 802.11n-capable device to the AP, traffic
instantly unblocks (see packet capture below).
- Joining an 802.11g-only device (no aggregation) does *not* unblock traffic.
- Disabling encryption and turning wmm_enable on and off have no effect.
- Disabling 802.11n support on the AP (so that everyone has to use
802.11g) makes the problem go away.
- 'ip -s link show dev wlan0' shows tx packet counters continuing to
increase during the outage, even though packets aren't flowing.
- I applied a patch from Tim Shepard to track the most recent tx
attempt, acked tx, and rx packet times inside mac80211.  According to
this data, mac80211 thinks rx happened at most a couple of seconds ago
(as expected).  The most recent tx was acked, but it was back around
the time the outage started.  Note that this disagrees with 'ip -s
link' and tcpdump, which think they transmitted much more recently
than that.  (The patch is here:
https://gfiber-review.googlesource.com/#/c/1250/ )

I captured a pcap of a new 802.11n-capable device joining the network
and unblocking the transmit.  The action starts around frame 325:
   http://apenwarr.ca/tmp/iPod4-fixing-iPhone4-trimmed.pcap.gz

In this pcap, the main players are:
   ath9k AP: 88:dc:96:08:60:50
   iPhone 4S with the problem: e4:25:e7:73:e6:31
   New client fixing the problem (iPod 4): 18:e7:f4:7e:c1:42

Observations from the pcap:
- Upstream packets (iPhone->ath9k) are received and acked (see eg. frame 154)
- Beacons from the ath9k show an empty TIM bitmap until the iPod
joins, then it's nonempty and things unblock.
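
For anyone checking the TIM observation in their own capture, the
following is a rough sketch of how the TIM element body can be tested
for a given station.  It assumes you have already extracted the element
body (DTIM count, DTIM period, bitmap control, partial virtual bitmap)
as bytes and that you know the station's association ID; neither the
AIDs nor the raw bytes are given in this thread, and the function names
are invented for the example.

```python
def tim_indicates_aid(tim_body: bytes, aid: int) -> bool:
    """Return True if the TIM element flags buffered unicast traffic for aid.

    tim_body layout: DTIM count, DTIM period, bitmap control,
    then the partial virtual bitmap (at least one octet).
    """
    if len(tim_body) < 4:
        raise ValueError("TIM body needs at least 4 octets")
    bitmap_control = tim_body[2]
    partial_bitmap = tim_body[3:]
    # Bits 1-7 of bitmap control hold N1/2, where N1 is the index of the
    # first virtual-bitmap octet that is actually transmitted.
    first_octet = bitmap_control & 0xFE
    octet_index = aid // 8 - first_octet
    if octet_index < 0 or octet_index >= len(partial_bitmap):
        return False
    return bool(partial_bitmap[octet_index] & (1 << (aid % 8)))


def tim_unicast_bitmap_empty(tim_body: bytes) -> bool:
    """Return True if no station has its bit set in the partial bitmap."""
    return not any(tim_body[3:])


# Made-up example: DTIM count 0, period 1, offset 0, bit for AID 1 set.
example = bytes([0, 1, 0x00, 0x02])
print(tim_indicates_aid(example, 1))      # True
print(tim_unicast_bitmap_empty(example))  # False
```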

Does anyone have any thoughts about what to look for here?

Have fun,

Avery



Re: [PATCH] ath10k: Improve performance by reducing tx_lock contention.

2015-07-22 Thread Avery Pennarun
On Wed, Jul 22, 2015 at 2:16 AM, Kalle Valo <kv...@codeaurora.org> wrote:
> Marty Faltesek <mfalte...@google.com> writes:
>> From: Qi Zhou <qiz...@google.com>
>>
>> During tx completion, tx_lock is held for longer than required, preventing
>> efficient refill of htt->pending_tx. Refactor the code so that only MSDU
>> related operations are protected by the lock.
>>
>> Improves downstream performance on a 3x3 client from 495 to 580 Mbps.
>
> It would be good to mention the actual platform/CPU as I guess this
> improvement is platform specific?

That's a good idea to mention in the commit message, although given
that it's just an in-driver lock contention patch, it probably affects
any underpowered multicore CPU.
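
The general technique the patch description refers to (holding the lock
only around the shared-state update, and doing the rest of the
completion work after releasing it) can be sketched as below.  This is
an illustration in Python, not the ath10k change itself: the TxState
class, its pending dict, and both method names are hypothetical
stand-ins.

```python
import threading


class TxState:
    """Hypothetical stand-in for driver state guarded by a tx lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pending = {}  # msdu_id -> payload awaiting completion

    def complete_coarse(self, msdu_id, finish):
        # Before: the whole completion, including the slow part, runs
        # while the lock is held, so other senders must wait.
        with self.lock:
            payload = self.pending.pop(msdu_id)
            finish(payload)

    def complete_narrow(self, msdu_id, finish):
        # After: only the shared-table update is protected.
        with self.lock:
            payload = self.pending.pop(msdu_id)
        finish(payload)  # runs without the lock held


state = TxState()
state.pending[1] = b"frame"
state.complete_narrow(1, lambda p: print("lock held:", state.lock.locked()))
```

The narrow version frees the pending slot for refill as soon as the
table update is done, which is the effect the commit message describes.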


Capturing hardware-decrypted packets in monitor mode on ath9k/ath10k

2015-03-09 Thread Avery Pennarun
Hi,

On a station or AP device, I'd like to capture packets in monitor mode
(ie. with radiotap headers).  Normally this captures the encrypted
packets as they appear on the air.  In my case, I'd like to capture
the *decrypted* packets where possible (ie. packets communicating with
this node, where the local machine already knows the session key and
is presumably decrypting the packets anyway so that it can carry on
the session).

I know wireshark (etc) can decrypt packets for a given session if you
capture the EAPOL frames.  The advantages of having the driver do it
in hardware are a) hopefully less performance impact, and b) you can
easily start capturing at any time, even post EAPOL, because the
driver already has a cached copy of the keys.

Is there a flag somewhere I can set to make this happen?  Is this even
a feature supported by most hardware?

Thanks,

Avery


Re: Open Source RRM Hand-Over Optimization (WAS: Throughput regression with `tcp: refine TSO autosizing`)

2015-02-02 Thread Avery Pennarun
On Mon, Feb 2, 2015 at 11:44 AM, Björn Smedman <b...@anyfi.net> wrote:
> On Mon, Feb 2, 2015 at 5:21 AM, Avery Pennarun <apenw...@google.com> wrote:
>> While there is definitely some work to be done in handoff, it seems
>> like there are some fine implementations of this already in existence.
>> Several brands of enterprise access point setups seem to do well at
>> this.  It would be nice if they interoperated, I guess.
>>
>> The fact that there's no open source version of this kind of handoff
>> feature bugs me, but we are working on it here and the work is all
>> planned to be open source, for example: (very early version)
>> https://gfiber.googlesource.com/vendor/google/platform/+/master/waveguide/
>
> We've got an SDN-inspired architecture with 802.11 frame tunneling (a
> la CAPWAP), airtime fairness, infrastructure initiated hand-over,
> Opportunistic Key Caching (OKC), IEEE 802.11r Fast BSS Transition and
> a few more goodies. It's currently free as in beer
> (http://anyfi.net/software,
> https://github.com/carrierwrt/carrierwrt/pull/7 and
> http://www.anyfinetworks.com/download) up to 100 APs, but we're
> definitely going to open source in one form or another.
>
> We've also tried to raise some interest in fixing up CAPWAP
> (https://www.ietf.org/mail-archive/web/opsawg/current/msg03196.html),
> which is (unfortunately) the best open standard at the moment.
> Interest seems marginal though...

This sounds cool.  Is the CAPWAP/encapsulation stuff separable from
the rest?  With 802.11ac speeds, a super fast WAN link, and a low-cost
SoC, too many layers can be a killer.


Re: [Cerowrt-devel] Fwd: Throughput regression with `tcp: refine TSO autosizing`

2015-02-01 Thread Avery Pennarun
On Sun, Feb 1, 2015 at 6:34 PM, Andrew McGregor <andrewm...@gmail.com> wrote:
> I missed one item in my list of potential improvements: the most braindead
> thing 802.11 has to say about rates is that broadcast and multicast packets
> should be sent at 'the lowest basic rate in the current supported rate set',
> which is really wasteful.  There are a couple of ways of dealing with this:
> one, ignore the standard and pick the rate that is most likely to get the
> frame to as many neighbours as possible (by a scan of the Minstrel tables).
> Or two, fan it out as unicast, which might well take less airtime (due to
> aggregation) as well as being much more likely to be delivered, since you
> get ACKs and retries by doing that.

As far as I can see, the only sensible thing to do with
multicast/broadcast is some variation of the unicast fanout, unless
you've got a truly huge number of nodes.  I don't know of any
protocols (certainly not video streams) that actually work well with
the kind of packet loss you see at medium/long range with wifi if
retransmits aren't used.  I've heard that openwrt already has a patch
included that does this kind of fanout at the bridge layer.

I've also heard of a new reliable multicast in some newer 802.11
variant, which essentially sends out a single multicast packet and
expects an ACK from each intended recipient.  Other than adding
complexity, it seems like the best of both worlds.
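
A back-of-the-envelope comparison shows why unicast fanout can win on
airtime.  The sketch below counts only payload bits over the PHY rate
and ignores preambles, ACKs, retries and aggregation, so treat it as a
rough bound.  The rates are assumptions picked for illustration
(1 Mbit/s as the lowest basic rate, 65 Mbit/s for each unicast
client), not measurements from this thread.

```python
def airtime_us(payload_bytes, rate_mbps):
    """Payload-only airtime in microseconds: bits / (Mbit/s)."""
    return payload_bytes * 8 / rate_mbps


def fanout_airtime_us(payload_bytes, unicast_rates_mbps):
    """Airtime to send one copy to each station at its own rate."""
    return sum(airtime_us(payload_bytes, r) for r in unicast_rates_mbps)


frame = 1500  # bytes
multicast = airtime_us(frame, 1.0)             # one copy at the basic rate
fanout = fanout_airtime_us(frame, [65.0] * 10)  # ten unicast copies

print(multicast)           # 12000.0
print(fanout < multicast)  # True
```

Under these assumptions the break-even point is about 65 stations,
which is the "truly huge number of nodes" case mentioned above.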


Re: [Cerowrt-devel] Fwd: Throughput regression with `tcp: refine TSO autosizing`

2015-02-01 Thread Avery Pennarun
On Sun, Feb 1, 2015 at 9:43 AM, <dpr...@reed.com> wrote:
> Just to clarify, managing queueing in a single access point WiFi network is
> only a small part of the problem of fixing the rapidly degrading performance
> of WiFi based systems.

Can you explain what you mean by "rapidly degrading"?  The performance
in odd situations is certainly not inspirational, but I haven't
noticed it getting worse over time.

> Similarly, mesh routing is only a small part of the
> problem with the scalability of cooperative meshes based on the WiFi MAC.

That's certainly true.  Not to say the mesh routing algorithms are
much good either.

> Also, as we noted
> earlier, handoff from one next hop to another is a huge problem with
> performance in practical deployments (a factor of 10x at least, just in
> that).

While there is definitely some work to be done in handoff, it seems
like there are some fine implementations of this already in existence.
Several brands of enterprise access point setups seem to do well at
this.  It would be nice if they interoperated, I guess.

The fact that there's no open source version of this kind of handoff
feature bugs me, but we are working on it here and the work is all
planned to be open source, for example: (very early version)
https://gfiber.googlesource.com/vendor/google/platform/+/master/waveguide/

> Propagation information is not used at all when 802.11 systems share a
> channel, even in single AP deployments, yet all stations can measure
> propagation quite accurately in their hardware.

802.11k seems to provide for sharing this information.  But I'm not
clear what I should use it for. :)

> Finally, Listen-before-talk is highly wasteful for two reasons: 1) any
> random radio noise from other sources unnecessarily degrades communications
> [...]
> 2) the transmitter cannot tell when the intended receiver will be perfectly
> able to decode the signal without interference with the station it hears
> (this second point is actually proven in theory in a paper by Jon Peha that
> argued against trivial etiquettes as a mechanism for sharing among
> uncooperative and non-interoperable stations).

I've thought quite a bit about your point #2 above, but I don't know
which direction to pursue.  The idea is that sometimes "just shout
over the background noise" is a globally optimal solution, right?  The
question seems to be to figure out when that is true and when it
isn't.

> I agree that, to the extent that managing queues in a single box or a single
> operating system doesn't require cooperation, it's much easier to get such
> things into the market.  That's why CeroWRT has been as effective as it has
> been.  But has Microsoft done anything at all about it?   Do the better ECN
> signals that can arise from good queue management get used by the TCP
> endpoints, or for that matter UDP-based protocol endpoints?

If we don't know the answer to the questions, then that is itself the
problem.  It's a lot easier to say, "hey, ChromeOS and MacOS have good
network performance but Microsoft has bad network performance," if it's
true and we have good reproducible tests to demonstrate that.

> The reason no one is making progress on any of these particular issues is
> that there is no coordination at the systems level around creating rising
> tides that lift all boats in the WiFi-ish space.  It's all about ripping the
> competition by creating stuff that can sell better than the other guys'
> stuff, and avoiding cooperation at all costs.
> [...]
> But the big wins in making WiFi better are going begging.  As WiFi becomes
> more closed, as it will as the major Internet Access Providers and Gadget
> builders (Google, Apple) start excluding innovators in wireless from the
> market by closed, proprietary solutions, the problem WILL get worse.  You
> won't be able to fix those problems at all.  If you have a solution you will
> have to convince the oligopoly to even bother trying it.

As someone who works at Google Fiber (which is both a gadget maker and
an ISP) and who pushes all day long for our wifi stuff to be open
source, I'm slightly offended to be lumped in with other vendors in
your story :)  I think the ChromeOS team (which insists on only open
source wifi drivers in all chromebooks) would feel similarly.  We are
lucky to have defined our competitive advantage as something other
than short-lived slight improvements in wifi that will soon be
wastefully duplicated by everyone else.

That said, I see what you mean about the general state of the
industry.  The way to fix it is the way Linux always fixes it: make
the open source version so much better that building a proprietary
one, just to gather a small incremental advantage, is a huge waste of
time and effort.  Work on minstrel and fq_codel goes a long way here.

> I personally think that things like promoting semi-closed, essentially
> proprietary ESSID-based bridged distribution systems as good ideas are
> counterproductive to this goal.  But that's perhaps too radical for this
> crowd.

Not