Re: Avery's notes from LPC2016 wireless track (Santa Fe)
On Thu, Nov 3, 2016 at 7:55 PM, Kathy Giori <kathy.gi...@gmail.com> wrote:
> On Thu, Nov 3, 2016 at 1:43 PM, Avery Pennarun <apenw...@gmail.com> wrote:
>> We talked about many topics at the Linux Plumbers' Conference in Santa
>> Fe on Tuesday. I took fairly detailed notes, which you can find at
>> the links below.
>
> Great notes Avery. Did you happen to take note of how many
> participants there were? Or was the attendee list posted on the wiki
> beforehand fairly accurate?
> https://wireless.wiki.kernel.org/en/developers/summits/santa-fe-2016

There were quite a few people not on the list. Google sent an
extra-large contingent last year. People estimated about 50% of the
wireless meeting were Googlers, which comes to maybe 15 out of around
30-ish. Google is doing a lot of (sometimes redundant :)) wireless
work lately.

> Was there any talk about having more frequent "live" discussions of
> these topics (via video conf or conf call or scheduled IRC), to help
> overall collaboration without having to wait for the next f2f summit?
> Mailing list interaction doesn't seem to elicit the same energy as
> occurs over live dialog. Quarterly?

It didn't come up. Honestly, it feels to me like 6 months is the
right cadence. A lot of work does happen in the background, such as
the fq_codel work (yay!) which was definitely launched by one or two
of these, but proceeded well afterwards.

Hope you're doing well at whatever you're doing now!

Have fun,

Avery
Re: Avery's notes from LPC2016 wireless track (Santa Fe)
On Thu, Nov 3, 2016 at 5:55 PM, Barry Day wrote:
> Thanks for that. Can I take this as meaning there won't be any videos?
> I would like to have seen Jes Sorensen's talk on rtl8xxxu

As far as I know, no talks at this LPC were recorded.
Avery's notes from LPC2016 wireless track (Santa Fe)
Hi all,

We talked about many topics at the Linux Plumbers' Conference in Santa
Fe on Tuesday. I took fairly detailed notes, which you can find at
the links below.

Fancy html commentable version:
https://docs.google.com/document/d/1Cr2bEf23wLkhiXXtyuBJtvvpb9xvCw7zsU-m1q1g4tA/edit

Less fancy version that (I think) will not ask for a Google account
(let me know if it gives you trouble):
https://docs.google.com/document/u/1/d/e/2PACX-1vQXbVQ-3zQt3Bcr3OWfwzbw_C49tTvf0ed8Hmf7b20E6tXc3a40tWZmPku49iGDE-OhgxNmO_lkkHEn/pub

Thanks, everyone, for the great discussion!

Have fun,

Avery
Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.
On Sat, Apr 9, 2016 at 9:59 PM, bruce m beach wrote:
>> If any people more familiar with ARM are reading this - does the value
>> 0x5b35da40 ring a bell?
>
> It could be anything. It depends on the chip implementation. For
> instance on an exynos that is an undefined region. What device are we
> talking about?

This is a mindspeed c2k CPU, in case that helps at all. But I'm
guessing it really is just some pointer garbage. The only way to
trigger the crash is to do (something) with an attached station at the
same moment as reading the agg_status file. Some station types
trigger it more than others, but I'm not sure which.

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.
On Fri, Apr 8, 2016 at 4:31 AM, Avery Pennarun <apenw...@gmail.com> wrote:
> On Fri, Apr 8, 2016 at 3:15 AM, Johannes Berg <johan...@sipsolutions.net> wrote:
>> On Fri, 2016-04-08 at 09:01 +0200, Johannes Berg wrote:
>>> On Fri, 2016-04-08 at 08:56 +0200, Johannes Berg wrote:
>>> > On Thu, 2016-04-07 at 21:32 -0400, Avery Pennarun wrote:
>>> > > Yes. Here it is:
>>> > > http://apenwarr.ca/tmp/mac80211-agg-status-crash.ko
>>> > >
>>> > Unfortunately there are no debug symbols in this file, so it doesn't
>>> > help me much. I can't even seem to get objdump to disassemble it
>>> > correctly: looks like the file is in thumb, going from things
>>> > like R_ARM_THM_CALL relocations, but even -Mforce-thumb doesn't seem
>>> > to DRT; sta_agg_status_read+0xeb isn't even a valid instruction
>>> > offset in regular ARM mode.
>>> >
>>> It *seems* that it most likely crashes on the first access to tid_tx,
>>> which is consistent with the story of disabling TX aggregation timeouts
>>> reducing the chances.
>>>
>>> So I guess we have to look for some TX aggregation teardown RCU
>>> pointer problem?
>>
>> Can't find anything. The only other thing I saw now is that the TID
>> appears to be 7 (in r7), might be worth looking for whether that's a
>> common thing or not?
>
> Just to be clear, this crash is only from *reading* the agg_status
> files. I don't know if the crashiness reduces when disabling the
> aggregation timeouts, since that's a separate bug (in which the queue
> gets stuck and the 'pending' column of this file just keeps
> increasing).
>
> I'll try twiddling some options again tomorrow and see if I can get
> one with proper debug symbols. For what it's worth, this platform is
> "ARMv7 Processor rev 1 (v7l)" and the gcc build is made for Cortex A9.
> You can find an x86 build of our toolchain in the git repo at
> https://gfiber.googlesource.com/toolchains/mindspeed.

Updated .ko file that definitely has debug symbols this time:
http://apenwarr.ca/tmp/mac80211-agg-status-crash-debugsyms.ko

a gdb compiled for x86-64 that can definitely read the above .ko file:
http://apenwarr.ca/tmp/arm-gdb

Thanks!
Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.
On Fri, Apr 8, 2016 at 3:15 AM, Johannes Berg <johan...@sipsolutions.net> wrote:
> On Fri, 2016-04-08 at 09:01 +0200, Johannes Berg wrote:
>> On Fri, 2016-04-08 at 08:56 +0200, Johannes Berg wrote:
>> > On Thu, 2016-04-07 at 21:32 -0400, Avery Pennarun wrote:
>> > > Yes. Here it is:
>> > > http://apenwarr.ca/tmp/mac80211-agg-status-crash.ko
>> > >
>> > Unfortunately there are no debug symbols in this file, so it doesn't
>> > help me much. I can't even seem to get objdump to disassemble it
>> > correctly: looks like the file is in thumb, going from things
>> > like R_ARM_THM_CALL relocations, but even -Mforce-thumb doesn't seem
>> > to DRT; sta_agg_status_read+0xeb isn't even a valid instruction
>> > offset in regular ARM mode.
>> >
>> It *seems* that it most likely crashes on the first access to tid_tx,
>> which is consistent with the story of disabling TX aggregation timeouts
>> reducing the chances.
>>
>> So I guess we have to look for some TX aggregation teardown RCU
>> pointer problem?
>
> Can't find anything. The only other thing I saw now is that the TID
> appears to be 7 (in r7), might be worth looking for whether that's a
> common thing or not?

Just to be clear, this crash is only from *reading* the agg_status
files. I don't know if the crashiness reduces when disabling the
aggregation timeouts, since that's a separate bug (in which the queue
gets stuck and the 'pending' column of this file just keeps
increasing).

I'll try twiddling some options again tomorrow and see if I can get
one with proper debug symbols. For what it's worth, this platform is
"ARMv7 Processor rev 1 (v7l)" and the gcc build is made for Cortex A9.
You can find an x86 build of our toolchain in the git repo at
https://gfiber.googlesource.com/toolchains/mindspeed.

Thanks for looking into it :)

Avery
Re: [PATCH 1/2] mac80211: implement fair queuing per txq
On Fri, Mar 25, 2016 at 5:27 AM, Michal Kazior wrote:
> mac80211's software queues were designed to work
> very closely with device tx queues. They are
> required to make use of 802.11 packet aggregation
> easily and efficiently.
>
> However the logic imposed a per-AC queue limit.
> With the limit too small mac80211 wasn't able
> to guarantee fairness across TIDs nor stations
> because single burst to a slow station could
> monopolize queues and reach per-AC limit
> preventing traffic from other stations being
> queued into mac80211's software queues. Having the
> limit too large would make smart qdiscs, e.g.
> fq_codel, a lot less efficient as they are
> designed on the premise that they are very close
> to the actual device tx queues.

As usual, I'm way behind on everything, but I have been testing this
patch series in the background (no clear results to report yet) and
wanted to comment at a very high level. I think you are actually
doing several stages of improvements all at once here:

[0. Baseline: one big queue going into the driver]

1. Switch ath10k to mac80211 per-station queues.

2. Change per-station queues to use NO_QUEUE qdisc and *not* ever stop
the kernel netdev queue (since there no longer is one).

3. Actively manage per-station queues with fq_codel.

4. DQL-like control system for managing hardware queues.

Just to clarify what I mean by #2, if I understand correctly, before
this patch, the driver+mac80211 keeps track of the total number of
packets in all the mac80211 queues. When the total exceeds a fixed
amount (or when one of the per-station queues gets full?) mac80211
tells the kernel to stop sending in new packets, so they sit around in
the qdisc instead. The problem with this behaviour is we probably
have a lot of packets for one station, and not many packets for other
stations, even if the netdev qdisc has plenty of packets still waiting
for those other stations.
When you then go to drain the mac80211 queues in a round-robin
fashion, only the fullest queue (corresponding to the busiest stream
to the fastest station) can get optimal results. The driver can then
either send out from the fullest queue (unfair but fast) or round
robin using the non-full queues (fair but non-optimal speed).

Upon implementing #2, we would essentially never tell the kernel to
stop sending packets; instead, it just always forwards them to
mac80211, which needs to learn how to drop them instead of providing
backpressure. This moves the entire qdisc functionality into
mac80211, hence the use of NO_QUEUE.

It's then obvious that if you just did the obvious thing (tail drop),
you'll end up with high latency, so you added fq_codel to the mix.
However, as people on this thread have noticed, fq_codel is
complicated.

I'd like to be able to evaluate the performance impact of each of the
above steps separately. In particular, my theory is that if we
implement #2 with just a simple FIFO queue per station, then if we
have two stations competing (one slow, one fast), and dequeue
aggregates using round robin, then we should get all of:

a) Full airtime utilization and max-length aggregates and

b) High latency only on busy stations, but near-zero latency on idle
stations (because of round-robin servicing of the per-station queues).

Using just a tail drop implementation, it should be very easy for me
to test that (a) and (b) are true. It should also be strictly equal
(one station) or better (multiple stations) than using mac80211 soft
queues with the pfifo_fast qdisc. If that isn't what happens, then
we'll know something went wrong with that part of the code, and we can
debug that before moving on to a wifi-aware fq_codel.

So my request: do you mind splitting your patch into two patches, one
that implements just NO_QUEUE and per-station fifo tail drop, with a
second patch that converts the tail drop to fq_codel?
Another advantage of the split is that we could then test NO_QUEUE +
tail_drop + DQL. Again, that should be strictly better than the
NO_QUEUE + tail_drop + fixed_driver_queue. Then it might be easier to
debug the (much more fiddly) fq_codel on top.

Thoughts?

Thanks,

Avery
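
Expectation (b) in the message above can be sketched with a toy model. This is only an illustration under loud assumptions, not mac80211 code: the function names are hypothetical, all packets arrive before any are sent, and the round robin services one packet per turn rather than one aggregate. It compares where a single packet for an idle station lands in the transmit order with one shared tail-drop FIFO versus one tail-drop FIFO per station.

```python
from collections import deque


def shared_fifo(arrivals, limit):
    """One tail-drop FIFO shared by all stations; returns the transmit order."""
    queue = deque()
    for pkt in arrivals:
        if len(queue) < limit:  # tail drop: discard when the queue is full
            queue.append(pkt)
    return list(queue)


def per_station_round_robin(arrivals, limit):
    """One tail-drop FIFO per station, serviced round robin (hypothetical model)."""
    queues = {}
    for sta, seq in arrivals:
        queue = queues.setdefault(sta, deque())
        if len(queue) < limit:  # tail drop applies per station
            queue.append((sta, seq))
    order = []
    while any(queues.values()):
        for queue in queues.values():
            if queue:
                order.append(queue.popleft())  # one packet per station per turn
    return order


# 100 packets for a busy station arrive, then one packet for an idle station.
arrivals = [("busy", i) for i in range(100)] + [("idle", 0)]

print(shared_fifo(arrivals, 200).index(("idle", 0)))              # waits behind 100 packets
print(("idle", 0) in shared_fifo(arrivals, 64))                   # dropped when the shared queue is full
print(per_station_round_robin(arrivals, 64).index(("idle", 0)))   # sent on the second slot
```

In this model the idle station's packet is either stuck behind the whole backlog or dropped outright in the shared case, while per-station queues let it out almost immediately even though the busy station still sees tail drop and a long queue.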
Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.
On Tue, Feb 23, 2016 at 3:05 PM, Johannes Berg <johan...@sipsolutions.net> wrote:
> On Tue, 2016-02-23 at 13:43 -0500, Avery Pennarun wrote:
>> We're putting my version of the patch into our devices in order to be
>> able to try different values and see how it changes the percentage of
>> devices with nonzero 'pending' field in agg_status. I'm hoping using
>> zero here will result in total elimination of the pending problem,
>> but we'll see.
>
> :)
> I for one would be interested in the result. And, if you find mac80211
> is at fault, knowing what happens there.

Here's the promised update!

The news is not as good as I had hoped. Across the GFiber fleet,
number of APs per day observing the problem (ie. the pending field > 0
for more than a minute for any station), with the original aggregation
timeout, is about 41% (yikes). With the aggregation timeout set to
zero, the number of APs observing the problem in a day drops to about
10%.

Obviously this is a huge improvement, but the problem isn't completely
eliminated. In retrospect that's not totally surprising, as there are
reasons other than an AP-side aggregation timeout that an aggregation
would need to be negotiated, and a race condition in aggregation queue
setup could happen at any of those times. I was just hoping that
those other cases would be much less frequent than they apparently
are.

This test was with backports-20150525 on ath9k. (We have newer
versions in the queue, but they haven't rolled out to our customers
yet. Anyway, earlier in this thread, I was able to trigger the race
condition on much newer backports. Unfortunately the current fix
makes my reproducible test case go away, but I don't know any reason
to assume the race condition is fixed.)

While we're here, unfortunately it turns out that just observing the
agg_status file can cause crashes (though not very often... except for
a few unlucky customers), probably due to a different race condition.
Any suggestions about this one?
Stack trace attached below. (I think the stack trace suggests a
mac80211 problem?)

Thanks!

Avery

03/30,133400.674 Unable to handle kernel paging request at virtual address 5b35da9e
03/30,133400.675 pgd = ac238000
03/30,133400.675 [5b35da9e] *pgd=
03/30,133400.675 Internal error: Oops: 5 [#1] PREEMPT SMP
03/30,133400.680 Modules linked in: ccm nf_conntrack_netlink auto_bridge(O) fci(O) nfnetlink pktgen ath9k_htc(O) mwifiex_usb(O) mwifiex(O) ath10k_pci(O) ath10k_core(O) arc4 ath9k(O) mac80211(O) ath9k_common(O) ath9k_hw(O) ath(O) cfg80211(O) compat(O) bmoca(O) xt_connmark ip6table_mangle xt_CLASSIFY iptable_mangle xt_helper nf_nat_sip nf_conntrack_sip ip6t_REJECT ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_nat_rtsp nf_conntrack_rtsp nf_nat_h323 nf_conntrack_h323 nf_nat_irc nf_conntrack_irc nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE ipt_REJECT ipt_LOG xt_limit xt_pkttype xt_conntrack xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables pfe(O)
03/30,133400.753 CPU: 0 Tainted: G O (3.2.26 #1)
03/30,133400.758 PC is at sta_agg_status_read+0xeb/0x170 [mac80211]
03/30,133400.764 LR is at sta_agg_status_read+0xd8/0x170 [mac80211]
03/30,133400.770 pc : [<838b4d0c>] lr : [<838b4cf9>] psr: 20010033
03/30,133400.770 sp : ac0c3c58 ip : 000f fp : ac0c3c71
03/30,133400.782 r10: ac341800 r9 : af7f3b53 r8 : 0001
03/30,133400.787 r7 : 0007 r6 : 5b35da40 r5 : ac0c3f38 r4 : ac0c3d90
03/30,133400.794 r3 : ac0c3d8d r2 : 838c6958 r1 : 01a8 r0 : ac0c3d90
03/30,133400.800 Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user
03/30,133400.807 Control: 50c53c7d Table: 2c23804a DAC: 0015
03/30,133400.813 Process psstat (pid: 25220, stack limit = 0xac0c22f0)
03/30,133400.819 Stack: (0xac0c3c58 to 0xac0c4000)
03/30,133400.824 3c40: 0209 a6199050
03/30,133400.832 3c60: ac0c3d58 7e957143 0001 ac0c3f88 78656e00 69642074 676f6c61 6b6f745f
03/30,133400.840 3c80: 203a6e65 0a317830 09444954 09585209 4e4b5444 4e535309 58540909 4b544409
03/30,133400.848 3ca0: 6570094e 6e69646e 30300a67 09300909 30307830 30783009 09093030 78300930
03/30,133400.857 3cc0: 30093030 300a3030 30090931 30783009 78300930 09303030 30093009 09303078
03/30,133400.865 3ce0: 0a303030 09093230 78300930 30093030 30303078 09300909 30307830 30303009
03/30,133400.873 3d00: 0933300a 30093009 09303078 30307830 30090930 30783009 30300930 34300a30
03/30,133400.881 3d20: 09300909 30307830 30783009 09093030 78300930 30093030 300a3030 30090935
03/30,133400.889 3d40: 30783009 78300930 09303030 30093009 09303078 0a303030 09093630 78300931
03/30,133400.898 3d60: 30096632 32323678 31090966 38783009 32310933 30343230 3538 0937300a
03/30,133400.906 3d80: 30093109 09303578 31307830 31090961 3
Re: [RFC/RFT] mac80211: implement fq_codel for software queuing
On Mon, Mar 7, 2016 at 10:09 AM, Felix Fietkau <n...@openwrt.org> wrote:
> On 2016-03-07 15:05, Avery Pennarun wrote:
>> On Fri, Mar 4, 2016 at 1:32 AM, Michal Kazior <michal.kaz...@tieto.com> wrote:
>>> On 4 March 2016 at 03:48, Tim Shepard <s...@alum.mit.edu> wrote:
>>> [...]
>>>> (I am interested in knowing what other mac80211 drivers have been
>>>> modified to use the mac80211 intermediate software queues. I know
>>>> Michal mentioned he has patches for ath10k that are not yet released,
>>>> and I know Felix is finishing up the mt76 driver which uses them.)
>>>
>>> Patches for ath10k are under review since quite some time now (but are
>>> not merged yet). The latest re-spin is:
>>>
>>> http://lists.infradead.org/pipermail/ath10k/2016-March/006923.html
>>
>> Hi all, on Friday I had a chance to experiment with some of these
>> patches, specifically Tim's ath9k patch (to use intermediate queues),
>> plus Michal's patch to use fq_codel with the intermediate queues. I
>> didn't attempt any fine tuning; I just slapped them together to see
>> what happens. (I tried applying Michal's ath10k patches too, but got
>> stuck since they seem to be applied against the upstream v4.4 kernel
>> and didn't merge cleanly with the latest mac80211 branch. Maybe I was
>> doing something wrong.)
>>
>> Test setup:
>>    AP (ath9k) -> 2x2 strong signal -> STA1 (mwifiex)
>>               -> attenuator (-40 dB) -> 1x1 weak signal -> STA2 (mwifiex)
>>
>> STA2 generally gets modulation levels around MCS0-2 and STA1 usually
>> gets something like MCS12-15.
>>
>> With or without this patch, results with TCP iperf were fishy - I
>> think packet loss patterns were particularly bad and caused 2-second
>> TCP retry timeouts occasionally - so I removed TCP from the test and
>> switched to UDP iperf instead.
>>
>> I ran isoping
>> (https://gfiber.googlesource.com/vendor/google/platform/+/master/cmds/isoping.c)
>> from the AP to both stations to measure two-way latency during all
>> tests.
>> (I used -r2 for two packets/sec in each direction in order not
>> to affect the test results too much.)
>>
>> Overall results:
>>
>> - Running one iperf at a time, I saw ~45 Mbps to STA1 and ~7 Mbps to STA2.
>>
>> - Running both iperfs at once, without the patches, latencies got
>> extremely high (~600ms sometimes) and results were closer to
>> byte-fairness than airtime-fairness (ie. ~7 Mbps each).
>>
>> - Running both iperfs at once, with the patches, latencies were still
>> high (usually high 2-digit, sometimes low 3-digit latencies) but we
>> got closer to airtime-fairness than byte-fairness (~17 Mbps and ~2
>> Mbps).
>>
>> - With only one iperf running, without the patches, latencies were
>> high to both stations. With the patches, latency was
>> mid-double-digits to the non-iperf station (pretty good!) while being
>> low-mid triple-digits to the busy iperf station. This suggests that
>> we are getting per-station queuing (yay!) but does make me question
>> whether the fq_ in fq_codel was working.
>
> Please change the 'if (flow->txqi)' check in ieee80211_txq_enqueue to:
> if (flow->txqi && flow->txqi != txqi)
> This should hopefully fix the fq_ part ;)

Oops, I saw your message about that earlier and totally forgot to
apply the change. But maybe that was for the best, because it doesn't
seem to uniformly make things better.

*Without* your change, I observe that my iperf3 session to STA1 (high
speed) seems to complain about a lot of out-of-order packets. *With*
your change, the out-of-order complaints seem to go away, which is
nice. The throughput measurements look about the same both ways.

However, *without* your change, isoping latency to STA2 (low speed)
seems to be pretty stable in the ~100ms range (although it fluctuates
a bit). *With* your change, STA2 latency fluctuates wildly as low as
1.x ms (yay!) but as high as 800ms (boo). STA1 latency is fairly low
in both cases.
I have to admit, I haven't read any of this code in enough detail to
have a guess as to why this might be. But I did switch back and forth
between the two versions a few times to confirm that it seems to be
repeatable.

Just to compare, I went back to a version that contains only Tim's
patch (intermediate queues) but not fq_codel. That one seems to have
much less variability in the isoping times (~50-100ms under load).
The best case isn't as good, but the worst case is much less bad.
This suggests to me that maybe codel's per-station drop rate is
oscillating (perhaps it needs to ramp less quickly?). I wonder if the
competing codels between stations also confuse each other: as one
ramps down, maybe the other one would be encouraged to ramp up?
Re: [RFC/RFT] mac80211: implement fq_codel for software queuing
On Fri, Mar 4, 2016 at 1:32 AM, Michal Kazior wrote:
> On 4 March 2016 at 03:48, Tim Shepard wrote:
> [...]
>> (I am interested in knowing what other mac80211 drivers have been
>> modified to use the mac80211 intermediate software queues. I know
>> Michal mentioned he has patches for ath10k that are not yet released,
>> and I know Felix is finishing up the mt76 driver which uses them.)
>
> Patches for ath10k are under review since quite some time now (but are
> not merged yet). The latest re-spin is:
>
> http://lists.infradead.org/pipermail/ath10k/2016-March/006923.html

Hi all, on Friday I had a chance to experiment with some of these
patches, specifically Tim's ath9k patch (to use intermediate queues),
plus Michal's patch to use fq_codel with the intermediate queues. I
didn't attempt any fine tuning; I just slapped them together to see
what happens. (I tried applying Michal's ath10k patches too, but got
stuck since they seem to be applied against the upstream v4.4 kernel
and didn't merge cleanly with the latest mac80211 branch. Maybe I was
doing something wrong.)

Test setup:
   AP (ath9k) -> 2x2 strong signal -> STA1 (mwifiex)
              -> attenuator (-40 dB) -> 1x1 weak signal -> STA2 (mwifiex)

STA2 generally gets modulation levels around MCS0-2 and STA1 usually
gets something like MCS12-15.

With or without this patch, results with TCP iperf were fishy - I
think packet loss patterns were particularly bad and caused 2-second
TCP retry timeouts occasionally - so I removed TCP from the test and
switched to UDP iperf instead.

I ran isoping
(https://gfiber.googlesource.com/vendor/google/platform/+/master/cmds/isoping.c)
from the AP to both stations to measure two-way latency during all
tests. (I used -r2 for two packets/sec in each direction in order not
to affect the test results too much.)

Overall results:

- Running one iperf at a time, I saw ~45 Mbps to STA1 and ~7 Mbps to STA2.
- Running both iperfs at once, without the patches, latencies got
extremely high (~600ms sometimes) and results were closer to
byte-fairness than airtime-fairness (ie. ~7 Mbps each).

- Running both iperfs at once, with the patches, latencies were still
high (usually high 2-digit, sometimes low 3-digit latencies) but we
got closer to airtime-fairness than byte-fairness (~17 Mbps and ~2
Mbps).

- With only one iperf running, without the patches, latencies were
high to both stations. With the patches, latency was
mid-double-digits to the non-iperf station (pretty good!) while being
low-mid triple-digits to the busy iperf station. This suggests that
we are getting per-station queuing (yay!) but does make me question
whether the fq_ in fq_codel was working.

- More generally, the latencies were all still higher than expected.
I didn't investigate why this might be, but the obvious guess (which
Tim has agreed with) is that we need something BQL-like in addition to
the fq_codel layer. The BQL-like thing is what Emmanuel's earlier
latency patch did with iwlwifi, with apparently good results. If
someone wants to try making a similar patch for ath9k, I'd be happy to
help test it out.

Although things aren't yet nearly as good as I'd like to see them,
I'll note that Tim's and Michal's patches don't seem to make things
*worse*, at least in my setup, and do improve results in my test. So
if they pass code review, it may make sense to apply them as one small
step forward to reducing wifi latency under load.

Have fun,

Avery
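
To put the byte-fairness and airtime-fairness figures above in context, here is a small back-of-the-envelope calculation. It is an idealized sketch, not part of the measurements: it assumes each station's solo iperf rate (~45 and ~7 Mbps, from the results above) is what it achieves with 100% of the airtime, and it ignores contention and aggregation overhead. The function names are made up for illustration.

```python
def byte_fair(solo_rates_mbps):
    """Equal throughput for every station: slow stations eat most of the airtime."""
    per_station = 1.0 / sum(1.0 / rate for rate in solo_rates_mbps)
    return [per_station] * len(solo_rates_mbps)


def airtime_fair(solo_rates_mbps):
    """Equal airtime for every station: each gets 1/n of its solo rate."""
    n = len(solo_rates_mbps)
    return [rate / n for rate in solo_rates_mbps]


solo = [45.0, 7.0]  # STA1 and STA2 solo throughput in Mbps, from the test above

print(byte_fair(solo))     # about 6.06 Mbps each
print(airtime_fair(solo))  # [22.5, 3.5]
```

Under these assumptions the unpatched result (~7 Mbps each) sits close to the ideal byte-fair split, while the patched result (~17 and ~2 Mbps) is closer to the airtime-fair split of 22.5 and 3.5 Mbps, though still short of it.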
Re: [PATCH] mac80211: debugfs var for the default aggregation timeout.
On Tue, Feb 23, 2016 at 5:14 AM, Johannes Berg <johan...@sipsolutions.net> wrote:
> On Tue, 2016-02-16 at 16:28 -0500, Avery Pennarun wrote:
>> Since around the beginning of time, ath9k aggregates have timed out after
>> 5000 TU (around 5000ms) of inactivity, but nobody seems to be quite sure
>> why, and this magic number seems to have migrated around from one place to
>> another. An openbsd mailing list recently had a patch to disable the
>> timeout completely, which they say matches some commercial routers:
>> https://www.mail-archive.com/tech@openbsd.org/msg29456.html
>>
>> Even in Linux, several non-ath9k drivers default to no timeout already. I
>> think changing it directly to zero would be safe, but to allow a more
>> structured investigation, let's make it configurable for now.
>>
> Since we just made it zero, perhaps we don't need this?
>
> Although perhaps we still want it to be able to debug it?

We're putting my version of the patch into our devices in order to be
able to try different values and see how it changes the percentage of
devices with nonzero 'pending' field in agg_status. I'm hoping using
zero here will result in total elimination of the pending problem, but
we'll see.

It probably makes sense not to apply this upstream if the default
value is zero now anyway.

> Anyway - you shouldn't create a debugfs file and play with the extern
> stuff etc., let minstrel create the debugfs file in minstrel_ht_alloc()

Good point. I had a feeling I was doing that in the wrong place :)

If people think this is important, I can respin the patch, otherwise
feel free to discard.

Have fun,

Avery
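
For reference, the "5000 TU (around 5000ms)" figure quoted above can be made exact. An 802.11 time unit is 1024 microseconds, so the conversion is a one-liner; the helper name below is made up for illustration.

```python
TU_MICROSECONDS = 1024  # one 802.11 time unit (TU)


def tu_to_ms(tu):
    """Convert 802.11 time units to milliseconds."""
    return tu * TU_MICROSECONDS / 1000.0


print(tu_to_ms(5000))  # 5120.0
```

So the old default works out to 5.12 seconds of inactivity, which "around 5000ms" approximates to within a few percent.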
Re: [PATCH] mac80211: minstrel_ht: set default tx aggregation timeout to 0
Acked-by: Avery Pennarun <apenw...@gmail.com>

This fixes serious problems on our platform, especially with iPhone 4
generation devices.

On Thu, Feb 18, 2016 at 1:49 PM, Felix Fietkau <n...@openwrt.org> wrote:
> The value 5000 was put here with the addition of the timeout field to
> ieee80211_start_tx_ba_session. It was originally added in mac80211 to
> save resources for drivers like iwlwifi, which only supports a limited
> number of concurrent aggregation sessions.
>
> Since iwlwifi does not use minstrel_ht and other drivers don't need
> this, 0 is a better default - especially since there have been
> recent reports of aggregation setup related issues reproduced with
> ath9k. This should improve stability without causing any adverse
> effects.
>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Felix Fietkau <n...@openwrt.org>
> ---
>  net/mac80211/rc80211_minstrel_ht.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/mac80211/rc80211_minstrel_ht.c b/net/mac80211/rc80211_minstrel_ht.c
> index 3928dbd..a7d9227 100644
> --- a/net/mac80211/rc80211_minstrel_ht.c
> +++ b/net/mac80211/rc80211_minstrel_ht.c
> @@ -691,7 +691,7 @@ minstrel_aggr_check(struct ieee80211_sta *pubsta, struct sk_buff *skb)
>  	if (likely(sta->ampdu_mlme.tid_tx[tid]))
>  		return;
>
> -	ieee80211_start_tx_ba_session(pubsta, tid, 5000);
> +	ieee80211_start_tx_ba_session(pubsta, tid, 0);
>  }
>
>  static void
> --
> 2.2.2
>
Re: ath9k(?): AP stops sending traffic to iPhone 4S until another 802.11n-capable STA joins
On Wed, Feb 17, 2016 at 1:23 AM, Krishna Chaitanya wrote:
> From a quick glance of symptoms, i think the below patch is worth a
> try, even though
> i don't see you are doing any background scans for which this applies.
>
> https://patchwork.kernel.org/patch/8015321/

Thanks, Krishna. We are in fact doing background scans occasionally,
however, none was in progress around the time of the glitch, and the
problem was still reproducible with background scans disabled. We
also aren't combining AP and STA on the same radio (in this particular
use case).
Re: ath9k(?): AP stops sending traffic to iPhone 4S until another 802.11n-capable STA joins
On Tue, Feb 16, 2016 at 5:05 PM, Johannes Berg <johan...@sipsolutions.net> wrote:
> On Tue, 2016-02-16 at 16:28 -0500, Avery Pennarun wrote:
>> Changing default_agg_timeout to zero (as it is on most non-ath9k
>> drivers) makes the problem pretty much go away. However, I think
>> it's because I'm just dodging the code path that triggers a race
>> condition.
>
> That does seem likely. Perhaps you could reproduce it while running
> mac80211 tracing? There should be a fair amount of information about
> aggregation and queue stops in there, though as you note queue stops
> aren't really happening, only aggregation related things. Perhaps the
> tracepoints for that aren't quite sufficient.

So far that hasn't seemed to help, although maybe you can read traces
better than I can. The big problem is that the actual queue doesn't
seem to have stopped; it might be an ath9k bug.

>> Notes:
>>
>> - I'm using exactly the same ath9k driver (currently 20150525, but
>> we've tried newer ones with no difference) on two totally different
>> platforms: a dual-core mindspeed c2k host CPU (ARMv7) with separate
>> ath9k, and a single-core QCA9531 (MIPS) with on-chip ath9k.
>>
>> - I've been unable to trigger the problem on the QCA9531, but I have
>> on MIPS.
>
> That's ... not what I would have expected, especially since the MIPS is
> single core. That makes the races stranger than expected.

Oops, typo. The QCA9531 *is* MIPS. The one where it triggers is the
dual-core ARM.

>> The aggregation code is... a little hairy. Does anyone have any
>> guesses where I might look for the race condition? Or better still,
>> a patch I can try?
>
> I'm not aware of any race conditions in the code right now :)

Aw. That would have made it a lot easier!
[PATCH] mac80211: debugfs var for the default aggregation timeout.
Since around the beginning of time, ath9k aggregates have timed out
after 5000 TU (around 5000ms) of inactivity, but nobody seems to be
quite sure why, and this magic number seems to have migrated around from
one place to another.

An openbsd mailing list recently had a patch to disable the timeout
completely, which they say matches some commercial routers:
https://www.mail-archive.com/tech@openbsd.org/msg29456.html

Even in Linux, several non-ath9k drivers default to no timeout already.
I think changing it directly to zero would be safe, but to allow a more
structured investigation, let's make it configurable for now.

Signed-off-by: Avery Pennarun <apenw...@gmail.com>
---
 net/mac80211/debugfs_netdev.c      | 4 ++++
 net/mac80211/rc80211_minstrel_ht.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/debugfs_netdev.c b/net/mac80211/debugfs_netdev.c
index 37ea30e..5ae160b 100644
--- a/net/mac80211/debugfs_netdev.c
+++ b/net/mac80211/debugfs_netdev.c
@@ -715,6 +715,8 @@ static void add_mesh_config(struct ieee80211_sub_if_data *sdata)
 }
 #endif
 
+u32 default_agg_timeout = 5000;
+
 static void add_files(struct ieee80211_sub_if_data *sdata)
 {
 	if (!sdata->vif.debugfs_dir)
@@ -725,6 +727,8 @@ static void add_files(struct ieee80211_sub_if_data *sdata)
 	DEBUGFS_ADD(txpower);
 	DEBUGFS_ADD(user_power_level);
 	DEBUGFS_ADD(ap_power_level);
+	debugfs_create_u32("default_agg_timeout", 0600, sdata->vif.debugfs_dir,
+			   &default_agg_timeout);
 
 	if (sdata->vif.type != NL80211_IFTYPE_MONITOR)
 		add_common_files(sdata);
diff --git a/net/mac80211/rc80211_minstrel_ht.c b/net/mac80211/rc80211_minstrel_ht.c
index 3928dbd..028d9d4 100644
--- a/net/mac80211/rc80211_minstrel_ht.c
+++ b/net/mac80211/rc80211_minstrel_ht.c
@@ -671,6 +671,8 @@ minstrel_downgrade_rate(struct minstrel_ht_sta *mi, u16 *idx, bool primary)
 	}
 }
 
+extern u32 default_agg_timeout;
+
 static void
 minstrel_aggr_check(struct ieee80211_sta *pubsta, struct sk_buff *skb)
 {
@@ -691,7 +693,7 @@ minstrel_aggr_check(struct ieee80211_sta *pubsta, struct sk_buff *skb)
 	if (likely(sta->ampdu_mlme.tid_tx[tid]))
 		return;
 
-	ieee80211_start_tx_ba_session(pubsta, tid, 5000);
+	ieee80211_start_tx_ba_session(pubsta, tid, default_agg_timeout);
 }
 
 static void
-- 
2.7.0.rc3.207.g0ac5344
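A note for anyone picking values for default_agg_timeout: the commit
message gives the timeout in TU, and one 802.11 TU is 1024 microseconds,
so 5000 TU is a little more than the "around 5000ms" quoted above. A
minimal sketch of the conversion follows. The helper names are made up
for illustration and are not part of mac80211.

```c
#include <stdint.h>

/* One 802.11 time unit (TU) is 1024 microseconds. */
#define TU_USEC 1024u

/* Convert an inactivity timeout in TU to microseconds (illustrative helper). */
static uint64_t tu_to_usec(uint32_t tu)
{
	return (uint64_t)tu * TU_USEC;
}

/* Same conversion, rounded down to whole milliseconds. */
static uint64_t tu_to_msec(uint32_t tu)
{
	return tu_to_usec(tu) / 1000u;
}
```

With these, tu_to_msec(5000) is 5120 and tu_to_msec(500) is 512, so the
difference from a plain millisecond reading is about 2.4%.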
Re: ath9k(?): AP stops sending traffic to iPhone 4S until another 802.11n-capable STA joins
Okay, I've made much more progress on this old thread. I haven't
actually fixed the bug, which I suspect is a race condition only on
multicore machines, but I at least have better reproduction steps and a
workaround.

The bug seems to trigger when three things happen at once:

1) Background interference causes retries
2) AP wants to send data to the STA, which has been idle for a while
3) We want to negotiate a new BA session from AP to STA.

Sometimes, the background interference will cause the time between ADDBA
Request (from AP) and ADDBA Response (from STA) to be longer than usual.
In my tests, it's usually <1ms, but in high-interference situations I've
seen it be >3ms.

Sometimes, when the delay is longer, I see the symptom that the
agg_status file for the station in question starts showing TID#0's
"pending" column increasing slowly, until it eventually reaches 64. A
wifi capture on a separate sniffer indicates that no data is being
transmitted to that station, although traffic to other stations (and
broadcast/multicast) continues unabated. I guess this means the device's
queues are themselves not stopped, but the station's per-TID aggregation
queue is stuck.

Twiddling the agg_status of a different queue (in this case TID#1)
unblocks TID#0:

    echo "tx start 1" >/sys/kernel/debug/ieee80211/phy0/.../agg_status

So does having another aggregation-capable device join the network.
Having an 802.11g-only device join the network does *not* unblock the
queue.

However, trying to stop TID#0 doesn't help (and it also doesn't
successfully stop the aggregation):

    echo "tx stop 0" >/sys/kernel/debug/ieee80211/phy0/.../agg_status

The following patch makes the problem easier to reproduce by letting you
turn the aggregation timeout way down. For myself, I used a
default_agg_timeout of 500ms and just pinged repeatedly once per second
from the AP to STA. This causes the aggregation sessions to be
repeatedly brought up and torn down, which triggers the problem for me
within a few minutes (when run on a channel with fairly high noise).

Changing default_agg_timeout to zero (as it is on most non-ath9k
drivers) makes the problem pretty much go away. However, I think it's
because I'm just dodging the code path that triggers a race condition.

Notes:

- I'm using exactly the same ath9k driver (currently 20150525, but
  we've tried newer ones with no difference) on two totally different
  platforms: a dual-core mindspeed c2k host CPU (ARMv7) with separate
  ath9k, and a single-core QCA9531 (MIPS) with on-chip ath9k.

- I've been unable to trigger the problem on the QCA9531, but I have
  on MIPS.

The aggregation code is... a little hairy. Does anyone have any guesses
where I might look for the race condition? Or better still, a patch I
can try?

Avery Pennarun (1):
  mac80211: add a debugfs var for the default aggregation timeout.

 net/mac80211/debugfs_netdev.c      | 4 ++++
 net/mac80211/rc80211_minstrel_ht.c | 4 +++-
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.7.0.rc3.207.g0ac5344
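The stall signature described above (TID#0's "pending" count creeping up
until it reaches 64 and never draining) can be turned into a simple
check for a reproduction script. The sketch below is a hypothetical
heuristic, not code from mac80211 or ath9k. It assumes the caller has
already extracted successive "pending" values for one TID from
agg_status, since the file format is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value at which "pending" stopped growing in the report above. */
#define AGG_PENDING_LIMIT 64u

/*
 * Given successive samples of one TID's "pending" count (oldest first),
 * report whether they match the stall signature: never decreasing, and
 * ending at or above the limit. Hypothetical helper for illustration.
 */
static bool tid_looks_stuck(const unsigned int *pending, size_t n)
{
	size_t i;

	if (n < 2)
		return false;	/* not enough history to judge */
	for (i = 1; i < n; i++) {
		if (pending[i] < pending[i - 1])
			return false;	/* the queue drained at some point */
	}
	return pending[n - 1] >= AGG_PENDING_LIMIT;
}
```

Sampling once per ping would probably be enough, since the report
describes the count as rising slowly rather than jumping to 64.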
ath9k(?): AP stops sending traffic to iPhone 4S until another 802.11n-capable STA joins
[fixed ath9k list address. sorry for the spam]

Hi all,

I have a pretty weird problem I've been chasing for a few weeks and have
narrowed it down, but not quite solved it. It may be caused by bugs in
aggregation-related code.

Steps:

- Set up an ath9k-based Linux AP on an ARM processor (currently using
  this version of backports, though I've tried older and newer versions
  with no change: "backported from Linux (next-20150525-0-gc201847)
  using backports backports-20150525-0-g49969bd")

- Join my iPhone 4S (running iOS 7.1.2) to the network

- Use it for a while

- Eventually it will stay connected, but Internet access doesn't work

- Wireless packet captures show that packets are received *from* the
  iPhone, and ACKs are returned for those packets from the ath9k, and
  those packets are correctly forwarded to the AP's br0 interface.
  Outgoing packets, however, show up on br0 and wlan0 with tcpdump but
  never make it onto the air.

- Putting the iPhone 4S into airplane mode and then letting it
  reconnect will fix it for a few more seconds/minutes before it stops
  again.

More details:

- It only seems to happen to my iPhone 4S client (never seen it with a
  different client).

- It only seems to happen with my ath9k AP.

- It only seems to happen on my home network (another instance of the
  same AP hardware on another network doesn't trigger the problem).

- It only seems to happen when no other 802.11n-capable devices are
  connected to the same AP.

- The moment I join an 802.11n-capable device to the AP, traffic
  instantly unblocks (see packet capture below).

- Joining an 802.11g-only device (no aggregation) does *not* unblock
  traffic.

- Disabling encryption and turning wmm_enable on and off have no effect.

- Disabling 802.11n support on the AP (so that everyone has to use
  802.11g) makes the problem go away.

- 'ip -s link show dev wlan0' shows tx packet counters continuing to
  increase during the outage, even though packets aren't flowing.

- I applied a patch from Tim Shepard to track the most recent tx
  attempt, acked tx, and rx packet times inside mac80211. According to
  this data, mac80211 thinks rx happened at most a couple of seconds ago
  (as expected). The most recent tx was acked, but it was back around
  the time the outage started. Note that this disagrees with 'ip -s
  link' and tcpdump, which think they transmitted much more recently
  than that. (The patch is here:
  https://gfiber-review.googlesource.com/#/c/1250/ )

I captured a pcap of a new 802.11n-capable device joining the network
and unblocking the transmit. The action starts around frame 325:
http://apenwarr.ca/tmp/iPod4-fixing-iPhone4-trimmed.pcap.gz

In this pcap, the main players are:
  ath9k AP: 88:dc:96:08:60:50
  iPhone 4S with the problem: e4:25:e7:73:e6:31
  New client fixing the problem (iPod 4): 18:e7:f4:7e:c1:42

Observations from the pcap:

- Upstream packets (iPhone->ath9k) are received and acked (see e.g.
  frame 154)

- Beacons from the ath9k show an empty TIM bitmap until the iPod joins,
  then it's nonempty and things unblock.

Does anyone have any thoughts about what to look for here?

Have fun,

Avery
Re: [PATCH] ath10k: Improve performance by reducing tx_lock contention.
On Wed, Jul 22, 2015 at 2:16 AM, Kalle Valo <kv...@codeaurora.org> wrote:
> Marty Faltesek <mfalte...@google.com> writes:
>> From: Qi Zhou <qiz...@google.com>
>>
>> During tx completion, tx_lock is held for longer than required,
>> preventing efficient refill of htt->pending_tx. Refactor the code so
>> that only MSDU related operations are protected by the lock.
>>
>> Improves downstream performance on a 3x3 client from 495 to 580 Mbps.
>
> It would be good to mention the actual platform/CPU as I guess this
> improvement is platform specific?

That's a good idea to mention in the commit message, although given that
it's just an in-driver lock contention patch, it probably affects any
underpowered multicore CPU.
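For readers who have not seen the patch itself, the general shape of
this kind of change is to do only the shared bookkeeping under the lock
and run the slower per-buffer work after releasing it. The sketch below
is a generic userspace illustration using a pthread mutex. The struct
and function names are hypothetical and are not taken from ath10k, and
the real patch may be structured quite differently.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_PENDING 64

/* Hypothetical stand-in for per-device tx state; not the ath10k structs. */
struct tx_state {
	pthread_mutex_t lock;		/* plays the role of tx_lock */
	void *pending[MAX_PENDING];	/* in-flight buffers, indexed by id */
	int num_pending;
};

/* Detach one in-flight buffer. Only this bookkeeping needs the lock. */
static void *tx_detach(struct tx_state *st, int id)
{
	void *buf = NULL;

	pthread_mutex_lock(&st->lock);
	if (id >= 0 && id < MAX_PENDING && st->pending[id]) {
		buf = st->pending[id];
		st->pending[id] = NULL;
		st->num_pending--;
	}
	pthread_mutex_unlock(&st->lock);
	return buf;
}

/*
 * Complete a transmit: the slot is freed under the lock, then the
 * potentially slow per-buffer work runs with the lock released, so the
 * submit path can refill slots in the meantime.
 */
static int tx_complete(struct tx_state *st, int id, void (*finish)(void *buf))
{
	void *buf = tx_detach(st, id);

	if (!buf)
		return -1;	/* nothing pending under this id */
	finish(buf);		/* e.g. unmap, report status, free */
	return 0;
}

/* Example callback: treats the buffer as an int counter and bumps it. */
static void count_finish(void *buf)
{
	(*(int *)buf)++;
}
```

How much this helps depends on how expensive the work moved out of the
critical section is relative to the bookkeeping, which fits the guess
above that the gain shows up mainly on slower multicore CPUs.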
Capturing hardware-decrypted packets in monitor mode on ath9k/ath10k
Hi,

On a station or AP device, I'd like to capture packets in monitor mode
(ie. with radiotap headers). Normally this captures the encrypted
packets as they appear on the air. In my case, I'd like to capture the
*decrypted* packets where possible (ie. packets communicating with this
node, where the local machine already knows the session key and is
presumably decrypting the packets anyway so that it can carry on the
session).

I know wireshark (etc) can decrypt packets for a given session if you
capture the EAPOL frames. The advantages of having the driver do it in
hardware are a) hopefully less performance impact, and b) you can easily
start capturing at any time, even post EAPOL, because the driver already
has a cached copy of the keys.

Is there a flag somewhere I can set to make this happen? Is this even a
feature supported by most hardware?

Thanks,

Avery
Re: Open Source RRM Hand-Over Optimization (WAS: Throughput regression with `tcp: refine TSO autosizing`)
On Mon, Feb 2, 2015 at 11:44 AM, Björn Smedman b...@anyfi.net wrote:
> On Mon, Feb 2, 2015 at 5:21 AM, Avery Pennarun apenw...@google.com wrote:
>> While there is definitely some work to be done in handoff, it seems
>> like there are some fine implementations of this already in
>> existence. Several brands of enterprise access point setups seem to
>> do well at this. It would be nice if they interoperated, I guess.
>>
>> The fact that there's no open source version of this kind of handoff
>> feature bugs me, but we are working on it here and the work is all
>> planned to be open source, for example: (very early version)
>> https://gfiber.googlesource.com/vendor/google/platform/+/master/waveguide/
>
> We've got an SDN-inspired architecture with 802.11 frame tunneling (a
> la CAPWAP), airtime fairness, infrastructure initiated hand-over,
> Opportunistic Key Caching (OKC), IEEE 802.11r Fast BSS Transition and
> a few more goodies.
>
> It's currently free as in beer (http://anyfi.net/software,
> https://github.com/carrierwrt/carrierwrt/pull/7 and
> http://www.anyfinetworks.com/download) up to 100 APs, but we're
> definitely going to open source in one form or another.
>
> We've also tried to raise some interest in fixing up CAPWAP
> (https://www.ietf.org/mail-archive/web/opsawg/current/msg03196.html),
> which is (unfortunately) the best open standard at the moment.
> Interest seems marginal though...

This sounds cool. Is the CAPWAP/encapsulation stuff separable from the
rest? At 802.11ac speeds, a super fast WAN link, and a low-cost SoC, too
many layers can be a killer.
Re: [Cerowrt-devel] Fwd: Throughput regression with `tcp: refine TSO autosizing`
On Sun, Feb 1, 2015 at 6:34 PM, Andrew McGregor andrewm...@gmail.com wrote:
> I missed one item in my list of potential improvements: the most
> braindead thing 802.11 has to say about rates is that broadcast and
> multicast packets should be sent at 'the lowest basic rate in the
> current supported rate set', which is really wasteful. There are a
> couple of ways of dealing with this: one, ignore the standard and pick
> the rate that is most likely to get the frame to as many neighbours as
> possible (by a scan of the Minstrel tables). Or two, fan it out as
> unicast, which might well take less airtime (due to aggregation) as
> well as being much more likely to be delivered, since you get ACKs and
> retries by doing that.

As far as I can see, the only sensible thing to do with
multicast/broadcast is some variation of the unicast fanout, unless
you've got a truly huge number of nodes. I don't know of any protocols
(certainly not video streams) that actually work well with the kind of
packet loss you see at medium/long range with wifi if retransmits aren't
used.

I've heard that openwrt already has a patch included that does this kind
of fanout at the bridge layer.

I've also heard of a new reliable multicast in some newer 802.11
variant, which essentially sends out a single multicast packet and
expects an ACK from each intended recipient. Other than adding
complexity, it seems like the best of both worlds.
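To put rough numbers on the airtime argument, here is a deliberately
simplified sketch that compares one multicast frame at a low basic rate
with N unicast copies at a higher rate. It counts payload bits only and
ignores preambles, ACKs, contention, retries and aggregation, so it is a
back-of-the-envelope model and not a measurement. The function names
and the example rates (1 Mbit/s and 65 Mbit/s) are illustrative
assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Payload-only airtime, in microseconds, for `copies` frames of `bytes`
 * bytes sent at `rate_kbps`. Ignores all per-frame overhead.
 */
static uint64_t airtime_usec(uint32_t bytes, uint32_t rate_kbps, uint32_t copies)
{
	return (uint64_t)bytes * 8u * 1000u * copies / rate_kbps;
}

/*
 * True if sending `n` unicast copies at `unicast_kbps` takes less
 * payload airtime than one multicast frame at `basic_kbps`.
 */
static bool fanout_wins(uint32_t bytes, uint32_t basic_kbps,
			uint32_t unicast_kbps, uint32_t n)
{
	return airtime_usec(bytes, unicast_kbps, n) <
	       airtime_usec(bytes, basic_kbps, 1);
}
```

Under this model a 1500-byte frame takes 12000 us at 1 Mbit/s, while ten
unicast copies at 65 Mbit/s take about 1846 us in total. The break-even
point is roughly the ratio of the two rates, which is one way to read
"unless you've got a truly huge number of nodes". Real per-frame
overhead would lower that break-even count.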
Re: [Cerowrt-devel] Fwd: Throughput regression with `tcp: refine TSO autosizing`
On Sun, Feb 1, 2015 at 9:43 AM, dpr...@reed.com wrote:
> Just to clarify, managing queueing in a single access point WiFi
> network is only a small part of the problem of fixing the rapidly
> degrading performance of WiFi based systems.

Can you explain what you mean by rapidly degrading? The performance in
odd situations is certainly not inspirational, but I haven't noticed it
getting worse over time.

> Similarly, mesh routing is only a small part of the problem with the
> scalability of cooperative meshes based on the WiFi MAC.

That's certainly true. Not to say the mesh routing algorithms are much
good either.

> Also, as we noted earlier, handoff from one next hop to another is a
> huge problem with performance in practical deployments (a factor of
> 10x at least, just in that).

While there is definitely some work to be done in handoff, it seems like
there are some fine implementations of this already in existence.
Several brands of enterprise access point setups seem to do well at
this. It would be nice if they interoperated, I guess.

The fact that there's no open source version of this kind of handoff
feature bugs me, but we are working on it here and the work is all
planned to be open source, for example: (very early version)
https://gfiber.googlesource.com/vendor/google/platform/+/master/waveguide/

> Propagation information is not used at all when 802.11 systems share a
> channel, even in single AP deployments, yet all stations can measure
> propagation quite accurately in their hardware.

802.11k seems to provide for sharing this information. But I'm not
clear what I should use it for. :)

> Finally, Listen-before-talk is highly wasteful for two reasons: 1) any
> random radio noise from other sources unnecessarily degrades
> communications [...] 2) the transmitter cannot tell when the intended
> receiver will be perfectly able to decode the signal without
> interference with the station it hears (this second point is actually
> proven in theory in a paper by Jon Peha that argued against trivial
> etiquettes as a mechanism for sharing among uncooperative and
> non-interoperable stations).

I've thought quite a bit about your point #2 above, but I don't know
which direction to pursue. The idea is that sometimes just shouting over
the background noise is a globally optimal solution, right? The question
seems to be to figure out when that is true and when it isn't.

> I agree that, to the extent that managing queues in a single box or a
> single operating system doesn't require cooperation, it's much easier
> to get such things into the market. That's why CeroWRT has been as
> effective as it has been. But has Microsoft done anything at all about
> it? Do the better ECN signals that can arise from good queue
> management get used by the TCP endpoints, or for that matter UDP-based
> protocol endpoints?

If we don't know the answer to the questions, then that is itself the
problem. It's a lot easier to say, hey, ChromeOS and MacOS have good
network performance but Microsoft has bad network performance, if it's
true and we have good reproducible tests to demonstrate that.

> The reason no one is making progress on any of these particular issues
> is that there is no coordination at the systems level around creating
> rising tides that lift all boats in the WiFi-ish space. It's all
> about ripping the competition by creating stuff that can sell better
> than the other guys' stuff, and avoiding cooperation at all costs.
> [...]
> But the big wins in making WiFi better are going begging. As WiFi
> becomes more closed, as it will as the major Internet Access Providers
> and Gadget builders (Google, Apple) start excluding innovators in
> wireless from the market by closed, proprietary solutions, the problem
> WILL get worse. You won't be able to fix those problems at all. If
> you have a solution you will have to convince the oligopoly to even
> bother trying it.

As someone who works at Google Fiber (which is both a gadget maker and
an ISP) and who pushes all day long for our wifi stuff to be open
source, I'm slightly offended to be lumped in with other vendors in your
story :) I think the ChromeOS team (which insists on only open source
wifi drivers in all chromebooks) would feel similarly. We are lucky to
have defined our competitive advantage as something other than
short-lived slight improvements in wifi that will soon be wastefully
duplicated by everyone else.

That said, I see what you mean about the general state of the industry.
The way to fix it is the way Linux always fixes it: make the open source
version so much better that building a proprietary one, just to gather a
small incremental advantage, is a huge waste of time and effort. Work on
minstrel and fq_codel goes really far here.

> I personally think that things like promoting semi-closed, essentially
> proprietary ESSID-based bridged distribution systems as good ideas are
> counterproductive to this goal. But that's perhaps too radical for
> this crowd. Not