Re: brcmfmac signal/interference issues
On Thu, Mar 8, 2018 at 4:47 AM, Arend van Spriel wrote:
>> 43455c0-roml/sdio-ag-ndis-vista-pktfilter-d0c-pno-aoe-p2p-dhdoid-ndoe-gtkoe-mfp-proptxstatus-dmatxrc-keepalive-ap-ampduretry-pclose-txbf
>>
>> Yes, ndis. So no easy way to run the same firmware on the 2 OSes.
>
> Indeed. I could try building nearly same firmware target. Can you provide
> the firmware version as well.

Full string is:

43455c0-roml/sdio-ag-ndis-vista-pktfilter-d0c-pno-aoe-p2p-dhdoid-ndoe-gtkoe-mfp-proptxstatus-dmatxrc-keepalive-ap-ampduretry-pclose-txbf Version: 7.45.87.0 CRC: 7cb2470e Date: Thu 2016-04-21 22:31:44 PDT Ucode Ver: 1043.2070 FWID: 01-f68ec182

If you could build that config but for Linux instead of ndis I would love to try it.

Also, here is the string for the current one in linux-firmware:

43455c0-roml/sdio-ag-p2p-pno-aoe-pktfilter-keepalive-mchan-pktctx-proptxstatus-ampduhostreorder-lpc-pwropt-txbf-wnm-okc-ccx-ltecx-wfds-wl11u-mfp-tdls-ve Version: 7.45.18.0 CRC: d7226371 Date: Sun 2015-03-01 07:31:57 PST Ucode Ver: 1026.2 FWID: 01-6a2c8ad4

I note that the Version and UcodeVer in the linux-firmware version are lower than the windows one. If it's possible to also rebuild the linux-firmware config but with those newer versions (or even the latest, if there is something newer), I will test that too.

> So it picks up something in the PC. Some sources of interference that I have
> seen before are USB3 and HDMI. Maybe try to shield those if present and see
> if that helps. The nvram contains sensitivity parameters, but as you stated
> you are using the same nvram for windows and linux for now we can rule it
> out for debugging the issue.

Yeah, there are some options here which we can try to explore. What's still unknown though is why windows appears immune to this exact same interference. A software fix would be much more convenient... :)

Daniel
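As a rough aid for comparing the two build targets quoted above, here is a minimal sketch that splits each target string on "-" and diffs the resulting token sets. It assumes each dash-separated token after the bus name is an independent feature flag, which the thread does not confirm; the function name parse_build_target is made up for this illustration.

```python
def parse_build_target(target):
    """Split 'chip/bus-feat1-feat2-...' into (chip, bus, set of features).

    Assumption: every '-'-separated token after the bus name is one
    feature flag. The thread does not document the token semantics.
    """
    chip, _, rest = target.partition("/")
    bus, *features = rest.split("-")
    return chip, bus, set(features)


# Build targets copied from the mail above.
WINDOWS = ("43455c0-roml/sdio-ag-ndis-vista-pktfilter-d0c-pno-aoe-p2p-dhdoid-"
           "ndoe-gtkoe-mfp-proptxstatus-dmatxrc-keepalive-ap-ampduretry-"
           "pclose-txbf")
LINUX = ("43455c0-roml/sdio-ag-p2p-pno-aoe-pktfilter-keepalive-mchan-pktctx-"
         "proptxstatus-ampduhostreorder-lpc-pwropt-txbf-wnm-okc-ccx-ltecx-"
         "wfds-wl11u-mfp-tdls-ve")

_, _, win = parse_build_target(WINDOWS)
_, _, lin = parse_build_target(LINUX)

only_windows = sorted(win - lin)   # tokens present only in the Windows build
only_linux = sorted(lin - win)     # tokens present only in linux-firmware
common = sorted(win & lin)         # tokens shared by both builds

print("only in Windows build:", only_windows)
print("only in linux-firmware:", only_linux)
print("common:", common)
```

Both builds carry the txbf token, so under this reading the transmit-beamforming feature is not something that distinguishes the two firmware images.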
Re: brcmfmac signal/interference issues
On Thu, Mar 8, 2018 at 9:54 AM, Steve deRosier wrote:
> Did you check the Bluetooth? I don't know if this chip has it or if
> it's an independent chip on this board, but if Linux is leaving it
> powered up but not properly configured you could have issues.

I had already disabled it via hciconfig, without any effect on the problem. Based on your suggestion I also checked BT_REG_ON, which was not being affected by hciconfig. On AP6255 I believe it is active high, so I brought it low to disable bluetooth, confirmed with a scope, and the problem is still there.

Thanks for the suggestion anyway!

Daniel
Re: brcmfmac signal/interference issues
On Fri, Feb 23, 2018 at 12:54 PM, Arend van Spriel wrote:
> Yup. Windows firmware talks NDIS. If you run 'strings 4345r6rtecdc.bin |
> tail -1' you can see the firmware build target and it likely has 'ndis' in
> it.

43455c0-roml/sdio-ag-ndis-vista-pktfilter-d0c-pno-aoe-p2p-dhdoid-ndoe-gtkoe-mfp-proptxstatus-dmatxrc-keepalive-ap-ampduretry-pclose-txbf

Yes, ndis. So no easy way to run the same firmware on the 2 OSes.

> Now are you using BT as well on this device? Another suggestion I got is to
> disable transmit beamforming which brcmfmac enables by default. Not sure if
> this device supports it, but could you try the patch below.

Thanks for the ideas. I had already tried with the bluetooth disabled - no change there. Also reproduced the problem after applying your patch.

Daniel
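For anyone without the strings utility to hand, the 'strings 4345r6rtecdc.bin | tail -1' step can be approximated as below. This is only a sketch: it treats bytes 0x20 to 0x7e as printable and uses a minimum run length of 4, which is close to, but not guaranteed identical to, what strings does. The blob in the example is synthetic, not real firmware.

```python
import re


def last_printable_string(blob, min_len=4):
    """Return the last run of at least min_len printable ASCII bytes.

    Rough stand-in for `strings FILE | tail -1`; returns "" if no run
    of that length exists.
    """
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    runs = re.findall(pattern, blob)
    return runs[-1].decode("ascii") if runs else ""


# Synthetic blob for illustration only: binary noise, a short string,
# then a trailing build-target-like string.
blob = (b"\x00\x01abcd\x00\xff\x10"
        b"43455c0-roml/sdio-ag-ndis-txbf Version: 7.45.87.0"
        b"\x00\x02")

target = last_printable_string(blob)
print(target)
print("ndis build" if "-ndis-" in target else "non-ndis build")
```

To run it against a real file, read the firmware with open(path, "rb").read() and pass the bytes in.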
Re: brcmfmac signal/interference issues
Hi,

On Wed, Feb 21, 2018 at 12:39 PM, Daniel Drake <dr...@endlessm.com> wrote:
> Thanks for looking into this. Here is the brcmfmac43455-sdio.txt file
> we are using:
> https://gist.github.com/dsd/d7ee3caa6dfd77f0bcd16cf272b20298
> This is identical to the 4345r6nvram.txt file from windows.

I checked Windows again and it seems to be using a firmware file 4345r6rtecdc.bin alongside this nvram data. This firmware is different from the one in linux-firmware. I've uploaded it here:
https://drive.google.com/open?id=1MUsiaoozslJb8SCYOR-FNbJFuD-h4PY_

I was hoping to try this on Linux to see if it makes any difference to the issue seen here. However, with this firmware in place, I can't connect to the network at all. It associates, but wpa_supplicant never sees the first WPA2 key message sent from the AP, even though wireshark on a separate monitor shows that the key message was sent and that the STA acked it.

I turned off WPA2 to make it an open network instead, and now I am unable to complete the DHCP conversation. According to the monitor station, the STA successfully transmits DHCPDISCOVER and the AP responds with DHCPOFFER. The offer is acked, but dhclient never sees it, and eventually times out.

Any ideas why this firmware may not be working at all on linux?

Thanks,
Daniel
Re: brcmfmac signal/interference issues
Hi Arend, On Wed, Feb 21, 2018 at 12:07 PM, Arend van Spriel <arend.vanspr...@broadcom.com> wrote: > > On 2/21/2018 9:14 AM, Daniel Drake wrote: >> >> Hi, >> >> We're working with the Weibu F3C MiniPC which includes BCM43455 SDIO >> wifi chip 0x004345(17221) rev 0x06 (AP6255 module). >> >> We are seeing a strange issue where usually within an hour of usage, >> the wifi connection becomes so unstable and lossy that it is unusable. >> While investigating this my standard test is to send ICMP pings to the >> IP address of the local access point. Normally the latency is 5-10ms, >> but when this problem is seen it will go to 500ms and then increase up >> towards 20s before completely timing out. >> >> Sometimes it is possible to induce the problem on-demand by stressing >> some combination of CPU, disk and/or USB. At this point, ping reply >> latency increases from ~5ms to 500ms+ before increasing even further. >> Killing the stress test, the pings immediately return to normal. This >> is not concrete though - I also seem to have a lot of luck hitting the >> problem in the morning when booting up the computer from stone cold >> state, while it is idle. >> >> When the problem is being reproduced (ping times are high or get no >> response), touching the exposed metal on the antenna connector with my >> finger makes ping times return to normal. Touching it with a piece of >> plastic does not have the same effect - so it is some effect of body >> capacitance or similar. Also, disconnecting the antenna makes ping >> times return to normal, although outside of the simple pings, >> bandwidth is much reduced. >> >> Additionally, when the problem is being reproduced, if I move the >> antenna outside of the case, ping times return to normal. When I move >> the antenna back into the miniPC case vicinity, it goes slow and lossy >> again. >> >> I have used a separate monitoring station with wireshark to look at >> the 802.11 traffic while this is happening. 
When the problem is >> reproduced, the miniPC is mostly unable to TX anything, and the AP >> sends frames and retries them but with no ACK visible from the miniPC. >> Immediately when I touch the antenna connector with my finger, tx >> frames from the miniPC appear and the conversation comes back to life. >> >> Running Linux 4.15 but we believe all versions are affected. >> >> This very much sounds like a hardware issue, but here is where things >> get interesting: Windows 10 on the same unit has no such problem. >> >> I set up 2 units side by side - one running Windows 10 and the other >> running Linux, connected to the same AP. The top part of the MiniPC >> case has been removed so I can see the motherboard. I free up the >> antennas from the MiniPC casing and they are on a relatively long >> cable, so they can be freely moved around in this test, allowing me to >> dangle the antenna into the vicinity of the neighbouring unit miniPC >> case. >> >> If I place both antenna terminals inside the Linux MiniPC case, the >> Linux pings are bad but the Windows pings are fine. >> >> If I place both antenna terminals inside the Windows MiniPC case, it >> is the same: Linux pings are bad, but the Windows pings are fine. >> >> And when the Linux antenna is placed outside of both cases, the Linux >> pings are fine. I've repeated these tests a handful of times in quick >> succession to make sure that I'm not going crazy and that this is not >> a case of the problem intermittency causing misleading results. These >> findings appear very solid. >> >> This suggests that regardless of the running OS, the MiniPC produces >> some kind of interference that intermittently has an extremely >> detrimental effect on wifi signal when you are running Linux. However, >> Windows is somehow immune to this. >> >> Any ideas for how to continue debugging this? How can we make the >> Linux driver immune to this interference like the windows one is? > > > Hi Daniel, > > Thanks. 
I forwarded your detailed report. My first hunch would be the nvram
> file used. Are you using the same nvram file on Linux as the one on Windows?
> If not can you compare them or better just send them.

Thanks for looking into this. Here is the brcmfmac43455-sdio.txt file we are using:
https://gist.github.com/dsd/d7ee3caa6dfd77f0bcd16cf272b20298

This is identical to the 4345r6nvram.txt file from windows.

Daniel
brcmfmac signal/interference issues
Hi, We're working with the Weibu F3C MiniPC which includes BCM43455 SDIO wifi chip 0x004345(17221) rev 0x06 (AP6255 module). We are seeing a strange issue where usually within an hour of usage, the wifi connection becomes so unstable and lossy that it is unusable. While investigating this my standard test is to send ICMP pings to the IP address of the local access point. Normally the latency is 5-10ms, but when this problem is seen it will go to 500ms and then increase up towards 20s before completely timing out. Sometimes it is possible to induce the problem on-demand by stressing some combination of CPU, disk and/or USB. At this point, ping reply latency increases from ~5ms to 500ms+ before increasing even further. Killing the stress test, the pings immediately return to normal. This is not concrete though - I also seem to have a lot of luck hitting the problem in the morning when booting up the computer from stone cold state, while it is idle. When the problem is being reproduced (ping times are high or get no response), touching the exposed metal on the antenna connector with my finger makes ping times return to normal. Touching it with a piece of plastic does not have the same effect - so it is some effect of body capacitance or similar. Also, disconnecting the antenna makes ping times return to normal, although outside of the simple pings, bandwidth is much reduced. Additionally, when the problem is being reproduced, if I move the antenna outside of the case, ping times return to normal. When I move the antenna back into the miniPC case vicinity, it goes slow and lossy again. I have used a separate monitoring station with wireshark to look at the 802.11 traffic while this is happening. When the problem is reproduced, the miniPC is mostly unable to TX anything, and the AP sends frames and retries them but with no ACK visible from the miniPC. 
Immediately when I touch the antenna connector with my finger, tx frames from the miniPC appear and the conversation comes back to life. Running Linux 4.15 but we believe all versions are affected. This very much sounds like a hardware issue, but here is where things get interesting: Windows 10 on the same unit has no such problem. I set up 2 units side by side - one running Windows 10 and the other running Linux, connected to the same AP. The top part of the MiniPC case has been removed so I can see the motherboard. I free up the antennas from the MiniPC casing and they are on a relatively long cable, so they can be freely moved around in this test, allowing me to dangle the antenna into the vicinity of the neighbouring unit miniPC case. If I place both antenna terminals inside the Linux MiniPC case, the Linux pings are bad but the Windows pings are fine. If I place both antenna terminals inside the Windows MiniPC case, it is the same: Linux pings are bad, but the Windows pings are fine. And when the Linux antenna is placed outside of both cases, the Linux pings are fine. I've repeated these tests a handful of times in quick succession to make sure that I'm not going crazy and that this is not a case of the problem intermittency causing misleading results. These findings appear very solid. This suggests that regardless of the running OS, the MiniPC produces some kind of interference that intermittently has an extremely detrimental effect on wifi signal when you are running Linux. However, Windows is somehow immune to this. Any ideas for how to continue debugging this? How can we make the Linux driver immune to this interference like the windows one is? Thanks Daniel
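The ping test described in this report can be turned into a simple pass/fail check. The sketch below assumes Linux iputils-style ping reply lines ("icmp_seq=N ... time=X ms") and uses the 500 ms figure from the report as the threshold for "bad". The sample output is made up for illustration, and the function names are not from any existing tool.

```python
import re

# Matches e.g. "64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=512 ms"
REPLY_RE = re.compile(r"icmp_seq=(\d+).*time=([\d.]+) ms")


def latencies(ping_output):
    """Map icmp_seq -> round-trip time in ms for every reply line."""
    result = {}
    for line in ping_output.splitlines():
        m = REPLY_RE.search(line)
        if m:
            result[int(m.group(1))] = float(m.group(2))
    return result


def degraded(lat, sent, threshold_ms=500.0):
    """Sequence numbers that got no reply or took threshold_ms or longer."""
    return [seq for seq in range(1, sent + 1)
            if lat.get(seq, float("inf")) >= threshold_ms]


# Illustrative output only: seq 4 is missing (lost), seq 3 and 5 are slow.
sample = """\
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=5.31 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=7.90 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=512 ms
64 bytes from 192.168.1.1: icmp_seq=5 ttl=64 time=1840 ms
"""

lat = latencies(sample)
bad = degraded(lat, sent=5)
print("degraded or lost sequence numbers:", bad)
```

Logging the degraded sequence numbers alongside timestamps of the stress test, or of moving the antenna, would make the correlation described above easier to show to others.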
Re: Make brcmfmac repeat authentication requests
On Thu, Feb 15, 2018 at 3:46 PM, Arend van Spriel wrote:
> Ok. Could you create a log with driver debugging enabled, ie. build driver
> CONFIG_BRCMDBG=y and load with module param 'debug=0x1416'. The problem is
> probably when the firmware is configured.

Logs from driver load at boot:
https://gist.github.com/dsd/7f9a7e8b0f8e20794aaed6298b2cb96a

Logs from interface up:
https://gist.github.com/dsd/13909ed821f7429e6be6a97ed91a61af

Logs from connection attempt:
https://gist.github.com/dsd/ae4a664c45e3d379d765231d96ae20d7

By the way, I noticed that the new parameter is called assoc_retry_max. However here the problem is at the authentication stage. We do not reach association. Does assoc_retry_max also affect the authentication codepath, or is there an equivalent parameter for retrying auth?

Thanks
Daniel
Re: Make brcmfmac repeat authentication requests
Hi,

Thanks for the fast response.

On Tue, Feb 13, 2018 at 12:50 PM, Arend van Spriel wrote:
> I tried to find info about that access point equipment, but not getting any
> hits apart from a olivetti laser printer, but I doubt it is that. Can you
> provide more details.

The device itself is basically unbranded (just says "4G LTE"). It's an access point and mifi bridge (so insert a sim card and it shares your mobile data connection on the LAN). It comes as part of a solar home solutions package. MF928 is listed as the product name behind the battery. In the web UI it says it is from the EV910 product family, hardware version LR521_V1.0. I can't find info online about it.

> User-space (wpa_supplicant) would retry the connect attempt so I guess you
> are saying that the timing between the two auth requests is important?

Yes, the error goes up to userspace which then retries. However around 15 seconds pass before the authentication request is sent again, and also as part of the retry it redoes the probe requests etc. Windows does the same but there is only a 3 second delay.

I haven't checked if this device needs the authentication request resent in less than 3 seconds, or if the problem is that it needs to be sent twice in consecutive frames (i.e. without another probe request in the middle).

> Is firmware not repeating at all or is the time between the two auth
> requests too long?

Firmware is not repeating at all.

> Checking firmware there is a 300ms timeout and it does a retry if the limit
> is not reached. However, that limit is initialized to zero :-p
>
> Could you try the patch below?

Thanks for looking into the firmware! Unfortunately the change does not appear to make any difference. As before, the auth request is ACKed by the AP but then the conversation halts until userspace steps in on timeout a few seconds later.
Daniel

> Regards,
> Arend
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c
> b/drivers
> index 19686ef..af1ab00 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c
> @@ -384,6 +384,9 @@ int brcmf_c_preinit_dcmds(struct brcmf_if *ifp)
>                 goto done;
>         }
>
> +       /* allow join retry by firmware */
> +       (void)brcmf_fil_iovar_int_set(ifp, "assoc_retry_max", 1);
> +
>         /* Enable tx beamforming, errors can be ignored (not supported) */
>         (void)brcmf_fil_iovar_int_set(ifp, "txbf", 1);
>
Make brcmfmac repeat authentication requests
Hi,

We are working with the Weibu F3C MiniPC which includes BCM43455 SDIO wifi chip 0x004345(17221) rev 0x06.

Testing Linux 4.15, this wifi adapter is unable to authenticate with the MF928 MiFi Access Point which is common in Africa. The STA sends the authentication request, which is ACKed by the AP, but then the conversation ends there (a timeout later bubbles up to userspace). Windows 10 with broadcom driver version 1.605.1.0 is also unable to connect.

My laptop with ath10k can authenticate and connect fine. There the conversation is:

1. STA sends authentication request
2. AP sends ACK
3. After 0.1s timeout, STA sends another auth request
4. AP sends ACK
5. AP sends authentication response
6. etc.

Also confirmed the same pattern on a couple of smartphones, where the delay seems to be 0.3s before repeating the authentication request.

Clearly this AP is not behaving correctly; the authentication request should not have to be repeated. However, of all the devices to hand, unfortunately only this broadcom device is unable to connect.

Is there a way to adjust the driver/firmware to repeat the authentication requests when they are not responded to? This would match the behaviour of other devices and work around this issue.

Thanks
Daniel
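The difference between the devices that connect and the one that does not can be written down as a toy model. This is purely illustrative and not how any firmware or driver actually implements it: the function below assumes the AP only answers from a given attempt onwards (the second, per the captures above) and that the STA resends after a fixed timeout up to max_retries times. All names are made up.

```python
def authenticate(ap_responds_on_attempt, max_retries, timeout_s):
    """Toy model of auth-request retries.

    Returns (success, seconds_elapsed_before_the_final_outcome).
    ap_responds_on_attempt: first attempt number the AP answers.
    max_retries: resends allowed after the initial request.
    timeout_s: wait before each resend (or before giving up).
    """
    elapsed = 0.0
    for attempt in range(1, max_retries + 2):
        if attempt >= ap_responds_on_attempt:
            return True, elapsed
        elapsed += timeout_s   # no response: wait out the timeout
    return False, elapsed


# STA that resends after 0.1 s (the ath10k laptop pattern above).
print(authenticate(ap_responds_on_attempt=2, max_retries=1, timeout_s=0.1))
# STA that resends after 0.3 s (the smartphone pattern above).
print(authenticate(ap_responds_on_attempt=2, max_retries=1, timeout_s=0.3))
# STA that never resends: fails against this AP.
print(authenticate(ap_responds_on_attempt=2, max_retries=0, timeout_s=0.3))
```

With zero retries the model fails against an AP that only answers the second request, which is the behaviour reported for the broadcom device.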
Re: [v2] ath9k: add MSI support
On Mon, Jan 8, 2018 at 6:24 AM, Kalle Valo <kv...@qca.qualcomm.com> wrote:
> (Adding AceLan)
>
> Daniel Drake <dr...@endlessm.com> writes:
>
>> On Wed, Nov 15, 2017 at 7:38 AM, Daniel Drake <dr...@endlessm.com> wrote:
>>> On Tue, Nov 14, 2017 at 8:15 PM, Kalle Valo <kv...@qca.qualcomm.com> wrote:
>>>>> Can't be fixed in firmware, but it would be good to have confirmation
>>>>> of the hardware behaviour, and maybe some other solution is possible?
>>>>> Are you following this up within Qualcomm?
>>>>
>>>> No time to do that right now, sorry.
>>>
>>> I got several autoresponders from people on this thread from Qualcomm
>>> Taiwan. Would it be useful for us to drop off a sample of the affected
>>> product at your Taipei or Hsinchu office so that you can investigate
>>> further?
>>
>> Ping - how can we collaborate on this?
>
> Are you asking me? While looking at my todo list for this year I doubt I
> can find time to help with the MSI implementation or bugfixing.

So far you are the only Qualcomm person to reply to the many mails I have written on this topic, so I appreciate the response. I have sunk many hours into this unfortunate situation so I'd really appreciate if you could point me to someone at Qualcomm who can provide a response. I am willing to continue doing the hard work, but I do need some Qualcomm help in getting past brick walls.

> But my plan is that first I would apply Russell's patch which makes it
> possible to enable MSI with a module parameter:
>
> https://patchwork.kernel.org/patch/249/

This isn't enough to fix many of the systems that are affected by this issue. You add the parameter, enable it, and MSI support totally fails to deliver any interrupts.

Pasting again from earlier:

I have tested your patch on Acer Aspire ES1-432. It does not work - I still can't connect to wifi. /proc/interrupts shows that no MSI interrupts are delivered, the counters are 0.
lspci -vv shows:

Capabilities: [50] MSI: Enable+ Count=1/4 Maskable+ 64bit+
        Address: fee0f00c Data: 4142
        Masking: 000e Pending:

So MSI is enabled and the vector number is 0x42 (decimal 66). However my kernel log is now totally spammed with:

do_IRQ: 0.64 No irq handler for vector

My assumption here is that the ath9k hardware implementation of MSI is buggy, and it is therefore corrupting the MSI vector number by zeroing out the lower 2 bits (e.g. 66 -> 64).

It would be very useful if Qualcomm could confirm if this behaviour is really true.

For more info please see:
https://marc.info/?l=linux-pci=150238260826803=2
https://marc.info/?t=15063128321=1=2
https://marc.info/?l=linux-pci=150831581725596=2

Thanks,
Daniel
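The arithmetic behind "0x42 became 64" can be spelled out. On x86 the interrupt vector is carried in the low 8 bits of the MSI Message Data value, and the device advertises 4 vectors (Count=1/4), so clearing log2(4) = 2 low bits of vector 66 gives 64, matching the do_IRQ message. The sketch below only reproduces that arithmetic; whether the hardware really does this is the open assumption stated in the mail, and the function names are made up.

```python
def msi_vector(message_data):
    """x86: the interrupt vector is the low 8 bits of MSI Message Data."""
    return message_data & 0xFF


def clear_low_bits(vector, nvec_capable):
    """Vector with the low log2(nvec_capable) bits zeroed.

    nvec_capable must be a power of two (1, 2, 4, ... as in the MSI
    capability's 'Count=x/N' field).
    """
    return vector & ~(nvec_capable - 1)


data = 0x4142                       # "Data: 4142" from lspci above
programmed = msi_vector(data)       # 0x42 = 66, what the kernel set up
delivered = clear_low_bits(programmed, 4)   # 0x40 = 64, what do_IRQ saw

print(f"programmed vector: {programmed} (0x{programmed:02x})")
print(f"after clearing low 2 bits: {delivered} (0x{delivered:02x})")

# A vector already aligned to 4 would survive the same masking unchanged.
print(clear_low_bits(0x44, 4) == 0x44)
```

The last line shows why an allocation aligned to a multiple of 4 would sidestep the problem under this assumption: the masked and unmasked values coincide.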
Re: [v2] ath9k: add MSI support
On Wed, Nov 15, 2017 at 7:38 AM, Daniel Drake <dr...@endlessm.com> wrote:
> On Tue, Nov 14, 2017 at 8:15 PM, Kalle Valo <kv...@qca.qualcomm.com> wrote:
>>> Can't be fixed in firmware, but it would be good to have confirmation
>>> of the hardware behaviour, and maybe some other solution is possible?
>>> Are you following this up within Qualcomm?
>>
>> No time to do that right now, sorry.
>
> I got several autoresponders from people on this thread from Qualcomm
> Taiwan. Would it be useful for us to drop off a sample of the affected
> product at your Taipei or Hsinchu office so that you can investigate
> further?

Ping - how can we collaborate on this?

Also, we have been testing the MSI support patch and while it seems to be working fine on AR9565, multiple users hit failures on AR9462. The most common report is that the system simply cannot maintain the connection with the AP for more than a few seconds. It hits a check in mac80211 where it sends a nullfunc to the AP and expects an ack in less than 500ms, but it disconnects since it doesn't see the ack.
https://marc.info/?l=linux-wireless=151027741010422=2

We also reproduced a problem in our office with AR9462. We ping a server every second for 1000 seconds while monitoring "iw dev wlp2s0 link" output. With the MSI support patch in place, this test fails every time; the connection is dropped in less than 1000s. With the patch reverted everything is fine.

We ran the same test with AR9565 in MSI mode and it worked fine.

Daniel
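The "monitor iw dev wlp2s0 link for 1000 seconds" test lends itself to a small checker. The sketch below relies on iw printing a first line beginning with "Connected to <bssid>" when associated and "Not connected." otherwise; the sample outputs and BSSID are made up, and the helper names are hypothetical. Collecting the real output (for example with subprocess once per second) is left out so the logic can be run without wifi hardware.

```python
def connected_bssid(iw_link_output):
    """Return the BSSID if `iw dev <if> link` output shows a connection."""
    lines = iw_link_output.strip().splitlines()
    if lines and lines[0].startswith("Connected to "):
        return lines[0].split()[2]
    return None


def first_drop(samples):
    """samples: iterable of (elapsed_seconds, iw_link_output) pairs.

    Returns the elapsed time of the first sample without a connection,
    or None if the link stayed up for every sample.
    """
    for elapsed, output in samples:
        if connected_bssid(output) is None:
            return elapsed
    return None


# Made-up sample outputs for illustration.
UP = "Connected to 00:11:22:33:44:55 (on wlp2s0)\n\tSSID: example\n\tfreq: 2437\n"
DOWN = "Not connected.\n"

samples = [(0, UP), (1, UP), (2, DOWN), (3, UP)]
drop_at = first_drop(samples)
print("link first dropped at t =", drop_at)
```

A run counts as a pass in the sense used above when first_drop returns None across all 1000 samples.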
Re: [v2] ath9k: add MSI support
On Tue, Nov 14, 2017 at 8:15 PM, Kalle Valo wrote:
>> Can't be fixed in firmware, but it would be good to have confirmation
>> of the hardware behaviour, and maybe some other solution is possible?
>> Are you following this up within Qualcomm?
>
> No time to do that right now, sorry.

I got several autoresponders from people on this thread from Qualcomm Taiwan. Would it be useful for us to drop off a sample of the affected product at your Taipei or Hsinchu office so that you can investigate further?

Thanks
Daniel
Re: [v2] ath9k: add MSI support
On Mon, Nov 13, 2017 at 4:48 PM, Kalle Valo wrote:
> Enabling MSI by default is just too invasive, ath9k is used in so many
> different environments that risk of regressions is high. MSI needs a lot
> of testing before we can even consider enabling it by default.

And it seems like we already found a regression here - the MSI Message Data is being corrupted as described in my last mail.

Can't be fixed in firmware, but it would be good to have confirmation of the hardware behaviour, and maybe some other solution is possible? Are you following this up within Qualcomm?

>> I have tested your patch on Acer Aspire ES1-432. It does not work -
>> I still can't connect to wifi.
>> /proc/interrupts shows that no MSI interrupts are delivered, the
>> counters are 0.
>>
>> lspci -vv shows:
>> Capabilities: [50] MSI: Enable+ Count=1/4 Maskable+ 64bit+
>> Address: fee0f00c Data: 4142
>> Masking: 000e Pending:
>>
>> So MSI is enabled and the vector number is 0x42 (decimal 66).
>> However my kernel log is now totally spammed with:
>> do_IRQ: 0.64 No irq handler for vector
>>
>> My assumption here is that the ath9k hardware implementation of
>> MSI is buggy, and it is therefore corrupting the MSI vector number
>> by zeroing out the lower 2 bits (e.g. 66 -> 64).

Thanks
Daniel
Re: [v2] ath9k: add MSI support
Hi Russell,

> On new Intel platforms like ApolloLake, legacy interrupt mechanism
> (INTx) is not supported

Could you please share the background on what you are claiming here. I have multiple ApolloLake laptops here with many legacy interrupts being used in /proc/interrupts.

I do see this ath9k problem on multiple Acer ApolloLake laptops, however I also have an Asus E402NA ApolloLake laptop on hand where the exact same ath9k miniPCIe card is working fine with legacy interrupts.

> With module parameter "use_msi=1", ath9k driver would try to
> use MSI instead of INTx.

In the previous patch review it was suggested that MSI should become the default - not a quirk or parameter.
https://lkml.org/lkml/2017/9/26/64

I have tested your patch on Acer Aspire ES1-432. It does not work - I still can't connect to wifi. /proc/interrupts shows that no MSI interrupts are delivered, the counters are 0.

lspci -vv shows:

Capabilities: [50] MSI: Enable+ Count=1/4 Maskable+ 64bit+
        Address: fee0f00c Data: 4142
        Masking: 000e Pending:

So MSI is enabled and the vector number is 0x42 (decimal 66). However my kernel log is now totally spammed with:

do_IRQ: 0.64 No irq handler for vector

My assumption here is that the ath9k hardware implementation of MSI is buggy, and it is therefore corrupting the MSI vector number by zeroing out the lower 2 bits (e.g. 66 -> 64).

It would be very useful if Qualcomm could confirm if this behaviour is really true and if it could potentially be fixed with a new ath9k firmware version.
For more info please see:
https://marc.info/?l=linux-pci=150238260826803=2
https://marc.info/?t=15063128321=1=2
https://marc.info/?l=linux-pci=150831581725596=2

Thanks
Daniel

> diff --git a/drivers/net/wireless/ath/ath9k/hw.c
> b/drivers/net/wireless/ath/ath9k/hw.c
> index 8c5c2dd8fa7f..cd0f023ccf77 100644
> --- a/drivers/net/wireless/ath/ath9k/hw.c
> +++ b/drivers/net/wireless/ath/ath9k/hw.c
> @@ -922,6 +922,7 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw
> *ah,
>                 AR_IMR_RXERR |
>                 AR_IMR_RXORN |
>                 AR_IMR_BCNMISC;
> +       u32 msi_cfg = 0;
>
>         if (AR_SREV_9340(ah) || AR_SREV_9550(ah) || AR_SREV_9531(ah) ||
>             AR_SREV_9561(ah))
> @@ -929,22 +930,30 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw
> *ah,
>
>         if (AR_SREV_9300_20_OR_LATER(ah)) {
>                 imr_reg |= AR_IMR_RXOK_HP;
> -               if (ah->config.rx_intr_mitigation)
> +               if (ah->config.rx_intr_mitigation) {
>                         imr_reg |= AR_IMR_RXINTM | AR_IMR_RXMINTR;
> -               else
> +                       msi_cfg |= AR_INTCFG_MSI_RXINTM | AR_INTCFG_MSI_RXMINTR;
> +               } else {
>                         imr_reg |= AR_IMR_RXOK_LP;
> -
> +                       msi_cfg |= AR_INTCFG_MSI_RXOK;
> +               }
>         } else {
> -               if (ah->config.rx_intr_mitigation)
> +               if (ah->config.rx_intr_mitigation) {
>                         imr_reg |= AR_IMR_RXINTM | AR_IMR_RXMINTR;
> -               else
> +                       msi_cfg |= AR_INTCFG_MSI_RXINTM | AR_INTCFG_MSI_RXMINTR;
> +               } else {
>                         imr_reg |= AR_IMR_RXOK;
> +                       msi_cfg |= AR_INTCFG_MSI_RXOK;
> +               }
>         }
>
> -       if (ah->config.tx_intr_mitigation)
> +       if (ah->config.tx_intr_mitigation) {
>                 imr_reg |= AR_IMR_TXINTM | AR_IMR_TXMINTR;
> -       else
> +               msi_cfg |= AR_INTCFG_MSI_TXINTM | AR_INTCFG_MSI_TXMINTR;
> +       } else {
>                 imr_reg |= AR_IMR_TXOK;
> +               msi_cfg |= AR_INTCFG_MSI_TXOK;
> +       }
>
>         ENABLE_REGWRITE_BUFFER(ah);
>
> @@ -952,6 +961,16 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw
> *ah,
>         ah->imrs2_reg |= AR_IMR_S2_GTT;
>         REG_WRITE(ah, AR_IMR_S2, ah->imrs2_reg);
>
> +       if (ah->msi_enabled) {
> +               ah->msi_reg = REG_READ(ah, AR_PCIE_MSI);
> +               ah->msi_reg |= AR_PCIE_MSI_HW_DBI_WR_EN;
> +               ah->msi_reg &= AR_PCIE_MSI_HW_INT_PENDING_ADDR_MSI_64;
> +               REG_WRITE(ah, AR_INTCFG, msi_cfg);
> +               ath_dbg(ath9k_hw_common(ah), ANY,
> +                       "value of AR_INTCFG=0x%X, msi_cfg=0x%X\n",
> +                       REG_READ(ah, AR_INTCFG), msi_cfg);
> +       }
> +
>         if (!AR_SREV_9100(ah)) {
>                 REG_WRITE(ah, AR_INTR_SYNC_CAUSE, 0x);
>                 REG_WRITE(ah, AR_INTR_SYNC_ENABLE, sync_default);
> diff --git a/drivers/net/wireless/ath/ath9k/hw.h
> b/drivers/net/wireless/ath/ath9k/hw.h
> index 4ac70827d142..0d6c07c77372 100644
> --- a/drivers/net/wireless/ath/ath9k/hw.h
> +++ b/drivers/net/wireless/ath/ath9k/hw.h
> @@ -977,6 +977,9 @@ struct ath_hw {
>         bool tpc_enabled;
>         u8 tx_power[Ar5416RateSize];
>         u8 tx_power_stbc[Ar5416RateSize];
> +       bool msi_enabled;
> +       u32 msi_mask;
> +       u32 msi_reg;
>  };
Re: ath9k disconnects in 4.13 with reason=4 locally_generated=1
On Fri, Nov 3, 2017 at 5:51 PM, Jouni Malinen <jo...@qca.qualcomm.com> wrote:
> On Fri, Nov 03, 2017 at 10:57:11AM +0800, Daniel Drake wrote:
>> Endless OS recently upgraded from Linux 4.11 to Linux 4.13, and we now
>> have a few reports of issues with ath9k wireless becoming unusable.
>>
>> In the logs we can see that it authenticates, associates and completes
>> the WPA 4 way handshake, before then being disconnected with:
>>
>> wlp2s0: CTRL-EVENT-DISCONNECTED bssid=74:26:ac:68:2f:c0 reason=4
>> locally_generated=1
>
> reason=4 is WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY. I'd expect the most
> likely source of this to be one of the mac80211 code paths in mlme.c
> where disconnection is triggered if the current AP becomes unreachable.
> Getting a debug log from mac80211 might help in figuring out what is
> causing this (there seem to be a number of mlme_dbg() calls before most,
> but not necessarily all, places where
> WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY is used).

We got the log, it is coming from ieee80211_sta_work()

    else if (ieee80211_hw_check(>hw, REPORTS_TX_ACK_STATUS)) {
        sdata_info(sdata,
                   "Failed to send nullfunc to AP %pM after %dms, disconnecting\n",
                   bssid, probe_wait_ms);
        ieee80211_sta_connection_lost(sdata, bssid,
                WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY, false);

I looked again at changes between 4.12 and 4.13 and still no idea how 4.13 causes this problem :(

Daniel
ath9k disconnects in 4.13 with reason=4 locally_generated=1
Hi, Endless OS recently upgraded from Linux 4.11 to Linux 4.13, and we now have a few reports of issues with ath9k wireless becoming unusable. In the logs we can see that it authenticates, associates and completes the WPA 4 way handshake, before then being disconnected with: wlp2s0: CTRL-EVENT-DISCONNECTED bssid=74:26:ac:68:2f:c0 reason=4 locally_generated=1 The cycle then repeats with it connecting again before being swiftly disconnected, etc. More logs: https://gist.github.com/dsd/49f263c67c2859838ce168628ab043e0 At the same time that we upgraded the kernel, we also upgraded many other components (e.g. NetworkManager and wpa_supplicant), however the same problem has been reported on Arch Linux and a user there reports that he narrowed it down to a kernel regression between 4.12 and 4.13: https://bbs.archlinux.org/viewtopic.php?id=225199 Unfortunately we can not reproduce this in our office, so can't offer much more info yet, but we are continuing to investigate. I have not found any codepaths in userspace that generate disconnect reason 4, so I think it must be something in the kernel causing the disconnection, but I did not see any suspicious changes in recent commit history. It would be good to hear from anyone who has heard of this or has any ideas about causes or solutions. Thanks Daniel
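When collecting reports like this from several users, it helps to pull the fields out of the wpa_supplicant event line mechanically. The sketch below parses the exact line format quoted above; the regular expression and the function name are assumptions made for this illustration, and the reason-name table deliberately lists only code 4 (WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY in the kernel's ieee80211.h).

```python
import re

DISCONNECT_RE = re.compile(
    r"CTRL-EVENT-DISCONNECTED bssid=(?P<bssid>[0-9a-fA-F:]{17})"
    r" reason=(?P<reason>\d+)"
    r"(?: locally_generated=(?P<local>[01]))?"
)

# Only the reason code discussed in this thread; extend as needed.
REASON_NAMES = {4: "WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY"}


def parse_disconnect(line):
    """Return a dict describing a disconnect event, or None if no match."""
    m = DISCONNECT_RE.search(line)
    if m is None:
        return None
    reason = int(m.group("reason"))
    return {
        "bssid": m.group("bssid"),
        "reason": reason,
        "reason_name": REASON_NAMES.get(reason, "unknown"),
        # True means the local side (kernel or supplicant) ended the link.
        "locally_generated": m.group("local") == "1",
    }


event = parse_disconnect(
    "wlp2s0: CTRL-EVENT-DISCONNECTED bssid=74:26:ac:68:2f:c0 "
    "reason=4 locally_generated=1"
)
print(event)
```

Counting events with reason 4 and locally_generated set across user logs would give a quick measure of how many reports match this particular failure.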
Re: [PATCH 2/6] ath9k: add a quirk to set use_msi automatically
On Fri, Oct 13, 2017 at 9:12 AM, AceLan Kao wrote:
> Hi Daniel,
>
> After applying the 2 commits you mentioned in the email, ath9k works.
>
> https://marc.info/?l=linux-wireless=150631274108016=2
> https://github.com/endlessm/linux/commit/739c7a924db8f4434a9617657

Thanks for testing. However the approach was basically rejected in this thread:

[PATCH] PCI MSI: allow alignment restrictions on vector allocation
https://marc.info/?t=15063128321=1=2

So we still need an upstream solution.

I am curious what Qualcomm have to say about their hardware corrupting the MSI Message Data value. Is there any news on them submitting the MSI support patch?

Separately we have the option of seeing if Intel can help us unblock the legacy interrupt (assuming it was simply blocked by the BIOS), or adding an interrupt-polling fallback path to ath9k.

Daniel
Re: [PATCH] PCI MSI: allow alignment restrictions on vector allocation
On Mon, Oct 2, 2017 at 10:38 PM, Thomas Gleixner wrote:
>> After checking out the new code and thinking this through a bit, I think
>> perhaps the only generic approach that would work is to make the
>> ath9k driver require a vector allocation that enables the entire block
>> of 4 MSI IRQs that the hardware supports (which is what Windows is doing).
>
> I wonder how Windows deals with the affinity problem for multi-MSI. Or does
> it just not allow to set it at all?

https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/interrupt-affinity-and-priority

Looks like IRQ affinity can only be set by registry or inf files. I assume that means it is not dynamic and hence avoids the challenges related to moving interrupts around at runtime.

> What's wrong with just using the legacy INTx emulation if you cannot
> allocate 4 MSI vectors?

The legacy interrupt simply doesn't work for the wifi on at least 8 new Acer laptop products based on Intel Apollo Lake. Plus 4 Dell systems included in the patches in this thread:
https://lkml.org/lkml/2017/9/26/55
(the 2 which I can find specs for are also Apollo Lake)

We have tried taking the mini-PCIe wifi module out of one of the affected Acer products and moved it to another computer, where it is working fine with legacy interrupts. So this suggests that the wifi module itself is OK, but we are facing a hardware limitation or BIOS limitation on the affected products. In the Dell thread it says "Some platform(BIOS) blocks legacy interrupts (INTx)".

If you have any suggestions for how we might solve this without getting into the MSI mess then that would be much appreciated. If the BIOS blocks the interrupts, can Linux unblock them?

Just for reference I'm attaching my latest attempt at enabling MULTI_PCI_MSI. It would definitely need further work if we proceed here - so far I've ignored the affinity considerations that you explained, and it's not particularly clean.
I'll now have a look at polling for interrupts in the ath9k driver. --- arch/x86/kernel/apic/msi.c| 3 +- arch/x86/kernel/apic/vector.c | 75 --- include/linux/irq.h | 3 +- kernel/irq/matrix.c | 23 +++-- 4 files changed, 74 insertions(+), 30 deletions(-) diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 5b6dd1a85ec4..c57b6a7b9317 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -129,7 +129,8 @@ static struct msi_domain_ops pci_msi_domain_ops = { static struct msi_domain_info pci_msi_domain_info = { .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS | - MSI_FLAG_PCI_MSIX | MSI_FLAG_MUST_REACTIVATE, + MSI_FLAG_PCI_MSIX | MSI_FLAG_MUST_REACTIVATE | + MSI_FLAG_MULTI_PCI_MSI, .ops= _msi_domain_ops, .chip = _msi_controller, .handler= handle_edge_irq, diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index 6789e286def9..2926fd92ea1c 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -35,7 +35,8 @@ struct apic_chip_data { unsigned intmove_in_progress: 1, is_managed : 1, can_reserve : 1, - has_reserved: 1; + has_reserved: 1, + contig_allocation : 1; }; struct irq_domain *x86_vector_domain; @@ -198,7 +199,8 @@ static int reserve_irq_vector(struct irq_data *irqd) return 0; } -static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest) +static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest, + unsigned int num, unsigned int align_mask) { struct apic_chip_data *apicd = apic_chip_data(irqd); bool resvd = apicd->has_reserved; @@ -215,18 +217,21 @@ static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest) if (vector && cpu_online(cpu) && cpumask_test_cpu(cpu, dest)) return 0; - vector = irq_matrix_alloc(vector_matrix, dest, resvd, ); + vector = irq_matrix_alloc(vector_matrix, dest, resvd, , + num, align_mask); if (vector > 0) apic_update_vector(irqd, vector, cpu); + trace_vector_alloc(irqd->irq, vector, 
resvd, vector); return vector; } static int assign_vector_locked(struct irq_data *irqd, - const struct cpumask *dest) + const struct cpumask *dest, + unsigned int num, unsigned int align_mask) { struct apic_chip_data *apicd = apic_chip_data(irqd); - int vector = allocate_vector(irqd, dest); +
Re: [PATCH 2/6] ath9k: add a quirk to set use_msi automatically
Hi AceLan,

On Thu, Sep 28, 2017 at 4:28 PM, AceLan Kao wrote:
> Hi Daniel,
>
> I've tried your patch, but it doesn't work for me.
> Wifi can scan AP, but can't get connected.

Can you please clarify which patch(es) you have tried?

This is the base patch which adds the infrastructure to request specific MSI IRQ vectors:
https://marc.info/?l=linux-wireless&m=150631274108016&w=2

This is the ath9k MSI patch which makes use of that:
https://github.com/endlessm/linux/commit/739c7a924db8f4434a9617657

If you were already able to use ath9k MSI interrupts without specific consideration for which MSI vector numbers were used, these are the possible explanations that spring to mind:

1. You got lucky and it picked a vector number that is 4-aligned. You can check this in the "lspci -vvv" output. You'll see something like:

   Capabilities: [50] MSI: Enable+ Count=1/4 Maskable+ 64bit+
           Address: fee0300c  Data: 4142

   The low byte of the Data value is the vector number. In my example here 0x42 (66) is not 4-aligned so the failure condition will be hit.

2. You are using interrupt remapping, which I suspect may provide a high likelihood of MSI interrupt vectors being 4-aligned. See if /proc/interrupts shows the IRQ type as IR-PCI-MSI. Unfortunately interrupt remapping is not available here:
   https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023717.html

3. My assumption that all ath9k hardware corrupts the MSI vector number could be wrong. However we've seen this on different wifi modules in laptops produced by different OEMs and ODMs, so it seems to be a somewhat widespread problem at least.

4. My assumption that ath9k hardware is corrupting the MSI vector number could be wrong; maybe another component is to blame. Could it be a BIOS issue?

Admittedly I don't really know how I can debug the layers in between seeing the MSI Message Data value disagree with the vector number being handled inside do_IRQ().

Daniel
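The 4-alignment check from point 1 can be sketched in a few lines of user-space C. This is only an illustration and not part of either patch: the helper names msi_data_to_vector() and vector_is_aligned() are made up here, and it relies on the x86 convention that the low 8 bits of the MSI Message Data carry the vector number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: on x86 the low 8 bits of MSI Message Data hold the vector. */
static inline uint8_t msi_data_to_vector(uint16_t msi_data)
{
	return (uint8_t)(msi_data & 0xff);
}

/* Hypothetical helper: true if vector is a multiple of align (align must be a power of two). */
static inline bool vector_is_aligned(uint8_t vector, unsigned int align)
{
	return (vector & (align - 1)) == 0;
}
```

Feeding it the "Data: 4142" value from the lspci output gives vector 0x42, and vector_is_aligned(0x42, 4) is false, which matches the failure case described in point 1.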
[PATCH] PCI MSI: allow alignment restrictions on vector allocation
ath9k hardware claims to support up to 4 MSI vectors, and when run in that configuration, it would be allowed to modify the lower bits of the MSI Message Data when generating interrupts in order to signal which of the 4 vectors the interrupt is being raised on.

Linux's PCI-MSI irqchip only supports a single MSI vector for each device, and it tells the device this, but the device appears to assume it is working with 4, as it will unset the lower 2 bits of Message Data presumably to indicate that it is an IRQ for the first of 4 possible vectors.

Linux will then receive an interrupt on the wrong vector, so the ath9k interrupt handler will not be invoked.

To work around this, introduce a mechanism where the vector assignment algorithm can be restricted to only a subset of available vector numbers based on a bitmap.

As a user of this bitmap, introduce a pci_dev.align_msi_vector flag which can be used to state that MSI vector numbers must be aligned to a specific amount. If we 4-align the ath9k MSI vector then the lower bits will already be 0 and hence the device will not modify the Message Data away from its original value.

This is needed in order to support the wifi card in at least 8 new Acer consumer laptop models which come with the Foxconn NFA335 WiFi module. Legacy interrupts do not work on that module, so MSI support is required.

Signed-off-by: Daniel Drake <dr...@endlessm.com>
https://phabricator.endlessm.com/T16988
---
 arch/x86/include/asm/hw_irq.h |  1 +
 arch/x86/kernel/apic/msi.c    | 15 +++
 arch/x86/kernel/apic/vector.c | 32 +---
 include/linux/pci.h           |  2 ++
 4 files changed, 43 insertions(+), 7 deletions(-)

This solves the issue described here:
https://marc.info/?l=linux-pci&m=150238260826803&w=2

If this approach looks good I'll follow up with the ath9k patch to enable MSI interrupts.
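The bitmap restriction described above can be tried out in plain user-space C. This is a rough sketch under stated assumptions rather than the kernel code: NR_VECTORS is assumed to be 256, and struct vector_bitmap, bitmap_set(), bitmap_test(), build_allowed_vectors() and pick_vector() are hypothetical stand-ins for the kernel's bitmap helpers and the search loop in __assign_irq_vector().

```c
#include <stdbool.h>
#include <string.h>

#define NR_VECTORS	256	/* assumption: size of the x86 vector space */
#define BITS_PER_WORD	(8 * sizeof(unsigned long))
#define BITMAP_WORDS	((NR_VECTORS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical stand-in for DECLARE_BITMAP(allowed_vectors, NR_VECTORS). */
struct vector_bitmap {
	unsigned long bits[BITMAP_WORDS];
};

static inline void bitmap_set(struct vector_bitmap *bm, unsigned int n)
{
	bm->bits[n / BITS_PER_WORD] |= 1UL << (n % BITS_PER_WORD);
}

static inline bool bitmap_test(const struct vector_bitmap *bm, unsigned int n)
{
	return (bm->bits[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL;
}

/* align == 0 allows every vector; otherwise only multiples of align are allowed. */
static inline void build_allowed_vectors(struct vector_bitmap *bm, unsigned int align)
{
	unsigned int i;

	if (align == 0) {
		memset(bm->bits, 0xff, sizeof(bm->bits));
		return;
	}

	memset(bm->bits, 0, sizeof(bm->bits));
	for (i = 0; i < NR_VECTORS; i += align)
		bitmap_set(bm, i);
}

/* Return the first vector >= start that is allowed and not in use, or -1 if none. */
static inline int pick_vector(const struct vector_bitmap *allowed,
			      const struct vector_bitmap *used,
			      unsigned int start)
{
	unsigned int v;

	for (v = start; v < NR_VECTORS; v++) {
		if (bitmap_test(allowed, v) && !bitmap_test(used, v))
			return (int)v;
	}
	return -1;
}
```

With an alignment of 4 and a search starting at 0x41, the sketch skips 0x41 through 0x43 and settles on 0x44, whose two low bits are already zero, so a device that clears those bits would leave the Message Data unchanged.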
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 6dfe366a8804..7f35178586a1 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -77,6 +77,7 @@ struct irq_alloc_info { struct { struct pci_dev *msi_dev; irq_hw_number_t msi_hwirq; + DECLARE_BITMAP(allowed_vectors, NR_VECTORS); }; #endif #ifdef CONFIG_X86_IO_APIC diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 9b18be764422..80067873cfd5 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -111,6 +111,21 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS; } + if (pdev->align_msi_vector) { + /* We have specific alignment requirements on the vector +* number used by the device. Set up a bitmap that restricts +* the vector selection accordingly. +*/ + int i = pdev->align_msi_vector; + + set_bit(0, arg->allowed_vectors); + for (; i < NR_VECTORS; i += pdev->align_msi_vector) + set_bit(i, arg->allowed_vectors); + } else { + /* No specific alignment requirements so allow all vectors. */ + bitmap_fill(arg->allowed_vectors, NR_VECTORS); + } + return 0; } EXPORT_SYMBOL_GPL(pci_msi_prepare); diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index 88c214e75a6b..64ddac198c25 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -104,7 +104,8 @@ static void free_apic_chip_data(struct apic_chip_data *data) static int __assign_irq_vector(int irq, struct apic_chip_data *d, const struct cpumask *mask, - struct irq_data *irqdata) + struct irq_data *irqdata, + unsigned long *allowed_vectors) { /* * NOTE! 
The local APIC isn't very good at handling @@ -178,6 +179,9 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d, if (test_bit(vector, used_vectors)) goto next; + if (allowed_vectors && !test_bit(vector, allowed_vectors)) + goto next; + for_each_cpu(new_cpu, vector_searchmask) { if (!IS_ERR_OR_NULL(per_cpu(vector_irq, new_cpu)[vector])) goto next; @@ -234,13 +238,14 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d, static int assign_irq_vector(int irq, struct apic_chip_data *data, const struct cpumask *mask, -struct irq_data *irqdata) +struct irq_data *irqdata, +unsigned long *allowed_vectors) {
ath9k hardware corrupts MSI Message Data, raises wrong interrupt
Hi,

The ath9k wireless driver in mainline currently does not have support for PCI MSI interrupts; it uses legacy interrupts instead.

However we are working with a number of 3rd party laptop models based on Intel Apollo Lake which will soon be available on the consumer market. They all appear to have broken legacy interrupt wiring for the wifi card. Unfortunately the hardware can't be changed so we are instead looking at making ath9k use MSI interrupts, which is what we believe they are doing on Windows.

To recap what MSI is: The host OS can configure a Message Address value and a Message Data value within the device's PCI configuration space. When the device wishes to interrupt the host, instead of pulsing a logic level on the legacy interrupt pin, it will instead write the value of Message Data into the address specified in Message Address. This write will then trigger interrupt handling mechanisms within the kernel.

The code below can be used to tell the ath9k hardware to use MSI interrupts instead of legacy interrupts (sorry that it's a bit unclean). However, it is not working, as reproduced on multiple devices. No interrupts are counted against the ath9k MSI IRQ, and we get messages like these spammed in the kernel logs:

  do_IRQ: 0.64 No irq handler for vector

The device does not appear to be MSI-X capable. Configuration dump for the device at this point:

02:00.0 Network controller: Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter (rev 01)
        Subsystem: AzureWave Device 218d
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- 64, 99 -> 96).

Why might it be doing this? My guess, looking at the PCI specs:

  The Multiple Message Enable field (bits 6-4 of the Message Control register) defines the number of low order message data bits the function is permitted to modify to generate its system software allocated vectors. For example, a Multiple Message Enable encoding of “010” indicates the function has been allocated four vectors and is permitted to modify message data bits 1 and 0 (a function modifies the lower message data bits to generate the allocated number of vectors). If the Multiple Message Enable field is “000”, the function is not permitted to modify the message data.

Linux is not working with Multiple Messages and has written the 000 value as described. However, I suspect the device is not fully following the spec here and is effectively taking ownership of the 2 lower bits, and setting them to 00 to indicate that it is working with the first of the 4 possible MSI IRQ vectors.

I thought about modifying Linux's vector-assignment algorithm to consider this special case and only assign a single vector number with the low bits already set as 00, but that seems like a hairy topic and that code is distanced from the driver too. The algorithm in question is __assign_irq_vector() in arch/x86/kernel/apic/vector.c

The idea of adding support for MSI_FLAG_MULTI_PCI_MSI to the PCI-MSI adapter would encounter similar challenges, as ultimately we'd need to allocate 4 contiguous vectors and that is not really in agreement with the design of __assign_irq_vector().

I'd appreciate any suggestions for next steps here. Do any ath9k developers have datasheet or vendor contacts that might shine light on the behaviour I suspect here, where the Message Data bits are being incorrectly zeroed out? Any PCI experts that have any bright ideas for how we could introduce a workaround for this possibly broken hardware in upstreamable form?
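To make the quoted spec text concrete, here is a small user-space C model of how the Multiple Message Enable (MME) field bounds what a function may do to Message Data, next to the behaviour suspected of the ath9k hardware. It is a sketch only: the macro and function names are invented for this example, and suspected_ath9k_msi_data() encodes the guess made above, not documented hardware behaviour.

```c
#include <stdint.h>

/* Hypothetical names; MME occupies bits 6-4 of the MSI Message Control register. */
#define MSI_CTRL_MME_SHIFT	4
#define MSI_CTRL_MME_MASK	0x0070

/* Number of low-order Message Data bits the function is permitted to modify. */
static inline unsigned int mme_modifiable_bits(uint16_t msg_ctrl)
{
	return (msg_ctrl & MSI_CTRL_MME_MASK) >> MSI_CTRL_MME_SHIFT;
}

/* Number of vectors allocated to the function: 2^MME. */
static inline unsigned int mme_num_vectors(uint16_t msg_ctrl)
{
	return 1u << mme_modifiable_bits(msg_ctrl);
}

/* Message Data a spec-following function would write when raising its n-th vector. */
static inline uint16_t compliant_msi_data(uint16_t data, uint16_t msg_ctrl,
					  unsigned int n)
{
	uint16_t mask = (uint16_t)(mme_num_vectors(msg_ctrl) - 1);

	return (uint16_t)((data & ~mask) | (n & mask));
}

/*
 * Suspected (unconfirmed) ath9k behaviour: act as if MME were 010 regardless
 * of what the host wrote, and always signal the first of 4 vectors.
 */
static inline uint16_t suspected_ath9k_msi_data(uint16_t data)
{
	return (uint16_t)(data & ~0x3u);
}
```

With MME written as 000, compliant_msi_data() returns the Message Data untouched, whereas the suspected behaviour turns vector 99 into 96 and leaves an already 4-aligned vector such as 64 alone.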
Thanks Daniel --- drivers/net/wireless/ath/ath9k/hw.c | 34 ++-- drivers/net/wireless/ath/ath9k/hw.h | 3 +++ drivers/net/wireless/ath/ath9k/init.c | 4 drivers/net/wireless/ath/ath9k/mac.c | 42 +++ drivers/net/wireless/ath/ath9k/pci.c | 20 - drivers/net/wireless/ath/ath9k/reg.h | 15 + 6 files changed, 110 insertions(+), 8 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c index 8c5c2dd8fa7f..8c25d14cd9fc 100644 --- a/drivers/net/wireless/ath/ath9k/hw.c +++ b/drivers/net/wireless/ath/ath9k/hw.c @@ -922,6 +922,7 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw *ah, AR_IMR_RXERR | AR_IMR_RXORN | AR_IMR_BCNMISC; + u32 msi_cfg = 0; if (AR_SREV_9340(ah) || AR_SREV_9550(ah) || AR_SREV_9531(ah) || AR_SREV_9561(ah)) @@ -929,22 +930,33 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw *ah, if (AR_SREV_9300_20_OR_LATER(ah)) { imr_reg |= AR_IMR_RXOK_HP; - if (ah->config.rx_intr_mitigation) + if (ah->config.rx_intr_mitigation) { imr_reg |= AR_IMR_RXINTM | AR_IMR_RXMINTR; -
Where is wil6210.fw / wil6210.brd?
Hi, We are working with a new consumer laptop model that includes a wil6210 wireless adapter. It is not usable on current Linux distros because the firmware is not present. It's not in linux-firmware and we can't even find any download links when searching the web. Could you please send a copy of the required firmware to linux-firmware? https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git Thanks Daniel
ath9k excessive delay in handling EAPOL frames
Hi,

As this is remote problem debugging I haven't gathered quite as much info as I would like, and won't be investigating further immediately, but I would like to share what I have found so far; maybe it is useful knowledge and we can revisit later.

With the following hardware on Linux 4.4, we cannot connect to our office WPA2-PSK network. Other networks seem fine.

02:00.0 Network controller [0280]: Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter [168c:0036] (rev 01)
        Subsystem: AzureWave Device [1a3b:218d]
        Kernel driver in use: ath9k

The logs show:

wpa_supplicant[585]: wlp2s0: SME: Trying to authenticate with 0c:11:67:33:8d:50 (SSID='Endless' freq=2457 MHz)
kernel: wlp2s0: authenticate with 0c:11:67:33:8d:50
NetworkManager[620]: [1474483556.0677] device (wlp2s0): supplicant interface state: inactive -> authenticating
kernel: wlp2s0: send auth to 0c:11:67:33:8d:50 (try 1/3)
kernel: wlp2s0: send auth to 0c:11:67:33:8d:50 (try 2/3)
kernel: wlp2s0: send auth to 0c:11:67:33:8d:50 (try 3/3)
wpa_supplicant[585]: wlp2s0: Trying to associate with 0c:11:67:33:8d:50 (SSID='Endless' freq=2457 MHz)
kernel: wlp2s0: authenticated
NetworkManager[620]: [1474483558.1078] device (wlp2s0): supplicant interface state: authenticating -> associating
kernel: wlp2s0: associate with 0c:11:67:33:8d:50 (try 1/3)
kernel: wlp2s0: associate with 0c:11:67:33:8d:50 (try 2/3)
kernel: wlp2s0: associate with 0c:11:67:33:8d:50 (try 3/3)
wpa_supplicant[585]: wlp2s0: Associated with 0c:11:67:33:8d:50
kernel: wlp2s0: RX AssocResp from 0c:11:67:33:8d:50 (capab=0x431 status=0 aid=5)
kernel: wlp2s0: associated
kernel: wlp2s0: deauthenticated from 0c:11:67:33:8d:50 (Reason: 23=IEEE8021X_FAILED)

Using monitor mode from another station, I observe:
- STA sends association request
- AP sends association response 0.01s later, STA acks
- AP sends EAPOL 0.002s later, STA acks
- AP sends another EAPOL 0.1s later, STA acks
- AP sends deauthentication 0.3s later (presumably a timeout waiting for EAPOL response), STA acks
- STA sends another association request 0.5s later
- AP replies with Deauthentication (can't associate as you are deauthed)
- STA sends another association request 1s later
- AP replies with Deauthentication again
- STA sends EAPOL response message, a full 2 seconds after the first EAPOL was received

It is as if the processing of incoming frames is getting stuck for 2 seconds, even though they were already ACKed. That is, the first association request succeeds immediately but the processing of the AssocResp frame (and the following EAPOLs and deauth) is delayed by more than 2 seconds, far longer than the AP is willing to wait.

I have confirmed this perspective in the wpa_supplicant debug logs too: there are 2 seconds of RX silence after the first association request is sent, before all the frames come in at once.

Hope this partial info is useful in some way; I'll come back to this problem as time permits.

Daniel