Re: brcmfmac signal/interference issues

2018-03-28 Thread Daniel Drake
On Thu, Mar 8, 2018 at 4:47 AM, Arend van Spriel
 wrote:
>> 43455c0-roml/sdio-ag-ndis-vista-pktfilter-d0c-pno-aoe-p2p-dhdoid-ndoe-gtkoe-mfp-proptxstatus-dmatxrc-keepalive-ap-ampduretry-pclose-txbf
>>
>> Yes, ndis. So no easy way to run the same firmware on the 2 OSes.
>
> Indeed. I could try building nearly the same firmware target. Can you provide
> the firmware version as well?

Full string is:
43455c0-roml/sdio-ag-ndis-vista-pktfilter-d0c-pno-aoe-p2p-dhdoid-ndoe-gtkoe-mfp-proptxstatus-dmatxrc-keepalive-ap-ampduretry-pclose-txbf
Version: 7.45.87.0 CRC: 7cb2470e Date: Thu 2016-04-21 22:31:44 PDT
Ucode Ver: 1043.2070 FWID: 01-f68ec182

If you could build that config but for Linux instead of ndis I would
love to try it.

Also, here is the string for the current one in linux-firmware:
43455c0-roml/sdio-ag-p2p-pno-aoe-pktfilter-keepalive-mchan-pktctx-proptxstatus-ampduhostreorder-lpc-pwropt-txbf-wnm-okc-ccx-ltecx-wfds-wl11u-mfp-tdls-ve
Version: 7.45.18.0 CRC: d7226371 Date: Sun 2015-03-01 07:31:57 PST
Ucode Ver: 1026.2 FWID: 01-6a2c8ad4

I note that the Version and UcodeVer in the linux-firmware version are
lower than the windows one. If it's possible to also rebuild the
linux-firmware config but with those newer versions (or even the
latest, if there is something newer), I will test that too.
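
For reference, the version comparison above can be checked mechanically. This is only a minimal sketch: the two banner lines are copied from this mail, while the helper names (parse_version, is_newer) are made up for illustration and are not part of any Broadcom tool.

```python
import re

# Banner lines copied from the two firmware strings quoted in this mail.
WINDOWS_BANNER = "Version: 7.45.87.0 CRC: 7cb2470e Date: Thu 2016-04-21 22:31:44 PDT"
LINUX_BANNER = "Version: 7.45.18.0 CRC: d7226371 Date: Sun 2015-03-01 07:31:57 PST"


def parse_version(banner):
    """Return the dotted firmware version as a tuple of ints (hypothetical helper)."""
    match = re.search(r"Version:\s*([0-9]+(?:\.[0-9]+)*)", banner)
    if match is None:
        raise ValueError("no Version field in banner: %r" % banner)
    return tuple(int(part) for part in match.group(1).split("."))


def is_newer(banner_a, banner_b):
    """True if banner_a carries a higher version number than banner_b."""
    return parse_version(banner_a) > parse_version(banner_b)
```

Comparing tuples of ints rather than raw strings avoids ordering surprises once any component reaches different digit counts.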

> So it picks up something in the PC. Some sources of interference that I have
> seen before are USB3 and HDMI. Maybe try to shield those if present and see
> if that helps. The nvram contains sensitivity parameters, but as you stated
> you are using the same nvram for windows and linux for now we can rule it
> out for debugging the issue.

Yeah, there are some options here which we can try to explore. What's
still unknown though is why windows appears immune to this exact same
interference. A software fix would be much more convenient... :)

Daniel


Re: brcmfmac signal/interference issues

2018-03-28 Thread Daniel Drake
On Thu, Mar 8, 2018 at 9:54 AM, Steve deRosier  wrote:
> Did you check the Bluetooth?  I don't know if this chip has it or if
> it's an independent chip on this board, but if Linux is leaving it
> powered up but not properly configured you could have issues.

I had already disabled it via hciconfig, without any effect on the problem.

Based on your suggestion I also checked BT_REG_ON, which was not being
affected by hciconfig. On AP6255 I believe it is active high, so I
brought it low to disable bluetooth, confirmed with a scope, and the
problem is still there.

Thanks for the suggestion anyway!
Daniel


Re: brcmfmac signal/interference issues

2018-02-23 Thread Daniel Drake
On Fri, Feb 23, 2018 at 12:54 PM, Arend van Spriel
 wrote:
> Yup. Windows firmware talks NDIS. If you run 'strings 4345r6rtecdc.bin |
> tail -1' you can see the firmware build target and it likely has 'ndis' in
> it.

43455c0-roml/sdio-ag-ndis-vista-pktfilter-d0c-pno-aoe-p2p-dhdoid-ndoe-gtkoe-mfp-proptxstatus-dmatxrc-keepalive-ap-ampduretry-pclose-txbf

Yes, ndis. So no easy way to run the same firmware on the 2 OSes.
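
A small sketch of the same check in script form, assuming the build target is a chip prefix followed by "/" and a dash-separated feature list (that layout is inferred from the two strings seen in this thread, not from Broadcom documentation). The helper names are hypothetical.

```python
# Build targets copied from the firmware strings quoted in this thread.
WINDOWS_TARGET = (
    "43455c0-roml/sdio-ag-ndis-vista-pktfilter-d0c-pno-aoe-p2p-dhdoid-ndoe-"
    "gtkoe-mfp-proptxstatus-dmatxrc-keepalive-ap-ampduretry-pclose-txbf"
)
LINUX_TARGET = (
    "43455c0-roml/sdio-ag-p2p-pno-aoe-pktfilter-keepalive-mchan-pktctx-"
    "proptxstatus-ampduhostreorder-lpc-pwropt-txbf-wnm-okc-ccx-ltecx-wfds-"
    "wl11u-mfp-tdls-ve"
)


def split_target(target):
    """Split 'chip/feat1-feat2-...' into (chip, [features]). Assumed layout."""
    chip, _, features = target.partition("/")
    return chip, features.split("-")


def talks_ndis(target):
    """True if 'ndis' appears as a whole feature token in the target."""
    return "ndis" in split_target(target)[1]
```

Matching whole tokens rather than substrings keeps a feature such as "ndoe" from being mistaken for "ndis"-like entries.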

> Now are you using BT as well on this device? Another suggestion I got is to
> disable transmit beamforming which brcmfmac enables by default. Not sure if
> this device supports it, but could you try the patch below.

Thanks for the ideas. I had already tried with the bluetooth disabled
- no change there.
Also reproduced the problem after applying your patch.

Daniel


Re: brcmfmac signal/interference issues

2018-02-23 Thread Daniel Drake
Hi,

On Wed, Feb 21, 2018 at 12:39 PM, Daniel Drake <dr...@endlessm.com> wrote:
> Thanks for looking into this. Here is the brcmfmac43455-sdio.txt file
> we are using:
> https://gist.github.com/dsd/d7ee3caa6dfd77f0bcd16cf272b20298
> This is identical to the 4345r6nvram.txt file from windows.

I checked Windows again and it seems to be using a firmware file
4345r6rtecdc.bin alongside this nvram data.
This firmware is different from the one in linux-firmware. I've
uploaded it here:
https://drive.google.com/open?id=1MUsiaoozslJb8SCYOR-FNbJFuD-h4PY_

I was hoping to try this on Linux to see if it makes any difference to
the issue seen here.
However, with this firmware in place, I can't connect to the network
at all. It associates, but wpa_supplicant never sees the first WPA2 key
message sent from the AP - even though wireshark on a separate monitor
shows that the key message was sent, and that the STA acked it.

I turned off WPA2 to make it an open network instead, and now I am
unable to complete the DHCP conversation. According to the monitor
station, the STA successfully transmits DHCPDISCOVER and the AP
responds with DHCPOFFER. The offer is acked, but dhclient never sees
it, and eventually times out.

Any ideas why this firmware may not be working at all on linux?

Thanks,
Daniel


Re: brcmfmac signal/interference issues

2018-02-21 Thread Daniel Drake
Hi Arend,

On Wed, Feb 21, 2018 at 12:07 PM, Arend van Spriel
<arend.vanspr...@broadcom.com> wrote:
>
> On 2/21/2018 9:14 AM, Daniel Drake wrote:
>>
>> Hi,
>>
>> We're working with the Weibu F3C MiniPC which includes BCM43455 SDIO
>> wifi chip 0x004345(17221) rev 0x06 (AP6255 module).
>>
>> We are seeing a strange issue where usually within an hour of usage,
>> the wifi connection becomes so unstable and lossy that it is unusable.
>> While investigating this my standard test is to send ICMP pings to the
>> IP address of the local access point. Normally the latency is 5-10ms,
>> but when this problem is seen it will go to 500ms and then increase up
>> towards 20s before completely timing out.
>>
>> Sometimes it is possible to induce the problem on-demand by stressing
>> some combination of CPU, disk and/or USB. At this point, ping reply
>> latency increases from ~5ms to 500ms+ before increasing even further.
>> Killing the stress test, the pings immediately return to normal. This
>> is not concrete though - I also seem to have a lot of luck hitting the
>> problem in the morning when booting up the computer from stone cold
>> state, while it is idle.
>>
>> When the problem is being reproduced (ping times are high or get no
>> response), touching the exposed metal on the antenna connector with my
>> finger makes ping times return to normal. Touching it with a piece of
>> plastic does not have the same effect - so it is some effect of body
>> capacitance or similar. Also, disconnecting the antenna makes ping
>> times return to normal, although outside of the simple pings,
>> bandwidth is much reduced.
>>
>> Additionally, when the problem is being reproduced, if I move the
>> antenna outside of the case, ping times return to normal. When I move
>> the antenna back into the miniPC case vicinity, it goes slow and lossy
>> again.
>>
>> I have used a separate monitoring station with wireshark to look at
>> the 802.11 traffic while this is happening. When the problem is
>> reproduced, the miniPC is mostly unable to TX anything, and the AP
>> sends frames and retries them but with no ACK visible from the miniPC.
>> Immediately when I touch the antenna connector with my finger, tx
>> frames from the miniPC appear and the conversation comes back to life.
>>
>> Running Linux 4.15 but we believe all versions are affected.
>>
>> This very much sounds like a hardware issue, but here is where things
>> get interesting: Windows 10 on the same unit has no such problem.
>>
>> I set up 2 units side by side - one running Windows 10 and the other
>> running Linux, connected to the same AP. The top part of the MiniPC
>> case has been removed so I can see the motherboard. I free up the
>> antennas from the MiniPC casing and they are on a relatively long
>> cable, so they can be freely moved around in this test, allowing me to
>> dangle the antenna into the vicinity of the neighbouring unit miniPC
>> case.
>>
>> If I place both antenna terminals inside the Linux MiniPC case, the
>> Linux pings are bad but the Windows pings are fine.
>>
>> If I place both antenna terminals inside the Windows MiniPC case, it
>> is the same: Linux pings are bad, but the Windows pings are fine.
>>
>> And when the Linux antenna is placed outside of both cases, the Linux
>> pings are fine. I've repeated these tests a handful of times in quick
>> succession to make sure that I'm not going crazy and that this is not
>> a case of the problem intermittency causing misleading results. These
>> findings appear very solid.
>>
>> This suggests that regardless of the running OS, the MiniPC produces
>> some kind of interference that intermittently has an extremely
>> detrimental effect on wifi signal when you are running Linux. However,
>> Windows is somehow immune to this.
>>
>> Any ideas for how to continue debugging this? How can we make the
>> Linux driver immune to this interference like the windows one is?
>
>
> Hi Daniel,
>
> Thanks. I forwarded your detailed report. My first hunch would be the nvram 
> file used. Are you using the same nvram file on Linux as the one on Windows? 
> If not, can you compare them or, better, just send them.


Thanks for looking into this. Here is the brcmfmac43455-sdio.txt file
we are using:
https://gist.github.com/dsd/d7ee3caa6dfd77f0bcd16cf272b20298
This is identical to the 4345r6nvram.txt file from windows.

Daniel


brcmfmac signal/interference issues

2018-02-21 Thread Daniel Drake
Hi,

We're working with the Weibu F3C MiniPC which includes BCM43455 SDIO
wifi chip 0x004345(17221) rev 0x06 (AP6255 module).

We are seeing a strange issue where usually within an hour of usage,
the wifi connection becomes so unstable and lossy that it is unusable.
While investigating this my standard test is to send ICMP pings to the
IP address of the local access point. Normally the latency is 5-10ms,
but when this problem is seen it will go to 500ms and then increase up
towards 20s before completely timing out.
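
The ping test described above could be scripted roughly as follows. This is a sketch only: it assumes iputils-style ping output lines containing "time=<n> ms", and the thresholds are simply the 10ms and 500ms figures from this mail, not tuned values. Function names are hypothetical.

```python
import re

# Matches the round-trip time in an iputils-style ping reply line (assumed format).
_RTT_RE = re.compile(r"time=([0-9.]+) ms")


def parse_rtt_ms(line):
    """Return the RTT in milliseconds, or None if the line carries no reply."""
    match = _RTT_RE.search(line)
    return float(match.group(1)) if match else None


def classify(rtt_ms, normal_max_ms=10.0, degraded_min_ms=500.0):
    """Bucket one sample using the latency figures quoted in this report."""
    if rtt_ms is None:
        return "lost"
    if rtt_ms <= normal_max_ms:
        return "normal"
    if rtt_ms >= degraded_min_ms:
        return "degraded"
    return "elevated"
```

Feeding each line of the ping output through parse_rtt_ms and classify gives a timestampable record of when the link moves from "normal" to "degraded" or "lost".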

Sometimes it is possible to induce the problem on-demand by stressing
some combination of CPU, disk and/or USB. At this point, ping reply
latency increases from ~5ms to 500ms+ before increasing even further.
Killing the stress test, the pings immediately return to normal. This
is not concrete though - I also seem to have a lot of luck hitting the
problem in the morning when booting up the computer from stone cold
state, while it is idle.

When the problem is being reproduced (ping times are high or get no
response), touching the exposed metal on the antenna connector with my
finger makes ping times return to normal. Touching it with a piece of
plastic does not have the same effect - so it is some effect of body
capacitance or similar. Also, disconnecting the antenna makes ping
times return to normal, although outside of the simple pings,
bandwidth is much reduced.

Additionally, when the problem is being reproduced, if I move the
antenna outside of the case, ping times return to normal. When I move
the antenna back into the miniPC case vicinity, it goes slow and lossy
again.

I have used a separate monitoring station with wireshark to look at
the 802.11 traffic while this is happening. When the problem is
reproduced, the miniPC is mostly unable to TX anything, and the AP
sends frames and retries them but with no ACK visible from the miniPC.
Immediately when I touch the antenna connector with my finger, tx
frames from the miniPC appear and the conversation comes back to life.

Running Linux 4.15 but we believe all versions are affected.

This very much sounds like a hardware issue, but here is where things
get interesting: Windows 10 on the same unit has no such problem.

I set up 2 units side by side - one running Windows 10 and the other
running Linux, connected to the same AP. The top part of the MiniPC
case has been removed so I can see the motherboard. I free up the
antennas from the MiniPC casing and they are on a relatively long
cable, so they can be freely moved around in this test, allowing me to
dangle the antenna into the vicinity of the neighbouring unit miniPC
case.

If I place both antenna terminals inside the Linux MiniPC case, the
Linux pings are bad but the Windows pings are fine.

If I place both antenna terminals inside the Windows MiniPC case, it
is the same: Linux pings are bad, but the Windows pings are fine.

And when the Linux antenna is placed outside of both cases, the Linux
pings are fine. I've repeated these tests a handful of times in quick
succession to make sure that I'm not going crazy and that this is not
a case of the problem intermittency causing misleading results. These
findings appear very solid.

This suggests that regardless of the running OS, the MiniPC produces
some kind of interference that intermittently has an extremely
detrimental effect on wifi signal when you are running Linux. However,
Windows is somehow immune to this.

Any ideas for how to continue debugging this? How can we make the
Linux driver immune to this interference like the windows one is?

Thanks
Daniel


Re: Make brcmfmac repeat authentication requests

2018-02-15 Thread Daniel Drake
On Thu, Feb 15, 2018 at 3:46 PM, Arend van Spriel
 wrote:
> Ok. Could you create a log with driver debugging enabled, ie. build driver 
> CONFIG_BRCMDBG=y and load with module param 'debug=0x1416'. The problem is 
> probably when the firmware is configured.

Logs from driver load at boot:
https://gist.github.com/dsd/7f9a7e8b0f8e20794aaed6298b2cb96a

Logs from interface up:
https://gist.github.com/dsd/13909ed821f7429e6be6a97ed91a61af

Logs from connection attempt:
https://gist.github.com/dsd/ae4a664c45e3d379d765231d96ae20d7

By the way, I noticed that the new parameter is called assoc_retry_max.
However here the problem is at the authentication stage. We do not
reach association.
Does assoc_retry_max also affect the authentication codepath, or is
there an equivalent parameter for retrying auth?

Thanks
Daniel


Re: Make brcmfmac repeat authentication requests

2018-02-14 Thread Daniel Drake
Hi,

Thanks for the fast response.

On Tue, Feb 13, 2018 at 12:50 PM, Arend van Spriel
 wrote:
> I tried to find info about that access point equipment, but not getting any
> hits apart from an Olivetti laser printer, but I doubt it is that. Can you
> provide more details?

The device itself is basically unbranded (just says "4G LTE"). It's an
access point and mifi bridge (so insert a sim card and it shares your
mobile data connection on the LAN). It comes as part of a solar home
solutions package.

MF928 is listed as the product name behind the battery. In the web UI
it says it is from the EV910 product family, hardware version
LR521_V1.0. I can't find info online about it.

> User-space (wpa_supplicant) would retry the connect attempt so I guess you
> are saying that the timing between the two auth requests is important?

Yes, the error goes up to userspace which then retries. However around
15 seconds pass before the authentication request is sent again, and
also as part of the retry it redoes the probe requests etc. Windows
does the same but there is only a 3 second delay. I haven't checked if
this device needs the authentication request resent in less than 3
seconds, or if the problem is that it needs to be sent twice in
consecutive frames (i.e. without another probe request in the middle).

> Is firmware not repeating at all or is the time between the two auth
> requests too long?

Firmware is not repeating at all.

> Checking firmware there is a 300ms timeout and it does a retry if the limit
> is not reached. However, that limit is initialized to zero :-p
>
> Could you try the patch below?

Thanks for looking into the firmware! Unfortunately the change does
not appear to make any difference. As before, the auth request is
ACKed by the AP but then the conversation halts until userspace steps
in on timeout a few seconds later.

Daniel


> Regards,
> Arend
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c
> b/drivers
> index 19686ef..af1ab00 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/common.c
> @@ -384,6 +384,9 @@ int brcmf_c_preinit_dcmds(struct brcmf_if *ifp)
> goto done;
> }
>
> +   /* allow join retry by firmware */
> +   (void)brcmf_fil_iovar_int_set(ifp, "assoc_retry_max", 1);
> +
> /* Enable tx beamforming, errors can be ignored (not supported) */
> (void)brcmf_fil_iovar_int_set(ifp, "txbf", 1);
>


Make brcmfmac repeat authentication requests

2018-02-12 Thread Daniel Drake
Hi,

We are working with the Weibu F3C MiniPC which includes BCM43455 SDIO
wifi chip 0x004345(17221) rev 0x06

Testing Linux 4.15, this wifi adapter is unable to authenticate with
the MF928 MiFi Access Point which is common in Africa. The STA sends
the authentication request, which is ACKed by the AP, but then the
conversation ends there (a timeout later bubbles up to userspace).
Windows 10 with broadcom driver version 1.605.1.0 is also unable to
connect.

My laptop with ath10k can authenticate and connect fine. There the
conversation is:

1. STA sends authentication request
2. AP sends ACK
3. After 0.1s timeout, STA sends another auth request
4. AP sends ACK
5. AP sends authentication response
6. etc.

Also confirmed the same pattern on a couple of smartphones, where the
delay seems to be 0.3s before repeating the authentication request.
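
To make the observed pattern concrete, here is a toy model of the exchange. It is not driver or firmware code: the AP is modelled as simply ignoring requests until the Nth one, and all names and parameters are made up for illustration.

```python
def authenticate(ap_responds_on_attempt, max_retries, retry_timeout_s):
    """Toy model of a STA repeating auth requests.

    ap_responds_on_attempt: the AP only answers the Nth request it receives
                            (2 matches the behaviour seen with this AP).
    max_retries:            extra requests the STA sends after the first one.
    retry_timeout_s:        wait before each repeat (0.1s on ath10k, about
                            0.3s on the smartphones mentioned above).
    Returns (success, requests_sent, seconds_waited).
    """
    waited = 0.0
    for attempt in range(1, max_retries + 2):
        if attempt >= ap_responds_on_attempt:
            return True, attempt, waited
        waited += retry_timeout_s
    return False, max_retries + 1, waited
```

With max_retries=0 the model never gets a response from an AP that needs two requests, which is the situation described for the Broadcom device; with max_retries=1 it succeeds after a single timeout.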

Clearly this AP is not behaving correctly; the authentication request
should not have to be repeated. However of all the devices to hand,
unfortunately only this broadcom device is unable to connect.

Is there a way to adjust the driver/firmware to repeat the
authentication requests when they are not responded to? This would
match the behaviour of other devices and work around this issue.

Thanks
Daniel


Re: [v2] ath9k: add MSI support

2018-01-08 Thread Daniel Drake
On Mon, Jan 8, 2018 at 6:24 AM, Kalle Valo <kv...@qca.qualcomm.com> wrote:
> (Adding AceLan)
>
> Daniel Drake <dr...@endlessm.com> writes:
>
>> On Wed, Nov 15, 2017 at 7:38 AM, Daniel Drake <dr...@endlessm.com> wrote:
>>> On Tue, Nov 14, 2017 at 8:15 PM, Kalle Valo <kv...@qca.qualcomm.com> wrote:
>>>>> Can't be fixed in firmware, but it would be good to have confirmation
>>>>> of the hardware behaviour, and maybe some other solution is possible?
>>>>> Are you following this up within Qualcomm?
>>>>
>>>> No time to do that right now, sorry.
>>>
>>> I got several autoresponders from people on this thread from Qualcomm
>>> Taiwan. Would it be useful for us to drop off a sample of the affected
>>> product at your Taipei or Hsinchu office so that you can investigate
>>> further?
>>
>> Ping - how can we collaborate on this?
>
> Are you asking me? While looking at my todo list for this year I doubt I
> can find time to help with the MSI implementation or bugfixing.

So far you are the only Qualcomm person to reply to the many mails I
have written on this topic, so I appreciate the response. I have sunk
many hours into this unfortunate situation so I'd really appreciate if
you could point me to someone at Qualcomm who can provide a response.
I am willing to continue doing the hard work, but I do need some
Qualcomm help in getting past brick walls.

> But my plan is that first I would apply Russel's patch which makes it
> possible to enable MSI with a module parameter:
>
> https://patchwork.kernel.org/patch/249/

This isn't enough to fix many of the systems that are affected by this
issue. You add the parameter, enable it, and MSI support totally fails
to deliver any interrupts. Pasting again from earlier:

I have tested your patch on Acer Aspire ES1-432. It does not work - I
still can't connect to wifi.
/proc/interrupts shows that no MSI interrupts are delivered, the counters are 0.

lspci -vv shows:
Capabilities: [50] MSI: Enable+ Count=1/4 Maskable+ 64bit+
Address: fee0f00c  Data: 4142
Masking: 000e  Pending: 

So MSI is enabled and the vector number is 0x42 (decimal 66).
However my kernel log is now totally spammed with:
  do_IRQ: 0.64 No irq handler for vector

My assumption here is that the ath9k hardware implementation of MSI is
buggy, and it is therefore corrupting the MSI vector number by zeroing
out the lower 2 bits (e.g. 66 -> 64).
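
The arithmetic behind that assumption can be checked in a few lines. This sketch only shows that the numbers are consistent with the hypothesis; it does not prove what the hardware does. It relies on the x86 convention that the low byte of the MSI Message Data holds the vector, and the function names are illustrative.

```python
# On x86 the low 8 bits of the MSI Message Data register carry the vector.
MSI_VECTOR_MASK = 0xFF


def programmed_vector(msi_data):
    """Vector the kernel programmed, taken from the lspci 'Data:' field."""
    return msi_data & MSI_VECTOR_MASK


def with_low_bits_cleared(vector, bits=2):
    """Vector as it would arrive if the device zeroed its lowest 'bits' bits."""
    return vector & ~((1 << bits) - 1)
```

For Data: 4142 this gives a programmed vector of 0x42 (66), and clearing the lower 2 bits yields 64, which is the vector named in the do_IRQ message.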

It would be very useful if Qualcomm could confirm if this behaviour is
really true.

For more info please see:
   https://marc.info/?l=linux-pci=150238260826803=2
   https://marc.info/?t=15063128321=1=2
   https://marc.info/?l=linux-pci=150831581725596=2

Thanks,
Daniel


Re: [v2] ath9k: add MSI support

2017-12-12 Thread Daniel Drake
On Wed, Nov 15, 2017 at 7:38 AM, Daniel Drake <dr...@endlessm.com> wrote:
> On Tue, Nov 14, 2017 at 8:15 PM, Kalle Valo <kv...@qca.qualcomm.com> wrote:
>>> Can't be fixed in firmware, but it would be good to have confirmation
>>> of the hardware behaviour, and maybe some other solution is possible?
>>> Are you following this up within Qualcomm?
>>
>> No time to do that right now, sorry.
>
> I got several autoresponders from people on this thread from Qualcomm
> Taiwan. Would it be useful for us to drop off a sample of the affected
> product at your Taipei or Hsinchu office so that you can investigate
> further?

Ping - how can we collaborate on this?

Also, we have been testing the MSI support patch and while it seems to
be working fine on AR9565, multiple users hit failures on AR9462. The
most common report is that the system simply cannot maintain the
connection with the AP for more than a few seconds. It hits a check in
mac80211 where it sends a nullfunc to the AP and expects an ack in
less than 500ms, but it disconnects since it doesn't see the ack.

https://marc.info/?l=linux-wireless=151027741010422=2

We also reproduced a problem in our office with AR9462. With the MSI
support patch in use, we ping a server every second for 1000 seconds
while monitoring "iw dev wlp2s0 link" output. With the MSI support
patch in place, this test fails every time; the connection is dropped
in less than 1000s.
With the patch reverted everything is fine.
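
A rough sketch of how the link-monitoring half of such a test could be automated. It assumes that "iw dev <ifname> link" prints a first line starting with "Connected to" while associated and "Not connected." otherwise; the ping part of the test is left out, and the function names are hypothetical.

```python
import subprocess
import time


def iw_link_output(ifname="wlp2s0"):
    """Run 'iw dev <ifname> link' and return its stdout (needs iw installed)."""
    result = subprocess.run(["iw", "dev", ifname, "link"],
                            capture_output=True, text=True)
    return result.stdout


def is_connected(link_output):
    """True if the iw link output reports an association."""
    return link_output.lstrip().startswith("Connected to ")


def run_link_test(get_link_output, duration_s=1000, interval_s=1.0,
                  sleep=time.sleep):
    """Poll once per interval; return the second of the first drop, or None."""
    for second in range(duration_s):
        if not is_connected(get_link_output()):
            return second
        sleep(interval_s)
    return None
```

Calling run_link_test(iw_link_output) on the test machine would return None for a passing 1000s run, or the number of seconds survived before the first drop. The output source and sleep function are injectable so the loop can be exercised without hardware.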

We ran the same test with AR9565 in MSI mode and it worked fine.

Daniel


Re: [v2] ath9k: add MSI support

2017-11-14 Thread Daniel Drake
On Tue, Nov 14, 2017 at 8:15 PM, Kalle Valo  wrote:
>> Can't be fixed in firmware, but it would be good to have confirmation
>> of the hardware behaviour, and maybe some other solution is possible?
>> Are you following this up within Qualcomm?
>
> No time to do that right now, sorry.

I got several autoresponders from people on this thread from Qualcomm
Taiwan. Would it be useful for us to drop off a sample of the affected
product at your Taipei or Hsinchu office so that you can investigate
further?

Thanks
Daniel


Re: [v2] ath9k: add MSI support

2017-11-13 Thread Daniel Drake
On Mon, Nov 13, 2017 at 4:48 PM, Kalle Valo  wrote:
> Enabling MSI by default is just too invasive, ath9k is used in so many
> different environments that risk of regressions is high. MSI needs a lot
> of testing before we can even consider enabling it by default.

And it seems like we already found a regression here - the MSI Message
Data is being corrupted as described in my last mail. Can't be fixed
in firmware, but it would be good to have confirmation of the hardware
behaviour, and maybe some other solution is possible? Are you
following this up within Qualcomm?

>> I have tested your patch on Acer Aspire ES1-432. It does not work -
>> I still can't connect to wifi.
>> /proc/interrupts shows that no MSI interrupts are delivered, the
>> counters are 0.
>>
>> lspci -vv shows:
>> Capabilities: [50] MSI: Enable+ Count=1/4 Maskable+ 64bit+
>> Address: fee0f00c  Data: 4142
>> Masking: 000e  Pending: 
>>
>> So MSI is enabled and the vector number is 0x42 (decimal 66).
>> However my kernel log is now totally spammed with:
>>   do_IRQ: 0.64 No irq handler for vector
>>
>> My assumption here is that the ath9k hardware implementation of
>> MSI is buggy, and it is therefore corrupting the MSI vector number
>> by zeroing out the lower 2 bits (e.g. 66 -> 64).

Thanks
Daniel


Re: [v2] ath9k: add MSI support

2017-11-09 Thread Daniel Drake
Hi Russell,

> On new Intel platforms like ApolloLake, legacy interrupt mechanism
> (INTx) is not supported

Could you please share the background on what you are claiming here.
I have multiple ApolloLake laptops here with many legacy interrupts
being used in /proc/interrupts.

I do see this ath9k problem on multiple Acer ApolloLake laptops, however
I also have an Asus E402NA ApolloLake laptop on hand where the exact same
ath9k miniPCIe card is working fine with legacy interrupts.

> With module parameter "use_msi=1", ath9k driver would try to
> use MSI instead of INTx.

In the previous patch review it was suggested that MSI should become
the default - not a quirk or parameter.
https://lkml.org/lkml/2017/9/26/64


I have tested your patch on Acer Aspire ES1-432. It does not work -
I still can't connect to wifi.
/proc/interrupts shows that no MSI interrupts are delivered, the
counters are 0.
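
For anyone wanting to repeat that check, a minimal sketch of summing the per-CPU counters for a driver in /proc/interrupts. The sample text in the code is hypothetical (it is not a capture from the affected laptop) and the helper name is made up.

```python
# Hypothetical sample in /proc/interrupts layout; NOT captured from the
# Acer Aspire ES1-432 discussed here.
SAMPLE = """\
           CPU0       CPU1
 18:         12          3   IO-APIC   18-fasteoi   i801_smbus
125:          0          0   PCI-MSI 1048576-edge      ath9k
"""


def irq_counts(proc_interrupts_text, name):
    """Map IRQ label -> count summed over CPUs, for lines mentioning 'name'."""
    totals = {}
    for line in proc_interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or name not in rest:
            continue  # header line, or a line for some other device
        total = 0
        for token in rest.split():
            if not token.isdigit():
                break  # reached the interrupt chip / description columns
            total += int(token)
        totals[label.strip()] = total
    return totals
```

On a live system the input would be the contents of /proc/interrupts; a result such as {"125": 0} that stays at zero while traffic is attempted matches the symptom described here.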

lspci -vv shows:
Capabilities: [50] MSI: Enable+ Count=1/4 Maskable+ 64bit+
Address: fee0f00c  Data: 4142
Masking: 000e  Pending: 

So MSI is enabled and the vector number is 0x42 (decimal 66).
However my kernel log is now totally spammed with:
  do_IRQ: 0.64 No irq handler for vector

My assumption here is that the ath9k hardware implementation of
MSI is buggy, and it is therefore corrupting the MSI vector number
by zeroing out the lower 2 bits (e.g. 66 -> 64).

It would be very useful if Qualcomm could confirm if this behaviour
is really true and if it could potentially be fixed with a new ath9k
firmware version.

For more info please see:
   https://marc.info/?l=linux-pci=150238260826803=2
   https://marc.info/?t=15063128321=1=2
   https://marc.info/?l=linux-pci=150831581725596=2

Thanks
Daniel


> diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c
> index 8c5c2dd8fa7f..cd0f023ccf77 100644
> --- a/drivers/net/wireless/ath/ath9k/hw.c
> +++ b/drivers/net/wireless/ath/ath9k/hw.c
> @@ -922,6 +922,7 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw *ah,
>   AR_IMR_RXERR |
>   AR_IMR_RXORN |
>   AR_IMR_BCNMISC;
> + u32 msi_cfg = 0;
>  
>   if (AR_SREV_9340(ah) || AR_SREV_9550(ah) || AR_SREV_9531(ah) ||
>   AR_SREV_9561(ah))
> @@ -929,22 +930,30 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw *ah,
>  
>   if (AR_SREV_9300_20_OR_LATER(ah)) {
>   imr_reg |= AR_IMR_RXOK_HP;
> - if (ah->config.rx_intr_mitigation)
> + if (ah->config.rx_intr_mitigation) {
>   imr_reg |= AR_IMR_RXINTM | AR_IMR_RXMINTR;
> - else
> + msi_cfg |= AR_INTCFG_MSI_RXINTM | AR_INTCFG_MSI_RXMINTR;
> + } else {
>   imr_reg |= AR_IMR_RXOK_LP;
> -
> + msi_cfg |= AR_INTCFG_MSI_RXOK;
> + }
>   } else {
> - if (ah->config.rx_intr_mitigation)
> + if (ah->config.rx_intr_mitigation) {
>   imr_reg |= AR_IMR_RXINTM | AR_IMR_RXMINTR;
> - else
> + msi_cfg |= AR_INTCFG_MSI_RXINTM | AR_INTCFG_MSI_RXMINTR;
> + } else {
>   imr_reg |= AR_IMR_RXOK;
> + msi_cfg |= AR_INTCFG_MSI_RXOK;
> + }
>   }
>  
> - if (ah->config.tx_intr_mitigation)
> + if (ah->config.tx_intr_mitigation) {
>   imr_reg |= AR_IMR_TXINTM | AR_IMR_TXMINTR;
> - else
> + msi_cfg |= AR_INTCFG_MSI_TXINTM | AR_INTCFG_MSI_TXMINTR;
> + } else {
>   imr_reg |= AR_IMR_TXOK;
> + msi_cfg |= AR_INTCFG_MSI_TXOK;
> + }
>  
>   ENABLE_REGWRITE_BUFFER(ah);
>  
> @@ -952,6 +961,16 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw *ah,
>   ah->imrs2_reg |= AR_IMR_S2_GTT;
>   REG_WRITE(ah, AR_IMR_S2, ah->imrs2_reg);
>  
> + if (ah->msi_enabled) {
> + ah->msi_reg = REG_READ(ah, AR_PCIE_MSI);
> + ah->msi_reg |= AR_PCIE_MSI_HW_DBI_WR_EN;
> + ah->msi_reg &= AR_PCIE_MSI_HW_INT_PENDING_ADDR_MSI_64;
> + REG_WRITE(ah, AR_INTCFG, msi_cfg);
> + ath_dbg(ath9k_hw_common(ah), ANY,
> + "value of AR_INTCFG=0x%X, msi_cfg=0x%X\n",
> + REG_READ(ah, AR_INTCFG), msi_cfg);
> + }
> +
>   if (!AR_SREV_9100(ah)) {
>   REG_WRITE(ah, AR_INTR_SYNC_CAUSE, 0xFFFFFFFF);
>   REG_WRITE(ah, AR_INTR_SYNC_ENABLE, sync_default);
> diff --git a/drivers/net/wireless/ath/ath9k/hw.h b/drivers/net/wireless/ath/ath9k/hw.h
> index 4ac70827d142..0d6c07c77372 100644
> --- a/drivers/net/wireless/ath/ath9k/hw.h
> +++ b/drivers/net/wireless/ath/ath9k/hw.h
> @@ -977,6 +977,9 @@ struct ath_hw {
>   bool tpc_enabled;
>   u8 tx_power[Ar5416RateSize];
>   u8 tx_power_stbc[Ar5416RateSize];
> + bool msi_enabled;
> + u32 msi_mask;
> + u32 msi_reg;
>  };

Re: ath9k disconnects in 4.13 with reason=4 locally_generated=1

2017-11-09 Thread Daniel Drake
On Fri, Nov 3, 2017 at 5:51 PM, Jouni Malinen <jo...@qca.qualcomm.com> wrote:
> On Fri, Nov 03, 2017 at 10:57:11AM +0800, Daniel Drake wrote:
>> Endless OS recently upgraded from Linux 4.11 to Linux 4.13, and we now
>> have a few reports of issues with ath9k wireless becoming unusable.
>>
>> In the logs we can see that it authenticates, associates and completes
>> the WPA 4 way handshake, before then being disconnected with:
>>
>>  wlp2s0: CTRL-EVENT-DISCONNECTED bssid=74:26:ac:68:2f:c0 reason=4 locally_generated=1
>
> reason=4 is WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY. I'd expect the most
> likely source of this to be one of the mac80211 code paths in mlme.c
> where disconnection is triggered if the current AP become unreachable.
> Getting a debug log from mac80211 might help in figuring out what is
> causing this (there seem to be number of mlme_dbg() calls before most,
> but not necessarily all, places where
> WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY is used).

We got the log, it is coming from ieee80211_sta_work()

    else if (ieee80211_hw_check(&local->hw, REPORTS_TX_ACK_STATUS)) {
        sdata_info(sdata,
                   "Failed to send nullfunc to AP %pM after %dms, disconnecting\n",
                   bssid, probe_wait_ms);
        ieee80211_sta_connection_lost(sdata, bssid,
            WLAN_REASON_DISASSOC_DUE_TO_INACTIVITY, false);

I looked again at changes between 4.12 and 4.13 and still no idea how
4.13 causes this problem :(

Daniel


ath9k disconnects in 4.13 with reason=4 locally_generated=1

2017-11-02 Thread Daniel Drake
Hi,

Endless OS recently upgraded from Linux 4.11 to Linux 4.13, and we now
have a few reports of issues with ath9k wireless becoming unusable.

In the logs we can see that it authenticates, associates and completes
the WPA 4 way handshake, before then being disconnected with:

 wlp2s0: CTRL-EVENT-DISCONNECTED bssid=74:26:ac:68:2f:c0 reason=4 locally_generated=1

The cycle then repeats with it connecting again before being swiftly
disconnected, etc.
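
When going through many such logs, a small parser for the disconnect event can help. This is a sketch under the assumption that the line layout matches the one above; the reason-code table covers only the first four standard 802.11 reason codes, and the function name is hypothetical.

```python
import re

# First few standard 802.11 reason codes (names as in Linux's ieee80211.h,
# without the WLAN_REASON_ prefix). Deliberately incomplete.
REASON_CODES = {
    1: "UNSPECIFIED",
    2: "PREV_AUTH_NOT_VALID",
    3: "DEAUTH_LEAVING",
    4: "DISASSOC_DUE_TO_INACTIVITY",
}

_EVENT_RE = re.compile(
    r"CTRL-EVENT-DISCONNECTED bssid=(?P<bssid>[0-9a-f:]{17}) "
    r"reason=(?P<reason>\d+)(?: locally_generated=(?P<local>[01]))?"
)


def parse_disconnect(line):
    """Return a dict describing the disconnect event, or None if no match."""
    match = _EVENT_RE.search(line)
    if match is None:
        return None
    reason = int(match.group("reason"))
    return {
        "bssid": match.group("bssid"),
        "reason": reason,
        "reason_name": REASON_CODES.get(reason, "UNKNOWN"),
        "locally_generated": match.group("local") == "1",
    }
```

Counting events by reason and by the locally_generated flag separates disconnects decided on the station side from those requested by the AP.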

More logs: https://gist.github.com/dsd/49f263c67c2859838ce168628ab043e0

At the same time that we upgraded the kernel, we also upgraded many
other components (e.g. NetworkManager and wpa_supplicant), however the
same problem has been reported on Arch Linux and a user there reports
that he narrowed it down to a kernel regression between 4.12 and 4.13:
https://bbs.archlinux.org/viewtopic.php?id=225199

Unfortunately we can not reproduce this in our office, so can't offer
much more info yet, but we are continuing to investigate. I have not
found any codepaths in userspace that generate disconnect reason 4, so
I think it must be something in the kernel causing the disconnection,
but I did not see any suspicious changes in recent commit history.

It would be good to hear from anyone who has heard of this or has any
ideas about causes or solutions.

Thanks
Daniel


Re: [PATCH 2/6] ath9k: add a quirk to set use_msi automatically

2017-10-12 Thread Daniel Drake
On Fri, Oct 13, 2017 at 9:12 AM, AceLan Kao  wrote:
> Hi Daniel,
>
> After applying the 2 commits you mentioned in the email, ath9k works.
>
> https://marc.info/?l=linux-wireless=150631274108016=2
> https://github.com/endlessm/linux/commit/739c7a924db8f4434a9617657

Thanks for testing. However the approach was basically rejected in this thread:
  [PATCH] PCI MSI: allow alignment restrictions on vector allocation
  https://marc.info/?t=15063128321=1=2
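
For readers new to that thread, the alignment idea named in the patch title can be illustrated as follows. This is a toy allocator, not the kernel's vector management code: it just picks the first free block of vectors whose base is a multiple of the alignment, so that a device clearing the low bits would still hit the base vector. The default range (0x20 to 0xFF) and the function name are assumptions for illustration.

```python
def first_aligned_block(used, align=4, first=0x20, last=0xFF):
    """Return the base of the first fully free, 'align'-aligned vector block.

    used:  set of vector numbers already taken.
    align: block size and alignment (4 matches the device's Count=1/4).
    Returns None if no such block exists in [first, last].
    """
    start = (first + align - 1) // align * align  # round up to alignment
    for base in range(start, last - align + 2, align):
        if not any(vector in used for vector in range(base, base + align)):
            return base
    return None
```

Any base returned this way has its lower 2 bits already zero when align is 4, so the vector 66 -> 64 corruption described elsewhere in these threads would become a no-op.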

So we still need an upstream solution.

I am curious what Qualcomm have to say about their hardware corrupting
the MSI Message Data value. Is there any news on them submitting the
MSI support patch?

Separately we have the option of seeing if Intel can help us unblock
the legacy interrupt (assuming it was simply blocked by the BIOS), or
adding an interrupt-polling fallback path to ath9k.

Daniel


Re: [PATCH] PCI MSI: allow alignment restrictions on vector allocation

2017-10-04 Thread Daniel Drake
On Mon, Oct 2, 2017 at 10:38 PM, Thomas Gleixner  wrote:
>> After checking out the new code and thinking this through a bit, I think
>> perhaps the only generic approach that would work is to make the
>> ath9k driver require a vector allocation that enables the entire block
>> of 4 MSI IRQs that the hardware supports (which is what Windows is doing).
>
> I wonder how Windows deals with the affinity problem for multi-MSI. Or does
> it just not allow to set it at all?

https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/interrupt-affinity-and-priority
Looks like IRQ affinity can only be set by registry or inf files. I assume
that means it is not dynamic and hence avoids the challenges related to
moving interrupts around at runtime.

> What's wrong with just using the legacy INTx emulation if you cannot
> allocate 4 MSI vectors?

The legacy interrupt simply doesn't work for the wifi on at least 8 new Acer
laptop products based on Intel Apollo Lake.
Plus 4 Dell systems included in the patches in this thread:
https://lkml.org/lkml/2017/9/26/55
(the 2 which I can find specs for are also Apollo Lake)

We have tried taking the mini-PCIe wifi module out of one of the affected
Acer products and moved it to another computer, where it is working fine
with legacy interrupts. So this suggests that the wifi module itself is OK,
but we are facing a hardware limitation or BIOS limitation on the affected
products. In the Dell thread it says "Some platform(BIOS) blocks legacy
interrupts (INTx)".

If you have any suggestions for how we might solve this without getting into
the MSI mess then that would be much appreciated. If the BIOS blocks the
interrupts, can Linux unblock them?

Just for reference I'm attaching my latest attempt at enabling MULTI_PCI_MSI.
It would definitely need further work if we proceed here - so far I've
ignored the affinity considerations that you explained, and it's not
particularly clean.

I'll now have a look at polling for interrupts in the ath9k driver.

---
 arch/x86/kernel/apic/msi.c|  3 +-
 arch/x86/kernel/apic/vector.c | 75 ---
 include/linux/irq.h   |  3 +-
 kernel/irq/matrix.c   | 23 +++--
 4 files changed, 74 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 5b6dd1a85ec4..c57b6a7b9317 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -129,7 +129,8 @@ static struct msi_domain_ops pci_msi_domain_ops = {
 
 static struct msi_domain_info pci_msi_domain_info = {
.flags  = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS |
- MSI_FLAG_PCI_MSIX | MSI_FLAG_MUST_REACTIVATE,
+ MSI_FLAG_PCI_MSIX | MSI_FLAG_MUST_REACTIVATE |
+ MSI_FLAG_MULTI_PCI_MSI,
.ops= &pci_msi_domain_ops,
.chip   = &pci_msi_controller,
.handler= handle_edge_irq,
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6789e286def9..2926fd92ea1c 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -35,7 +35,8 @@ struct apic_chip_data {
unsigned intmove_in_progress: 1,
is_managed  : 1,
can_reserve : 1,
-   has_reserved: 1;
+   has_reserved: 1,
+   contig_allocation   : 1;
 };
 
 struct irq_domain *x86_vector_domain;
@@ -198,7 +199,8 @@ static int reserve_irq_vector(struct irq_data *irqd)
return 0;
 }
 
-static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest)
+static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest,
+  unsigned int num, unsigned int align_mask)
 {
struct apic_chip_data *apicd = apic_chip_data(irqd);
bool resvd = apicd->has_reserved;
@@ -215,18 +217,21 @@ static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest)
if (vector && cpu_online(cpu) && cpumask_test_cpu(cpu, dest))
return 0;
 
-   vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu);
+   vector = irq_matrix_alloc(vector_matrix, dest, resvd, &cpu,
+ num, align_mask);
if (vector > 0)
apic_update_vector(irqd, vector, cpu);
+
trace_vector_alloc(irqd->irq, vector, resvd, vector);
return vector;
 }
 
 static int assign_vector_locked(struct irq_data *irqd,
-   const struct cpumask *dest)
+   const struct cpumask *dest,
+   unsigned int num, unsigned int align_mask)
 {
struct apic_chip_data *apicd = apic_chip_data(irqd);
-   int vector = allocate_vector(irqd, dest);
+   

Re: [PATCH 2/6] ath9k: add a quirk to set use_msi automatically

2017-10-01 Thread Daniel Drake
Hi AceLan,

On Thu, Sep 28, 2017 at 4:28 PM, AceLan Kao  wrote:
> Hi Daniel,
>
> I've tried your patch, but it doesn't work for me.
> Wifi can scan AP, but can't get connected.

Can you please clarify which patch(es) you have tried?

This is the base patch which adds the infrastructure to request
specific MSI IRQ vectors:
https://marc.info/?l=linux-wireless&m=150631274108016&w=2

This is the ath9k MSI patch which makes use of that:
https://github.com/endlessm/linux/commit/739c7a924db8f4434a9617657

If you were already able to use ath9k MSI interrupts without specific
consideration for which MSI vector numbers were used, these are the
possible explanations that spring to mind:

1. You got lucky and it picked a vector number that is 4-aligned. You
can check this in the "lspci -vvv" output. You'll see something like:
Capabilities: [50] MSI: Enable+ Count=1/4 Maskable+ 64bit+
Address: fee0300c  Data: 4142
The low byte of Data is the vector number. In my example here 0x42 (66) is
not 4-aligned so the failure condition will be hit.

2. You are using interrupt remapping, which I suspect may provide a
high likelihood of MSI interrupt vectors being 4-aligned. See if
/proc/interrupts shows the IRQ type as IR-PCI-MSI
Unfortunately interrupt remapping is not available here,
https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023717.html

3. My assumption that all ath9k hardware corrupts the MSI vector
number could be wrong. However we've seen this on different wifi modules
in laptops produced by different OEMs and ODMs, so it seems to be a
somewhat widespread problem at least.

4. My assumption that ath9k hardware is corrupting the MSI vector
number could be wrong; maybe another component is to blame, could it
be a BIOS issue? Admittedly I don't really know how I can debug the
layers in between seeing the MSI Message Data value disagree with the
vector number being handled inside do_IRQ().
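
The alignment check from point 1 can be sketched as follows. This is only an illustration: the helper names are hypothetical, and it assumes the x86 convention that the low 8 bits of Message Data hold the vector number.

```c
#include <stdbool.h>
#include <stdint.h>

/* On x86 the low 8 bits of MSI Message Data carry the vector number. */
uint8_t msi_data_to_vector(uint16_t data)
{
	return (uint8_t)(data & 0xff);
}

/* True if vector is a multiple of align; align must be a power of two. */
bool msi_vector_is_aligned(uint8_t vector, unsigned int align)
{
	return (vector & (align - 1)) == 0;
}
```

For the lspci example above, msi_data_to_vector(0x4142) gives 0x42, and msi_vector_is_aligned(0x42, 4) is false, so that system would be expected to hit the failure.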

Daniel


[PATCH] PCI MSI: allow alignment restrictions on vector allocation

2017-09-24 Thread Daniel Drake
ath9k hardware claims to support up to 4 MSI vectors, and when run in
that configuration, it would be allowed to modify the lower bits of the
MSI Message Data when generating interrupts in order to signal which
of the 4 vectors the interrupt is being raised on.

Linux's PCI-MSI irqchip only supports a single MSI vector for each
device, and it tells the device this, but the device appears to assume
it is working with 4, as it will unset the lower 2 bits of Message Data
presumably to indicate that it is an IRQ for the first of 4 possible
vectors.

Linux will then receive an interrupt on the wrong vector, so the
ath9k interrupt handler will not be invoked.

To work around this, introduce a mechanism where the vector assignment
algorithm can be restricted to only a subset of available vector numbers
based on a bitmap.

As a user of this bitmap, introduce a pci_dev.align_msi_vector flag which
can be used to state that MSI vector numbers must be aligned to a specific
amount. If we 4-align the ath9k MSI vector then the lower bits will
already be 0 and hence the device will not modify the Message Data away
from its original value.
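
As a rough userspace illustration of the idea (not the kernel code in the patch below), the restriction can be modelled as a bitmap of allowed vectors that the search loop consults in addition to the bitmap of used vectors. All names here (struct vector_bitmap, build_allowed(), pick_vector()) are invented for the sketch, and NR_VECTORS is simply assumed to be 256.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define NR_VECTORS    256	/* assumed value for this sketch */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS  ((NR_VECTORS + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct vector_bitmap {
	unsigned long bits[BITMAP_LONGS];
};

void vb_set(struct vector_bitmap *b, unsigned int n)
{
	b->bits[n / BITS_PER_LONG] |= 1UL << (n % BITS_PER_LONG);
}

bool vb_test(const struct vector_bitmap *b, unsigned int n)
{
	return (b->bits[n / BITS_PER_LONG] >> (n % BITS_PER_LONG)) & 1UL;
}

/* Allow every multiple of align; align == 0 means "no restriction". */
void build_allowed(struct vector_bitmap *allowed, unsigned int align)
{
	unsigned int step = align ? align : 1;

	memset(allowed, 0, sizeof(*allowed));
	for (unsigned int v = 0; v < NR_VECTORS; v += step)
		vb_set(allowed, v);
}

/* First vector >= first that is allowed and not yet used, or -1. */
int pick_vector(const struct vector_bitmap *used,
		const struct vector_bitmap *allowed, unsigned int first)
{
	for (unsigned int v = first; v < NR_VECTORS; v++) {
		if (vb_test(allowed, v) && !vb_test(used, v))
			return (int)v;
	}
	return -1;
}
```

With align set to 4, a search starting at vector 33 skips 33, 34 and 35 and returns 36, whose low two bits are already zero.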

This is needed in order to support the wifi card in at least 8 new Acer
consumer laptop models which come with the Foxconn NFA335 WiFi module.
Legacy interrupts do not work on that module, so MSI support is required.

Signed-off-by: Daniel Drake <dr...@endlessm.com>

https://phabricator.endlessm.com/T16988
---
 arch/x86/include/asm/hw_irq.h |  1 +
 arch/x86/kernel/apic/msi.c| 15 +++
 arch/x86/kernel/apic/vector.c | 32 +---
 include/linux/pci.h   |  2 ++
 4 files changed, 43 insertions(+), 7 deletions(-)

This solves the issue described here:
https://marc.info/?l=linux-pci&m=150238260826803&w=2

If this approach looks good I'll follow up with the ath9k patch
to enable MSI interrupts.

diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 6dfe366a8804..7f35178586a1 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -77,6 +77,7 @@ struct irq_alloc_info {
struct {
struct pci_dev  *msi_dev;
irq_hw_number_t msi_hwirq;
+   DECLARE_BITMAP(allowed_vectors, NR_VECTORS);
};
 #endif
 #ifdef CONFIG_X86_IO_APIC
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 9b18be764422..80067873cfd5 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -111,6 +111,21 @@ int pci_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec,
arg->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
}
 
+   if (pdev->align_msi_vector) {
+   /* We have specific alignment requirements on the vector
+* number used by the device. Set up a bitmap that restricts
+* the vector selection accordingly.
+*/
+   int i = pdev->align_msi_vector;
+
+   set_bit(0, arg->allowed_vectors);
+   for (; i < NR_VECTORS; i += pdev->align_msi_vector)
+   set_bit(i, arg->allowed_vectors);
+   } else {
+   /* No specific alignment requirements so allow all vectors. */
+   bitmap_fill(arg->allowed_vectors, NR_VECTORS);
+   }
+
return 0;
 }
 EXPORT_SYMBOL_GPL(pci_msi_prepare);
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 88c214e75a6b..64ddac198c25 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -104,7 +104,8 @@ static void free_apic_chip_data(struct apic_chip_data *data)
 
 static int __assign_irq_vector(int irq, struct apic_chip_data *d,
   const struct cpumask *mask,
-  struct irq_data *irqdata)
+  struct irq_data *irqdata,
+  unsigned long *allowed_vectors)
 {
/*
 * NOTE! The local APIC isn't very good at handling
@@ -178,6 +179,9 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
if (test_bit(vector, used_vectors))
goto next;
 
+   if (allowed_vectors && !test_bit(vector, allowed_vectors))
+   goto next;
+
for_each_cpu(new_cpu, vector_searchmask) {
if (!IS_ERR_OR_NULL(per_cpu(vector_irq, new_cpu)[vector]))
goto next;
@@ -234,13 +238,14 @@ static int __assign_irq_vector(int irq, struct apic_chip_data *d,
 
 static int assign_irq_vector(int irq, struct apic_chip_data *data,
 const struct cpumask *mask,
-struct irq_data *irqdata)
+struct irq_data *irqdata,
+unsigned long *allowed_vectors)
 {
  

ath9k hardware corrupts MSI Message Data, raises wrong interrupt

2017-08-10 Thread Daniel Drake
Hi,

The ath9k wireless driver in mainline currently does not have support for
PCI MSI interrupts; it uses legacy interrupts instead.

However we are working with a number of 3rd party laptop models based on
Intel Apollo Lake which will soon be available on the consumer market. They
all appear to have broken legacy interrupt wiring for the wifi card.
Unfortunately the hardware can't be changed so we are instead looking at
making ath9k use MSI interrupts which is what we believe they are doing on
Windows.

To recap what MSI is: The host OS can configure a Message Address value and
a Message Data value within the device's PCI configuration space. When the
device wishes to interrupt the host, instead of pulsing a logic level on the
legacy interrupt pin, it will instead write the value of Message Data into
the address specified in Message Address. This write will then trigger
interrupt handling mechanisms within the kernel.
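
To make the Address/Data pair more concrete, here is a small decoding sketch for the x86 case. The field positions are the ones documented for x86 MSI messages (destination ID in address bits 19:12, vector in data bits 7:0, delivery mode in data bits 10:8); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* A few fields of an x86 MSI message; the names are illustrative only. */
struct msi_msg_fields {
	uint8_t dest_id;	/* Message Address bits 19:12 */
	uint8_t vector;		/* Message Data bits 7:0 */
	uint8_t delivery_mode;	/* Message Data bits 10:8 */
};

struct msi_msg_fields msi_decode(uint32_t address, uint16_t data)
{
	struct msi_msg_fields f;

	f.dest_id = (uint8_t)((address >> 12) & 0xff);
	f.vector = (uint8_t)(data & 0xff);
	f.delivery_mode = (uint8_t)((data >> 8) & 0x7);
	return f;
}
```

For example, an lspci line showing Address fee0300c and Data 4142 decodes to destination ID 0x03, vector 0x42 and delivery mode 1.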

The code below can be used to tell the ath9k hardware to use MSI interrupts
instead of legacy interrupts (sorry that it's a bit unclean). However, it
is not working, as reproduced on multiple devices. No interrupts are
counted against the ath9k MSI IRQ, and we get messages like these spammed
in the kernel logs:
  do_IRQ: 0.64 No irq handler for vector

The device does not appear to be MSI-X capable.

Configuration dump for the device at this point:

02:00.0 Network controller: Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter (rev 01)
Subsystem: AzureWave Device 218d
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
[...]
64, 99 -> 96). Why might it be doing this? My guess, looking at the PCI specs:

The Multiple Message Enable field (bits 6-4 of the Message Control
register) defines the number of low order message data bits the
function is permitted to modify to generate its system software
allocated vectors. For example, a Multiple Message Enable encoding of
“010” indicates the function has been allocated four vectors and is
permitted to modify message data bits 1 and 0 (a function modifies the
lower message data bits to generate the allocated number of vectors).
If the Multiple Message Enable field is “000”, the function is not
permitted to modify the message data.

Linux is not working with Multiple Messages and has written the 000 value
as described. However, I suspect the device is not fully following the
spec here and is effectively taking ownership of the 2 lower bits, and
setting them to 00 to indicate that it is working with the first of the
4 possible MSI IRQ vectors.
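
The difference between the spec'd behaviour and what I suspect the device is doing can be written down as a short sketch. The function names are hypothetical, and msi_data_suspected() encodes only my guess about the hardware, not anything confirmed by a datasheet.

```c
#include <stdint.h>

/*
 * Mask of low-order Message Data bits a function may modify for a given
 * Multiple Message Enable encoding (000 -> none, 010 -> bits 1 and 0).
 */
uint16_t mme_modifiable_mask(unsigned int mme)
{
	return (uint16_t)((1u << mme) - 1);
}

/* Data a spec-compliant function would write for message number msg. */
uint16_t msi_data_compliant(uint16_t data, unsigned int mme, unsigned int msg)
{
	uint16_t mask = mme_modifiable_mask(mme);

	return (uint16_t)((data & ~mask) | (msg & mask));
}

/*
 * Suspected device behaviour (an assumption): act as if MME were 010
 * regardless of what the host wrote, and signal message 0.
 */
uint16_t msi_data_suspected(uint16_t data)
{
	return msi_data_compliant(data, 2, 0);
}
```

With MME written as 000, a compliant device leaves a vector of 99 untouched, whereas the suspected behaviour turns 99 into 96, matching what is observed.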

I thought about modifying Linux's vector-assignment algorithm to consider
this special case and only assign a single vector number with the low bits
already set as 00, but that seems like a hairy topic and that code is
distanced from the driver too. The algorithm in question is
 __assign_irq_vector() in arch/x86/kernel/apic/vector.c

Adding support for MSI_FLAG_MULTI_PCI_MSI to the PCI-MSI adapter would
run into similar challenges: ultimately we'd need to allocate 4
contiguous vectors, which does not really fit the design of
__assign_irq_vector().

I'd appreciate any suggestions for next steps here. Do any ath9k
developers have datasheet or vendor contacts that might shine light on
the behaviour I suspect here where the Message Data bits are being
incorrectly zeroed out? Any PCI experts that have any bright ideas for
how we could introduce a workaround for this possibly broken hardware
in upstreamable form?

Thanks
Daniel


---
 drivers/net/wireless/ath/ath9k/hw.c   | 34 ++--
 drivers/net/wireless/ath/ath9k/hw.h   |  3 +++
 drivers/net/wireless/ath/ath9k/init.c |  4 
 drivers/net/wireless/ath/ath9k/mac.c  | 42 +++
 drivers/net/wireless/ath/ath9k/pci.c  | 20 -
 drivers/net/wireless/ath/ath9k/reg.h  | 15 +
 6 files changed, 110 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/hw.c b/drivers/net/wireless/ath/ath9k/hw.c
index 8c5c2dd8fa7f..8c25d14cd9fc 100644
--- a/drivers/net/wireless/ath/ath9k/hw.c
+++ b/drivers/net/wireless/ath/ath9k/hw.c
@@ -922,6 +922,7 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw *ah,
AR_IMR_RXERR |
AR_IMR_RXORN |
AR_IMR_BCNMISC;
+   u32 msi_cfg = 0;
 
if (AR_SREV_9340(ah) || AR_SREV_9550(ah) || AR_SREV_9531(ah) ||
AR_SREV_9561(ah))
@@ -929,22 +930,33 @@ static void ath9k_hw_init_interrupt_masks(struct ath_hw *ah,
 
if (AR_SREV_9300_20_OR_LATER(ah)) {
imr_reg |= AR_IMR_RXOK_HP;
-   if (ah->config.rx_intr_mitigation)
+   if (ah->config.rx_intr_mitigation) {
imr_reg |= AR_IMR_RXINTM | AR_IMR_RXMINTR;
- 

Where is wil6210.fw / wil6210.brd?

2017-03-14 Thread Daniel Drake
Hi,

We are working with a new consumer laptop model that includes a
wil6210 wireless adapter.

It is not usable on current Linux distros because the firmware is not
present. It's not in linux-firmware and we can't even find any
download links when searching the web.

Could you please send a copy of the required firmware to linux-firmware?
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git

Thanks
Daniel


ath9k excessive delay in handling EAPOL frames

2016-10-05 Thread Daniel Drake
Hi,

As this is remote problem debugging I haven't gathered quite as much
info as I would like, and won't be investigating further immediately,
but I would like to share what I have found so far, maybe it is useful
knowledge and we can revisit later.

With the following hardware on Linux 4.4, we cannot connect to our
office WPA2-PSK network. Other networks seem fine.

02:00.0 Network controller [0280]: Qualcomm Atheros QCA9565 / AR9565
Wireless Network Adapter [168c:0036] (rev 01)
Subsystem: AzureWave Device [1a3b:218d]
Kernel driver in use: ath9k

The logs show:

wpa_supplicant[585]: wlp2s0: SME: Trying to authenticate with
0c:11:67:33:8d:50 (SSID='Endless' freq=2457 MHz)
kernel: wlp2s0: authenticate with 0c:11:67:33:8d:50
NetworkManager[620]:  [1474483556.0677] device (wlp2s0):
supplicant interface state: inactive -> authenticating
kernel: wlp2s0: send auth to 0c:11:67:33:8d:50 (try 1/3)
kernel: wlp2s0: send auth to 0c:11:67:33:8d:50 (try 2/3)
kernel: wlp2s0: send auth to 0c:11:67:33:8d:50 (try 3/3)
wpa_supplicant[585]: wlp2s0: Trying to associate with
0c:11:67:33:8d:50 (SSID='Endless' freq=2457 MHz)
kernel: wlp2s0: authenticated
NetworkManager[620]:  [1474483558.1078] device (wlp2s0):
supplicant interface state: authenticating -> associating
kernel: wlp2s0: associate with 0c:11:67:33:8d:50 (try 1/3)
kernel: wlp2s0: associate with 0c:11:67:33:8d:50 (try 2/3)
kernel: wlp2s0: associate with 0c:11:67:33:8d:50 (try 3/3)
wpa_supplicant[585]: wlp2s0: Associated with 0c:11:67:33:8d:50
kernel: wlp2s0: RX AssocResp from 0c:11:67:33:8d:50 (capab=0x431 status=0 aid=5)
kernel: wlp2s0: associated
kernel: wlp2s0: deauthenticated from 0c:11:67:33:8d:50 (Reason:
23=IEEE8021X_FAILED)

Using monitor mode from another station, I observe:

- STA sends association request
- AP sends association response 0.01s later, STA acks
- AP sends EAPOL 0.002s later, STA acks
- AP sends another EAPOL 0.1s later, STA acks
- AP sends deauthentication 0.3s later (presumably a timeout waiting
for EAPOL response), STA acks
- STA sends another association request 0.5s later
- AP replies with Deauthentication (can't associate as you are deauthed)
- STA sends another association request 1s later
- AP replies with Deauthentication again
- STA sends EAPOL response message, a full 2 seconds after the first
EAPOL was received

It is as if the processing of incoming frames is getting stuck for 2
seconds, even though they were already ACKed. i.e. The first
association request succeeds immediately but the processing of the
AssocResp frame (and the following EAPOLs and deauth) is delayed by
more than 2 seconds, far longer than the AP is willing to wait.
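
As a sanity check on the capture timings, the gaps can simply be added up. This sketch assumes each "later" in the list above is measured from the previous frame and that the AP's Deauthentication replies follow the requests with negligible delay; the helper name is made up.

```c
#include <stddef.h>

/* Sum a list of inter-frame gaps, given in milliseconds. */
unsigned int elapsed_ms(const unsigned int *gaps_ms, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += gaps_ms[i];
	return total;
}
```

Feeding in the gaps after the first EAPOL (100, 300, 500 and 1000 ms) gives 1900 ms up to the last association request, which fits with the EAPOL response only appearing about 2 seconds after the first EAPOL.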

I have confirmed this perspective in the wpa_supplicant debug logs
too, there is 2 seconds of RX silence after the first association
request is sent before all the frames come in at once.

Hope this partial info is useful in some way, I'll come back to this
problem as time permits.

Daniel