Re: [LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-12-03 Thread Timo Sigurdsson
Hi Mohammed,

so, I provoked the crash again on a newer build with debugging enabled
(LEDE r2321, Linux 4.4.32, ath10k firmware 10.2.4.70.54).

The relevant dmesg output is pasted below (with mac addresses
pseudo-anonymized). I hope this includes the information you were
looking for. If not, please let me know what other useful information I
can provide. I haven't rebooted the access point yet.

Regards,

Timo

[283232.075393] ath10k_pci :01:00.0: failed to delete peer 
78:f8:82:00:00:00 for vdev 0: -145
[283232.084140] ath10k_pci :01:00.0: found sta peer 78:f8:82:00:00:00 (ptr 
86df2800 id 62) entry on vdev 0 after it was supposedly removed
[283232.096976] ------------[ cut here ]------------
[283232.101904] WARNING: CPU: 0 PID: 1841 at 
compat-wireless-2016-10-08/net/mac80211/sta_info.c:964 
sta_set_sinfo+0x92c/0x9e0 [mac80211]()
[283232.114279] Modules linked in: pppoe ppp_async iptable_nat ath9k pppox 
ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT 
ipt_MASQUERADE ax88179_178a ath9k_common xt_time xt_tcpudp xt_tcpmss 
xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit 
xt_length xt_id xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark 
xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP 
xt_CT xt_CLASSIFY usbnet slhc nf_reject_ipv4 nf_nat_redirect 
nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 
nf_conntrack_rtcache nf_conntrack iptable_raw iptable_mangle iptable_filter 
ipt_ECN ip_tables crc_ccitt ath9k_hw em_nbyte sch_tbf sch_pie sch_gred sch_htb 
sch_teql cls_basic act_ipt sch_red sch_prio em_meta em_text sch_codel sch_sfq 
act_police sch_dsmark em_cmp sch_fq act_skbedit act_mirred em_u32 cls_u32 
cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ath10k_pci 
ath10k_core ath mac80211 cfg80211 compat ledtrig_usbport ip6t_REJECT
nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_raw
ip6table_mangle ip6table_filter ip6_tables x_tables ifb ehci_platform ehci_hcd 
gpio_button_hotplug usbcore nls_base usb_common mii
[283232.221696] CPU: 0 PID: 1841 Comm: hostapd Not tainted 4.4.32 #0
[283232.227884] Stack : 803c74e4  0001 8042 87d65b80 8040ed83 
803a8bac 0731
[283232.227884]   8048379c 8740e800  7782fe94  800a7238 
803ae218 8041
[283232.227884]   0003 8740e800 803ac624 86e1db2c  800a51b4 
01afb952 
[283232.227884]   0001 801f3200     
 
[283232.227884]         
 
[283232.227884]   ...
[283232.264341] Call Trace:
[283232.266934] [<80071b80>] show_stack+0x50/0x84
[283232.271445] [<80081900>] warn_slowpath_common+0xa0/0xd0
[283232.276847] [<800819b8>] warn_slowpath_null+0x18/0x24
[283232.282164] [<875870a4>] sta_set_sinfo+0x92c/0x9e0 [mac80211]
[283232.288224] [<87587188>] __sta_info_destroy+0x30/0x48 [mac80211]
[283232.294519] [<87587238>] sta_info_destroy_addr_bss+0x38/0x60 [mac80211]
[283232.301475] [<874cc144>] cfg80211_check_station_change+0xed8/0x1390 
[cfg80211]
[283232.308942] 
[283232.310543] ---[ end trace 1e173b5175dabd85 ]---
[283547.702654] ath10k_pci :01:00.0: failed to install key for vdev 0 peer 
a0:02:dc:00:00:00: -145
[283547.711846] wlan0: failed to remove key (0, a0:02:dc:00:00:00) from 
hardware (-145)
[283547.720952] ath10k_pci :01:00.0: cipher 0 is not supported
[283547.727006] ath10k_pci :01:00.0: failed to remove peer wep key 0: -122
[283547.734106] ath10k_pci :01:00.0: failed to clear all peer wep keys for 
vdev 0: -122
[283547.742313] ath10k_pci :01:00.0: failed to disassociate station: 
a0:02:dc:00:00:00 vdev 0: -122
[283547.751590] ------------[ cut here ]------------
[283547.756521] WARNING: CPU: 0 PID: 1841 at 
compat-wireless-2016-10-08/net/mac80211/sta_info.c:956 
sta_set_sinfo+0x8d8/0x9e0 [mac80211]()
[283547.768872] Modules linked in: pppoe ppp_async iptable_nat ath9k pppox 
ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT 
ipt_MASQUERADE ax88179_178a ath9k_common xt_time xt_tcpudp xt_tcpmss 
xt_statistic xt_state xt_recent xt_nat xt_multiport xt_mark xt_mac xt_limit 
xt_length xt_id xt_hl xt_helper xt_ecn xt_dscp xt_conntrack xt_connmark 
xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_HL xt_DSCP 
xt_CT xt_CLASSIFY usbnet slhc nf_reject_ipv4 nf_nat_redirect 
nf_nat_masquerade_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 
nf_conntrack_rtcache nf_conntrack iptable_raw iptable_mangle iptable_filter 
ipt_ECN ip_tables crc_ccitt ath9k_hw em_nbyte sch_tbf sch_pie sch_gred sch_htb 
sch_teql cls_basic act_ipt sch_red sch_prio em_meta em_text sch_codel sch_sfq 
act_police sch_dsmark em_cmp sch_fq act_skbedit act_mirred em_u32 cls_u32 
cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ingress ath10k_pci 
ath10k_core ath mac80211 cfg80211 compat ledtrig_usbport ip6t_REJECT 
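
For anyone comparing several of these logs, the following is a minimal Python sketch (not part of the original report) that re-joins the mail-wrapped dmesg lines and counts the "failed to ..." entries per numeric return code. The function names `unwrap` and `count_failures` are hypothetical helpers invented for this illustration. If I recall the MIPS errno table correctly, 145 is ETIMEDOUT and 122 is EOPNOTSUPP, but treat that mapping as an assumption and check it against the kernel headers for your architecture.

```python
import re
from collections import Counter

# Matches a kernel timestamp such as "[283232.075393]" at the start of a line.
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]")
# Matches a trailing negative return code such as ": -145" or "(-145)".
ERRNO = re.compile(r"(?::\s*|\()-(\d+)\)?\s*$")


def unwrap(lines):
    """Re-join log lines that a mail client wrapped onto continuation lines."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not entries:
            # A new timestamp starts a new log entry.
            entries.append(line)
        else:
            # No timestamp: treat the line as a continuation of the last entry.
            entries[-1] = entries[-1].rstrip() + " " + line.strip()
    return entries


def count_failures(lines):
    """Count 'failed to ...' entries per numeric error code."""
    counts = Counter()
    for entry in unwrap(lines):
        if "failed to" not in entry:
            continue
        match = ERRNO.search(entry)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Feeding it the saved output of `dmesg` line by line gives a quick overview of which return codes dominate before and after each warning.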

Re: [LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-11-27 Thread Timo Sigurdsson
Hi Mohammed,

sorry for the late reply, but I was on a business trip last week.

The log I had attached is all I got from this crash. I have no experience
with kernel debugging, but I assume some info might be missing because the
kernel was not compiled with debug information. So, I will have to provoke
another crash with a different firmware image that has more debugging
options enabled. It might take a few days until the error occurs again,
but I'll try to gather more information and post it.

Regards,

Timo


Mohammed Shafi Shajakhan wrote on 21.11.2016 13:47:

> Hi Timo,
> 
> Sorry, did I miss the exact kernel crash call trace?
> I could see only the warnings. Can you please post the call
> trace of the kernel crash?
> 
> regards,
> shafi
> 
> On Sun, Nov 20, 2016 at 02:35:27AM +0100, Timo Sigurdsson wrote:
>> Hi Adrian,
>> 
>> sure - here's the bug report on kernel.org:
>> https://bugzilla.kernel.org/show_bug.cgi?id=188201
>> 
>> Regards,
>> 
>> Timo
>> 
>> 
>> Adrian Chadd wrote on 18.11.2016 22:22:
>> 
>> > hiya!
>> > 
>> > Can you file a kernel.org bug mentioning this?
>> > 
>> > thanks!
>> > 
>> > 
>> > -a
>> > 
>> > 
>> > On 18 November 2016 at 01:30, Timo Sigurdsson wrote:
>> >> Hi again,
>> >>
>> >> in the meantime, I have some more information to add to the issue
>> >> mentioned in my email quoted further down below.
>> >>
>> >> Ben Greear approached me off-list and suggested to try the Candela Tech
>> >> ath10k driver and firmware and see if the issue occurs with that as well.
>> >> So, for the last 3 weeks I've been testing the CT driver and firmware and
>> >> I can happily report that the issue with the driver crashing after a
>> >> while when a Nexus 5X device is connected is not occurring with the
>> >> current BETA 18 firmware-2-ct-full-community.bin. So, this really seems
>> >> like a regression in the official API level 5 ath10k firmware blobs.
>> >>
>> >> The CT firmware is not perfect either, since it seems to suffer from a
>> >> different bug that has been resolved in the official firmwares, and that
>> >> is that after a reboot of my TP-Link Archer C7 v2, the ath10k driver
>> >> won't load. Only after a hard reset or "cold boot" it will come up.
>> >> That's an issue I had with older official firmwares as well, but it has
>> >> been resolved with the recent API level 5 firmwares.
>> >>
>> >> Nevertheless, for the time being, I will stick to the CT firmware because
>> >> I can work around the reboot issue and having the 5GHz wifi working for
>> >> my Nexus 5X clients is more important.
>> >>
>> >> Over the next weeks, I will test different combinations of ath10k(-ct)
>> >> driver and firmware to see if there's a combination that resolves all
>> >> issues. This morning I flashed a LEDE build with the official ath10k
>> >> driver and the CT firmware binary.
>> >>
>> >> But of course, if someone has more suggestions on what I could try or
>> >> what information I could collect to help resolve the issue related to the
>> >> Nexus 5X clients in the official firmware binaries, I think that would be
>> >> beneficial for a larger audience.
>> >>
>> >> Regards,
>> >>
>> >> Timo
>> >>
>> >> P.S.: Please include my email address in any reply, since I'm not
>> >> subscribed to the mailing list. Thank you.
>> >>
>> >>
>> >> Timo Sigurdsson wrote on 29.10.2016 22:19:
>> >>
>> >>> Hi everybody,
>> >>>
>> >>> I have a TP-Link Archer C7 v2 running a fairly recent build of LEDE
>> >>> (r1952, Linux 4.4.26, compat-wireless-2016-10-08). It all works well
>> >>> except for the fact that when I connect a Nexus 5X device to the 5GHz
>> >>> radio, the kernel or ath10k driver will crash after a while. 5GHz wifi
>> >>> will be dead after that until I reboot the system.
>> >>>
>> >>> This issue has been reported before [1] and it also has been declared as
>> >>> solved with newer firmwares [2] (but reopened by other users). However,
>> >>> even with the latest firmware 10.2.4.70.58 from Kalle Valo's Github
>> >>> repository the issue is far from resolved. I have tried many different
>> >>> firmware revisions over time (more recently 10.2.4.70.56 and
>> >>> 10.2.4.70.54), and I could only find that the issue sometimes takes
>> >>> longer to trigger with some firmwares (which might just be random), but
>> >>> it would always occur at some point with API level 5 firmwares. With API
>> >>> level 2 firmwares (which I tested when I was still using OpenWrt 15.05)
>> >>> I never saw these crashes, but the Nexus 5X had other connectivity
>> >>> issues with these older firmwares that made this combination no fun to
>> >>> use either. But this shows that the firmware itself makes the
>> >>> difference here.
>> >>>
>> >>> I actually have two Nexus 5X on my network (my wife's and my own). I can
>> >>> trigger the 

Re: [LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-11-18 Thread Adrian Chadd
hiya!

Can you file a kernel.org bug mentioning this?

thanks!


-a


On 18 November 2016 at 01:30, Timo Sigurdsson wrote:
> Hi again,
>
> in the meantime, I have some more information to add to the issue mentioned in
> my email quoted further down below.
>
> Ben Greear approached me off-list and suggested to try the Candela Tech ath10k
> driver and firmware and see if the issue occurs with that as well. So, for the
> last 3 weeks I've been testing the CT driver and firmware and I can happily
> report that the issue with the driver crashing after a while when a Nexus 5X
> device is connected is not occurring with the current BETA 18
> firmware-2-ct-full-community.bin. So, this really seems like a regression in
> the official API level 5 ath10k firmware blobs.
>
> The CT firmware is not perfect either, since it seems to suffer from a
> different bug that has been resolved in the official firmwares, and that is
> that after a reboot of my TP-Link Archer C7 v2, the ath10k driver won't load.
> Only after a hard reset or "cold boot" it will come up. That's an issue I had
> with older official firmwares as well, but it has been resolved with the
> recent API level 5 firmwares.
>
> Nevertheless, for the time being, I will stick to the CT firmware because I
> can work around the reboot issue and having the 5GHz wifi working for my
> Nexus 5X clients is more important.
>
> Over the next weeks, I will test different combinations of ath10k(-ct) driver
> and firmware to see if there's a combination that resolves all issues. This
> morning I flashed a LEDE build with the official ath10k driver and the CT
> firmware binary.
>
> But of course, if someone has more suggestions on what I could try or what
> information I could collect to help resolve the issue related to the Nexus 5X
> clients in the official firmware binaries, I think that would be beneficial
> for a larger audience.
>
> Regards,
>
> Timo
>
> P.S.: Please include my email address in any reply, since I'm not subscribed
> to the mailing list. Thank you.
>
>
> Timo Sigurdsson wrote on 29.10.2016 22:19:
>
>> Hi everybody,
>>
>> I have a TP-Link Archer C7 v2 running a fairly recent build of LEDE (r1952,
>> Linux 4.4.26, compat-wireless-2016-10-08). It all works well except for the
>> fact that when I connect a Nexus 5X device to the 5GHz radio, the kernel or
>> ath10k driver will crash after a while. 5GHz wifi will be dead after that
>> until I reboot the system.
>>
>> This issue has been reported before [1] and it also has been declared as
>> solved with newer firmwares [2] (but reopened by other users). However, even
>> with the latest firmware 10.2.4.70.58 from Kalle Valo's Github repository the
>> issue is far from resolved. I have tried many different firmware revisions
>> over time (more recently 10.2.4.70.56 and 10.2.4.70.54), and I could only
>> find that the issue sometimes takes longer to trigger with some firmwares
>> (which might just be random), but it would always occur at some point with
>> API level 5 firmwares. With API level 2 firmwares (which I tested when I was
>> still using OpenWrt 15.05) I never saw these crashes, but the Nexus 5X had
>> other connectivity issues with these older firmwares that made this
>> combination no fun to use either. But this shows that the firmware itself
>> makes the difference here.
>>
>> I actually have two Nexus 5X on my network (my wife's and my own). I can
>> trigger the crash with either one of them. And if both Nexus 5X are connected
>> to the 5GHz radio, then the issue triggers much faster (can be as low as 15
>> minutes). My workaround is to let the Nexus 5X devices only connect to the
>> 2.4GHz radio. This way, the device can run for weeks without any issue or
>> crash, but of course I would prefer the actual bug being fixed rather than to
>> circumvent it.
>>
>> I'm appending a syslog from my access point with a crash happening while one
>> Nexus 5X was connected to the 5GHz radio starting from the time the system
>> booted. I randomized the MAC addresses, but left the first two characters
>> unique so different clients can be distinguished.
>>
>> If there is more info I could collect to help identify the cause of this
>> issue, please let me know (and possibly how to do that as well).
>>
>> Thank you and regards,
>>
>> Timo
>>
>> [1] http://lists.infradead.org/pipermail/ath10k/2015-November/006413.html
>> [2] https://dev.openwrt.org/ticket/20854
>>
>> And here's the log:
>> Fri Oct 28 02:01:35 2016 kern.notice kernel: [0.00] Linux version
>> 4.4.26 (user@buildsystem) (gcc version 5.4.0 (LEDE GCC 5.4.0 r1952) ) #0 Fri
>> Oct 21 15:52:28 2016
>> [...]
>> Fri Oct 28 02:01:35 2016 kern.info kernel: [9.468751] Loading modules
>> backported from Linux version wt-2016-10-03-1-g6fcb1a6
>> Fri Oct 28 02:01:35 2016 kern.info kernel: [9.476481] Backport generated by
>> backports.git backports-20160324-9-g0e38f5c
>> Fri Oct 28 02:01:35 

Re: [LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-11-18 Thread Timo Sigurdsson
Hi again,

in the meantime, I have some more information to add to the issue mentioned in
my email quoted further down below.

Ben Greear approached me off-list and suggested to try the Candela Tech ath10k
driver and firmware and see if the issue occurs with that as well. So, for the
last 3 weeks I've been testing the CT driver and firmware and I can happily
report that the issue with the driver crashing after a while when a Nexus 5X
device is connected is not occurring with the current BETA 18
firmware-2-ct-full-community.bin. So, this really seems like a regression in
the official API level 5 ath10k firmware blobs.

The CT firmware is not perfect either, since it seems to suffer from a different
bug that has been resolved in the official firmwares, and that is that after a
reboot of my TP-Link Archer C7 v2, the ath10k driver won't load. Only after a 
hard reset or "cold boot" it will come up. That's an issue I had with older
official firmwares as well, but it has been resolved with the recent API level
5 firmwares.

Nevertheless, for the time being, I will stick to the CT firmware because I can
work around the reboot issue and having the 5GHz wifi working for my Nexus 5X
clients is more important.

Over the next weeks, I will test different combinations of ath10k(-ct) driver
and firmware to see if there's a combination that resolves all issues. This
morning I flashed a LEDE build with the official ath10k driver and the CT
firmware binary.

But of course, if someone has more suggestions on what I could try or what
information I could collect to help resolve the issue related to the Nexus 5X
clients in the official firmware binaries, I think that would be beneficial
for a larger audience.

Regards,

Timo

P.S.: Please include my email address in any reply, since I'm not subscribed
to the mailing list. Thank you.


Timo Sigurdsson wrote on 29.10.2016 22:19:

> Hi everybody,
> 
> I have a TP-Link Archer C7 v2 running a fairly recent build of LEDE (r1952, 
> Linux 4.4.26, compat-wireless-2016-10-08). It all works well except for the
> fact that when I connect a Nexus 5X device to the 5GHz radio, the kernel or
> ath10k driver will crash after a while. 5GHz wifi will be dead after that
> until I reboot the system.
> 
> This issue has been reported before [1] and it also has been declared as
> solved with newer firmwares [2] (but reopened by other users). However, even
> with the latest firmware 10.2.4.70.58 from Kalle Valo's Github repository the
> issue is far from resolved. I have tried many different firmware revisions
> over time (more recently 10.2.4.70.56 and 10.2.4.70.54), and I could
> only find that the issue sometimes takes longer to trigger with some firmwares
> (which might just be random), but it would always occur at some point with API
> level 5 firmwares. With API level 2 firmwares (which I tested when I was
> still using OpenWrt 15.05) I never saw these crashes, but the Nexus 5X had
> other connectivity issues with these older firmwares that made this
> combination no fun to use either. But this shows that the firmware itself
> makes the difference here.
> 
> I actually have two Nexus 5X on my network (my wife's and my own). I can
> trigger the crash with either one of them. And if both Nexus 5X are connected
> to the 5GHz radio, then the issue triggers much faster (can be as low as 15
> minutes). My workaround is to let the Nexus 5X devices only connect to the
> 2.4GHz radio. This way, the device can run for weeks without any issue or
> crash, but of course I would prefer the actual bug being fixed rather than to
> circumvent it.
> 
> I'm appending a syslog from my access point with a crash happening while one
> Nexus 5X was connected to the 5GHz radio starting from the time the system
> booted. I randomized the MAC addresses, but left the first two characters
> unique so different clients can be distinguished.
> 
> If there is more info I could collect to help identify the cause of this
> issue, please let me know (and possibly how to do that as well).
> 
> Thank you and regards,
> 
> Timo
> 
> [1] http://lists.infradead.org/pipermail/ath10k/2015-November/006413.html
> [2] https://dev.openwrt.org/ticket/20854
> 
> And here's the log:
> Fri Oct 28 02:01:35 2016 kern.notice kernel: [0.00] Linux version
> 4.4.26 (user@buildsystem) (gcc version 5.4.0 (LEDE GCC 5.4.0 r1952) ) #0 Fri
> Oct 21 15:52:28 2016
> [...]
> Fri Oct 28 02:01:35 2016 kern.info kernel: [9.468751] Loading modules
> backported from Linux version wt-2016-10-03-1-g6fcb1a6
> Fri Oct 28 02:01:35 2016 kern.info kernel: [9.476481] Backport generated by
> backports.git backports-20160324-9-g0e38f5c
> Fri Oct 28 02:01:35 2016 kern.warn kernel: [9.576570] PCI: Enabling device
> :01:00.0 ( -> 0002)
> Fri Oct 28 02:01:35 2016 kern.info kernel: [9.582475] ath10k_pci
> :01:00.0: pci irq legacy oper_irq_mode 1 irq_mode 0 reset_mode 0
> Fri Oct 28 02:01:35 2016 kern.warn kernel: [   

Re: [LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-11-01 Thread Phil Bradley-Schmieg
Unfortunately, there are no crashlogs.  If there's anything else I can
do to help debug this, please let me know!

On 31 October 2016 at 09:11, Jo-Philipp Wich wrote:
> Hi Phil,
>
> if you enter failsafe after the boot loop, is there a
> /sys/kernel/debug/crashlog available and, if so, can you post its
> contents here?
>
> ~ Jo
>

___
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev


Re: [LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-10-31 Thread Timo Sigurdsson
Hi,

this is a reply to Jo's message [1], which was only sent to the mailing list, but
not to me directly. As I am not subscribed to the mailing list, please
explicitly include me in future replies. Thank you.

Jo-Philipp Wich wrote on 31.10.2016 01:37:

> Hi Timo,
> 
> did you try with kmod-ath10k or kmod-ath10k-ct ?
> 
> Might be worth trying the -ct variant, to see if it has the same issue.
> 
> ~ Jo

So far, all my testing has been done with the "official" kmod-ath10k. But Ben
Greear also approached me off-list and recommended that I test the ath10k-ct
driver and firmware. I will give it a try this week and see how it works.

Thanks and regards,

Timo

[1] http://lists.infradead.org/pipermail/lede-dev/2016-October/003751.html



Re: [LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-10-31 Thread Phil Bradley-Schmieg
Just to report a possibly connected issue: snapshot LEDE builds for
the Archer C7 v1 (I tried Saturday and last Monday) go into a boot loop
(though failsafe mode remains available, thankfully).

The C7 v1 5GHz / AC radio isn't supported by ath10k, according to the
OpenWrt wiki. Could ath10k also be causing the boot loop?  I'm new to
all this, but if there's anything I can do to help diagnose the
problem, please let me know.  I'm currently running an OpenWrt trunk
snapshot with no problems (besides the lack of 5GHz / AC).

On 31 October 2016 at 08:37, Jo-Philipp Wich wrote:
> Hi Timo,
>
> did you try with kmod-ath10k or kmod-ath10k-ct ?
>
> Might be worth trying the -ct variant, to see if it has the same issue.
>
> ~ Jo
>



Re: [LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-10-31 Thread Jo-Philipp Wich
Hi Phil,

if you enter failsafe after the boot loop, is there a
/sys/kernel/debug/crashlog available and, if so, can you post its
contents here?

~ Jo





Re: [LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-10-31 Thread Jo-Philipp Wich
Hi Timo,

did you try with kmod-ath10k or kmod-ath10k-ct ?

Might be worth trying the -ct variant, to see if it has the same issue.

~ Jo



[LEDE-DEV] [BUG] Kernel crashes with ath10k radio and Nexus 5X clients

2016-10-29 Thread Timo Sigurdsson
Hi everybody,

I have a TP-Link Archer C7 v2 running a fairly recent build of LEDE (r1952, 
Linux 4.4.26, compat-wireless-2016-10-08). It all works well except for the
fact that when I connect a Nexus 5X device to the 5GHz radio, the kernel or
ath10k driver will crash after a while. 5GHz wifi will be dead after that
until I reboot the system.

This issue has been reported before [1] and it also has been declared as
solved with newer firmwares [2] (but reopened by other users). However, even
with the latest firmware 10.2.4.70.58 from Kalle Valo's Github repository the
issue is far from resolved. I have tried many different firmware revisions
over time (more recently 10.2.4.70.56 and 10.2.4.70.54), and I could
only find that the issue sometimes takes longer to trigger with some firmwares
(which might just be random), but it would always occur at some point with API
level 5 firmwares. With API level 2 firmwares (which I tested when I was
still using OpenWrt 15.05) I never saw these crashes, but the Nexus 5X had
other connectivity issues with these older firmwares that made this
combination no fun to use either. But this shows that the firmware itself
makes the difference here.

I actually have two Nexus 5X on my network (my wife's and my own). I can
trigger the crash with either one of them. And if both Nexus 5X are connected
to the 5GHz radio, then the issue triggers much faster (can be as low as 15
minutes). My workaround is to let the Nexus 5X devices only connect to the
2.4GHz radio. This way, the device can run for weeks without any issue or
crash, but of course I would prefer the actual bug being fixed rather than to
circumvent it.

I'm appending a syslog from my access point with a crash happening while one
Nexus 5X was connected to the 5GHz radio starting from the time the system
booted. I randomized the MAC addresses, but left the first two characters
unique so different clients can be distinguished.
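
For readers who want to share similar logs, here is a minimal Python sketch of that kind of pseudo-anonymization. It is an illustration only, not the tool actually used for this report: the function name `anonymize_macs` is hypothetical, and the filler octets are simply the ones that appear in the log below.

```python
import re

# A MAC address: six colon-separated pairs of hex digits.
# Group 1 is the first octet (kept), group 2 the remaining five (replaced).
MAC = re.compile(r"\b([0-9a-fA-F]{2})((?::[0-9a-fA-F]{2}){5})\b")


def anonymize_macs(text, filler="12:34:56:78:9a"):
    """Replace the last five octets of every MAC address, keeping the first."""
    return MAC.sub(lambda m: f"{m.group(1)}:{filler}", text)
```

Note that two clients whose addresses share the same first octet would become indistinguishable with this approach, so it only works while the first octets on the network are unique.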

If there is more info I could collect to help identify the cause of this
issue, please let me know (and possibly how to do that as well).

Thank you and regards,

Timo

[1] http://lists.infradead.org/pipermail/ath10k/2015-November/006413.html
[2] https://dev.openwrt.org/ticket/20854

And here's the log:
Fri Oct 28 02:01:35 2016 kern.notice kernel: [0.00] Linux version 
4.4.26 (user@buildsystem) (gcc version 5.4.0 (LEDE GCC 5.4.0 r1952) ) #0 Fri 
Oct 21 15:52:28 2016
[...]
Fri Oct 28 02:01:35 2016 kern.info kernel: [9.468751] Loading modules 
backported from Linux version wt-2016-10-03-1-g6fcb1a6
Fri Oct 28 02:01:35 2016 kern.info kernel: [9.476481] Backport generated by 
backports.git backports-20160324-9-g0e38f5c
Fri Oct 28 02:01:35 2016 kern.warn kernel: [9.576570] PCI: Enabling device 
:01:00.0 ( -> 0002)
Fri Oct 28 02:01:35 2016 kern.info kernel: [9.582475] ath10k_pci 
:01:00.0: pci irq legacy oper_irq_mode 1 irq_mode 0 reset_mode 0
Fri Oct 28 02:01:35 2016 kern.warn kernel: [   10.087609] ath10k_pci 
:01:00.0: Direct firmware load for ath10k/pre-cal-pci-:01:00.0.bin 
failed with error -2
Fri Oct 28 02:01:35 2016 kern.warn kernel: [   10.098492] ath10k_pci 
:01:00.0: Falling back to user helper
Fri Oct 28 02:01:35 2016 kern.err kernel: [   10.176500] firmware 
ath10k!pre-cal-pci-:01:00.0.bin: firmware_loading_store: map pages failed
Fri Oct 28 02:01:35 2016 kern.info kernel: [   10.677026] ath10k_pci 
:01:00.0: qca988x hw2.0 target 0x4100016c chip_id 0x043202ff sub :
Fri Oct 28 02:01:35 2016 kern.info kernel: [   10.686424] ath10k_pci 
:01:00.0: kconfig debug 0 debugfs 1 tracing 0 dfs 1 testmode 1
Fri Oct 28 02:01:35 2016 kern.info kernel: [   10.699498] ath10k_pci 
:01:00.0: firmware ver 10.2.4.70.58 api 5 features no-p2p,raw-mode,mfp 
crc32 e1af076f
Fri Oct 28 02:01:35 2016 kern.warn kernel: [   10.709932] ath10k_pci 
:01:00.0: Direct firmware load for ath10k/QCA988X/hw2.0/board-2.bin failed 
with error -2
Fri Oct 28 02:01:35 2016 kern.warn kernel: [   10.720531] ath10k_pci 
:01:00.0: Falling back to user helper
Fri Oct 28 02:01:35 2016 kern.err kernel: [   10.798719] firmware 
ath10k!QCA988X!hw2.0!board-2.bin: firmware_loading_store: map pages failed
Fri Oct 28 02:01:35 2016 kern.info kernel: [   10.823845] ath10k_pci 
:01:00.0: board_file api 1 bmi_id N/A crc32 bebc7c08
Fri Oct 28 02:01:35 2016 kern.info kernel: [   11.954723] ath10k_pci 
:01:00.0: htt-ver 2.1 wmi-op 5 htt-op 2 cal file max-sta 128 raw 0 hwcrypto 
1
[...]
// Everything runs fine for a bit more than a day, but then this happens ... //
// Note: The ath10k radio in question is wlan0 //
Sat Oct 29 10:38:21 2016 daemon.info hostapd: wlan1: STA 9c:12:34:56:78:9a IEEE 
802.11: disassociated
Sat Oct 29 10:38:22 2016 daemon.info hostapd: wlan1: STA 9c:12:34:56:78:9a IEEE 
802.11: deauthenticated due to inactivity (timer DEAUTH/REMOVE)
Sat Oct 29 10:42:57 2016 daemon.info hostapd: wlan1: STA 00:12:34:56:78:9a WPA: 
group key handshake
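
Since the thread hinges on which firmware version and API level were loaded, the following minimal Python sketch shows one way to pull those fields out of the "firmware ver ..." line in a log like the one above. The function name `parse_firmware_line` is a hypothetical helper for illustration, and the pattern assumes the line format shown in this log after wrapped lines have been re-joined; other ath10k versions may print it differently.

```python
import re

# Pattern for the ath10k firmware banner as it appears in the log above.
FIRMWARE = re.compile(
    r"firmware ver (?P<version>\S+) api (?P<api>\d+) features (?P<features>\S+)"
)


def parse_firmware_line(entry):
    """Return version, API level, and feature list, or None if no banner is found."""
    match = FIRMWARE.search(entry)
    if match is None:
        return None
    return {
        "version": match.group("version"),
        "api": int(match.group("api")),
        "features": match.group("features").split(","),
    }
```

Recording these fields next to each crash makes it easier to compare reports across firmware revisions.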