Re: [OpenWrt-Devel] atomic sleep bugs - 19.07 (and probably Master too)

2019-08-02 Thread Hauke Mehrtens
On 8/2/19 10:40 AM, Koen Vandeputte wrote:
> 
> On 01.08.19 17:27, Koen Vandeputte wrote:
>> Hi All,
>>
>> I've been playing around the last few days stresstesting latest 19.07
>> on different targets (ar71xx, cns3xxx, imx6, ...) with extra kernel
>> debug features enabled.
>>
>> I'll post some results here as maybe somebody has a clue. :)
>>
>>
>> Some interesting splats already showed up, actually also *breaking*
>> some functionality while the board is running:
>>
>> on Mikrotik RB2011 (ar71xx)
>>
>>

..

> 
> [   13.152502] eth0: link down
> [   13.155063] BUG: sleeping function called from invalid context at
> net/core/dev.c:5563
> [   13.161685] in_atomic(): 1, irqs_disabled(): 1, pid: 456, name: ip
> [   13.167848] 2 locks held by ip/456:
> [   13.171315]  #0:  (rtnl_mutex){}, at: [<8032aba4>]
> rtnetlink_rcv_msg+0x2d8/0x380
> [   13.179039]  #1:  (&(>lock)->rlock){}, at: [<802e3c40>]
> ag71xx_hw_disable+0x24/0x94
> [   13.187385] CPU: 0 PID: 456 Comm: ip Not tainted 4.14.134 #0
> [   13.193019] Stack : 805a 80547894 8051de14 831c98ec 805f
> 805f 83918efc 805733a7
> [   13.201348] 80517e44 01c8 805f38bc 831c9ccc 8390d0c0
> 0001 831c98a0 b0502ae3
> [   13.209681]   80ae 831c979c 6138f004
>  0007 
> [   13.218016] 0091 b340 0090  8000
> 839c9d8c 839c9db0 0001
> [   13.226347] 80428b7c 831c9ccc 8390d0c0 83b51210 0003
>   805f
> [   13.234680] ...
> [   13.237114] Call Trace:
> [   13.239556] [<8006c88c>] show_stack+0x58/0x100
> [   13.244006] [<800aab74>] ___might_sleep+0x100/0x120
> [   13.248853] [<8030fb84>] napi_disable+0x30/0xd8
> [   13.253354] [<802e3c80>] ag71xx_hw_disable+0x64/0x94
> [   13.258304] [<802e3cd4>] ag71xx_stop+0x24/0x38
> [   13.262731] [<8030d580>] __dev_close_many+0xcc/0x104
> [   13.267697] [<803165fc>] __dev_change_flags+0xc8/0x1ac
> [   13.272801] [<80316708>] dev_change_flags+0x28/0x70
> [   13.277662] [<80329fc0>] do_setlink+0x31c/0x91c
> [   13.282178] [<8032ca90>] rtnl_newlink+0x3ec/0x7f8
> [   13.286865] [<8032abc8>] rtnetlink_rcv_msg+0x2fc/0x380
> [   13.292019] [<8034de64>] netlink_rcv_skb+0xd4/0x178
> [   13.296849] [<8034d440>] netlink_unicast+0x168/0x250
> [   13.301797] [<8034da04>] netlink_sendmsg+0x3d8/0x434
> [   13.306753] [<802f2774>] ___sys_sendmsg+0x1dc/0x290
> [   13.311607] [<802f37d0>] __sys_sendmsg+0x54/0x84
> [   13.316225] [<80071eac>] syscall_common+0x34/0x58

There is a bug in ag71xx_hw_disable(), it calls napi_disable() while
holding a spinlock in the same function which is not allowed. I never
looked into ag71xx before, someone should fix this code and make it call
napi_disable() outside of the spinlock.

Hauke



signature.asc
Description: OpenPGP digital signature
___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel


Re: [OpenWrt-Devel] atomic sleep bugs - 19.07 (and probably Master too)

2019-08-02 Thread Koen Vandeputte


On 01.08.19 17:27, Koen Vandeputte wrote:

Hi All,

I've been playing around the last few days stresstesting latest 19.07 
on different targets (ar71xx, cns3xxx, imx6, ...) with extra kernel 
debug features enabled.


I'll post some results here as maybe somebody has a clue. :)


Some interesting splats already showed up, actually also *breaking* 
some functionality while the board is running:


on Mikrotik RB2011 (ar71xx)


[   16.885207] eth0: link down
[   16.919752] BUG: sleeping function called from invalid context at 
net/core/dev.c:5563

[   17.013669] in_atomic(): 1, irqs_disabled(): 1, pid: 463, name: ip
[   17.087839] 2 locks held by ip/463:
[   17.129668]  #0:  (rtnl_mutex){}, at: [<80378814>] 
rtnetlink_rcv_msg+0x2d8/0x380
[   17.222617]  #1:  (&(>lock)->rlock){}, at: [<80331900>] 
ag71xx_hw_disable+0x24/0x94

[   17.322878] CPU: 0 PID: 463 Comm: ip Not tainted 4.14.134 #0
[   17.390782] Stack : 805e 8058aefc 80558ed0 876278ec 8061 
8061 87d1b2fc 805b7367
[   17.491032] 80552f04 01cf 8061386c 87627ccc 87d5b180 
0001 876278a0 6ae07578
[   17.591283]   80b0 8762779c 0a55a891 
 0007 
[   17.691535] 0099 8f9b5648 0098  8000 
87d6e58c 87d6e5b0 0001
[   17.791785] 8047095c 87627ccc 87d5b180 87d9ae10 0003 
802cfa54  8061

[   17.892036] ...
[   17.921352] Call Trace:
[   17.950676] [<8006cb0c>] show_stack+0x58/0x100
[   18.003996] [<800aadf4>] ___might_sleep+0x100/0x120
[   18.062512] [<8035d7f4>] napi_disable+0x30/0xd8
[   18.116854] [<80331940>] ag71xx_hw_disable+0x64/0x94
[   18.176421] [<80331994>] ag71xx_stop+0x24/0x38
[   18.229729] [<8035b1f0>] __dev_close_many+0xcc/0x104
[   18.289306] [<8036426c>] __dev_change_flags+0xc8/0x1ac
[   18.350950] [<80364378>] dev_change_flags+0x28/0x70
[   18.409473] [<80377c30>] do_setlink+0x31c/0x91c
[   18.463826] [<8037a700>] rtnl_newlink+0x3ec/0x7f8
[   18.520261] [<80378838>] rtnetlink_rcv_msg+0x2fc/0x380
[   18.581938] [<8039bac4>] netlink_rcv_skb+0xd4/0x178
[   18.640439] [<8039b0a0>] netlink_unicast+0x168/0x250
[   18.76] [<8039b664>] netlink_sendmsg+0x3d8/0x434
[   18.759580] [<803404a4>] ___sys_sendmsg+0x1dc/0x290
[   18.818098] [<80341500>] __sys_sendmsg+0x54/0x84
[   18.873504] [<8007212c>] syscall_common+0x34/0x58


on Mikrotik RB-912 (ar71xx), using ath9k combined with a USB modem 
together.  When disabling ath9k or unplugging the modem, these bugs 
don't appear

Here, the USB modem stops receiving data after a few seconds:


[   37.165493] wlan0: Trigger new scan to find an IBSS to join
[   37.171546] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:112
[   37.180006] in_atomic(): 1, irqs_disabled(): 1, pid: 9, name: 
kworker/u2:1

[   37.187110] 6 locks held by kworker/u2:1/9:
[   37.191434]  #0:  ("%s"wiphy_name(local->hw.wiphy)){}, at: 
[<8009f89c>] process_one_work+0x250/0x4c0
[   37.201298]  #1:  ((>work)){}, at: [<8009f89c>] 
process_one_work+0x250/0x4c0
[   37.209604]  #2:  (>mtx){}, at: [<82917130>] 
ieee80211_ibss_work+0x40/0x5a4 [mac80211]
[   37.219077]  #3:  (>mtx){}, at: [<8290d914>] 
ieee80211_request_ibss_scan+0x4c/0x2b8 [mac80211]
[   37.229246]  #4:  (>mutex){}, at: [<82ae4878>] 
ath9k_ps_restore+0xd54/0x11dc [ath9k]
[   37.238159]  #5:  (_desc_lock_class){}, at: [<800c29d8>] 
__irq_get_desc_lock+0x8c/0xb0

[   37.247113] CPU: 0 PID: 9 Comm: kworker/u2:1 Tainted: G W 4.14.134 #0
[   37.255282] Workqueue: phy0 ieee80211_ibss_leave [mac80211]
[   37.261055] Stack : 82870080 82870158 82860d60 800c0fa0 8061 
 0001 805b
[   37.269743] 80552e7c 8384db24 0348 800c1e04 82860d60 
 8384db00 21bc325c
[   37.278434]   0006 8384d9d4 acbbbd0d 
 0007 
[   37.287125] 0132 805c 0131   
828626d8 828703c8 0348
[   37.295816] 82870080 82870158 82860d60 828703c8 0002 
802cfa54  8061

[   37.304508] ...
[   37.307055] Call Trace:
[   37.309601] [<8006cb0c>] show_stack+0x58/0x100
[   37.314219] [<800aadf4>] ___might_sleep+0x100/0x120
[   37.319280] [<800c3344>] synchronize_irq+0x3c/0xa0
[   37.324245] [<800c6594>] __irq_disable+0x64/0xb4
[   37.329024] [<800c35f4>] __disable_irq_nosync+0x3c/0x68
[   37.334434] [<800c363c>] disable_irq+0x14/0x38
[   37.339195] [<82ae5b48>] ath9k_calculate_summary_state+0x53c/0x6f4 
[ath9k]

[   37.357669] ttyS ttyS0: 1 input overrun(s)
[   38.170588] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:112
[   38.179059] in_atomic(): 1, irqs_disabled(): 1, pid: 9, name: 
kworker/u2:1

[   38.186167] 5 locks held by kworker/u2:1/9:
[   38.190492]  #0:  ("%s"wiphy_name(local->hw.wiphy)){}, at: 
[<8009f89c>] process_one_work+0x250/0x4c0
[   38.200365]  #1: ((&(>scan_work)->work)){}, at: 
[<8009f89c>] process_one_work+0x250/0x4c0
[   

[OpenWrt-Devel] atomic sleep bugs - 19.07 (and probably Master too)

2019-08-01 Thread Koen Vandeputte

Hi All,

I've been playing around the last few days stresstesting latest 19.07 on 
different targets (ar71xx, cns3xxx, imx6, ...) with extra kernel debug 
features enabled.


I'll post some results here as maybe somebody has a clue. :)


Some interesting splats already showed up, actually also *breaking* some 
functionality while the board is running:


on Mikrotik RB2011 (ar71xx)


[   16.885207] eth0: link down
[   16.919752] BUG: sleeping function called from invalid context at 
net/core/dev.c:5563

[   17.013669] in_atomic(): 1, irqs_disabled(): 1, pid: 463, name: ip
[   17.087839] 2 locks held by ip/463:
[   17.129668]  #0:  (rtnl_mutex){}, at: [<80378814>] 
rtnetlink_rcv_msg+0x2d8/0x380
[   17.222617]  #1:  (&(>lock)->rlock){}, at: [<80331900>] 
ag71xx_hw_disable+0x24/0x94

[   17.322878] CPU: 0 PID: 463 Comm: ip Not tainted 4.14.134 #0
[   17.390782] Stack : 805e 8058aefc 80558ed0 876278ec 8061 
8061 87d1b2fc 805b7367
[   17.491032] 80552f04 01cf 8061386c 87627ccc 87d5b180 
0001 876278a0 6ae07578
[   17.591283]   80b0 8762779c 0a55a891 
 0007 
[   17.691535] 0099 8f9b5648 0098  8000 
87d6e58c 87d6e5b0 0001
[   17.791785] 8047095c 87627ccc 87d5b180 87d9ae10 0003 
802cfa54  8061

[   17.892036] ...
[   17.921352] Call Trace:
[   17.950676] [<8006cb0c>] show_stack+0x58/0x100
[   18.003996] [<800aadf4>] ___might_sleep+0x100/0x120
[   18.062512] [<8035d7f4>] napi_disable+0x30/0xd8
[   18.116854] [<80331940>] ag71xx_hw_disable+0x64/0x94
[   18.176421] [<80331994>] ag71xx_stop+0x24/0x38
[   18.229729] [<8035b1f0>] __dev_close_many+0xcc/0x104
[   18.289306] [<8036426c>] __dev_change_flags+0xc8/0x1ac
[   18.350950] [<80364378>] dev_change_flags+0x28/0x70
[   18.409473] [<80377c30>] do_setlink+0x31c/0x91c
[   18.463826] [<8037a700>] rtnl_newlink+0x3ec/0x7f8
[   18.520261] [<80378838>] rtnetlink_rcv_msg+0x2fc/0x380
[   18.581938] [<8039bac4>] netlink_rcv_skb+0xd4/0x178
[   18.640439] [<8039b0a0>] netlink_unicast+0x168/0x250
[   18.76] [<8039b664>] netlink_sendmsg+0x3d8/0x434
[   18.759580] [<803404a4>] ___sys_sendmsg+0x1dc/0x290
[   18.818098] [<80341500>] __sys_sendmsg+0x54/0x84
[   18.873504] [<8007212c>] syscall_common+0x34/0x58


on Mikrotik RB-912 (ar71xx), using ath9k combined with a USB modem 
together.  When disabling ath9k or unplugging the modem, these bugs 
don't appear

Here, the USB modem stops receiving data after a few seconds:


[   37.165493] wlan0: Trigger new scan to find an IBSS to join
[   37.171546] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:112
[   37.180006] in_atomic(): 1, irqs_disabled(): 1, pid: 9, name: 
kworker/u2:1

[   37.187110] 6 locks held by kworker/u2:1/9:
[   37.191434]  #0:  ("%s"wiphy_name(local->hw.wiphy)){}, at: 
[<8009f89c>] process_one_work+0x250/0x4c0
[   37.201298]  #1:  ((>work)){}, at: [<8009f89c>] 
process_one_work+0x250/0x4c0
[   37.209604]  #2:  (>mtx){}, at: [<82917130>] 
ieee80211_ibss_work+0x40/0x5a4 [mac80211]
[   37.219077]  #3:  (>mtx){}, at: [<8290d914>] 
ieee80211_request_ibss_scan+0x4c/0x2b8 [mac80211]
[   37.229246]  #4:  (>mutex){}, at: [<82ae4878>] 
ath9k_ps_restore+0xd54/0x11dc [ath9k]
[   37.238159]  #5:  (_desc_lock_class){}, at: [<800c29d8>] 
__irq_get_desc_lock+0x8c/0xb0
[   37.247113] CPU: 0 PID: 9 Comm: kworker/u2:1 Tainted: G W   
4.14.134 #0

[   37.255282] Workqueue: phy0 ieee80211_ibss_leave [mac80211]
[   37.261055] Stack : 82870080 82870158 82860d60 800c0fa0 8061 
 0001 805b
[   37.269743] 80552e7c 8384db24 0348 800c1e04 82860d60 
 8384db00 21bc325c
[   37.278434]   0006 8384d9d4 acbbbd0d 
 0007 
[   37.287125] 0132 805c 0131   
828626d8 828703c8 0348
[   37.295816] 82870080 82870158 82860d60 828703c8 0002 
802cfa54  8061

[   37.304508] ...
[   37.307055] Call Trace:
[   37.309601] [<8006cb0c>] show_stack+0x58/0x100
[   37.314219] [<800aadf4>] ___might_sleep+0x100/0x120
[   37.319280] [<800c3344>] synchronize_irq+0x3c/0xa0
[   37.324245] [<800c6594>] __irq_disable+0x64/0xb4
[   37.329024] [<800c35f4>] __disable_irq_nosync+0x3c/0x68
[   37.334434] [<800c363c>] disable_irq+0x14/0x38
[   37.339195] [<82ae5b48>] ath9k_calculate_summary_state+0x53c/0x6f4 
[ath9k]

[   37.357669] ttyS ttyS0: 1 input overrun(s)
[   38.170588] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:112
[   38.179059] in_atomic(): 1, irqs_disabled(): 1, pid: 9, name: 
kworker/u2:1

[   38.186167] 5 locks held by kworker/u2:1/9:
[   38.190492]  #0:  ("%s"wiphy_name(local->hw.wiphy)){}, at: 
[<8009f89c>] process_one_work+0x250/0x4c0
[   38.200365]  #1: ((&(>scan_work)->work)){}, at: 
[<8009f89c>] process_one_work+0x250/0x4c0
[   38.209925]  #2:  (>mtx){}, at: