Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related

2018-02-02 Thread Toke Høiland-Jørgensen
Ben Greear  writes:

> On 02/01/2018 02:47 PM, Johannes Berg wrote:
>> On Thu, 2018-02-01 at 23:40 +0100, Johannes Berg wrote:
>>>
>>> The code does a plain rcu_dereference(), no locking other than
>>> rcu_read_lock() involved.
>>>
>>> On second thought though, I'm not convinced that your modifications
>>> caused the problem.
>>>
>>> Given your call stack, we'd expect rcu_read_lock() somewhere around
>>> ath_tid_dequeue (or its caller(s)), since ieee80211_tx_dequeue clearly
>>> requires it.
>>>
>>> Normally, ieee80211_tx_dequeue() is called from various places that
>>> probably come from mac80211 and already hold the rcu_read_lock(), e.g.
>>> the wake_tx_queue op.
>>>
>>> In this case, you're coming from drv_sta_state, so not sure why the
>>> driver thinks it's OK to call the dequeue there.
>>
>> Just to clarify - it could just be that in the "normal" case, when a
>> station dies, there's nothing on the queues - so the dequeue just
>> doesn't do anything and never goes into the code path where the
>> rcu_dereference() is, hence it might be a bug in mainline that just
>> never triggers in ordinary scenarios.
>
> It looks like the code in ath9k has not been changed in that area for
> some time.

Hmm, I think the issue here is that after the switch to mac80211 txqs,
the driver is now draining mac80211 queues, whereas before it was only
draining its own driver-internal queues (which are protected by the
ath_txq_lock() that ath_tx_node_cleanup() does grab). And when the
switch to the mac80211 queues happened, a call to rcu_read_lock() should
have been added. But, well, I had no idea this was needed until just
now, so obviously I didn't add that... :)

> Would adding rcu_read_lock in drv_sta_state() be a possibility?

Think it should probably just be added in ath_tx_node_cleanup()? Can
send a patch...

-Toke


Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related

2018-02-01 Thread Ben Greear

On 02/01/2018 02:47 PM, Johannes Berg wrote:

On Thu, 2018-02-01 at 23:40 +0100, Johannes Berg wrote:


The code does a plain rcu_dereference(), no locking other than
rcu_read_lock() involved.

On second thought though, I'm not convinced that your modifications
caused the problem.

Given your call stack, we'd expect rcu_read_lock() somewhere around
ath_tid_dequeue (or its caller(s)), since ieee80211_tx_dequeue clearly
requires it.

Normally, ieee80211_tx_dequeue() is called from various places that
probably come from mac80211 and already hold the rcu_read_lock(), e.g.
the wake_tx_queue op.

In this case, you're coming from drv_sta_state, so not sure why the
driver thinks it's OK to call the dequeue there.


Just to clarify - it could just be that in the "normal" case, when a
station dies, there's nothing on the queues - so the dequeue just
doesn't do anything and never goes into the code path where the
rcu_dereference() is, hence it might be a bug in mainline that just
never triggers in ordinary scenarios.


It looks like the code in ath9k has not been changed in that area for
some time.

Would adding rcu_read_lock in drv_sta_state() be a possibility?

Thanks,
Ben



johannes




--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related

2018-02-01 Thread Johannes Berg
On Thu, 2018-02-01 at 23:40 +0100, Johannes Berg wrote:
> 
> The code does a plain rcu_dereference(), no locking other than
> rcu_read_lock() involved.
> 
> On second thought though, I'm not convinced that your modifications
> caused the problem.
> 
> Given your call stack, we'd expect rcu_read_lock() somewhere around
> ath_tid_dequeue (or its caller(s)), since ieee80211_tx_dequeue clearly
> requires it.
> 
> Normally, ieee80211_tx_dequeue() is called from various places that
> probably come from mac80211 and already hold the rcu_read_lock(), e.g.
> the wake_tx_queue op.
> 
> In this case, you're coming from drv_sta_state, so not sure why the
> driver thinks it's OK to call the dequeue there.

Just to clarify - it could just be that in the "normal" case, when a
station dies, there's nothing on the queues - so the dequeue just
doesn't do anything and never goes into the code path where the
rcu_dereference() is, hence it might be a bug in mainline that just
never triggers in ordinary scenarios.

johannes


Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related

2018-02-01 Thread Johannes Berg
On Thu, 2018-02-01 at 14:33 -0800, Ben Greear wrote:
> 
> All of RCU is suspicious

Heh.

The code does a plain rcu_dereference(), no locking other than
rcu_read_lock() involved.

On second thought though, I'm not convinced that your modifications
caused the problem.

Given your call stack, we'd expect rcu_read_lock() somewhere around
ath_tid_dequeue (or its caller(s)), since ieee80211_tx_dequeue clearly
requires it.

Normally, ieee80211_tx_dequeue() is called from various places that
probably come from mac80211 and already hold the rcu_read_lock(), e.g.
the wake_tx_queue op.

In this case, you're coming from drv_sta_state, so not sure why the
driver thinks it's OK to call the dequeue there.

johannes


Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related

2018-02-01 Thread Ben Greear

On 02/01/2018 02:23 PM, Johannes Berg wrote:

On Wed, 2018-01-31 at 16:53 -0800, Ben Greear wrote:


Any idea why this might be complaining?



  30917 Jan 31 15:21:01 2u-6n kernel: =
  30918 Jan 31 15:21:01 2u-6n kernel: WARNING: suspicious RCU usage


well, that's why? :-)


All of RCU is suspicious as far as my feeble brain is concerned...I was 
wondering *why*
it was suspicious.  I see you answered below... :)




  30933   stack backtrace:
  30934 Jan 31 15:21:01 2u-6n kernel: CPU: 1 PID: 19628 Comm: ip Tainted: G 
   W  O4.13.16+ #2
  30935 Jan 31 15:21:01 2u-6n kernel: Hardware name: Iron_Systems,Inc 
CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
  30936 Jan 31 15:21:01 2u-6n kernel: Call Trace:
  30937 Jan 31 15:21:01 2u-6n kernel:  dump_stack+0x85/0xc7
  30938 Jan 31 15:21:01 2u-6n kernel:  lockdep_rcu_suspicious+0xc5/0x100
  30939 Jan 31 15:21:01 2u-6n kernel:  ieee80211_tx_h_select_key+0x1c9/0x4e0 
[mac80211]
  30940 Jan 31 15:21:01 2u-6n kernel:  ieee80211_tx_dequeue+0x376/0xca0 
[mac80211]
  30941 Jan 31 15:21:01 2u-6n kernel:  ath_tid_dequeue+0x9c/0x110 [ath9k]
  30942 Jan 31 15:21:01 2u-6n kernel:  ath_tx_node_cleanup+0xa4/0x160 [ath9k]
  30943 Jan 31 15:21:01 2u-6n kernel:  ath9k_sta_state+0x6b/0x1e0 [ath9k]
  30944 Jan 31 15:21:01 2u-6n kernel:  drv_sta_state+0xad/0xa80 [mac80211]


I'd argue your hacks are at fault - somewhere in your modifications
around this call stack you lost the necessary rcu_read_lock()
protection.


Ok, I'll go looking for bugs like that.

Thanks!
--Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related

2018-02-01 Thread Johannes Berg
On Wed, 2018-01-31 at 16:53 -0800, Ben Greear wrote:
> 
> Any idea why this might be complaining?

>   30917 Jan 31 15:21:01 2u-6n kernel: =
>   30918 Jan 31 15:21:01 2u-6n kernel: WARNING: suspicious RCU usage

well, that's why? :-)

>   30933   stack backtrace:
>   30934 Jan 31 15:21:01 2u-6n kernel: CPU: 1 PID: 19628 Comm: ip Tainted: G   
>  W  O4.13.16+ #2
>   30935 Jan 31 15:21:01 2u-6n kernel: Hardware name: Iron_Systems,Inc 
> CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
>   30936 Jan 31 15:21:01 2u-6n kernel: Call Trace:
>   30937 Jan 31 15:21:01 2u-6n kernel:  dump_stack+0x85/0xc7
>   30938 Jan 31 15:21:01 2u-6n kernel:  lockdep_rcu_suspicious+0xc5/0x100
>   30939 Jan 31 15:21:01 2u-6n kernel:  ieee80211_tx_h_select_key+0x1c9/0x4e0 
> [mac80211]
>   30940 Jan 31 15:21:01 2u-6n kernel:  ieee80211_tx_dequeue+0x376/0xca0 
> [mac80211]
>   30941 Jan 31 15:21:01 2u-6n kernel:  ath_tid_dequeue+0x9c/0x110 [ath9k]
>   30942 Jan 31 15:21:01 2u-6n kernel:  ath_tx_node_cleanup+0xa4/0x160 [ath9k]
>   30943 Jan 31 15:21:01 2u-6n kernel:  ath9k_sta_state+0x6b/0x1e0 [ath9k]
>   30944 Jan 31 15:21:01 2u-6n kernel:  drv_sta_state+0xad/0xa80 [mac80211]

I'd argue your hacks are at fault - somewhere in your modifications
around this call stack you lost the necessary rcu_read_lock()
protection.

joahnnes


lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related

2018-01-31 Thread Ben Greear

Hello,

This first splat comes from this code:

static ieee80211_tx_result debug_noinline
ieee80211_tx_h_select_key(struct ieee80211_tx_data *tx)
{
struct ieee80211_key *key;
struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb);
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)tx->skb->data;

if (unlikely(info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT))
tx->key = NULL;
else if (tx->sta &&
### line 605 ### (key = 
rcu_dereference(tx->sta->ptk[tx->sta->ptk_idx])))
tx->key = key;
else if (ieee80211_is_group_privacy_action(tx->skb) &&
(key = rcu_dereference(tx->sdata->default_multicast_key)))
tx->key = key;
else if (ieee80211_is_mgmt(hdr->frame_control) &&
 is_multicast_ether_addr(hdr->addr1) &&
 ieee80211_is_robust_mgmt_frame(tx->skb) &&
 (key = rcu_dereference(tx->sdata->default_mgmt_key)))
tx->key = key;
else if (is_multicast_ether_addr(hdr->addr1) &&
 (key = rcu_dereference(tx->sdata->default_multicast_key)))
tx->key = key;
else if (!is_multicast_ether_addr(hdr->addr1) &&
### line 619 ### (key = 
rcu_dereference(tx->sdata->default_unicast_key)))
tx->key = key;
else
tx->key = NULL;

This ath9k has some local modifications, including removal of the airtime 
fairness logic
that was breaking my test case very quickly, so could be my fault of course.

Full tree is here:

https://github.com/greearb/linux-ct-4.13


Any idea why this might be complaining?

Test case is to bring up 200 virtual stations on each of 6 radios and then 
randomly
restart the stations and/or APs they are connected to.  I'm trying to shake out
stability bugs and such...


 30917 Jan 31 15:21:01 2u-6n kernel: =
 30918 Jan 31 15:21:01 2u-6n kernel: WARNING: suspicious RCU usage
 30919 Jan 31 15:21:01 2u-6n kernel: 4.13.16+ #2 Tainted: GW  O
 30920 Jan 31 15:21:01 2u-6n kernel: -
 30921 Jan 31 15:21:01 2u-6n kernel: 
/home/greearb/git/linux-4.13.dev.y/net/mac80211/tx.c:605 suspicious 
rcu_dereference_check() usage!
 30922 Jan 31 15:21:01 2u-6n kernel:
 30923   other info that might help us debug this:
 30924 Jan 31 15:21:01 2u-6n kernel:
 30925   rcu_scheduler_active = 2, debug_locks = 1
 30926 Jan 31 15:21:01 2u-6n kernel: 5 locks held by ip/19628:
 30927 Jan 31 15:21:01 2u-6n kernel:  #0:  (rtnl_mutex){+.+.+.}, at: 
[] rtnetlink_rcv+0x16/0x30
 30928 Jan 31 15:21:01 2u-6n kernel:  #1:  (>mtx){+.+.+.}, at: 
[] cfg80211_leave+0x1d/0x40 [cfg80211]
 30929 Jan 31 15:21:01 2u-6n kernel:  #2:  (>sta_mtx){+.+.+.}, at: 
[] __sta_info_flush+0x7c/0x170 [mac80211]
 30930 Jan 31 15:21:01 2u-6n kernel:  #3:  (&(>axq_lock)->rlock){+.-...}, at: 
[] ath_tx_node_cleanup+0x66/0x160 [ath9k]
 30931 Jan 31 15:21:01 2u-6n kernel:  #4:  (&(>lock)->rlock){+.-...}, at: 
[] ieee80211_tx_dequeue+0x45/0xca0 [mac80211]
 30932 Jan 31 15:21:01 2u-6n kernel:
 30933   stack backtrace:
 30934 Jan 31 15:21:01 2u-6n kernel: CPU: 1 PID: 19628 Comm: ip Tainted: G  
  W  O4.13.16+ #2
 30935 Jan 31 15:21:01 2u-6n kernel: Hardware name: Iron_Systems,Inc 
CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017
 30936 Jan 31 15:21:01 2u-6n kernel: Call Trace:
 30937 Jan 31 15:21:01 2u-6n kernel:  dump_stack+0x85/0xc7
 30938 Jan 31 15:21:01 2u-6n kernel:  lockdep_rcu_suspicious+0xc5/0x100
 30939 Jan 31 15:21:01 2u-6n kernel:  ieee80211_tx_h_select_key+0x1c9/0x4e0 
[mac80211]
 30940 Jan 31 15:21:01 2u-6n kernel:  ieee80211_tx_dequeue+0x376/0xca0 
[mac80211]
 30941 Jan 31 15:21:01 2u-6n kernel:  ath_tid_dequeue+0x9c/0x110 [ath9k]
 30942 Jan 31 15:21:01 2u-6n kernel:  ath_tx_node_cleanup+0xa4/0x160 [ath9k]
 30943 Jan 31 15:21:01 2u-6n kernel:  ath9k_sta_state+0x6b/0x1e0 [ath9k]
 30944 Jan 31 15:21:01 2u-6n kernel:  drv_sta_state+0xad/0xa80 [mac80211]
 30945 Jan 31 15:21:01 2u-6n kernel:  __sta_info_destroy_part2+0x178/0x1d0 
[mac80211]
 30946 Jan 31 15:21:01 2u-6n kernel:  __sta_info_flush+0xef/0x170 [mac80211]
 30947 Jan 31 15:21:01 2u-6n kernel:  ieee80211_set_disassoc+0xc6/0x400 
[mac80211]
 30948 Jan 31 15:21:01 2u-6n kernel:  ieee80211_mgd_deauth+0x2f6/0x830 
[mac80211]
 30949 Jan 31 15:21:01 2u-6n kernel:  ieee80211_deauth+0x13/0x20 [mac80211]
 30950 Jan 31 15:21:01 2u-6n kernel:  cfg80211_mlme_deauth+0x16b/0x3e0 
[cfg80211]
 30951 Jan 31 15:21:01 2u-6n kernel:  cfg80211_mlme_down+0x6d/0xa0 [cfg80211]
 30952 Jan 31 15:21:01 2u-6n kernel:  cfg80211_disconnect+0x2f4/0x3f0 [cfg80211]
 30953 Jan 31 15:21:01 2u-6n kernel:  ? kfree+0x24d/0x2b0
 30954 Jan 31 15:21:01 2u-6n kernel:  __cfg80211_leave+0x134/0x190 [cfg80211]
 30955 Jan 31 15:21:01 2u-6n kernel:  cfg80211_leave+0x28/0x40 [cfg80211]
 30956 Jan 31 15:21:01 2u-6n kernel: