Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related
Ben Greearwrites: > On 02/01/2018 02:47 PM, Johannes Berg wrote: >> On Thu, 2018-02-01 at 23:40 +0100, Johannes Berg wrote: >>> >>> The code does a plain rcu_dereference(), no locking other than >>> rcu_read_lock() involved. >>> >>> On second thought though, I'm not convinced that your modifications >>> caused the problem. >>> >>> Given your call stack, we'd expect rcu_read_lock() somewhere around >>> ath_tid_dequeue (or its caller(s)), since ieee80211_tx_dequeue clearly >>> requires it. >>> >>> Normally, ieee80211_tx_dequeue() is called from various places that >>> probably come from mac80211 and already hold the rcu_read_lock(), e.g. >>> the wake_tx_queue op. >>> >>> In this case, you're coming from drv_sta_state, so not sure why the >>> driver thinks it's OK to call the dequeue there. >> >> Just to clarify - it could just be that in the "normal" case, when a >> station dies, there's nothing on the queues - so the dequeue just >> doesn't do anything and never goes into the code path where the >> rcu_dereference() is, hence it might be a bug in mainline that just >> never triggers in ordinary scenarios. > > It looks like the code in ath9k has not been changed in that area for > some time. Hmm, I think the issue here is that after the switch to mac80211 txqs, the driver is now draining mac80211 queues, whereas before it was only draining its own driver-internal queues (which are protected by the ath_txq_lock() that ath_tx_node_cleanup() does grab). And when the switch to the mac80211 queues happened, a call to rcu_read_lock() should have been added. But, well, I had no idea this was needed until just now, so obviously I didn't add that... :) > Would adding rcu_read_lock in drv_sta_state() be a possibility? Think it should probably just be added in ath_tx_node_cleanup()? Can send a patch... -Toke
Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related
On 02/01/2018 02:47 PM, Johannes Berg wrote: On Thu, 2018-02-01 at 23:40 +0100, Johannes Berg wrote: The code does a plain rcu_dereference(), no locking other than rcu_read_lock() involved. On second thought though, I'm not convinced that your modifications caused the problem. Given your call stack, we'd expect rcu_read_lock() somewhere around ath_tid_dequeue (or its caller(s)), since ieee80211_tx_dequeue clearly requires it. Normally, ieee80211_tx_dequeue() is called from various places that probably come from mac80211 and already hold the rcu_read_lock(), e.g. the wake_tx_queue op. In this case, you're coming from drv_sta_state, so not sure why the driver thinks it's OK to call the dequeue there. Just to clarify - it could just be that in the "normal" case, when a station dies, there's nothing on the queues - so the dequeue just doesn't do anything and never goes into the code path where the rcu_dereference() is, hence it might be a bug in mainline that just never triggers in ordinary scenarios. It looks like the code in ath9k has not been changed in that area for some time. Would adding rcu_read_lock in drv_sta_state() be a possibility? Thanks, Ben johannes -- Ben GreearCandela Technologies Inc http://www.candelatech.com
Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related
On Thu, 2018-02-01 at 23:40 +0100, Johannes Berg wrote: > > The code does a plain rcu_dereference(), no locking other than > rcu_read_lock() involved. > > On second thought though, I'm not convinced that your modifications > caused the problem. > > Given your call stack, we'd expect rcu_read_lock() somewhere around > ath_tid_dequeue (or its caller(s)), since ieee80211_tx_dequeue clearly > requires it. > > Normally, ieee80211_tx_dequeue() is called from various places that > probably come from mac80211 and already hold the rcu_read_lock(), e.g. > the wake_tx_queue op. > > In this case, you're coming from drv_sta_state, so not sure why the > driver thinks it's OK to call the dequeue there. Just to clarify - it could just be that in the "normal" case, when a station dies, there's nothing on the queues - so the dequeue just doesn't do anything and never goes into the code path where the rcu_dereference() is, hence it might be a bug in mainline that just never triggers in ordinary scenarios. johannes
Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related
On Thu, 2018-02-01 at 14:33 -0800, Ben Greear wrote: > > All of RCU is suspicious Heh. The code does a plain rcu_dereference(), no locking other than rcu_read_lock() involved. On second thought though, I'm not convinced that your modifications caused the problem. Given your call stack, we'd expect rcu_read_lock() somewhere around ath_tid_dequeue (or its caller(s)), since ieee80211_tx_dequeue clearly requires it. Normally, ieee80211_tx_dequeue() is called from various places that probably come from mac80211 and already hold the rcu_read_lock(), e.g. the wake_tx_queue op. In this case, you're coming from drv_sta_state, so not sure why the driver thinks it's OK to call the dequeue there. johannes
Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related
On 02/01/2018 02:23 PM, Johannes Berg wrote: On Wed, 2018-01-31 at 16:53 -0800, Ben Greear wrote: Any idea why this might be complaining? 30917 Jan 31 15:21:01 2u-6n kernel: = 30918 Jan 31 15:21:01 2u-6n kernel: WARNING: suspicious RCU usage well, that's why? :-) All of RCU is suspicious as far as my feeble brain is concerned...I was wondering *why* it was suspicious. I see you answered below... :) 30933 stack backtrace: 30934 Jan 31 15:21:01 2u-6n kernel: CPU: 1 PID: 19628 Comm: ip Tainted: G W O4.13.16+ #2 30935 Jan 31 15:21:01 2u-6n kernel: Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017 30936 Jan 31 15:21:01 2u-6n kernel: Call Trace: 30937 Jan 31 15:21:01 2u-6n kernel: dump_stack+0x85/0xc7 30938 Jan 31 15:21:01 2u-6n kernel: lockdep_rcu_suspicious+0xc5/0x100 30939 Jan 31 15:21:01 2u-6n kernel: ieee80211_tx_h_select_key+0x1c9/0x4e0 [mac80211] 30940 Jan 31 15:21:01 2u-6n kernel: ieee80211_tx_dequeue+0x376/0xca0 [mac80211] 30941 Jan 31 15:21:01 2u-6n kernel: ath_tid_dequeue+0x9c/0x110 [ath9k] 30942 Jan 31 15:21:01 2u-6n kernel: ath_tx_node_cleanup+0xa4/0x160 [ath9k] 30943 Jan 31 15:21:01 2u-6n kernel: ath9k_sta_state+0x6b/0x1e0 [ath9k] 30944 Jan 31 15:21:01 2u-6n kernel: drv_sta_state+0xad/0xa80 [mac80211] I'd argue your hacks are at fault - somewhere in your modifications around this call stack you lost the necessary rcu_read_lock() protection. Ok, I'll go looking for bugs like that. Thanks! --Ben -- Ben GreearCandela Technologies Inc http://www.candelatech.com
Re: lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related
On Wed, 2018-01-31 at 16:53 -0800, Ben Greear wrote: > > Any idea why this might be complaining? > 30917 Jan 31 15:21:01 2u-6n kernel: = > 30918 Jan 31 15:21:01 2u-6n kernel: WARNING: suspicious RCU usage well, that's why? :-) > 30933 stack backtrace: > 30934 Jan 31 15:21:01 2u-6n kernel: CPU: 1 PID: 19628 Comm: ip Tainted: G > W O4.13.16+ #2 > 30935 Jan 31 15:21:01 2u-6n kernel: Hardware name: Iron_Systems,Inc > CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017 > 30936 Jan 31 15:21:01 2u-6n kernel: Call Trace: > 30937 Jan 31 15:21:01 2u-6n kernel: dump_stack+0x85/0xc7 > 30938 Jan 31 15:21:01 2u-6n kernel: lockdep_rcu_suspicious+0xc5/0x100 > 30939 Jan 31 15:21:01 2u-6n kernel: ieee80211_tx_h_select_key+0x1c9/0x4e0 > [mac80211] > 30940 Jan 31 15:21:01 2u-6n kernel: ieee80211_tx_dequeue+0x376/0xca0 > [mac80211] > 30941 Jan 31 15:21:01 2u-6n kernel: ath_tid_dequeue+0x9c/0x110 [ath9k] > 30942 Jan 31 15:21:01 2u-6n kernel: ath_tx_node_cleanup+0xa4/0x160 [ath9k] > 30943 Jan 31 15:21:01 2u-6n kernel: ath9k_sta_state+0x6b/0x1e0 [ath9k] > 30944 Jan 31 15:21:01 2u-6n kernel: drv_sta_state+0xad/0xa80 [mac80211] I'd argue your hacks are at fault - somewhere in your modifications around this call stack you lost the necessary rcu_read_lock() protection. joahnnes
lockdep warning in mac80211/tx.c, 4.13.16+ kernel, ath9k related
Hello, This first splat comes from this code: static ieee80211_tx_result debug_noinline ieee80211_tx_h_select_key(struct ieee80211_tx_data *tx) { struct ieee80211_key *key; struct ieee80211_tx_info *info = IEEE80211_SKB_CB(tx->skb); struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)tx->skb->data; if (unlikely(info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT)) tx->key = NULL; else if (tx->sta && ### line 605 ### (key = rcu_dereference(tx->sta->ptk[tx->sta->ptk_idx]))) tx->key = key; else if (ieee80211_is_group_privacy_action(tx->skb) && (key = rcu_dereference(tx->sdata->default_multicast_key))) tx->key = key; else if (ieee80211_is_mgmt(hdr->frame_control) && is_multicast_ether_addr(hdr->addr1) && ieee80211_is_robust_mgmt_frame(tx->skb) && (key = rcu_dereference(tx->sdata->default_mgmt_key))) tx->key = key; else if (is_multicast_ether_addr(hdr->addr1) && (key = rcu_dereference(tx->sdata->default_multicast_key))) tx->key = key; else if (!is_multicast_ether_addr(hdr->addr1) && ### line 619 ### (key = rcu_dereference(tx->sdata->default_unicast_key))) tx->key = key; else tx->key = NULL; This ath9k has some local modifications, including removal of the airtime fairness logic that was breaking my test case very quickly, so could be my fault of course. Full tree is here: https://github.com/greearb/linux-ct-4.13 Any idea why this might be complaining? Test case is to bring up 200 virtual stations on each of 6 radios and then randomly restart the stations and/or APs they are connected to. I'm trying to shake out stability bugs and such... 30917 Jan 31 15:21:01 2u-6n kernel: = 30918 Jan 31 15:21:01 2u-6n kernel: WARNING: suspicious RCU usage 30919 Jan 31 15:21:01 2u-6n kernel: 4.13.16+ #2 Tainted: GW O 30920 Jan 31 15:21:01 2u-6n kernel: - 30921 Jan 31 15:21:01 2u-6n kernel: /home/greearb/git/linux-4.13.dev.y/net/mac80211/tx.c:605 suspicious rcu_dereference_check() usage! 30922 Jan 31 15:21:01 2u-6n kernel: 30923 other info that might help us debug this: 30924 Jan 31 15:21:01 2u-6n kernel: 30925 rcu_scheduler_active = 2, debug_locks = 1 30926 Jan 31 15:21:01 2u-6n kernel: 5 locks held by ip/19628: 30927 Jan 31 15:21:01 2u-6n kernel: #0: (rtnl_mutex){+.+.+.}, at: [] rtnetlink_rcv+0x16/0x30 30928 Jan 31 15:21:01 2u-6n kernel: #1: (>mtx){+.+.+.}, at: [] cfg80211_leave+0x1d/0x40 [cfg80211] 30929 Jan 31 15:21:01 2u-6n kernel: #2: (>sta_mtx){+.+.+.}, at: [] __sta_info_flush+0x7c/0x170 [mac80211] 30930 Jan 31 15:21:01 2u-6n kernel: #3: (&(>axq_lock)->rlock){+.-...}, at: [] ath_tx_node_cleanup+0x66/0x160 [ath9k] 30931 Jan 31 15:21:01 2u-6n kernel: #4: (&(>lock)->rlock){+.-...}, at: [] ieee80211_tx_dequeue+0x45/0xca0 [mac80211] 30932 Jan 31 15:21:01 2u-6n kernel: 30933 stack backtrace: 30934 Jan 31 15:21:01 2u-6n kernel: CPU: 1 PID: 19628 Comm: ip Tainted: G W O4.13.16+ #2 30935 Jan 31 15:21:01 2u-6n kernel: Hardware name: Iron_Systems,Inc CS-CAD-2U-A02/X10SRL-F, BIOS 2.0b 05/02/2017 30936 Jan 31 15:21:01 2u-6n kernel: Call Trace: 30937 Jan 31 15:21:01 2u-6n kernel: dump_stack+0x85/0xc7 30938 Jan 31 15:21:01 2u-6n kernel: lockdep_rcu_suspicious+0xc5/0x100 30939 Jan 31 15:21:01 2u-6n kernel: ieee80211_tx_h_select_key+0x1c9/0x4e0 [mac80211] 30940 Jan 31 15:21:01 2u-6n kernel: ieee80211_tx_dequeue+0x376/0xca0 [mac80211] 30941 Jan 31 15:21:01 2u-6n kernel: ath_tid_dequeue+0x9c/0x110 [ath9k] 30942 Jan 31 15:21:01 2u-6n kernel: ath_tx_node_cleanup+0xa4/0x160 [ath9k] 30943 Jan 31 15:21:01 2u-6n kernel: ath9k_sta_state+0x6b/0x1e0 [ath9k] 30944 Jan 31 15:21:01 2u-6n kernel: drv_sta_state+0xad/0xa80 [mac80211] 30945 Jan 31 15:21:01 2u-6n kernel: __sta_info_destroy_part2+0x178/0x1d0 [mac80211] 30946 Jan 31 15:21:01 2u-6n kernel: __sta_info_flush+0xef/0x170 [mac80211] 30947 Jan 31 15:21:01 2u-6n kernel: ieee80211_set_disassoc+0xc6/0x400 [mac80211] 30948 Jan 31 15:21:01 2u-6n kernel: ieee80211_mgd_deauth+0x2f6/0x830 [mac80211] 30949 Jan 31 15:21:01 2u-6n kernel: ieee80211_deauth+0x13/0x20 [mac80211] 30950 Jan 31 15:21:01 2u-6n kernel: cfg80211_mlme_deauth+0x16b/0x3e0 [cfg80211] 30951 Jan 31 15:21:01 2u-6n kernel: cfg80211_mlme_down+0x6d/0xa0 [cfg80211] 30952 Jan 31 15:21:01 2u-6n kernel: cfg80211_disconnect+0x2f4/0x3f0 [cfg80211] 30953 Jan 31 15:21:01 2u-6n kernel: ? kfree+0x24d/0x2b0 30954 Jan 31 15:21:01 2u-6n kernel: __cfg80211_leave+0x134/0x190 [cfg80211] 30955 Jan 31 15:21:01 2u-6n kernel: cfg80211_leave+0x28/0x40 [cfg80211] 30956 Jan 31 15:21:01 2u-6n kernel: