Re: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs
On 22 September 2016 at 16:09, Rafał Miłeckiwrote: > On 22 September 2016 at 14:24, Rafał Miłecki wrote: >> I got the same problem again, but this time there was only 1 skb in my >> flowring. >> That resulted in less serial console messages and no reboot. >> >> It doesn't look good. brcmf_txfinalize was called 11 times after deleting >> flowring. Everytime it was working with the same skb that was already freed. > > I'm wondering if there is some bug in flowring code. Please take a > look if my following understanding it correct. > > 1) Every skb that is supposed to reach the device gets queued on a flowring. > 2) Dequeuing means skb is not available in a flowring anymore > 3a) brcmf_msgbuf_txflow dequeues skbs to pass them to the firmware > 3b) brcmf_flowring_delete dequeues skbs to delete (free) them > > If that simple description is correct, it shouldn't be possible to skb to: > 1) Reach the firmware > *and* > 2) Get freed as part of flowring deletion > > But this is what seems to be happening in my case. We get one skb > freed on flowring deletion and then reported as transmitted (sometimes > multiple times) by the firmware. > > Maybe there is simply a race between brcmf_flowring_delete and > brcmf_msgbuf_txflow? I can see a problem with my debugging code. It may happen brcmfmac gets skb allocated at the same address. Due to brcmf_flowring_delete bypassing brcmf_txfinalize we don't drop tracing of the old one and that results in the same packet listed twice in my debugging code. I got caught into the same trap that breaks pend_8021x_cnt counting. I believe pend_8021x_cnt related code should be moved to the place that really queues skb for the firmware to access. -- Rafał
Re: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs
On 22 September 2016 at 14:24, Rafał Miłeckiwrote: > On 09/22/2016 01:59 PM, Rafał Miłecki wrote: >> >> And again... >> >> I decided to focus on brcmf_flowring_delete a bit more. >> >> As I can see flowrings are created and removed from time to time, in most >> cases >> they are empty when being deleted. When they are not, things go wrong. In >> below >> log you can see brcmfmac removing flowring that got 8 skb packets. >> >> First some summary, of what was already partially noticed in previous >> e-mail: >> 1) brcmf_flowring_delete calls brcmu_pkt_buf_free_skb directly bypassing >>brcmf_txfinalize. It means tracing 802.1x packets and counting TX >> errors is >>broken. >> 2) After freeing skb (as the result of brcmf_flowring_delete) there will >> still >>an IRQ, MSGBUF_TYPE_TX_STATUS and call to the brcmf_txfinalize. That's >>definitely unexpected as brcmf_txfinalize will be operating on skb that >> was >>already freed. >> >> Now, there is something new that looks even worse. Sometimes >> brcmf_txfinalize >> may be called multiple times for the same skb! Take a close look at my >> log. >> It was called twice for skb c64a5a80. Twice for c64a5480. Twice for >> c64a5f00. >> Then Linux/device rebooted itself due to amount of console messages. > > > I got the same problem again, but this time there was only 1 skb in my > flowring. > That resulted in less serial console messages and no reboot. > > It doesn't look good. brcmf_txfinalize was called 11 times after deleting > flowring. Everytime it was working with the same skb that was already freed. I'm wondering if there is some bug in flowring code. Please take a look if my following understanding it correct. 1) Every skb that is supposed to reach the device gets queued on a flowring. 2) Dequeuing means skb is not available in a flowring anymore 3a) brcmf_msgbuf_txflow dequeues skbs to pass them to the firmware 3b) brcmf_flowring_delete dequeues skbs to delete (free) them If that simple description is correct, it shouldn't be possible to skb to: 1) Reach the firmware *and* 2) Get freed as part of flowring deletion But this is what seems to be happening in my case. We get one skb freed on flowring deletion and then reported as transmitted (sometimes multiple times) by the firmware. Maybe there is simply a race between brcmf_flowring_delete and brcmf_msgbuf_txflow?
Re: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs
On 09/22/2016 01:59 PM, Rafał Miłecki wrote: And again... I decided to focus on brcmf_flowring_delete a bit more. As I can see flowrings are created and removed from time to time, in most cases they are empty when being deleted. When they are not, things go wrong. In below log you can see brcmfmac removing flowring that got 8 skb packets. First some summary, of what was already partially noticed in previous e-mail: 1) brcmf_flowring_delete calls brcmu_pkt_buf_free_skb directly bypassing brcmf_txfinalize. It means tracing 802.1x packets and counting TX errors is broken. 2) After freeing skb (as the result of brcmf_flowring_delete) there will still an IRQ, MSGBUF_TYPE_TX_STATUS and call to the brcmf_txfinalize. That's definitely unexpected as brcmf_txfinalize will be operating on skb that was already freed. Now, there is something new that looks even worse. Sometimes brcmf_txfinalize may be called multiple times for the same skb! Take a close look at my log. It was called twice for skb c64a5a80. Twice for c64a5480. Twice for c64a5f00. Then Linux/device rebooted itself due to amount of console messages. I got the same problem again, but this time there was only 1 skb in my flowring. That resulted in less serial console messages and no reboot. It doesn't look good. brcmf_txfinalize was called 11 times after deleting flowring. Everytime it was working with the same skb that was already freed. After my DIR-885L and brcmfmac managed to survive that, my smartphone remained connected but I couldn't contact it (no reply to ping). I had to reconnect and then it was back OK. [ 2927.500112] brcmfmac: CONSOLE: 029651.820 wl0: Proxy STA 78:d6:f0:9b:ba:bc link is already gone !!?? [ 2927.509473] brcmfmac: [brcmf_msgbuf_process_flow_ring_delete_response -> __brcmf_msgbuf_remove_flowring] flowid:3 [ 2927.519800] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] flowid:3 ring:c7850540 [ 2927.529318] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] count:0 [ 2927.537526] brcmfmac: [brcmf_msgbuf_process_flow_ring_delete_response -> __brcmf_msgbuf_remove_flowring] flowid:34 [ 2927.547929] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] flowid:34 ring:c7850180 [ 2927.557533] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] count:0 [ 2927.559735] brcmfmac: [brcmf_msgbuf_flowring_create -> __brcmf_flowring_create] da:78:d6:f0:9b:ba:bc prio:0 ifidx:1 [ 2927.559739] brcmfmac: [brcmf_msgbuf_flowring_create -> __brcmf_flowring_create] flowid:3 [ 2927.584463] brcmfmac: [brcmf_msgbuf_process_flow_ring_create_response] flowid:3 status:0x [ 2927.593211] brcmfmac: [brcmf_msgbuf_process_flow_ring_delete_response -> __brcmf_msgbuf_remove_flowring] flowid:3 [ 2927.603538] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] flowid:3 ring:c658cb80 [ 2927.613059] brcmfmac: [__brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c64d1240 skb->dev:c6b3f800 skb->dev->name:wlan1-1 [ 2927.626750] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] count:1 [ 2928.348341] brcmfmac: CONSOLE: 029652.655 wl0: wlc_ampdu_resp_timeout: cleaning up resp tid 0 waiting forseq 0xbb9 for 1000 ms [ 2930.887857] brcmfmac: [brcmf_msgbuf_flowring_create -> __brcmf_flowring_create] da:78:d6:f0:9b:ba:bc prio:0 ifidx:1 [ 2930.898340] brcmfmac: [brcmf_msgbuf_flowring_create -> __brcmf_flowring_create] flowid:3 [ 2930.906686] brcmfmac: [brcmf_msgbuf_process_flow_ring_create_response] flowid:3 status:0x [ 2933.982285] brcmfmac: [__brcmf_txfinalize -> __brcmu_pkt_buf_free_skb] [ifp:c6b3fc80] ***BUG*** skb:c64d1240 skb->dev:c6b3f800 skb->dev->name:wlan1-1 [ 2933.995705] [ cut here ] [ 2934.000359] WARNING: CPU: 1 PID: 615 at compat-wireless-2016-06-20/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c:70 __brcmu_pkt_buf_free_skb+0x98/0x1ec [brcmfmac]() [ 2934.015949] Modules linked in: pppoe ppp_async iptable_nat brcmfmac pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_commed [ 2934.088284] CPU: 1 PID: 615 Comm: irq/33-brcmf_pc Not tainted 4.4.21 #0 [ 2934.094913] Hardware name: BCM5301X [ 2934.098403] Backtrace: [ 2934.100877] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) [ 2934.108465] r7:0046 r6:bf1d1146 r5:6093 r4: [ 2934.114183] [] (show_stack) from [] (dump_stack+0x84/0xa4) [ 2934.121421] [] (dump_stack) from [] (warn_slowpath_common+0x8c/0xb8) [ 2934.129527] r5:0009 r4: [ 2934.133126] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x24/0x2c) [ 2934.141931] r8:a013 r7:c6b3fdd0 r6:bf1c9f2d r5:c6b3fc80 r4:c64d1240 [ 2934.148707] [] (warn_slowpath_null) from [] (__brcmu_pkt_buf_free_skb+0x98/0x1ec [brcmfmac]) [ 2934.158923] []
Re: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs
And again... I decided to focus on brcmf_flowring_delete a bit more. As I can see flowrings are created and removed from time to time, in most cases they are empty when being deleted. When they are not, things go wrong. In below log you can see brcmfmac removing flowring that got 8 skb packets. First some summary, of what was already partially noticed in previous e-mail: 1) brcmf_flowring_delete calls brcmu_pkt_buf_free_skb directly bypassing brcmf_txfinalize. It means tracing 802.1x packets and counting TX errors is broken. 2) After freeing skb (as the result of brcmf_flowring_delete) there will still an IRQ, MSGBUF_TYPE_TX_STATUS and call to the brcmf_txfinalize. That's definitely unexpected as brcmf_txfinalize will be operating on skb that was already freed. Now, there is something new that looks even worse. Sometimes brcmf_txfinalize may be called multiple times for the same skb! Take a close look at my log. It was called twice for skb c64a5a80. Twice for c64a5480. Twice for c64a5f00. Then Linux/device rebooted itself due to amount of console messages. [ 242.242012] brcmfmac: [brcmf_msgbuf_process_flow_ring_delete_response -> __brcmf_msgbuf_remove_flowring] flowid:36 [ 242.252440] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] flowid:36 ring:c667fd40 [ 242.262043] brcmfmac: [__brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c64a5780 skb->dev:c647a000 skb->dev->name:wlan1-1 [ 242.275733] brcmfmac: [__brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c64a5540 skb->dev:c647a000 skb->dev->name:wlan1-1 [ 242.289420] brcmfmac: [__brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c64a5180 skb->dev:c647a000 skb->dev->name:wlan1-1 [ 242.303112] brcmfmac: [__brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c64a5f00 skb->dev:c647a000 skb->dev->name:wlan1-1 [ 242.316802] brcmfmac: [__brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c64a50c0 skb->dev:c647a000 skb->dev->name:wlan1-1 [ 242.330488] brcmfmac: [__brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c64a5480 skb->dev:c647a000 skb->dev->name:wlan1-1 [ 242.344177] brcmfmac: [__brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c64a5a80 skb->dev:c647a000 skb->dev->name:wlan1-1 [ 242.357867] brcmfmac: [__brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c64a5900 skb->dev:c647a000 skb->dev->name:wlan1-1 [ 242.371548] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] count:8 [ 242.379758] brcmfmac: [brcmf_msgbuf_process_flow_ring_delete_response -> __brcmf_msgbuf_remove_flowring] flowid:37 [ 242.390159] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] flowid:37 ring:c667fc80 [ 242.399752] brcmfmac: [__brcmf_msgbuf_remove_flowring -> __brcmf_flowring_delete] count:0 [ 242.433468] brcmfmac: [brcmf_msgbuf_flowring_create -> __brcmf_flowring_create] da:78:d6:f0:9b:ba:bc prio:0 ifidx:1 [ 242.443974] brcmfmac: [brcmf_msgbuf_flowring_create -> __brcmf_flowring_create] flowid:36 [ 242.452580] brcmfmac: [brcmf_msgbuf_process_flow_ring_create_response] flowid:36 status:0x [ 242.765056] brcmfmac: [__brcmf_txfinalize -> __brcmu_pkt_buf_free_skb] [ifp:c647a480] ***BUG*** skb:c64a5900 skb->dev:c647a000 skb->dev->name:wlan1-1 [ 242.778479] [ cut here ] [ 242.783133] WARNING: CPU: 1 PID: 616 at compat-wireless-2016-06-20/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c:70 __brcmu_pkt_buf_free_skb+0x98/0x1ec [brcmfmac]() [ 242.798724] Modules linked in: pppoe ppp_async iptable_nat brcmfmac pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_commed [ 242.871057] CPU: 1 PID: 616 Comm: irq/33-brcmf_pc Not tainted 4.4.21 #0 [ 242.877686] Hardware name: BCM5301X [ 242.881176] Backtrace: [ 242.883651] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) [ 242.891230] r7:0046 r6:bf1d1146 r5:6093 r4: [ 242.896946] [] (show_stack) from [] (dump_stack+0x84/0xa4) [ 242.904186] [] (dump_stack) from [] (warn_slowpath_common+0x8c/0xb8) [ 242.912293] r5:0009 r4: [ 242.915891] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x24/0x2c) [ 242.924696] r8:a013 r7:c647a5d0 r6:bf1c9f2d r5:c647a480 r4:c64a5900 [ 242.931468] [] (warn_slowpath_null) from [] (__brcmu_pkt_buf_free_skb+0x98/0x1ec [brcmfmac]) [ 242.941680] [] (__brcmu_pkt_buf_free_skb [brcmfmac]) from [] (__brcmf_txfinalize+0x19c/0x1d0 [brcmfmac]) [ 242.952914] r8:c04903c8 r7:c647a5d0 r6:c64a5900 r5:c648ca00 r4:c647a480 [ 242.959687] [] (__brcmf_txfinalize [brcmfmac]) from [] (brcmf_msgbuf_txdata+0x53c/0x6e8 [brcmfmac]) [ 242.970488] r10:c64a5900
Re: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs
Hi Hante, I hit this problem again and I'm afraid it's getting even more complex. Last time you were suspecting flowring deletion but it didn't make much sense to me. It was because I didn't see brcmf_flowring_delete anywhere in my log. Well, today it was different. I saw brcmf_flowring_delete which makes me wonder if there is more than 1 source of this problem. Good news is that today I got few extra debugging messages. Bad news is I was experimenting with MAX_WAIT_FOR_8021X_TX. I added brcmf_netdev_wait_pend8021x to the brcmf_cfg80211_get_station and I was running while [ 1 ]; do iw dev wlan1-1 station get 88:53:2e:50:50:00 > /dev/null; done I hope this log may be a bit helpful anyway. So this time brcmf_flowring_delete was called indeed and I think there was a race in brcmfmac code. It seems brcmu_pkt_buf_free_skb was called twice for the same skb! For the first time it was called from brcmf_flowring_delete. We called dev_kfree_skb_any for that skb which means we shouldn't access it anymore. For the second time it (brcmu_pkt_buf_free_skb) was call from brcmf_txfinalize. Unfortunately when brcmf_txfinalize was analyzing that skb (that was already freed and invalid) it didn't contain ETH_P_PAE in the ethhdr anymore so atomic_dec(>pend_8021x_cnt); wasn't called. So my guess is that: 1) We should fix brcmf_flowring_delete to use brcmf_txfinalize 2) We should avoid freeing the same skb twice [ 167.596719] brcmfmac: [brcmf_cfg80211_del_key -> __send_key_to_dongle] ifp:c64c6480 brcmf_ifname(ifp):wlan1-1 [ 167.611257] brcmfmac: CONSOLE: 026896.770 wl0: Proxy STA 78:d6:f0:9b:ba:bc link is already gone !!?? [ 167.623375] brcmfmac: [brcmf_flowring_delete -> __brcmu_pkt_buf_free_skb] [ifp: (null)] Freeing skb:c7240240 skb->dev:c64c6000 skb->dev->name:wlan1-1 [ 167.640721] brcmfmac: [__brcmf_txfinalize -> __brcmu_pkt_buf_free_skb] [ifp:c64c6480] ***BUG*** skb:c7240240 skb->dev:c64c6000 skb->dev->name:wlan1-1 [ 167.654247] brcmfmac: [__brcmf_txfinalize -> __brcmu_pkt_buf_free_skb] [ifp:c64c6480] Original data: 78 d6 f0 9b ba bc 92 8d 78 66 3a 57 88 8e [ 167.658643] brcmfmac: [brcmf_netdev_wait_pend8021x] ***TIMEOUT WARNING*** ifp:c64c6480 brcmf_ifname(ifp):wlan1-1 brcmf_get_pend_8021x_cnt(ifp):1 [ 167.680249] brcmfmac: [__brcmf_txfinalize -> __brcmu_pkt_buf_free_skb] [ifp:c64c6480] Current data: 88 53 2e 50 50 00 ba 87 01 e2 06 f8 08 00 [ 167.693239] [ cut here ] [ 167.697893] WARNING: CPU: 0 PID: 616 at compat-wireless-2016-06-20/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c:73 __brcmu_pkt_buf_free_skb+0x1a0/0x2f8 [brcmfmac]() [ 167.713587] Modules linked in: pppoe ppp_async iptable_nat brcmfmac pppox ppp_generic nf_nat_ipv4 nf_conntrack_ipv6 nf_conntrack_ipv4 ipt_REJECT ipt_MASQUERADE cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_id xt_conntrack xt_commed [ 167.785932] CPU: 0 PID: 616 Comm: irq/33-brcmf_pc Not tainted 4.4.21 #0 [ 167.792560] Hardware name: BCM5301X [ 167.796050] Backtrace: [ 167.798525] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) [ 167.806112] r7:0049 r6:bf1d1255 r5:6013 r4: [ 167.811831] [] (show_stack) from [] (dump_stack+0x84/0xa4) [ 167.819069] [] (dump_stack) from [] (warn_slowpath_common+0x8c/0xb8) [ 167.827175] r5:0009 r4: [ 167.830774] [] (warn_slowpath_common) from [] (warn_slowpath_null+0x24/0x2c) [ 167.839579] r8:c64c653c r7:bf1c9f39 r6:c64c6480 r5:c7240240 r4:c69571c0 [ 167.846357] [] (warn_slowpath_null) from [] (__brcmu_pkt_buf_free_skb+0x1a0/0x2f8 [brcmfmac]) [ 167.856658] [] (__brcmu_pkt_buf_free_skb [brcmfmac]) from [] (__brcmf_txfinalize+0x184/0x1b4 [brcmfmac]) [ 167.867893] r8:cacd1720 r7:c723d180 r6:c7240240 r5:0001 r4:c64c6480 [ 167.874665] [] (__brcmf_txfinalize [brcmfmac]) from [] (brcmf_msgbuf_txdata+0x508/0x6b0 [brcmfmac]) [ 167.885466] r10:c7240240 r9:cacd1720 r8:cacd1720 r7:c723d180 r6:0003 r5:0001 [ 167.893352] r4:c6801c00 [ 167.895912] [] (brcmf_msgbuf_txdata [brcmfmac]) from [] (brcmf_proto_msgbuf_rx_trigger+0x3c/0xd0 [brcmfmac]) [ 167.907496] r10: r9:c0493882 r8:c68005e4 r7:c0057d98 r6:c78850c0 r5:0001 [ 167.915382] r4:c6801c00 [ 167.917941] [] (brcmf_proto_msgbuf_rx_trigger [brcmfmac]) from [] (brcmf_pcie_isr_thread+0x1c4/0x238 [brcmfmac]) [ 167.929874] r7:c0057d98 r6:c78850c0 r5:0001 r4:c6970800 [ 167.935587] [] (brcmf_pcie_isr_thread [brcmfmac]) from [] (irq_thread_fn+0x24/0x3c) [ 167.945002] r9:c0493882 r8:c68005e4 r7:c0057d98 r6:c78850c0 r5:c68005c0 r4:c68005c0 [ 167.952803] [] (irq_thread_fn) from [] (irq_thread+0xf8/0x1e4) [ 167.960381] r7:c0057d98 r6:c73f6000 r5:c68005c0 r4:c78850c0 [ 167.966089] [] (irq_thread) from [] (kthread+0xe0/0xf4) [ 167.973064] r10: r9: r8: r7:c0057fa8 r6:c68005c0 r5: [ 167.980948] r4:c6800b80 [ 167.983497] [] (kthread) from [] (ret_from_fork+0x14/0x3c) [ 167.990732]
Re: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs
On 15 September 2016 at 11:20, Hante Meulemanwrote: > Thank you for the extensive debugging. We are looking into this. Arend wrote > yesterday to ask for detailed timing on wen eapol is inserted. We want this > so we can increase the timeout. This is not a "nice" way to solve the > problem, and it should be solved in firmware, but in the meanwhile we do > want to increase timer, because we think that ampdu issues can rise at any > given moment and even with changes/updates in firmware it might be necessary > to increase timeout. I'm kindly asking to keep replies in related threads :) I'm pretty sure above is about problem described in "AMPDU stalls with brcmfmac4366b-pcie.bin triggering WARNINGs". > Second problem is harder, it is good to see that the frame gets returned to > driver at some point. Our biggest worry is that a frame remains indefinitely > in the firmware, but that appears not to be the case. Now why could this > fail. There is one possible reason I found, and that is when a flowring is > deleted while it holds the eapol, see flowring.c. It does not call the > brcmf_txfinalize, but frees the packet directly. I think this is wrong but > need to investigate this in more detail. In the meanwhile, if you keep doing > tests I would like to ask you to add a WARN_ON() call to the function > __brcmu_pkt_buf_free_skb where you print ***BUG*** so we know where the > packet got freed from. Please take a look at my e-mail & log (& maybe diff) once again. You really quite missed the point. The function brcmf_txfinalize *was* called. I was describing it in my e-mail and there is a log: [ 1440.414653] brcmfmac: [__brcmf_txfinalize -> __brcmu_pkt_buf_free_skb] [ifp:c72e7c80] ***BUG*** skb:c70ddc00 skb->dev:c72e7800 skb->dev->name:wlan1-1 Above means that brcmf_txfinalize was called for skb c70ddc00 and it called brcmu_pkt_buf_free_skb. My debugging code noticed that it wasn't alright as this packet was still pending and pend_8021x_cnt wasn't decreased for him. Please note it was brcmf_txfinalize's fault (which was called for 100% sure). For some reason it didn't pass if (type == ETH_P_PAE) condition. I already described it and I shared my guess of firmware corrupting skb data. I'm now using debugging patch which prints copied and current content of skb data in case of fault. You're right I should have used WARN in my ***BUG*** place. It's a stupid habit from MIPS devices where backtraces aren't reliable. I printed mini call chain on my own instead. I mean this part: [__brcmf_txfinalize -> __brcmu_pkt_buf_free_skb] So please take a look at my e-mail again and let me know if it makes more sense now. What do you think about my guess of firmware corrupting skb data?
RE: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs
Hi Rafal, Thank you for the extensive debugging. We are looking into this. Arend wrote yesterday to ask for detailed timing on wen eapol is inserted. We want this so we can increase the timeout. This is not a "nice" way to solve the problem, and it should be solved in firmware, but in the meanwhile we do want to increase timer, because we think that ampdu issues can rise at any given moment and even with changes/updates in firmware it might be necessary to increase timeout. Second problem is harder, it is good to see that the frame gets returned to driver at some point. Our biggest worry is that a frame remains indefinitely in the firmware, but that appears not to be the case. Now why could this fail. There is one possible reason I found, and that is when a flowring is deleted while it holds the eapol, see flowring.c. It does not call the brcmf_txfinalize, but frees the packet directly. I think this is wrong but need to investigate this in more detail. In the meanwhile, if you keep doing tests I would like to ask you to add a WARN_ON() call to the function __brcmu_pkt_buf_free_skb where you print ***BUG*** so we know where the packet got freed from. Regards, Hante -Original Message- From: Rafał Miłecki [mailto:zaj...@gmail.com] Sent: Thursday, September 15, 2016 10:12 AM To: Hante Meuleman; Arend van Spriel; brcm80211-dev-l...@broadcom.com Cc: linux-wireless@vger.kernel.org; Rafał Miłecki Subject: brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs Hi, Yesterday I explained on OpenWrt forum [0] that there are 2 problems leading to WARNINGs triggered by brcmf_netdev_wait_pend8021x. The first one is firmware problem with A-MPDU implementation. I already reported this in "AMPDU stalls with brcmfmac4366b-pcie.bin triggering WARNINGs" e-mail thread [1]. Another one (I'm reporting right now) is related to brcmfmac and its counting of 802.1x packets. The idea is simple: 1) In ndo_start_xmit callback there is check for ETH_P_PAE and code increasing 802.1x counter by 1. 2) In brcmf_txfinalize there is check for ETH_P_PAE and code decreasing 802.1x counter by 1. This is needed as some operations have to be handled without any 802.1x packet pending. Thanks to my debugging code (you can find it at the end) I just noticed that: 1) Sometimes (very rarely) brcmf_txfinalize doesn't detect 802.1x packet 2) brcmu_pkt_buf_free_skb gets called and skb gets freed 3) Counter remains not-decreased and brcmf_netdev_wait_pend8021x will always time out. I'm not sure why this could be happening. The check for ETH_P_PAE looks exactly the same in both places. My only idea so far is firmware corrupting skb data. This fools brcmfmac code and my debugging code was lucky enough to keep tracing skbs (by their addresses) to notice that. I'm going to extend my debugging patch by making a copy of eth header and comparing it with corrupted one. As this bug occurs very rarely it make take days or weeks to get any update. I'd love to hear any comment meanwhile. [0] https://forum.openwrt.org/viewtopic.php?pid=338235#p338235 [1] https://marc.info/?t=14738321621=1=2 [ 1438.965889] brcmfmac: CONSOLE: 028168.028 wl0.3: wlc_send_bar: seq 0xee tid 0 [ 1438.993255] brcmfmac: CONSOLE: 028168.055 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.000743] brcmfmac: CONSOLE: 028168.060 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.008060] brcmfmac: CONSOLE: 028168.067 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.015568] brcmfmac: CONSOLE: 028168.073 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.022786] brcmfmac: CONSOLE: 028168.077 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.030150] brcmfmac: CONSOLE: 028168.078 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.037342] brcmfmac: CONSOLE: 028168.081 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.044554] brcmfmac: CONSOLE: 028168.088 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.051756] brcmfmac: CONSOLE: 028168.090 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.362540] brcmfmac: CONSOLE: 028168.425 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.369827] brcmfmac: CONSOLE: 028168.431 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.378367] brcmfmac: CONSOLE: 028168.432 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.385590] brcmfmac: CONSOLE: 028168.434 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.392839] brcmfmac: CONSOLE: 028168.435 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.400041] brcmfmac: CONSOLE: 028168.438 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.407226] brcmfmac: CONSOLE: 028168.440 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.414848] brcmfmac: CONSOLE: 028168.447 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.422069] brcmfmac: CONSOLE: 028168.451 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.429269] brcmfmac: CONSOLE: 028168.454 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.436454] brcmfmac: CONSOLE: 028168.455 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.443653] brcmfmac: CONSOLE: 028168.457 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.450848] brcmfmac: CONSOLE: 028168.459 wl0.3: wlc_send
brcmf_txfinalize misses 802.1x packet leading to infinite WARNINGs
Hi, Yesterday I explained on OpenWrt forum [0] that there are 2 problems leading to WARNINGs triggered by brcmf_netdev_wait_pend8021x. The first one is firmware problem with A-MPDU implementation. I already reported this in "AMPDU stalls with brcmfmac4366b-pcie.bin triggering WARNINGs" e-mail thread [1]. Another one (I'm reporting right now) is related to brcmfmac and its counting of 802.1x packets. The idea is simple: 1) In ndo_start_xmit callback there is check for ETH_P_PAE and code increasing 802.1x counter by 1. 2) In brcmf_txfinalize there is check for ETH_P_PAE and code decreasing 802.1x counter by 1. This is needed as some operations have to be handled without any 802.1x packet pending. Thanks to my debugging code (you can find it at the end) I just noticed that: 1) Sometimes (very rarely) brcmf_txfinalize doesn't detect 802.1x packet 2) brcmu_pkt_buf_free_skb gets called and skb gets freed 3) Counter remains not-decreased and brcmf_netdev_wait_pend8021x will always time out. I'm not sure why this could be happening. The check for ETH_P_PAE looks exactly the same in both places. My only idea so far is firmware corrupting skb data. This fools brcmfmac code and my debugging code was lucky enough to keep tracing skbs (by their addresses) to notice that. I'm going to extend my debugging patch by making a copy of eth header and comparing it with corrupted one. As this bug occurs very rarely it make take days or weeks to get any update. I'd love to hear any comment meanwhile. [0] https://forum.openwrt.org/viewtopic.php?pid=338235#p338235 [1] https://marc.info/?t=14738321621=1=2 [ 1438.965889] brcmfmac: CONSOLE: 028168.028 wl0.3: wlc_send_bar: seq 0xee tid 0 [ 1438.993255] brcmfmac: CONSOLE: 028168.055 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.000743] brcmfmac: CONSOLE: 028168.060 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.008060] brcmfmac: CONSOLE: 028168.067 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.015568] brcmfmac: CONSOLE: 028168.073 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.022786] brcmfmac: CONSOLE: 028168.077 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.030150] brcmfmac: CONSOLE: 028168.078 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.037342] brcmfmac: CONSOLE: 028168.081 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.044554] brcmfmac: CONSOLE: 028168.088 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.051756] brcmfmac: CONSOLE: 028168.090 wl0.3: wlc_send_bar: seq 0xef tid 0 [ 1439.362540] brcmfmac: CONSOLE: 028168.425 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.369827] brcmfmac: CONSOLE: 028168.431 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.378367] brcmfmac: CONSOLE: 028168.432 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.385590] brcmfmac: CONSOLE: 028168.434 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.392839] brcmfmac: CONSOLE: 028168.435 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.400041] brcmfmac: CONSOLE: 028168.438 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.407226] brcmfmac: CONSOLE: 028168.440 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.414848] brcmfmac: CONSOLE: 028168.447 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.422069] brcmfmac: CONSOLE: 028168.451 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.429269] brcmfmac: CONSOLE: 028168.454 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.436454] brcmfmac: CONSOLE: 028168.455 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.443653] brcmfmac: CONSOLE: 028168.457 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.450848] brcmfmac: CONSOLE: 028168.459 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.458031] brcmfmac: CONSOLE: 028168.462 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.465229] brcmfmac: CONSOLE: 028168.465 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.472426] brcmfmac: CONSOLE: 028168.468 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.479623] brcmfmac: CONSOLE: 028168.471 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.486807] brcmfmac: CONSOLE: 028168.472 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.494008] brcmfmac: CONSOLE: 028168.474 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.501200] brcmfmac: CONSOLE: 028168.476 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.510826] brcmfmac: CONSOLE: 028168.479 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.518032] brcmfmac: CONSOLE: 028168.487 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.525252] brcmfmac: CONSOLE: 028168.489 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.532456] brcmfmac: CONSOLE: 028168.492 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.539647] brcmfmac: CONSOLE: 028168.497 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.755188] brcmfmac: CONSOLE: 028168.815 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.762642] brcmfmac: CONSOLE: 028168.819 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.769854] brcmfmac: CONSOLE: 028168.821 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.777087] brcmfmac: CONSOLE: 028168.823 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.784293] brcmfmac: CONSOLE: 028168.824 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.791861] brcmfmac: CONSOLE: 028168.826 wl0.3: wlc_send_bar: seq 0xf0 tid 0 [ 1439.799057] brcmfmac: CONSOLE: 028168.829