Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

2018-06-11 Thread Ben Greear

On 06/10/2018 10:10 AM, Michał Kazior wrote:

Ben,

The patch is symptomatic. fq_tin_dequeue() already checks if the list
is empty before it tries to access first entry. I see no point in
using the _or_null() + WARN_ON.

The 0x3c deref is likely an offset off of NULL base pointer. Did you
check gdb/addr2line of the ieee80211_tx_dequeue+0xfb? Where did it
point to?
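
To illustrate the point about the 0x3c address: a member access through a
NULL struct pointer faults at the member's offset, not at address 0. Below
is a minimal user-space sketch. struct demo_flow and member_fault_addr()
are hypothetical, and the padding is chosen only so that a member lands at
0x3c; this is not the real struct fq_flow layout and says nothing about
which field was actually involved in the crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real struct fq_flow. The padding exists
 * only so that `deficit` sits at offset 0x3c for this illustration. */
struct demo_flow {
	char pad[0x3c];
	int deficit;
};

/* Address the CPU would access for base->deficit, computed with plain
 * integer arithmetic so that nothing is ever dereferenced. */
static uintptr_t member_fault_addr(uintptr_t base)
{
	return base + offsetof(struct demo_flow, deficit);
}
```

With a base of 0, member_fault_addr() returns 0x3c, which is the shape of
address reported in CR2 by the oops.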


gdb pointed to one line above the flow dereference, which is why I was
going to put some debugging in there.



I suspect there's not enough synchronization between quiescing the
device/ath10k after fw crashes and performing mac80211's reconfig
procedure.


I am already running this patch which helps with some of that.  That
patch never made it upstream, but it fixed problems for me earlier.

https://patchwork.kernel.org/patch/9457639/

Could easily be there are some more issues in that logic.

Someone else posted a patch to disable mac80211 tx when FW crashes,
I think... I have not tried to backport that.

https://patchwork.kernel.org/patch/10411967/

Thanks,
Ben





Michał

On 8 June 2018 at 23:40, Arend van Spriel  wrote:

On 6/8/2018 5:17 PM, Ben Greear wrote:

I recalled an email from Michał about leaving Tieto, so I am adding the
alternate email address he provided back then.

Gr. AvS



On 06/07/2018 04:59 PM, Cong Wang wrote:


On Thu, Jun 7, 2018 at 4:48 PM,   wrote:


diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index be7c0fa..cb911f0 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
return NULL;
}

-   flow = list_first_entry(head, struct fq_flow, flowchain);
+   flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
+
+   if (WARN_ON_ONCE(!flow))
+   return NULL;



This does not make sense either. list_first_entry_or_null()
returns NULL only when the list is empty, but we already check
list_empty() right before this code, and it is protected by fq->lock.



Hello Michal,

git blame shows you as the author of the fq_impl.h code.

I saw a crash when debugging funky ath10k firmware in a 4.16 + hacks
kernel.  There was an apparent mostly-null deref in the fq_tin_dequeue
method.  According to gdb, it was within 1 line of the dereference of
'flow'.

My hack above is probably not that useful.  Cong thinks maybe the
locking is bad.

If you get a chance, please review this thread and see if you have any
ideas for a better fix (or better debugging code).

As always, if you would like me to generate you a buggy firmware that
will crash in the tx path and cause all sorts of mayhem in the ath10k
driver and wifi stack, I will be happy to do so.

https://www.mail-archive.com/netdev@vger.kernel.org/msg239738.html

Thanks,
Ben







--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

2018-06-08 Thread Eric Dumazet



On 06/08/2018 07:10 AM, Ben Greear wrote:
> Maybe whoever put this code together can take a stab at it.
> 

This was one of the motivations for the Fixes: tag request.

By doing a git blame, you can find which commit(s) added this code,
and thus CC the author, who might not follow netdev@ closely.



Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

2018-06-08 Thread Ben Greear




On 06/07/2018 05:13 PM, Cong Wang wrote:

On Thu, Jun 7, 2018 at 4:48 PM,   wrote:

From: Ben Greear 

While testing an ath10k firmware that often crashed under load,
I was seeing kernel crashes as well.  One of them appeared to
be a dereference of a NULL flow object in fq_tin_dequeue.

I have since fixed the firmware flaw, but I think it would be
worth adding the WARN_ON in case the problem appears again.

BUG: unable to handle kernel NULL pointer dereference at 003c
IP: ieee80211_tx_dequeue+0xfb/0xb10 [mac80211]


Instead of adding WARN_ON(), you need to think about
the locking there, it is suspicious:

fq is from struct ieee80211_local:

struct fq *fq = &local->fq;

tin is from struct txq_info:

struct fq_tin *tin = &txqi->tin;

I don't know if fq and tin are supposed to be 1:1, if not there is
a bug in the locking, because ->new_flows and ->old_flows are
both inside tin instead of fq, but they are protected by fq->lock


Maybe whoever put this code together can take a stab at it.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

2018-06-08 Thread Ben Greear




On 06/07/2018 04:59 PM, Cong Wang wrote:

On Thu, Jun 7, 2018 at 4:48 PM,   wrote:

diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index be7c0fa..cb911f0 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
return NULL;
}

-   flow = list_first_entry(head, struct fq_flow, flowchain);
+   flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
+
+   if (WARN_ON_ONCE(!flow))
+   return NULL;


This does not make sense either. list_first_entry_or_null()
returns NULL only when the list is empty, but we already check
list_empty() right before this code, and it is protected by fq->lock.



Never mind then.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com


Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Cong Wang
On Thu, Jun 7, 2018 at 4:48 PM,   wrote:
> From: Ben Greear 
>
> While testing an ath10k firmware that often crashed under load,
> I was seeing kernel crashes as well.  One of them appeared to
> be a dereference of a NULL flow object in fq_tin_dequeue.
>
> I have since fixed the firmware flaw, but I think it would be
> worth adding the WARN_ON in case the problem appears again.
>
> BUG: unable to handle kernel NULL pointer dereference at 003c
> IP: ieee80211_tx_dequeue+0xfb/0xb10 [mac80211]

Instead of adding WARN_ON(), you need to think about
the locking there, it is suspicious:

fq is from struct ieee80211_local:

struct fq *fq = &local->fq;

tin is from struct txq_info:

struct fq_tin *tin = &txqi->tin;

I don't know if fq and tin are supposed to be 1:1, if not there is
a bug in the locking, because ->new_flows and ->old_flows are
both inside tin instead of fq, but they are protected by fq->lock
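
The arrangement described above, one lock in fq guarding lists that live
in each tin, is consistent only if every path that touches any tin's lists
holds that same lock. Below is a single-threaded sketch of that ownership
rule. All demo_* names are hypothetical, the boolean merely stands in for
fq->lock, and nothing here models real concurrency or the mac80211 code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: `lock_held` stands in for fq->lock. */
struct demo_fq {
	bool lock_held;
};

/* Per-queue state. The flow lists live here, not in demo_fq. */
struct demo_tin {
	int new_flows;
	int old_flows;
};

static void demo_fq_lock(struct demo_fq *fq)
{
	fq->lock_held = true;
}

static void demo_fq_unlock(struct demo_fq *fq)
{
	fq->lock_held = false;
}

/* Refuses to touch tin state unless the shared fq lock is held.
 * Returns true if the update happened, false otherwise. */
static bool demo_tin_add_new_flow(struct demo_fq *fq, struct demo_tin *tin)
{
	if (!fq->lock_held)
		return false;
	tin->new_flows++;
	return true;
}
```

Several demo_tin instances can share one demo_fq in this model; the
question raised above is whether every real caller reaches the tin through
the fq whose lock it holds.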


Re: [PATCH v2] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Cong Wang
On Thu, Jun 7, 2018 at 4:48 PM,   wrote:
> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
> index be7c0fa..cb911f0 100644
> --- a/include/net/fq_impl.h
> +++ b/include/net/fq_impl.h
> @@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
> return NULL;
> }
>
> -   flow = list_first_entry(head, struct fq_flow, flowchain);
> +   flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
> +
> +   if (WARN_ON_ONCE(!flow))
> +   return NULL;

This does not make sense either. list_first_entry_or_null()
returns NULL only when the list is empty, but we already check
list_empty() right before this code, and it is protected by fq->lock.
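
The argument above can be checked against a small user-space model of the
list helpers. This is a simplified sketch, not the kernel's list.h: the
demo_* names and struct demo_flow are hypothetical, and the macro omits the
READ_ONCE() handling of the real list_first_entry_or_null().

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space model of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void demo_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static int demo_list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void demo_list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Recover the enclosing struct from a pointer to its embedded list node. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* NULL if and only if the list is empty, otherwise the first entry. */
#define demo_first_entry_or_null(head, type, member) \
	(demo_list_empty(head) ? (type *)NULL \
			       : demo_container_of((head)->next, type, member))

/* Hypothetical stand-in for struct fq_flow. */
struct demo_flow {
	int deficit;
	struct list_head flowchain;
};

static struct demo_flow *demo_peek(struct list_head *head)
{
	return demo_first_entry_or_null(head, struct demo_flow, flowchain);
}
```

In this model demo_peek() returns NULL exactly when demo_list_empty() is
true, so a NULL check placed after an emptiness check under the same lock
cannot fire unless the list is changed in between.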


[PATCH v2] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread greearb
From: Ben Greear 

While testing an ath10k firmware that often crashed under load,
I was seeing kernel crashes as well.  One of them appeared to
be a dereference of a NULL flow object in fq_tin_dequeue.

I have since fixed the firmware flaw, but I think it would be
worth adding the WARN_ON in case the problem appears again.

BUG: unable to handle kernel NULL pointer dereference at 003c
IP: ieee80211_tx_dequeue+0xfb/0xb10 [mac80211]
PGD 8001417fe067 P4D 8001417fe067 PUD 13db41067 PMD 0
Oops:  [#1] PREEMPT SMP PTI
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 
libcrc32c vrf 8021q garp mrp stp llc fuse macvlan wanlink(O) pktgen lm78 ]
CPU: 2 PID: 21733 Comm: ip Tainted: GW  O 4.16.8+ #35
Hardware name: _ _/, BIOS 5.11 08/26/2016
RIP: 0010:ieee80211_tx_dequeue+0xfb/0xb10 [mac80211]
RSP: 0018:880172d03c30 EFLAGS: 00010286
RAX: 88013b2c RBX: 88013b2c00b8 RCX: 0898
RDX: 0001 RSI: 88013b2c00d8 RDI: 88016ac40820
RBP: 88016ac42ba0 R08: 0020 R09: 
R10: 0010 R11: 001256c89fd8 R12: 88013b2c
R13: 88013b2c00d8 R14:  R15: 88013b2c00d8
FS:  7f04e3606700() GS:880172d0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 003c CR3: 00013b35a005 CR4: 003606e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 ? update_load_avg+0x607/0x6f0
 ath10k_mac_tx_push_txq+0x6e/0x220 [ath10k_core]
 ath10k_mac_tx_push_pending+0x151/0x1e0 [ath10k_core]
 ath10k_htt_txrx_compl_task+0x113e/0x1940 [ath10k_core]
 ? ath10k_ce_completed_send_next_nolock+0x6f/0x90 [ath10k_pci]
 ? ath10k_ce_completed_send_next+0x31/0x40 [ath10k_pci]
 ? ath10k_pci_htc_tx_cb+0x30/0xc0 [ath10k_pci]
 ? ath10k_bus_pci_write32+0x3c/0xa0 [ath10k_pci]
 ath10k_pci_napi_poll+0x44/0xf0 [ath10k_pci]
 net_rx_action+0x250/0x3b0
 __do_softirq+0xc2/0x2c2
 irq_exit+0x93/0xa0
 do_IRQ+0x45/0xc0
 common_interrupt+0xf/0xf
 

Signed-off-by: Ben Greear 
---

* v2:  Use list_first_entry_or_null as suggested.

 include/net/fq_impl.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index be7c0fa..cb911f0 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
 			return NULL;
 	}
 
-	flow = list_first_entry(head, struct fq_flow, flowchain);
+	flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
+
+	if (WARN_ON_ONCE(!flow))
+		return NULL;
 
 	if (flow->deficit <= 0) {
 		flow->deficit += fq->quantum;
-- 
2.4.11
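
For readers unfamiliar with the pattern used in the patch, here is a rough
user-space sketch of "warn once, then bail out". demo_warn_on_once(),
demo_dequeue() and demo_warn_count are hypothetical; the kernel's
WARN_ON_ONCE() is a macro that also dumps a stack trace, which this model
does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Number of warnings actually printed, kept so callers can inspect it. */
static int demo_warn_count;

/* Returns `cond` unchanged; prints a message only the first time `cond`
 * is true for the call site that owns `*warned`. */
static bool demo_warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		demo_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical stand-in for struct fq_flow. */
struct demo_flow {
	int deficit;
};

/* Mirrors the shape of the patched code: warn once and return NULL
 * instead of dereferencing a NULL flow. */
static struct demo_flow *demo_dequeue(struct demo_flow *flow)
{
	static bool warned;

	if (demo_warn_on_once(!flow, &warned, "flow is NULL"))
		return NULL;
	return flow;
}
```

Calling demo_dequeue(NULL) repeatedly returns NULL each time but prints
only one warning, which is the behaviour the patch relies on to avoid
flooding the log.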