Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Ben Greear

On 06/07/2018 02:52 PM, Cong Wang wrote:

On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear  wrote:

On 06/07/2018 02:29 PM, Cong Wang wrote:


On Thu, Jun 7, 2018 at 9:06 AM,   wrote:


--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,

flow = list_first_entry(head, struct fq_flow, flowchain);

+   if (WARN_ON_ONCE(!flow))
+   return NULL;
+



How could even possibly list_first_entry() returns NULL?
You need list_first_entry_or_null().



I don't know for certain flow as null, but something was NULL in this method
near that line and it looked like a likely culprit.

I guess possibly tin or fq was passed in as NULL?


A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c
too, but you are checking against 0 in your patch, that is the problem and
that is why list_first_entry_or_null() exists.



Ahh, I see what you mean, and that is my mistake.  In my case, it did seem to
be a mostly-null deref, not a 0x0 deref.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Cong Wang
On Thu, Jun 7, 2018 at 2:41 PM, Ben Greear  wrote:
> On 06/07/2018 02:29 PM, Cong Wang wrote:
>>
>> On Thu, Jun 7, 2018 at 9:06 AM,   wrote:
>>>
>>> --- a/include/net/fq_impl.h
>>> +++ b/include/net/fq_impl.h
>>> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>>>
>>> flow = list_first_entry(head, struct fq_flow, flowchain);
>>>
>>> +   if (WARN_ON_ONCE(!flow))
>>> +   return NULL;
>>> +
>>
>>
>> How could even possibly list_first_entry() returns NULL?
>> You need list_first_entry_or_null().
>>
>
> I don't know for certain flow as null, but something was NULL in this method
> near that line and it looked like a likely culprit.
>
> I guess possibly tin or fq was passed in as NULL?

A NULL pointer is not always 0. You can trigger a NULL-ptr-def with 0x3c
too, but you are checking against 0 in your patch, that is the problem and
that is why list_first_entry_or_null() exists.


Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Ben Greear

On 06/07/2018 02:29 PM, Cong Wang wrote:

On Thu, Jun 7, 2018 at 9:06 AM,   wrote:

--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,

flow = list_first_entry(head, struct fq_flow, flowchain);

+   if (WARN_ON_ONCE(!flow))
+   return NULL;
+


How could even possibly list_first_entry() returns NULL?
You need list_first_entry_or_null().



I don't know for certain flow as null, but something was NULL in this method
near that line and it looked like a likely culprit.

I guess possibly tin or fq was passed in as NULL?

Anyway, if the patch seems worthless just ignore it.  I'll leave it in my tree
since it should be harmless and will let you know if I ever hit it.

If someone else hits a similar crash, hopefully they can report it.

Thanks,
Ben

--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Cong Wang
On Thu, Jun 7, 2018 at 9:06 AM,   wrote:
> --- a/include/net/fq_impl.h
> +++ b/include/net/fq_impl.h
> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>
> flow = list_first_entry(head, struct fq_flow, flowchain);
>
> +   if (WARN_ON_ONCE(!flow))
> +   return NULL;
> +

How could even possibly list_first_entry() returns NULL?
You need list_first_entry_or_null().


Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Ben Greear

On 06/07/2018 09:17 AM, Eric Dumazet wrote:



On 06/07/2018 09:06 AM, gree...@candelatech.com wrote:

From: Ben Greear 

While testing an ath10k firmware that often crashed under load,
I was seeing kernel crashes as well.  One of them appeared to
be a dereference of a NULL flow object in fq_tin_dequeue.

I have since fixed the firmware flaw, but I think it would be
worth adding the WARN_ON in case the problem appears again.

 common_interrupt+0xf/0xf
 



Please find the exact commit that brought this bug,
and add a corresponding Fixes: tag


It will be a total pain to bisect this problem since my test
case that causes this is running my modified firmware (and a buggy one at that),
modified ath10k driver (to work with this firmware and support my test case 
easily),
and the failure case appears to cause multiple different-but-probably-related
crashes and often hangs or reboots the test system.

Probably this is all caused by some nasty race or buggy logic related to
dealing with a crashed ath10k firmware tearing down txq logic from the
bottom up.  There have been many such bugs in the past, I and others fixed a 
few,
and very likely more remain.

For what it is worth, I didn't see this crash in 4.13, and I spent some time
testing buggy firmware there occasionally.

If someone else has interest in debugging the ath10k driver, I will be happy to 
generate
a mostly-stock firmware image with ability to crash in the TX path and give it 
to them.
It will crash the stock upstream code reliably in my experience.

Thanks,
Ben





Signed-off-by: Ben Greear 
---
 include/net/fq_impl.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index be7c0fa..e40354d 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,

flow = list_first_entry(head, struct fq_flow, flowchain);

+   if (WARN_ON_ONCE(!flow))
+   return NULL;
+
if (flow->deficit <= 0) {
flow->deficit += fq->quantum;
list_move_tail(>flowchain,






--
Ben Greear 
Candela Technologies Inc  http://www.candelatech.com



Re: [PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread Eric Dumazet



On 06/07/2018 09:06 AM, gree...@candelatech.com wrote:
> From: Ben Greear 
> 
> While testing an ath10k firmware that often crashed under load,
> I was seeing kernel crashes as well.  One of them appeared to
> be a dereference of a NULL flow object in fq_tin_dequeue.
> 
> I have since fixed the firmware flaw, but I think it would be
> worth adding the WARN_ON in case the problem appears again.
> 
>  common_interrupt+0xf/0xf
>  
> 

Please find the exact commit that brought this bug,
and add a corresponding Fixes: tag

> Signed-off-by: Ben Greear 
> ---
>  include/net/fq_impl.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
> index be7c0fa..e40354d 100644
> --- a/include/net/fq_impl.h
> +++ b/include/net/fq_impl.h
> @@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
>  
>   flow = list_first_entry(head, struct fq_flow, flowchain);
>  
> + if (WARN_ON_ONCE(!flow))
> + return NULL;
> +
>   if (flow->deficit <= 0) {
>   flow->deficit += fq->quantum;
>   list_move_tail(>flowchain,
> 


[PATCH] net-fq: Add WARN_ON check for null flow.

2018-06-07 Thread greearb
From: Ben Greear 

While testing an ath10k firmware that often crashed under load,
I was seeing kernel crashes as well.  One of them appeared to
be a dereference of a NULL flow object in fq_tin_dequeue.

I have since fixed the firmware flaw, but I think it would be
worth adding the WARN_ON in case the problem appears again.

BUG: unable to handle kernel NULL pointer dereference at 003c
IP: ieee80211_tx_dequeue+0xfb/0xb10 [mac80211]
PGD 8001417fe067 P4D 8001417fe067 PUD 13db41067 PMD 0
Oops:  [#1] PREEMPT SMP PTI
Modules linked in: nf_conntrack_netlink nf_conntrack nfnetlink nf_defrag_ipv4 
libcrc32c vrf 8021q garp mrp stp llc fuse macvlan wanlink(O) pktgen lm78 ]
CPU: 2 PID: 21733 Comm: ip Tainted: GW  O 4.16.8+ #35
Hardware name: _ _/, BIOS 5.11 08/26/2016
RIP: 0010:ieee80211_tx_dequeue+0xfb/0xb10 [mac80211]
RSP: 0018:880172d03c30 EFLAGS: 00010286
RAX: 88013b2c RBX: 88013b2c00b8 RCX: 0898
RDX: 0001 RSI: 88013b2c00d8 RDI: 88016ac40820
RBP: 88016ac42ba0 R08: 0020 R09: 
R10: 0010 R11: 001256c89fd8 R12: 88013b2c
R13: 88013b2c00d8 R14:  R15: 88013b2c00d8
FS:  7f04e3606700() GS:880172d0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 003c CR3: 00013b35a005 CR4: 003606e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 
 ? update_load_avg+0x607/0x6f0
 ath10k_mac_tx_push_txq+0x6e/0x220 [ath10k_core]
 ath10k_mac_tx_push_pending+0x151/0x1e0 [ath10k_core]
 ath10k_htt_txrx_compl_task+0x113e/0x1940 [ath10k_core]
 ? ath10k_ce_completed_send_next_nolock+0x6f/0x90 [ath10k_pci]
 ? ath10k_ce_completed_send_next+0x31/0x40 [ath10k_pci]
 ? ath10k_pci_htc_tx_cb+0x30/0xc0 [ath10k_pci]
 ? ath10k_bus_pci_write32+0x3c/0xa0 [ath10k_pci]
 ath10k_pci_napi_poll+0x44/0xf0 [ath10k_pci]
 net_rx_action+0x250/0x3b0
 __do_softirq+0xc2/0x2c2
 irq_exit+0x93/0xa0
 do_IRQ+0x45/0xc0
 common_interrupt+0xf/0xf
 

Signed-off-by: Ben Greear 
---
 include/net/fq_impl.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index be7c0fa..e40354d 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -80,6 +80,9 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
 
flow = list_first_entry(head, struct fq_flow, flowchain);
 
+   if (WARN_ON_ONCE(!flow))
+   return NULL;
+
if (flow->deficit <= 0) {
flow->deficit += fq->quantum;
list_move_tail(>flowchain,
-- 
2.4.11