On 06/10/2018 10:10 AM, Michał Kazior wrote:
Ben,

The patch only treats a symptom. fq_tin_dequeue() already checks if the
list is empty before it tries to access the first entry. I see no point
in using the _or_null() + WARN_ON.

The 0x3c deref is likely an offset from a NULL base pointer. Did you
check gdb/addr2line of the ieee80211_tx_dequeue+0xfb? Where did it
point to?
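
To illustrate why a small fault address such as 0x3c suggests a member
access through a NULL struct pointer, here is a minimal userspace sketch.
The struct layout is hypothetical (it is not the real struct fq_flow or
any other kernel struct); the padding is chosen only so that the member
lands at offset 0x3c. The names demo_layout and member_address are
assumptions made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT a real kernel struct: 0x3c bytes of padding
 * so that the member of interest sits at offset 0x3c. */
struct demo_layout {
	unsigned char pad[0x3c];
	uint32_t member;
};

/* Compute the address a CPU would access for base->member, without
 * actually dereferencing anything. */
static uintptr_t member_address(uintptr_t base)
{
	return base + offsetof(struct demo_layout, member);
}
```

With a base of 0 (NULL), member_address() yields the member offset
itself, which is how an oops address can hint at which field of which
struct was being read. Mapping the offset back to a real field still
needs gdb/addr2line or pahole on the actual build.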

gdb pointed to one line above the flow dereference, which is why I was
going to put some debugging in there.


I suspect there's not enough synchronization between quiescing the
device/ath10k after fw crashes and performing mac80211's reconfig
procedure.

I am already running this patch which helps with some of that.  That
patch never made it upstream, but it fixed problems for me earlier.

https://patchwork.kernel.org/patch/9457639/

Could easily be there are some more issues in that logic.

Someone else posted a patch to disable mac80211 tx when FW crashes,
I think...I have not tried to backport that.

https://patchwork.kernel.org/patch/10411967/

Thanks,
Ben




Michał

On 8 June 2018 at 23:40, Arend van Spriel <arend.vanspr...@broadcom.com> wrote:
On 6/8/2018 5:17 PM, Ben Greear wrote:

I recalled an email from Michał about leaving Tieto, so I am adding the
alternate email address he provided back then.

Gr. AvS


On 06/07/2018 04:59 PM, Cong Wang wrote:

On Thu, Jun 7, 2018 at 4:48 PM,  <gree...@candelatech.com> wrote:

diff --git a/include/net/fq_impl.h b/include/net/fq_impl.h
index be7c0fa..cb911f0 100644
--- a/include/net/fq_impl.h
+++ b/include/net/fq_impl.h
@@ -78,7 +78,10 @@ static struct sk_buff *fq_tin_dequeue(struct fq *fq,
                        return NULL;
        }

-       flow = list_first_entry(head, struct fq_flow, flowchain);
+       flow = list_first_entry_or_null(head, struct fq_flow, flowchain);
+
+       if (WARN_ON_ONCE(!flow))
+               return NULL;


This does not make sense either. list_first_entry_or_null()
returns NULL only when the list is empty, but we already check
list_empty() right before this code, and it is protected by fq->lock.
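
The argument above can be checked with a small userspace sketch. This is
a simplified re-creation of the list helpers, not the kernel's
include/linux/list.h (the real macros use READ_ONCE and statement
expressions); struct demo_flow and first_flow() are hypothetical names
used only for illustration, and no locking is modelled.

```c
#include <stddef.h>

/* Simplified circular doubly linked list, modelled on the kernel's. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_first_entry(ptr, type, member) \
	container_of((ptr)->next, type, member)
/* Simplified: evaluates ptr more than once, unlike the kernel macro. */
#define list_first_entry_or_null(ptr, type, member) \
	(list_empty(ptr) ? NULL : list_first_entry(ptr, type, member))

/* Hypothetical stand-in for a flow linked into a list. */
struct demo_flow { int id; struct list_head flowchain; };

/* Same shape as the check under discussion: return early on an empty
 * list, then take the first entry. */
static struct demo_flow *first_flow(struct list_head *head)
{
	if (list_empty(head))
		return NULL;
	return list_first_entry(head, struct demo_flow, flowchain);
}
```

In this model the two forms agree on both an empty and a non-empty list,
so the _or_null variant adds nothing once list_empty() has been checked
and nothing else modifies the list in between. If a NULL or bogus 'flow'
is still observed, that points at something outside this check, such as
the list being changed without the lock held.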


Hello Michal,

git blame shows you as the author of the fq_impl.h code.

I saw a crash when debugging funky ath10k firmware in a 4.16 + hacks
kernel.  There was an apparent mostly-null deref in the fq_tin_dequeue
method.  According to gdb, it was within 1 line of the dereference of
'flow'.

My hack above is probably not that useful.  Cong thinks maybe the
locking is bad.

If you get a chance, please review this thread and see if you have any
ideas for a better fix (or better debugging code).

As always, if you would like me to generate you a buggy firmware that
will crash in the tx path and cause all sorts of mayhem in the ath10k
driver and wifi stack, I will be happy to do so.

https://www.mail-archive.com/netdev@vger.kernel.org/msg239738.html

Thanks,
Ben




--
Ben Greear <gree...@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
