Re: nfs NULL-dereferencing in net-next

2016-11-10 Thread Anna Schumaker
On 11/10/2016 09:47 AM, Olaf Hering wrote:
> On Thu, Nov 03, Anna Schumaker wrote:
> 
>> Aww, I was hoping that patch would work.  It still seemed to fix some
>> issues for me when mounting multiple servers, so I'm planning to keep
>> it.  Unfortunately I'm out of town this week, so I haven't had much of
>> a chance to keep poking at this issue.  I should be able to get back
>> to it next week!
> 
> Is this supposed to be fixed already? I get an oops in
> rpc_clnt_xprt_switch_has_addr+0xc/0x40 with 4.9.0-rc4.

Not yet, sorry.  I'm waiting on one bugfix patch before sending my pull request.

Thanks for waiting patiently,
Anna

> 
> Olaf
> 



Re: nfs NULL-dereferencing in net-next

2016-11-10 Thread Olaf Hering
On Thu, Nov 03, Anna Schumaker wrote:

> Aww, I was hoping that patch would work.  It still seemed to fix some
> issues for me when mounting multiple servers, so I'm planning to keep
> it.  Unfortunately I'm out of town this week, so I haven't had much of
> a chance to keep poking at this issue.  I should be able to get back
> to it next week!

Is this supposed to be fixed already? I get an oops in
rpc_clnt_xprt_switch_has_addr+0xc/0x40 with 4.9.0-rc4.

Olaf




Re: nfs NULL-dereferencing in net-next

2016-11-03 Thread Anna Schumaker
On 10/31/2016 11:25 AM, Jakub Kicinski wrote:
> On Thu, 27 Oct 2016 06:50:22 +, Yotam Gigi wrote:
>>> -Original Message-
>>> From: Anna Schumaker [mailto:anna.schuma...@netapp.com]
>>> Sent: Wednesday, October 26, 2016 9:17 PM
>>> To: Jakub Kicinski <kubak...@wp.pl>
>>> Cc: Yotam Gigi <yot...@mellanox.com>; Andy Adamson <and...@netapp.com>;
>>> linux-...@vger.kernel.org; netdev@vger.kernel.org; Trond Myklebust
>>> <trond.mykleb...@netapp.com>; Yotam Gigi <yotam...@gmail.com>; mlxsw
>>> <ml...@mellanox.com>
>>> Subject: Re: nfs NULL-dereferencing in net-next
>>>
>>> On 10/26/2016 02:08 PM, Jakub Kicinski wrote:  
>>>> On Wed, 26 Oct 2016 16:15:24 +, Yotam Gigi wrote:  
>>>>>> -Original Message-
>>>>>> From: Anna Schumaker [mailto:anna.schuma...@netapp.com]
>>>>>> Sent: Wednesday, October 26, 2016 5:40 PM
>>>>>> To: Yotam Gigi <yot...@mellanox.com>; Jakub Kicinski <kubak...@wp.pl>;  
>>> Andy  
>>>>>> Adamson <and...@netapp.com>; Anna Schumaker
>>>>>> <anna.schuma...@netapp.com>; linux-...@vger.kernel.org
>>>>>> Cc: netdev@vger.kernel.org; Trond Myklebust  
>>> <trond.mykleb...@netapp.com>;  
>>>>>> Yotam Gigi <yotam...@gmail.com>; mlxsw <ml...@mellanox.com>
>>>>>> Subject: Re: nfs NULL-dereferencing in net-next
>>>>>>
>>>>>> On 10/25/2016 01:19 PM, Yotam Gigi wrote:  
>>>>>>>  
>>>>>>>> -Original Message-
>>>>>>>> From: netdev-ow...@vger.kernel.org [mailto:netdev-  
>>> ow...@vger.kernel.org]  
>>>>>> On  
>>>>>>>> Behalf Of Jakub Kicinski
>>>>>>>> Sent: Monday, October 17, 2016 10:20 PM
>>>>>>>> To: Andy Adamson <and...@netapp.com>; Anna Schumaker
>>>>>>>> <anna.schuma...@netapp.com>; linux-...@vger.kernel.org
>>>>>>>> Cc: netdev@vger.kernel.org; Trond Myklebust  
>>>>>> <trond.mykleb...@netapp.com>  
>>>>>>>> Subject: nfs NULL-dereferencing in net-next
>>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
>>>>>>>> ("fsl/fman: fix error return code in mac_probe()").  
>>>>>>>
>>>>>>>
>>>>>>> I see the same thing. It happens constantly on some of my machines, 
>>>>>>> making  
>>>>>> them  
>>>>>>> completely unusable.
>>>>>>>
>>>>>>> I bisected it and got to the commit:
>>>>>>>
>>>>>>> commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274
>>>>>>> Author: Andy Adamson <and...@netapp.com>
>>>>>>> Date:   Fri Sep 9 09:22:27 2016 -0400
>>>>>>>
>>>>>>> NFS add xprt switch addrs test to match client
>>>>>>>
>>>>>>> Signed-off-by: Andy Adamson <and...@netapp.com>
>>>>>>> Signed-off-by: Anna Schumaker <anna.schuma...@netapp.com>  
>>>>>>
>>>>>> Thanks for reporting on this everyone!  Does this patch help?  
>>>>>
>>>>> Actually, I still see the same bug with the same trace.  
>>>
>>> Well, it was worth a shot.  I'll keep poking at it.
>>>  
>>>>
>>>> I rebuilt the latest net-next and I'm not seeing the trace any more...
>>>> I'm only seeing this (with or without your patch):
>>>>
>>>> [   23.465877] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>>>> [   23.473784] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>>>> [   23.588890] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>>>> [   23.596746] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>>>> [   23.781574] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>>>> [   23.789599] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0  
>>>
>>> Interesting, I get that too when I try to use NFS v4.1.  It's weird that 
>>> the crash would
>>> stop happening like that, so maybe something is racy in this area.
>>>
>>> Thanks for testing, Yotam and Jakub!
>>> Anna  
>>
>> I just found out that it happens on any of my machines, once I put two nfs 
>> entries in
>> my fstab. If I put only one, I don't see the problem. 
>>
>> I hope it might be helpful :)
> 
> Hi Anna,
> 
> any updates on this one?  The crash came back half an hour after I
> reported that it was gone...

Aww, I was hoping that patch would work.  It still seemed to fix some issues 
for me when mounting multiple servers, so I'm planning to keep it.  
Unfortunately I'm out of town this week, so I haven't had much of a chance to 
keep poking at this issue.  I should be able to get back to it next week!

Thanks for the update!
Anna

> 
> Over the weekend David Miller rebased net-next on top of 4.9.0-rc3 and
> the bug is still there :(  FWIW I also have multiple nfs mounts on my
> setup, 2 in fstab and one in a startup script.  Following Yotam I
> dropped one of the fstab entries and things seem to be working (even
> though I still have multiple mounts, the other one just comes a bit
> later).
> 



Re: nfs NULL-dereferencing in net-next

2016-10-31 Thread Jakub Kicinski
On Thu, 27 Oct 2016 06:50:22 +, Yotam Gigi wrote:
> >-Original Message-
> >From: Anna Schumaker [mailto:anna.schuma...@netapp.com]
> >Sent: Wednesday, October 26, 2016 9:17 PM
> >To: Jakub Kicinski <kubak...@wp.pl>
> >Cc: Yotam Gigi <yot...@mellanox.com>; Andy Adamson <and...@netapp.com>;
> >linux-...@vger.kernel.org; netdev@vger.kernel.org; Trond Myklebust
> ><trond.mykleb...@netapp.com>; Yotam Gigi <yotam...@gmail.com>; mlxsw
> ><ml...@mellanox.com>
> >Subject: Re: nfs NULL-dereferencing in net-next
> >
> >On 10/26/2016 02:08 PM, Jakub Kicinski wrote:  
> >> On Wed, 26 Oct 2016 16:15:24 +, Yotam Gigi wrote:  
> >>>> -Original Message-
> >>>> From: Anna Schumaker [mailto:anna.schuma...@netapp.com]
> >>>> Sent: Wednesday, October 26, 2016 5:40 PM
> >>>> To: Yotam Gigi <yot...@mellanox.com>; Jakub Kicinski <kubak...@wp.pl>;  
> >Andy  
> >>>> Adamson <and...@netapp.com>; Anna Schumaker
> >>>> <anna.schuma...@netapp.com>; linux-...@vger.kernel.org
> >>>> Cc: netdev@vger.kernel.org; Trond Myklebust  
> ><trond.mykleb...@netapp.com>;  
> >>>> Yotam Gigi <yotam...@gmail.com>; mlxsw <ml...@mellanox.com>
> >>>> Subject: Re: nfs NULL-dereferencing in net-next
> >>>>
> >>>> On 10/25/2016 01:19 PM, Yotam Gigi wrote:  
> >>>>>  
> >>>>>> -Original Message-
> >>>>>> From: netdev-ow...@vger.kernel.org [mailto:netdev-  
> >ow...@vger.kernel.org]  
> >>>> On  
> >>>>>> Behalf Of Jakub Kicinski
> >>>>>> Sent: Monday, October 17, 2016 10:20 PM
> >>>>>> To: Andy Adamson <and...@netapp.com>; Anna Schumaker
> >>>>>> <anna.schuma...@netapp.com>; linux-...@vger.kernel.org
> >>>>>> Cc: netdev@vger.kernel.org; Trond Myklebust  
> >>>> <trond.mykleb...@netapp.com>  
> >>>>>> Subject: nfs NULL-dereferencing in net-next
> >>>>>>
> >>>>>> Hi!
> >>>>>>
> >>>>>> I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
> >>>>>> ("fsl/fman: fix error return code in mac_probe()").  
> >>>>>
> >>>>>
> >>>>> I see the same thing. It happens constantly on some of my machines, 
> >>>>> making  
> >>>> them  
> >>>>> completely unusable.
> >>>>>
> >>>>> I bisected it and got to the commit:
> >>>>>
> >>>>> commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274
> >>>>> Author: Andy Adamson <and...@netapp.com>
> >>>>> Date:   Fri Sep 9 09:22:27 2016 -0400
> >>>>>
> >>>>> NFS add xprt switch addrs test to match client
> >>>>>
> >>>>> Signed-off-by: Andy Adamson <and...@netapp.com>
> >>>>> Signed-off-by: Anna Schumaker <anna.schuma...@netapp.com>  
> >>>>
> >>>> Thanks for reporting on this everyone!  Does this patch help?  
> >>>
> >>> Actually, I still see the same bug with the same trace.  
> >
> >Well, it was worth a shot.  I'll keep poking at it.
> >  
> >>
> >> I rebuilt the latest net-next and I'm not seeing the trace any more...
> >> I'm only seeing this (with or without your patch):
> >>
> >> [   23.465877] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> >> [   23.473784] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> >> [   23.588890] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> >> [   23.596746] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> >> [   23.781574] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> >> [   23.789599] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0  
> >
> >Interesting, I get that too when I try to use NFS v4.1.  It's weird that the 
> >crash would
> >stop happening like that, so maybe something is racy in this area.
> >
> >Thanks for testing, Yotam and Jakub!
> >Anna  
> 
> I just found out that it happens on any of my machines, once I put two nfs 
> entries in
> my fstab. If I put only one, I don't see the problem. 
> 
> I hope it might be helpful :)

Hi Anna,

any updates on this one?  The crash came back half an hour after I
reported that it was gone...

Over the weekend David Miller rebased net-next on top of 4.9.0-rc3 and
the bug is still there :(  FWIW I also have multiple nfs mounts on my
setup, 2 in fstab and one in a startup script.  Following Yotam I
dropped one of the fstab entries and things seem to be working (even
though I still have multiple mounts, the other one just comes a bit
later).


RE: nfs NULL-dereferencing in net-next

2016-10-27 Thread Yotam Gigi


>-----Original Message-----
>From: Anna Schumaker [mailto:anna.schuma...@netapp.com]
>Sent: Wednesday, October 26, 2016 9:17 PM
>To: Jakub Kicinski <kubak...@wp.pl>
>Cc: Yotam Gigi <yot...@mellanox.com>; Andy Adamson <and...@netapp.com>;
>linux-...@vger.kernel.org; netdev@vger.kernel.org; Trond Myklebust
><trond.mykleb...@netapp.com>; Yotam Gigi <yotam...@gmail.com>; mlxsw
><ml...@mellanox.com>
>Subject: Re: nfs NULL-dereferencing in net-next
>
>On 10/26/2016 02:08 PM, Jakub Kicinski wrote:
>> On Wed, 26 Oct 2016 16:15:24 +, Yotam Gigi wrote:
>>>> -Original Message-
>>>> From: Anna Schumaker [mailto:anna.schuma...@netapp.com]
>>>> Sent: Wednesday, October 26, 2016 5:40 PM
>>>> To: Yotam Gigi <yot...@mellanox.com>; Jakub Kicinski <kubak...@wp.pl>;
>Andy
>>>> Adamson <and...@netapp.com>; Anna Schumaker
>>>> <anna.schuma...@netapp.com>; linux-...@vger.kernel.org
>>>> Cc: netdev@vger.kernel.org; Trond Myklebust
><trond.mykleb...@netapp.com>;
>>>> Yotam Gigi <yotam...@gmail.com>; mlxsw <ml...@mellanox.com>
>>>> Subject: Re: nfs NULL-dereferencing in net-next
>>>>
>>>> On 10/25/2016 01:19 PM, Yotam Gigi wrote:
>>>>>
>>>>>> -Original Message-
>>>>>> From: netdev-ow...@vger.kernel.org [mailto:netdev-
>ow...@vger.kernel.org]
>>>> On
>>>>>> Behalf Of Jakub Kicinski
>>>>>> Sent: Monday, October 17, 2016 10:20 PM
>>>>>> To: Andy Adamson <and...@netapp.com>; Anna Schumaker
>>>>>> <anna.schuma...@netapp.com>; linux-...@vger.kernel.org
>>>>>> Cc: netdev@vger.kernel.org; Trond Myklebust
>>>> <trond.mykleb...@netapp.com>
>>>>>> Subject: nfs NULL-dereferencing in net-next
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
>>>>>> ("fsl/fman: fix error return code in mac_probe()").
>>>>>
>>>>>
>>>>> I see the same thing. It happens constantly on some of my machines, making
>>>> them
>>>>> completely unusable.
>>>>>
>>>>> I bisected it and got to the commit:
>>>>>
>>>>> commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274
>>>>> Author: Andy Adamson <and...@netapp.com>
>>>>> Date:   Fri Sep 9 09:22:27 2016 -0400
>>>>>
>>>>> NFS add xprt switch addrs test to match client
>>>>>
>>>>> Signed-off-by: Andy Adamson <and...@netapp.com>
>>>>> Signed-off-by: Anna Schumaker <anna.schuma...@netapp.com>
>>>>
>>>> Thanks for reporting on this everyone!  Does this patch help?
>>>
>>> Actually, I still see the same bug with the same trace.
>
>Well, it was worth a shot.  I'll keep poking at it.
>
>>
>> I rebuilt the latest net-next and I'm not seeing the trace any more...
>> I'm only seeing this (with or without your patch):
>>
>> [   23.465877] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>> [   23.473784] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>> [   23.588890] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>> [   23.596746] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>> [   23.781574] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>> [   23.789599] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
>
>Interesting, I get that too when I try to use NFS v4.1.  It's weird that the 
>crash would
>stop happening like that, so maybe something is racy in this area.
>
>Thanks for testing, Yotam and Jakub!
>Anna

I just found out that it happens on any of my machines once I put two
nfs entries in my fstab. If I put only one, I don't see the problem.

I hope it might be helpful :)

>
>>
>> HTH
>>



Re: nfs NULL-dereferencing in net-next

2016-10-26 Thread Anna Schumaker
On 10/26/2016 02:08 PM, Jakub Kicinski wrote:
> On Wed, 26 Oct 2016 16:15:24 +, Yotam Gigi wrote:
>>> -Original Message-
>>> From: Anna Schumaker [mailto:anna.schuma...@netapp.com]
>>> Sent: Wednesday, October 26, 2016 5:40 PM
>>> To: Yotam Gigi <yot...@mellanox.com>; Jakub Kicinski <kubak...@wp.pl>; Andy
>>> Adamson <and...@netapp.com>; Anna Schumaker
>>> <anna.schuma...@netapp.com>; linux-...@vger.kernel.org
>>> Cc: netdev@vger.kernel.org; Trond Myklebust <trond.mykleb...@netapp.com>;
>>> Yotam Gigi <yotam...@gmail.com>; mlxsw <ml...@mellanox.com>
>>> Subject: Re: nfs NULL-dereferencing in net-next
>>>
>>> On 10/25/2016 01:19 PM, Yotam Gigi wrote:  
>>>>  
>>>>> -Original Message-
>>>>> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]  
>>> On  
>>>>> Behalf Of Jakub Kicinski
>>>>> Sent: Monday, October 17, 2016 10:20 PM
>>>>> To: Andy Adamson <and...@netapp.com>; Anna Schumaker
>>>>> <anna.schuma...@netapp.com>; linux-...@vger.kernel.org
>>>>> Cc: netdev@vger.kernel.org; Trond Myklebust  
>>> <trond.mykleb...@netapp.com>  
>>>>> Subject: nfs NULL-dereferencing in net-next
>>>>>
>>>>> Hi!
>>>>>
>>>>> I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
>>>>> ("fsl/fman: fix error return code in mac_probe()").  
>>>>
>>>>
>>>> I see the same thing. It happens constantly on some of my machines, making 
>>>>  
>>> them  
>>>> completely unusable.
>>>>
>>>> I bisected it and got to the commit:
>>>>
>>>> commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274
>>>> Author: Andy Adamson <and...@netapp.com>
>>>> Date:   Fri Sep 9 09:22:27 2016 -0400
>>>>
>>>> NFS add xprt switch addrs test to match client
>>>>
>>>> Signed-off-by: Andy Adamson <and...@netapp.com>
>>>> Signed-off-by: Anna Schumaker <anna.schuma...@netapp.com>  
>>>
>>> Thanks for reporting on this everyone!  Does this patch help?  
>>
>> Actually, I still see the same bug with the same trace.

Well, it was worth a shot.  I'll keep poking at it.

> 
> I rebuilt the latest net-next and I'm not seeing the trace any more...
> I'm only seeing this (with or without your patch):
> 
> [   23.465877] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> [   23.473784] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> [   23.588890] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> [   23.596746] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> [   23.781574] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
> [   23.789599] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0

Interesting, I get that too when I try to use NFS v4.1.  It's weird that the 
crash would stop happening like that, so maybe something is racy in this area.

Thanks for testing, Yotam and Jakub!
Anna

> 
> HTH
> 



Re: nfs NULL-dereferencing in net-next

2016-10-26 Thread Jakub Kicinski
On Wed, 26 Oct 2016 16:15:24 +, Yotam Gigi wrote:
> >-Original Message-
> >From: Anna Schumaker [mailto:anna.schuma...@netapp.com]
> >Sent: Wednesday, October 26, 2016 5:40 PM
> >To: Yotam Gigi <yot...@mellanox.com>; Jakub Kicinski <kubak...@wp.pl>; Andy
> >Adamson <and...@netapp.com>; Anna Schumaker
> ><anna.schuma...@netapp.com>; linux-...@vger.kernel.org
> >Cc: netdev@vger.kernel.org; Trond Myklebust <trond.mykleb...@netapp.com>;
> >Yotam Gigi <yotam...@gmail.com>; mlxsw <ml...@mellanox.com>
> >Subject: Re: nfs NULL-dereferencing in net-next
> >
> >On 10/25/2016 01:19 PM, Yotam Gigi wrote:  
> >>  
> >>> -Original Message-
> >>> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]  
> >On  
> >>> Behalf Of Jakub Kicinski
> >>> Sent: Monday, October 17, 2016 10:20 PM
> >>> To: Andy Adamson <and...@netapp.com>; Anna Schumaker
> >>> <anna.schuma...@netapp.com>; linux-...@vger.kernel.org
> >>> Cc: netdev@vger.kernel.org; Trond Myklebust  
> ><trond.mykleb...@netapp.com>  
> >>> Subject: nfs NULL-dereferencing in net-next
> >>>
> >>> Hi!
> >>>
> >>> I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
> >>> ("fsl/fman: fix error return code in mac_probe()").  
> >>
> >>
> >> I see the same thing. It happens constantly on some of my machines, making 
> >>  
> >them  
> >> completely unusable.
> >>
> >> I bisected it and got to the commit:
> >>
> >> commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274
> >> Author: Andy Adamson <and...@netapp.com>
> >> Date:   Fri Sep 9 09:22:27 2016 -0400
> >>
> >> NFS add xprt switch addrs test to match client
> >>
> >> Signed-off-by: Andy Adamson <and...@netapp.com>
> >> Signed-off-by: Anna Schumaker <anna.schuma...@netapp.com>  
> >
> >Thanks for reporting on this everyone!  Does this patch help?  
> 
> Actually, I still see the same bug with the same trace.

I rebuilt the latest net-next and I'm not seeing the trace any more...
I'm only seeing this (with or without your patch):

[   23.465877] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
[   23.473784] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
[   23.588890] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
[   23.596746] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
[   23.781574] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0
[   23.789599] NFS: set_pnfs_layoutdriver: cl_exchange_flags 0x0

HTH


Re: nfs NULL-dereferencing in net-next

2016-10-26 Thread Anna Schumaker
On 10/25/2016 01:19 PM, Yotam Gigi wrote:
> 
>> -Original Message-
>> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
>> Behalf Of Jakub Kicinski
>> Sent: Monday, October 17, 2016 10:20 PM
>> To: Andy Adamson ; Anna Schumaker
>> ; linux-...@vger.kernel.org
>> Cc: netdev@vger.kernel.org; Trond Myklebust 
>> Subject: nfs NULL-dereferencing in net-next
>>
>> Hi!
>>
>> I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
>> ("fsl/fman: fix error return code in mac_probe()").
> 
> 
> I see the same thing. It happens constantly on some of my machines, making 
> them
> completely unusable.
> 
> I bisected it and got to the commit:
> 
> commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274
> Author: Andy Adamson 
> Date:   Fri Sep 9 09:22:27 2016 -0400
> 
> NFS add xprt switch addrs test to match client
> 
> Signed-off-by: Andy Adamson 
> Signed-off-by: Anna Schumaker 

Thanks for reporting on this everyone!  Does this patch help?

From 96376ca1dd4077a1d341bdcb9cc86426ee3844f1 Mon Sep 17 00:00:00 2001
From: Anna Schumaker 
Date: Wed, 26 Oct 2016 10:33:31 -0400
Subject: [PATCH] SUNRPC: Fix suspicious RCU usage

We need to hold the rcu_read_lock() when calling rcu_dereference(),
otherwise we can't guarantee that the object being dereferenced still
exists.

Signed-off-by: Anna Schumaker 
---
 net/sunrpc/clnt.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 34dd7b2..62a4827 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2753,14 +2753,18 @@ EXPORT_SYMBOL_GPL(rpc_cap_max_reconnect_timeout);

 void rpc_clnt_xprt_switch_put(struct rpc_clnt *clnt)
 {
+   rcu_read_lock();
    xprt_switch_put(rcu_dereference(clnt->cl_xpi.xpi_xpswitch));
+   rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_put);

 void rpc_clnt_xprt_switch_add_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt)
 {
+   rcu_read_lock();
    rpc_xprt_switch_add_xprt(rcu_dereference(clnt->cl_xpi.xpi_xpswitch),
                             xprt);
+   rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_add_xprt);

@@ -2770,9 +2774,8 @@ bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt,
    struct rpc_xprt_switch *xps;
    bool ret;

-   xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch);
-
    rcu_read_lock();
+   xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch);
    ret = rpc_xprt_switch_has_addr(xps, sap);
    rcu_read_unlock();
    return ret;
--
2.10.1
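
For anyone following along, the ordering change in the last hunk can be
modelled in userspace. This is only a toy sketch: the toy_* helpers and
structs below are hypothetical stand-ins, not the kernel's RCU API, and
the counter merely imitates the kind of check that flags a dereference
made outside a read-side critical section.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
static int rcu_depth;      /* nesting depth of the toy read-side section */
static int unsafe_derefs;  /* dereferences made with no read lock held */

static void toy_rcu_read_lock(void)
{
    rcu_depth++;
}

static void toy_rcu_read_unlock(void)
{
    assert(rcu_depth > 0);
    rcu_depth--;
}

/* Counts every fetch of a protected pointer done outside the section. */
static void *toy_rcu_dereference(void *p)
{
    if (rcu_depth == 0)
        unsafe_derefs++;
    return p;
}

struct toy_switch { int nr_addrs; };
struct toy_clnt { struct toy_switch *xpswitch; };

/* Old ordering: the pointer is fetched before the read lock is taken. */
int has_addr_before_patch(struct toy_clnt *clnt)
{
    struct toy_switch *xps = toy_rcu_dereference(clnt->xpswitch);
    int ret;

    toy_rcu_read_lock();
    ret = xps->nr_addrs > 0;
    toy_rcu_read_unlock();
    return ret;
}

/* Patched ordering: fetch and use both happen under the read lock. */
int has_addr_after_patch(struct toy_clnt *clnt)
{
    struct toy_switch *xps;
    int ret;

    toy_rcu_read_lock();
    xps = toy_rcu_dereference(clnt->xpswitch);
    ret = xps->nr_addrs > 0;
    toy_rcu_read_unlock();
    return ret;
}
```

In this model a call to has_addr_before_patch() bumps unsafe_derefs by
one, while has_addr_after_patch() leaves it untouched. Both return the
same answer here because nothing frees the toy switch concurrently.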

> 
> 
>>
>> [   23.409633] BUG: unable to handle kernel NULL pointer dereference at 0172
>> [   23.418716] IP: [] rpc_clnt_xprt_switch_has_addr+0xc/0x40 [sunrpc]
>> [   23.427574] PGD 859020067 [   23.430472] PUD 858f2d067 PMD 0 [   23.434311]
>> [   23.436133] Oops:  [#1] PREEMPT SMP
>> [   23.440506] Modules linked in: nfsv4 ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables intel_ri
>> [   23.505915] CPU: 1 PID: 1067 Comm: mount.nfs Not tainted 4.8.0-perf-13951-g3f3177bb680f #51
>> [   23.515363] Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 1.2.10 03/10/2015
>> [   23.523937] task: 983e9086ea00 task.stack: ac6c0a57c000
>> [   23.530641] RIP: 0010:[]  [] rpc_clnt_xprt_switch_has_addr+0xc/0x40 [sunrpc]
>> [   23.542229] RSP: 0018:ac6c0a57fb28  EFLAGS: 00010a97
>> [   23.548255] RAX: c80214ac RBX: 983e97c7b000 RCX: 983e9b3bc180
>> [   23.556320] RDX: 0001 RSI: 983e9928ed28 RDI: ffea
>> [   23.564386] RBP: ac6c0a57fb38 R08: 983e97090630 R09: 983e9928ed30
>> [   23.572452] R10: ac6c0a57fba0 R11: 0010 R12: ac6c0a57fba0
>> [   23.580517] R13: 983e9928ed28 R14:  R15: 983e91360560
>> [   23.588585] FS:  7f4c348aa880() GS:983e9f24() knlGS:
>> [   23.597742] CS:  0010 DS:  ES:  CR0: 80050033
>> [   23.604251] CR2: 0172 CR3: 000850a5f000 CR4: 001406e0
>> [   23.612316] Stack:
>> [   23.614648]  983e97c7b000 ac6c0a57fba0 ac6c0a57fb90 c04d38c3
>> [   23.623331]  983e91360500 983e9928ed30 c0b9e560 983e913605b8
>> [   23.632016]  983e9882e800 983e9882e800 ac6c0a57fc30 ac6c0a57fdb8
>> [   23.640706] Call Trace:
>> [   23.643535]  [] nfs_get_client+0x123/0x340 [nfs]
>> [   23.650542]  [] nfs4_set_client+0x80/0xb0 [nfsv4]
>> [   23.657642]  [] nfs4_create_server+0x115/0x2a0 [nfsv4]
>> [   23.665230]  [] nfs4_remote_mount+0x2e/0x60 [nfsv4]
>> [   23.672519]  [] mount_fs+0x3a/0x160
>> [   23.678254]  [] ? alloc_vfsmnt+0x19e/0x230
>> [   23.684669]  [] 

RE: nfs NULL-dereferencing in net-next

2016-10-25 Thread Yotam Gigi

>-----Original Message-----
>From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] On
>Behalf Of Jakub Kicinski
>Sent: Monday, October 17, 2016 10:20 PM
>To: Andy Adamson ; Anna Schumaker
>; linux-...@vger.kernel.org
>Cc: netdev@vger.kernel.org; Trond Myklebust 
>Subject: nfs NULL-dereferencing in net-next
>
>Hi!
>
>I'm hitting this reliably on net-next, HEAD at 3f3177bb680f
>("fsl/fman: fix error return code in mac_probe()").


I see the same thing. It happens constantly on some of my machines, making them
completely unusable.

I bisected it and got to the commit:

commit 04ea1b3e6d8ed4978bb608c1748530af3de8c274
Author: Andy Adamson 
Date:   Fri Sep 9 09:22:27 2016 -0400

NFS add xprt switch addrs test to match client

Signed-off-by: Andy Adamson 
Signed-off-by: Anna Schumaker 


>
>[   23.409633] BUG: unable to handle kernel NULL pointer dereference at 0172
>[   23.418716] IP: [] rpc_clnt_xprt_switch_has_addr+0xc/0x40 [sunrpc]
>[   23.427574] PGD 859020067 [   23.430472] PUD 858f2d067 PMD 0 [   23.434311]
>[   23.436133] Oops:  [#1] PREEMPT SMP
>[   23.440506] Modules linked in: nfsv4 ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables intel_ri
>[   23.505915] CPU: 1 PID: 1067 Comm: mount.nfs Not tainted 4.8.0-perf-13951-g3f3177bb680f #51
>[   23.515363] Hardware name: Dell Inc. PowerEdge T630/0W9WXC, BIOS 1.2.10 03/10/2015
>[   23.523937] task: 983e9086ea00 task.stack: ac6c0a57c000
>[   23.530641] RIP: 0010:[]  [] rpc_clnt_xprt_switch_has_addr+0xc/0x40 [sunrpc]
>[   23.542229] RSP: 0018:ac6c0a57fb28  EFLAGS: 00010a97
>[   23.548255] RAX: c80214ac RBX: 983e97c7b000 RCX: 983e9b3bc180
>[   23.556320] RDX: 0001 RSI: 983e9928ed28 RDI: ffea
>[   23.564386] RBP: ac6c0a57fb38 R08: 983e97090630 R09: 983e9928ed30
>[   23.572452] R10: ac6c0a57fba0 R11: 0010 R12: ac6c0a57fba0
>[   23.580517] R13: 983e9928ed28 R14:  R15: 983e91360560
>[   23.588585] FS:  7f4c348aa880() GS:983e9f24() knlGS:
>[   23.597742] CS:  0010 DS:  ES:  CR0: 80050033
>[   23.604251] CR2: 0172 CR3: 000850a5f000 CR4: 001406e0
>[   23.612316] Stack:
>[   23.614648]  983e97c7b000 ac6c0a57fba0 ac6c0a57fb90 c04d38c3
>[   23.623331]  983e91360500 983e9928ed30 c0b9e560 983e913605b8
>[   23.632016]  983e9882e800 983e9882e800 ac6c0a57fc30 ac6c0a57fdb8
>[   23.640706] Call Trace:
>[   23.643535]  [] nfs_get_client+0x123/0x340 [nfs]
>[   23.650542]  [] nfs4_set_client+0x80/0xb0 [nfsv4]
>[   23.657642]  [] nfs4_create_server+0x115/0x2a0 [nfsv4]
>[   23.665230]  [] nfs4_remote_mount+0x2e/0x60 [nfsv4]
>[   23.672519]  [] mount_fs+0x3a/0x160
>[   23.678254]  [] ? alloc_vfsmnt+0x19e/0x230
>[   23.684669]  [] vfs_kern_mount+0x67/0x110
>[   23.690990]  [] nfs_do_root_mount+0x84/0xc0 [nfsv4]
>[   23.698284]  [] nfs4_try_mount+0x37/0x50 [nfsv4]
>[   23.705287]  [] nfs_fs_mount+0x2d1/0xa70 [nfs]
>[   23.712092]  [] ? find_next_bit+0x18/0x20
>[   23.718413]  [] ? nfs_remount+0x3c0/0x3c0 [nfs]
>[   23.725316]  [] ? nfs_clone_super+0x130/0x130 [nfs]
>[   23.732606]  [] mount_fs+0x3a/0x160
>[   23.738340]  [] ? alloc_vfsmnt+0x19e/0x230
>[   23.744755]  [] vfs_kern_mount+0x67/0x110
>[   23.751071]  [] do_mount+0x1bf/0xc70
>[   23.756904]  [] ? copy_mount_options+0xbb/0x220
>[   23.763803]  [] SyS_mount+0x83/0xd0
>[   23.769538]  [] entry_SYSCALL_64_fastpath+0x17/0x98
>[   23.776817] Code: 01 00 48 8b 93 f8 04 00 00 44 89 e6 48 c7 c7 98 b2 43 c0 e8 9f 0d d4 f9 eb c0 0f 1f 44 00 00 0f 1f 44 00 00
>[   23.802909] RIP  [] rpc_clnt_xprt_switch_has_addr+0xc/0x40 [sunrpc]
>[   23.811857]  RSP 
>[   23.815839] CR2: 0172
>[   23.819629] ---[ end trace 9958eca92c9eeafe ]---
>[   23.827345] note: mount.nfs[1067] exited with preempt_count 1
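
A note on reading the faulting location in the trace above: in
rpc_clnt_xprt_switch_has_addr+0xc/0x40, 0xc is the offset of the
faulting instruction from the start of the function and 0x40 is the
function's size in bytes. A small C sketch that splits such a string is
below; struct oops_loc and parse_oops_loc() are names made up for this
example, not part of the kernel or of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one "symbol+0xOFF/0xSIZE" location. */
struct oops_loc {
    char symbol[64];
    unsigned long offset;  /* bytes from the start of the function */
    unsigned long size;    /* total size of the function in bytes */
};

/*
 * Split text of the form "symbol+0xOFF/0xSIZE" into its three parts.
 * Returns 0 on success, -1 if the text does not match that form.
 * Anything after the size (such as a trailing " [sunrpc]") is ignored.
 */
int parse_oops_loc(const char *text, struct oops_loc *loc)
{
    int n;

    assert(text != NULL && loc != NULL);
    /* %lx accepts an optional 0x prefix, as strtoul() does in base 16. */
    n = sscanf(text, "%63[^+]+%lx/%lx",
               loc->symbol, &loc->offset, &loc->size);
    return n == 3 ? 0 : -1;
}
```

Feeding it the string from the trace should give offset 12 and size 64,
i.e. the fault is very early in the function.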