Re: general protection fault in encode_rpcb_string

2018-05-08 Thread Bruce Fields
On Tue, May 08, 2018 at 12:34:48PM -0400, Chuck Lever wrote:
> 
> 
> > On May 8, 2018, at 12:15 PM, bfie...@fieldses.org wrote:
> > 
> > On Tue, Apr 17, 2018 at 09:54:36PM +, Trond Myklebust wrote:
> >> Yes, and we can probably convert it, and the other GFP_ATOMIC
> >> allocations in the rpcbind client to use GFP_NOFS in order to improve
> >> reliability.
> > 
> > Chuck, I think the GFP_ATOMIC is unnecessary here as well?
> > 
> > --b.
> > 
> > diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> > index e8adad33d0bb..de90c6c90cde 100644
> > --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> > +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> > @@ -228,7 +228,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, 
> > struct xdr_buf *xdrbuf,
> > /* XXX: Certain upper layer operations do
> >  *  not provide receive buffer pages.
> >  */
> > -   *ppages = alloc_page(GFP_ATOMIC);
> > +   *ppages = alloc_page(GFP_NOFS);
> > if (!*ppages)
> > return -EAGAIN;
> > }
> 
> This code can't sleep, as I understand it. Caller is holding
> the transport write lock. This logic was copied from
> xdr_partial_copy_from_skb, which uses GFP_ATOMIC.

OK.

> Recall that this is here because of GETACL. As I've stated in
> the past, the correct solution is to ensure that these pages
> are provided in every case by the upper layer, making this
> alloc_page call site unnecessary.

Got it.

--b.


Re: general protection fault in encode_rpcb_string

2018-05-08 Thread Chuck Lever


> On May 8, 2018, at 12:15 PM, bfie...@fieldses.org wrote:
> 
> On Tue, Apr 17, 2018 at 09:54:36PM +, Trond Myklebust wrote:
>> Yes, and we can probably convert it, and the other GFP_ATOMIC
>> allocations in the rpcbind client to use GFP_NOFS in order to improve
>> reliability.
> 
> Chuck, I think the GFP_ATOMIC is unnecessary here as well?
> 
> --b.
> 
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index e8adad33d0bb..de90c6c90cde 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -228,7 +228,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct 
> xdr_buf *xdrbuf,
>   /* XXX: Certain upper layer operations do
>*  not provide receive buffer pages.
>*/
> - *ppages = alloc_page(GFP_ATOMIC);
> + *ppages = alloc_page(GFP_NOFS);
>   if (!*ppages)
>   return -EAGAIN;
>   }

This code can't sleep, as I understand it. Caller is holding
the transport write lock. This logic was copied from
xdr_partial_copy_from_skb, which uses GFP_ATOMIC.

Recall that this is here because of GETACL. As I've stated in
the past, the correct solution is to ensure that these pages
are provided in every case by the upper layer, making this
alloc_page call site unnecessary.


--
Chuck Lever
chuckle...@gmail.com





Re: general protection fault in encode_rpcb_string

2018-05-08 Thread bfie...@fieldses.org
On Tue, Apr 17, 2018 at 09:54:36PM +, Trond Myklebust wrote:
> Yes, and we can probably convert it, and the other GFP_ATOMIC
> allocations in the rpcbind client to use GFP_NOFS in order to improve
> reliability.

Chuck, I think the GFP_ATOMIC is unnecessary here as well?

--b.

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index e8adad33d0bb..de90c6c90cde 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -228,7 +228,7 @@ rpcrdma_convert_iovs(struct rpcrdma_xprt *r_xprt, struct 
xdr_buf *xdrbuf,
/* XXX: Certain upper layer operations do
 *  not provide receive buffer pages.
 */
-   *ppages = alloc_page(GFP_ATOMIC);
+   *ppages = alloc_page(GFP_NOFS);
if (!*ppages)
return -EAGAIN;
}


Re: general protection fault in encode_rpcb_string

2018-05-08 Thread bfie...@fieldses.org
From: "J. Bruce Fields" 
Date: Tue, 8 May 2018 11:47:03 -0400
Subject: [PATCH 2/2] sunrpc: convert unnecessary GFP_ATOMIC to GFP_NOFS

It's OK to sleep here, we just don't want to recurse into the filesystem
as this writeout could be waiting on this.

As a next step: the documentation for GFP_NOFS says "Please try to avoid
using this flag directly and instead use memalloc_nofs_{save,restore} to
mark the whole scope which cannot/shouldn't recurse into the FS layer
with a short explanation why. All allocation requests will inherit
GFP_NOFS implicitly."

But I'm not sure where to do this.  Should the workqueue could be
arranging that for us in the case of workqueues created with
WQ_MEM_RECLAIM?

Reported-by: Trond Myklebust 
Signed-off-by: J. Bruce Fields 
---
 net/sunrpc/rpcb_clnt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

On Tue, Apr 17, 2018 at 09:54:36PM +, Trond Myklebust wrote:
> Yes, and we can probably convert it, and the other GFP_ATOMIC
> allocations in the rpcbind client to use GFP_NOFS in order to improve
> reliability.

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 82c120e51d64..576e84a1adee 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -752,7 +752,7 @@ void rpcb_getport_async(struct rpc_task *task)
goto bailout_nofree;
}
 
-   map = kzalloc(sizeof(struct rpcbind_args), GFP_ATOMIC);
+   map = kzalloc(sizeof(struct rpcbind_args), GFP_NOFS);
if (!map) {
status = -ENOMEM;
dprintk("RPC: %5u %s: no memory available\n",
@@ -770,7 +770,7 @@ void rpcb_getport_async(struct rpc_task *task)
case RPCBVERS_4:
case RPCBVERS_3:
map->r_netid = xprt->address_strings[RPC_DISPLAY_NETID];
-   map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC);
+   map->r_addr = rpc_sockaddr2uaddr(sap, GFP_NOFS);
if (!map->r_addr) {
status = -ENOMEM;
dprintk("RPC: %5u %s: no memory available\n",
-- 
2.17.0



Re: general protection fault in encode_rpcb_string

2018-04-17 Thread Trond Myklebust
On Tue, 2018-04-17 at 17:33 -0400, J. Bruce Fields wrote:
> On Mon, Apr 16, 2018 at 09:02:01PM -0700, syzbot wrote:
> > syzbot hit the following crash on bpf-next commit
> > 5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018
> > +)
> > Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> > syzbot dashboard link:
> > https://syzkaller.appspot.com/bug?extid=4b98281f2401ab849f4b
> > 
> > So far this crash happened 2 times on bpf-next.
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6433835633
> > 868800
> > syzkaller reproducer:
> > https://syzkaller.appspot.com/x/repro.syz?id=6407311794896896
> > Raw console output:
> > https://syzkaller.appspot.com/x/log.txt?id=5861511176126464
> 
> Based on that, looks like it's attempting an nfs mount while causing
> kmalloc failures?
> 
> Probably one of rpcb->r_netid, r_addr, or r_owner was bad in
> rpcb_enc_getaddr.
> 
> Hm, and previous log makes it look like it was an
> rpc_sockaddr2uaddr()
> in rpcb_getport_async() that was made to fail.  Do we need to check
> for
> failure of:
> 
>   map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC);
> 
> ?

Yes, and we can probably convert it, and the other GFP_ATOMIC
allocations in the rpcbind client to use GFP_NOFS in order to improve
reliability.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammer.space


Re: general protection fault in encode_rpcb_string

2018-04-17 Thread J. Bruce Fields
On Mon, Apr 16, 2018 at 09:02:01PM -0700, syzbot wrote:
> syzbot hit the following crash on bpf-next commit
> 5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018 +)
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=4b98281f2401ab849f4b
> 
> So far this crash happened 2 times on bpf-next.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6433835633868800
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=6407311794896896
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5861511176126464

Based on that, looks like it's attempting an nfs mount while causing
kmalloc failures?

Probably one of rpcb->r_netid, r_addr, or r_owner was bad in
rpcb_enc_getaddr.

Hm, and previous log makes it look like it was an rpc_sockaddr2uaddr()
in rpcb_getport_async() that was made to fail.  Do we need to check for
failure of:

map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC);

?

--b.

> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+4b98281f2401ab849...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> RBP: 006dbc50 R08: 2000a000 R09: 3437
> R10:  R11: 0246 R12: 7fe464ffed80
> R13: 0030656c69662f2e R14:  R15: 0006
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 1861 Comm: kworker/u4:4 Not tainted 4.16.0+ #2
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Workqueue: rpciod rpc_async_schedule
> RIP: 0010:strlen+0x1f/0xa0 lib/string.c:479
> RSP: 0018:8801cf75f318 EFLAGS: 00010296
> RAX: dc00 RBX: 8801cf68f200 RCX: 86a8c407
> RDX:  RSI: 86a84d7b RDI: 
> RBP: 8801cf75f330 R08: 8801cf7de080 R09: ed0039ea3d43
> R10: ed0039ea3d43 R11: 8801cf51ea1f R12: 
> R13: 0200 R14:  R15: 8801cf75f3e0
> FS:  () GS:8801db00() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f64808a4000 CR3: 0001b566a000 CR4: 001406f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  strlen include/linux/string.h:267 [inline]
>  encode_rpcb_string+0x23/0x70 net/sunrpc/rpcb_clnt.c:914
>  rpcb_enc_getaddr+0x146/0x1f0 net/sunrpc/rpcb_clnt.c:940
>  rpcauth_wrap_req_encode net/sunrpc/auth.c:777 [inline]
>  rpcauth_wrap_req+0x1a8/0x230 net/sunrpc/auth.c:791
>  rpc_xdr_encode net/sunrpc/clnt.c:1754 [inline]
>  call_transmit+0x8a9/0xfe0 net/sunrpc/clnt.c:1949
>  __rpc_execute+0x28a/0xfe0 net/sunrpc/sched.c:784
>  rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:857
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
> Code: 37 ff ff ff 0f 1f 84 00 00 00 00 00 48 b8 00 00 00 00 00 fc ff
> df 55 48 89 fa 48 c1 ea 03 48 89 e5 41 54 49 89 fc 53 48 83 ec 08
> <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 04 84 c0 75 4d 41 80 3c
> RIP: strlen+0x1f/0xa0 lib/string.c:479 RSP: 8801cf75f318
> ---[ end trace bd76ed0378a56845 ]---
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug
> is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new
> bug report.
> Note: all commands must start from beginning of the line in the email body.


general protection fault in encode_rpcb_string

2018-04-16 Thread syzbot

Hello,

syzbot hit the following crash on bpf-next commit
5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018 +)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=4b98281f2401ab849f4b


So far this crash happened 2 times on bpf-next.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6433835633868800
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=6407311794896896
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5861511176126464
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-5947642240294114534

compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4b98281f2401ab849...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

RBP: 006dbc50 R08: 2000a000 R09: 3437
R10:  R11: 0246 R12: 7fe464ffed80
R13: 0030656c69662f2e R14:  R15: 0006
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 1861 Comm: kworker/u4:4 Not tainted 4.16.0+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: rpciod rpc_async_schedule
RIP: 0010:strlen+0x1f/0xa0 lib/string.c:479
RSP: 0018:8801cf75f318 EFLAGS: 00010296
RAX: dc00 RBX: 8801cf68f200 RCX: 86a8c407
RDX:  RSI: 86a84d7b RDI: 
RBP: 8801cf75f330 R08: 8801cf7de080 R09: ed0039ea3d43
R10: ed0039ea3d43 R11: 8801cf51ea1f R12: 
R13: 0200 R14:  R15: 8801cf75f3e0
FS:  () GS:8801db00() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f64808a4000 CR3: 0001b566a000 CR4: 001406f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 strlen include/linux/string.h:267 [inline]
 encode_rpcb_string+0x23/0x70 net/sunrpc/rpcb_clnt.c:914
 rpcb_enc_getaddr+0x146/0x1f0 net/sunrpc/rpcb_clnt.c:940
 rpcauth_wrap_req_encode net/sunrpc/auth.c:777 [inline]
 rpcauth_wrap_req+0x1a8/0x230 net/sunrpc/auth.c:791
 rpc_xdr_encode net/sunrpc/clnt.c:1754 [inline]
 call_transmit+0x8a9/0xfe0 net/sunrpc/clnt.c:1949
 __rpc_execute+0x28a/0xfe0 net/sunrpc/sched.c:784
 rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:857
 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
Code: 37 ff ff ff 0f 1f 84 00 00 00 00 00 48 b8 00 00 00 00 00 fc ff df 55  
48 89 fa 48 c1 ea 03 48 89 e5 41 54 49 89 fc 53 48 83 ec 08 <0f> b6 04 02  
48 89 fa 83 e2 07 38 d0 7f 04 84 c0 75 4d 41 80 3c

RIP: strlen+0x1f/0xa0 lib/string.c:479 RSP: 8801cf75f318
---[ end trace bd76ed0378a56845 ]---


---
This bug is generated by a dumb bot. It may contain errors.
See https://goo.gl/tpsmEJ for details.
Direct all questions to syzkal...@googlegroups.com.

syzbot will keep track of this bug report.
If you forgot to add the Reported-by tag, once the fix for this bug is  
merged

into any tree, please reply to this email with:
#syz fix: exact-commit-title
If you want to test a patch for this bug, please reply with:
#syz test: git://repo/address.git branch
and provide the patch inline or as an attachment.
To mark this as a duplicate of another syzbot report, please reply with:
#syz dup: exact-subject-of-another-report
If it's a one-off invalid bug report, please reply with:
#syz invalid
Note: if the crash happens again, it will cause creation of a new bug  
report.

Note: all commands must start from beginning of the line in the email body.