from:"J. Bruce Fields"

Re: general protection fault in encode_rpcb_string

2018-04-17 Thread J. Bruce Fields

On Mon, Apr 16, 2018 at 09:02:01PM -0700, syzbot wrote:
> syzbot hit the following crash on bpf-next commit
> 5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018 +)
> Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=4b98281f2401ab849f4b
> 
> So far this crash happened 2 times on bpf-next.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6433835633868800
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=6407311794896896
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=5861511176126464

Based on that, looks like it's attempting an nfs mount while causing
kmalloc failures?

Probably one of rpcb->r_netid, r_addr, or r_owner was bad in
rpcb_enc_getaddr.

Hm, and previous log makes it look like it was an rpc_sockaddr2uaddr()
in rpcb_getport_async() that was made to fail.  Do we need to check for
failure of:

map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC);

?

--b.

> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+4b98281f2401ab849...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
> 
> RBP: 006dbc50 R08: 2000a000 R09: 3437
> R10:  R11: 0246 R12: 7fe464ffed80
> R13: 0030656c69662f2e R14:  R15: 0006
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault:  [#1] SMP KASAN
> Dumping ftrace buffer:
>(ftrace buffer empty)
> Modules linked in:
> CPU: 0 PID: 1861 Comm: kworker/u4:4 Not tainted 4.16.0+ #2
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Workqueue: rpciod rpc_async_schedule
> RIP: 0010:strlen+0x1f/0xa0 lib/string.c:479
> RSP: 0018:8801cf75f318 EFLAGS: 00010296
> RAX: dc00 RBX: 8801cf68f200 RCX: 86a8c407
> RDX:  RSI: 86a84d7b RDI: 
> RBP: 8801cf75f330 R08: 8801cf7de080 R09: ed0039ea3d43
> R10: ed0039ea3d43 R11: 8801cf51ea1f R12: 
> R13: 0200 R14:  R15: 8801cf75f3e0
> FS:  () GS:8801db00() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 7f64808a4000 CR3: 0001b566a000 CR4: 001406f0
> DR0:  DR1:  DR2: 
> DR3:  DR6: fffe0ff0 DR7: 0400
> Call Trace:
>  strlen include/linux/string.h:267 [inline]
>  encode_rpcb_string+0x23/0x70 net/sunrpc/rpcb_clnt.c:914
>  rpcb_enc_getaddr+0x146/0x1f0 net/sunrpc/rpcb_clnt.c:940
>  rpcauth_wrap_req_encode net/sunrpc/auth.c:777 [inline]
>  rpcauth_wrap_req+0x1a8/0x230 net/sunrpc/auth.c:791
>  rpc_xdr_encode net/sunrpc/clnt.c:1754 [inline]
>  call_transmit+0x8a9/0xfe0 net/sunrpc/clnt.c:1949
>  __rpc_execute+0x28a/0xfe0 net/sunrpc/sched.c:784
>  rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:857
>  process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
>  worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
>  kthread+0x345/0x410 kernel/kthread.c:238
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
> Code: 37 ff ff ff 0f 1f 84 00 00 00 00 00 48 b8 00 00 00 00 00 fc ff
> df 55 48 89 fa 48 c1 ea 03 48 89 e5 41 54 49 89 fc 53 48 83 ec 08
> <0f> b6 04 02 48 89 fa 83 e2 07 38 d0 7f 04 84 c0 75 4d 41 80 3c
> RIP: strlen+0x1f/0xa0 lib/string.c:479 RSP: 8801cf75f318
> ---[ end trace bd76ed0378a56845 ]---
> 
> 
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkal...@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug
> is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new
> bug report.
> Note: all commands must start from beginning of the line in the email body.

Re: [PATCH] sunrpc: remove incorrect HMAC request initialization

2018-03-28 Thread J . Bruce Fields

Applying, thanks!--b.

On Wed, Mar 28, 2018 at 10:57:22AM -0700, Eric Biggers wrote:
> make_checksum_hmac_md5() is allocating an HMAC transform and doing
> crypto API calls in the following order:
> 
> crypto_ahash_init()
> crypto_ahash_setkey()
> crypto_ahash_digest()
> 
> This is wrong because it makes no sense to init() the request before a
> key has been set, given that the initial state depends on the key.  And
> digest() is short for init() + update() + final(), so in this case
> there's no need to explicitly call init() at all.
> 
> Before commit 9fa68f620041 ("crypto: hash - prevent using keyed hashes
> without setting key") the extra init() had no real effect, at least for
> the software HMAC implementation.  (There are also hardware drivers that
> implement HMAC-MD5, and it's not immediately obvious how gracefully they
> handle init() before setkey().)  But now the crypto API detects this
> incorrect initialization and returns -ENOKEY.  This is breaking NFS
> mounts in some cases.
> 
> Fix it by removing the incorrect call to crypto_ahash_init().
> 
> Reported-by: Michael Young 
> Fixes: 9fa68f620041 ("crypto: hash - prevent using keyed hashes without 
> setting key")
> Fixes: fffdaef2eb4a ("gss_krb5: Add support for rc4-hmac encryption")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Eric Biggers 
> ---
>  net/sunrpc/auth_gss/gss_krb5_crypto.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c 
> b/net/sunrpc/auth_gss/gss_krb5_crypto.c
> index 12649c9fedab..8654494b4d0a 100644
> --- a/net/sunrpc/auth_gss/gss_krb5_crypto.c
> +++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c
> @@ -237,9 +237,6 @@ make_checksum_hmac_md5(struct krb5_ctx *kctx, char 
> *header, int hdrlen,
>  
>   ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
>  
> - err = crypto_ahash_init(req);
> - if (err)
> - goto out;
>   err = crypto_ahash_setkey(hmac_md5, cksumkey, kctx->gk5e->keylength);
>   if (err)
>   goto out;
> -- 
> 2.17.0.rc1.321.gba9d0f2565-goog

Re: NFS mounts failing when keytab present on client

2018-03-28 Thread J. Bruce Fields

On Wed, Mar 28, 2018 at 02:03:34PM -0400, J. Bruce Fields wrote:
> Thanks, got it.  Do you know how to find a commit id for that change?
> It's not entirely fair to blame the crypto change for what was really a
> latent nfs bug, but it might still be worth adding a Fixes: line just so
> people know where it needs backporting.

Whoops, I see you already did that.  I should finish my inbox before
responding!

--b.

Re: NFS mounts failing when keytab present on client

2018-03-28 Thread J. Bruce Fields

On Wed, Mar 28, 2018 at 10:50:51AM -0700, Eric Biggers wrote:
> On Wed, Mar 28, 2018 at 11:46:28AM -0400, J. Bruce Fields wrote:
> > On Tue, Mar 27, 2018 at 03:29:50PM -0700, Eric Biggers wrote:
> > > Hi Michael,
> > > 
> > > On Tue, Mar 27, 2018 at 11:06:14PM +0100, Michael Young wrote:
> > > > NFS mounts stopped working on one of my computers after a kernel update 
> > > > from
> > > > 4.15.3 to 4.15.4. I traced the problem to the commit
> > > > [46e8d06e423c4f35eac7a8b677b713b3ec9b0684] crypto: hash - prevent using
> > > > keyed hashes without setting key
> > > > and a later kernel with this patch reverted works normally.
> > > > 
> > > > The problem seems to be related to kerberos as the mount fails when the
> > > > keytab is present, but works if I rename the keytab file. This is true 
> > > > even
> > > > though the mount is with sec=sys . The mount should also work with 
> > > > sec=krb5
> > > > but that also fails in the same way. When the mount fails there are 
> > > > errors
> > > > in dmesg like
> > > > [ 1232.522816] gss_marshal: gss_get_mic FAILED (851968)
> > > > [ 1232.522819] RPC: couldn't encode RPC header, exit EIO
> > > > [ 1232.522856] gss_marshal: gss_get_mic FAILED (851968)
> > > > [ 1232.522857] RPC: couldn't encode RPC header, exit EIO
> > > > [ 1232.522863] NFS: nfs4_discover_server_trunking unhandled error -5.
> > > > Exiting with error EIO
> > > > [ 1232.525039] gss_marshal: gss_get_mic FAILED (851968)
> > > > [ 1232.525042] RPC: couldn't encode RPC header, exit EIO
> > > > 
> > > > Michael Young
> > > 
> > > Thanks for the bug report.  I think the error is coming from
> > > net/sunrpc/auth_gss/gss_krb5_crypto.c.  There are two potential problems 
> > > I see.
> > > The first one, which is definitely a bug, is that make_checksum_hmac_md5()
> > > allocates an HMAC transform and request, then does these crypto API calls:
> > > 
> > >   crypto_ahash_init()
> > >   crypto_ahash_setkey()
> > >   crypto_ahash_digest()
> > > 
> > > This is wrong because it makes no sense to init() the HMAC request before 
> > > the
> > > key has been set, and doubly so when it's calling digest() which is 
> > > shorthand
> > > for init() + update() + final().  So I think it just needs to be removed. 
> > >  You
> > > can test the following patch:
> > 
> > When was this introduced? 
> > 
> > 3b5cf20cf439 "sunrpc: Use skcipher and ahash/shash"
> > - probably not, assuming the above was still just as wrong with
> >   crypto_hash_{init,setkey,digest} as it is with
> >   crypto_ahash_{init,setkey,digest}
> > 
> > So I'm guessing it was wrong from the start when it was added by
> > fffdaef2eb4a "gss_krb5: Add support for rc4-hmac encryption" 8 years
> > ago.  Wonder why it took this long to notice?  Did something else
> > change?
> > 
> > --b.
> 
> It was wrong from the start, but the crypto API only recently started 
> enforcing
> that the key has to be set before init() or digest() is called.  Before that 
> the
> code was just doing unnecessary work, at least with the software HMAC
> implementation.  Though, there are also hardware crypto drivers that implement
> HMAC-MD5, and it's not immediately obvious that they handle init() before
> setkey() as gracefully as the software implementation.

Thanks, got it.  Do you know how to find a commit id for that change?
It's not entirely fair to blame the crypto change for what was really a
latent nfs bug, but it might still be worth adding a Fixes: line just so
people know where it needs backporting.

--b.

Re: NFS mounts failing when keytab present on client

2018-03-28 Thread J. Bruce Fields

On Tue, Mar 27, 2018 at 03:29:50PM -0700, Eric Biggers wrote:
> Hi Michael,
> 
> On Tue, Mar 27, 2018 at 11:06:14PM +0100, Michael Young wrote:
> > NFS mounts stopped working on one of my computers after a kernel update from
> > 4.15.3 to 4.15.4. I traced the problem to the commit
> > [46e8d06e423c4f35eac7a8b677b713b3ec9b0684] crypto: hash - prevent using
> > keyed hashes without setting key
> > and a later kernel with this patch reverted works normally.
> > 
> > The problem seems to be related to kerberos as the mount fails when the
> > keytab is present, but works if I rename the keytab file. This is true even
> > though the mount is with sec=sys . The mount should also work with sec=krb5
> > but that also fails in the same way. When the mount fails there are errors
> > in dmesg like
> > [ 1232.522816] gss_marshal: gss_get_mic FAILED (851968)
> > [ 1232.522819] RPC: couldn't encode RPC header, exit EIO
> > [ 1232.522856] gss_marshal: gss_get_mic FAILED (851968)
> > [ 1232.522857] RPC: couldn't encode RPC header, exit EIO
> > [ 1232.522863] NFS: nfs4_discover_server_trunking unhandled error -5.
> > Exiting with error EIO
> > [ 1232.525039] gss_marshal: gss_get_mic FAILED (851968)
> > [ 1232.525042] RPC: couldn't encode RPC header, exit EIO
> > 
> > Michael Young
> 
> Thanks for the bug report.  I think the error is coming from
> net/sunrpc/auth_gss/gss_krb5_crypto.c.  There are two potential problems I 
> see.
> The first one, which is definitely a bug, is that make_checksum_hmac_md5()
> allocates an HMAC transform and request, then does these crypto API calls:
> 
>   crypto_ahash_init()
>   crypto_ahash_setkey()
>   crypto_ahash_digest()
> 
> This is wrong because it makes no sense to init() the HMAC request before the
> key has been set, and doubly so when it's calling digest() which is shorthand
> for init() + update() + final().  So I think it just needs to be removed.  You
> can test the following patch:

When was this introduced? 

3b5cf20cf439 "sunrpc: Use skcipher and ahash/shash"
- probably not, assuming the above was still just as wrong with
  crypto_hash_{init,setkey,digest} as it is with
  crypto_ahash_{init,setkey,digest}

So I'm guessing it was wrong from the start when it was added by
fffdaef2eb4a "gss_krb5: Add support for rc4-hmac encryption" 8 years
ago.  Wonder why it took this long to notice?  Did something else
change?

--b.


> 
> diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c 
> b/net/sunrpc/auth_gss/gss_krb5_crypto.c
> index 12649c9fedab..8654494b4d0a 100644
> --- a/net/sunrpc/auth_gss/gss_krb5_crypto.c
> +++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c
> @@ -237,9 +237,6 @@ make_checksum_hmac_md5(struct krb5_ctx *kctx, char 
> *header, int hdrlen,
>  
> ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
>  
> -   err = crypto_ahash_init(req);
> -   if (err)
> -   goto out;
> err = crypto_ahash_setkey(hmac_md5, cksumkey, kctx->gk5e->keylength);
> if (err)
> goto out;
> 
> If that's not it, it's also possible that the error is coming from the
> crypto_ahash_init() in make_checksum().  That can only happen if 'cksumkey' is
> NULL and the hash algorithm is keyed, which implies a logical error as it
> doesn't make sense to use a keyed hash algorithm without the key.  The callers
> do check kctx->gk5e->keyed_cksum which I'd hope would prevent this, though
> perhaps kctx->cksum can be NULL.
> 
> Eric

Re: [PATCH net-next nfs 1/6] net: Convert rpcsec_gss_net_ops

2018-03-26 Thread J. Bruce Fields

On Fri, Mar 23, 2018 at 02:53:34PM -0400, Anna Schumaker wrote:
> 
> 
> On 03/13/2018 06:49 AM, Kirill Tkhai wrote:
> > These pernet_operations initialize and destroy sunrpc_net_id refered
> > per-net items. Only used global list is cache_list, and accesses
> > already serialized.
> > 
> > sunrpc_destroy_cache_detail() check for list_empty() without
> > cache_list_lock, but when it's called from
> > unregister_pernet_subsys(), there can't be callers in parallel, so
> > we won't miss list_empty() in this case.
> > 
> > Signed-off-by: Kirill Tkhai <ktk...@virtuozzo.com>
> 
> It might make sense to take these and the other NFS patches through
> the net tree, since the pernet_operations don't yet have the async
> field in my tree (and I therefore can't compile once these are
> applied).

Ditto for the nfsd patch, so, for what it's worth:

Acked-by: J. Bruce Fields <bfie...@redhat.com>

for that patch.--b.

--b.

Re: [PATCH net-next nfs 0/6] Converting pernet_operations (part #7)

2018-03-15 Thread J. Bruce Fields

On Thu, Mar 15, 2018 at 04:32:30PM +0300, Kirill Tkhai wrote:
> Trond, Anna, Bruce, Jeff, David and other NFS and RXRPC people,
> could you please provide your vision on this patches?

Whoops, sorry, I haven't been paying attention.  Do you have a pointer
to documentation?  I'm unclear what the actual concurrency change
is--sounds like it becomes possible that e.g. multiple ->init methods
(from the same pernet_operations but for different namespaces) could run
in parallel?

Sounds likely to be safe, and I don't actually care too much who merges
them as they look very unlikely to conflict with anything pending.  But
unless anyone tells me otherwise I'll take the one nfsd_net_ops patch
and leave the rest to Anna or Trond.

--b.

> 
> Thanks,
> Kirill
> 
> On 13.03.2018 13:49, Kirill Tkhai wrote:
> > Hi,
> > 
> > this series continues to review and to convert pernet_operations
> > to make them possible to be executed in parallel for several
> > net namespaces in the same time. There are nfs pernet_operations
> > in this series. All of them look similar each other, they mostly
> > create and destroy caches with small exceptions.
> > 
> > Also, there is rxrpc_net_ops, which is used in AFS.
> > 
> > Thanks,
> > Kirill
> > ---
> > 
> > Kirill Tkhai (6):
> >   net: Convert rpcsec_gss_net_ops
> >   net: Convert sunrpc_net_ops
> >   net: Convert nfsd_net_ops
> >   net: Convert nfs4_dns_resolver_ops
> >   net: Convert nfs4blocklayout_net_ops
> >   net: Convert rxrpc_net_ops
> > 
> > 
> >  fs/nfs/blocklayout/rpc_pipefs.c |1 +
> >  fs/nfs/dns_resolve.c|1 +
> >  fs/nfsd/nfsctl.c|1 +
> >  net/rxrpc/net_ns.c  |1 +
> >  net/sunrpc/auth_gss/auth_gss.c  |1 +
> >  net/sunrpc/sunrpc_syms.c|1 +
> >  6 files changed, 6 insertions(+)
> > 
> > --
> > Signed-off-by: Kirill Tkhai 
> >

Re: [PATCH 0/4] make function arg and structures as const

2017-11-10 Thread J. Bruce Fields

On Fri, Nov 10, 2017 at 10:09:46AM -0500, Anna Schumaker wrote:
> 
> 
> On 11/09/2017 09:21 PM, J. Bruce Fields wrote:
> > On Tue, Oct 17, 2017 at 12:40:27PM -0400, Jeff Layton wrote:
> >> On Tue, 2017-10-17 at 18:14 +0200, Bhumika Goyal wrote:
> >>> Make the function argument as const. After thing change, make
> >>> the cache_detail structures as const.
> >>>
> >>> Bhumika Goyal (4):
> >>>   sunrpc: make the function arg as const
> >>>   NFS: make cache_detail structures const
> >>>   NFSD: make cache_detail structures const
> >>>   SUNRPC: make cache_detail structures const
> >>>
> >>>  fs/nfs/dns_resolve.c  | 2 +-
> >>>  fs/nfsd/export.c  | 4 ++--
> >>>  fs/nfsd/nfs4idmap.c   | 4 ++--
> >>>  include/linux/sunrpc/cache.h  | 2 +-
> >>>  net/sunrpc/auth_gss/svcauth_gss.c | 4 ++--
> >>>  net/sunrpc/cache.c| 2 +-
> >>>  net/sunrpc/svcauth_unix.c | 4 ++--
> >>>  7 files changed, 11 insertions(+), 11 deletions(-)
> >>>
> >>
> >> Looks pretty straightforward. You can add this to the set:
> >>
> >> Reviewed-by: Jeff Layton <jlay...@redhat.com>
> > 
> > Thanks, I've applied 1, 3, and 4 and could take #2 as well if it's OK
> > with Trond/Anna.
> 
> I don't mind taking #2, it's already in my branch :)

OK, thanks.--b.

Re: [PATCH 0/4] make function arg and structures as const

2017-11-09 Thread J. Bruce Fields

On Tue, Oct 17, 2017 at 12:40:27PM -0400, Jeff Layton wrote:
> On Tue, 2017-10-17 at 18:14 +0200, Bhumika Goyal wrote:
> > Make the function argument as const. After thing change, make
> > the cache_detail structures as const.
> > 
> > Bhumika Goyal (4):
> >   sunrpc: make the function arg as const
> >   NFS: make cache_detail structures const
> >   NFSD: make cache_detail structures const
> >   SUNRPC: make cache_detail structures const
> > 
> >  fs/nfs/dns_resolve.c  | 2 +-
> >  fs/nfsd/export.c  | 4 ++--
> >  fs/nfsd/nfs4idmap.c   | 4 ++--
> >  include/linux/sunrpc/cache.h  | 2 +-
> >  net/sunrpc/auth_gss/svcauth_gss.c | 4 ++--
> >  net/sunrpc/cache.c| 2 +-
> >  net/sunrpc/svcauth_unix.c | 4 ++--
> >  7 files changed, 11 insertions(+), 11 deletions(-)
> > 
> 
> Looks pretty straightforward. You can add this to the set:
> 
> Reviewed-by: Jeff Layton 

Thanks, I've applied 1, 3, and 4 and could take #2 as well if it's OK
with Trond/Anna.

--b.

Re: [PATCH v3 01/21] grace: replace BUG_ON by WARN_ONCE in exit_net hook

2017-11-09 Thread J. Bruce Fields

Applied for 4.15, thanks.--b.

On Mon, Nov 06, 2017 at 04:22:48PM +0300, Vasily Averin wrote:
> Signed-off-by: Vasily Averin 
> ---
>  fs/nfs_common/grace.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs_common/grace.c b/fs/nfs_common/grace.c
> index 420d3a0..1bd6599 100644
> --- a/fs/nfs_common/grace.c
> +++ b/fs/nfs_common/grace.c
> @@ -104,7 +104,9 @@ grace_exit_net(struct net *net)
>  {
>   struct list_head *grace_list = net_generic(net, grace_net_id);
>  
> - BUG_ON(!list_empty(grace_list));
> + WARN_ONCE(!list_empty(grace_list),
> +   "net %x %s: grace_list is not empty\n",
> +   net->ns.inum, __func__);
>  }
>  
>  static struct pernet_operations grace_net_ops = {
> -- 
> 2.7.4

Re: [PATCH v3 02/21] lockd: added cleanup checks in exit_net hook

2017-11-09 Thread J. Bruce Fields

Applied.--b.

On Mon, Nov 06, 2017 at 04:23:24PM +0300, Vasily Averin wrote:
> Signed-off-by: Vasily Averin 
> ---
>  fs/lockd/svc.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index 809cbcc..2a48558 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -690,6 +690,17 @@ static int lockd_init_net(struct net *net)
>  
>  static void lockd_exit_net(struct net *net)
>  {
> + struct lockd_net *ln = net_generic(net, lockd_net_id);
> +
> + WARN_ONCE(!list_empty(>lockd_manager.list),
> +   "net %x %s: lockd_manager.list is not empty\n",
> +   net->ns.inum, __func__);
> + WARN_ONCE(!list_empty(>nsm_handles),
> +   "net %x %s: nsm_handles list is not empty\n",
> +   net->ns.inum, __func__);
> + WARN_ONCE(delayed_work_pending(>grace_period_end),
> +   "net %x %s: grace_period_end was not cancelled\n",
> +   net->ns.inum, __func__);
>  }
>  
>  static struct pernet_operations lockd_net_ops = {
> -- 
> 2.7.4

Re: [PATCH v3 01/21] grace: replace BUG_ON by WARN_ONCE in exit_net hook

2017-11-06 Thread J. Bruce Fields

I'm assuming you want me to apply this (and the following lockd patch),
but I'm not clear why this is part of a bigger series going to netdev.

--b.

On Mon, Nov 06, 2017 at 04:22:48PM +0300, Vasily Averin wrote:
> Signed-off-by: Vasily Averin 
> ---
>  fs/nfs_common/grace.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs_common/grace.c b/fs/nfs_common/grace.c
> index 420d3a0..1bd6599 100644
> --- a/fs/nfs_common/grace.c
> +++ b/fs/nfs_common/grace.c
> @@ -104,7 +104,9 @@ grace_exit_net(struct net *net)
>  {
>   struct list_head *grace_list = net_generic(net, grace_net_id);
>  
> - BUG_ON(!list_empty(grace_list));
> + WARN_ONCE(!list_empty(grace_list),
> +   "net %x %s: grace_list is not empty\n",
> +   net->ns.inum, __func__);
>  }
>  
>  static struct pernet_operations grace_net_ops = {
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] net: sunrpc: svcauth_gss: use BUG_ON instead of if condition followed by BUG

2017-10-24 Thread J. Bruce Fields

On Tue, Oct 24, 2017 at 02:18:52PM -0400, Jeff Layton wrote:
> On Tue, 2017-10-24 at 13:53 -0400, J. Bruce Fields wrote:
> > On Tue, Oct 24, 2017 at 01:26:49PM -0400, Weston Andros Adamson wrote:
> > > Is there a reason to BUG() in these places? Couldn't we WARN_ON_ONCE and 
> > > return an error?
> > 
> > I think the BUG() will just kill an nfsd thread that isn't holding any
> > interesting locks.
> > 
> 
> Not necessarily. If panic_on_oops is set (and it usually is in
> "production" setups), it'll crash the box there.

Maybe they're getting what they asked for?

> > The failures look unlikely.  (Except for that read_u32... return, I
> > wonder if we're missing a check there.)
> 
> Agreed, looks like you only hit an error if the read attempts to go out
> of bounds. In principle that shouldn't ever happen (and I haven't seen
> any reports of it).
> 
> Still...I agree with Dros that it's better to handle this without
> oopsing if we can. We can return an error from either of those
> functions. A sane error and a WARN_ONCE would be better here.

OK, OK, OK.

There are also some more BUGs that could use looking into if anyone
wants to.

--b.

commit eb754930662f
Author: J. Bruce Fields <bfie...@redhat.com>
Date:   Tue Oct 24 14:58:11 2017 -0400

rpc: remove some BUG()s

It would be kinder to WARN() and recover in several spots here instead
of BUG()ing.

Also, it looks like the read_u32_from_xdr_buf() call could actually
fail, though it might require a broken (or malicious) client, so convert
    that to just an error return.

Reported-by: Weston Andros Adamson <d...@monkey.org>
Signed-off-by: J. Bruce Fields <bfie...@redhat.com>

diff --git a/net/sunrpc/auth_gss/svcauth_gss.c 
b/net/sunrpc/auth_gss/svcauth_gss.c
index 7b1ee5a0b03c..73165e9ca5bf 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -855,11 +855,13 @@ unwrap_integ_data(struct svc_rqst *rqstp, struct xdr_buf 
*buf, u32 seq, struct g
return stat;
if (integ_len > buf->len)
return stat;
-   if (xdr_buf_subsegment(buf, _buf, 0, integ_len))
-   BUG();
+   if (xdr_buf_subsegment(buf, _buf, 0, integ_len)) {
+   WARN_ON_ONCE(1);
+   return stat;
+   }
/* copy out mic... */
if (read_u32_from_xdr_buf(buf, integ_len, ))
-   BUG();
+   return stat;
if (mic.len > RPC_MAX_AUTH_SIZE)
return stat;
mic.data = kmalloc(mic.len, GFP_KERNEL);
@@ -1611,8 +1613,10 @@ svcauth_gss_wrap_resp_integ(struct svc_rqst *rqstp)
BUG_ON(integ_len % 4);
*p++ = htonl(integ_len);
*p++ = htonl(gc->gc_seq);
-   if (xdr_buf_subsegment(resbuf, _buf, integ_offset, integ_len))
-   BUG();
+   if (xdr_buf_subsegment(resbuf, _buf, integ_offset, integ_len)) {
+   WARN_ON_ONCE(1);
+   goto out_err;
+   }
if (resbuf->tail[0].iov_base == NULL) {
if (resbuf->head[0].iov_len + RPC_MAX_AUTH_SIZE > PAGE_SIZE)
goto out_err;

Re: [PATCH] net: sunrpc: svcauth_gss: use BUG_ON instead of if condition followed by BUG

2017-10-24 Thread J. Bruce Fields

On Tue, Oct 24, 2017 at 01:26:49PM -0400, Weston Andros Adamson wrote:
> Is there a reason to BUG() in these places? Couldn't we WARN_ON_ONCE and 
> return an error?

I think the BUG() will just kill an nfsd thread that isn't holding any
interesting locks.

The failures look unlikely.  (Except for that read_u32... return, I
wonder if we're missing a check there.)

--b.

> 
> -dros
> 
> > On Oct 23, 2017, at 4:31 PM, J. Bruce Fields <bfie...@fieldses.org> wrote:
> > 
> > In the past we've avoided BUG_ON(X) where X might have side effects, on
> > the theory that it should actually be OK just to compile out BUG_ON()s.
> > Has that changed?
> > 
> > In any case, I don't find that this improves readability; dropping.
> > 
> > --b.
> > 
> > On Mon, Oct 23, 2017 at 01:16:35PM -0500, Gustavo A. R. Silva wrote:
> >> Use BUG_ON instead of if condition followed by BUG.
> >> 
> >> This issue was detected with the help of Coccinelle.
> >> 
> >> Signed-off-by: Gustavo A. R. Silva <garsi...@embeddedor.com>
> >> ---
> >> net/sunrpc/auth_gss/svcauth_gss.c | 9 +++--
> >> 1 file changed, 3 insertions(+), 6 deletions(-)
> >> 
> >> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c 
> >> b/net/sunrpc/auth_gss/svcauth_gss.c
> >> index 7b1ee5a..a10ce43 100644
> >> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> >> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> >> @@ -855,11 +855,9 @@ unwrap_integ_data(struct svc_rqst *rqstp, struct 
> >> xdr_buf *buf, u32 seq, struct g
> >>return stat;
> >>if (integ_len > buf->len)
> >>return stat;
> >> -  if (xdr_buf_subsegment(buf, _buf, 0, integ_len))
> >> -  BUG();
> >> +  BUG_ON(xdr_buf_subsegment(buf, _buf, 0, integ_len));
> >>/* copy out mic... */
> >> -  if (read_u32_from_xdr_buf(buf, integ_len, ))
> >> -  BUG();
> >> +  BUG_ON(read_u32_from_xdr_buf(buf, integ_len, ));
> >>if (mic.len > RPC_MAX_AUTH_SIZE)
> >>return stat;
> >>mic.data = kmalloc(mic.len, GFP_KERNEL);
> >> @@ -1611,8 +1609,7 @@ svcauth_gss_wrap_resp_integ(struct svc_rqst *rqstp)
> >>BUG_ON(integ_len % 4);
> >>*p++ = htonl(integ_len);
> >>*p++ = htonl(gc->gc_seq);
> >> -  if (xdr_buf_subsegment(resbuf, _buf, integ_offset, integ_len))
> >> -  BUG();
> >> +  BUG_ON(xdr_buf_subsegment(resbuf, _buf, integ_offset, integ_len));
> >>if (resbuf->tail[0].iov_base == NULL) {
> >>if (resbuf->head[0].iov_len + RPC_MAX_AUTH_SIZE > PAGE_SIZE)
> >>goto out_err;
> >> -- 
> >> 2.7.4
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] net: sunrpc: svcauth_gss: use BUG_ON instead of if condition followed by BUG

2017-10-23 Thread J. Bruce Fields

In the past we've avoided BUG_ON(X) where X might have side effects, on
the theory that it should actually be OK just to compile out BUG_ON()s.
Has that changed?

In any case, I don't find that this improves readability; dropping.

--b.

On Mon, Oct 23, 2017 at 01:16:35PM -0500, Gustavo A. R. Silva wrote:
> Use BUG_ON instead of if condition followed by BUG.
> 
> This issue was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  net/sunrpc/auth_gss/svcauth_gss.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/auth_gss/svcauth_gss.c 
> b/net/sunrpc/auth_gss/svcauth_gss.c
> index 7b1ee5a..a10ce43 100644
> --- a/net/sunrpc/auth_gss/svcauth_gss.c
> +++ b/net/sunrpc/auth_gss/svcauth_gss.c
> @@ -855,11 +855,9 @@ unwrap_integ_data(struct svc_rqst *rqstp, struct xdr_buf 
> *buf, u32 seq, struct g
>   return stat;
>   if (integ_len > buf->len)
>   return stat;
> - if (xdr_buf_subsegment(buf, _buf, 0, integ_len))
> - BUG();
> + BUG_ON(xdr_buf_subsegment(buf, _buf, 0, integ_len));
>   /* copy out mic... */
> - if (read_u32_from_xdr_buf(buf, integ_len, ))
> - BUG();
> + BUG_ON(read_u32_from_xdr_buf(buf, integ_len, ));
>   if (mic.len > RPC_MAX_AUTH_SIZE)
>   return stat;
>   mic.data = kmalloc(mic.len, GFP_KERNEL);
> @@ -1611,8 +1609,7 @@ svcauth_gss_wrap_resp_integ(struct svc_rqst *rqstp)
>   BUG_ON(integ_len % 4);
>   *p++ = htonl(integ_len);
>   *p++ = htonl(gc->gc_seq);
> - if (xdr_buf_subsegment(resbuf, _buf, integ_offset, integ_len))
> - BUG();
> + BUG_ON(xdr_buf_subsegment(resbuf, _buf, integ_offset, integ_len));
>   if (resbuf->tail[0].iov_base == NULL) {
>   if (resbuf->head[0].iov_len + RPC_MAX_AUTH_SIZE > PAGE_SIZE)
>   goto out_err;
> -- 
> 2.7.4

Re: [PATCH] sunrcp: make function _svc_create_xprt static

2017-10-16 Thread J . Bruce Fields

Thanks, applied for 4.15.--b.

On Mon, Oct 16, 2017 at 02:40:21PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> The function _svc_create_xprt is local to the source and
> does not need to be in global scope, so make it static.
> 
> Cleans up sparse warning:
> symbol '_svc_create_xprt' was not declared. Should it be static?
> 
> Signed-off-by: Colin Ian King 
> ---
>  net/sunrpc/svc_xprt.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index d16a8b423c20..18e87791350f 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -250,9 +250,9 @@ void svc_add_new_perm_xprt(struct svc_serv *serv, struct 
> svc_xprt *new)
>   svc_xprt_received(new);
>  }
>  
> -int _svc_create_xprt(struct svc_serv *serv, const char *xprt_name,
> - struct net *net, const int family,
> - const unsigned short port, int flags)
> +static int _svc_create_xprt(struct svc_serv *serv, const char *xprt_name,
> + struct net *net, const int family,
> + const unsigned short port, int flags)
>  {
>   struct svc_xprt_class *xcl;
>  
> -- 
> 2.14.1

Re: [PATCH 00/18] use ARRAY_SIZE macro

2017-10-05 Thread J. Bruce Fields

On Mon, Oct 02, 2017 at 09:33:12PM -0400, Jérémy Lefaure wrote:
> On Mon, 2 Oct 2017 15:22:24 -0400
> bfie...@fieldses.org (J. Bruce Fields) wrote:
> 
> > Mainly I'd just like to know which you're asking for.  Do you want me to
> > apply this, or to ACK it so someone else can?  If it's sent as a series
> > I tend to assume the latter.
> > 
> > But in this case I'm assuming it's the former, so I'll pick up the nfsd
> > one
> Could you to apply the NFSD patch ("nfsd: use ARRAY_SIZE") with the
> Reviewed-by: Jeff Layton <jlay...@redhat.com>) tag please ?
> 
> This patch is an individual patch and it should not have been sent in a
> series (sorry about that).

Applying for 4.15, thanks.--b.

Re: [PATCH 00/18] use ARRAY_SIZE macro

2017-10-02 Thread J. Bruce Fields

On Mon, Oct 02, 2017 at 07:35:54AM +0200, Greg KH wrote:
> On Sun, Oct 01, 2017 at 08:52:20PM -0400, Jérémy Lefaure wrote:
> > On Mon, 2 Oct 2017 09:01:31 +1100
> > "Tobin C. Harding"  wrote:
> > 
> > > > In order to reduce the size of the To: and Cc: lines, each patch of the
> > > > series is sent only to the maintainers and lists concerned by the patch.
> > > > This cover letter is sent to every list concerned by this series.  
> > > 
> > > Why don't you just send individual patches for each subsystem? I'm not a 
> > > maintainer but I don't see
> > > how any one person is going to be able to apply this whole series, it is 
> > > making it hard for
> > > maintainers if they have to pick patches out from among the series (if 
> > > indeed any will bother
> > > doing that).
> > Yeah, maybe it would have been better to send individual patches.
> > 
> > From my point of view it's a series because the patches are related (I
> > did a git format-patch from my local branch). But for the maintainers
> > point of view, they are individual patches.
> 
> And the maintainers view is what matters here, if you wish to get your
> patches reviewed and accepted...

Mainly I'd just like to know which you're asking for.  Do you want me to
apply this, or to ACK it so someone else can?  If it's sent as a series
I tend to assume the latter.

But in this case I'm assuming it's the former, so I'll pick up the nfsd
one

--b.

Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception

2017-08-25 Thread J. Bruce Fields

On Fri, Aug 18, 2017 at 06:00:47AM -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happanes in ksoftirq context which concurrently
> on a differnt core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence, add NULL-ptr
> check before callback call along with barriers to prevent kernel crash.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ 
> iterations.

By the way, is there anything special about your setup that allows you
to reproduce this?  There's nothing special about connectathon tests, so
I'm just wondering why we haven't had a lot of reports of this.

--b.

> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual 
> address 
> [ 6708.647093] pgd = 094e
> [ 6708.650497] [] *pgd=01090003, *pud=01090003, 
> *pmd=01080003, *pte=
> [ 6708.660761] Internal error: Oops: 8605 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log 
> nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay 
> xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 
> xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag 
> udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm 
> libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp 
> scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u 
> nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat 
> ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac 
> i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc 
> xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea 
> sysfillrect sysimgblt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder 
> mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: 
> stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: GW  OE   
> 4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 
> 2017
> [ 6708.767910] task: 810006842e80 task.stack: 81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<>] lr : [] pstate: 
> 6145
> [ 6708.789248] sp : 810ffbad3900
> [ 6708.792551] x29: 810ffbad3900 x28: 08c73d58
> [ 6708.797853] x27:  x26: 81000bbe1e00
> [ 6708.803156] x25: 0020 x24: 800f7410bf28
> [ 6708.808458] x23: 08c63000 x22: 08c63000
> [ 6708.813760] x21: 800f7410bf28 x20: 81000bbe1e00
> [ 6708.819063] x19: 810012412400 x18: d82a9df2
> [ 6708.824365] x17:  x16: 
> [ 6708.829667] x15:  x14: 0001
> [ 6708.834969] x13:  x12: 722e736f622e676e
> [ 6708.840271] x11: f814dd99 x10: 
> [ 6708.845573] x9 : 737468722500 x8 : 
> [ 6708.850875] x7 :  x6 : 
> [ 6708.856177] x5 : 0028 x4 : 
> [ 6708.861479] x3 :  x2 : e500
> [ 6708.866781] x1 :  x0 : 81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0x81000689c000)
> [ 6708.880341] Stack: (0x810ffbad3900 to 0x8100068a)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0x810ffbad3710 to 0x810ffbad3840)
> [ 6708.894942] 3700:   810012412400 
> 0001
> [ 6708.902759] 3720: 810ffbad3900  6145 
> 800f7930
> [ 6708.910577] 3740: 09274d00 03ea 0015 
> 08c63000
> [ 6708.918395] 3760: 810ffbad3830 800f7930 004d 
> 
> [ 6708.926212] 3780: 810ffbad3890 080f88dc 800f7930 
> 004d
> [ 6708.934030] 37a0: 800f7930093c 08c63000  
> 0140
> [ 6708.941848] 37c0: 08c2c000 00040b00 81000bbe1e00 
> 
> [ 6708.949665] 37e0: e500   
> 0028
> [ 6708.957483] 3800:    
> 737468722500
> [ 6708.965300] 3820:  f814dd99 722e736f622e676e 
> 
> [ 6708.973117] [<  (null)>]   (null)
> [ 6708.977824] [] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] []

Re: [PATCH v4] net: sunrpc: svcsock: fix NULL-pointer exception

2017-08-24 Thread J. Bruce Fields

On Wed, Aug 23, 2017 at 06:33:42AM -0400, Jeff Layton wrote:
> I think this one looks fine. Bruce, do you mind picking this one up? It
> might also be reasonable for stable...

Yep, thanks to you both, I'll plan to send a pull request tonight or
tomorrow.

--b.

Re: [PATCH][net-next] svcrdma: fix an incorrect check on -E2BIG and -EINVAL

2017-07-13 Thread J. Bruce Fields

On Thu, Jul 13, 2017 at 01:53:10PM -0400, Chuck Lever wrote:
> 
> > On Jul 13, 2017, at 1:51 PM, Colin King  wrote:
> > 
> > From: Colin Ian King 
> > 
> > The current check will always be true and will always jump to
> > err1, this looks dubious to me. I believe && should be used
> > instead of ||.
> > 
> > Detected by CoverityScan, CID#1450120 ("Logically Dead Code")
> > 
> > Fixes: 107c1d0a991a ("svcrdma: Avoid Send Queue overflow")
> > Signed-off-by: Colin Ian King 
> 
> Reviewed-by: Chuck Lever 
> 
> Dan reported this today, and I have a similar patch in my
> "pending for-rc" series. This one works too.

Applied and merged upstream, thanks.

--b.

Re: [PATCH] virtio_net: lower limit on buffer size

2017-06-05 Thread J. Bruce Fields

On Sat, Jun 03, 2017 at 11:17:30PM +0300, Sergei Shtylyov wrote:
> On 06/02/2017 11:25 PM, J. Bruce Fields wrote:
> 
> >>>commit d85b758f72b0 "virtio_net: fix support for small rings"
> >>
> >>   Commit d85b758f72b0 ("virtio_net: fix support for small rings")
> >>
> >>>was supposed to increase the buffer size for small rings
> >>>but had an unintentional side effect of decreasing
> >>>it for large rings. This seems to break some setups -
> >>>it's not yet clear why, but increasing buffer size
> >>>back to what it was before helps.
> >>>
> >>>Fixes: d85b758f72b0 "virtio_net: fix support for small rings"
> >>
> >>Fixes: d85b758f72b0 ("virtio_net: fix support for small rings")
> >
> >I may be bikeshedding, but, personally I never do the parens--they're
> >redundant given the quotes, and space is often tight.
> 
>Just see Documetation/process/submitting-patches.rst.

Yeah, I know, I claim it's a bad rule (but I'm too lazy to send a patch,
so, weight my opinion accordingly).

--b.

Re: [PATCH] virtio_net: lower limit on buffer size

2017-06-02 Thread J. Bruce Fields

On Fri, Jun 02, 2017 at 12:34:57PM +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 6/2/2017 2:56 AM, Michael S. Tsirkin wrote:
> 
> >commit d85b758f72b0 "virtio_net: fix support for small rings"
> 
>Commit d85b758f72b0 ("virtio_net: fix support for small rings")
> 
> >was supposed to increase the buffer size for small rings
> >but had an unintentional side effect of decreasing
> >it for large rings. This seems to break some setups -
> >it's not yet clear why, but increasing buffer size
> >back to what it was before helps.
> >
> >Fixes: d85b758f72b0 "virtio_net: fix support for small rings"
> 
> Fixes: d85b758f72b0 ("virtio_net: fix support for small rings")

I may be bikeshedding, but, personally I never do the parens--they're
redundant given the quotes, and space is often tight.

--b.

> 
> >Reported-by: Mikulas Patocka <mpato...@redhat.com>
> >Reported-by: "J. Bruce Fields" <bfie...@fieldses.org>
> >Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> [...]
> 
> MBR, Sergei

Re: [PATCH] virtio_net: lower limit on buffer size

2017-06-01 Thread J. Bruce Fields

On Fri, Jun 02, 2017 at 02:56:04AM +0300, Michael S. Tsirkin wrote:
> commit d85b758f72b0 "virtio_net: fix support for small rings"
> was supposed to increase the buffer size for small rings
> but had an unintentional side effect of decreasing
> it for large rings. This seems to break some setups -
> it's not yet clear why, but increasing buffer size
> back to what it was before helps.
> 
> Fixes: d85b758f72b0 "virtio_net: fix support for small rings"
> Reported-by: Mikulas Patocka <mpato...@redhat.com>
> Reported-by: "J. Bruce Fields" <bfie...@fieldses.org>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> ---
> 
> Can reporters please confirm whether the following helps?

Works for me.

--b.

> If it does I think we should queue it even though I expect I'll need to
> investigate the exact reasons for the failure down the road - probably
> a hypervisor bug.
> 
>  drivers/net/virtio_net.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 87b5c20..60abb5d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -842,7 +842,7 @@ static unsigned int get_mergeable_buf_len(struct 
> receive_queue *rq,
>   unsigned int len;
>  
>   len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> - rq->min_buf_len - hdr_len, PAGE_SIZE - hdr_len);
> + rq->min_buf_len, PAGE_SIZE - hdr_len);
>   return ALIGN(len, L1_CACHE_BYTES);
>  }
>  
> @@ -2039,7 +2039,8 @@ static unsigned int mergeable_min_buf_len(struct 
> virtnet_info *vi, struct virtqu
>   unsigned int buf_len = hdr_len + ETH_HLEN + VLAN_HLEN + packet_len;
>   unsigned int min_buf_len = DIV_ROUND_UP(buf_len, rq_size);
>  
> - return max(min_buf_len, hdr_len);
> + return max(max(min_buf_len, hdr_len) - hdr_len,
> +(unsigned int)GOOD_PACKET_LEN);
>  }
>  
>  static int virtnet_find_vqs(struct virtnet_info *vi)
> -- 
> MST

Re: [RFD] workqueue: WQ_MEM_RECLAIM usage in network drivers

2016-03-19 Thread J. Bruce Fields

On Fri, Mar 18, 2016 at 04:46:23PM -0400, Tejun Heo wrote:
> Hello, Jeff.
> 
> On Thu, Mar 17, 2016 at 09:32:16PM -0400, Jeff Layton wrote:
> > > * Are network devices expected to be able to serve as a part of
> > >   storage stack which is depended upon for memory reclamation?
> > 
> > I think they should be. Cached NFS pages can consume a lot of memory,
> > and flushing them generally takes network device access.
> 
> But does that actually work?  It's pointless to add WQ_MEM_RECLAIM to
> workqueues unless all other things are also guaranteed to make forward
> progress regardless of memory pressure.

It's supposed to work.

Also note there was a bunch of work done to allow swap on NFS: see
a564b8f0 "nfs: enable swap on NFS".

> 
> > > * If so, are all the pieces in place for that to work for all (or at
> > >   least most) network devices?  If it's only for a subset of NICs, how
> > >   can one tell whether a given driver needs forward progress guarantee
> > >   or not?
> > > 
> > > * I assume that wireless drivers aren't and can't be used in this
> > >   fashion.  Is that a correction assumption?
> > > 
> > 
> > People do mount NFS over wireless interfaces. It's not terribly common
> > though, in my experience.
> 
> Ditto, I'm very skeptical that this actually works in practice and
> people expect and depend on it.  I don't follow wireless development
> closely but haven't heard anyone talking about reserving memory pools
> or people complaining about wireless being the cause of OOM.
> 
> So, I really want to avoid spraying WQ_MEM_RECLAIM if it doesn't serve
> actual purposes.  It's wasteful, sets bad precedences and confuses
> future readers.

I use NFS mounts over wifi at home.  I may just be odd.  I seem to
recall some bug reports about suspend vs. NFS--were those also on
laptops using NFS?

I wonder if home media centers might do writes over wifi to network
storage?

Googling for "nfs wifi" turns up lots of individuals doing this.

My first impulse is to say that it's probably not perfect but that we
shouldn't make it worse.

But, I don't claim to understand the WQ_MEM_RECLAIM-proliferation issue
you're seeing

--b.

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-23 Thread J. Bruce Fields

On Fri, Oct 23, 2015 at 04:14:10AM +, Kosuke Tatsukawa wrote:
> J. Bruce Fields wrote:
> > On Fri, Oct 16, 2015 at 02:28:10AM +, Kosuke Tatsukawa wrote:
> >> Tatsukawa Kosuke wrote:
> >> > J. Bruce Fields wrote:
> >> >> On Thu, Oct 15, 2015 at 11:44:20AM +, Kosuke Tatsukawa wrote:
> >> >>> Tatsukawa Kosuke wrote:
> >> >>> > J. Bruce Fields wrote:
> >> >>> >> Thanks for the detailed investigation.
> >> >>> >> 
> >> >>> >> I think it would be worth adding a comment if that might help 
> >> >>> >> someone
> >> >>> >> having to reinvestigate this again some day.
> >> >>> > 
> >> >>> > It would be nice, but I find it difficult to write a comment in the
> >> >>> > sunrpc layer why a memory barrier isn't necessary, using the 
> >> >>> > knowledge
> >> >>> > of how nfsd uses it, and the current implementation of the network 
> >> >>> > code.
> >> >>> > 
> >> >>> > Personally, I would prefer removing the call to waitqueue_active() 
> >> >>> > which
> >> >>> > would make the memory barrier totally unnecessary at the cost of a
> >> >>> > spin_lock + spin_unlock by unconditionally calling
> >> >>> > wake_up_interruptible.
> >> >>> 
> >> >>> On second thought, the callbacks will be called frequently from the tcp
> >> >>> code, so it wouldn't be a good idea.
> >> >> 
> >> >> So, I was even considering documenting it like this, if it's not
> >> >> overkill.
> >> >> 
> >> >> Hmm... but if this is right, then we may as well ask why we're doing the
> >> >> wakeups at all.  Might be educational to test the code with them
> >> >> removed.
> >> > 
> >> > sk_write_space will be called in sock_wfree() with UDP/IP each time
> >> > kfree_skb() is called.  With TCP/IP, sk_write_space is only called if
> >> > SOCK_NOSPACE has been set.
> >> > 
> >> > sk_data_ready will be called in both tcp_rcv_established() for TCP/IP
> >> > and in sock_queue_rcv_skb() for UDP/IP.  The latter lacks a memory
> >> > barrier with sk_data_ready called right after __skb_queue_tail().
> >> > I think this hasn't caused any problems because sk_data_ready wasn't
> >> > used.
> >> 
> >> Actually, svc_udp_data_ready() calls set_bit() which is an atomic
> >> operation.  So there won't be a problem unless svsk is NULL.
> > 
> > So is it true that every caller of these socket callbacks has adequate
> > memory barriers between the time the change is made visible and the time
> > the callback is called?
> > 
> > If so, then there's nothing really specific about nfsd here.
> > 
> > In that case maybe it's the networking code that use some documentation,
> > if it doesn't already?  (Or maybe common helper functions for this
> > 
> > if (waitqueue_active(wq))
> > wake_up(wq)
> > 
> > pattern?)
> 
> Some of the other places defining these callback functions are using
>   static inline bool wq_has_sleeper(struct socket_wq *wq)
> defined in include/net/sock.h
> 
> The comment above the function explains that it was introduced for
> exactly this purpose.
> 
> Even thought the argument variable uses the same name "wq", it has a
> different type from the wq used in svcsock.c (struct socket_wq *
> vs. wait_queue_head_t *).

OK, thanks.  So, I guess it still sounds like the code is OK as is, but
maybe my comment wasn't.  Here's another attempt.

--b.

commit b805ca58a81a
Author: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
Date:   Fri Oct 9 01:44:07 2015 +

svcrpc: document lack of some memory barriers

We're missing memory barriers in net/sunrpc/svcsock.c in some spots we'd
expect them.  But it doesn't appear they're necessary in our case, and
this is likely a hot path--for now just document the odd behavior.

I found this issue when I was looking through the linux source code
for places calling waitqueue_active() before wake_up*(), but without
preceding memory barriers, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c  (Details about the original issue can be
found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
[bfie...@redhat.com,nfbr

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-22 Thread J. Bruce Fields

On Fri, Oct 16, 2015 at 02:28:10AM +, Kosuke Tatsukawa wrote:
> Tatsukawa Kosuke wrote:
> > J. Bruce Fields wrote:
> >> On Thu, Oct 15, 2015 at 11:44:20AM +, Kosuke Tatsukawa wrote:
> >>> Tatsukawa Kosuke wrote:
> >>> > J. Bruce Fields wrote:
> >>> >> Thanks for the detailed investigation.
> >>> >> 
> >>> >> I think it would be worth adding a comment if that might help someone
> >>> >> having to reinvestigate this again some day.
> >>> > 
> >>> > It would be nice, but I find it difficult to write a comment in the
> >>> > sunrpc layer why a memory barrier isn't necessary, using the knowledge
> >>> > of how nfsd uses it, and the current implementation of the network code.
> >>> > 
> >>> > Personally, I would prefer removing the call to waitqueue_active() which
> >>> > would make the memory barrier totally unnecessary at the cost of a
> >>> > spin_lock + spin_unlock by unconditionally calling
> >>> > wake_up_interruptible.
> >>> 
> >>> On second thought, the callbacks will be called frequently from the tcp
> >>> code, so it wouldn't be a good idea.
> >> 
> >> So, I was even considering documenting it like this, if it's not
> >> overkill.
> >> 
> >> Hmm... but if this is right, then we may as well ask why we're doing the
> >> wakeups at all.  Might be educational to test the code with them
> >> removed.
> > 
> > sk_write_space will be called in sock_wfree() with UDP/IP each time
> > kfree_skb() is called.  With TCP/IP, sk_write_space is only called if
> > SOCK_NOSPACE has been set.
> > 
> > sk_data_ready will be called in both tcp_rcv_established() for TCP/IP
> > and in sock_queue_rcv_skb() for UDP/IP.  The latter lacks a memory
> > barrier with sk_data_ready called right after __skb_queue_tail().
> > I think this hasn't caused any problems because sk_data_ready wasn't
> > used.
> 
> Actually, svc_udp_data_ready() calls set_bit() which is an atomic
> operation.  So there won't be a problem unless svsk is NULL.

So is it true that every caller of these socket callbacks has adequate
memory barriers between the time the change is made visible and the time
the callback is called?

If so, then there's nothing really specific about nfsd here.

In that case maybe it's the networking code that use some documentation,
if it doesn't already?  (Or maybe common helper functions for this

if (waitqueue_active(wq))
wake_up(wq)

pattern?)

--b.

> 
> 
> >> --b.
> >> 
> >> commit 0882cfeb39e0
> >> Author: J. Bruce Fields <bfie...@redhat.com>
> >> Date:   Thu Oct 15 16:53:41 2015 -0400
> >> 
> >> svcrpc: document lack of some memory barriers.
> >> 
> >> Kosuke Tatsukawa points out an odd lack of memory barriers in some 
> >> sites
> >> here.  I think the code's correct, but it's probably worth documenting.
> >> 
> >> Reported-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>
> >> 
> >> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> >> index 856407fa085e..90480993ec4a 100644
> >> --- a/net/sunrpc/svcsock.c
> >> +++ b/net/sunrpc/svcsock.c
> >> @@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst 
> >> *rqstp)
> >>return svc_port_is_privileged(svc_addr(rqstp));
> >>  }
> >>  
> >> +static void svc_no_smp_mb(void)
> >> +{
> >> +  /*
> >> +   * Kosuke Tatsukawa points out there should normally be an
> >> +   * smp_mb() at the callsites of this function.  (Either that or
> >> +   * we could just drop the waitqueue_active() checks.)
> >> +   *
> >> +   * It appears they aren't currently necessary, though, basically
> >> +   * because nfsd does non-blocking reads from these sockets, so
> >> +   * the only places we wait on this waitqueue is in sendpage and
> >> +   * sendmsg, which won't be waiting for wakeups on newly arrived
> >> +   * data.
> >> +   *
> >> +   * Maybe we should add the memory barriers anyway, but these are
> >> +   * hot paths so we'd need to be convinced there's no sigificant
> >> +   * penalty.
> >> +   */
> >> +}
> >> +
> >>  /*
> >>   * INET callback when data has been received on the socket.
> >>   */
> >> @@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk)
> >

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-15 Thread J. Bruce Fields

On Thu, Oct 15, 2015 at 11:44:20AM +, Kosuke Tatsukawa wrote:
> Tatsukawa Kosuke wrote:
> > J. Bruce Fields wrote:
> >> Thanks for the detailed investigation.
> >> 
> >> I think it would be worth adding a comment if that might help someone
> >> having to reinvestigate this again some day.
> > 
> > It would be nice, but I find it difficult to write a comment in the
> > sunrpc layer why a memory barrier isn't necessary, using the knowledge
> > of how nfsd uses it, and the current implementation of the network code.
> > 
> > Personally, I would prefer removing the call to waitqueue_active() which
> > would make the memory barrier totally unnecessary at the cost of a
> > spin_lock + spin_unlock by unconditionally calling
> > wake_up_interruptible.
> 
> On second thought, the callbacks will be called frequently from the tcp
> code, so it wouldn't be a good idea.

So, I was even considering documenting it like this, if it's not
overkill.

Hmm... but if this is right, then we may as well ask why we're doing the
wakeups at all.  Might be educational to test the code with them
removed.

--b.

commit 0882cfeb39e0
Author: J. Bruce Fields <bfie...@redhat.com>
Date:   Thu Oct 15 16:53:41 2015 -0400

svcrpc: document lack of some memory barriers.

Kosuke Tatsukawa points out an odd lack of memory barriers in some sites
here.  I think the code's correct, but it's probably worth documenting.

Reported-by: Kosuke Tatsukawa <ta...@ab.jp.nec.com>

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 856407fa085e..90480993ec4a 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -399,6 +399,25 @@ static int svc_sock_secure_port(struct svc_rqst *rqstp)
return svc_port_is_privileged(svc_addr(rqstp));
 }
 
+static void svc_no_smp_mb(void)
+{
+   /*
+* Kosuke Tatsukawa points out there should normally be an
+* smp_mb() at the callsites of this function.  (Either that or
+* we could just drop the waitqueue_active() checks.)
+*
+* It appears they aren't currently necessary, though, basically
+* because nfsd does non-blocking reads from these sockets, so
+* the only places we wait on this waitqueue is in sendpage and
+* sendmsg, which won't be waiting for wakeups on newly arrived
+* data.
+*
+* Maybe we should add the memory barriers anyway, but these are
+* hot paths so we'd need to be convinced there's no sigificant
+* penalty.
+*/
+}
+
 /*
  * INET callback when data has been received on the socket.
  */
@@ -414,7 +433,7 @@ static void svc_udp_data_ready(struct sock *sk)
set_bit(XPT_DATA, >sk_xprt.xpt_flags);
svc_xprt_enqueue(>sk_xprt);
}
-   smp_mb();
+   svc_no_smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible(wq);
 }
@@ -433,7 +452,7 @@ static void svc_write_space(struct sock *sk)
svc_xprt_enqueue(>sk_xprt);
}
 
-   smp_mb();
+   svc_no_smp_mb();
if (wq && waitqueue_active(wq)) {
dprintk("RPC svc_write_space: someone sleeping on %p\n",
   svsk);
@@ -789,7 +808,7 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
}
 
wq = sk_sleep(sk);
-   smp_mb();
+   svc_no_smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible_all(wq);
 }
@@ -811,7 +830,7 @@ static void svc_tcp_state_change(struct sock *sk)
set_bit(XPT_CLOSE, >sk_xprt.xpt_flags);
svc_xprt_enqueue(>sk_xprt);
}
-   smp_mb();
+   svc_no_smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible_all(wq);
 }
@@ -827,7 +846,7 @@ static void svc_tcp_data_ready(struct sock *sk)
set_bit(XPT_DATA, >sk_xprt.xpt_flags);
svc_xprt_enqueue(>sk_xprt);
}
-   smp_mb();
+   svc_no_smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible(wq);
 }
@@ -1599,7 +1618,7 @@ static void svc_sock_detach(struct svc_xprt *xprt)
sk->sk_write_space = svsk->sk_owspace;
 
wq = sk_sleep(sk);
-   smp_mb();
+   svc_no_smp_mb();
if (wq && waitqueue_active(wq))
wake_up_interruptible(wq);
 }
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-14 Thread J. Bruce Fields

On Wed, Oct 14, 2015 at 03:57:13AM +, Kosuke Tatsukawa wrote:
> J. Bruce Fields wrote:
> > On Mon, Oct 12, 2015 at 10:41:06AM +, Kosuke Tatsukawa wrote:
> >> J. Bruce Fields wrote:
> >> > On Fri, Oct 09, 2015 at 06:29:44AM +, Kosuke Tatsukawa wrote:
> >> >> Neil Brown wrote:
> >> >> > Kosuke Tatsukawa <ta...@ab.jp.nec.com> writes:
> >> >> > 
> >> >> >> There are several places in net/sunrpc/svcsock.c which calls
> >> >> >> waitqueue_active() without calling a memory barrier.  Add a memory
> >> >> >> barrier just as in wq_has_sleeper().
> >> >> >>
> >> >> >> I found this issue when I was looking through the linux source code
> >> >> >> for places calling waitqueue_active() before wake_up*(), but without
> >> >> >> preceding memory barriers, after sending a patch to fix a similar
> >> >> >> issue in drivers/tty/n_tty.c  (Details about the original issue can 
> >> >> >> be
> >> >> >> found here: https://lkml.org/lkml/2015/9/28/849).
> >> >> > 
> >> >> > hi,
> >> >> > this feels like the wrong approach to the problem.  It requires extra
> >> >> > 'smb_mb's to be spread around which are hard to understand as easy to
> >> >> > forget.
> >> >> > 
> >> >> > A quick look seems to suggest that (nearly) every waitqueue_active()
> >> >> > will need an smb_mb.  Could we just put the smb_mb() inside
> >> >> > waitqueue_active()??
> >> >> 
> >> >> 
> >> >> There are around 200 occurrences of waitqueue_active() in the kernel
> >> >> source, and most of the places which use it before wake_up are either
> >> >> protected by some spin lock, or already has a memory barrier or some
> >> >> kind of atomic operation before it.
> >> >> 
> >> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
> >> >> many cases and won't be a good idea.
> >> >> 
> >> >> Another way to solve this problem is to remove the waitqueue_active(),
> >> >> making the code look like this;
> >> >> if (wq)
> >> >> wake_up_interruptible(wq);
> >> >> This also fixes the problem because the spinlock in the wake_up*() acts
> >> >> as a memory barrier and prevents the code from being reordered by the
> >> >> CPU (and it also makes the resulting code is much simpler).
> >> > 
> >> > I might not care which we did, except I don't have the means to test
> >> > this quickly, and I guess this is some of our most frequently called
> >> > code.
> >> > 
> >> > I suppose your patch is the most conservative approach, as the
> >> > alternative is a spinlock/unlock in wake_up_interruptible, which I
> >> > assume is necessarily more expensive than an smp_mb().
> >> > 
> >> > As far as I can tell it's been this way since forever.  (Well, since a
> >> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
> >> > removed some spinlocks from the data_ready routines.)
> >> > 
> >> > I don't understand what the actual race is yet (which code exactly is
> >> > missing the wakeup in this case?  nfsd threads seem to instead get
> >> > woken up by the wake_up_process() in svc_xprt_do_enqueue().)
> >> 
> >> Thank you for the reply.  I tried looking into this.
> >> 
> >> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
> >> svc_udp_init(), which are both called from svc_setup_socket().
> >> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4
> >> callback port related code.
> >> 
> >> Maybe I'm wrong, but there might not be any kernel code that is using
> >> the socket's wait queue in this case.
> > 
> > As Trond points out there are probably waiters internal to the
> > networking code.
> 
> Trond and Bruce, thank you for the comment.  I was able to find the call
> to the wait function that was called from nfsd.
> 
> sk_stream_wait_connect() and sk_stream_wait_memory() were called from
> either do_tcp_sendpages() or tcp_sendmsg() called from within
> svc_send().  sk_stream_wait_connect() shouldn't be called at this point,
> because the soc

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-12 Thread J. Bruce Fields

On Mon, Oct 12, 2015 at 10:41:06AM +, Kosuke Tatsukawa wrote:
> J. Bruce Fields wrote:
> > On Fri, Oct 09, 2015 at 06:29:44AM +, Kosuke Tatsukawa wrote:
> >> Neil Brown wrote:
> >> > Kosuke Tatsukawa <ta...@ab.jp.nec.com> writes:
> >> > 
> >> >> There are several places in net/sunrpc/svcsock.c which calls
> >> >> waitqueue_active() without calling a memory barrier.  Add a memory
> >> >> barrier just as in wq_has_sleeper().
> >> >>
> >> >> I found this issue when I was looking through the linux source code
> >> >> for places calling waitqueue_active() before wake_up*(), but without
> >> >> preceding memory barriers, after sending a patch to fix a similar
> >> >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
> >> >> found here: https://lkml.org/lkml/2015/9/28/849).
> >> > 
> >> > hi,
> >> > this feels like the wrong approach to the problem.  It requires extra
> >> > 'smb_mb's to be spread around which are hard to understand as easy to
> >> > forget.
> >> > 
> >> > A quick look seems to suggest that (nearly) every waitqueue_active()
> >> > will need an smb_mb.  Could we just put the smb_mb() inside
> >> > waitqueue_active()??
> >> 
> >> 
> >> There are around 200 occurrences of waitqueue_active() in the kernel
> >> source, and most of the places which use it before wake_up are either
> >> protected by some spin lock, or already has a memory barrier or some
> >> kind of atomic operation before it.
> >> 
> >> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
> >> many cases and won't be a good idea.
> >> 
> >> Another way to solve this problem is to remove the waitqueue_active(),
> >> making the code look like this;
> >>if (wq)
> >>wake_up_interruptible(wq);
> >> This also fixes the problem because the spinlock in the wake_up*() acts
> >> as a memory barrier and prevents the code from being reordered by the
> >> CPU (and it also makes the resulting code is much simpler).
> > 
> > I might not care which we did, except I don't have the means to test
> > this quickly, and I guess this is some of our most frequently called
> > code.
> > 
> > I suppose your patch is the most conservative approach, as the
> > alternative is a spinlock/unlock in wake_up_interruptible, which I
> > assume is necessarily more expensive than an smp_mb().
> > 
> > As far as I can tell it's been this way since forever.  (Well, since a
> > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
> > removed some spinlocks from the data_ready routines.)
> > 
> > I don't understand what the actual race is yet (which code exactly is
> > missing the wakeup in this case?  nfsd threads seem to instead get
> > woken up by the wake_up_process() in svc_xprt_do_enqueue().)
> 
> Thank you for the reply.  I tried looking into this.
> 
> The callbacks in net/sunrpc/svcsock.c are set up in svc_tcp_init() and
> svc_udp_init(), which are both called from svc_setup_socket().
> svc_setup_socket() is called (indirectly) from lockd, nfsd, and nfsv4
> callback port related code.
> 
> Maybe I'm wrong, but there might not be any kernel code that is using
> the socket's wait queue in this case.

As Trond points out there are probably waiters internal to the
networking code.

--b.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc

2015-10-09 Thread J. Bruce Fields

On Fri, Oct 09, 2015 at 06:29:44AM +, Kosuke Tatsukawa wrote:
> Neil Brown wrote:
> > Kosuke Tatsukawa  writes:
> > 
> >> There are several places in net/sunrpc/svcsock.c which calls
> >> waitqueue_active() without calling a memory barrier.  Add a memory
> >> barrier just as in wq_has_sleeper().
> >>
> >> I found this issue when I was looking through the linux source code
> >> for places calling waitqueue_active() before wake_up*(), but without
> >> preceding memory barriers, after sending a patch to fix a similar
> >> issue in drivers/tty/n_tty.c  (Details about the original issue can be
> >> found here: https://lkml.org/lkml/2015/9/28/849).
> > 
> > hi,
> > this feels like the wrong approach to the problem.  It requires extra
> > 'smb_mb's to be spread around which are hard to understand as easy to
> > forget.
> > 
> > A quick look seems to suggest that (nearly) every waitqueue_active()
> > will need an smb_mb.  Could we just put the smb_mb() inside
> > waitqueue_active()??
> 
> 
> There are around 200 occurrences of waitqueue_active() in the kernel
> source, and most of the places which use it before wake_up are either
> protected by some spin lock, or already has a memory barrier or some
> kind of atomic operation before it.
> 
> Simply adding smp_mb() to waitqueue_active() would incur extra cost in
> many cases and won't be a good idea.
> 
> Another way to solve this problem is to remove the waitqueue_active(),
> making the code look like this;
>   if (wq)
>   wake_up_interruptible(wq);
> This also fixes the problem because the spinlock in the wake_up*() acts
> as a memory barrier and prevents the code from being reordered by the
> CPU (and it also makes the resulting code is much simpler).

I might not care which we did, except I don't have the means to test
this quickly, and I guess this is some of our most frequently called
code.

I suppose your patch is the most conservative approach, as the
alternative is a spinlock/unlock in wake_up_interruptible, which I
assume is necessarily more expensive than an smp_mb().

As far as I can tell it's been this way since forever.  (Well, since a
2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which
removed some spinlocks from the data_ready routines.)

I don't understand what the actual race is yet (which code exactly is
missing the wakeup in this case?  nfsd threads seem to instead get
woken up by the wake_up_process() in svc_xprt_do_enqueue().)

--b.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] sunrpc: avoid warning in gss_key_timeout

2015-10-09 Thread J. Bruce Fields

On Fri, Oct 09, 2015 at 04:13:45PM +0200, Arnd Bergmann wrote:
> The gss_key_timeout() function causes a harmless warning in some
> configurations, e.g. ARM imx_v6_v7_defconfig with gcc-5.2, if the
> compiler cannot figure out the state of the 'expire' variable across
> an rcu_read_unlock():
> 
> net/sunrpc/auth_gss/auth_gss.c: In function 'gss_key_timeout':
> net/sunrpc/auth_gss/auth_gss.c:1422:211: warning: 'expire' may be used 
> uninitialized in this function [-Wmaybe-uninitialized]
> 
> To avoid this warning without adding a bogus initialization, this
> rewrites the function so the comparison is done inside of the
> critical section. As a side-effect, it also becomes slightly
> easier to understand because the implementation now more closely
> resembles the comment above it.

Looks reasonable, thanks; applying for 4.4--b.

> 
> Signed-off-by: Arnd Bergmann 
> Fixes: c5e6aecd034e7 ("sunrpc: fix RCU handling of gc_ctx field")
> 
> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
> index dace13d7638e..799e65b944b9 100644
> --- a/net/sunrpc/auth_gss/auth_gss.c
> +++ b/net/sunrpc/auth_gss/auth_gss.c
> @@ -1411,17 +1411,16 @@ gss_key_timeout(struct rpc_cred *rc)
>  {
>   struct gss_cred *gss_cred = container_of(rc, struct gss_cred, gc_base);
>   struct gss_cl_ctx *ctx;
> - unsigned long now = jiffies;
> - unsigned long expire;
> + unsigned long timeout = jiffies + (gss_key_expire_timeo * HZ);
> + int ret = 0;
>  
>   rcu_read_lock();
>   ctx = rcu_dereference(gss_cred->gc_ctx);
> - if (ctx)
> - expire = ctx->gc_expiry - (gss_key_expire_timeo * HZ);
> + if (!ctx || time_after(timeout, ctx->gc_expiry))
> + ret = -EACCES;
>   rcu_read_unlock();
> - if (!ctx || time_after(now, expire))
> - return -EACCES;
> - return 0;
> +
> + return ret;
>  }
>  
>  static int
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 01/19] SUNRPC: fix variable type

2015-09-25 Thread J. Bruce Fields

ACK.  Assuming Trond gets this.--b.

On Thu, Sep 24, 2015 at 04:00:09PM +0200, Andrzej Hajda wrote:
> Due to incorrect len type bc_send_request returned always zero.
> 
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107
> 
> Signed-off-by: Andrzej Hajda 
> ---
> Hi,
> 
> To avoid problems with too many mail recipients I have sent whole
> patchset only to LKML. Anyway patches have no dependencies.
> 
> Regards
> Andrzej
> ---
>  net/sunrpc/xprtsock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 1a85e0e..60fb002 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2472,7 +2472,7 @@ static int bc_send_request(struct rpc_task *task)
>  {
>   struct rpc_rqst *req = task->tk_rqstp;
>   struct svc_xprt *xprt;
> - u32 len;
> + int len;
>  
>   dprintk("sending request with xid: %08x\n", ntohl(req->rq_xid));
>   /*
> -- 
> 1.9.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 14/39] SUNRPC: drop null test before destroy functions

2015-09-14 Thread J. Bruce Fields

ACK, but assuming Trond takes this one.--b.

On Sun, Sep 13, 2015 at 02:15:07PM +0200, Julia Lawall wrote:
> Remove unneeded NULL test.
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
> 
> // 
> @@ expression x; @@
> -if (x != NULL)
>   \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
> // 
> 
> Signed-off-by: Julia Lawall 
> 
> ---
>  net/sunrpc/sched.c |   12 
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index b140c09..425ca2f 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -1092,14 +1092,10 @@ void
>  rpc_destroy_mempool(void)
>  {
>   rpciod_stop();
> - if (rpc_buffer_mempool)
> - mempool_destroy(rpc_buffer_mempool);
> - if (rpc_task_mempool)
> - mempool_destroy(rpc_task_mempool);
> - if (rpc_task_slabp)
> - kmem_cache_destroy(rpc_task_slabp);
> - if (rpc_buffer_slabp)
> - kmem_cache_destroy(rpc_buffer_slabp);
> + mempool_destroy(rpc_buffer_mempool);
> + mempool_destroy(rpc_task_mempool);
> + kmem_cache_destroy(rpc_task_slabp);
> + kmem_cache_destroy(rpc_buffer_slabp);
>   rpc_destroy_wait_queue(_queue);
>  }
>  
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/1 net-next] sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()

2015-06-19 Thread J. Bruce Fields

Thanks, applying.--b.

On Tue, Jun 16, 2015 at 09:57:52PM +0200, Fabian Frederick wrote:
 Don't opencode sg_init_one()
 
 Signed-off-by: Fabian Frederick f...@skynet.be
 ---
  net/sunrpc/auth_gss/gss_krb5_crypto.c | 8 ++--
  1 file changed, 2 insertions(+), 6 deletions(-)
 
 diff --git a/net/sunrpc/auth_gss/gss_krb5_crypto.c 
 b/net/sunrpc/auth_gss/gss_krb5_crypto.c
 index b5408e8..fee3c15 100644
 --- a/net/sunrpc/auth_gss/gss_krb5_crypto.c
 +++ b/net/sunrpc/auth_gss/gss_krb5_crypto.c
 @@ -881,9 +881,7 @@ krb5_rc4_setup_seq_key(struct krb5_ctx *kctx, struct 
 crypto_blkcipher *cipher,
   if (err)
   goto out_err;
  
 - sg_init_table(sg, 1);
 - sg_set_buf(sg, zeroconstant, 4);
 -
 + sg_init_one(sg, zeroconstant, 4);
   err = crypto_hash_digest(desc, sg, 4, Kseq);
   if (err)
   goto out_err;
 @@ -951,9 +949,7 @@ krb5_rc4_setup_enc_key(struct krb5_ctx *kctx, struct 
 crypto_blkcipher *cipher,
   if (err)
   goto out_err;
  
 - sg_init_table(sg, 1);
 - sg_set_buf(sg, zeroconstant, 4);
 -
 + sg_init_one(sg, zeroconstant, 4);
   err = crypto_hash_digest(desc, sg, 4, Kcrypt);
   if (err)
   goto out_err;
 -- 
 2.4.2
--
To unsubscribe from this list: send the line unsubscribe netdev in

Re: [RFC 00/10] NFS: add AF_VSOCK support to NFS client

2015-06-10 Thread J. Bruce Fields

On Wed, Jun 10, 2015 at 05:43:15PM +0100, Stefan Hajnoczi wrote:
 On Mon, Jun 08, 2015 at 05:02:47PM -0400, J. Bruce Fields wrote:
  On Thu, Jun 04, 2015 at 05:45:43PM +0100, Stefan Hajnoczi wrote:
   The approach in this series
   ---
   AF_VSOCK stream sockets can be used for NFSv4.1 much in the same way as 
   TCP.
   RFC 1831 record fragments divide messages since SOCK_STREAM semantics are
   present.  The backchannel shares the connection just like the default TCP
   configuration.
  
  So the NFSv4 backchannel isn't handled for now, I assume.
 
 Right, I did not touch nfs4_callback_up_net(), only
 nfs41_callback_up_net().
 
 If I'm reading the code right NFSv4 uses a separate listen port for the
 backchannel instead of sharing the client's socket?

Right.

 This is possible to implement with AF_VSOCK but I have only tested
 NFSv4.1 so far.  Should I go ahead and do this?

Personally I'd make it a lower priority--I don't see why you can't make
4.1 a requirement for the new transport--but I'd be curious what others
have to say.

  And I guess
  NFSv2/v3 is out too thanks to rpcbind?  Which maybe is fine.
 
 Yes, I ignored rpcbind and didn't test NFSv2/v3.
 
  Do we need an IETF draft or similar to document how NFS should work over
  AF_VSOCK?
 
 I am not familiar with the standards process but I came across a few
 places where it makes sense to have a standard:
 
  * SUNRPC netid for AF_VSOCK (currently tcp, udp, and others exist)
  * The uaddr string format (vsock:...)

Off the top of my head I can't remember where else that's used in the
protocol other than in setting up the 4.0 callback connection (and in
rpcbind).

  * Use of RFC 1831 record fragments (just like TCP) over AF_VSOCK
SOCK_STREAM sockets

As far as I can tell, 1831 claims to be independent of any transport
protocol details: The RPC protocol can be implemented on several
different transport protocols.  The RPC protocol does not care how a
message is passed from one process to another, but only with
specification and interpretation of messages.  And: When RPC messages
are passed on top of a byte stream transport protocol (like TCP)
So perhaps there's nothing more to say here.

 These are all at the SUNRPC level rather than at the NFS protocol level.
 
 Any idea who I need to talk to?

Anyay, if there is anything to be worked out, nf...@ietf.org is the
place to go.

--b.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC 00/10] NFS: add AF_VSOCK support to NFS client

2015-06-08 Thread J. Bruce Fields

On Thu, Jun 04, 2015 at 05:45:43PM +0100, Stefan Hajnoczi wrote:
 This patch series enables AF_VSOCK address family support in the NFS client.
 Please use the https://github.com/stefanha/linux.git vsock-nfs branch, which
 contains the dependencies for this series.
 
 The AF_VSOCK address family provides dgram and stream socket communication
 between virtual machines and hypervisors.  A VMware VMCI transport is 
 currently
 available in-tree (see net/vmw_vsock) and I have posted virtio-vsock patches
 for use with QEMU/KVM: http://thread.gmane.org/gmane.linux.network/365205
 
 The goal of this work is sharing files between virtual machines and
 hypervisors.  AF_VSOCK is well-suited to this because it requires no
 configuration inside the virtual machine, making it simple to manage and
 reliable.
 
 Why NFS over AF_VSOCK?
 --
 It is unusual to add a new NFS transport, only TCP, RDMA, and UDP are 
 currently
 supported.  Here is the rationale for adding AF_VSOCK.
 
 Sharing files with a virtual machine can be configured manually:
 1. Add a dedicated network card to the virtual machine.  It will be used for
NFS traffic.
 2. Configure a local subnet and assign IP addresses to the virtual machine and
hypervisor
 3. Configure an NFS export on the hypervisor and start the NFS server
 4. Mount the export inside the virtual machine
 
 Automating these steps poses a problem: modifying network configuration inside
 the virtual machine is invasive.  It's hard to add a network interface to an
 arbitrary running system in an automated fashion, considering the network
 management tools, firewall rules, IP address usage, etc.
 
 Furthermore, the user may disrupt file sharing by accident when they add
 firewall rules, restart networking, etc because the NFS network interface is
 visible alongside the network interfaces managed by the user.
 
 AF_VSOCK is a zero-configuration network transport that avoids these problems.
 Adding it to a virtual machine is non-invasive.  It also avoids accidental
 misconfiguration by the user.  This is why guest agents and other services 
 in
 various hypervisors (KVM, Xen, VMware, VirtualBox) do not use regular network
 interfaces.
 
 This is why AF_VSOCK is appropriate for providing shared files as a hypervisor
 service.
 
 The approach in this series
 ---
 AF_VSOCK stream sockets can be used for NFSv4.1 much in the same way as TCP.
 RFC 1831 record fragments divide messages since SOCK_STREAM semantics are
 present.  The backchannel shares the connection just like the default TCP
 configuration.

So the NFSv4 backchannel isn't handled for now, I assume.  And I guess
NFSv2/v3 is out too thanks to rpcbind?  Which maybe is fine.

Do we need an IETF draft or similar to document how NFS should work over
AF_VSOCK?

NFS developers rely heavily on wireshark (and similar tools) for
debugging.  Is that still possible over AF_VSOCK?

 Addresses are Context ID, Port Number pairs.  These patches use 
 vsock:cid
 string representation to distinguish AF_VSOCK addresses from IPv4 and IPv6
 numeric addresses.
 
 The patches cover the following areas:
 
 Patch 1 - support struct sockaddr_vm in sunrpc addr.h
 
 Patch 2-4 - make sunrpc TCP record fragment parser reusable for any stream
 socket
 
 Patch 5 - add tcp_read_sock()-like interface to AF_VSOCK sockets
 
 Patch 6 - extend sunrpc xprtsock.c for AF_VSOCK RPC clients
 
 Patch 7-9 - AF_VSOCK backchannel support
 
 Patch 10 - add AF_VSOCK support to NFS client
 
 The following example mounts /export from the hypervisor (CID 2) inside the
 virtual machine (CID 3):
 
   # /sbin/mount.nfs 2:/export /mnt -o clientaddr=3,proto=vsock
 
 Status
 --
 I am looking for feedback on this approach.  There are TODOs remaining in the 
 code.
 
 Hopefully the way I add AF_VSOCK support to sunrpc is reasonable and something
 that can be standardized (a netid assigned and the uaddr string format 
 decided).
 
 See below for the nfs-utils patch.  It can be made nice once glibc
 getnameinfo()/getaddrinfo() support AF_VSOCK.
 
 The vsock_read_sock() implementation is dumb.  Less of a NFS/SUNRPC issue and
 more of a vsock issue, but perhaps virtio_transport.c should use skbs for its
 receive queue instead of a custom packet struct.  That would eliminate memory
 allocation and copying in vsock_read_sock().
 
 The next step is tackling NFS server.  In the meantime, I have tested the
 patches using the nc-vsock netcat-like utility that is available in my Linux
 kernel repo below.

So by a netcat-like utility, you mean it's proxying between client and a
server so the client thinks the server is communicating over AF_VSOCK
and the server thinks the client is using TCP?  (Sorry, I haven't looked
at the code.)

Once we have a server and client, how will you recommend testing them?
(Will the server side need to run on real hardware?)

I guess if it works then the main question is whether it's worth
supporting another

Re: [git patches] net driver fixes

2008-02-20 Thread J. Bruce Fields

On Wed, Feb 20, 2008 at 01:15:30PM -0800, David Miller wrote:
 From: Jeff Garzik [EMAIL PROTECTED]
 Date: Wed, 20 Feb 2008 11:55:57 -0500

  Note:  this is based off of Linus's latest commit
  (5d9c4a7de64d398604a978d267a6987f1f4025b7), since all my previous
  submissions are now upstream (thanks!).

 The whole point of my not rebasing net-2.6 is so that you can always
 use it as a base.

 With what you've giving me now I either have to:

 1) Pull in Linus's tree to net-2.6, then pull from you.

 2) Pull in directly from you to get it all.

Why are either of those a problem?

--b.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [git patches] net driver fixes

2008-02-20 Thread J. Bruce Fields

On Wed, Feb 20, 2008 at 01:42:57PM -0800, David Miller wrote:
 From: J. Bruce Fields [EMAIL PROTECTED]
 Date: Wed, 20 Feb 2008 16:23:02 -0500

  On Wed, Feb 20, 2008 at 01:15:30PM -0800, David Miller wrote:
   From: Jeff Garzik [EMAIL PROTECTED]
   Date: Wed, 20 Feb 2008 11:55:57 -0500

Note:  this is based off of Linus's latest commit
(5d9c4a7de64d398604a978d267a6987f1f4025b7), since all my previous
submissions are now upstream (thanks!).

   The whole point of my not rebasing net-2.6 is so that you can always
   use it as a base.

   With what you've giving me now I either have to:

   1) Pull in Linus's tree to net-2.6, then pull from you.

   2) Pull in directly from you to get it all.

  Why are either of those a problem?

 Because it forces me to pull Linus's upstream into net-2.6,
 I don't have any choice in the matter.

Right.  I'm wondering what the problems are that you see with that.

The advantages include earlier warning of merge problems, and avoidance
of duplicate commits--if Jeff's done work that depends on patches that
already upstream, then he either does that work against upstream, or
includes backported patches in the branch he asks you to pull, and you
end up with both the original and the backported patch.  Which isn't the
end of the world, but the resulting history seems messier than
necessary.

Or I guess you could both wait to do this merge until you're ready to
pull in Linus's latest?

For non-git-using testers there may be an advantage to always keeping a
tree based on the latest tagged release, as it may simplify providing
them with patches in some cases.  But if the goal is to provide a basis
for other maintainer's work, I'd've thought the best policy would be
just to track the tip of every relevant branch (which includes Linus's
in this case).

--b.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] IPv6 support for NFS server

2008-01-18 Thread J. Bruce Fields

On Fri, Jan 18, 2008 at 03:50:56PM +0100, Aurélien Charbon wrote:
 OK Bruce I have added this comment before the patch.
 I have also done the changes pointed by Brian.
 Please let me know if there is still something to change.

Thanks.  For the future, if you could just make the comment part of the
actual git commit, that'll help produce a patch that can be fed to my
scripts with less hassle

Anyway, looks fine to me, applied.  (But I may wait till 2.6.26 to
submit.)

--b.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] IPv6 support for NFS server

2008-01-17 Thread J. Bruce Fields

On Thu, Jan 17, 2008 at 06:04:35PM +0100, Aurélien Charbon wrote:
 Hi Bruce.

 Thanks for your comments.
 Here is the patch with some cleanups.

Thanks for the revisions.  We need to submit this with a patch comment
that:

- Explains more precisely what this does (fixes export
  interfaces to allow ipv6) and what remains to be done (?)
- Credits the folks (like Brian Haley) who have provided
  feedback.

I'll help clean up that comment if needed but please make sure it's
always included with the patch when you resend it.

--b.


 Regards,
 Aurélien


 -- 

 
   Aurelien Charbon
   Linux NFSv4 team
   Bull SAS
 Echirolles - France
 http://nfsv4.bullopensource.org/
 


 From 51755892e19186cd18230bac3f783b0382bf9ae0 Mon Sep 17 00:00:00 2001
 From: Aurelien Charbon [EMAIL PROTECTED]
 Date: Thu, 17 Jan 2008 14:55:03 +0100
 Subject: [PATCH 1/1] IPv6 support for NFS server
 
 ---
  fs/nfsd/export.c   |9 ++-
  fs/nfsd/nfsctl.c   |   15 +-
  include/linux/sunrpc/svcauth.h |4 +-
  include/net/ipv6.h |9 +++
  net/sunrpc/svcauth_unix.c  |  118 
 +++-
  5 files changed, 110 insertions(+), 45 deletions(-)
 
 diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
 index 66d0aeb..208db3a 100644
 --- a/fs/nfsd/export.c
 +++ b/fs/nfsd/export.c
 @@ -35,6 +35,7 @@
  #include linux/lockd/bind.h
  #include linux/sunrpc/msg_prot.h
  #include linux/sunrpc/gss_api.h
 +#include net/ipv6.h
  
  #define NFSDDBG_FACILITY NFSDDBG_EXPORT
  
 @@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
  {
   struct auth_domain  *dom;
   int i, err;
 + struct in6_addr addr6;
  
   /* First, consistency check. */
   err = -EINVAL;
 @@ -1574,9 +1576,10 @@ exp_addclient(struct nfsctl_client *ncp)
   goto out_unlock;
  
   /* Insert client into hashtable. */
 - for (i = 0; i  ncp-cl_naddr; i++)
 - auth_unix_add_addr(ncp-cl_addrlist[i], dom);
 -
 + for (i = 0; i  ncp-cl_naddr; i++) {
 + ipv6_addr_set_v4mapped(ncp-cl_addrlist[i].s_addr, addr6);
 + auth_unix_add_addr(addr6, dom);
 + }
   auth_unix_forget_old(dom);
   auth_domain_put(dom);
  
 diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
 index 77dc989..13d6b6b 100644
 --- a/fs/nfsd/nfsctl.c
 +++ b/fs/nfsd/nfsctl.c
 @@ -37,6 +37,7 @@
  #include linux/nfsd/syscall.h
  
  #include asm/uaccess.h
 +#include net/ipv6.h
  
  /*
   *   We have a single directory with 9 nodes in it.
 @@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *file, char *buf, 
 size_t size)
   struct auth_domain *clp;
   int err = 0;
   struct knfsd_fh *res;
 + struct in6_addr in6;
  
   if (size  sizeof(*data))
   return -EINVAL;
 @@ -236,7 +238,11 @@ static ssize_t write_getfs(struct file *file, char *buf, 
 size_t size)
   res = (struct knfsd_fh*)buf;
  
   exp_readlock();
 - if (!(clp = auth_unix_lookup(sin-sin_addr)))
 +
 + ipv6_addr_set_v4mapped(sin-sin_addr.s_addr, in6);
 +
 + clp = auth_unix_lookup(in6);
 + if (!clp)
   err = -EPERM;
   else {
   err = exp_rootfh(clp, data-gd_path, res, data-gd_maxlen);
 @@ -257,6 +263,7 @@ static ssize_t write_getfd(struct file *file, char *buf, 
 size_t size)
   int err = 0;
   struct knfsd_fh fh;
   char *res;
 + struct in6_addr in6;
  
   if (size  sizeof(*data))
   return -EINVAL;
 @@ -271,7 +278,11 @@ static ssize_t write_getfd(struct file *file, char *buf, 
 size_t size)
   res = buf;
   sin = (struct sockaddr_in *)data-gd_addr;
   exp_readlock();
 - if (!(clp = auth_unix_lookup(sin-sin_addr)))
 +
 + ipv6_addr_set_v4mapped(sin-sin_addr.s_addr,in6);
 +
 + clp = auth_unix_lookup(in6);
 + if (!clp)
   err = -EPERM;
   else {
   err = exp_rootfh(clp, data-gd_path, fh, NFS_FHSIZE);
 diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
 index 22e1ef8..9e6fb86 100644
 --- a/include/linux/sunrpc/svcauth.h
 +++ b/include/linux/sunrpc/svcauth.h
 @@ -120,10 +120,10 @@ extern void svc_auth_unregister(rpc_authflavor_t 
 flavor);
  
  extern struct auth_domain *unix_domain_find(char *name);
  extern void auth_domain_put(struct auth_domain *item);
 -extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
 +extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain 
 *dom);
  extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain 
 *new);
  extern struct auth_domain *auth_domain_find(char *name);
 -extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
 +extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
  extern int auth_unix_forget_old(struct auth_domain *dom);
  extern void

Re: [PATCH] IPv6 support for NFS server

2008-01-15 Thread J. Bruce Fields

On Tue, Dec 11, 2007 at 07:00:08PM +0100, Aurélien Charbon wrote:
 Brian Haley wrote:

 In an email back on October 29th I sent-out a similar patch with a new  
 ipv6_addr_set_v4mapped() inline - it might be useful to pull that  
 piece into your patch since it cleans it up a bit to get rid of the  
 ipv6_addr_set() calls.  I can re-send you that patch off-line if you  
 can't find it.

 -Brian


 Thanks Brian. I forgot to include your changes in my tree.
 OK Bruce you can take this one.

One trivial note: I'd prefer patches inline with the message, instead of
attached.  If you need to attach it, please add From:, Subject: and a
patch comment in the standard format.  Something like git-format-patch
will do all that stuff for you.

E.g. see below (also with a minor whitespace problem fixed--fun
scripts/checkpatch.pl before submitting and it'll catch that.)

--b.
From c19360e877cfa1576dce98792cd513665deaa2ec Mon Sep 17 00:00:00 2001
From: =?utf-8?q?Aur=C3=A9lien=20Charbon?= [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 16:44:46 +0100
Subject: [PATCH] IPv6 support for NFS server

Prepare for IPv6 text-based mounts and exports.

Tested with IPv4 network and basic nfs ops (mount, file creation and
modification), and with unmodified nfs-utils-1.1.1 + CITI_NFS4_ALL
patch.

Signed-off-by: Aurelien Charbon [EMAIL PROTECTED]
Cc: Neil Brown [EMAIL PROTECTED]
---
 fs/nfsd/export.c   |   10 ++-
 fs/nfsd/nfsctl.c   |   19 ++-
 include/linux/sunrpc/svcauth.h |4 +-
 include/net/ipv6.h |9 +++
 net/sunrpc/svcauth_unix.c  |  119 +++-
 5 files changed, 116 insertions(+), 45 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index cbbc594..e29b431 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -35,6 +35,7 @@
 #include linux/lockd/bind.h
 #include linux/sunrpc/msg_prot.h
 #include linux/sunrpc/gss_api.h
+#include net/ipv6.h
 
 #define NFSDDBG_FACILITY   NFSDDBG_EXPORT
 
@@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
 {
struct auth_domain  *dom;
int i, err;
+   struct in6_addr addr6;
 
/* First, consistency check. */
err = -EINVAL;
@@ -1574,9 +1576,11 @@ exp_addclient(struct nfsctl_client *ncp)
goto out_unlock;
 
/* Insert client into hashtable. */
-   for (i = 0; i  ncp-cl_naddr; i++)
-   auth_unix_add_addr(ncp-cl_addrlist[i], dom);
-
+   for (i = 0; i  ncp-cl_naddr; i++) {
+   /* Mapping address */
+   ipv6_addr_set_v4mapped(ncp-cl_addrlist[i].s_addr, addr6);
+   auth_unix_add_addr(addr6, dom);
+   }
auth_unix_forget_old(dom);
auth_domain_put(dom);
 
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e307972..a8f7a90 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -37,6 +37,7 @@
 #include linux/nfsd/syscall.h
 
 #include asm/uaccess.h
+#include net/ipv6.h
 
 /*
  * We have a single directory with 9 nodes in it.
@@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *file, char *buf, 
size_t size)
struct auth_domain *clp;
int err = 0;
struct knfsd_fh *res;
+   struct in6_addr in6;
 
if (size  sizeof(*data))
return -EINVAL;
@@ -236,7 +238,13 @@ static ssize_t write_getfs(struct file *file, char *buf, 
size_t size)
res = (struct knfsd_fh*)buf;
 
exp_readlock();
-   if (!(clp = auth_unix_lookup(sin-sin_addr)))
+
+   /* IPv6 address mapping */
+   ipv6_addr_set_v4mapped(
+   (((struct sockaddr_in *)data-gd_addr)-sin_addr.s_addr),
+   in6);
+
+   if (!(clp = auth_unix_lookup(in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data-gd_path, res, data-gd_maxlen);
@@ -257,6 +265,7 @@ static ssize_t write_getfd(struct file *file, char *buf, 
size_t size)
int err = 0;
struct knfsd_fh fh;
char *res;
+   struct in6_addr in6;
 
if (size  sizeof(*data))
return -EINVAL;
@@ -271,7 +280,13 @@ static ssize_t write_getfd(struct file *file, char *buf, 
size_t size)
res = buf;
sin = (struct sockaddr_in *)data-gd_addr;
exp_readlock();
-   if (!(clp = auth_unix_lookup(sin-sin_addr)))
+   /* IPv6 address mapping */
+   ipv6_addr_set_v4mapped(
+   (((struct sockaddr_in *)data-gd_addr)-sin_addr.s_addr),
+   in6
+   );
+
+   if (!(clp = auth_unix_lookup(in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data-gd_path, fh, NFS_FHSIZE);
diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index 22e1ef8..9e6fb86 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -120,10 +120,10 @@ extern void   svc_auth_unregister(rpc_authflavor_t 
flavor);
 
 extern struct auth_domain *unix_domain_find(char

Re: [PATCH] IPv6 support for NFS server

2008-01-15 Thread J. Bruce Fields

Mostly just more trivial stuff for now, apologies:

On Tue, Jan 15, 2008 at 05:32:21PM -0500, J. Bruce Fields wrote:
 diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
 index cbbc594..e29b431 100644
 --- a/fs/nfsd/export.c
 +++ b/fs/nfsd/export.c
 @@ -35,6 +35,7 @@
  #include linux/lockd/bind.h
  #include linux/sunrpc/msg_prot.h
  #include linux/sunrpc/gss_api.h
 +#include net/ipv6.h
  
  #define NFSDDBG_FACILITY NFSDDBG_EXPORT
  
 @@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
  {
   struct auth_domain  *dom;
   int i, err;
 + struct in6_addr addr6;
  
   /* First, consistency check. */
   err = -EINVAL;
 @@ -1574,9 +1576,11 @@ exp_addclient(struct nfsctl_client *ncp)
   goto out_unlock;
  
   /* Insert client into hashtable. */
 - for (i = 0; i  ncp-cl_naddr; i++)
 - auth_unix_add_addr(ncp-cl_addrlist[i], dom);
 -
 + for (i = 0; i  ncp-cl_naddr; i++) {
 + /* Mapping address */
 + ipv6_addr_set_v4mapped(ncp-cl_addrlist[i].s_addr, addr6);

I think the name of the function explains well enough what it's doing,
so the preceding comment is superfluous.

 + auth_unix_add_addr(addr6, dom);
 + }
   auth_unix_forget_old(dom);
   auth_domain_put(dom);
  
 diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
 index e307972..a8f7a90 100644
 --- a/fs/nfsd/nfsctl.c
 +++ b/fs/nfsd/nfsctl.c
 @@ -37,6 +37,7 @@
  #include linux/nfsd/syscall.h
  
  #include asm/uaccess.h
 +#include net/ipv6.h
  
  /*
   *   We have a single directory with 9 nodes in it.
 @@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *file, char *buf, 
 size_t size)
   struct auth_domain *clp;
   int err = 0;
   struct knfsd_fh *res;
 + struct in6_addr in6;
  
   if (size  sizeof(*data))
   return -EINVAL;
 @@ -236,7 +238,13 @@ static ssize_t write_getfs(struct file *file, char *buf, 
 size_t size)
   res = (struct knfsd_fh*)buf;
  
   exp_readlock();
 - if (!(clp = auth_unix_lookup(sin-sin_addr)))
 +
 + /* IPv6 address mapping */
 + ipv6_addr_set_v4mapped(
 + (((struct sockaddr_in *)data-gd_addr)-sin_addr.s_addr),
 + in6);

The case there appears to already have been done in the assignment of
sin a few lines above; so couldn't this last line just be written:

ipv6_addr_set_v4mapped(sin-sin_addr.s_addr, in6);

?

 +
 + if (!(clp = auth_unix_lookup(in6)))
   err = -EPERM;

I'd rather assignments be made separately from tests, so:

clp = auth_unix_lookup(in6);
if (!clp)
err = -EPERM;

Yeah, I know, that's not what the original code did, but as long as
we're modifying that line anyway

   else {
   err = exp_rootfh(clp, data-gd_path, res, data-gd_maxlen);
 @@ -257,6 +265,7 @@ static ssize_t write_getfd(struct file *file, char *buf, 
 size_t size)
   int err = 0;
   struct knfsd_fh fh;
   char *res;
 + struct in6_addr in6;
  
   if (size  sizeof(*data))
   return -EINVAL;
 @@ -271,7 +280,13 @@ static ssize_t write_getfd(struct file *file, char *buf, 
 size_t size)
   res = buf;
   sin = (struct sockaddr_in *)data-gd_addr;
   exp_readlock();
 - if (!(clp = auth_unix_lookup(sin-sin_addr)))
 + /* IPv6 address mapping */
 + ipv6_addr_set_v4mapped(
 + (((struct sockaddr_in *)data-gd_addr)-sin_addr.s_addr),
 + in6
 + );
 +
 + if (!(clp = auth_unix_lookup(in6)))
   err = -EPERM;

See both comments above.

   else {
   err = exp_rootfh(clp, data-gd_path, fh, NFS_FHSIZE);
 diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
 index 22e1ef8..9e6fb86 100644
 --- a/include/linux/sunrpc/svcauth.h
 +++ b/include/linux/sunrpc/svcauth.h
 @@ -120,10 +120,10 @@ extern void svc_auth_unregister(rpc_authflavor_t 
 flavor);
  
  extern struct auth_domain *unix_domain_find(char *name);
  extern void auth_domain_put(struct auth_domain *item);
 -extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
 +extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain 
 *dom);
  extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain 
 *new);
  extern struct auth_domain *auth_domain_find(char *name);
 -extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
 +extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
  extern int auth_unix_forget_old(struct auth_domain *dom);
  extern void svcauth_unix_purge(void);
  extern void svcauth_unix_info_release(void *);
 diff --git a/include/net/ipv6.h b/include/net/ipv6.h
 index ae328b6..9394710 100644
 --- a/include/net/ipv6.h
 +++ b/include/net/ipv6.h
 @@ -400,6 +400,15 @@ static inline int ipv6_addr_v4mapped(const struct 
 in6_addr *a)
a-s6_addr32[2] == htonl(0x));
  }
  
 +static inline void

Re: Top 10 kernel oopses for the week ending January 5th, 2008

2008-01-07 Thread J. Bruce Fields

On Sat, Jan 05, 2008 at 09:39:35PM +, Al Viro wrote:
 On Sat, Jan 05, 2008 at 01:06:17PM -0800, Arjan van de Ven wrote:
  The http://www.kerneloops.org website collects kernel oops and
  warning reports from various mailing lists and bugzillas as well as
  with a client users can install to auto-submit oopses.
  Below is a top 10 list of the oopses collected in the last 7 days.
  (Reports prior to 2.6.23 have been omitted in collecting the top 10)
  
  This week, a total of 49 oopses and warnings have been reported,
  compared to 53 reports in the previous week.
 
 FWIW, people moaning about the lack of entry-level kernel work would
 do well by decoding those to the level of this place in this function,
 called from here, with so-and-so variable being this and posting
 the results.  As skills go, it's far more useful than how to trim
 the trailing whitespace and the rest of checkpatch.pl-inspired crap
 that got so popular lately...

Is there any good basic documentation on this to point people at?

--b.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.24-rc6-mm1

2008-01-03 Thread J. Bruce Fields

On Thu, Jan 03, 2008 at 04:37:46PM +0100, Torsten Kaiser wrote:
 On Jan 2, 2008 10:57 PM, J. Bruce Fields [EMAIL PROTECTED] wrote:
  On Thu, Jan 03, 2008 at 08:51:54AM +1100, Herbert Xu wrote:
   The two specific trees of interest would be git-nfsd and git-net.
 
  Also, if it's git-nfsd, it'd be useful to test with the current git-nfsd
  from the for-mm branch at:
 
  git://linux-nfs.org/~bfields/linus.git for-mm
 
  and then any bisection results (even partial) from that tree would help
  immensely
 
 Wrong URL, its (now?) at git://git.linux-nfs.org/projects/bfields/linux.git

Whoops, apologies.  Actually, the only change required should have been
to change linus to linux.  I should cut-n-paste and not relying on
typing it right each time

 Using HEAD is now at cd7e1c9... Merge commit 'server-xprt-switch^' into 
 for-mm
 I was able to compileinstall 54 packages, so this seems to be working.

Hm, OK, thanks again for all this followup.

--b.

 
 Now git-fetch'ing
 git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25.git
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 2.6.24-rc6-mm1

2008-01-02 Thread J. Bruce Fields

On Thu, Jan 03, 2008 at 08:51:54AM +1100, Herbert Xu wrote:
 On Wed, Jan 02, 2008 at 07:29:59PM +0100, Torsten Kaiser wrote:
 
  Vanilla 2.6.24-rc6 seems stable. I did not see any crash or warnings.
 
 OK that's great.  The next step would be to try excluding specific git
 trees from mm to see if they make a difference.
 
 The two specific trees of interest would be git-nfsd and git-net.

Also, if it's git-nfsd, it'd be useful to test with the current git-nfsd
from the for-mm branch at:

git://linux-nfs.org/~bfields/linus.git for-mm

and then any bisection results (even partial) from that tree would help
immensely

--b.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] IPv6 support for NFS server

2007-12-10 Thread J. Bruce Fields

On Mon, Dec 10, 2007 at 07:34:41PM +0100, Aurélien Charbon wrote:

 Here is a cleanup for the ip_map caching patch in nfs server.

 It prepares for IPv6 text-based mounts and exports.

 Tests: tested with only IPv4 network and basic nfs ops (mount, file 
 creation and modification)

Thanks!  And also tested with an unmodified rpc.mountd?

--b.


 -

 Signed-off-by: Aurelien Charbon [EMAIL PROTECTED]

 diff -p -u -r -N linux-2.6.24-rc4/fs/nfsd/export.c 
 linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/export.c
 --- linux-2.6.24-rc4/fs/nfsd/export.c2007-12-10 16:11:37.0 +0100
 +++ linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/export.c2007-12-10 
 17:50:37.0 +0100
 @@ -35,6 +35,7 @@
 #include linux/lockd/bind.h
 #include linux/sunrpc/msg_prot.h
 #include linux/sunrpc/gss_api.h
 +#include net/ipv6.h

 #define NFSDDBG_FACILITYNFSDDBG_EXPORT

 @@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
 {
 struct auth_domain*dom;
 inti, err;
 +struct in6_addr addr6;

 /* First, consistency check. */
 err = -EINVAL;
 @@ -1574,9 +1576,12 @@ exp_addclient(struct nfsctl_client *ncp)
 goto out_unlock;

 /* Insert client into hashtable. */
 -for (i = 0; i  ncp-cl_naddr; i++)
 -auth_unix_add_addr(ncp-cl_addrlist[i], dom);
 -
 +for (i = 0; i  ncp-cl_naddr; i++) {
 +/* Mapping address */
 +ipv6_addr_set(addr6, 0, 0,
 +htonl(0x), ncp-cl_addrlist[i].s_addr);
 +auth_unix_add_addr(addr6, dom);
 +}
 auth_unix_forget_old(dom);
 auth_domain_put(dom);

 diff -p -u -r -N linux-2.6.24-rc4/fs/nfsd/nfsctl.c 
 linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/nfsctl.c
 --- linux-2.6.24-rc4/fs/nfsd/nfsctl.c2007-12-10 16:11:37.0 +0100
 +++ linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/nfsctl.c2007-12-10 
 18:15:22.0 +0100
 @@ -37,6 +37,7 @@
 #include linux/nfsd/syscall.h

 #include asm/uaccess.h
 +#include net/ipv6.h

 /*
  *We have a single directory with 9 nodes in it.
 @@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *
 struct auth_domain *clp;
 int err = 0;
 struct knfsd_fh *res;
 +struct in6_addr in6;

 if (size  sizeof(*data))
 return -EINVAL;
 @@ -236,7 +238,13 @@ static ssize_t write_getfs(struct file *
 res = (struct knfsd_fh*)buf;

 exp_readlock();
 -if (!(clp = auth_unix_lookup(sin-sin_addr)))
 +
 +/* IPv6 address mapping */
 +ipv6_addr_set(in6, 0, 0,
 +htonl(0x),
 +(((struct sockaddr_in *)data-gd_addr)-sin_addr.s_addr));
 +
 +if (!(clp = auth_unix_lookup(in6)))
 err = -EPERM;
 else {
 err = exp_rootfh(clp, data-gd_path, res, data-gd_maxlen);
 @@ -257,6 +265,7 @@ static ssize_t write_getfd(struct file *
 int err = 0;
 struct knfsd_fh fh;
 char *res;
 +struct in6_addr in6;

 if (size  sizeof(*data))
 return -EINVAL;
 @@ -271,7 +280,12 @@ static ssize_t write_getfd(struct file *
 res = buf;
 sin = (struct sockaddr_in *)data-gd_addr;
 exp_readlock();
 -if (!(clp = auth_unix_lookup(sin-sin_addr)))
 +/* IPv6 address mapping */
 +ipv6_addr_set(in6, 0, 0,
 +htonl(0x),
 +(((struct sockaddr_in *)data-gd_addr)-sin_addr.s_addr));
 +
 +if (!(clp = auth_unix_lookup(in6)))
 err = -EPERM;
 else {
 err = exp_rootfh(clp, data-gd_path, fh, NFS_FHSIZE);
 diff -p -u -r -N linux-2.6.24-rc4/include/linux/sunrpc/svcauth.h 
 linux-2.6.24-rc4-IPv6-cache-based/include/linux/sunrpc/svcauth.h
 --- linux-2.6.24-rc4/include/linux/sunrpc/svcauth.h2007-12-10 
 16:01:43.0 +0100
 +++ linux-2.6.24-rc4-IPv6-cache-based/include/linux/sunrpc/svcauth.h
 2007-12-10 17:09:34.0 +0100
 @@ -120,10 +120,10 @@ extern voidsvc_auth_unregister(rpc_auth

 extern struct auth_domain *unix_domain_find(char *name);
 extern void auth_domain_put(struct auth_domain *item);
 -extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain 
 *dom);
 +extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain 
 *dom);
 extern struct auth_domain *auth_domain_lookup(char *name, struct 
 auth_domain *new);
 extern struct auth_domain *auth_domain_find(char *name);
 -extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
 +extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
 extern int auth_unix_forget_old(struct auth_domain *dom);
 extern void svcauth_unix_purge(void);
 extern void svcauth_unix_info_release(void *);
 diff -p -u -r -N linux-2.6.24-rc4/net/sunrpc/svcauth_unix.c 
 linux-2.6.24-rc4-IPv6-cache-based/net/sunrpc/svcauth_unix.c
 --- linux-2.6.24-rc4/net/sunrpc/svcauth_unix.c2007-12-10 
 16:01:46.0 +0100
 +++ linux-2.6.24-rc4-IPv6-cache-based/net/sunrpc/svcauth_unix.c
 2007-12-10 17:38:50.0 +0100
 @@ -11,7 +11,8 @@
 #include linux/hash.h
 #include linux/string.h
 #include net/sock.h
 -
 +#include

Re: [BUG] New Kernel Bugs

2007-11-15 Thread J. Bruce Fields

On Thu, Nov 15, 2007 at 01:50:43PM +1100, Neil Brown wrote:
 Virtual Folders.
 
 I use VM mode in EMACS, but I believe some other mail readers have the
 same functionality.
 I have a virtual folder called nfs which shows me all mail in my
 inbox which has the string 'nfs' or 'lockd' in a To, Cc, or Subject
 field.  When I visit that folder, I see all mail about nfs, whether it
 was sent to me personally, or to a relevant list, or to lkml.

Hm (googling around for mutt and virtual folders): looks like I can
get most of the way there in mutt with some macros based on its limit
command:

http://www.tummy.com/journals/entries/jafo_20060303_00

Thanks.--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] New Kernel Bugs

2007-11-14 Thread J. Bruce Fields

On Wed, Nov 14, 2007 at 09:38:20AM -0800, Randy Dunlap wrote:
 On Wed, 14 Nov 2007 15:08:47 +0100 Ingo Molnar wrote:
  so please stop this too busy and too noisy nonsense already. It was 
  nonsense 10 years ago and it's nonsense today. In 10 years the kernel 
  grew from a 1 million lines codebase to an 8 million lines codebase, so 
  what? Deal with it and be intelligent about filtering your information 
  influx instead of imposing a hard pre-filtering criteria that restricts 
  intelligent processing of information.
 
 So you have a preferred method of handling email.  Please don't
 force it on the rest of us.

I'd be curious for any pointers on tools, actually.  I read (ok, skim)
lkml but still overlook relevant bug reports occasionally.
(Fortunately, between Trond and Andrew and others forwarding things it's
not actually a problem, but I'm still curious).

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] NFS: handle IPv6 addresses in nfs ctl

2007-10-31 Thread J. Bruce Fields

On Wed, Oct 31, 2007 at 10:06:18AM +0100, Aurélien Charbon wrote:
 Thank you Brian
 Sorry, I did not see what you sent.

 I have tested it with an IPv4 configuration. It's OK.
 So Neil, Bruce, you can take this one for review.

Did you miss Neil's question about the nfsctl stuff?  (Do we really need
that, or would the changes to the ip_map cache be sufficient?)--b.


 fs/nfsd/export.c   |9 ++-
 fs/nfsd/nfsctl.c   |   42 --
 include/linux/sunrpc/svcauth.h |5 +
 include/net/ipv6.h |   10 +++
 net/sunrpc/svcauth_unix.c  |  118 
 +++--
 5 files changed, 134 insertions(+), 50 deletions(-)

 Signed-off-by: Brian Haley [EMAIL PROTECTED]
 Signed-off-by: Aurelien Charbon [EMAIL PROTECTED]

 ---

 diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
 index 66d0aeb..c47ba77 100644
 --- a/fs/nfsd/export.c
 +++ b/fs/nfsd/export.c
 @@ -35,6 +35,7 @@
 #include linux/lockd/bind.h
 #include linux/sunrpc/msg_prot.h
 #include linux/sunrpc/gss_api.h
 +#include net/ipv6.h
 #define NFSDDBG_FACILITYNFSDDBG_EXPORT
 @@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
 {
 struct auth_domain*dom;
 inti, err;
 +struct in6_addrin6;
 /* First, consistency check. */
 err = -EINVAL;
 @@ -1574,9 +1576,10 @@ exp_addclient(struct nfsctl_client *ncp)
 goto out_unlock;
 /* Insert client into hashtable. */
 -for (i = 0; i  ncp-cl_naddr; i++)
 -auth_unix_add_addr(ncp-cl_addrlist[i], dom);
 -
 +for (i = 0; i  ncp-cl_naddr; i++) {
 +ipv6_addr_set_v4mapped(ncp-cl_addrlist[i].s_addr, in6);
 +auth_unix_add_addr(in6, dom);
 +}
 auth_unix_forget_old(dom);
 auth_domain_put(dom);
 diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
 index 77dc989..5cb5f0d 100644
 --- a/fs/nfsd/nfsctl.c
 +++ b/fs/nfsd/nfsctl.c
 @@ -37,6 +37,7 @@
 #include linux/nfsd/syscall.h
 #include asm/uaccess.h
 +#include net/ipv6.h
 /*
  *We have a single directory with 9 nodes in it.
 @@ -219,24 +220,37 @@ static ssize_t write_getfs(struct file *file, char 
 *buf, size_t size)
 {
 struct nfsctl_fsparm *data;
 struct sockaddr_in *sin;
 +struct sockaddr_in6 *sin6;
 struct auth_domain *clp;
 int err = 0;
 struct knfsd_fh *res;
 +struct in6_addr in6;
 if (size  sizeof(*data))
 return -EINVAL;
 data = (struct nfsctl_fsparm*)buf;
 err = -EPROTONOSUPPORT;
 -if (data-gd_addr.sa_family != AF_INET)
 +switch (data-gd_addr.sa_family) {
 +case AF_INET:
 +sin = (struct sockaddr_in *)data-gd_addr;
 +ipv6_addr_set_v4mapped(sin-sin_addr.s_addr, in6);
 +break;
 +case AF_INET6:
 +sin6 = (struct sockaddr_in6 *)data-gd_addr;
 +ipv6_addr_copy(in6, sin6-sin6_addr);
 +break;
 +default:
 goto out;
 -sin = (struct sockaddr_in *)data-gd_addr;
 +}
 +
 if (data-gd_maxlen  NFS3_FHSIZE)
 data-gd_maxlen = NFS3_FHSIZE;
 res = (struct knfsd_fh*)buf;
 exp_readlock();
 -if (!(clp = auth_unix_lookup(sin-sin_addr)))
 +
 +if (!(clp = auth_unix_lookup(in6)))
 err = -EPERM;
 else {
 err = exp_rootfh(clp, data-gd_path, res, data-gd_maxlen);
 @@ -253,25 +267,41 @@ static ssize_t write_getfd(struct file *file, char 
 *buf, size_t size)
 {
 struct nfsctl_fdparm *data;
 struct sockaddr_in *sin;
 +struct sockaddr_in6 *sin6;
 struct auth_domain *clp;
 int err = 0;
 struct knfsd_fh fh;
 char *res;
 +struct in6_addr in6;
 if (size  sizeof(*data))
 return -EINVAL;
 data = (struct nfsctl_fdparm*)buf;
 err = -EPROTONOSUPPORT;
 -if (data-gd_addr.sa_family != AF_INET)
 +if (data-gd_addr.sa_family != AF_INET 
 +data-gd_addr.sa_family != AF_INET6)
 goto out;
 err = -EINVAL;
 if (data-gd_version  2 || data-gd_version  NFSSVC_MAXVERS)
 goto out;
 res = buf;
 -sin = (struct sockaddr_in *)data-gd_addr;
 exp_readlock();
 -if (!(clp = auth_unix_lookup(sin-sin_addr)))
 +
 +switch (data-gd_addr.sa_family) {
 +case AF_INET:
 +sin = (struct sockaddr_in *)data-gd_addr;
 +ipv6_addr_set_v4mapped(sin-sin_addr.s_addr, in6);
 +break;
 +case AF_INET6:
 +sin6 = (struct sockaddr_in6 *)data-gd_addr;
 +ipv6_addr_copy(in6, sin6-sin6_addr);
 +break;
 +default:
 +goto out;
 +}
 +
 +if (!(clp = auth_unix_lookup(in6)))
 err = -EPERM;
 else {
 err = exp_rootfh(clp, data-gd_path, fh, NFS_FHSIZE);
 diff --git a/include/linux/sunrpc/svcauth.h 
 b/include/linux/sunrpc/svcauth.h
 index 22e1ef8..64ecb93 100644
 --- a/include/linux/sunrpc/svcauth.h
 +++ b/include/linux/sunrpc/svcauth.h
 @@ -15,6 +15,7 @@
 #include linux/sunrpc/msg_prot.h
 #include linux/sunrpc/cache.h
 #include linux/hash.h
 +#include net/ipv6.h
 #define SVC_CRED_NGROUPS32
 struct svc_cred {
 @@ -120,10 +121,10 @@

Re: [PATCH 1/2] NFS: change the ip_map cache code to handle IPv6 addresses

2007-10-30 Thread J. Bruce Fields

Thanks for working on this.

Could you run linux/scripts/checkpatch.pl on your patch and fix the
problems it complains about?

On Tue, Oct 30, 2007 at 06:05:42PM +0100, Aurélien Charbon wrote:
 static void update(struct cache_head *cnew, struct cache_head *citem)
 {
 @@ -149,22 +157,24 @@ static void ip_map_request(struct cache_
   struct cache_head *h,
   char **bpp, int *blen)
 {
 -char text_addr[20];
 +char text_addr[40];
 struct ip_map *im = container_of(h, struct ip_map, h);
 -__be32 addr = im-m_addr.s_addr;
 -
 -snprintf(text_addr, 20, %u.%u.%u.%u,
 - ntohl(addr)  24  0xff,
 - ntohl(addr)  16  0xff,
 - ntohl(addr)   8  0xff,
 - ntohl(addr)   0  0xff);
 +if (ipv6_addr_v4mapped((im-m_addr))) {
 +snprintf(text_addr, 20, NIPQUAD_FMT,
 +ntohl(im-m_addr.s6_addr32[3])  24  0xff,
 +ntohl(im-m_addr.s6_addr32[3])  16  0xff,
 +ntohl(im-m_addr.s6_addr32[3])   8  0xff,
 +ntohl(im-m_addr.s6_addr32[3])   0  0xff);
 +} else {
 +snprintf(text_addr, 40, NIP6_FMT, NIP6(im-m_addr));
 +}
 qword_add(bpp, blen, im-m_class);
 qword_add(bpp, blen, text_addr);
 (*bpp)[-1] = '\n';
 }

What happens when an unpatched mountd gets this request?  Does it
ignore it, or respond with a negative entry?

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-14 Thread J. Bruce Fields

On Fri, Sep 14, 2007 at 11:12:30AM +0200, Wolfgang Walter wrote:
 Am Mittwoch, 12. September 2007 21:55 schrieb J. Bruce Fields:
  On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
   On Wednesday 12 September 2007, J. Bruce Fields wrote:
On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
 So it is in 2.6.21 and later and should probably go to .stable for
 .21 and .22.

 Bruce:  for you :-)
   
OK, thanks!  But, (as is alas often the case) I'm still confused:
   if (!test_and_set_bit(SK_OLD, svsk-sk_flags))
   continue;
 - if (atomic_read(svsk-sk_inuse) || test_bit(SK_BUSY,
 svsk-sk_flags)) +   if (atomic_read(svsk-sk_inuse)  1
 + || test_bit(SK_BUSY, svsk-sk_flags))
   continue;
   atomic_inc(svsk-sk_inuse);
   list_move(le, to_be_aged);
   
What is it that ensures svsk-sk_inuse isn't incremented or SK_BUSY set
after that test?  Not all the code that does either of those is under
the same serv-sv_lock lock that this code is.
  
   This should not matter - SK_CLOSED may be set at any time.
  
   svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then
   enqueues it. If SK_BUSY is set its already enqueued and svc_sock_enqueue
   ensures that it is not enqueued twice.
 
  Oh, got it.  And the list manipulation is safe thanks to sv_lock.  Neat,
  thanks.  Can you verify that this solves your problem?
 
 Patch works fine here.

Great, thanks again!

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread J. Bruce Fields

On Fri, Sep 14, 2007 at 05:14:53PM -0400, Jeff Garzik wrote:
 J. Bruce Fields wrote:
 On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote:
 I've been waiting for years for a smart person to come along and write a 
 POSIX-only distributed filesystem.

 What exactly do you mean by POSIX-only?

 Don't bother supporting attributes, file modes, and other details not 
 supported by POSIX.  The prime example being NFSv4, which is larded down 
 with Windows features.

I am sympathetic  Cutting those out may still leave you with
something pretty complicated, though.

 NFSv4.1 adds to the fun, by throwing interoperability completely out the 
 window.

What parts are you worried about in particular?

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread J. Bruce Fields

On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote:
 My thoughts.  But first a disclaimer:   Perhaps you will recall me as one 
 of the people who really reads all your patches, and examines your code and 
 proposals closely.  So, with that in mind...

 I question the value of distributed block services (DBS), whether its your 
 version or the others out there.  DBS are not very useful, because it still 
 relies on a useful filesystem sitting on top of the DBS.  It devolves into 
 one of two cases:  (1) multi-path much like today's SCSI, with distributed 
 filesystem arbitrarion to ensure coherency, or (2) the filesystem running 
 on top of the DBS is on a single host, and thus, a single point of failure 
 (SPOF).

 It is quite logical to extend the concepts of RAID across the network, but 
 ultimately you are still bound by the inflexibility and simplicity of the 
 block device.

 In contrast, a distributed filesystem offers far more scalability, 
 eliminates single points of failure, and offers more room for optimization 
 and redundancy across the cluster.

 A distributed filesystem is also much more complex, which is why 
 distributed block devices are so appealing :)

 With a redundant, distributed filesystem, you simply do not need any 
 complexity at all at the block device level.  You don't even need RAID.

 It is my hope that you will put your skills towards a distributed 
 filesystem :)  Of the current solutions, GFS (currently in kernel) scales 
 poorly, and NFS v4.1 is amazingly bloated and overly complex.

 I've been waiting for years for a smart person to come along and write a 
 POSIX-only distributed filesystem.

What exactly do you mean by POSIX-only?

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread J. Bruce Fields

On Fri, Sep 14, 2007 at 06:32:11PM -0400, Jeff Garzik wrote:
 J. Bruce Fields wrote:
 On Fri, Sep 14, 2007 at 05:14:53PM -0400, Jeff Garzik wrote:
 J. Bruce Fields wrote:
 On Fri, Sep 14, 2007 at 03:07:46PM -0400, Jeff Garzik wrote:
 I've been waiting for years for a smart person to come along and write 
 a POSIX-only distributed filesystem.
 What exactly do you mean by POSIX-only?
 Don't bother supporting attributes, file modes, and other details not 
 supported by POSIX.  The prime example being NFSv4, which is larded down 
 with Windows features.
 I am sympathetic  Cutting those out may still leave you with
 something pretty complicated, though.

 Far less complicated than NFSv4.1 though (which is easy :))

One would hope so.

 NFSv4.1 adds to the fun, by throwing interoperability completely out the 
 window.
 What parts are you worried about in particular?

 I'm not worried; I'm stating facts as they exist today (draft 13):

 NFS v4.1 does something completely without precedent in the history of NFS: 
  the specification is defined such that interoperability is -impossible- to 
 guarantee.

 pNFS permits private and unspecified layout types.  This means it is 
 impossible to guarantee that one NFSv4.1 implementation will be able to 
 talk another NFSv4.1 implementation.

No, servers are required to support ordinary nfs operations to the
metadata server.

At least, that's the way it was last I heard, which was a while ago.  I
agree that it'd stink (for any number of reasons) if you ever *had* to
get a layout to access some file.

Was that your main concern?

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Distributed storage. Move away from char device ioctls.

2007-09-14 Thread J. Bruce Fields

On Sat, Sep 15, 2007 at 12:08:42AM -0400, Jeff Garzik wrote:
 J. Bruce Fields wrote:
 No, servers are required to support ordinary nfs operations to the
 metadata server.
 At least, that's the way it was last I heard, which was a while ago.  I
 agree that it'd stink (for any number of reasons) if you ever *had* to
 get a layout to access some file.
 Was that your main concern?

 I just sorta assumed you could fall back to the NFSv4.0 mode of operation, 
 going through the metadata server for all data accesses.

Right.  So any two pNFS implementations *will* be able to talk to each
other; they just may not be able to use the (possibly higher-bandwidth)
read/write path that pNFS gives them.

 But look at that choice in practice:  you can either ditch pNFS completely, 
 or use a proprietary solution.  The market incentives are CLEARLY tilted in 
 favor of makers of proprietary solutions.

I doubt somebody would go to all the trouble to implement pNFS and then
present their customers with that kind of choice.  But maybe I'm missing
something.  What market incentives do you see that would make that more
attractive than either 1) using a standard fully-specified layout type,
or 2) just implementing your own proprietary protocol instead of pNFS?

 Overall, my main concern is that NFSv4.1 is no longer an open architecture 
 solution.  The no-pNFS or proprietary platform choice merely illustrate 
 one of many negative aspects of this architecture.

It's always been possible to extend NFS in various ways if you want.
You could use sideband protocols with v2 and v3, for example.  People
have done that.  Some of them have been standardized and widely
implemented, some haven't.  You could probably add your own compound ops
to v4 if you wanted, I guess.

And there's advantages to experimenting with extensions first and then
standardizing when you figure out what works.  I wish it happened that
way more often.

 Now, for the first time in NFS's history (AFAIK), the protocol is no longer 
 completely specified, completely known.  No longer a closed loop.  
 Private layout types mean that it is _highly_ unlikely that any OS or 
 appliance or implementation will be able to claim full NFS compatibility.

Do you know of any such private layout types?

This is kind of a boring argument, isn't it?  I'd rather hear whatever
ideas you have for a new distributed filesystem protocol.

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields

On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
 as already described old temporary sockets (client is gone) of lockd aren't
 closed after some time. So, with enough clients and some time gone, there
 are 80 open dangling sockets and you start getting messages of the form:
 
 lockd: too many open TCP sockets, consider increasing the number of nfsd 
 threads.

Thanks for working on this problem!

 If I understand the code then the intention was that the server closes
 temporary sockets after about 6 to 12 minutes:
 
   a timer is started which calls svc_age_temp_sockets every 6 minutes.
 
   svc_age_temp_sockets:
   if a socket is marked OLD it gets closed.
   sockets which are not marked as OLD are marked OLD
 
   every time the sockets receives something OLD is cleared.
 
 But svc_age_temp_sockets never closes any socket though because it only
 closes sockets with svsk-sk_inuse == 0. This seems to be a bug.
 
 Here is a patch against 2.6.22.6 which changes the test to
 svsk-sk_inuse = 0 which was probably meant. The patched kernel runs fine
 here. Unused sockets get closed (after 6 to 12 minutes)

So the fact that this changes the behavior means that sk_inuse is taking
on negative values.  This can't be right--how can something like
svc_sock_put() (which does an atomic_dec_and_test) work in that case?

I wish I had time today to figure out what's going on in this case.  But
from a quick through svsock.c for sk_inuse, it looks odd; I'm suspicious
of anything without the stereotyped behavior--initializing to one,
atomic_inc()ing whenever someone takes a reference, and
atomic_dec_and_test()ing whenever someone drops it

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields

On Wed, Sep 12, 2007 at 09:37:29AM -0400, bfields wrote:
 So the fact that this changes the behavior means that sk_inuse is taking
 on negative values.

Uh, no, I misread the tests, sorry.  I'm not awake.--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields

On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
 So it is in 2.6.21 and later and should probably go to .stable for .21
 and .22.
 
 Bruce:  for you :-)

OK, thanks!  But, (as is alas often the case) I'm still confused:

   if (!test_and_set_bit(SK_OLD, svsk-sk_flags))
   continue;
 - if (atomic_read(svsk-sk_inuse) || test_bit(SK_BUSY, 
 svsk-sk_flags))
 + if (atomic_read(svsk-sk_inuse)  1
 + || test_bit(SK_BUSY, svsk-sk_flags))
   continue;
   atomic_inc(svsk-sk_inuse);
   list_move(le, to_be_aged);

What is it that ensures svsk-sk_inuse isn't incremented or SK_BUSY set
after that test?  Not all the code that does either of those is under
the same serv-sv_lock lock that this code is.

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread J. Bruce Fields

On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
 On Wednesday 12 September 2007, J. Bruce Fields wrote:
  On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
   So it is in 2.6.21 and later and should probably go to .stable for .21
   and .22.
   
   Bruce:  for you :-)
  
  OK, thanks!  But, (as is alas often the case) I'm still confused:
  
 if (!test_and_set_bit(SK_OLD, svsk-sk_flags))
 continue;
   - if (atomic_read(svsk-sk_inuse) || test_bit(SK_BUSY, 
   svsk-sk_flags))
   + if (atomic_read(svsk-sk_inuse)  1
   + || test_bit(SK_BUSY, svsk-sk_flags))
 continue;
 atomic_inc(svsk-sk_inuse);
 list_move(le, to_be_aged);
  
  What is it that ensures svsk-sk_inuse isn't incremented or SK_BUSY set
  after that test?  Not all the code that does either of those is under
  the same serv-sv_lock lock that this code is.
  
 
 This should not matter - SK_CLOSED may be set at any time.
 
 svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then 
 enqueues it. If SK_BUSY is set its already enqueued and svc_sock_enqueue 
 ensures that it is not enqueued twice.

Oh, got it.  And the list manipulation is safe thanks to sv_lock.  Neat,
thanks.  Can you verify that this solves your problem?

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [NFS] problems with lockd in 2.6.22.6

2007-09-07 Thread J. Bruce Fields

On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
 Hello,
 
 we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since then 
 we get the message
 
 lockd: too many open TCP sockets, consider increasing the number of nfsd 
 threads
 lockd: last TCP connect from ^\\236^\É^D
 
 1) These random characters in the second line are caused by a bug in 
 svc_tcp_accept.
 I already posted this patch on netdev@vger.kernel.org:

Thanks, I've applied that.  (The bug is a little subtle: there's
actually two previous __svc_print_addr() calls which might have
initialized buf correctly, and it's not obvious that the second isn't
always called (since it's in a dprintk, which is a macro that expands
into a printk inside a conditional)).

 with this patch applied one gets something like
 
 lockd: too many open TCP sockets, consider increasing the number of
 nfsd threads lockd: last TCP connect from 10.11.0.12, port=784
 
 
 2) The number of nfsd threads we are running on the machine is 1024.
 So this is not the problem. It seems, though, that in the case of
 lockd svc_tcp_accept does not check the number of nfsd threads but the
 number of lockd threads which is one.  As soon as the number of open
 lockd sockets surpasses 80 this message gets logged.  This usually
 happens every evening when a lot of people shutdown their workstation.

So to be clear: there's not an actual problem here other than that the
logs are getting spammed?  (Not that that isn't a problem in itself.)

 3) For unknown reason these sockets then remain open. In the morning
 when people start their workstation again we therefor not only get a
 lot of these messages again but often the nfs-server does not proberly
 work any more. Restarting the nfs-daemon is a workaround.

Hm, thanks.

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [NFS] [3/4] 2.6.23-rc4: known regressions

2007-08-29 Thread J. Bruce Fields

On Wed, Aug 29, 2007 at 05:27:39PM +0200, Michal Piotrowski wrote:
 FS
 
 Subject : [NFSD OOPS] 2.6.23-rc1-git10
 References  : http://lkml.org/lkml/2007/8/2/462
 Last known good : ?
 Submitter   : Andrew Clayton [EMAIL PROTECTED]
 Caused-By   : ?
 Handled-By  : ?
 Status  : unknown

This is a bug, but (alas) appears to be a preexisting bug, so may not
belong on this list.--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 8/8] AFS: Add security support

2007-04-11 Thread J. Bruce Fields

On Wed, Apr 11, 2007 at 08:10:37PM +0100, David Howells wrote:
 Add security support to the AFS filesystem.  Kerberos IV tickets are
 added as RxRPC keys are added to the session keyring with the klog
 program.  open() and other VFS operations then find this ticket with
 request_key() and either use it immediately (eg: mkdir, unlink) or
 attach it to a file descriptor (open).

Just curious--when is the actual crypto done?  There doesn't seem to be
any in this patch.

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 8/8] AFS: Add security support

2007-04-11 Thread J. Bruce Fields

On Wed, Apr 11, 2007 at 09:10:32PM +0100, David Howells wrote:
 J. Bruce Fields [EMAIL PROTECTED] wrote:
 
  Just curious--when is the actual crypto done?  There doesn't seem to be
  any in this patch.
 
 See AF_RXRPC patch:
 
   http://people.redhat.com/~dhowells/rxrpc/04-af_rxrpc.diff
 
 You turn on CONFIG_RXKAD and load the rxkad module thus built (assuming you
 haven't built it in) after loading the af_rxrpc module.  I probably should've
 mentioned that in the cover.

Oh, I see--didn't think to check net/rxrpc.  Thanks!

--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

aironet: possible irq lock inversion dependency

2006-07-27 Thread J. Bruce Fields

I get this when I bring up a network interface (a cisco aironet wireless
minipci card) on 2.6.18-rc2.

Let me know if any additonal information would be useful.

--b.


Jul 27 15:18:44 puzzle kernel: 
Jul 27 15:18:44 puzzle kernel: 
=
Jul 27 15:18:44 puzzle kernel: [ INFO: possible irq lock inversion dependency 
detected ]
Jul 27 15:18:44 puzzle kernel: 
-
Jul 27 15:18:44 puzzle kernel: pump/4811 just changed the state of lock:
Jul 27 15:18:44 puzzle kernel:  (ai-aux_lock){+...}, at: [c0427608] 
airo_interrupt+0x3a8/0x1170
Jul 27 15:18:44 puzzle kernel: but this lock took another, hard-irq-unsafe lock 
in the past:
Jul 27 15:18:44 puzzle kernel:  (skb_queue_lock_key){-+..}
Jul 27 15:18:44 puzzle kernel: 
Jul 27 15:18:44 puzzle kernel: and interrupts could create inverse lock 
ordering between them.
Jul 27 15:18:44 puzzle kernel: 
Jul 27 15:18:44 puzzle kernel: 
Jul 27 15:18:44 puzzle kernel: other info that might help us debug this:
Jul 27 15:18:44 puzzle kernel: 1 lock held by pump/4811:
Jul 27 15:18:44 puzzle kernel:  #0:  (dev-_xmit_lock){-...}, at: [c04fb7d6] 
__qdisc_run+0x56/0x1c0
Jul 27 15:18:44 puzzle kernel: 
Jul 27 15:18:44 puzzle kernel: the first lock's dependencies:
Jul 27 15:18:44 puzzle kernel: - (ai-aux_lock){+...} ops: 2 {
Jul 27 15:18:44 puzzle kernel:initial-use  at:
Jul 27 15:18:44 puzzle kernel: [c013015a] 
lock_acquire+0x5a/0x80
Jul 27 15:18:44 puzzle kernel: [c056869e] 
_spin_lock_irqsave+0x2e/0x40
Jul 27 15:18:44 puzzle kernel: [c04271c4] 
mpi_start_xmit+0x54/0xf0
Jul 27 15:18:44 puzzle kernel: [c04eefe5] 
dev_hard_start_xmit+0x205/0x2d0
Jul 27 15:18:44 puzzle kernel: [c04fb7f4] 
__qdisc_run+0x74/0x1c0
Jul 27 15:18:44 puzzle kernel: [c04f09e5] 
dev_queue_xmit+0x175/0x2f0
Jul 27 15:18:44 puzzle kernel: [c05475b0] 
packet_sendmsg+0x1e0/0x260
Jul 27 15:18:44 puzzle kernel: [c04e3c4f] 
sock_sendmsg+0xcf/0xf0
Jul 27 15:18:44 puzzle kernel: [c04e4070] 
sys_sendto+0xc0/0xf0
Jul 27 15:18:44 puzzle kernel: [c04e5570] 
sys_socketcall+0x140/0x1e0
Jul 27 15:18:44 puzzle kernel: [c0102e37] 
syscall_call+0x7/0xb
Jul 27 15:18:44 puzzle kernel:in-hardirq-W at:
Jul 27 15:18:44 puzzle kernel: [c013015a] 
lock_acquire+0x5a/0x80
Jul 27 15:18:44 puzzle kernel: [c056869e] 
_spin_lock_irqsave+0x2e/0x40
Jul 27 15:18:44 puzzle kernel: [c0427608] 
airo_interrupt+0x3a8/0x1170
Jul 27 15:18:44 puzzle kernel: [c013a807] 
handle_IRQ_event+0x27/0x60
Jul 27 15:18:44 puzzle kernel: [c013a8d4] 
__do_IRQ+0x94/0x110
Jul 27 15:18:44 puzzle kernel: [c0104c6a] 
do_IRQ+0xaa/0xf0
Jul 27 15:18:44 puzzle kernel: [c0103105] 
common_interrupt+0x25/0x30
Jul 27 15:18:44 puzzle kernel: [c04e9cce] 
skb_release_data+0x4e/0x90
Jul 27 15:18:44 puzzle kernel: [c04e9a8e] 
kfree_skbmem+0xe/0x90
Jul 27 15:18:44 puzzle kernel: [c04e9b6e] 
__kfree_skb+0x5e/0xe0
Jul 27 15:18:44 puzzle kernel: [c04e9c09] 
kfree_skb+0x19/0x30
Jul 27 15:18:44 puzzle kernel: [c04f01dc] 
dev_kfree_skb_any+0x7c/0x90
Jul 27 15:18:44 puzzle kernel: [c04238fc] 
mpi_send_packet+0x12c/0x230
Jul 27 15:18:44 puzzle kernel: [c0427210] 
mpi_start_xmit+0xa0/0xf0
Jul 27 15:18:44 puzzle kernel: [c04eefe5] 
dev_hard_start_xmit+0x205/0x2d0
Jul 27 15:18:44 puzzle kernel: [c04fb7f4] 
__qdisc_run+0x74/0x1c0
Jul 27 15:18:44 puzzle kernel: [c04f09e5] 
dev_queue_xmit+0x175/0x2f0
Jul 27 15:18:44 puzzle kernel: [c05475b0] 
packet_sendmsg+0x1e0/0x260
Jul 27 15:18:44 puzzle kernel: [c04e3c4f] 
sock_sendmsg+0xcf/0xf0
Jul 27 15:18:44 puzzle kernel: [c04e4070] 
sys_sendto+0xc0/0xf0
Jul 27 15:18:44 puzzle kernel: [c04e5570] 
sys_socketcall+0x140/0x1e0
Jul 27 15:18:44 puzzle kernel: [c0102e37] 
syscall_call+0x7/0xb
Jul 27 15:18:44 puzzle kernel:  }
Jul 27 15:18:44 puzzle kernel:  ... key  at: [c0abcb20] 
__key.23102+0x0/0x8
Jul 27 15:18:44 puzzle kernel:  - (skb_queue_lock_key){-+..} ops: 2030 {
Jul 27 15:18:44 puzzle kernel: initial-use  at:
Jul 27 15:18:44 puzzle kernel:   [c013015a] 
lock_acquire+0x5a/0x80
Jul 27 15:18:44 puzzle kernel:   [c056869e] 
_spin_lock_irqsave+0x2e/0x40
Jul 27 15:18:44 puzzle kernel:

Re: [PATCH] crypto_free_tfm callers do not need to check for NULL

2005-08-30 Thread J. Bruce Fields

On Tue, Aug 30, 2005 at 10:45:54PM +0200, Jesper Juhl wrote:
 Since the patch to add a NULL short-circuit to crypto_free_tfm() went in, 
 there's no longer any need for callers of that function to check for NULL.
...
 Feedback, ACK, NACK, etc welcome. 

I've no problem with the auth_gss or nfsv4 bits.--b.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

66 matches

Mail list logo