Re: [PATCH net-next] SUNRPC: drop pointless static qualifier in xdr_get_next_encode_buffer()
On Thu, 2018-11-08 at 02:04 +, YueHaibing wrote: > There is no need to have the '__be32 *p' variable static since new > value > always be assigned before use it. > > Signed-off-by: YueHaibing > --- > net/sunrpc/xdr.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c > index 2bbb8d3..d80b156 100644 > --- a/net/sunrpc/xdr.c > +++ b/net/sunrpc/xdr.c > @@ -546,7 +546,7 @@ void xdr_commit_encode(struct xdr_stream *xdr) > static __be32 *xdr_get_next_encode_buffer(struct xdr_stream *xdr, > size_t nbytes) > { > - static __be32 *p; > + __be32 *p; > int space_left; > int frag1bytes, frag2bytes; > Ouch, that's a really nasty bug that could definitely cause corruption if you have 2 threads simultaneously calling this function! This really deserves to be a stable patch. Thank you, YueHaibing! Bruce, do you want to shepherd this one in? -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.mykleb...@hammerspace.com
Re: general protection fault in encode_rpcb_string
On Tue, 2018-04-17 at 17:33 -0400, J. Bruce Fields wrote: > On Mon, Apr 16, 2018 at 09:02:01PM -0700, syzbot wrote: > > syzbot hit the following crash on bpf-next commit > > 5d1365940a68dd57b031b6e3c07d7d451cd69daf (Thu Apr 12 18:09:05 2018 > > +) > > Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net > > syzbot dashboard link: > > https://syzkaller.appspot.com/bug?extid=4b98281f2401ab849f4b > > > > So far this crash happened 2 times on bpf-next. > > C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6433835633 > > 868800 > > syzkaller reproducer: > > https://syzkaller.appspot.com/x/repro.syz?id=6407311794896896 > > Raw console output: > > https://syzkaller.appspot.com/x/log.txt?id=5861511176126464 > > Based on that, looks like it's attempting an nfs mount while causing > kmalloc failures? > > Probably one of rpcb->r_netid, r_addr, or r_owner was bad in > rpcb_enc_getaddr. > > Hm, and previous log makes it look like it was an > rpc_sockaddr2uaddr() > in rpcb_getport_async() that was made to fail. Do we need to check > for > failure of: > > map->r_addr = rpc_sockaddr2uaddr(sap, GFP_ATOMIC); > > ? Yes, and we can probably convert it, and the other GFP_ATOMIC allocations in the rpcbind client to use GFP_NOFS in order to improve reliability. Cheers Trond -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammer.space
Re: sunrpc: infinite unkillable console spam in xs_tcp_setup_socket
On Mon, 2017-11-20 at 14:02 +0100, Dmitry Vyukov wrote: > Hello, > > The following program triggers infinite stream of the following > output > on console. The program is unkillable and this effectively brings the > machine down: > > > ** 16 printk messages dropped ** [12875.022917] xs_tcp_setup_socket: > connect returned unhandled error -113 > Does the following fix the issue? 8<- From f48d3f01df45f50f0145060f5272ccf1aea855ac Mon Sep 17 00:00:00 2001 From: Trond Myklebust <trond.mykleb...@primarydata.com> Date: Fri, 24 Nov 2017 12:00:24 -0500 Subject: [PATCH] SUNRPC: Allow connect to return EHOSTUNREACH Reported-by: Dmitry Vyukov <dvyu...@google.com> Signed-off-by: Trond Myklebust <trond.mykleb...@primarydata.com> --- net/sunrpc/xprtsock.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 4dad5da388d6..8cb40f8ffa5b 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -2437,6 +2437,7 @@ static void xs_tcp_setup_socket(struct work_struct *work) case -ECONNREFUSED: case -ECONNRESET: case -ENETUNREACH: + case -EHOSTUNREACH: case -EADDRINUSE: case -ENOBUFS: /* -- 2.14.3 -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.mykleb...@primarydata.com
Re: [PATCH 06/21] nfs client: exit_net cleanup check added
On Sun, 2017-11-05 at 19:48 +0300, Vasily Averin wrote: > On 2017-11-05 19:02, Trond Myklebust wrote: > > On Sun, 2017-11-05 at 13:00 +0300, Vasily Averin wrote: > > > Be sure that nfs_client_list and nfs_volume_list lists > > > initialized > > > in net_init hook were return to initial state in net_exit hook. > > > > > > Signed-off-by: Vasily Averin <v...@virtuozzo.com> > > > --- > > > fs/nfs/client.c | 4 > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/fs/nfs/client.c b/fs/nfs/client.c > > > index 22880ef..7c0691c 100644 > > > --- a/fs/nfs/client.c > > > +++ b/fs/nfs/client.c > > > @@ -204,6 +204,10 @@ void nfs_cleanup_cb_ident_idr(struct net > > > *net) > > > struct nfs_net *nn = net_generic(net, nfs_net_id); > > > > > > idr_destroy(>cb_ident_idr); > > > + WARN(!list_empty(>nfs_client_list), > > > + "net %p exit: nfs_client_list is not empty\n", > > > net); > > > + WARN(!list_empty(>nfs_volume_list), > > > + "net %p exit: nfs_volume_list is not empty\n", > > > net); > > > } > > > > > > > Why do we need these? Is there a specific bug that you are trying > > to > > track down? > > I hope such checks allows to detect leaked per-netns objects. > Also I hope that all new pernet_operations will inherit such checks > too. > > I assume that elements added into per-net lists should not live > longer than net namespace, > and should be deleted from the list. I think exit_net hook is good > place for such check. > > Recently I've found lost list_entry and enabled timer on stop of net > namespace. > Then I've reviewed all existing pernet_operations and found that many > drivers > have such checks already. So I decided to complete this task and add > such checks > into all affected subsystems. > Unless there is a known problem that has specific debugging needs, this kind of assert should take the form of a WARN_ONCE() so that it doesn't fill user syslogs with redundant warnings if triggered. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.mykleb...@primarydata.com
Re: [PATCH 06/21] nfs client: exit_net cleanup check added
On Sun, 2017-11-05 at 13:00 +0300, Vasily Averin wrote: > Be sure that nfs_client_list and nfs_volume_list lists initialized > in net_init hook were return to initial state in net_exit hook. > > Signed-off-by: Vasily Averin <v...@virtuozzo.com> > --- > fs/nfs/client.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/fs/nfs/client.c b/fs/nfs/client.c > index 22880ef..7c0691c 100644 > --- a/fs/nfs/client.c > +++ b/fs/nfs/client.c > @@ -204,6 +204,10 @@ void nfs_cleanup_cb_ident_idr(struct net *net) > struct nfs_net *nn = net_generic(net, nfs_net_id); > > idr_destroy(>cb_ident_idr); > + WARN(!list_empty(>nfs_client_list), > + "net %p exit: nfs_client_list is not empty\n", net); > + WARN(!list_empty(>nfs_volume_list), > + "net %p exit: nfs_volume_list is not empty\n", net); > } > Why do we need these? Is there a specific bug that you are trying to track down? -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.mykleb...@primarydata.com
Re: possible deadlock in flush_work (2)
dep.c:1258 > > check_prev_add kernel/locking/lockdep.c:1901 [inline] > > check_prevs_add kernel/locking/lockdep.c:2018 [inline] > > validate_chain kernel/locking/lockdep.c:2460 [inline] > > __lock_acquire+0x2f55/0x3d50 kernel/locking/lockdep.c:3487 > > lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3991 > > start_flush_work kernel/workqueue.c:2851 [inline] > > flush_work+0x57f/0x8a0 kernel/workqueue.c:2882 > > __cancel_work_timer+0x30a/0x7e0 kernel/workqueue.c:2954 > > cancel_work_sync+0x17/0x20 kernel/workqueue.c:2990 > > xprt_destroy+0xa1/0x130 net/sunrpc/xprt.c:1467 > > xprt_destroy_kref net/sunrpc/xprt.c:1477 [inline] > > kref_put include/linux/kref.h:70 [inline] > > xprt_put+0x38/0x40 net/sunrpc/xprt.c:1501 > > rpc_task_release_client+0x299/0x430 net/sunrpc/clnt.c:986 > > rpc_release_resources_task+0x7f/0xa0 net/sunrpc/sched.c:1020 > > rpc_release_task net/sunrpc/sched.c:1059 [inline] > > __rpc_execute+0x4d9/0xe70 net/sunrpc/sched.c:824 > > rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:848 > > process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2112 > > worker_thread+0x223/0x1990 kernel/workqueue.c:2246 > > kthread+0x38b/0x470 kernel/kthread.c:242 > > ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 > > > +sunrpc maintainers A fix for this has already been merged. Please retest with an up to date kernel. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.mykleb...@primarydata.com
Re: [PATCH 01/23] net, sunrpc: convert rpc_cred.cr_count from atomic_t to refcount_t
On Fri, 2017-03-17 at 09:02 -0400, Jeff Layton wrote: > On Fri, 2017-03-17 at 12:50 +0000, Trond Myklebust wrote: > > On Fri, 2017-03-17 at 14:10 +0200, Elena Reshetova wrote: > > > refcount_t type and corresponding API should be > > > used instead of atomic_t when the variable is used as > > > a reference counter. This allows to avoid accidental > > > refcounter overflows that might lead to use-after-free > > > situations. > > > > > > Signed-off-by: Elena Reshetova <elena.reshet...@intel.com> > > > Signed-off-by: Hans Liljestrand <ishkam...@gmail.com> > > > Signed-off-by: Kees Cook <keesc...@chromium.org> > > > Signed-off-by: David Windsor <dwind...@gmail.com> > > > --- > > > include/linux/sunrpc/auth.h | 8 > > > net/sunrpc/auth.c | 12 ++-- > > > 2 files changed, 10 insertions(+), 10 deletions(-) > > > > > > diff --git a/include/linux/sunrpc/auth.h > > > b/include/linux/sunrpc/auth.h > > > index b1bc62b..bd36e0b 100644 > > > --- a/include/linux/sunrpc/auth.h > > > +++ b/include/linux/sunrpc/auth.h > > > @@ -15,7 +15,7 @@ > > > #include > > > #include > > > > > > -#include > > > +#include > > > #include > > > #include > > > #include > > > @@ -68,7 +68,7 @@ struct rpc_cred { > > > #endif > > > unsigned long cr_expire; /* when > > > to gc > > > */ > > > unsigned long cr_flags; /* various > > > flags */ > > > - atomic_tcr_count; /* ref count */ > > > + refcount_t cr_count; /* ref count > > > */ > > > > > > > NACK. That's going to be hitting > > WARN_ONCE(!refcount_inc_not_zero(r), > > "refcount_t: increment on 0; use-after-free.\n") like there's no > > tomorrow... > > > > Please stop with these automated conversions. They are going to > > cause a > > lot more bugs than they fix. > > > > Agreed. These patchsets are touching places where we've already > banged > out most of the refcounting bugs. I'm against doing large scale > conversions like this without a damned good reason. > > I think it may be best to do this sort of thing in a more piecemeal > fashion. Pick a subsystem or two and do the conversions there to > prove > that they're better than what we have. If the subsystem already has > problems with its refcounting, then so much the better. Point to bugs > that this new infrastructure helped find. > > Encourage people to adopt your new infrastructure as new refcounted > objects are introduced into the kernel. You might even consider a LWN > article about this. > > Eventually we'll get around to changing existing code to use it, once > there is a sufficient advantage to doing so. Most likely when we're > reworking the code for other reasons, or when we're chasing some > horrid > refcounting bug and think that this might help find it. The main issue is that this "refcount_t" implementation appears to be assuming that there is one and only one model for refcounts (the one where a value of "0" means "free me immediately"). The kernel has a plethora of object caching implementations where this is simply not the case; the dcache is a prime example, and this cache is another. In both these implementation, the atomic_t variable is being used more as a semaphore-style lock that prevents freeing of the object while it is in active use as opposed to being freeable, but cached. This is why these automated conversions are a nuisance and a source of bugs. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.mykleb...@primarydata.com
Re: [PATCH 01/23] net, sunrpc: convert rpc_cred.cr_count from atomic_t to refcount_t
On Fri, 2017-03-17 at 14:10 +0200, Elena Reshetova wrote: > refcount_t type and corresponding API should be > used instead of atomic_t when the variable is used as > a reference counter. This allows to avoid accidental > refcounter overflows that might lead to use-after-free > situations. > > Signed-off-by: Elena Reshetova <elena.reshet...@intel.com> > Signed-off-by: Hans Liljestrand <ishkam...@gmail.com> > Signed-off-by: Kees Cook <keesc...@chromium.org> > Signed-off-by: David Windsor <dwind...@gmail.com> > --- > include/linux/sunrpc/auth.h | 8 > net/sunrpc/auth.c | 12 ++-- > 2 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/include/linux/sunrpc/auth.h > b/include/linux/sunrpc/auth.h > index b1bc62b..bd36e0b 100644 > --- a/include/linux/sunrpc/auth.h > +++ b/include/linux/sunrpc/auth.h > @@ -15,7 +15,7 @@ > #include > #include > > -#include > +#include > #include > #include > #include > @@ -68,7 +68,7 @@ struct rpc_cred { > #endif > unsigned long cr_expire; /* when to gc > */ > unsigned long cr_flags; /* various > flags */ > - atomic_tcr_count; /* ref count */ > + refcount_t cr_count; /* ref count */ > NACK. That's going to be hitting WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n") like there's no tomorrow... Please stop with these automated conversions. They are going to cause a lot more bugs than they fix. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.mykleb...@primarydata.com
Re: [PATCHv1] sunrpc: fix write space race causing stalls
> On Sep 16, 2016, at 13:29, David Vrabel <david.vra...@citrix.com> wrote: > > On 16/09/16 18:06, Trond Myklebust wrote: >> >>> On Sep 16, 2016, at 12:41, David Vrabel <david.vra...@citrix.com> wrote: >>> >>> On 16/09/16 17:01, Trond Myklebust wrote: >>>> >>>>> On Sep 16, 2016, at 08:28, David Vrabel <david.vra...@citrix.com> wrote: >>>>> >>>>> Write space becoming available may race with putting the task to sleep >>>>> in xprt_wait_for_buffer_space(). The existing mechanism to avoid the >>>>> race does not work. >>>>> >>>>> This (edited) partial trace illustrates the problem: >>>>> >>>>> [1] rpc_task_run_action: task:43546@5 ... action=call_transmit >>>>> [2] xs_write_space <-xs_tcp_write_space >>>>> [3] xprt_write_space <-xs_write_space >>>>> [4] rpc_task_sleep: task:43546@5 ... >>>>> [5] xs_write_space <-xs_tcp_write_space >>>>> >>>>> [1] Task 43546 runs but is out of write space. >>>>> >>>>> [2] Space becomes available, xs_write_space() clears the >>>>> SOCKWQ_ASYNC_NOSPACE bit. >>>>> >>>>> [3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but >>>>> this has not yet been queued and the wake up is lost. >>>>> >>>>> [4] xs_nospace() is called which calls xprt_wait_for_buffer_space() >>>>> which queues task 43546. >>>>> >>>>> [5] The call to sk->sk_write_space() at the end of xs_nospace() (which >>>>> is supposed to handle the above race) does not call >>>>> xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and >>>>> thus the task is not woken. >>>>> >>>>> Fix the race by have xprt_wait_for_buffer_space() check for write >>>>> space after putting the task to sleep. >>>>> >>>>> Signed-off-by: David Vrabel <david.vra...@citrix.com> >>>>> --- >>>>> include/linux/sunrpc/xprt.h | 1 + >>>>> net/sunrpc/xprt.c | 4 >>>>> net/sunrpc/xprtsock.c | 21 +++-- >>>>> 3 files changed, 24 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h >>>>> index a16070d..621e74b 100644 >>>>> --- a/include/linux/sunrpc/xprt.h >>>>> +++ b/include/linux/sunrpc/xprt.h >>>>> @@ -129,6 +129,7 @@ struct rpc_xprt_ops { >>>>> void(*connect)(struct rpc_xprt *xprt, struct rpc_task >>>>> *task); >>>>> void * (*buf_alloc)(struct rpc_task *task, size_t size); >>>>> void(*buf_free)(void *buffer); >>>>> + bool(*have_write_space)(struct rpc_xprt *task); >>>>> int (*send_request)(struct rpc_task *task); >>>>> void(*set_retrans_timeout)(struct rpc_task *task); >>>>> void(*timer)(struct rpc_xprt *xprt, struct rpc_task *task); >>>>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c >>>>> index ea244b2..d3c1b1e 100644 >>>>> --- a/net/sunrpc/xprt.c >>>>> +++ b/net/sunrpc/xprt.c >>>>> @@ -502,6 +502,10 @@ void xprt_wait_for_buffer_space(struct rpc_task >>>>> *task, rpc_action action) >>>>> >>>>> task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0; >>>>> rpc_sleep_on(>pending, task, action); >>>>> + >>>>> + /* Write space notification may race with putting task to sleep. */ >>>>> + if (xprt->ops->have_write_space(xprt)) >>>>> + rpc_wake_up_queued_task(>pending, task); >>>>> } >>>>> EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space); >>>>> >>>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >>>>> index bf16883..211de5b 100644 >>>>> --- a/net/sunrpc/xprtsock.c >>>>> +++ b/net/sunrpc/xprtsock.c >>>>> @@ -472,8 +472,6 @@ static int xs_nospace(struct rpc_task *task) >>>>> >>>>> spin_unlock_bh(>transport_lock); >>>>> >>>>> - /* Race breaker in case memory is freed before above code is called */ >>>>> - sk->sk_write_space(sk); >>>>> return ret; >>>>> } >>>> >>>> Instead of these callbacks, why not just add a call to >>>> sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk) after queueing the task in >>>> xs_nospace()? Won’t that fix the existing race breaker? >>> >>> I don't see how that would help. If sk->sk_write_space was already >>> called, SOCKWQ_ASYNC_NOSPACE will still be clear and the next call to >>> sk->sk_write_space will still be a nop. >> >> Sorry. Copy+paste error. I meant SOCKWQ_ASYNC_NOSPACE. >> >>> >>> Or did you mean SOCKWQ_ASYNC_NOSPACE here? It doesn't seem right to set >>> this bit when we don't know if there's space or not. >> >> Why not? > > I prefer my solution because: > > a) It obviously fixes the race (games with bits are less understandable). > > b) It requires fewer atomic ops. > > c) It doesn't require me to understand what the behaviour of the > socket-internal SOCKWQ_ASYNC_NOSPACE bit is or should be. > > d) I'm not sure I understand the objection to the additional > have_write_space method -- it has simple, clear behaviour. > I don’t see the point of adding 24 lines of code over 3 different files if the problem can be solved with 1 line of code.
Re: [PATCHv1] sunrpc: fix write space race causing stalls
> On Sep 16, 2016, at 12:41, David Vrabel <david.vra...@citrix.com> wrote: > > On 16/09/16 17:01, Trond Myklebust wrote: >> >>> On Sep 16, 2016, at 08:28, David Vrabel <david.vra...@citrix.com> wrote: >>> >>> Write space becoming available may race with putting the task to sleep >>> in xprt_wait_for_buffer_space(). The existing mechanism to avoid the >>> race does not work. >>> >>> This (edited) partial trace illustrates the problem: >>> >>> [1] rpc_task_run_action: task:43546@5 ... action=call_transmit >>> [2] xs_write_space <-xs_tcp_write_space >>> [3] xprt_write_space <-xs_write_space >>> [4] rpc_task_sleep: task:43546@5 ... >>> [5] xs_write_space <-xs_tcp_write_space >>> >>> [1] Task 43546 runs but is out of write space. >>> >>> [2] Space becomes available, xs_write_space() clears the >>> SOCKWQ_ASYNC_NOSPACE bit. >>> >>> [3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but >>> this has not yet been queued and the wake up is lost. >>> >>> [4] xs_nospace() is called which calls xprt_wait_for_buffer_space() >>> which queues task 43546. >>> >>> [5] The call to sk->sk_write_space() at the end of xs_nospace() (which >>> is supposed to handle the above race) does not call >>> xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and >>> thus the task is not woken. >>> >>> Fix the race by have xprt_wait_for_buffer_space() check for write >>> space after putting the task to sleep. >>> >>> Signed-off-by: David Vrabel <david.vra...@citrix.com> >>> --- >>> include/linux/sunrpc/xprt.h | 1 + >>> net/sunrpc/xprt.c | 4 >>> net/sunrpc/xprtsock.c | 21 +++-- >>> 3 files changed, 24 insertions(+), 2 deletions(-) >>> >>> diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h >>> index a16070d..621e74b 100644 >>> --- a/include/linux/sunrpc/xprt.h >>> +++ b/include/linux/sunrpc/xprt.h >>> @@ -129,6 +129,7 @@ struct rpc_xprt_ops { >>> void(*connect)(struct rpc_xprt *xprt, struct rpc_task >>> *task); >>> void * (*buf_alloc)(struct rpc_task *task, size_t size); >>> void(*buf_free)(void *buffer); >>> + bool(*have_write_space)(struct rpc_xprt *task); >>> int (*send_request)(struct rpc_task *task); >>> void(*set_retrans_timeout)(struct rpc_task *task); >>> void(*timer)(struct rpc_xprt *xprt, struct rpc_task *task); >>> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c >>> index ea244b2..d3c1b1e 100644 >>> --- a/net/sunrpc/xprt.c >>> +++ b/net/sunrpc/xprt.c >>> @@ -502,6 +502,10 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, >>> rpc_action action) >>> >>> task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0; >>> rpc_sleep_on(>pending, task, action); >>> + >>> + /* Write space notification may race with putting task to sleep. */ >>> + if (xprt->ops->have_write_space(xprt)) >>> + rpc_wake_up_queued_task(>pending, task); >>> } >>> EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space); >>> >>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >>> index bf16883..211de5b 100644 >>> --- a/net/sunrpc/xprtsock.c >>> +++ b/net/sunrpc/xprtsock.c >>> @@ -472,8 +472,6 @@ static int xs_nospace(struct rpc_task *task) >>> >>> spin_unlock_bh(>transport_lock); >>> >>> - /* Race breaker in case memory is freed before above code is called */ >>> - sk->sk_write_space(sk); >>> return ret; >>> } >> >> Instead of these callbacks, why not just add a call to >> sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk) after queueing the task in >> xs_nospace()? Won’t that fix the existing race breaker? > > I don't see how that would help. If sk->sk_write_space was already > called, SOCKWQ_ASYNC_NOSPACE will still be clear and the next call to > sk->sk_write_space will still be a nop. Sorry. Copy+paste error. I meant SOCKWQ_ASYNC_NOSPACE. > > Or did you mean SOCKWQ_ASYNC_NOSPACE here? It doesn't seem right to set > this bit when we don't know if there's space or not. Why not?
Re: [PATCHv1] sunrpc: fix write space race causing stalls
> On Sep 16, 2016, at 08:28, David Vrabelwrote: > > Write space becoming available may race with putting the task to sleep > in xprt_wait_for_buffer_space(). The existing mechanism to avoid the > race does not work. > > This (edited) partial trace illustrates the problem: > > [1] rpc_task_run_action: task:43546@5 ... action=call_transmit > [2] xs_write_space <-xs_tcp_write_space > [3] xprt_write_space <-xs_write_space > [4] rpc_task_sleep: task:43546@5 ... > [5] xs_write_space <-xs_tcp_write_space > > [1] Task 43546 runs but is out of write space. > > [2] Space becomes available, xs_write_space() clears the >SOCKWQ_ASYNC_NOSPACE bit. > > [3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but >this has not yet been queued and the wake up is lost. > > [4] xs_nospace() is called which calls xprt_wait_for_buffer_space() >which queues task 43546. > > [5] The call to sk->sk_write_space() at the end of xs_nospace() (which >is supposed to handle the above race) does not call >xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and >thus the task is not woken. > > Fix the race by have xprt_wait_for_buffer_space() check for write > space after putting the task to sleep. > > Signed-off-by: David Vrabel > --- > include/linux/sunrpc/xprt.h | 1 + > net/sunrpc/xprt.c | 4 > net/sunrpc/xprtsock.c | 21 +++-- > 3 files changed, 24 insertions(+), 2 deletions(-) > > diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h > index a16070d..621e74b 100644 > --- a/include/linux/sunrpc/xprt.h > +++ b/include/linux/sunrpc/xprt.h > @@ -129,6 +129,7 @@ struct rpc_xprt_ops { > void(*connect)(struct rpc_xprt *xprt, struct rpc_task > *task); > void * (*buf_alloc)(struct rpc_task *task, size_t size); > void(*buf_free)(void *buffer); > + bool(*have_write_space)(struct rpc_xprt *task); > int (*send_request)(struct rpc_task *task); > void(*set_retrans_timeout)(struct rpc_task *task); > void(*timer)(struct rpc_xprt *xprt, struct rpc_task *task); > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index ea244b2..d3c1b1e 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -502,6 +502,10 @@ void xprt_wait_for_buffer_space(struct rpc_task *task, > rpc_action action) > > task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0; > rpc_sleep_on(>pending, task, action); > + > + /* Write space notification may race with putting task to sleep. */ > + if (xprt->ops->have_write_space(xprt)) > + rpc_wake_up_queued_task(>pending, task); > } > EXPORT_SYMBOL_GPL(xprt_wait_for_buffer_space); > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index bf16883..211de5b 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -472,8 +472,6 @@ static int xs_nospace(struct rpc_task *task) > > spin_unlock_bh(>transport_lock); > > - /* Race breaker in case memory is freed before above code is called */ > - sk->sk_write_space(sk); > return ret; > } Instead of these callbacks, why not just add a call to sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk) after queueing the task in xs_nospace()? Won’t that fix the existing race breaker? Cheers Trond
Re: [PATCH 1/2] NFSv4.1: work around -Wmaybe-uninitialized warning
> On Aug 31, 2016, at 09:37, Arnd Bergmann <a...@arndb.de> wrote: > > On Wednesday, August 31, 2016 1:17:48 PM CEST Trond Myklebust wrote: >> What version of gcc are you using? I’m unable to reproduce with gcc 6.1.1.. > > This is also on 6.1.1 for ARM. Note that 6e8d666e9253 ("Disable > "maybe-uninitialized" warning globally") turned off those warnings, so > unless you explicitly pass -Wmaybe-uninitialized (e.g. by building with > "make W=1"), you won't get it. > I’m not getting that error on gcc 6.1.1 for x86_64 with either “make W=1” or “make W=2”. “make W=3” does gives rise to one warning in nfs4_slot_get_seqid: /home/trondmy/devel/kernel/linux/fs/nfs/nfs4session.c: In function ‘nfs4_slot_get_seqid’: /home/trondmy/devel/kernel/linux/fs/nfs/nfs4session.c:184:10: warning: conversion to ‘int’ from ‘long int’ may alter its value [-Wconversion] return PTR_ERR(slot); ^ (which is another false positive) but that’s all... > The reason I'm still sending the patches for this warning is that > we do get a number of valid ones (this was the only false positive > out of the seven such warnings since last week). There is a Zen-like quality to IS_ERR() when it casts a const pointer to an unsigned long, back to a non-const pointer, and then back to an unsigned long before comparing it to another unsigned long cast constant negative integer. However, I’m not sure the C99 standard would agree that a positive test result implies we can assume that a simple cast of the same pointer to a signed long will result in a negative, non-zero valued errno. I suspect that if we really want to fix these false negatives, we should probably address that issue. Cheers Trond
Re: [PATCH 1/2] NFSv4.1: work around -Wmaybe-uninitialized warning
> On Aug 31, 2016, at 08:39, Arnd Bergmannwrote: > > A bugfix introduced a harmless gcc warning in nfs4_slot_seqid_in_use: > > fs/nfs/nfs4session.c:203:54: error: 'cur_seq' may be used uninitialized in > this function [-Werror=maybe-uninitialized] > > gcc is not smart enough to conclude that the IS_ERR/PTR_ERR pair > results in a nonzero return value here. Using PTR_ERR_OR_ZERO() > instead makes this clear to the compiler. > > Signed-off-by: Arnd Bergmann > Fixes: e09c978aae5b ("NFSv4.1: Fix Oopsable condition in server callback > races") > --- > fs/nfs/nfs4session.c | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > The patch that caused this just came in for v4.8-rc5. As the warning > is now disabled by default and this is harmless, this can probably > get queued for v4.9 instead. > > I mentioned earlier that I got the new warning for net-next, but > failed to notice that it had come from mainline instead. > > diff --git a/fs/nfs/nfs4session.c b/fs/nfs/nfs4session.c > index b62973045a3e..150c5a1879bf 100644 > --- a/fs/nfs/nfs4session.c > +++ b/fs/nfs/nfs4session.c > @@ -178,12 +178,14 @@ static int nfs4_slot_get_seqid(struct nfs4_slot_table > *tbl, u32 slotid, > __must_hold(>slot_tbl_lock) > { > struct nfs4_slot *slot; > + int ret; > > slot = nfs4_lookup_slot(tbl, slotid); > - if (IS_ERR(slot)) > - return PTR_ERR(slot); > - *seq_nr = slot->seq_nr; > - return 0; > + ret = PTR_ERR_OR_ZERO(slot); > + if (!ret) > + *seq_nr = slot->seq_nr; > + > + return ret; > } > What version of gcc are you using? I’m unable to reproduce with gcc 6.1.1..
Re: [PATCH 1/2] SUNRPC: accept() may return sockets that are still in SYN_RECV
Hi Eric, > On Jul 27, 2016, at 14:59, Eric Dumazet <eric.duma...@gmail.com> wrote: > > On Wed, 2016-07-27 at 14:48 -0400, Fields Bruce James wrote: >> On Tue, Jul 26, 2016 at 04:08:29PM +, Trond Myklebust wrote: >>> >>>> On Jul 26, 2016, at 11:43, J. Bruce Fields <bfie...@fieldses.org> wrote: >>>> >>>> On Tue, Jul 26, 2016 at 09:51:19AM -0400, Trond Myklebust wrote: >>>>> We're seeing traces of the following form: >>>>> >>>>> [10952.396347] svc: transport 88042ba4a 000 dequeued, inuse=2 >>>>> [10952.396351] svc: tcp_accept 88042ba4 a000 sock 88042a6e4c80 >>>>> [10952.396362] nfsd: connect from 10.2.6.1, port=187 >>>>> [10952.396364] svc: svc_setup_socket 8800b99bcf00 >>>>> [10952.396368] setting up TCP socket for reading >>>>> [10952.396370] svc: svc_setup_socket created 8803eb10a000 (inet >>>>> 88042b75b800) >>>>> [10952.396373] svc: transport 8803eb10a000 put into queue >>>>> [10952.396375] svc: transport 88042ba4a000 put into queue >>>>> [10952.396377] svc: server 8800bb0ec000 waiting for data (to = >>>>> 360) >>>>> [10952.396380] svc: transport 8803eb10a000 dequeued, inuse=2 >>>>> [10952.396381] svc_recv: found XPT_CLOSE >>>>> [10952.396397] svc: svc_delete_xprt(8803eb10a000) >>>>> [10952.396398] svc: svc_tcp_sock_detach(8803eb10a000) >>>>> [10952.396399] svc: svc_sock_detach(8803eb10a000) >>>>> [10952.396412] svc: svc_sock_free(8803eb10a000) >>>>> >>>>> i.e. an immediate close of the socket after initialisation. >>>> >>>> Interesting, thanks! >>>> >>>> So the one thing I don't understand is why this is correct behavior for >>>> accept--I thought it wasn't supposed to return a socket until it was >>>> fully established. >>> >>> inet_accept() appears to allow SYN_RECV: >> >> OK. Cc'ing netdev just to make sure we didn't overlook anything. >> > > SYN_RECV after accept() is a TCP Fast Open property I think. > > Maybe you are playing with some global TCP Fast Open settings ? > The Linux kernel client should not be using TCP fast open, but it is possible that some of the other NFSv3 clients we’re using are. Would a standard knfsd listener respond to a TCP fast open request, or would the default behaviour be to ignore it? If the default behaviour for the server is to allow fast open, then we do need these patches, IMO. > >> (Also: what were user-visible symptoms? Mounts failing, or unexpected >> delays?) >> Connection retry storms on the server. >> --b. >> >>> >>> int inet_accept(struct socket *sock, struct socket *newsock, int flags) >>> { >>>struct sock *sk1 = sock->sk; >>>int err = -EINVAL; >>>struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, ); >>> >>>if (!sk2) >>>goto do_err; >>> >>>lock_sock(sk2); >>> >>>sock_rps_record_flow(sk2); >>>WARN_ON(!((1 << sk2->sk_state) & >>> (TCPF_ESTABLISHED | TCPF_SYN_RECV | >>> TCPF_CLOSE_WAIT | TCPF_CLOSE))); >>> >>>sock_graft(sk2, newsock); >>> >>>newsock->state = SS_CONNECTED; >>>err = 0; >>>release_sock(sk2); >>> do_err: >>>return err; >>> } > >
Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))
> On Jun 30, 2016, at 08:59, Steven Rostedtwrote: > > [ resending as a new email, as I'm assuming people do not sort their > INBOX via last email on thread, thus my last email is sitting in the > bottom of everyone's INBOX ] > > I've hit this again. Not sure when it started, but I applied my old > debug trace_printk() patch (attached) and rebooted (4.5.7). I just > tested the latest kernel from Linus's tree (from last nights pull), and > it still gives me the problem. > > Here's the trace I have: > >kworker/3:1H-134 [003] ..s.61.036129: inet_csk_get_port: snum 805 >kworker/3:1H-134 [003] ..s.61.036135: > => sched_clock > => inet_addr_type_table > => security_capable > => inet_bind > => xs_bind > => release_sock > => sock_setsockopt > => __sock_create > => xs_create_sock.isra.19 > => xs_tcp_setup_socket > => process_one_work > => worker_thread > => worker_thread > => kthread > => ret_from_fork > => kthread >kworker/3:1H-134 [003] ..s.61.036136: inet_bind_hash: add 805 >kworker/3:1H-134 [003] ..s.61.036138: > => inet_csk_get_port > => sched_clock > => inet_addr_type_table > => security_capable > => inet_bind > => xs_bind > => release_sock > => sock_setsockopt > => __sock_create > => xs_create_sock.isra.19 > => xs_tcp_setup_socket > => process_one_work > => worker_thread > => worker_thread > => kthread > => ret_from_fork > => kthread >kworker/3:1H-134 [003] 61.036139: xs_bind: RPC: xs_bind > 4.136.255.255:805: ok (0) >kworker/3:1H-134 [003] 61.036140: xs_tcp_setup_socket: RPC: > worker connecting xprt 880407eca800 via tcp to 192.168.23.22 (port 43651) >kworker/3:1H-134 [003] 61.036162: xs_tcp_setup_socket: RPC: > 880407eca800 connect status 115 connected 0 sock state 2 > -0 [001] ..s.61.036450: xs_tcp_state_change: RPC: > xs_tcp_state_change client 880407eca800... > -0 [001] ..s.61.036452: xs_tcp_state_change: RPC: > state 1 conn 0 dead 0 zapped 1 sk_shutdown 0 >kworker/1:1H-136 [001] 61.036476: xprt_connect_status: RPC: > 43 xprt_connect_status: retrying >kworker/1:1H-136 [001] 61.036478: xprt_prepare_transmit: RPC: > 43 xprt_prepare_transmit >kworker/1:1H-136 [001] 61.036479: xprt_transmit: RPC:43 > xprt_transmit(72) >kworker/1:1H-136 [001] 61.036486: xs_tcp_send_request: RPC: > xs_tcp_send_request(72) = 0 >kworker/1:1H-136 [001] 61.036487: xprt_transmit: RPC:43 xmit > complete > -0 [001] ..s.61.036789: xs_tcp_data_ready: RPC: > xs_tcp_data_ready... >kworker/1:1H-136 [001] 61.036798: xs_tcp_data_recv: RPC: > xs_tcp_data_recv started >kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: > reading TCP record fragment of length 24 >kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: > reading XID (4 bytes) >kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: > reading request with XID 2f4c3f88 >kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: > reading CALL/REPLY flag (4 bytes) >kworker/1:1H-136 [001] 61.036801: xs_tcp_data_recv: RPC: > read reply XID 2f4c3f88 >kworker/1:1H-136 [001] ..s.61.036801: xs_tcp_data_recv: RPC: > XID 2f4c3f88 read 16 bytes >kworker/1:1H-136 [001] ..s.61.036802: xs_tcp_data_recv: RPC: > xprt = 880407eca800, tcp_copied = 24, tcp_offset = 24, tcp_reclen = 24 >kworker/1:1H-136 [001] ..s.61.036802: xprt_complete_rqst: RPC:43 > xid 2f4c3f88 complete (24 bytes received) >kworker/1:1H-136 [001] 61.036803: xs_tcp_data_recv: RPC: > xs_tcp_data_recv done >kworker/1:1H-136 [001] 61.036812: xprt_release: RPC:43 > release request 88040b270800 > > > # unhide-tcp > Unhide-tcp 20130526 > Copyright © 2013 Yago Jesus & Patrick Gouin > License GPLv3+ : GNU GPL version 3 or later > http://www.unhide-forensics.info > Used options: > [*]Starting TCP checking > > Found Hidden port that not appears in ss: 805 > What is a “Hidden port that not appears in ss: 805”, and what does this report mean? Are we failing to close a socket? Cheers Trond
Re: [PATCH 8/8] net: sunrpc: Replace CURRENT_TIME by current_fs_time()
On Mon, Feb 22, 2016 at 10:17 AM, Deepa Dinamani <deepa.ker...@gmail.com> wrote: > > CURRENT_TIME macro is not appropriate for filesystems as it > doesn't use the right granularity for filesystem timestamps. > Use current_fs_time() instead. > > Signed-off-by: Deepa Dinamani <deepa.ker...@gmail.com> > Cc: "J. Bruce Fields" <bfie...@fieldses.org> > Cc: Jeff Layton <jlay...@poochiereds.net> > Cc: Trond Myklebust <trond.mykleb...@primarydata.com> > Cc: Anna Schumaker <anna.schuma...@netapp.com> > Cc: "David S. Miller" <da...@davemloft.net> > Cc: linux-...@vger.kernel.org > Cc: netdev@vger.kernel.org > --- > net/sunrpc/rpc_pipe.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c > index 31789ef..bab3187 100644 > --- a/net/sunrpc/rpc_pipe.c > +++ b/net/sunrpc/rpc_pipe.c > @@ -477,7 +477,9 @@ rpc_get_inode(struct super_block *sb, umode_t mode) > return NULL; > inode->i_ino = get_next_ino(); > inode->i_mode = mode; > - inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; > + inode->i_atime = current_fs_time(sb); > + inode->i_mtime = inode->i_atime; > + inode->i_ctime = inode->i_atime; > switch (mode & S_IFMT) { > case S_IFDIR: > inode->i_fop = _dir_operations; Why would we care? This is a pseudo-fs. There is no expectation w.r.t. timestamp accuracy or resolution. Cheers, Trond
Re: [PATCH v2] sunrpc: fix waitqueue_active without memory barrier in sunrpc
On Fri, Oct 9, 2015 at 5:18 PM, J. Bruce Fieldswrote: > > On Fri, Oct 09, 2015 at 06:29:44AM +, Kosuke Tatsukawa wrote: > > Neil Brown wrote: > > > Kosuke Tatsukawa writes: > > > > > >> There are several places in net/sunrpc/svcsock.c which calls > > >> waitqueue_active() without calling a memory barrier. Add a memory > > >> barrier just as in wq_has_sleeper(). > > >> > > >> I found this issue when I was looking through the linux source code > > >> for places calling waitqueue_active() before wake_up*(), but without > > >> preceding memory barriers, after sending a patch to fix a similar > > >> issue in drivers/tty/n_tty.c (Details about the original issue can be > > >> found here: https://lkml.org/lkml/2015/9/28/849). > > > > > > hi, > > > this feels like the wrong approach to the problem. It requires extra > > > 'smb_mb's to be spread around which are hard to understand as easy to > > > forget. > > > > > > A quick look seems to suggest that (nearly) every waitqueue_active() > > > will need an smb_mb. Could we just put the smb_mb() inside > > > waitqueue_active()?? > > > > > > There are around 200 occurrences of waitqueue_active() in the kernel > > source, and most of the places which use it before wake_up are either > > protected by some spin lock, or already has a memory barrier or some > > kind of atomic operation before it. > > > > Simply adding smp_mb() to waitqueue_active() would incur extra cost in > > many cases and won't be a good idea. > > > > Another way to solve this problem is to remove the waitqueue_active(), > > making the code look like this; > > if (wq) > > wake_up_interruptible(wq); > > This also fixes the problem because the spinlock in the wake_up*() acts > > as a memory barrier and prevents the code from being reordered by the > > CPU (and it also makes the resulting code is much simpler). > > I might not care which we did, except I don't have the means to test > this quickly, and I guess this is some of our most frequently called > code. > > I suppose your patch is the most conservative approach, as the > alternative is a spinlock/unlock in wake_up_interruptible, which I > assume is necessarily more expensive than an smp_mb(). > > As far as I can tell it's been this way since forever. (Well, since a > 2002 patch "NFSD: TCP: rationalise locking in RPC server routines" which > removed some spinlocks from the data_ready routines.) > > I don't understand what the actual race is yet (which code exactly is > missing the wakeup in this case? nfsd threads seem to instead get > woken up by the wake_up_process() in svc_xprt_do_enqueue().) > Those threads still use blocking calls for sendpage() and sendmsg(), so presumably they may be affected. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Race with ip=dhcp bootparameter in ip_rcv_finish on am335x
+linux-nfs mailing list On Wed, Sep 23, 2015 at 8:27 AM, Alexander Aring <alex.ar...@gmail.com> wrote: > Hi, > > On Wed, Sep 23, 2015 at 01:57:57PM +0200, Alexander Aring wrote: > ... >> > >> >> Ok, I think I have two issues with two different races the first one was >> fixed by bde6f9ded1bd ("net: Initialize table in fib result"), but the >> second one is still there: >> >> [8.615806] [ cut here ] >> [8.620678] Kernel BUG at c016c3d0 [verbose debug info unavailable] >> [8.627229] Internal error: Oops - BUG: 0 [#1] SMP ARM >> [8.632611] Modules linked in: >> [8.635836] CPU: 0 PID: 766 Comm: kworker/0:1H Tainted: GW >> 4.2.0-11248-gfbd0351 #140 >> [8.645208] Hardware name: Generic AM33XX (Flattened Device Tree) >> [8.651616] Workqueue: rpciod xprt_autoclose >> [8.656091] task: ce3c52c0 ti: ce642000 task.ti: ce642000 >> [8.661744] PC is at iput+0x1a8/0x1f0 >> [8.665579] LR is at xprt_autoclose+0x2c/0x54 >> [8.670136] pc : []lr : []psr: 2113 >> [8.670136] sp : ce643e80 ip : fp : c0b56688 >> [8.682133] r10: 0001 r9 : ce643ec8 r8 : >> [8.687599] r7 : feff3000 r6 : ce615800 r5 : ce615bc0 r4 : ce615b54 >> [8.694421] r3 : 0060 r2 : 000f r1 : 0f10e000 r0 : cdbed720 >> [8.701254] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment >> none >> [8.708718] Control: 10c5387d Table: 80004019 DAC: 0051 >> [8.714732] Process kworker/0:1H (pid: 766, stack limit = 0xce642218) >> [8.721464] Stack: (0xce643e80 to 0xce644000) >> [8.726033] 3e80: c066f828 ce615b54 ce615bc0 ce615800 feff3000 >> ce643ec8 c066c884 >> [8.734596] 3ea0: ce615b54 ce5ff440 cfb9e340 c0057928 0001 >> c00578b4 cfb9e340 >> [8.743152] 3ec0: c0057cc8 c137972c c0cc1960 c09979f4 >> cfb9e340 cfb9e340 >> [8.751714] 3ee0: ce5ff458 cfb9e370 ce642000 0008 c0b55ba0 ce5ff440 >> cfb9e340 c0057c54 >> [8.760274] 3f00: ce659940 ce5ff440 c0057c18 ce659940 ce5ff440 >> c0057c18 >> [8.768834] 3f20: c005d918 c0b5697c >> ce5ff440 >> [8.777390] 3f40: dead4ead c0b65d60 >> >> [8.785951] 3f60: c0922088 ce643f64 ce643f64 dead4ead >> >> [8.794513] 3f80: c0b65d60 c0922088 ce643f90 ce643f90 >> ce643fac ce659940 >> [8.803069] 3fa0: c005d844 c000f770 >> >> [8.811628] 3fc0: >> >> [8.820185] 3fe0: 0013 >> 8fdf6861 8fdf6c61 >> [8.828741] [] (iput) from [] >> (xprt_autoclose+0x2c/0x54) >> [8.836133] [] (xprt_autoclose) from [] >> (process_one_work+0x19c/0x48c) >> [8.844784] [] (process_one_work) from [] >> (worker_thread+0x3c/0x4a0) >> [8.853256] [] (worker_thread) from [] >> (kthread+0xd4/0xf0) >> [8.860827] [] (kthread) from [] >> (ret_from_fork+0x14/0x24) >> [8.868387] Code: e59f0044 e59f1044 ebfb467a eac1 (e7f001f2) > > Additional missing information is that I am booting via nfsroot and > xprt_autoclose is something from sunrpc. > > Finally I figured out that commit > 4876cc779ff525b9c2376d8076edf47815e71f2c ("SUNRPC: Ensure we release the > TCP socket once it has been closed") occur this races. After reverting > this commit everything works fine. > > I added now: > > Steven Rostedt <rost...@goodmis.org> > Trond Myklebust <trond.mykleb...@primarydata.com> > > to cc to report about this issue. > Is that happening when the transport is being torn down? If so, is it fixed by http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=79234c3db6842a3de03817211d891e0c2878f756 ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: NFS/TCP/IPv6 acting strangely in 4.2
Hi Russell, On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote: > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux > wrote: > > Following that idea, I just tried the patch below, and it seems to > > work. > > I don't know whether it handles all cases after a call to > > kernel_connect(), > > but it stops the multiple connection attempts: > > > > 1 0.00 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 > > WS=128 > > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 > > TSval=870318939 TSecr=15712 WS=64 > > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > 0x905379cc, [Check: RD LU MD XT DL] > > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 > > Ack=379400 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > 0x905379cc, [Check: RD LU MD XT DL] > > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), > > [Allowed: RD LU MD XT DL] > > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), > > [Allowed: RD LU MD XT DL] > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), > > [Allowed: RD LU MD XT DL] > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: > > 0xe15fc9c9 > > ... > > NFS people - any comments on this patch? Is it the correct way to > solve > this problem (please see the first message in this thread for the > problem.) > Without this patch, NFS is unusable as it tries to launch multiple > new > connections from the same port to the NFS server without giving the > NFS > server time to respond and establish the TCP connection. I agree that it addresses a real problem here, however there are a couple of issues with the patch itself: AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and TCP_CLOSE, so if the connection attempt fails, this patch leaves the XPRT_CONNECTING flag set. There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1, TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection attempt by canceling the XPRT_CONNECTING state. How about the following? It is based on your patch, but adds a check to ensure that xs_tcp_state_change() doesn't clear the 'connecting' state more than once (which could otherwise still happen in the TCP_CLOSE case). 8<--- >From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001 From: Trond Myklebust <trond.mykleb...@primarydata.com> Date: Wed, 16 Sep 2015 23:43:17 -0400 Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete before retrying Commit 718ba5b87343, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: Russell King <li...@arm.linux.org.uk> Signed-off-by: Trond Myklebust <trond.mykleb...@primarydata.com> --- include/linux/sunrpc/xprtsock.h | 3 +++ net/sunrpc/xprtsock.c | 11 --- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h index 7591788e9fbf..357e44c1a46b 100644 --- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -42,6 +42,7 @@ struct sock_xprt { /* * Connection of transports */ + unsigned long sock_state; struct delayed_work connect_worker; struct sockaddr_storage srcaddr; unsigned short srcport;
Re: NFS/TCP/IPv6 acting strangely in 4.2
On Thu, Sep 17, 2015 at 5:47 PM, Russell King - ARM Linux <li...@arm.linux.org.uk> wrote: > On Thu, Sep 17, 2015 at 10:18:29AM -0400, Trond Myklebust wrote: >> Hi Russell, >> >> On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote: >> > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux >> > wrote: >> > > Following that idea, I just tried the patch below, and it seems to >> > > work. >> > > I don't know whether it handles all cases after a call to >> > > kernel_connect(), >> > > but it stops the multiple connection attempts: >> > > >> > > 1 0.00 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 >> > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 >> > > WS=128 >> > > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] >> > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 >> > > TSval=870318939 TSecr=15712 WS=64 >> > > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 >> > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 >> > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0x905379cc, [Check: RD LU MD XT DL] >> > > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 >> > > Ack=379400 Win=28608 Len=0 TSval=870318939 TSecr=15712 >> > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0x905379cc, [Check: RD LU MD XT DL] >> > > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 >> > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 >> > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), >> > > [Allowed: RD LU MD XT DL] >> > > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 >> > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 >> > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0xe15fc9c9, [Check: RD LU MD XT DL] >> > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), >> > > [Allowed: RD LU MD XT DL] >> > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0xe15fc9c9, [Check: RD LU MD XT DL] >> > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), >> > > [Allowed: RD LU MD XT DL] >> > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: >> > > 0xe15fc9c9 >> > > ... >> > >> > NFS people - any comments on this patch? Is it the correct way to >> > solve >> > this problem (please see the first message in this thread for the >> > problem.) >> > Without this patch, NFS is unusable as it tries to launch multiple >> > new >> > connections from the same port to the NFS server without giving the >> > NFS >> > server time to respond and establish the TCP connection. >> >> I agree that it addresses a real problem here, however there are a >> couple of issues with the patch itself: >> >> AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and >> TCP_CLOSE, so if the connection attempt fails, this patch leaves the >> XPRT_CONNECTING flag set. >> There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1, >> TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection >> attempt by canceling the XPRT_CONNECTING state. >> >> How about the following? It is based on your patch, but adds a check to >> ensure that xs_tcp_state_change() doesn't clear the 'connecting' state >> more than once (which could otherwise still happen in the TCP_CLOSE >> case). > > This patch also seems to fix the problem I've been seeing. > > Yes, I wasn't sure about my patch - I didn't spend much time properly > reading and understanding the sunrpc code, beyond analysing what was > going on to cause the problem and deciding on a way to stop it happening. > I really wasn't sure that clearing the connecting flag everywhere I did > was the right thing, which is why I didn't send the patch properly > dressed up. > >> 8<--- >> >From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001 >> From: Trond Myklebust <trond.mykleb...@primarydata.com> >> Date: Wed, 16 Sep 2015 23:43:17 -0400 >> Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete
Re: NFS/TCP/IPv6 acting strangely in 4.2
On Thu, Sep 17, 2015 at 12:27 PM, Benjamin Coddington <bcodd...@redhat.com> wrote: > On Thu, 17 Sep 2015, Trond Myklebust wrote: > >> Hi Russell, >> >> On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote: >> > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux >> > wrote: >> > > Following that idea, I just tried the patch below, and it seems to >> > > work. >> > > I don't know whether it handles all cases after a call to >> > > kernel_connect(), >> > > but it stops the multiple connection attempts: >> > > >> > > 1 0.00 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 >> > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 >> > > WS=128 >> > > 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] >> > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 >> > > TSval=870318939 TSecr=15712 WS=64 >> > > 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 >> > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 >> > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0x905379cc, [Check: RD LU MD XT DL] >> > > 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 >> > > Ack=379400 Win=28608 Len=0 TSval=870318939 TSecr=15712 >> > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0x905379cc, [Check: RD LU MD XT DL] >> > > 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 >> > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 >> > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), >> > > [Allowed: RD LU MD XT DL] >> > > 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 >> > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 >> > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0xe15fc9c9, [Check: RD LU MD XT DL] >> > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), >> > > [Allowed: RD LU MD XT DL] >> > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0xe15fc9c9, [Check: RD LU MD XT DL] >> > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), >> > > [Allowed: RD LU MD XT DL] >> > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: >> > > 0xe15fc9c9 >> > > ... >> > >> > NFS people - any comments on this patch? Is it the correct way to >> > solve >> > this problem (please see the first message in this thread for the >> > problem.) >> > Without this patch, NFS is unusable as it tries to launch multiple >> > new >> > connections from the same port to the NFS server without giving the >> > NFS >> > server time to respond and establish the TCP connection. >> >> I agree that it addresses a real problem here, however there are a >> couple of issues with the patch itself: >> >> AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and >> TCP_CLOSE, so if the connection attempt fails, this patch leaves the >> XPRT_CONNECTING flag set. >> There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1, >> TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection >> attempt by canceling the XPRT_CONNECTING state. >> >> How about the following? It is based on your patch, but adds a check to >> ensure that xs_tcp_state_change() doesn't clear the 'connecting' state >> more than once (which could otherwise still happen in the TCP_CLOSE >> case). >> >> 8<--- >> From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001 >> From: Trond Myklebust <trond.mykleb...@primarydata.com> >> Date: Wed, 16 Sep 2015 23:43:17 -0400 >> Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete >> before retrying >> >> Commit 718ba5b87343, moved the responsibility for unlocking the socket to >> xs_tcp_setup_socket, meaning that the socket will be unlocked before we >> know that it has finished trying to connect. The following patch is based on >> an initial patch by Russell King to ensure that we delay clearing the >> XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate >> a connection attempt, or the connection attempt itself failed. >> >> Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from >> racing") >> Reported-by: Russell King <li...@arm.linux.org.uk> >> Signed-off-by: Trond Myklebust <trond.mykleb...@primarydata.com> > > This fixes up my network segmentation problem, tested on top of your "Fix > races between socket connection and destroy code". > > Tested-by: Benjamin Coddington <bcodd...@redhat.com> > Thanks Ben! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 14/39] SUNRPC: drop null test before destroy functions
On Mon, Sep 14, 2015 at 12:07 PM, J. Bruce Fieldswrote: > ACK, but assuming Trond takes this one.--b. No problem. I'll pick it up... Cheers Trond > On Sun, Sep 13, 2015 at 02:15:07PM +0200, Julia Lawall wrote: >> Remove unneeded NULL test. >> >> The semantic patch that makes this change is as follows: >> (http://coccinelle.lip6.fr/) >> >> // >> @@ expression x; @@ >> -if (x != NULL) >> \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); >> // >> >> Signed-off-by: Julia Lawall >> >> --- >> net/sunrpc/sched.c | 12 >> 1 file changed, 4 insertions(+), 8 deletions(-) >> >> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c >> index b140c09..425ca2f 100644 >> --- a/net/sunrpc/sched.c >> +++ b/net/sunrpc/sched.c >> @@ -1092,14 +1092,10 @@ void >> rpc_destroy_mempool(void) >> { >> rpciod_stop(); >> - if (rpc_buffer_mempool) >> - mempool_destroy(rpc_buffer_mempool); >> - if (rpc_task_mempool) >> - mempool_destroy(rpc_task_mempool); >> - if (rpc_task_slabp) >> - kmem_cache_destroy(rpc_task_slabp); >> - if (rpc_buffer_slabp) >> - kmem_cache_destroy(rpc_buffer_slabp); >> + mempool_destroy(rpc_buffer_mempool); >> + mempool_destroy(rpc_task_mempool); >> + kmem_cache_destroy(rpc_task_slabp); >> + kmem_cache_destroy(rpc_buffer_slabp); >> rpc_destroy_wait_queue(_queue); >> } >> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Global protection fault in netfilter code
Hi, When doing NFS stress tests in a VM with a recent kernel (yesterday's commit 7ddab73346a1 Merge branch 'fixes' of git ://ftp.arm.linux.org.uk/~rmk/linux-arm), I've been seeing the following General Protection Fault code apparently in the nf_conntrack code: PID: 358TASK: 88003630cb80 CPU: 0 COMMAND: kworker/0:1H #0 [88013a603680] die at 81007608 #1 [88013a6036b0] do_general_protection at 8100407a #2 [88013a6036e0] general_protection at 817888c8 [exception RIP: detach_if_pending+103] RIP: 81101b37 RSP: 88013a603798 RFLAGS: 00010086 RAX: dead00200200 RBX: 8800b9771bd8 RCX: 000f RDX: 88013a60e818 RSI: 88013a60d980 RDI: 0046 RBP: 88013a6037b8 R8: R9: 0001 R10: 88013a60d998 R11: 0001 R12: 8800b9771bd8 R13: 88013a60d980 R14: R15: 0001 ORIG_RAX: CS: 0010 SS: 0018 #3 [88013a6037c0] mod_timer_pending at 81101fd2 #4 [88013a603820] __nf_ct_refresh_acct at a055891b [nf_conntrack] #5 [88013a603850] tcp_packet at a056232e [nf_conntrack] #6 [88013a603970] nf_conntrack_in at a055b70a [nf_conntrack] #7 [88013a603a40] ipv4_conntrack_in at a0576326 [nf_conntrack_ipv4] #8 [88013a603a50] nf_iterate at 81688dad #9 [88013a603aa0] nf_hook_slow at 81688e42 #10 [88013a603af0] ip_rcv at 81695ca3 #11 [88013a603b60] __netif_receive_skb_core at 8164d688 #12 [88013a603c00] __netif_receive_skb at 8164e108 #13 [88013a603c20] netif_receive_skb_internal at 8164f7f6 #14 [88013a603c60] napi_gro_complete at 8164fbf7 #15 [88013a603cb0] dev_gro_receive at 81650508 #16 [88013a603d20] napi_gro_receive at 81650a6b #17 [88013a603d50] e1000_clean_rx_irq at a00308db [e1000] #18 [88013a603e00] e1000_clean at a0030f3d [e1000] #19 [88013a603ec0] net_rx_action at 8164ffda #20 [88013a603f40] __do_softirq at 81087a18 #21 [88013a603fb0] do_softirq_own_stack at 8178875c --- IRQ stack --- #22 [880035fabaf0] do_softirq_own_stack at 8178875c [exception RIP: unknown or invalid address] RIP: 88007d7a6108 RSP: 88007d7a60d0 RFLAGS: a02827e5 RAX: 810868a9 RBX: 880035fabb38 RCX: 88007d7a6108 RDX: 880136e56a00 RSI: 880035fabb78 RDI: 81786209 RBP: 810d662d R8: 880035fabb58 R9: fe00 R10: 0046 R11: 810867e5 R12: 0046 R13: 810ce565 R14: 880035fabb18 R15: ORIG_RAX: 8800a4408840 CS: 880035fabbc8 SS: 0001 WARNING: possibly bogus exception frame #23 [880035fabbd0] nfs41_wake_and_assign_slot at a06c7a9d [nfsv4] #24 [880035fabbe0] nfs41_sequence_done at a069c7c0 [nfsv4] #25 [880035fabc30] nfs4_sequence_done at a069caaf [nfsv4] #26 [880035fabc40] nfs4_read_done at a06a2c0e [nfsv4] #27 [880035fabc60] nfs_readpage_done at a0654736 [nfs] #28 [880035fabc90] nfs_pgio_result at a0653414 [nfs] #29 [880035fabcc0] rpc_exit_task at a027f10c [sunrpc] #30 [880035fabce0] __rpc_execute at a02820dd [sunrpc] #31 [880035fabd60] rpc_async_schedule at a0282725 [sunrpc] #32 [880035fabd70] process_one_work at 8109fe89 #33 [880035fabdf0] worker_thread at 810a04ae #34 [880035fabe60] kthread at 810a6aef #35 [880035fabf50] ret_from_fork at 81786f5f I do not see that in vanilla Linux-4.1, so it seems to be a 4.2 cycle thing. Is anyone else seeing this, and is it being looked at by the netfilter folks? -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.mykleb...@primarydata.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-19 at 15:52 -0400, Jeff Layton wrote: On Fri, 19 Jun 2015 13:39:08 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt rost...@goodmis.org wrote: On Fri, 19 Jun 2015 12:25:53 -0400 Steven Rostedt rost...@goodmis.org wrote: I don't see that 55201 anywhere. But then again, I didn't look for it before the port disappeared. I could reboot and look for it again. I should have saved the full netstat -tapn as well :-/ Of course I didn't find it anywhere, that's the port on my wife's box that port 947 was connected to. Now I even went over to my wife's box and ran # rpcinfo -p localhost program vers proto port service 104 tcp111 portmapper 103 tcp111 portmapper 102 tcp111 portmapper 104 udp111 portmapper 103 udp111 portmapper 102 udp111 portmapper 1000241 udp 34243 status 1000241 tcp 34498 status which doesn't show anything. but something is listening to that port... # netstat -ntap |grep 55201 tcp0 0 0.0.0.0:55201 0.0.0.0:* LISTEN Hang on. This is on the client box while there is an active NFSv4 mount? Then that's probably the NFSv4 callback channel listening for delegation callbacks. Can you please try: echo options nfs callback_tcpport=4048 /etc/modprobe.d/nfs -local.conf and then either reboot the client or unload and then reload the nfs modules before reattempting the mount. If this is indeed the callback channel, then that will move your phantom listener to port 4048... Right, it was a little unclear to me before, but it now seems clear that the callback socket that the server is opening to the client is the one squatting on the port. ...and that sort of makes sense, doesn't it? That rpc_clnt will stick around for the life of the client's lease, and the rpc_clnt binds to a particular port so that it can reconnect using the same one. Given that Stephen has done the legwork and figured out that reverting those commits fixes the issue, then I suspect that the real culprit is caf4ccd4e88cf2. The client is likely closing down the other end of the callback socket when it goes idle. Before that commit, we probably did an xs_close on it, but now we're doing a xs_tcp_shutdown and that leaves the port bound. Agreed. I've been looking into whether or not there is a simple fix. Reverting those patches is not an option, because the whole point was to ensure that the socket is in the TCP_CLOSED state before we release the socket. Steven, how about something like the following patch? 8- From 9a0bcfdbdbc793eae1ed6d901a6396b6c66f9513 Mon Sep 17 00:00:00 2001 From: Trond Myklebust trond.mykleb...@primarydata.com Date: Fri, 19 Jun 2015 16:17:57 -0400 Subject: [PATCH] SUNRPC: Ensure we release the TCP socket once it has been closed This fixes a regression introduced by commit caf4ccd4e88cf2 (SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release). Prior to that commit, the autoclose feature would ensure that an idle connection would result in the socket being both disconnected and released, whereas now only gets disconnected. While the current behaviour is harmless, it does leave the port bound until either RPC traffic resumes or the RPC client is shut down. Reported-by: Steven Rostedt rost...@goodmis.org Signed-off-by: Trond Myklebust trond.mykleb...@primarydata.com --- net/sunrpc/xprt.c | 2 +- net/sunrpc/xprtsock.c | 8 ++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 3ca31f20b97c..ab5dd621ae0c 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work) struct rpc_xprt *xprt = container_of(work, struct rpc_xprt, task_cleanup); - xprt-ops-close(xprt); clear_bit(XPRT_CLOSE_WAIT, xprt-state); + xprt-ops-close(xprt); xprt_release_write(xprt, NULL); } diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index fda8ec8c74c0..75dcdadf0269 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -634,10 +634,13 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt) struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); struct socket *sock = transport-sock; - if (sock != NULL) { + if (sock == NULL) + return; + if (xprt_connected(xprt)) { kernel_sock_shutdown(sock, SHUT_RDWR); trace_rpc_socket_shutdown(xprt, sock); - } + } else + xs_reset_transport(transport
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-19 at 18:14 -0400, Steven Rostedt wrote: On Fri, 19 Jun 2015 16:30:18 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: Steven, how about something like the following patch? OK, the box I'm running this on is using v4.0.5, can you make a patch based on that, as whatever you make needs to go to stable as well. Is it causing any other damage than the rkhunter warning you reported? distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux -build.git/net/sunrpc/xprtsock.c on fedora/8 failed distcc[31554] (dcc_build_somewhere) Warning: remote compilation of '/home/rostedt/work/git/nobackup/linux -build.git/net/sunrpc/xprtsock.c' failed, retrying locally distcc[31554] Warning: failed to distribute /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c to fedora/8, running locally instead /home/rostedt/work/git/nobackup/linux -build.git/net/sunrpc/xprtsock.c: In function 'xs_tcp_shutdown': /home/rostedt/work/git/nobackup/linux -build.git/net/sunrpc/xprtsock.c:643:3: error: implicit declaration of function 'xs_reset_transport' [-Werror=implicit-function -declaration] /home/rostedt/work/git/nobackup/linux -build.git/net/sunrpc/xprtsock.c: At top level: /home/rostedt/work/git/nobackup/linux -build.git/net/sunrpc/xprtsock.c:825:13: warning: conflicting types for 'xs_reset_transport' [enabled by default] /home/rostedt/work/git/nobackup/linux -build.git/net/sunrpc/xprtsock.c:825:13: error: static declaration of 'xs_reset_transport' follows non-static declaration /home/rostedt/work/git/nobackup/linux -build.git/net/sunrpc/xprtsock.c:643:3: note: previous implicit declaration of 'xs_reset_transport' was here cc1: some warnings being treated as errors distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux -build.git/net/sunrpc/xprtsock.c on localhost failed /home/rostedt/work/git/nobackup/linux -build.git/scripts/Makefile.build:258: recipe for target 'net/sunrpc/xprtsock.o' failed make[3]: *** [net/sunrpc/xprtsock.o] Error 1 Sorry. I sent that one off too quickly. Try the following. 8-- From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001 From: Trond Myklebust trond.mykleb...@primarydata.com Date: Fri, 19 Jun 2015 16:17:57 -0400 Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been closed This fixes a regression introduced by commit caf4ccd4e88cf2 (SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release). Prior to that commit, the autoclose feature would ensure that an idle connection would result in the socket being both disconnected and released, whereas now only gets disconnected. While the current behaviour is harmless, it does leave the port bound until either RPC traffic resumes or the RPC client is shut down. Reported-by: Steven Rostedt rost...@goodmis.org Signed-off-by: Trond Myklebust trond.mykleb...@primarydata.com --- net/sunrpc/xprt.c | 2 +- net/sunrpc/xprtsock.c | 40 ++-- 2 files changed, 23 insertions(+), 19 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 3ca31f20b97c..ab5dd621ae0c 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work) struct rpc_xprt *xprt = container_of(work, struct rpc_xprt, task_cleanup); - xprt-ops-close(xprt); clear_bit(XPRT_CLOSE_WAIT, xprt-state); + xprt-ops-close(xprt); xprt_release_write(xprt, NULL); } diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index fda8ec8c74c0..ee0715dfc3c7 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -623,24 +623,6 @@ process_status: } /** - * xs_tcp_shutdown - gracefully shut down a TCP socket - * @xprt: transport - * - * Initiates a graceful shutdown of the TCP socket by calling the - * equivalent of shutdown(SHUT_RDWR); - */ -static void xs_tcp_shutdown(struct rpc_xprt *xprt) -{ - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); - struct socket *sock = transport-sock; - - if (sock != NULL) { - kernel_sock_shutdown(sock, SHUT_RDWR); - trace_rpc_socket_shutdown(xprt, sock); - } -} - -/** * xs_tcp_send_request - write an RPC request to a TCP socket * @task: address of RPC task that manages the state of an RPC request * @@ -786,6 +768,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) xs_sock_reset_connection_flags(xprt); /* Mark transport as closed and wake up all pending tasks */ xprt_disconnect_done(xprt); + xprt_force_disconnect(xprt); } /** @@ -2103,6 +2086,27 @@ out: xprt_wake_pending_tasks(xprt, status); } +/** + * xs_tcp_shutdown - gracefully shut down a TCP socket + * @xprt: transport + * + * Initiates a graceful shutdown of the TCP socket
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, Jun 19, 2015 at 9:27 PM, Steven Rostedt rost...@goodmis.org wrote: On Fri, 19 Jun 2015 19:25:59 -0400 Trond Myklebust trond.mykleb...@primarydata.com wrote: 8-- From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001 From: Trond Myklebust trond.mykleb...@primarydata.com Date: Fri, 19 Jun 2015 16:17:57 -0400 Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been closed This fixes a regression introduced by commit caf4ccd4e88cf2 (SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release). Prior to that commit, the autoclose feature would ensure that an idle connection would result in the socket being both disconnected and released, whereas now only gets disconnected. While the current behaviour is harmless, it does leave the port bound until either RPC traffic resumes or the RPC client is shut down. Is there a way to test RPC traffic resuming? I'd like to try that before declaring this bug harmless. You should be seeing the same issue if you mount an NFSv3 partition. After about 5 minutes of inactivity, the client will close down the connection to the server, and rkhunter should again see the phantom socket. If you then try to access the partition, the RPC layer should immediately release the socket and establish a new connection on the same port. Cheers Trond -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt rost...@goodmis.org wrote: On Fri, 19 Jun 2015 12:25:53 -0400 Steven Rostedt rost...@goodmis.org wrote: I don't see that 55201 anywhere. But then again, I didn't look for it before the port disappeared. I could reboot and look for it again. I should have saved the full netstat -tapn as well :-/ Of course I didn't find it anywhere, that's the port on my wife's box that port 947 was connected to. Now I even went over to my wife's box and ran # rpcinfo -p localhost program vers proto port service 104 tcp111 portmapper 103 tcp111 portmapper 102 tcp111 portmapper 104 udp111 portmapper 103 udp111 portmapper 102 udp111 portmapper 1000241 udp 34243 status 1000241 tcp 34498 status which doesn't show anything. but something is listening to that port... # netstat -ntap |grep 55201 tcp0 0 0.0.0.0:55201 0.0.0.0:* LISTEN Hang on. This is on the client box while there is an active NFSv4 mount? Then that's probably the NFSv4 callback channel listening for delegation callbacks. Can you please try: echo options nfs callback_tcpport=4048 /etc/modprobe.d/nfs-local.conf and then either reboot the client or unload and then reload the nfs modules before reattempting the mount. If this is indeed the callback channel, then that will move your phantom listener to port 4048... Cheers Trond -- To unsubscribe from this list: send the line unsubscribe netdev in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt rost...@goodmis.org wrote: On Fri, 12 Jun 2015 11:50:38 -0400 Steven Rostedt rost...@goodmis.org wrote: I reverted the following commits: c627d31ba0696cbd829437af2be2f2dee3546b1e 9e2b9f37760e129cee053cc7b6e7288acc2a7134 caf4ccd4e88cf2795c927834bc488c8321437586 And the issue goes away. That is, I watched the port go from ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port. In fact, I watched the port with my portlist.c module, and it disappeared there too when it entered the TIME_WAIT state. I've scanned those commits again and again, and I'm not seeing how we could be introducing a socket leak there. The only suspect I can see would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you using NFS swap? I've been running v4.0.5 with the above commits reverted for 5 days now, and there's still no hidden port appearing. What's the status on this? Should those commits be reverted or is there another solution to this bug? I'm trying to reproduce, but I've had no luck yet. Cheers Trond -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, Jun 12, 2015 at 10:40 AM, Eric Dumazet eric.duma...@gmail.com wrote: On Fri, 2015-06-12 at 10:10 -0400, Trond Myklebust wrote: On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt rost...@goodmis.org wrote: I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter started reporting a hidden port on my box. Running unhide-tcp I see this: # unhide-tcp Unhide-tcp 20121229 Copyright © 2012 Yago Jesus Patrick Gouin License GPLv3+ : GNU GPL version 3 or later http://www.unhide-forensics.info Used options: [*]Starting TCP checking Found Hidden port that not appears in ss: 946 [*]Starting UDP checking This scared the hell out of me as I'm thinking that I have got some kind of NSA backdoor hooked into my server and it is monitoring my plans to smuggle Kinder Überraschung into the USA from Germany. I panicked! Well, I wasted the day writing modules to first look at all the sockets opened by all processes (via their file descriptors) and posted their port numbers. http://rostedt.homelinux.com/private/tasklist.c But this port wasn't there either. Then I decided to look at the ports in tcp_hashinfo. http://rostedt.homelinux.com/private/portlist.c This found the port but no file was connected to it, and worse yet, when I first ran it without using probe_kernel_read(), it crashed my kernel, because sk-sk_socket pointed to a freed socket! Note, each boot, the hidden port is different. Finally, I decided to bring in the big guns, and inserted a trace_printk() into the bind logic, to see if I could find the culprit. After fiddling with it a few times, I found a suspect: kworker/3:1H-123 [003] ..s.96.696213: inet_bind_hash: add 946 Bah, it's a kernel thread doing it, via a work queue. I then added a trace_dump_stack() to find what was calling this, and here it is: kworker/3:1H-123 [003] ..s.96.696222: stack trace = inet_csk_get_port = inet_addr_type = inet_bind = xs_bind = sock_setsockopt = __sock_create = xs_create_sock.isra.18 = xs_tcp_setup_socket = process_one_work = worker_thread = worker_thread = kthread = kthread = ret_from_fork = kthread I rebooted, and examined what happens. I see the kworker binding that port, and all seems well: # netstat -tapn |grep 946 tcp0 0 192.168.23.9:946192.168.23.22:55201 ESTABLISHED - But waiting for a bit, the connection goes into a TIME_WAIT, and then it just disappears. But the bind to the port does not get released, and that port is from then on, taken. This never happened with my 3.19 kernels. I would bisect it but this is happening on my main server box which I usually only reboot every other month doing upgrades. It causes too much disturbance for myself (and my family) as when this box is offline, basically the rest of my machines are too. I figured this may be enough information to see if you can fix it. Otherwise I can try to do the bisect, but that's not going to happen any time soon. I may just go back to 3.19 for now, such that rkhunter stops complaining about the hidden port. The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on the socket before binding the port (commit 4dda9c8a5e34: SUNRPC: Set SO_REUSEPORT socket option for TCP connections). Perhaps there is an issue with that? Strange, because the usual way to not have time-wait is to use SO_LINGER with linger=0 And apparently xs_tcp_finish_connecting() has this : sock_reset_flag(sk, SOCK_LINGER); tcp_sk(sk)-linger2 = 0; Are you sure? I thought that SO_LINGER is more about controlling how the socket behaves w.r.t. waiting for the TCP_CLOSE state to be achieved (i.e. about aborting the FIN state negotiation early). I've never observed an effect on the TCP time-wait states. Are you sure SO_REUSEADDR was not the thing you wanted ? Yes. SO_REUSEADDR has the problem that it requires you bind to something other than 0.0.0.0, so it is less appropriate for outgoing connections; the RPC code really should not have to worry about routing and routability of a particular source address. Cheers Trond -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt rost...@goodmis.org wrote: I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter started reporting a hidden port on my box. Running unhide-tcp I see this: # unhide-tcp Unhide-tcp 20121229 Copyright © 2012 Yago Jesus Patrick Gouin License GPLv3+ : GNU GPL version 3 or later http://www.unhide-forensics.info Used options: [*]Starting TCP checking Found Hidden port that not appears in ss: 946 [*]Starting UDP checking This scared the hell out of me as I'm thinking that I have got some kind of NSA backdoor hooked into my server and it is monitoring my plans to smuggle Kinder Überraschung into the USA from Germany. I panicked! Well, I wasted the day writing modules to first look at all the sockets opened by all processes (via their file descriptors) and posted their port numbers. http://rostedt.homelinux.com/private/tasklist.c But this port wasn't there either. Then I decided to look at the ports in tcp_hashinfo. http://rostedt.homelinux.com/private/portlist.c This found the port but no file was connected to it, and worse yet, when I first ran it without using probe_kernel_read(), it crashed my kernel, because sk-sk_socket pointed to a freed socket! Note, each boot, the hidden port is different. Finally, I decided to bring in the big guns, and inserted a trace_printk() into the bind logic, to see if I could find the culprit. After fiddling with it a few times, I found a suspect: kworker/3:1H-123 [003] ..s.96.696213: inet_bind_hash: add 946 Bah, it's a kernel thread doing it, via a work queue. I then added a trace_dump_stack() to find what was calling this, and here it is: kworker/3:1H-123 [003] ..s.96.696222: stack trace = inet_csk_get_port = inet_addr_type = inet_bind = xs_bind = sock_setsockopt = __sock_create = xs_create_sock.isra.18 = xs_tcp_setup_socket = process_one_work = worker_thread = worker_thread = kthread = kthread = ret_from_fork = kthread I rebooted, and examined what happens. I see the kworker binding that port, and all seems well: # netstat -tapn |grep 946 tcp0 0 192.168.23.9:946192.168.23.22:55201 ESTABLISHED - But waiting for a bit, the connection goes into a TIME_WAIT, and then it just disappears. But the bind to the port does not get released, and that port is from then on, taken. This never happened with my 3.19 kernels. I would bisect it but this is happening on my main server box which I usually only reboot every other month doing upgrades. It causes too much disturbance for myself (and my family) as when this box is offline, basically the rest of my machines are too. I figured this may be enough information to see if you can fix it. Otherwise I can try to do the bisect, but that's not going to happen any time soon. I may just go back to 3.19 for now, such that rkhunter stops complaining about the hidden port. The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on the socket before binding the port (commit 4dda9c8a5e34: SUNRPC: Set SO_REUSEPORT socket option for TCP connections). Perhaps there is an issue with that? Cheers Trond -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SUNRPC: Mark buffer used for debug printks with __maybe_unused
On Wed, 2008-02-20 at 19:27 +0300, Pavel Emelyanov wrote: Patrick McHardy wrote: Joe Perches wrote: On Wed, 2008-02-20 at 07:29 -0800, Joe Perches wrote: fs/nfsd/nfsproc.c: char buf[RPC_MAX_ADDRBUFLEN]; Perhaps there should be a DECLARE_RPC_BUF(buf) macro? #define DECLARE_RPC_BUF(var) char var[MAC_BUF_SIZE] __maybe_unused Make that: #define DECLARE_RPC_BUF(var) char var[RPC_MAX_ADDRBUFLEN] __maybe_unuse OK, I'll send the patch in a moment. 1) Please always Cc [EMAIL PROTECTED] when changing the sunrpc code 2) Please don't use the name RPC_BUF. These are debugging strings, not buffers. Something like DECLARE_RPC_DEBUG_STR() would be more appropriate. 3) If you're going to use a macro, why not just use the existing RPC_IFDEBUG()? Trond -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SUNRPC: Mark buffer used for debug printks with __maybe_unused
On Wed, 2008-02-20 at 08:36 -0800, Joe Perches wrote: On Wed, 2008-02-20 at 16:35 +0100, Patrick McHardy wrote: Alternatively change the dprintk macro to behave similar like pr_debug() and mark things like svc_print_addr() __pure, which has the advantage that is still performs format checking even if debugging is disabled. I think it's better to change the dprintk style macros to what Philip Craig suggested. http://marc.info/?l=linux-wirelessm=120338413108120w=2 This makes clear to the compiler that the called function is not going to be used so it can be optimized away, keeps any argument verification in place, and doesn't require __pure attributes on arbitrary functions that may be called during the dprintk #ifdef DEBUG #define some_print_wrapper(fmt, arg...) \ do { if (0) printk(KERN_DEBUG fmt, ##arg); } while (0) #else #define some_print_wrapper(fmt, arg...) \ printk(KERN_DEBUG fmt, ##arg) #endif Have you actually read include/linux/sunrpc/debug.h? Trond -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] SUNRPC: Mark buffer used for debug printks with __maybe_unused
On Wed, 2008-02-20 at 09:23 -0800, Joe Perches wrote: On Wed, 2008-02-20 at 12:02 -0500, Trond Myklebust wrote: #ifdef DEBUG #define some_print_wrapper(fmt, arg...) \ do { if (0) printk(KERN_DEBUG fmt, ##arg); } while (0) #else #define some_print_wrapper(fmt, arg...) \ printk(KERN_DEBUG fmt, ##arg) #endif Have you actually read include/linux/sunrpc/debug.h? Yes, I have. What's there: #define dfprintk(fac, args...) do ; while (0) vs what's suggested: #define dfprintk(fac, args...) \ do { if (0) printk(##args); } while (0); No argument verification is done to args There has been code that fails to compile with -DDEBUG when the code use two different #ifdef DEBUG #else macros. I think some of the USB code was reworked because of that. The extra verification is just a guard against bad arguments when compiled normally. It's similar to what's done in kernel.h pr_debug without the __attribute__((format(printf,x,y))) so that calls made as arguments to functions aren't called unnecessarily. RPC_DEBUG is on by default if SYSCTL is compiled in, so it is not as if we're having any trouble typechecking the arguments. In fact, most of the major distros also tend to enable RPC_DEBUG, since it does come in handy when their customers report rpc/nfs problems. As you can see, there is already a macro RPC_IFDEBUG to deal with simple code that depends on whether or not RPC_DEBUG is set. Using that will make it obvious to a reader exactly when the declaration will be optimised away, and why, without compromising gcc's ability to warn us if it were to be unused due to some future code change. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET: Add the helper kernel_sock_shutdown()
On Mon, 2007-11-12 at 12:22 +, David Howells wrote: Trond Myklebust [EMAIL PROTECTED] wrote: take a SHUT_RD/SHUT_WR/SHUT_RDWR argument instead of the Hmmm... Why SHUT_*? Why not SHUTDOWN_*? SHUT_RD/SHUT_WR/SHUT_RDWR are the traditional names for these constants (see 'man 3 shutdown') and so should be easier to remember. I didn't however feel comfortable naming the function kernel_shutdown() or shutdown(), since those have other connotations, hence kernel_sock_shutdown(). - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][RESEND] NET: Add the helper kernel_sock_shutdown()
...and fix a couple of bugs in the NBD, CIFS and OCFS2 socket handlers. Looking at the sock-op-shutdown() handlers, it looks as if all of them take a SHUT_RD/SHUT_WR/SHUT_RDWR argument instead of the RCV_SHUTDOWN/SEND_SHUTDOWN arguments. Add a helper, and then define the SHUT_* enum to ensure that kernel users of shutdown() don't get confused. Signed-off-by: Trond Myklebust [EMAIL PROTECTED] Cc: Paul Clements [EMAIL PROTECTED] Acked-by: Mark Fasheh [EMAIL PROTECTED] Cc: Steve French [EMAIL PROTECTED] Acked-by: David Howells [EMAIL PROTECTED] Cc: David S. Miller [EMAIL PROTECTED] --- drivers/block/nbd.c|3 ++- fs/cifs/connect.c |2 +- fs/ocfs2/cluster/tcp.c |4 ++-- include/linux/net.h|8 net/rxrpc/ar-local.c |4 ++-- net/socket.c |6 ++ 6 files changed, 21 insertions(+), 6 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 6332aca..b4c0888 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -28,6 +28,7 @@ #include linux/err.h #include linux/kernel.h #include net/sock.h +#include linux/net.h #include asm/uaccess.h #include asm/system.h @@ -126,7 +127,7 @@ static void sock_shutdown(struct nbd_device *lo, int lock) if (lo-sock) { printk(KERN_WARNING %s: shutting down socket\n, lo-disk-disk_name); - lo-sock-ops-shutdown(lo-sock, SEND_SHUTDOWN|RCV_SHUTDOWN); + kernel_sock_shutdown(lo-sock, SHUT_RDWR); lo-sock = NULL; } if (lock) diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 1102160..c52a76f 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -160,7 +160,7 @@ cifs_reconnect(struct TCP_Server_Info *server) if (server-ssocket) { cFYI(1, (State: 0x%x Flags: 0x%lx, server-ssocket-state, server-ssocket-flags)); - server-ssocket-ops-shutdown(server-ssocket, SEND_SHUTDOWN); + kernel_sock_shutdown(server-ssocket, SHUT_WR); cFYI(1, (Post shutdown state: 0x%x Flags: 0x%lx, server-ssocket-state, server-ssocket-flags)); diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index 685c180..d84bd15 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -58,6 +58,7 @@ #include linux/slab.h #include linux/idr.h #include linux/kref.h +#include linux/net.h #include net/tcp.h #include asm/uaccess.h @@ -616,8 +617,7 @@ static void o2net_shutdown_sc(struct work_struct *work) del_timer_sync(sc-sc_idle_timeout); o2net_sc_cancel_delayed_work(sc, sc-sc_keepalive_work); sc_put(sc); - sc-sc_sock-ops-shutdown(sc-sc_sock, - RCV_SHUTDOWN|SEND_SHUTDOWN); + kernel_sock_shutdown(sc-sc_sock, SHUT_RDWR); } /* not fatal so failed connects before the other guy has our diff --git a/include/linux/net.h b/include/linux/net.h index dd79cdb..596131e 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -95,6 +95,12 @@ enum sock_type { #endif /* ARCH_HAS_SOCKET_TYPES */ +enum sock_shutdown_cmd { + SHUT_RD = 0, + SHUT_WR = 1, + SHUT_RDWR = 2, +}; + /** * struct socket - general BSD socket * @state: socket state (%SS_CONNECTED, etc) @@ -223,6 +229,8 @@ extern int kernel_setsockopt(struct socket *sock, int level, int optname, extern int kernel_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags); extern int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg); +extern int kernel_sock_shutdown(struct socket *sock, + enum sock_shutdown_cmd how); #ifndef CONFIG_SMP #define SOCKOPS_WRAPPED(name) name diff --git a/net/rxrpc/ar-local.c b/net/rxrpc/ar-local.c index fe03f71..f3a2bd7 100644 --- a/net/rxrpc/ar-local.c +++ b/net/rxrpc/ar-local.c @@ -114,7 +114,7 @@ static int rxrpc_create_local(struct rxrpc_local *local) return 0; error: - local-socket-ops-shutdown(local-socket, 2); + kernel_sock_shutdown(local-socket, SHUT_RDWR); local-socket-sk-sk_user_data = NULL; sock_release(local-socket); local-socket = NULL; @@ -267,7 +267,7 @@ static void rxrpc_destroy_local(struct work_struct *work) /* finish cleaning up the local descriptor */ rxrpc_purge_queue(local-accept_queue); rxrpc_purge_queue(local-reject_queue); - local-socket-ops-shutdown(local-socket, 2); + kernel_sock_shutdown(local-socket, SHUT_RDWR); sock_release(local-socket); up_read(rxrpc_local_sem); diff --git a/net/socket.c b/net/socket.c index 5d879fd..74784df 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2319,6 +2319,11 @@ int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg) return err
Re: [PATCH] NET: Add the helper kernel_sock_shutdown()
On Sun, 2007-11-11 at 10:03 -0800, Mark Fasheh wrote: On Thu, Nov 08, 2007 at 07:01:36PM -0500, Trond Myklebust wrote: From: Trond Myklebust [EMAIL PROTECTED] ...and fix a couple of bugs in the NBD, CIFS and OCFS2 socket handlers. Looking at the sock-op-shutdown() handlers, it looks as if all of them take a SHUT_RD/SHUT_WR/SHUT_RDWR argument instead of the RCV_SHUTDOWN/SEND_SHUTDOWN arguments. Add a helper, and then define the SHUT_* enum to ensure that kernel users of shutdown() don't get confused. That looks pretty good - any objection to naming the enum and using that name in the prototype for kernel_sock_shutdown() so it's even more obvious what type of shutdown argument this expects? --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] That would be fine by me. How about the attached patch? Cheers Trond ---BeginMessage--- ...and fix a couple of bugs in the NBD, CIFS and OCFS2 socket handlers. Looking at the sock-op-shutdown() handlers, it looks as if all of them take a SHUT_RD/SHUT_WR/SHUT_RDWR argument instead of the RCV_SHUTDOWN/SEND_SHUTDOWN arguments. Add a helper, and then define the SHUT_* enum to ensure that kernel users of shutdown() don't get confused. Signed-off-by: Trond Myklebust [EMAIL PROTECTED] Cc: Paul Clements [EMAIL PROTECTED] Cc: Mark Fasheh [EMAIL PROTECTED] Cc: Steve French [EMAIL PROTECTED] Cc: David Howells [EMAIL PROTECTED] Cc: David S Miller [EMAIL PROTECTED] --- drivers/block/nbd.c|3 ++- fs/cifs/connect.c |2 +- fs/ocfs2/cluster/tcp.c |4 ++-- include/linux/net.h|8 net/rxrpc/ar-local.c |4 ++-- net/socket.c |6 ++ 6 files changed, 21 insertions(+), 6 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 6332aca..b4c0888 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -28,6 +28,7 @@ #include linux/err.h #include linux/kernel.h #include net/sock.h +#include linux/net.h #include asm/uaccess.h #include asm/system.h @@ -126,7 +127,7 @@ static void sock_shutdown(struct nbd_device *lo, int lock) if (lo-sock) { printk(KERN_WARNING %s: shutting down socket\n, lo-disk-disk_name); - lo-sock-ops-shutdown(lo-sock, SEND_SHUTDOWN|RCV_SHUTDOWN); + kernel_sock_shutdown(lo-sock, SHUT_RDWR); lo-sock = NULL; } if (lock) diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 19ee11f..bea0d2e 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -160,7 +160,7 @@ cifs_reconnect(struct TCP_Server_Info *server) if (server-ssocket) { cFYI(1, (State: 0x%x Flags: 0x%lx, server-ssocket-state, server-ssocket-flags)); - server-ssocket-ops-shutdown(server-ssocket, SEND_SHUTDOWN); + kernel_sock_shutdown(server-ssocket, SHUT_WR); cFYI(1, (Post shutdown state: 0x%x Flags: 0x%lx, server-ssocket-state, server-ssocket-flags)); diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index 685c180..d84bd15 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -58,6 +58,7 @@ #include linux/slab.h #include linux/idr.h #include linux/kref.h +#include linux/net.h #include net/tcp.h #include asm/uaccess.h @@ -616,8 +617,7 @@ static void o2net_shutdown_sc(struct work_struct *work) del_timer_sync(sc-sc_idle_timeout); o2net_sc_cancel_delayed_work(sc, sc-sc_keepalive_work); sc_put(sc); - sc-sc_sock-ops-shutdown(sc-sc_sock, - RCV_SHUTDOWN|SEND_SHUTDOWN); + kernel_sock_shutdown(sc-sc_sock, SHUT_RDWR); } /* not fatal so failed connects before the other guy has our diff --git a/include/linux/net.h b/include/linux/net.h index dd79cdb..596131e 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -95,6 +95,12 @@ enum sock_type { #endif /* ARCH_HAS_SOCKET_TYPES */ +enum sock_shutdown_cmd { + SHUT_RD = 0, + SHUT_WR = 1, + SHUT_RDWR = 2, +}; + /** * struct socket - general BSD socket * @state: socket state (%SS_CONNECTED, etc) @@ -223,6 +229,8 @@ extern int kernel_setsockopt(struct socket *sock, int level, int optname, extern int kernel_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags); extern int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg); +extern int kernel_sock_shutdown(struct socket *sock, + enum sock_shutdown_cmd how); #ifndef CONFIG_SMP #define SOCKOPS_WRAPPED(name) name diff --git a/net/rxrpc/ar-local.c b/net/rxrpc/ar-local.c index fe03f71..f3a2bd7 100644 --- a/net/rxrpc/ar-local.c +++ b/net/rxrpc/ar-local.c @@ -114,7 +114,7 @@ static int rxrpc_create_local(struct
[PATCH] NET: Add the helper kernel_sock_shutdown()
From: Trond Myklebust [EMAIL PROTECTED] ...and fix a couple of bugs in the NBD, CIFS and OCFS2 socket handlers. Looking at the sock-op-shutdown() handlers, it looks as if all of them take a SHUT_RD/SHUT_WR/SHUT_RDWR argument instead of the RCV_SHUTDOWN/SEND_SHUTDOWN arguments. Add a helper, and then define the SHUT_* enum to ensure that kernel users of shutdown() don't get confused. Signed-off-by: Trond Myklebust [EMAIL PROTECTED] Cc: Paul Clements [EMAIL PROTECTED] Cc: Mark Fasheh [EMAIL PROTECTED] Cc: Steve French [EMAIL PROTECTED] Cc: David Howells [EMAIL PROTECTED] Cc: David S Miller [EMAIL PROTECTED] --- drivers/block/nbd.c|3 ++- fs/cifs/connect.c |2 +- fs/ocfs2/cluster/tcp.c |4 ++-- include/linux/net.h|7 +++ net/rxrpc/ar-local.c |4 ++-- net/socket.c |6 ++ 6 files changed, 20 insertions(+), 6 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 6332aca..b4c0888 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -28,6 +28,7 @@ #include linux/err.h #include linux/kernel.h #include net/sock.h +#include linux/net.h #include asm/uaccess.h #include asm/system.h @@ -126,7 +127,7 @@ static void sock_shutdown(struct nbd_device *lo, int lock) if (lo-sock) { printk(KERN_WARNING %s: shutting down socket\n, lo-disk-disk_name); - lo-sock-ops-shutdown(lo-sock, SEND_SHUTDOWN|RCV_SHUTDOWN); + kernel_sock_shutdown(lo-sock, SHUT_RDWR); lo-sock = NULL; } if (lock) diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 19ee11f..bea0d2e 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -160,7 +160,7 @@ cifs_reconnect(struct TCP_Server_Info *server) if (server-ssocket) { cFYI(1, (State: 0x%x Flags: 0x%lx, server-ssocket-state, server-ssocket-flags)); - server-ssocket-ops-shutdown(server-ssocket, SEND_SHUTDOWN); + kernel_sock_shutdown(server-ssocket, SHUT_WR); cFYI(1, (Post shutdown state: 0x%x Flags: 0x%lx, server-ssocket-state, server-ssocket-flags)); diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index 685c180..d84bd15 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -58,6 +58,7 @@ #include linux/slab.h #include linux/idr.h #include linux/kref.h +#include linux/net.h #include net/tcp.h #include asm/uaccess.h @@ -616,8 +617,7 @@ static void o2net_shutdown_sc(struct work_struct *work) del_timer_sync(sc-sc_idle_timeout); o2net_sc_cancel_delayed_work(sc, sc-sc_keepalive_work); sc_put(sc); - sc-sc_sock-ops-shutdown(sc-sc_sock, - RCV_SHUTDOWN|SEND_SHUTDOWN); + kernel_sock_shutdown(sc-sc_sock, SHUT_RDWR); } /* not fatal so failed connects before the other guy has our diff --git a/include/linux/net.h b/include/linux/net.h index dd79cdb..c804d81 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -95,6 +95,12 @@ enum sock_type { #endif /* ARCH_HAS_SOCKET_TYPES */ +enum { + SHUT_RD = 0, + SHUT_WR = 1, + SHUT_RDWR = 2, +}; + /** * struct socket - general BSD socket * @state: socket state (%SS_CONNECTED, etc) @@ -223,6 +229,7 @@ extern int kernel_setsockopt(struct socket *sock, int level, int optname, extern int kernel_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags); extern int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg); +extern int kernel_sock_shutdown(struct socket *sock, int how); #ifndef CONFIG_SMP #define SOCKOPS_WRAPPED(name) name diff --git a/net/rxrpc/ar-local.c b/net/rxrpc/ar-local.c index fe03f71..f3a2bd7 100644 --- a/net/rxrpc/ar-local.c +++ b/net/rxrpc/ar-local.c @@ -114,7 +114,7 @@ static int rxrpc_create_local(struct rxrpc_local *local) return 0; error: - local-socket-ops-shutdown(local-socket, 2); + kernel_sock_shutdown(local-socket, SHUT_RDWR); local-socket-sk-sk_user_data = NULL; sock_release(local-socket); local-socket = NULL; @@ -267,7 +267,7 @@ static void rxrpc_destroy_local(struct work_struct *work) /* finish cleaning up the local descriptor */ rxrpc_purge_queue(local-accept_queue); rxrpc_purge_queue(local-reject_queue); - local-socket-ops-shutdown(local-socket, 2); + kernel_sock_shutdown(local-socket, SHUT_RDWR); sock_release(local-socket); up_read(rxrpc_local_sem); diff --git a/net/socket.c b/net/socket.c index 5d879fd..1e41b15 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2319,6 +2319,11 @@ int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg) return err; } +int kernel_sock_shutdown(struct
Re: NFS mount gives ENETDOWN in -git15
On Sat, 2007-07-21 at 15:31 +0200, Andi Kleen wrote: I tried to mount another nfs mount on a system running with nfsroot. But I get # mount basil:/home /basil/home/ mount: Network is down The network is not down of course, the system is happily running with nfs root from that server. Userland is older SUSE 10.0 Does Al's patch help in any way? Cheers Trond ---BeginMessage--- Obviously broken on little-endian; fortunately, the option is not frequently used... Signed-off-by: Al Viro [EMAIL PROTECTED] --- diff --git a/fs/nfs/super.c b/fs/nfs/super.c index b34b7a7..b2a851c 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -732,7 +732,7 @@ static int nfs_parse_mount_options(char *raw, return 0; if (option 0 || option 65535) return 0; - mnt-nfs_server.address.sin_port = htonl(option); + mnt-nfs_server.address.sin_port = htons(option); break; case Opt_rsize: if (match_int(args, mnt-rsize)) ---End Message---
Re: networking busted in current -git ???
On Fri, 2007-06-08 at 19:06 -0700, David Miller wrote: From: Trond Myklebust [EMAIL PROTECTED] Date: Fri, 08 Jun 2007 17:43:27 -0400 It is not dhcp. I'm seeing the same bug with bog-standard ifup with a static address on an FC-6 machine. It appears to be something in the latest dump from davem to Linus, but I haven't yet had time to identify what. Linus's current tree should have this fixed. Let us know if this is not the case. It appears to be working for me again. Trond - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: networking busted in current -git ???
On Fri, 2007-06-08 at 23:07 +0200, Arkadiusz Miskiewicz wrote: On Friday 08 of June 2007, you wrote: Hello, I am using the current git tree: 85f6038f2170e3335dda09c3dfb0f83110e87019 . Git tree from two days ago (with the same config) works fine. Attempting to acquire an IP address via DHCP fails with: SIOCSIFADDR: No buffer space available Listening on LPF/eth0/00:19:b9:0c:9a:43 Sending on LPF/eth0/00:19:b9:0c:9a:43 Sending on Socket/fallback DHCPREQUEST on eth0 to 255.255.255.255 port 67 DHCPACK from xxx.xxx.xxx.xxx SIOCSIFADDR: No buffer space available SIOCSIFNETMASK: Cannot assign requested address SIOCSIFBRDADDR: Cannot assign requested address SIOCADDRT: Network is unreachable bound to xxx.xxx.xxx.xxx -- renewal in 98610 seconds. This is on a Dell 490 with tg3 network driver running Ubuntu 7.04 . .config and dmesg are appended. florin Here it requires few retries (stop dhcpcd, start again) to get the IP. git tree from few hours ago. tg3 driver. I also saw SIOCSIFADDR: No buffer space available once. (added netdev to the Cc list) It is not dhcp. I'm seeing the same bug with bog-standard ifup with a static address on an FC-6 machine. It appears to be something in the latest dump from davem to Linus, but I haven't yet had time to identify what. Cheers Trond - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [-mm patch] fix fs/nfs/nfsroot.c compile error
On Sat, 2007-05-05 at 18:44 +0200, Adrian Bunk wrote: On Sat, May 05, 2007 at 01:49:55AM -0700, Andrew Morton wrote: ... Changes since 2.6.21-rc7-mm2: ... git-net.patch ... git trees ... match_table_t was made const and gcc doesn't like const __initdata: Then please revert the fix to match_table_t... There is no reason for something like the nfsroot parser to be kept in memory after the system has booted. That would be code bloat. Trond - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [-mm patch] fix fs/nfs/nfsroot.c compile error
On Sat, 2007-05-05 at 13:20 -0400, Trond Myklebust wrote: On Sat, 2007-05-05 at 18:44 +0200, Adrian Bunk wrote: On Sat, May 05, 2007 at 01:49:55AM -0700, Andrew Morton wrote: ... Changes since 2.6.21-rc7-mm2: ... git-net.patch ... git trees ... match_table_t was made const and gcc doesn't like const __initdata: Then please revert the fix to match_table_t... There is no reason for something like the nfsroot parser to be kept in memory after the system has booted. That would be code bloat. Trond Alternatively, please change the nfsroot parser to use static struct match_token tokens[] __initdata = { That is in any case cleaner than using a typedef. Trond - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.19-rc1: known regressions (v2)
Cc: Linus Torvalds [EMAIL PROTECTED], Linux Kernel Mailing List linux-kernel@vger.kernel.org, [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Prakash Punnoor [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Steve Fox [EMAIL PROTECTED], netdev@vger.kernel.org, Michael S. Tsirkin [EMAIL PROTECTED], [EMAIL PROTECTED], linux-acpi@vger.kernel.org, Pavel Machek [EMAIL PROTECTED], Olaf Hering [EMAIL PROTECTED], Antonino Daplas [EMAIL PROTECTED], [EMAIL PROTECTED], Thierry Vignaud [EMAIL PROTECTED], [EMAIL PROTECTED], linux-ide@vger.kernel.org, Ernst Herzberg [EMAIL PROTECTED], Matthieu Castet [EMAIL PROTECTED], linux-usb-devel@lists.sourceforge.net, Jens Axboe [EMAIL PROTECTED], Benjamin Herrenschmidt [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Andreas Schwab [EMAIL PROTECTED], Mel Gorman [EMAIL PROTECTED], Alex Romosan [EMAIL PROTECTED], Sukadev Bhattiprolu [EMAIL PROTECTED] com, Dave Kleikamp [EMAIL PROTECTED], Torsten Kaiser [EMAIL PROTECTED], Magnus Damm [EMAIL PROTECTED], Vivek Goyal [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], Alistair John Strachan [EMAIL PROTECTED], Stefan Richter [EMAIL PROTECTED], [EMAIL PROTECTED] In-Reply-To: [EMAIL PROTECTED] References: [EMAIL PROTECTED] [EMAIL PROTECTED] Content-Type: text/plain Content-Transfer-Encoding: 7bit Organization: Network Appliance Inc Date: Sun, 08 Oct 2006 00:43:35 -0400 Message-Id: [EMAIL PROTECTED] Mime-Version: 1.0 X-Mailer: Evolution 2.8.1 On Sat, 2006-10-07 at 23:46 +0200, Adrian Bunk wrote: Subject: NFSv4 fails to mount (timeout) References : http://bugzilla.kernel.org/show_bug.cgi?id=7274 Submitter : Torsten Kaiser [EMAIL PROTECTED] Guilty : Trond Myklebust [EMAIL PROTECTED] commit 51b6ded4d9a94a61035deba1d8f51a54e3a3dd86 Handled-By : Trond Myklebust [EMAIL PROTECTED] Patch : http://bugzilla.kernel.org/show_bug.cgi?id=7274 Status : patch available Thanks... Always nice to hear that you have been judged and found guilty. Now go and reread that fucking bug report... - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] nfs: deadlock prevention for NFS
Grumble... If your patches are targetting NFS, could you please at the very least Cc [EMAIL PROTECTED] and/or myself. On Fri, 2006-08-25 at 17:40 +0200, Peter Zijlstra wrote: Provide a proper a_ops-swapfile() implementation for NFS. This will set the NFS socket to SOCK_VMIO and put the socket reconnection under PF_MEMALLOC (I hope this is enough, otherwise more work needs to be done). Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/nfs/file.c | 21 - include/linux/sunrpc/xprt.h |4 +++- net/sunrpc/xprtsock.c | 16 3 files changed, 39 insertions(+), 2 deletions(-) Index: linux-2.6/fs/nfs/file.c === --- linux-2.6.orig/fs/nfs/file.c +++ linux-2.6/fs/nfs/file.c @@ -27,6 +27,7 @@ #include linux/slab.h #include linux/pagemap.h #include linux/smp_lock.h +#include net/sock.h #include asm/uaccess.h #include asm/system.h @@ -317,7 +318,25 @@ static int nfs_release_page(struct page static int nfs_swapfile(struct address_space *mapping, int enable) { - return 0; + int err = -EINVAL; + struct rpc_clnt *client = NFS_CLIENT(mapping-host); + struct sock *sk = client-cl_xprt-inet; + + if (enable) { + client-cl_xprt-swapper = 1; + /* + * keep one extra sock reference so the reserve won't dip + * when the socket gets reconnected. + */ + sk_adjust_memalloc(1, 1); + err = sk_set_vmio(sk); + } else if (client-cl_xprt-swapper) { + client-cl_xprt-swapper = 0; + sk_adjust_memalloc(-1, -1); + err = sk_clear_vmio(sk); + } + + return err; } This all belongs in net/sunrpc/xprtsock.c. The NFS code has no business screwing around with the internals of the sunrpc transport. const struct address_space_operations nfs_file_aops = { Index: linux-2.6/net/sunrpc/xprtsock.c === --- linux-2.6.orig/net/sunrpc/xprtsock.c +++ linux-2.6/net/sunrpc/xprtsock.c @@ -1014,6 +1014,7 @@ static void xs_udp_connect_worker(void * { struct rpc_xprt *xprt = (struct rpc_xprt *) args; struct socket *sock = xprt-sock; + unsigned long pflags = current-flags; int err, status = -EIO; if (xprt-shutdown || xprt-addr.sin_port == 0) @@ -1021,6 +1022,9 @@ static void xs_udp_connect_worker(void * dprintk(RPC: xs_udp_connect_worker for xprt %p\n, xprt); + if (xprt-swapper) + current-flags |= PF_MEMALLOC; + /* Start by resetting any existing state */ xs_close(xprt); @@ -1054,6 +1058,9 @@ static void xs_udp_connect_worker(void * xprt-sock = sock; xprt-inet = sk; + if (xprt-swapper) + sk_set_vmio(sk); + write_unlock_bh(sk-sk_callback_lock); } xs_udp_do_set_buffer_size(xprt); @@ -1061,6 +1068,7 @@ static void xs_udp_connect_worker(void * out: xprt_wake_pending_tasks(xprt, status); xprt_clear_connecting(xprt); + current-flags = pflags; } /* @@ -1097,11 +1105,15 @@ static void xs_tcp_connect_worker(void * { struct rpc_xprt *xprt = (struct rpc_xprt *)args; struct socket *sock = xprt-sock; + unsigned long pflags = current-flags; int err, status = -EIO; if (xprt-shutdown || xprt-addr.sin_port == 0) goto out; + if (xprt-swapper) + current-flags |= PF_MEMALLOC; + dprintk(RPC: xs_tcp_connect_worker for xprt %p\n, xprt); if (!xprt-sock) { @@ -1170,10 +1182,14 @@ static void xs_tcp_connect_worker(void * break; } } + + if (xprt-swapper) + sk_set_vmio(xprt-inet); out: xprt_wake_pending_tasks(xprt, status); out_clear: xprt_clear_connecting(xprt); + current-flags = pflags; } How does this guarantee that the socket reconnection won't fail? Also, what about the case of rpc_malloc()? Can't that cause rpciod to deadlock when you add NFS swap into the equation? Cheers, Trond - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Patch] Dead code in net/sunrpc/auth_gss/auth_gss.c
On Tue, 2006-04-18 at 18:07 +0200, Eric Sesterhenn wrote: Hi, the coverity checker spotted that cred is always NULL when we jump to out_err ( there is just one case, when we fail to allocate the memory for cred ) This is Coverity ID #79 Signed-off-by: Eric Sesterhenn [EMAIL PROTECTED] I'll take it, but please send sunrpc patches to the NFS lists, not netdev. Cheers, Trond --- linux-2.6.17-rc1/net/sunrpc/auth_gss/auth_gss.c.orig 2006-04-18 13:23:22.0 +0200 +++ linux-2.6.17-rc1/net/sunrpc/auth_gss/auth_gss.c 2006-04-18 13:24:10.0 +0200 @@ -794,7 +794,6 @@ gss_create_cred(struct rpc_auth *auth, s out_err: dprintk(RPC: gss_create_cred failed with error %d\n, err); - if (cred) gss_destroy_cred(cred-gc_base); return ERR_PTR(err); } - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[Fwd: [Bug 5644] New: NFS v3 TCP 3-way handshake incorrect, iptables blocks access]
Sorry to be cross-posting, but does this bug ring any bells? I'm having trouble seeing how the sunrpc server code could be at fault. Cheers, Trond ---BeginMessage--- http://bugzilla.kernel.org/show_bug.cgi?id=5644 Summary: NFS v3 TCP 3-way handshake incorrect, iptables blocks access Kernel Version: 2.6.14 Status: NEW Severity: blocking Owner: [EMAIL PROTECTED] Submitter: [EMAIL PROTECTED] Most recent kernel where this bug did not occur: Distribution: Can't remember, possibly FC2. Hardware Environment: Software Environment: Problem Description: Steps to reproduce: 1. Boot NFS v3 TCP client running iptables mount NFS filesystem 2. Do a normal NFS client reboot try mounting the same filesystem again 3. Experience intermittent failure to read superblock The cause of this problem is NFS server's improper response to SYN packet sent by the client. This occurs *after* successful client authorization, when the client tries to open the connection (i.e. sends SYN to the server's nfs port) to read the superblock. The server (sometimes) responds with a pure ACK without the SYN bit set. This is blocked by iptables -- thus, mount fails with a could not read superblock message. Here is an excerpt from ethereal log: 3 0.021733client SERVER TCP 800 nfs [SYN] Seq=0 Ack=0 Win=5840 Len=0 MSS=1460 TSV=24095 TSER=0 WS=2 4 0.021846SERVER client TCP nfs 800 [ACK] Seq=9138391 Ack=3580883479 Win=16022 Len=0 TSV=244936050 TSER=1149400 5 0.021864client SERVER ICMP Destination unreachable (Host administratively prohibited) The above problem occurs with a very simple default+ssh iptables configuration. Disabling iptables on the client makes the problem go away. Even with iptables active, there is no problem when nfsd responds with a proper [SYN,ACK] instead of just pure ACK (this happens intermittently after the client reboot). Please fix nfsd so that it reliably responds to SYN packets with proper [SYN,ACK] packets instead of just ACK packets. Apparently, nfsd state doesn't get properly reset on client reboots. Other people have reported autofs failures which may be related (e.g. on remounts). --- You are receiving this mail because: --- You are the assignee for the bug, or are watching the assignee. ---End Message---