Hi.
I'm currently trying to optimize our NFS server. We're running in a
cluster setup with a single NFS server and some compute nodes pulling data
from it. Currently the dataset is less than 10GB, so it fits in the
NFS server's memory (confirmed via vmstat 1).
Currently I'm getting around 500mb
On Tue, 5 Feb 2008 23:35:48 -0500
Christoph Hellwig <[EMAIL PROTECTED]> wrote:
> On Tue, Feb 05, 2008 at 02:37:57PM -0500, Jeff Layton wrote:
> > Because kthread_stop blocks until the kthread actually goes down,
> > we have to send the signal before calling it. This means that there
> > is a very
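For illustration, a minimal sketch of that ordering (the helper name is
made up; lockd's actual shutdown path differs):

#include <linux/kthread.h>
#include <linux/sched.h>

/* The thread only leaves its loop when signalled, and kthread_stop()
 * blocks until the thread function returns, so the signal has to be
 * sent first. */
static void stop_signalled_kthread_sketch(struct task_struct *task)
{
	send_sig(SIGKILL, task, 1);	/* wake the thread out of its wait */
	kthread_stop(task);		/* now safe: the thread can exit */
}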
Hi,
On 02/06/2008 11:04:34 AM +0100, "Jesper Krogh" <[EMAIL PROTECTED]> wrote:
Hi.
I'm currently trying to optimize our NFS server. We're running in a
cluster setup with a single NFS server and some compute nodes pulling data
from it. Currently the dataset is less than 10GB, so it fits in memory
On Wed, 2008-02-06 at 19:24 +1300, Andrew Dixie wrote:
> > The fact that the delegreturn call appears to have hit xprt_timer is
> > interesting. Under normal circumstances, timeouts should never occur
> > under NFSv4. Could you tell us what mount options you're using here?
> >
> > Also please coul
On Wed, Feb 06, 2008 at 10:00:21AM -0500, Trond Myklebust wrote:
>
> On Wed, 2008-02-06 at 19:24 +1300, Andrew Dixie wrote:
> > The following appears in the server logs:
> > Feb 4 08:28:01 devfile kernel: NFSD: setclientid: string in use by
> > client(clientid 47945499/1c88)
> > Feb 4 08:34
On Wed, 2008-02-06 at 10:07 -0500, J. Bruce Fields wrote:
> That went into 2.6.22:
>
> 21315edd4877b593d5bf.. "[PATCH] knfsd: nfsd4: demote "clientid
> in use" printk to a dprintk"
>
> It may suggest a problem if this is happening a lot, though, right?
The client should always be a
On Wed, 2008-02-06 at 15:37 +0100, Gabriel Barazer wrote:
> >
> > Should I go for NFSv2 (the default if I don't change mount options), NFSv3, or
> > NFSv4?
>
> NFSv2/3 have nearly the same performance
Only if you shoot yourself in the foot by setting the 'async' flag
in /etc/exports. Don't do that.
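For illustration, a safe exports(5) line spells the default out
explicitly (the path and client spec here are just an example):

/export        *(rw,sync,no_subtree_check)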
> Hi,
>> I'm currently trying to optimize our NFS server. We're running in a
>> cluster setup with a single NFS server and some compute nodes pulling
>> data from it. Currently the dataset is less than 10GB so it fits in
>> memory of the NFS-server. (confirmed via vmstat 1). Currently I'm
>> gettin
Now that it no longer does an RPC ping, lockd always ends up queueing
an RPC task for the GRANT_MSG callback. But, it also requeues the block
for later attempts. Since these are hard RPC tasks, if the client we're
calling back goes unresponsive, the GRANT_MSG callbacks can stack up in
the RPC queue.
It's currently possible for an unresponsive NLM client to completely
lock up a server's lockd. The scenario is something like this:
1) client1 (or a process on the server) takes a lock on a file
2) client2 tries to take a blocking lock on the same file and
awaits the callback
3) client2 goes unresponsive
This patchset fixes the problem that Bruce pointed out last week when
we were discussing the lockd-kthread patches.
The main problem is described in patch #1 and that patch also fixes the
DoS. The remaining patches clean up how GRANT_MSG callbacks handle an
unresponsive client. The goal in those i
With the current scheme in nlmsvc_grant_blocked, we can end up with more
than one GRANT_MSG callback for a block in flight. Right now, we requeue
the block unconditionally so that a GRANT_MSG callback is done again in
30s. If the client is unresponsive, it can take more than 30s for the
call alread
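A rough sketch of that idea (the B_IN_GRANT_CB flag is invented for
illustration; the actual patch may track this differently):

#include <linux/lockd/lockd.h>

/* Sketch: never queue a second GRANT_MSG for a block that already has
 * one in flight; the completion handler clears the flag and requeues
 * the block if the grant still needs to be retried. */
static void grant_blocked_sketch(struct nlm_block *block)
{
	if (block->b_flags & B_IN_GRANT_CB)	/* made-up flag */
		return;
	block->b_flags |= B_IN_GRANT_CB;
	/* ... issue the async GRANT_MSG callback here ... */
}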
It's possible for lockd to catch a SIGKILL while a GRANT_MSG callback
is in flight. If this happens, we don't want lockd to insert the block
back into the nlm_blocked list.
This helps that situation, but there's still a possible race. Fixing
that will mean adding real locking for nlm_blocked.
Sign
On Wed, Feb 06, 2008 at 10:15:23AM -0500, Trond Myklebust wrote:
>
> On Wed, 2008-02-06 at 10:07 -0500, J. Bruce Fields wrote:
>
> > That went into 2.6.22:
> >
> > 21315edd4877b593d5bf.. "[PATCH] knfsd: nfsd4: demote "clientid
> > in use" printk to a dprintk"
> >
> > It may suggest a pr
On Wed, 2008-02-06 at 12:23 -0500, J. Bruce Fields wrote:
> On Wed, Feb 06, 2008 at 10:15:23AM -0500, Trond Myklebust wrote:
> >
> > On Wed, 2008-02-06 at 10:07 -0500, J. Bruce Fields wrote:
> >
> > > That went into 2.6.22:
> > >
> > > 21315edd4877b593d5bf.. "[PATCH] knfsd: nfsd4: demote "cli
This is the tenth iteration of the patchset to convert lockd to use the
kthread API. This patchset is smaller than the earlier ones since some
of the patches in those sets have already been taken into Bruce's tree.
This set only changes lockd to use the kthread API.
The only real difference betwee
Needed since the plan is to not have a svc_create_thread helper and to
have current users of that function just call kthread_run directly.
Signed-off-by: Jeff Layton <[EMAIL PROTECTED]>
Reviewed-by: NeilBrown <[EMAIL PROTECTED]>
Signed-off-by: J. Bruce Fields <[EMAIL PROTECTED]>
---
net/sunrpc/sv
Have lockd_up start lockd using kthread_run. With this change,
lockd_down now blocks until lockd actually exits, so there's no longer
need for the waitqueue code at the end of lockd_down. This also means
that only one lockd can be running at a time which simplifies the code
within lockd's main loop
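In outline, the kthread API pattern being adopted looks like this
(function and variable names are illustrative, not lockd's actual ones):

#include <linux/err.h>
#include <linux/kthread.h>

static struct task_struct *lockd_task;

static int lockd_up_sketch(int (*threadfn)(void *data))
{
	lockd_task = kthread_run(threadfn, NULL, "lockd");
	return IS_ERR(lockd_task) ? PTR_ERR(lockd_task) : 0;
}

static void lockd_down_sketch(void)
{
	/* Blocks until the thread function returns, which is why the
	 * old waitqueue code at the end of lockd_down goes away. */
	kthread_stop(lockd_task);
}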
On 02/06/2008 4:18:16 PM +0100, Trond Myklebust
<[EMAIL PROTECTED]> wrote:
On Wed, 2008-02-06 at 15:37 +0100, Gabriel Barazer wrote:
Should I go for NFSv2 (the default if I don't change mount options), NFSv3, or
NFSv4?
NFSv2/3 have nearly the same performance
Only if you shoot yourself in the foot
Hello all,
Thanks to Chuck's help, I finally decided to proceed with a git bisect and
found the bad patch. Does anybody have an idea why it breaks
userspace NFS servers as we have seen? Sorry for emailing Chuck Lever
and Andrew Morton directly, but I really wanted to thank Chuck for his
On Wed, Feb 06, 2008 at 12:52:17PM -0500, Trond Myklebust wrote:
>
> On Wed, 2008-02-06 at 12:23 -0500, J. Bruce Fields wrote:
> > On Wed, Feb 06, 2008 at 10:15:23AM -0500, Trond Myklebust wrote:
> > >
> > > On Wed, 2008-02-06 at 10:07 -0500, J. Bruce Fields wrote:
> > >
> > > > That went into 2
On Wed, 2008-02-06 at 13:21 -0500, Jeff Layton wrote:
> Have lockd_up start lockd using kthread_run. With this change,
> lockd_down now blocks until lockd actually exits, so there's no longer
> need for the waitqueue code at the end of lockd_down. This also means
> that only one lockd can be runni
On Wed, 2008-02-06 at 19:24 +0100, Gabriel Barazer wrote:
> Oops (tm)! Fortunately I do mostly reads, but maybe the exports(5) man
> page should be updated. According to the man page, I thought that
> although writes aren't committed to the block devices, the server-side
> cache is correctly syn
On Wed, 06 Feb 2008 13:36:31 -0500
Trond Myklebust <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2008-02-06 at 13:21 -0500, Jeff Layton wrote:
> > Have lockd_up start lockd using kthread_run. With this change,
> > lockd_down now blocks until lockd actually exits, so there's no
> > longer need for the wa
On Wed, 2008-02-06 at 13:47 -0500, Jeff Layton wrote:
> There's no guarantee that kthread_stop() won't wake up lockd before
> schedule_timeout() gets called, but after the last check for
> kthread_should_stop().
Doesn't the BKL pretty much eliminate this race? (assuming you transform
that call to
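For reference, the standard idiom that closes this window without
relying on the BKL is to set the task state before the final
kthread_should_stop() check; a sketch:

#include <linux/kthread.h>
#include <linux/sched.h>

static int periodic_thread_sketch(void *unused)
{
	while (!kthread_should_stop()) {
		set_current_state(TASK_INTERRUPTIBLE);
		/* kthread_stop() sets the stop flag and then wakes us,
		 * so either we see the flag here, or the wakeup makes
		 * schedule_timeout() return immediately. */
		if (kthread_should_stop()) {
			__set_current_state(TASK_RUNNING);
			break;
		}
		schedule_timeout(30 * HZ);
		/* ... do the periodic work ... */
	}
	return 0;
}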
On Wed, 06 Feb 2008 13:52:34 -0500
Trond Myklebust <[EMAIL PROTECTED]> wrote:
>
> On Wed, 2008-02-06 at 13:47 -0500, Jeff Layton wrote:
> > There's no guarantee that kthread_stop() won't wake up lockd before
> > schedule_timeout() gets called, but after the last check for
> > kthread_should_stop(
Hi Gianluca-
On Feb 6, 2008, at 1:25 PM, Gianluca Alberici wrote:
Hello all,
Thanks to Chuck's help, I finally decided to proceed with a git bisect
and found the bad patch. Does anybody have an idea why it
breaks userspace NFS servers as we have seen? Sorry for emailing
Chuck Lever and Andrew Morton directly, but I really wanted to thank Chuck for his
On Wed, 6 Feb 2008 13:47:02 -0500
Jeff Layton <[EMAIL PROTECTED]> wrote:
> On Wed, 06 Feb 2008 13:36:31 -0500
> Trond Myklebust <[EMAIL PROTECTED]> wrote:
>
> >
> > On Wed, 2008-02-06 at 13:21 -0500, Jeff Layton wrote:
> > > Have lockd_up start lockd using kthread_run. With this change,
> > > lo
On 02/06/2008 4:59:39 PM +0100, "Jesper Krogh" <[EMAIL PROTECTED]> wrote:
I have a similar setup, and I'm very curious about how you can read an
"iowait" value from the clients: on my nodes (server 2.6.21.5/clients
2.6.23.14), the iowait counter is only incremented when dealing with
block devices,
Gabriel Barazer wrote:
On 02/06/2008 4:59:39 PM +0100, "Jesper Krogh" <[EMAIL PROTECTED]> wrote:
I have a similar setup, and I'm very curious about how you can read an
"iowait" value from the clients: on my nodes (server 2.6.21.5/clients
2.6.23.14), the iowait counter is only incremented when dealing with
> > + dotdot.d_name.name = "..";
> > + dotdot.d_name.len = 2;
> > +
> > + lock_kernel();
> > + if (!udf_find_entry(child->d_inode, &dotdot, &fibh, &cfi))
> > + goto out_unlock;
> Have you ever tried this? I think this could never work. UDF doesn't have
> an entry named .. in a directory
On Thu, Feb 07, 2008 at 10:19:06AM +1300, Andrew Dixie wrote:
>
> > Oh, right, I was confusing client and server reboot and assuming the
> > client would forget the uniquifier on server reboot. That's obviously
> > wrong! The client will forget its own uniquifier on client reboot, but
> > that's
Hi Chuck,
I finally got it. The problem and solution were found six months ago, but
nobody cared... up to now those servers have not been maintained, and this
problem is not discussed anywhere other than the following link.
The bug (on the userspace server side, I would say at this point) is well
described fro
On Wed, 06 Feb 2008 22:55:02 +0100
Gianluca Alberici <[EMAIL PROTECTED]> wrote:
> I finally got it. The problem and solution were found six months ago, but
> nobody cared... up to now those servers have not been maintained, and this
> problem is not discussed anywhere other than the following link.
> The
> Oh, right, I was confusing client and server reboot and assuming the
> client would forget the uniquifier on server reboot. That's obviously
> wrong! The client will forget its own uniquifier on client reboot, but
> that's alright since it's happy enough just to let that old state time
> out a
> What is rpciod doing while the machine hangs?
> Does 'netstat -t' show an active tcp connection to the server?
> Does tcpdump show any traffic going on the wire?
> What server are you running against? From the error messages below, I
> see it is a Linux machine, but which kernel is it run
On Thu, 2008-02-07 at 11:40 +1300, Andrew Dixie wrote:
> > What is rpciod doing while the machine hangs?
> > Does 'netstat -t' show an active tcp connection to the server?
> > Does tcpdump show any traffic going on the wire?
> > What server are you running against? From the error messages
On Wed, 2008-02-06 at 14:09 -0500, Jeff Layton wrote:
> On Wed, 06 Feb 2008 13:52:34 -0500
> Trond Myklebust <[EMAIL PROTECTED]> wrote:
>
> >
> > On Wed, 2008-02-06 at 13:47 -0500, Jeff Layton wrote:
> > > There's no guarantee that kthread_stop() won't wake up lockd before
> > > schedule_timeout
2.6.23-stable review patch. If anyone has any objections, please let us know.
--
From: NeilBrown <[EMAIL PROTECTED]>
patch ba67a39efde8312e386c6f603054f8945433d91f in mainline.
When RPCSEC/GSS and krb5i is used, requests are padded, typically to a multiple
of 8 bytes. This can
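The padding arithmetic itself is a simple round-up-to-a-multiple; a
small illustration (helper name made up):

#include <linux/types.h>

/* Round len up to a multiple of align, where align is a power of
 * two -- e.g. 8 for the krb5i case above: pad_to_align(13, 8) == 16. */
static inline size_t pad_to_align(size_t len, size_t align)
{
	return (len + align - 1) & ~(align - 1);
}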
On Feb 5, 2008, at 9:12 PM, Kevin Coffman wrote:
If the Mac server code can support other encryption types like Triple
DES and ArcFour, you shouldn't need to limit it to only the
des-cbc-crc key. The Linux nfs-utils code on the client should be
limiting the negotiated encryption type to des.
I
Hi:
I did some extensive digging into the codebase and I believe I have the
reason why exportfs -a flushes out the caches after NFS clients have
mounted the NFS filesystem.
The analysis is complicated, but here's the crux of the matter:
There is a difference between /etc/exports and the kernel m
Hi,
I've been looking at NLM_HOST_MAX in fs/lockd/host.c, as we have a
patch in SLES that makes it configurable, and the patch needs to
either go upstream or out the window...
But the code that uses NLM_HOST_MAX is weird! Look:
#define NLM_HOST_EXPIRE ((nrhosts > NLM_HOST_MAX)? 300 * HZ : 120 * HZ)
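For what it's worth, one way a configurable limit might look (a sketch
with a made-up parameter name; not necessarily what the SLES patch
does):

#include <linux/module.h>
#include <linux/moduleparam.h>

static unsigned long nlm_max_hosts = 64;	/* replaces NLM_HOST_MAX */
module_param(nlm_max_hosts, ulong, 0644);
MODULE_PARM_DESC(nlm_max_hosts,
		 "nlm_host cache size above which GC runs more aggressively");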
On Wed, Feb 06, 2008 at 09:58:02PM +0100, Rasmus Rohde wrote:
> Probably not. I just tested that I could read files and navigate the
> directory structure. However, looking into UDF, I think you are right - it
> will fail.
> I have extended udf_find_entry() to do an explicit check based on
> fileCharacteristics
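Presumably something along these lines, using the FID_FILE_CHAR_PARENT
bit from fs/udf/ecma_167.h (a sketch, not the actual patch; fi stands
in for the file identifier descriptor being examined):

/* In UDF, a '..' entry is flagged by the parent bit in the file
 * identifier descriptor, not stored under the literal name "..". */
static bool fid_is_dotdot(struct fileIdentDesc *fi)
{
	return fi->fileCharacteristics & FID_FILE_CHAR_PARENT;
}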
At a higher level, in general, I think the kernel exports table need not
match /etc/exports at all. When we run "exportfs -a" again, what the
codebase intends to do is the following:
1. Scan /etc/exports and verify that an entry exists (create one if not)
in its in-core exports table. Mark each of
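In other words, a mark-and-sweep pass over the in-core table; a compact
sketch of the idea (all types and names invented for illustration):

#include <stdbool.h>
#include <stdlib.h>

struct exp_entry {
	struct exp_entry *next;
	bool marked;		/* seen during the /etc/exports scan */
	/* path, client, flags, ... */
};

/* After scanning /etc/exports and marking every entry found there,
 * drop everything that was not marked. */
static struct exp_entry *sweep(struct exp_entry *head)
{
	struct exp_entry **p = &head;

	while (*p) {
		if (!(*p)->marked) {
			struct exp_entry *stale = *p;
			*p = stale->next;	/* unlink the stale entry */
			free(stale);
		} else {
			p = &(*p)->next;
		}
	}
	return head;
}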
Hi:
I did some extensive digging into the codebase and I believe I have the
reason why exportfs -a flushes out the caches after NFS clients have
mounted the NFS filesystem.
The analysis is complicated, but here's the crux of the matter:
There is a difference between /etc/exports and the kernel m
On Wednesday February 6, [EMAIL PROTECTED] wrote:
> > > + dotdot.d_name.name = "..";
> > > + dotdot.d_name.len = 2;
> > > +
> > > + lock_kernel();
> > > + if (!udf_find_entry(child->d_inode, &dotdot, &fibh, &cfi))
> > > + goto out_unlock;
> > Have you ever tried this? I think this could n
On Thursday January 31, [EMAIL PROTECTED] wrote:
> >
> > Does the MIPS box have the /proc/fs/nfsd/ filesystem mounted?
>
> Ahh, I see what you mean. Yes, it is mounted, both /proc/fs/nfsd and
> /proc/fs/nfs. However, I can see from the code that check_new_cache()
> checks for a file "filehandle"
On Wed, Feb 06, 2008 at 09:58:02PM +0100, Rasmus Rohde wrote:
> > > + dotdot.d_name.name = "..";
> > > + dotdot.d_name.len = 2;
> > > +
> > > + lock_kernel();
> > > + if (!udf_find_entry(child->d_inode, &dotdot, &fibh, &cfi))
> > > + goto out_unlock;
> > Have you ever tried this? I think
Hi Neil:
Thanks for responding. My response goes below:
> -----Original Message-----
> From: Neil Brown [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 06, 2008 9:22 PM
> To: Anirban Sinha
> Cc: Greg Banks; linux-nfs@vger.kernel.org
> Subject: RE: kernel exports table flushes out on running
Ok - I have checked get_parent and it works as expected.
I used the "Neil Brown"-test mentioned elsewhere in this thread and
added a few printk's to make sure we actually got the code covered.
> There's still a few trivial warnings from scripts/checkpatch.pl that
> should be fixed up:
Fixed that.
Sorry, it does look like it indeed solved the problem. Clearly, I have
missed something in my analysis of the codebase. In any case, thanks a
lot.
Good night,
Ani
> -----Original Message-----
> From: Neil Brown [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 06, 2008 9:22 PM
> To: Anirban