Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-06-01 Thread Konstantin Belousov

On Thu, Jun 01, 2006 at 01:06:44AM +0300, Dmitry Pryanishnikov wrote:
> 
> Hello!
> 
> On Thu, 25 May 2006, Konstantin Belousov wrote:
> > KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
> > ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
> >
> > from nfsserver/nfs_syscalls.c, line 570.
> >
> > As I understand the problem, kern/vfs_lookup.c:lookup() could
> > acquire additional locks on Giant, indicating this with the GIANTHELD
> > flag in nd. All processing in nfsserver already runs with Giant held,
> > so I just dropped those excess locks after the return from lookup().
> > A system with the patch applied survived a smoke test (the client ran
> > du on the mounted dir, the patch was generated from the exported fs, etc.).
> > nfsd eats no more than 25% of CPU (with INVARIANTS).
> >
> > Users who reported the problem and are willing to help: please
> > try the patch (generated against STABLE) and give feedback.
> 
>   Thank you very much. Your patch does fix the "nfssvc_nfsd():
> debug.mpsafenet=1 && Giant" panic during an NFS mount of the server's "/usr".
> Oddly enough, an NFS mount of the server's "/" doesn't panic the server.

Because the conditions leading to the Giant leak usually hold true for
a lookup of ".." :)



Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-31 Thread Kris Kennaway

On Thu, Jun 01, 2006 at 01:06:44AM +0300, Dmitry Pryanishnikov wrote:
> 
> Hello!
> 
> On Thu, 25 May 2006, Konstantin Belousov wrote:
> > KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
> > ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
> >
> > from nfsserver/nfs_syscalls.c, line 570.
> >
> > As I understand the problem, kern/vfs_lookup.c:lookup() could
> > acquire additional locks on Giant, indicating this with the GIANTHELD
> > flag in nd. All processing in nfsserver already runs with Giant held,
> > so I just dropped those excess locks after the return from lookup().
> > A system with the patch applied survived a smoke test (the client ran
> > du on the mounted dir, the patch was generated from the exported fs, etc.).
> > nfsd eats no more than 25% of CPU (with INVARIANTS).
> >
> > Users who reported the problem and are willing to help: please
> > try the patch (generated against STABLE) and give feedback.
> 
>   Thank you very much. Your patch does fix the "nfssvc_nfsd():
> debug.mpsafenet=1 && Giant" panic during an NFS mount of the server's "/usr".
> Oddly enough, an NFS mount of the server's "/" doesn't panic the server.
> My kernel config contains "options QUOTA"; however, quotas are not enabled.
> Please commit the fix. IMHO, long-term breakage of such basic functionality
> (NFS server + quotas) in the -STABLE branch isn't a Good Thing (TM).

FYI, if you're not using quotas then you should remove the option from
your kernel config to avoid trashing your performance.

Kris



Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-31 Thread Dmitry Pryanishnikov



Hello!

On Thu, 25 May 2006, Konstantin Belousov wrote:

KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));

from nfsserver/nfs_syscalls.c, line 570.

As I understand the problem, kern/vfs_lookup.c:lookup() could
acquire additional locks on Giant, indicating this with the GIANTHELD
flag in nd. All processing in nfsserver already runs with Giant held,
so I just dropped those excess locks after the return from lookup().
A system with the patch applied survived a smoke test (the client ran
du on the mounted dir, the patch was generated from the exported fs, etc.).
nfsd eats no more than 25% of CPU (with INVARIANTS).

Users who reported the problem and are willing to help: please
try the patch (generated against STABLE) and give feedback.


  Thank you very much. Your patch does fix the "nfssvc_nfsd():
debug.mpsafenet=1 && Giant" panic during an NFS mount of the server's "/usr".

Oddly enough, an NFS mount of the server's "/" doesn't panic the server.
My kernel config contains "options QUOTA"; however, quotas are not enabled.
Please commit the fix. IMHO, long-term breakage of such basic functionality
(NFS server + quotas) in the -STABLE branch isn't a Good Thing (TM).

Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail:  [EMAIL PROTECTED]
nic-hdl: LYNX-RIPE
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-26 Thread Dmitriy Kirhlarov

Hi!

On Thu, May 25, 2006 at 05:58:09PM +0300, Konstantin Belousov wrote:

> Please, users who reported the problem and willing to help,
> try the patch (generated against STABLE) and give the feedback.

I tested it with RELENG_6 from 25 May 2006. It works fine. Thank you.

WBR
-- 
Dmitriy Kirhlarov
OILspace, 26 Leninskaya sloboda, bld. 2, 2nd floor, 115280 Moscow, Russia
P:+7 495 105 7247 ext.203 F:+7 495 105 7246 E:[EMAIL PROTECTED]
OILspace - The resource enriched - www.oilspace.com

Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-25 Thread Kris Kennaway

On Thu, May 25, 2006 at 05:58:09PM +0300, Konstantin Belousov wrote:

> +options  QUOTA
>  options  UFS_ACL # Support for access control lists
>  options  UFS_DIRHASH # Improve performance on big directories
>  options  MD_ROOT # MD is a potential root device
> 
> After that, the server machine easily panics on
> 
>   KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
>   ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
> 
> from nfsserver/nfs_syscalls.c, line 570.

OK, I am also seeing this panic when I try and export a non-mpsafe
filesystem (e.g. cd9660).  I can't test the patch because my NFS
server subsequently blew up :-(

Kris



Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-25 Thread Rong-en Fan

On 5/25/06, Konstantin Belousov <[EMAIL PROTECTED]> wrote:

On Thu, May 25, 2006 at 01:19:26AM -0400, Kris Kennaway wrote:
> On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:
>
> > So what's changed at that delta, under the one that works vfs_lookup.c is:
> >
> >  Edit src/sys/kern/vfs_lookup.c
> >   Add delta 1.80.2.6 2006.03.31.07.39.24 kris
> >
> >
> > Under the one that fails the vfs_lookup.c is:
> >
> >  Edit src/sys/kern/vfs_lookup.c
> >   Add delta 1.80.2.7 2006.04.30.03.57.46 kris
> >
> >
> >
> >  So I stand corrected on my last post, the issue is in fact in this module, as
> > just taking that module back to 1.80.2.6 fixes the problem with my server.   I
> > even took multiple NFS clients and gave them a heavy workload, and CPU still
> > remained reasonable, and very responsive.  As soon as I rev to the new
> > version, NFS breaks badly and even a single client doing something like a du
> > of a directory structure results in sluggishness and extreme CPU usage.
>
> Yep, unfortunately this commit was necessary to fix other bugs.  Jeff
> said he should have time to look at it next week.
>
> Kris

I tried to debug the problem. First, I have to admit that I cannot
reproduce the problem on a GENERIC kernel. Only after QUOTA was added,
and, correspondingly, UFS started to require Giant, did I get the
described behaviour. Below are the changes to the GENERIC config file
I made to reproduce the problem.

[...]

After that, the server machine easily panics on

KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));

from nfsserver/nfs_syscalls.c, line 570.

As I understand the problem, kern/vfs_lookup.c:lookup() could
acquire additional locks on Giant, indicating this with the GIANTHELD
flag in nd. All processing in nfsserver already runs with Giant held,
so I just dropped those excess locks after the return from lookup().
A system with the patch applied survived a smoke test (the client ran
du on the mounted dir, the patch was generated from the exported fs, etc.).
nfsd eats no more than 25% of CPU (with INVARIANTS).

Users who reported the problem and are willing to help: please
try the patch (generated against STABLE) and give feedback.

[...]

Hi Konstantin and others,

I'm now running RELENG_6_1 as of Apr 30 04:00 UTC source + your
patch. The nfsd is quite happy! After client's du finishes, it
stays idle as expected (eats 0.00% CPU).

Thank you very much.

Regards,
Rong-En Fan

[patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-25 Thread Konstantin Belousov

On Thu, May 25, 2006 at 01:19:26AM -0400, Kris Kennaway wrote:
> On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:
> 
> > So what's changed at that delta, under the one that works vfs_lookup.c is:
> > 
> >  Edit src/sys/kern/vfs_lookup.c
> >   Add delta 1.80.2.6 2006.03.31.07.39.24 kris
> > 
> > 
> > Under the one that fails the vfs_lookup.c is:
> > 
> >  Edit src/sys/kern/vfs_lookup.c
> >   Add delta 1.80.2.7 2006.04.30.03.57.46 kris
> > 
> > 
> > 
> >  So I stand corrected on my last post, the issue is in fact in this module, as
> > just taking that module back to 1.80.2.6 fixes the problem with my server.   I
> > even took multiple NFS clients and gave them a heavy workload, and CPU still
> > remained reasonable, and very responsive.  As soon as I rev to the new
> > version, NFS breaks badly and even a single client doing something like a du
> > of a directory structure results in sluggishness and extreme CPU usage.
> 
> Yep, unfortunately this commit was necessary to fix other bugs.  Jeff
> said he should have time to look at it next week.
> 
> Kris

I tried to debug the problem. First, I have to admit that I cannot
reproduce the problem on a GENERIC kernel. Only after QUOTA was added,
and, correspondingly, UFS started to require Giant, did I get the
described behaviour. Below are the changes to the GENERIC config file
I made to reproduce the problem.

Index: amd64/conf/GENERIC
===
RCS file: /usr/local/arch/ncvs/src/sys/amd64/conf/GENERIC,v
retrieving revision 1.439.2.11
diff -u -r1.439.2.11 GENERIC
--- amd64/conf/GENERIC  30 Apr 2006 17:39:43 -  1.439.2.11
+++ amd64/conf/GENERIC  25 May 2006 14:44:14 -
@@ -26,6 +26,19 @@
 #hints "GENERIC.hints" # Default places to look for devices.
 
 makeoptions  DEBUG=-g          # Build kernel with gdb(1) debug symbols
+options      KDB
+options      KDB_TRACE
+#options     KDB_UNATTENDED
+options      DDB
+options      DDB_NUMSYM
+options      BREAK_TO_DEBUGGER
+options      INVARIANTS
+options      INVARIANT_SUPPORT
+options      WITNESS
+options      DEBUG_LOCKS
+options      DEBUG_VFS_LOCKS
+options      DIAGNOSTIC
+options      MUTEX_PROFILING
 
 #options     SCHED_ULE         # ULE scheduler
 options      SCHED_4BSD        # 4BSD scheduler
@@ -34,6 +47,7 @@
 options      INET6             # IPv6 communications protocols
 options      FFS               # Berkeley Fast Filesystem
 options      SOFTUPDATES       # Enable FFS soft updates support
+options      QUOTA
 options      UFS_ACL           # Support for access control lists
 options      UFS_DIRHASH       # Improve performance on big directories
 options      MD_ROOT           # MD is a potential root device

After that, the server machine easily panics on

KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));

from nfsserver/nfs_syscalls.c, line 570.

As I understand the problem, kern/vfs_lookup.c:lookup() could
acquire additional locks on Giant, indicating this with the GIANTHELD
flag in nd. All processing in nfsserver already runs with Giant held,
so I just dropped those excess locks after the return from lookup().
A system with the patch applied survived a smoke test (the client ran
du on the mounted dir, the patch was generated from the exported fs, etc.).
nfsd eats no more than 25% of CPU (with INVARIANTS).

Users who reported the problem and are willing to help: please
try the patch (generated against STABLE) and give feedback.
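To make the locking argument above concrete, here is a minimal user-space sketch in C. It is a toy model under stated assumptions, not kernel code: `toy_giant`, `toy_lookup()`, `toy_nfs_request()`, `toy_kassert_ok()` and `TOY_GIANTHELD` are hypothetical stand-ins for Giant, lookup(), the nfsserver call sites and the GIANTHELD flag, and Giant's recursive acquisition is modelled as a plain hold count. It shows how one unreleased extra hold leaves Giant owned, which is the condition the KASSERT rejects.

```c
#include <assert.h>

/* Toy model only: every name here is a hypothetical stand-in. */
#define TOY_GIANTHELD 0x1u

/* A recursively acquirable lock, modelled as a hold count. */
struct toy_mtx {
    int depth;
};

struct toy_mtx toy_giant = { 0 };

void toy_lock(struct toy_mtx *m) { m->depth++; }

void toy_unlock(struct toy_mtx *m)
{
    assert(m->depth > 0);   /* unlocking an unheld lock is a bug */
    m->depth--;
}

int toy_owned(const struct toy_mtx *m) { return m->depth > 0; }

struct toy_cnd {
    unsigned cn_flags;
};

/* Stands in for lookup() on a file system that still needs Giant (e.g. UFS
 * with QUOTA): it takes one more hold and records that in cn_flags. */
int toy_lookup(struct toy_cnd *cnd, int fs_needs_giant)
{
    if (fs_needs_giant) {
        toy_lock(&toy_giant);
        cnd->cn_flags |= TOY_GIANTHELD;
    }
    return 0;
}

/* One request: the server holds Giant while working, calls the lookup and,
 * if apply_fix is set, drops the extra hold the way the patch does. */
int toy_nfs_request(int fs_needs_giant, int apply_fix)
{
    struct toy_cnd cnd = { 0 };
    int error;

    toy_lock(&toy_giant);                    /* the server's own hold */
    error = toy_lookup(&cnd, fs_needs_giant);
    if (apply_fix && (cnd.cn_flags & TOY_GIANTHELD)) {
        toy_unlock(&toy_giant);              /* like mtx_unlock(&Giant) */
        cnd.cn_flags &= ~TOY_GIANTHELD;
    }
    toy_unlock(&toy_giant);                  /* release the server's hold */
    return error;
}

/* The KASSERT condition: with debug.mpsafenet=1, Giant must not still be
 * owned once a request has finished. */
int toy_kassert_ok(int debug_mpsafenet)
{
    return !(debug_mpsafenet == 1 && toy_owned(&toy_giant));
}
```

With `apply_fix` set, the extra hold is dropped right after the lookup, as in the patch, and the hold count returns to zero; without it, one hold leaks per request on a file system that needs Giant.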

Index: nfsserver/nfs_serv.c
===
RCS file: /usr/local/arch/ncvs/src/sys/nfsserver/nfs_serv.c,v
retrieving revision 1.156.2.2
diff -u -r1.156.2.2 nfs_serv.c
--- nfsserver/nfs_serv.c  13 Mar 2006 03:06:49 -  1.156.2.2
+++ nfsserver/nfs_serv.c  25 May 2006 14:44:25 -
@@ -569,6 +569,10 @@
 
 	error = lookup(&ind);
 	ind.ni_dvp = NULL;
+	if (ind.ni_cnd.cn_flags & GIANTHELD) {
+		mtx_unlock(&Giant);
+		ind.ni_cnd.cn_flags &= ~GIANTHELD;
+	}
 
 	if (error == 0) {
 		/*
@@ -1915,6 +1919,10 @@
 
 	error = lookup(&nd);
 	nd.ni_dvp = NULL;
+	if (nd.ni_cnd.cn_flags & GIANTHELD) {
+		mtx_unlock(&Giant);
+		nd.ni_cnd.cn_flags &= ~GIANTHELD;
+	}
 	if (error)
 		goto ereply;
 
@@ -2141,6 +2149,10 @@
 
 	error = lookup(&nd);
 	nd.ni_dvp = NULL;
+	if (nd.ni_cnd.cn_flags & GIANTHELD) {
+		mtx_unlock(&

Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Kris Kennaway

On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:

> So what's changed at that delta, under the one that works vfs_lookup.c is:
> 
>  Edit src/sys/kern/vfs_lookup.c
>   Add delta 1.80.2.6 2006.03.31.07.39.24 kris
> 
> 
> Under the one that fails the vfs_lookup.c is:
> 
>  Edit src/sys/kern/vfs_lookup.c
>   Add delta 1.80.2.7 2006.04.30.03.57.46 kris
> 
> 
> 
>  So I stand corrected on my last post, the issue is in fact in this module, as
> just taking that module back to 1.80.2.6 fixes the problem with my server.   I
> even took multiple NFS clients and gave them a heavy workload, and CPU still
> remained reasonable, and very responsive.  As soon as I rev to the new
> version, NFS breaks badly and even a single client doing something like a du
> of a directory structure results in sluggishness and extreme CPU usage.

Yep, unfortunately this commit was necessary to fix other bugs.  Jeff
said he should have time to look at it next week.

Kris



RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Howard Leadmon

 I need to follow up on the message below. I am not sure why the
vfs_lookup.c test described below didn't pan out the first time, but with my
newfound knowledge of cvs I was determined to step the system back until I
found the smoking gun, so to speak, which I have done.

 First let me say that instead of running RELENG_6_1 like Rong-en is, I am
running the RELENG_6 tree that I know updates more often, but seems to work
well for me. 

 OK, so as I said above, I started to step the system back a couple of days at
a time, till suddenly NFS started working again, so at that point I knew a
source change was responsible.  Then I started to narrow the time range until
I got to the point where it broke.   Sure enough, under the RELENG_6 branch,
this time was as follows:

*default tag=RELENG_6 date=2006.04.30.03.57.00   (Works OK)

*default tag=RELENG_6 date=2006.04.30.03.58.00   (Broken)

So what's changed at that delta, under the one that works vfs_lookup.c is:

 Edit src/sys/kern/vfs_lookup.c
  Add delta 1.80.2.6 2006.03.31.07.39.24 kris

Under the one that fails the vfs_lookup.c is:

 Edit src/sys/kern/vfs_lookup.c
  Add delta 1.80.2.7 2006.04.30.03.57.46 kris

 So I stand corrected on my last post, the issue is in fact in this module, as
just taking that module back to 1.80.2.6 fixes the problem with my server.   I
even took multiple NFS clients and gave them a heavy workload, and CPU still
remained reasonable, and very responsive.  As soon as I rev to the new
version, NFS breaks badly and even a single client doing something like a du
of a directory structure results in sluggishness and extreme CPU usage.

 I am not a coder, so I'm not sure why this module was changed, but unless there
is some good reason why the changes were needed, I suspect it needs to be
rolled back or something fixed.   So Rong-en Fan, I think you were dead on
with your analysis that the issue is in fact inside the vfs_lookup.c module.

I hope this helps...

---
Howard Leadmon - [EMAIL PROTECTED]
http://www.leadmon.net

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Howard Leadmon
> Sent: Wednesday, May 24, 2006 1:23 PM
> To: 'Rong-en Fan'
> Cc: 'Konstantin Belousov'; freebsd-stable@freebsd.org
> Subject: RE: Trouble with NFSd under 6.1-Stable, any ideas?
> 
> 
>Hello Rong-en,
> 
> 
> As an update, I did the below, and I still had the issue with either version
> of vfs_lookup.c compiled in and running.
> 
> On the bright side, I didn't realize you could step through the cvs by date;
> I guess I just never paid attention.  So I just stepped back to
> 'tag=RELENG_6 date=2006.04.20.00.00.00' on my server, rebuilt, and voilà,
> nfs is now running perfectly.
> 
>  So backing out something has fixed my problem; now to figure out just what
> it was.  As I don't know what has caused this, I have done complete
> buildworlds to make sure everything updates, which takes a few hours.  I am
> going to start moving the cvs date forward till I get the problem back.
> Once I nail this down a bit more, I'll let you know what I come up with.
> 
> 
> 
> ---
> Howard Leadmon
> http://www.leadmon.net
> 
>  
> 
> > -Original Message-
> > From: Rong-en Fan [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, May 23, 2006 3:09 PM
> > To: Howard Leadmon
> > Cc: freebsd-stable@freebsd.org
> > Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
> > 
> > On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote:
> > >
> > >Hello Rong-en,
> > >
> > >  Thanks for the info on getting the debugger configured, and on the serial
> > > console.   I will have to try and play with the serial console thing more, I
> > > just tried putting in the flags and the damn thing hung, I had to boot
> > > from CD and take the stuff back out.
> > >
> > >  One thing you mention below that concerns me is that you have version 1.90 of
> > > the vfs_lookup.c file.   I just did a less on /usr/src/sys/kern/vfs_lookup.c
> > > and I see the following:
> > >
> > > FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp
> > >
> > >
> > >  I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the
> > > current stuff before rebuilding the kernel just now, and still I see the
> > > same thing.  Is something fishy going on here, or did you by chance make
> > > a typo??
> > 
> > Sorry for the confusion. rev 1.90 is the number for -HEAD. To back out
> > this MFC'ed change for RELE

RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Mark Morley

Another data point:

One of our NFS servers is an amd64-based system serving a cluster of web and
email servers.  Under 6.1-RCx it gave us the same (or better) performance as
the server it replaced (which was 4.11).  The server load hovered between 0.x
and 1.x.

But after upgrading it to 6.1-STABLE, the load now hovers between 5.x and 6.x,
with spikes as high as 8.x, and there has been no change at all in the NFS
client traffic or other loading factors that we can tell.  This in turn makes
for slower NFS client accesses.

I am going to try reverting to an earlier src tree and see if that helps.

Mark

--
Mark Morley
Owner / Administrator
Islandnet.com



RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Howard Leadmon


   Hello Rong-en,


As an update, I did the below, and I still had the issue with either version
of vfs_lookup.c compiled in and running.

On the bright side, I didn't realize you could step through the cvs by date;
I guess I just never paid attention.  So I just stepped back to 'tag=RELENG_6
date=2006.04.20.00.00.00' on my server, rebuilt, and voilà, nfs is now running
perfectly.

 So backing out something has fixed my problem; now to figure out just what it
was.  As I don't know what has caused this, I have done complete buildworlds
to make sure everything updates, which takes a few hours.  I am going to
start moving the cvs date forward till I get the problem back. Once I nail
this down a bit more, I'll let you know what I come up with.



---
Howard Leadmon 
http://www.leadmon.net

 

> -Original Message-
> From: Rong-en Fan [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, May 23, 2006 3:09 PM
> To: Howard Leadmon
> Cc: freebsd-stable@freebsd.org
> Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
> 
> On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote:
> >
> >Hello Rong-en,
> >
> >  Thanks for the info on getting the debugger configured, and on the serial
> > console.   I will have to try and play with the serial console thing more, I
> > just tried putting in the flags and the damn thing hung, I had to boot
> > from CD and take the stuff back out.
> >
> >  One thing you mention below that concerns me is that you have version 1.90 of
> > the vfs_lookup.c file.   I just did a less on /usr/src/sys/kern/vfs_lookup.c
> > and I see the following:
> >
> > FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp
> >
> >
> >  I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the
> > current stuff before rebuilding the kernel just now, and still I see the
> > same thing.  Is something fishy going on here, or did you by chance make
> > a typo??
> 
> Sorry for the confusion. rev 1.90 is the number for -HEAD. To back out this
> MFC'ed change for RELENG_6_1, please cvsup to RELENG_6_1
> date=2006.04.30.03.57.00. Then you should see that it is
> 
> 1.80.2.6 2006/03/31 07:39:24 kris
> 
> To verify the effect of this revision, please run RELENG_6_1 with
> 2006.04.30.03.57.00 and with 2006.04.30.04.00.00.
> 
> Regards,
> Rong-En Fan
> 



Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Joerg Lehners



"Rong-en Fan" <[EMAIL PROTECTED]> wrote:

On 5/14/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:

On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:



[...]

Use tcpdump and related tools to find out what traffic is being sent.

Also verify that you did not change your system configuration in any
way: there have been no changes to NFS since the release, so it is
unclear why an update would cause the problem to suddenly occur.

Kris


Hi Kris and Howard,

As I posted a few days ago, I have problems similar to Howard's
(some details in the thread "6.1-RELEASE, em0 high interrupt rate
and nfsd eats lots of cpu" on stable@). After binary searching
the source tree, I found that

RELENG_6_1, 2006.04.30.03.57 ok
RELENG_6_1, 2006.04.30.04.00 bad

The only commit in that window is to kern/vfs_lookup.c, an MFC of revs 1.90
and 1.91. With the 04.30 03.57 source plus a manually patched vfs_lookup.c
rev 1.90, the same problem occurs.
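The binary search over cvsup dates can be sketched in C as follows. This is a hypothetical illustration, not a tool anyone in the thread used: `first_bad()` and `demo_broken()` are made-up names, each predicate call stands for "cvsup to this date, rebuild, rerun the du test", and the sketch assumes the first date is known good, the last is known bad, and the tree stays broken once it breaks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: "cvsup to this date, rebuild, rerun the du test,
 * return nonzero if nfsd misbehaves". */
typedef int (*is_broken_fn)(const char *date);

/* Binary search for the first broken date.  Assumes dates[] is sorted,
 * dates[0] is known good, dates[n - 1] is known bad, and the tree stays
 * broken after the first bad checkpoint. */
size_t first_bad(const char *const dates[], size_t n, is_broken_fn broken)
{
    size_t good = 0, bad;

    assert(n > 0);
    bad = n - 1;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (broken(dates[mid]))
            bad = mid;          /* breakage is at or before mid */
        else
            good = mid;         /* breakage is after mid */
    }
    return bad;
}

/* Demo predicate: pretend breakage begins with the vfs_lookup.c 1.80.2.7
 * commit time (2006.04.30.03.57.46).  Fixed-width YYYY.MM.DD.hh.mm.ss
 * strings sort correctly with strcmp(). */
int demo_broken(const char *date)
{
    return strcmp(date, "2006.04.30.03.57.46") >= 0;
}
```

For example, with seven checkpoints from 2006.04.20.00.00.00 to 2006.05.15.00.00.00 that include 2006.04.30.03.57.00 and 2006.04.30.03.58.00, the search needs only two probes (two rebuilds) before returning 2006.04.30.03.58.00.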

[...]

Confirmed!

I can create the problem here at will.

Setup 1: NFS server 'testido' running FreeBSD 6.1-STABLE as of 15 May 2006
with sys/kern/vfs_lookup.c 1.80.2.7; NFS client 'schurks' running FreeBSD
6.1-STABLE as of 15 May 2006.

/usr/src from testido mounted on /mnt on schurks.
Running 'cd /mnt ; du >/dev/null' twice (first right after a fresh boot of
testido, second when all served data is in testido's memory):

joerg @ schurks> cd /mnt
joerg @ schurks> time du >/dev/null
   86.09s real 0.14s user 1.91s system
joerg @ schurks> time du >/dev/null
  205.10s real 0.20s user 1.92s system
joerg @ schurks>

Screenful of top output on testido AFTER both tests (testido sometimes stopped
responding to screen output, especially during the second test):

last pid:   329;  load averages:  4.14,  2.77,  1.25  up 0+00:07:30  18:44:47
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8420K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   TIME    WCPU COMMAND
  201 root        1   4    0  1232K   792K -       4:42 116.31% nfsd
  329 joerg       1  96    0  2404K  1676K RUN     0:00   0.00% top
  168 root        1 115    0  2456K  1760K select  0:00   0.00% sshd
  313 root        1  96    0  1428K  1168K select  0:00   0.00% rlogind
  194 root        1 115    0  1556K  1256K select  0:00   0.00% mountd
  299 root        1   8    0  1720K  1436K wait    0:00   0.00% login
  314 root        1   8    0  1748K  1460K wait    0:00   0.00% login
  298 root        1  96    0  1304K  1048K select  0:00   0.00% rlogind
  199 root        1   4    0  1356K  1040K accept  0:00   0.00% nfsd
  256 root        1  96    0  2892K  1760K select  0:00   0.00% ntpd
  315 joerg       1  20    0  1448K  1020K pause   0:00   0.00% ksh
  300 root        1   5    0  1448K   996K ttyin   0:00   0.00% ksh
  158 root        1  96    0  1332K   940K select  0:00   0.00% syslogd
  163 root        1  96    0  1448K  1128K select  0:00   0.00% inetd
  176 root        1  96    0  1408K  1044K select  0:00   0.00% rpcbind
  185 root        1  96    0  1476K  1148K select  0:00   0.00% ypbind
  261 root        1 115    0  1304K   952K select  0:00   0.00% lpd

Setup 2: NFS server 'testido' running FreeBSD 6.1-STABLE as of 15 May 2006
with sys/kern/vfs_lookup.c 1.80.2.6; NFS client 'schurks' running FreeBSD
6.1-STABLE as of 15 May 2006.

Same tests as before:

joerg @ schurks> time du >/dev/null
   22.63s real 0.15s user 1.82s system
joerg @ schurks> time du >/dev/null
   16.52s real 0.17s user 1.68s system
joerg @ schurks>
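Joerg's du timings can be turned into slowdown factors with a few lines of C. The numbers below are copied from the two setups; `slowdown()` and the array names are just illustrative.

```c
#include <assert.h>

/* Wall-clock seconds of "time du >/dev/null" on schurks, from the two
 * setups above.  Index 0 = first run after testido booted, 1 = second run. */
static const double secs_1_80_2_7[2] = { 86.09, 205.10 };
static const double secs_1_80_2_6[2] = { 22.63, 16.52 };

/* How many times slower a given run was with vfs_lookup.c 1.80.2.7. */
double slowdown(int run)
{
    assert(run == 0 || run == 1);
    return secs_1_80_2_7[run] / secs_1_80_2_6[run];
}
```

This gives roughly 3.8x for the first run and about 12.4x for the second. Note also that with 1.80.2.7 the second, cached run was slower than the first, whereas with 1.80.2.6 it was faster.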

Screenful of top output on testido AFTER both tests (testido responded
fine during both tests):

last pid:   329;  load averages:  0.49,  0.26,  0.10  up 0+00:01:50  18:35:30
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8424K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   TIME    WCPU COMMAND
  201 root        1   4    0  1232K   792K -       0:03   3.76% nfsd
  168 root        1 115    0  2456K  1760K select  0:00   0.00% sshd
  329 joerg       1  96    0  2404K  1676K RUN     0:00   0.00% top
  313 root        1  96    0  1428K  1168K select  0:00   0.00% rlogind
  194 root        1 115    0  1556K  1256K select  0:00   0.00% mountd
  299 root        1   8    0  1720K  1440K wait    0:00   0.00% login
  314 root        1   8    0  1748K  1464K wait    0:00   0.00% login
  298 root        1  96    0  1304K  1048K select  0:00   0.00% rlogind
  199 root        1   4    0  1356K  1040K accept  0:00   0.00% nfsd
  315 joerg       1  20    0  1448K  1020K pause   0:00   0.00% ksh
  256 root        1  96    0  2892K  1760K select  0:00   0.00% ntpd
  300 root        1   5    0  1448K   996K ttyin   0:00   0.00% ksh
  158 root        1  96    0  1332K   940K select  0:00   0.00% syslogd
  163 root        1  96    0  1448K  1128

Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Rong-en Fan


On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote:


   Hello Rong-en,

 Thanks for the info on getting the debugger configured, and on the serial
console.   I will have to try and play with the serial console thing more, I
just tried putting in the flags and the damn thing hung, I had to boot from CD
and take the stuff back out.

 One thing you mention below that concerns me is that you have version 1.90 of
the vfs_lookup.c file.   I just did a less on /usr/src/sys/kern/vfs_lookup.c
and I see the following:

FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp


 I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the current
stuff before rebuilding the kernel just now, and still I see the same thing.
Is something fishy going on here, or did you by chance make a typo??


Sorry for the confusion. rev 1.90 is the number for -HEAD. To back
out this MFC'ed change for RELENG_6_1, please cvsup to
RELENG_6_1 date=2006.04.30.03.57.00. Then you should see that it
is

1.80.2.6 2006/03/31 07:39:24 kris

To verify the effect of this revision, please run RELENG_6_1 with
2006.04.30.03.57.00 and with 2006.04.30.04.00.00.

Regards,
Rong-En Fan

RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Howard Leadmon


   Hello Rong-en,

 Thanks for the info on getting the debugger configured, and on the serial
console.   I will have to try and play with the serial console thing more, I
just tried putting in the flags and the damn thing hung, I had to boot from CD
and take the stuff back out.

 One thing you mention below that concerns me is that you have version 1.90 of
the vfs_lookup.c file.   I just did a less on /usr/src/sys/kern/vfs_lookup.c
and I see the following:

FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp


 I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the current
stuff before rebuilding the kernel just now, and still I see the same thing.
Is something fishy going on here, or did you by chance make a typo??


---
Howard Leadmon - [EMAIL PROTECTED]
http://www.leadmon.net

 

> -Original Message-
> From: Rong-en Fan [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, May 23, 2006 10:19 AM
> To: Howard Leadmon
> Cc: Konstantin Belousov; Kris Kennaway; freebsd-stable@freebsd.org
> Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
> 
> On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote:
> >
> >
> > > > > If there is anything I can provide to help track this down,
> > > > > please let me know. By the way, I tried truss/kdump to see what
> > > > > happens when nfsd eats lots of CPU, but in vain. They do not
> > > > > return anything.
> > > > >
> > > > I tried your recipe on 7-CURRENT with a locally exported fs,
> > > > remounted over nfs. I did not get the behaviour you described.
> > >
> > > As noted in my previous thread, I have another 6.1-RELEASE nfs
> > > server, which does not have this problem.
> > >
> > > > Could you please provide the backtrace for the nfsd that eats
> > > > the CPU (from ddb)? I think it would be helpful to get several
> > > > backtraces (i.e., bt , cont, bt  ...) to see where
> > > > it is running.
> > >
> > > I'm afraid that I cannot do that. Last time I tried breaking into
> > > ddb (on 5.x), it hung my serial console, and the server is miles
> > > away :-( . Perhaps we can ask Howard to do that?
> >
> >  I am more than willing to do that, as this machine runs here with me,
> > so if needed I can easily get on a console or perform a reboot.  Can
> > one of you shed a little light on exactly what I need to do, and how
> > to do it?  I ask as I have never used this ddb stuff, so I haven't a
> > clue how to go about getting the information you're looking for.
> > Guess I have been lucky, and just never had an issue that took things
> > to this level.
> 
> At least you have to add the following to your kernel:
> 
> options KDB
> options DDB
> 
> Recompile it and reboot. You had better set up a serial
> console so you can easily copy things from the ddb output. To do
> that, you have to put "device sio" in your kernel configuration
> and add the following to these files:
> 
> /boot.config
> -Dh
> 
> /boot/loader.conf
> comconsole_speed=115200
> machdep.conspeed=115200
> 
> /etc/ttys
> ttyd0   "/usr/libexec/getty std.115200" cons25  on secure
> 
> On the other machine, /etc/remote:
> com1:dv=/dev/cuad0:br#115200:pa=none:
> 
> Then use "tip com1" to attach to the nfs server. The above
> settings assume your serial console on the nfs server is on COM1
> and on the client side is also COM1. If that's not the case,
> please follow the Handbook on how to set up a serial console other
> than COM1. To break into ddb, either use ctrl+alt+esc or send
> a BREAK (I think ^b will do) via the serial line. After that, you
> should see
> 
> db>
> 
> Then first use "ps" to find out the nfsd pid (better to
> note the pid which eats lots of cpu before entering ddb).
> After that, do what Konstantin suggests. I have never tried
> "cont" in ddb. I guess that will return execution back to the
> kernel and you need to break into ddb again to do another "bt ".
> 
> By the way, could you verify that backing out vfs_lookup.c 
> rev 1.90 helps in your situation? If not, maybe we are seeing 
> different problems, and then I have to figure out how to make 
> my serial console work here.
> 
> Thanks,
> Rong-En Fan
> 


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Rong-en Fan

On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote:

> > > If there are any thing I can provide to help tracking this down.
> > > Please let me know. By the way, I tried with truss/kdump to see what
> > > happens when nfsd eats lot of CPUs, but in vain. They do not return
> > > anything.
> > >
> > I tried your recipe on 7-CURRENT with locally exported fs, remounted
> > over nfs. I did not get the behaviour your described.
>
> As noted in my previous thread, I have another 6.1-RELEASE
> nfs server, which does not have this problem.
>
> > Could you, please, provide the backtrace for the nfsd that eats the
> > CPU (from the ddb). I think it would be helpful to get several
> > backtraces (i.e., bt , cont, bt  ...) to see where
> > it running.
>
> I'm afraid that I can not do that. Last time I tried breaking
> into ddb (on 5.x), it hangs my serial console and the server
> is miles away :-( . Perhaps we can ask Howard to do that?

>  I am more than willing to do that, as this machine runs here with me, so if
> needed I can easily get on a console, or perform a reboot.  Can one of you
> shed a little light on exactly what I need to do, and how to do this?  I ask
> as I have never used this ddb stuff, so not clue one on how to go about
> getting the information your looking to find.  Guess I have been lucky, and
> just never had an issue that took things to this level.

At least you have to add the following to your kernel:

options KDB
options DDB

Recompile it and reboot. You had better set up a serial console
so you can easily copy things from the ddb output. To do that, put
"device sio" in your kernel configuration and add the following to
the files below:

/boot.config
-Dh

/boot/loader.conf
comconsole_speed=115200
machdep.conspeed=115200

/etc/ttys
ttyd0   "/usr/libexec/getty std.115200" cons25  on secure

On the other machine, /etc/remote:
com1:dv=/dev/cuad0:br#115200:pa=none:

Then, use "tip com1" to connect to the NFS server. The above settings
assume the serial console on the NFS server is on COM1 and that the
client side also uses COM1. If that's not the case, please follow the
Handbook on how to set up a serial console on a port other than COM1.
To break into ddb, either use ctrl+alt+esc or send a BREAK (I think ^b
will do) over the serial line. After that, you should see

db>

Then first use "ps" to find the nfsd PID (better yet, note the PID of
the nfsd that is eating lots of CPU before entering ddb). After that, do
what Konstantin suggests. I have never tried "cont" in ddb. I guess it
returns execution to the kernel, and you then need to break into ddb
again to do another "bt ".
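One way to note the busy nfsd's PID before entering ddb is to save the output of ps(1) while a client is running du, then pick out the nfsd with the highest CPU share. The sketch below is only an illustration: it assumes output shaped like `ps -axo pid,%cpu,comm` (a header line, then PID, %CPU and the bare command name), and the function name busiest_nfsd is hypothetical, not something used in this thread.

```python
from typing import Optional


def busiest_nfsd(ps_output: str) -> Optional[int]:
    """Return the PID of the nfsd process using the most CPU, or None.

    Assumes text shaped like `ps -axo pid,%cpu,comm` output: one header
    line followed by rows of PID, %CPU and the command name.
    """
    best_pid: Optional[int] = None
    best_cpu = -1.0
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)  # PID, %CPU, then the command name
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        pid, cpu, command = fields
        if command.strip() != "nfsd":
            continue  # only nfsd processes are of interest
        if float(cpu) > best_cpu:
            best_pid, best_cpu = int(pid), float(cpu)
    return best_pid
```

For example, save `ps -axo pid,%cpu,comm` output to a file on the server while nfsd is spinning, pass its contents to busiest_nfsd(), and keep the resulting PID for the bt step in ddb.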

By the way, could you verify that backing out vfs_lookup.c rev 1.90
helps in your situation? If not, maybe we are seeing different problems,
and then I have to figure out how to make my serial console work
here.

Thanks,
Rong-En Fan

Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Rong-en Fan


On 5/23/06, Konstantin Belousov <[EMAIL PROTECTED]> wrote:

On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:
> On 5/14/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> >On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> >>
> >>Hello All,
> >>
> >>  I have been running FBSD a long while, and actually running since the 5.x
> >> releases on the server I am having troubles with.   I basically have a small
> >> network and just use NIS/NFS to link my various FBSD and Solaris machines
> >> together.
> >>
> >>  This has all been running fine up till a few days ago, when all of a sudden
> >> NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> >> When I had 6.1-RC running all seemed well, then came the announcement for the
> >> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> >> mergemaster to get everything up to the 6.1 stable version.
> >>
> >>  Now after doing this, something is wrong with NFS.   It works, it will return
> >> information and open files, just it's very very slow, and while performing a
> >> request the CPU spike is astounding.  A simple du of my home directory can
> >> take minutes, and machine all but locks up if the request is done over NFS.
> >> Here is top snip:
> >>
> >>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >>   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >>
> >>
> >>  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> >> disk array, and locally it screams, heck NFS used to scream till I updated.  I
> >> am not really sure what info would be useful in debugging, so won't post tons
> >> of misc junk in this eMail, but if anyone has any ideas as to how best to
> >> figure out and resolve this issue it would sure be appreciated...
> >
> >Use tcpdump and related tools to find out what traffic is being sent.
> >
> >Also verify that you did not change your system configuration in any
> >way: there have been no changes to NFS since the release, so it is
> >unclear why an update would cause the problem to suddenly occur.
> >
> >Kris
>
> Hi Kris and Howard,
>
> As I posted few days ago, I have similar problems like Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After binary searching
> the source tree, I found that
>
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
>
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> the same problem occurs.
>
> Let me refresh what problems I'm seeing
>
> 1. a client (no matter Linux 2.6.16 or FreeBSD 6.1) runs du on
>   a nfs directory
> 2. on server-side, nfsd starts to eats lots of CPU
> 3. the du finishes
> 4. on server-side, nfsd still eats lots of CPU, but there is no
>   nfs traffic. Wait for 5 minutes, you can still see that nfsd is
>   "running" and eats lots of CPU.
>
> On FreeBSD 6.1R client, it uses UDP mount and fstab is like
> "rw,-L,nosuid,bg,nodev". On Linux client, it uses UDP mount and
> fstab is like "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
> The server's kernel conf is at
>
> http://www.rafan.org/FreeBSD/nfs/KERNEL
>
> Some related configuration files:
>
> /etc/exports
>  /export/dir1 host1 host2...
>  /export/dir2 host1 host2...
>
> /etc/rc.conf
> nfs_server_enable="YES"
> nfs_server_flags="-u -t -n 16"
> mountd_enable="YES"
> mountd_flags="-r -l -n"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> rpcbind_enable="YES"
>
> /etc/fstab:
> /dev/...  /export/dir1 ufs rw,nosuid,noexec 2 2
> /dev/...  /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
>
> The NFS server is also using amd to mount some backup directories
> from another NFS server. the amd.conf is
>
> [global]
> browsable_dirs = yes
> map_type = file
> mount_type = nfs
> auto_dir = /nfs
> fully_qualified_hosts = no
> log_file = syslog
> nfs_proto = udp
> nfs_allow_insecure_port = no
> nfs_vers = 3
> # plock = yes
> selectors_on_default = yes
> restart_mounts = yes
>
> [/backup]
> map_options = type:=direct
> map_name = /etc/amd.direct
>
> /etc/amd.direct:
> /defaults
> opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
> backup  type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}
>
>
> If there are any thing I can provide to help tracking this down. Please
> let me know. By the way, I tried with truss/kdump to see what happens
> when nfsd eats lot of CPUs, but in vain. They do not return anything.
>
I tried your recipe on 7-CURRENT with a locally exported fs, remounted
over nfs. I did not get the behaviour you described.


As noted in my previous thread, I have another 6.1-RELEASE nfs server,
which does not have this problem.


Could you, please, provide the backtrace for the nfsd that
eats the CPU (from the ddb). I think it would be helpful to get seve

RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Howard Leadmon



> > > If there are any thing I can provide to help tracking this down.
> > > Please let me know. By the way, I tried with truss/kdump to see what
> > > happens when nfsd eats lot of CPUs, but in vain. They do not return
> > > anything.
> > >
> > I tried your recipe on 7-CURRENT with locally exported fs, remounted
> > over nfs. I did not get the behaviour your described.
> 
> As noted in my previous thread, I have another 6.1-RELEASE 
> nfs server, which does not have this problem.
> 
> > Could you, please, provide the backtrace for the nfsd that eats the
> > CPU (from the ddb). I think it would be helpful to get several
> > backtraces (i.e., bt , cont, bt  ...) to see where
> > it running.
> 
> I'm afraid that I can not do that. Last time I tried breaking 
> into ddb (on 5.x), it hangs my serial console and the server 
> is miles away :-( . Perhaps we can ask Howard to do that?

 I am more than willing to do that, as this machine runs here with me, so if
needed I can easily get on a console or perform a reboot.  Can one of you
shed a little light on exactly what I need to do, and how to do it?  I ask
as I have never used this ddb stuff, so I don't have a clue how to go about
getting the information you're looking for.  Guess I have been lucky, and
just never had an issue that took things to this level.

 
> > Also, just in case, does filesystem that is exported and shows
> > problem, have quotas enabled ? One line of your fstab has
> > userquotas, other does not.

As to userquotas, I just tried accessing the NFS mounts here; some are on
filesystems with quotas and some aren't, and both exhibit exactly the same
problem.  So quotas are definitely not the problem, or should I say not the
trigger for the problem.
 

> Regards,
> Rong-En Fan



---
Howard Leadmon
http://www.leadmon.net
 



Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Konstantin Belousov

On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:
> On 5/14/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> >On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> >>
> >>Hello All,
> >>
> >>  I have been running FBSD a long while, and actually running since the 5.x
> >> releases on the server I am having troubles with.   I basically have a small
> >> network and just use NIS/NFS to link my various FBSD and Solaris machines
> >> together.
> >>
> >>  This has all been running fine up till a few days ago, when all of a sudden
> >> NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> >> When I had 6.1-RC running all seemed well, then came the announcement for the
> >> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> >> mergemaster to get everything up to the 6.1 stable version.
> >>
> >>  Now after doing this, something is wrong with NFS.   It works, it will return
> >> information and open files, just it's very very slow, and while performing a
> >> request the CPU spike is astounding.  A simple du of my home directory can
> >> take minutes, and machine all but locks up if the request is done over NFS.
> >> Here is top snip:
> >>
> >>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >>   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >>
> >>
> >>  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> >> disk array, and locally it screams, heck NFS used to scream till I updated.  I
> >> am not really sure what info would be useful in debugging, so won't post tons
> >> of misc junk in this eMail, but if anyone has any ideas as to how best to
> >> figure out and resolve this issue it would sure be appreciated...
> >
> >Use tcpdump and related tools to find out what traffic is being sent.
> >
> >Also verify that you did not change your system configuration in any
> >way: there have been no changes to NFS since the release, so it is
> >unclear why an update would cause the problem to suddenly occur.
> >
> >Kris
> 
> Hi Kris and Howard,
> 
> As I posted few days ago, I have similar problems like Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After binary searching
> the source tree, I found that
> 
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
> 
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> the same problem occurs.
> 
> Let me refresh what problems I'm seeing
> 
> 1. a client (no matter Linux 2.6.16 or FreeBSD 6.1) runs du on
>   a nfs directory
> 2. on server-side, nfsd starts to eats lots of CPU
> 3. the du finishes
> 4. on server-side, nfsd still eats lots of CPU, but there is no
>   nfs traffic. Wait for 5 minutes, you can still see that nfsd is
>   "running" and eats lots of CPU.
> 
> On FreeBSD 6.1R client, it uses UDP mount and fstab is like
> "rw,-L,nosuid,bg,nodev". On Linux client, it uses UDP mount and
> fstab is like "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
> The server's kernel conf is at
> 
> http://www.rafan.org/FreeBSD/nfs/KERNEL
> 
> Some related configuration files:
> 
> /etc/exports
>  /export/dir1 host1 host2...
>  /export/dir2 host1 host2...
> 
> /etc/rc.conf
> nfs_server_enable="YES"
> nfs_server_flags="-u -t -n 16"
> mountd_enable="YES"
> mountd_flags="-r -l -n"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> rpcbind_enable="YES"
> 
> /etc/fstab:
> /dev/...  /export/dir1 ufs rw,nosuid,noexec 2 2
> /dev/...  /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
> 
> The NFS server is also using amd to mount some backup directories
> from another NFS server. the amd.conf is
> 
> [global]
> browsable_dirs = yes
> map_type = file
> mount_type = nfs
> auto_dir = /nfs
> fully_qualified_hosts = no
> log_file = syslog
> nfs_proto = udp
> nfs_allow_insecure_port = no
> nfs_vers = 3
> # plock = yes
> selectors_on_default = yes
> restart_mounts = yes
> 
> [/backup]
> map_options = type:=direct
> map_name = /etc/amd.direct
> 
> /etc/amd.direct:
> /defaults
> opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
> backup  type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}
> 
> 
> If there are any thing I can provide to help tracking this down. Please
> let me know. By the way, I tried with truss/kdump to see what happens
> when nfsd eats lot of CPUs, but in vain. They do not return anything.
> 
I tried your recipe on 7-CURRENT with a locally exported fs, remounted
over NFS. I did not get the behaviour you described.

Could you please provide a backtrace (from ddb) for the nfsd that
eats the CPU? I think it would be helpful to get several
backtraces (i.e., bt , cont, bt  ...) to
see where it is running.

Also, just in case, does filesystem that is exported and shows probl

Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-22 Thread Kris Kennaway

On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:

> As I posted few days ago, I have similar problems like Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After binary searching
> the source tree, I found that
> 
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
> 
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> the same problem occurs.

Thanks for tracking this down, I'll see what Jeff has to say.

Kris



Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-22 Thread Rong-en Fan

On 5/14/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:

On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
>
>Hello All,
>
>  I have been running FBSD a long while, and actually running since the 5.x
> releases on the server I am having troubles with.   I basically have a small
> network and just use NIS/NFS to link my various FBSD and Solaris machines
> together.
>
>  This has all been running fine up till a few days ago, when all of a sudden
> NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> When I had 6.1-RC running all seemed well, then came the announcement for the
> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> mergemaster to get everything up to the 6.1 stable version.
>
>  Now after doing this, something is wrong with NFS.   It works, it will return
> information and open files, just it's very very slow, and while performing a
> request the CPU spike is astounding.  A simple du of my home directory can
> take minutes, and machine all but locks up if the request is done over NFS.
> Here is top snip:
>
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
>
>
>  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> disk array, and locally it screams, heck NFS used to scream till I updated.  I
> am not really sure what info would be useful in debugging, so won't post tons
> of misc junk in this eMail, but if anyone has any ideas as to how best to
> figure out and resolve this issue it would sure be appreciated...

Use tcpdump and related tools to find out what traffic is being sent.

Also verify that you did not change your system configuration in any
way: there have been no changes to NFS since the release, so it is
unclear why an update would cause the problem to suddenly occur.

Kris

Hi Kris and Howard,

As I posted a few days ago, I have problems similar to Howard's
(some details are in the thread "6.1-RELEASE, em0 high interrupt rate
and nfsd eats lots of cpu" on stable@). After binary searching
the source tree, I found that

RELENG_6_1, 2006.04.30.03.57 ok
RELENG_6_1, 2006.04.30.04.00 bad

The only commit in between is to kern/vfs_lookup.c, an MFC of revs 1.90
and 1.91. With the 04.30 03.57 source plus a manually patched-in
vfs_lookup.c rev 1.90, the same problem occurs.
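The date-based binary search described above can be sketched as follows. This is only an illustration under stated assumptions: the snapshot list and the is_bad predicate are hypothetical stand-ins for "check out RELENG_6_1 at this date, rebuild, and rerun the du test", and the search assumes the regression is monotonic (once a snapshot is bad, every later one is bad too).

```python
from typing import Callable, Sequence


def first_bad(snapshots: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest snapshot for which is_bad() is True.

    Assumes snapshots are in chronological order, snapshots[0] is known
    good, snapshots[-1] is known bad, and the regression is monotonic.
    The two endpoints are never re-tested.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least one good and one bad snapshot")
    lo, hi = 0, len(snapshots) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(snapshots[mid]):
            hi = mid  # regression is at or before mid
        else:
            lo = mid  # regression is after mid
    return snapshots[hi]
```

Timestamps written in the zero-padded form used above (e.g. 2006.04.30.03.57) sort chronologically as plain strings. Each is_bad() call is expensive (a rebuild and reboot), which is why halving the interval matters: n snapshots need only about log2(n) builds.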

Let me recap the problem I'm seeing:

1. A client (either Linux 2.6.16 or FreeBSD 6.1) runs du on
  an NFS directory.
2. On the server side, nfsd starts to eat lots of CPU.
3. The du finishes.
4. On the server side, nfsd still eats lots of CPU, but there is no
  NFS traffic. Even after waiting 5 minutes, you can still see that
  nfsd is "running" and eating lots of CPU.

The FreeBSD 6.1-RELEASE client uses a UDP mount with fstab options
"rw,-L,nosuid,bg,nodev". The Linux client also uses a UDP mount, with
fstab options "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
The server's kernel config is at

http://www.rafan.org/FreeBSD/nfs/KERNEL

Some related configuration files:

/etc/exports
 /export/dir1 host1 host2...
 /export/dir2 host1 host2...

/etc/rc.conf
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 16"
mountd_enable="YES"
mountd_flags="-r -l -n"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

/etc/fstab:
/dev/...  /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/...  /export/dir2 ufs rw,nosuid,noexec,userquota 2 2

The NFS server also uses amd to mount some backup directories
from another NFS server. The amd.conf is:

[global]
browsable_dirs = yes
map_type = file
mount_type = nfs
auto_dir = /nfs
fully_qualified_hosts = no
log_file = syslog
nfs_proto = udp
nfs_allow_insecure_port = no
nfs_vers = 3
# plock = yes
selectors_on_default = yes
restart_mounts = yes

[/backup]
map_options = type:=direct
map_name = /etc/amd.direct

/etc/amd.direct:
/defaults
opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
backup  type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}

If there is anything I can provide to help track this down, please
let me know. By the way, I tried truss/kdump to see what nfsd is doing
while it eats lots of CPU, but in vain: they do not return anything.

Regards,
Rong-En Fan

RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-20 Thread Howard Leadmon

 Sorry for the delay, I ended up sick. :(

 You say use tcpdump; is there something specific I should be looking for?
NFS is serving files. Even stranger, if I kill off the nfsd process, it's
zippy fast for a moment, and then the CPU load goes through the roof and it
starts serving files slowly.   So it's actually working, except that it
consumes all available CPU and quickly brings the machine to its knees.
It doesn't matter whether I access it from my Solaris box, my other FBSD boxes,
and so on; it still bogs down terribly, which it never used to.

 If anyone can think of anything config-wise that might cause that, it would
be nice to know.  These are the settings I can think of that affect the NFS
configuration:

#
# NFS 
#
nfs_client_enable="NO"          # This host is an NFS client (or NO).
nfs_access_cache="2"            # Client cache timeout in seconds
nfs_server_enable="YES"         # This host is an NFS server (or NO).
nfs_server_flags="-u -t -n 5"   # Flags to nfsd (if enabled).
mountd_enable="YES"             # Run mountd (or NO).
mountd_flags="-r"               # Flags to mountd (if NFS server enabled).
weak_mountd_authentication="NO" # Allow non-root mount requests to be served.
nfs_reserved_port_only="YES"    # Provide NFS only on secure port (or NO).
nfs_bufpackets=""               # bufspace (in packets) for client
rpc_lockd_enable="YES"          # Run NFS rpc.lockd needed for client/server.
rpc_statd_enable="YES"          # Run NFS rpc.statd needed for client/server.
rpcbind_enable="YES"            # Run the portmapper service (YES/NO).
rpcbind_program="/usr/sbin/rpcbind" # path to rpcbind, if you want a different one.
rpcbind_flags=""                # Flags to rpcbind (if enabled).

 I can't think of anything that should have changed, unless mergemaster,
when it updated the default files, changed something that has an effect.

---
Howard Leadmon
http://www.leadmon.net

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Kris Kennaway
> Sent: Sunday, May 14, 2006 10:50 PM
> To: Howard Leadmon
> Cc: freebsd-stable@freebsd.org
> Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
> 
> On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> > 
> >Hello All,
> > 
> >  I have been running FBSD a long while, and actually running since the 5.x
> > releases on the server I am having troubles with.   I basically have a small
> > network and just use NIS/NFS to link my various FBSD and Solaris machines
> > together.
> > 
> >  This has all been running fine up till a few days ago, when all of a sudden
> > NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > When I had 6.1-RC running all seemed well, then came the announcement for the
> > official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > mergemaster to get everything up to the 6.1 stable version.
> > 
> >  Now after doing this, something is wrong with NFS.   It works, it will return
> > information and open files, just it's very very slow, and while performing a
> > request the CPU spike is astounding.  A simple du of my home directory can
> > take minutes, and machine all but locks up if the request is done over NFS.
> > Here is top snip:
> > 
> >   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> > 
> > 
> >  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > disk array, and locally it screams, heck NFS used to scream till I updated.  I
> > am not really sure what info would be useful in debugging, so won't post tons
> > of misc junk in this eMail, but if anyone has any ideas as to how best to
> > figure out and resolve this issue it would sure be appreciated...
> 
> Use tcpdump and related tools to find out what traffic is being sent.
> 
> Also verify that you did not change your system configuration in any
> way: there have been no changes to NFS since the release, so 
> it is unclear why an update would cause the problem to suddenly occur.
> 
> Kris
> 
> 


Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Stephen Hurd


Howard Leadmon wrote:

> Would this just be lockd, or should I disable both lockd and statd?  I notice
> in the rc.conf it claims they are both supposed to be enabled, so not sure
> what issues I run into if I disable them, if any.
  
No need to disable rpc.statd, though I don't know whether any other programs 
request monitoring.  The issue you'll run into is simply a lack of any 
locks on the mounted drive.  This can easily lead to file corruption if 
multiple programs, or multiple instances of a single program, change the 
same file at the same time.  Many programs will use NFS-safe lockfiles 
if configured to do so, which is often a usable workaround in a 
lockd-free world.  If you're only using the files read-only, or only a 
single process uses files on the mount at a time (on *all* systems), then 
there's no issue at all.
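For the NFS-safe lockfiles mentioned above, one widely used technique (described, for example, in the Linux open(2) manual page's notes on O_EXCL over NFS) is to link() a uniquely named file to the lock name and then check the link count. The following is a minimal sketch of that idea, not the scheme any particular program uses; try_nfs_lock and release_nfs_lock are hypothetical names, and a real tool would also retry with a timeout and deal with stale locks left behind by crashed processes.

```python
import os
import socket


def try_nfs_lock(lock_path: str) -> bool:
    """Make one attempt to take lock_path; return True if we now hold it.

    Hard-link technique: create a uniquely named file next to the lock,
    then link() it to the lock name. Success is judged by the link count
    on our unique file rather than by link()'s result, because over NFS
    a retransmitted request can report failure even though the link was
    actually created.
    """
    # Unique per host and per process, in the same directory as the lock.
    tmp_path = f"{lock_path}.{socket.gethostname()}.{os.getpid()}"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        try:
            os.link(tmp_path, lock_path)
        except OSError:
            pass  # lock may be held elsewhere; the link count decides below
        # Two names (tmp_path and lock_path) means the link really happened.
        return os.stat(tmp_path).st_nlink == 2
    finally:
        os.unlink(tmp_path)  # the lock, if taken, survives under its own name


def release_nfs_lock(lock_path: str) -> None:
    """Drop a lock previously taken with try_nfs_lock()."""
    os.unlink(lock_path)
```

Every process that touches the shared file must go through the same lock name for this to help; it does nothing for programs that rely on fcntl() locks and therefore on rpc.lockd.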


RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Howard Leadmon


Would this just be lockd, or should I disable both lockd and statd?  I notice
that rc.conf claims they are both supposed to be enabled, so I'm not sure
what issues, if any, I will run into if I disable them.

As to my servers, I actually have a bunch of stuff running.  The dual XEON
server, being my fastest machine by far, is my main server.  Off of that I have
a SPARC running Solaris 10, another SPARC running FreeBSD 6.1, a DEC Alpha
running FreeBSD 4.11, another old dual XEON machine running FreeBSD 4.11, and
finally an x86 machine running FreeBSD 7-CURRENT.

This stuff has been doing great; I personally love FBSD, as you can tell from
what is loaded on most of the servers.   It just blows my mind that after years
of running the network like this, something now breaks.  I get no errors in
syslog, and it IS serving requests, as I have some web pages on the old 4.11
machine and they come up, just slowly compared to how they used to.
Oh well, as they say, in the world of computers it's never easy..  LOL



---
Howard Leadmon - [EMAIL PROTECTED]
http://www.leadmon.net

 

> -Original Message-
> From: Stephen Hurd [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, May 14, 2006 5:54 PM
> To: Howard Leadmon
> Cc: freebsd-stable@freebsd.org
> Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
> 
> Howard Leadmon wrote:
> >Hello All,
> >
> >  I have been running FBSD a long while, and actually running since the 5.x
> > releases on the server I am having troubles with.   I basically have a small
> > network and just use NIS/NFS to link my various FBSD and Solaris machines
> > together.
> >
> >  This has all been running fine up till a few days ago, when all of a sudden
> > NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > When I had 6.1-RC running all seemed well, then came the announcement for the
> > official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > mergemaster to get everything up to the 6.1 stable version.
> >
> >  Now after doing this, something is wrong with NFS.   It works, it will return
> > information and open files, just it's very very slow, and while performing a
> > request the CPU spike is astounding.  A simple du of my home directory can
> > take minutes, and machine all but locks up if the request is done over NFS.
> > Here is top snip:
> >
> >   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
> >   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >
> >
> >  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > disk array, and locally it screams, heck NFS used to scream till I updated.  I
> > am not really sure what info would be useful in debugging, so won't post tons
> > of misc junk in this eMail, but if anyone has any ideas as to how best to
> > figure out and resolve this issue it would sure be appreciated...
> >   
> Are you running rpc.lockd?  I've had very bad luck with it 
> since sometime in the 5.x series... especially with it 
> interoperating with Solaris.  I submitted a PR on it, but 
> it's apparently broken in about X ways.  If possible, I would 
> suggest living without rpc.lockd for now (if you're currently 
> living with it that is)
> 
> Other than that issue, NFS itself has been working nicely for me.
> 



Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Kris Kennaway

On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> 
>Hello All, 
> 
>  I have been running FBSD a long while, and actually running since the 5.x
> releases on the server I am having troubles with.   I basically have a small
> network and just use NIS/NFS to link my various FBSD and Solaris machines
> together.
> 
>  This has all been running fine up till a few days ago, when all of a sudden
> NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> When I had 6.1-RC running all seemed well, then came the announcement for the
> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> mergemaster to get everything up to the 6.1 stable version.
> 
>  Now after doing this, something is wrong with NFS.   It works, it will return
> information and open files, just it's very very slow, and while performing a
> request the CPU spike is astounding.  A simple du of my home directory can
> take minutes, and machine all but locks up if the request is done over NFS.
> Here is top snip:
> 
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> 
> 
>  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> disk array, and locally it screams, heck NFS used to scream till I updated.  I
> am not really sure what info would be useful in debugging, so won't post tons
> of misc junk in this eMail, but if anyone has any ideas as to how best to
> figure out and resolve this issue it would sure be appreciated...

Use tcpdump and related tools to find out what traffic is being sent.

Also verify that you did not change your system configuration in any
way: there have been no changes to NFS since the release, so it is
unclear why an update would cause the problem to suddenly occur.

Kris




Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Michel Talon

> Are you running rpc.lockd?  I've had very bad luck with it since 
> sometime in the 5.x series... especially when it interoperates with 
> Solaris.  I submitted a PR on it, but it's apparently broken in about X 
> ways.  If possible, I would suggest living without rpc.lockd for now (if 
> you're currently running it, that is).

On the contrary, my NFS problems interoperating with Linux cleared up
after I upgraded Linux to Fedora Core 5 and FreeBSD to 6.1. In particular,
rpc.lockd works, everything is OK, and performance is fine. I had very bad
problems in the past, when we were running Fedora Core 3.


-- 

Michel TALON

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Stephen Hurd


Howard Leadmon wrote:
> Hello All,
>
>  I have been running FBSD for a long while, and have been running it since the
> 5.x releases on the server I am having trouble with.   I basically have a small
> network and just use NIS/NFS to link my various FBSD and Solaris machines
> together.
>
>  This had all been running fine up until a few days ago, when all of a sudden
> NFS slowed to a crawl and CPU usage got so high that the box almost appears to
> freeze. When I had 6.1-RC running, all seemed well. Then came the announcement
> for the official 6.1 release, so I did the cvs updates, made world and kernel,
> and ran mergemaster to bring everything up to 6.1-STABLE.
>
>  Since doing this, something is wrong with NFS.   It works (it will return
> information and open files), but it's very, very slow, and while a request is
> being performed the CPU spike is astounding.  A simple du of my home directory
> can take minutes, and the machine all but locks up if the request is done over
> NFS. Here is a top snippet:
>
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
>
>  This is a nice IBM eServer with dual P4 Xeons and a couple GB of RAM on a
> disk array, and locally it screams; heck, NFS used to scream until I updated.
> I am not really sure what info would be useful in debugging, so I won't post
> tons of misc junk in this email, but if anyone has any ideas as to how best to
> figure out and resolve this issue, it would sure be appreciated...
  
Are you running rpc.lockd?  I've had very bad luck with it since 
sometime in the 5.x series... especially when it interoperates with 
Solaris.  I submitted a PR on it, but it's apparently broken in about X 
ways.  If possible, I would suggest living without rpc.lockd for now (if 
you're currently running it, that is).


Other than that issue, NFS itself has been working nicely for me.
