Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-31 Thread Dmitry Pryanishnikov


Hello!

On Thu, 25 May 2006, Konstantin Belousov wrote:

> KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
>     ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
>
> from nfsserver/nfs_syscalls.c, line 570.
>
> As I understand the problem, kern/vfs_lookup.c:lookup() can
> acquire additional locks on Giant, indicating this with the GIANTHELD
> flag in nd. All processing in nfsserver already runs with Giant held,
> so I just drop those extra locks after lookup() returns.
> A system with the patch applied survived a smoke test (a client ran
> du on the mounted dir, the patch was generated from the exported fs, etc.).
> nfsd eats no more than 25% of CPU (with INVARIANTS).
>
> Would users who reported the problem and are willing to help please
> try the patch (generated against STABLE) and give feedback?


Thank you very much. Your patch actually fixes the
"nfssvc_nfsd(): debug.mpsafenet=1 && Giant" panic during an NFS mount of
the server's /usr.

Oddly enough, an NFS mount of the server's / doesn't panic the server.
My kernel config contains "options QUOTA"; however, quotas are not enabled.
Please commit the fix. IMHO, long-term breakage of such basic functionality
(NFS server + quotas) in the -STABLE branch isn't a Good Thing (TM).

Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail:  [EMAIL PROTECTED]
nic-hdl: LYNX-RIPE
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-31 Thread Kris Kennaway
On Thu, Jun 01, 2006 at 01:06:44AM +0300, Dmitry Pryanishnikov wrote:
 
> Hello!
>
> On Thu, 25 May 2006, Konstantin Belousov wrote:
> > KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
> >     ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
> >
> > from nfsserver/nfs_syscalls.c, line 570.
> >
> > As I understand the problem, kern/vfs_lookup.c:lookup() can
> > acquire additional locks on Giant, indicating this with the GIANTHELD
> > flag in nd. All processing in nfsserver already runs with Giant held,
> > so I just drop those extra locks after lookup() returns.
> > A system with the patch applied survived a smoke test (a client ran
> > du on the mounted dir, the patch was generated from the exported fs, etc.).
> > nfsd eats no more than 25% of CPU (with INVARIANTS).
> >
> > Would users who reported the problem and are willing to help please
> > try the patch (generated against STABLE) and give feedback?
>
> Thank you very much. Your patch actually fixes the
> "nfssvc_nfsd(): debug.mpsafenet=1 && Giant" panic during an NFS mount of
> the server's /usr.
> Oddly enough, an NFS mount of the server's / doesn't panic the server.
> My kernel config contains "options QUOTA"; however, quotas are not enabled.
> Please commit the fix. IMHO, long-term breakage of such basic functionality
> (NFS server + quotas) in the -STABLE branch isn't a Good Thing (TM).

FYI, if you're not using quotas then you should remove the option from
your kernel config to avoid trashing your performance.

Kris




Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-26 Thread Dmitriy Kirhlarov
Hi!

On Thu, May 25, 2006 at 05:58:09PM +0300, Konstantin Belousov wrote:

> Would users who reported the problem and are willing to help please
> try the patch (generated against STABLE) and give feedback?

I tested it with RELENG_6 as of 25 May 2006. It works fine. Thank you.

WBR
-- 
Dmitriy Kirhlarov
OILspace, 26 Leninskaya sloboda, bld. 2, 2nd floor, 115280 Moscow, Russia
P:+7 495 105 7247 ext.203 F:+7 495 105 7246 E:[EMAIL PROTECTED]
OILspace - The resource enriched - www.oilspace.com


[patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-25 Thread Konstantin Belousov
On Thu, May 25, 2006 at 01:19:26AM -0400, Kris Kennaway wrote:
> On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:
>
> > So what's changed at that delta, under the one that works vfs_lookup.c is:
> >
> >  Edit src/sys/kern/vfs_lookup.c
> >   Add delta 1.80.2.6 2006.03.31.07.39.24 kris
> >
> >
> > Under the one that fails the vfs_lookup.c is:
> >
> >  Edit src/sys/kern/vfs_lookup.c
> >   Add delta 1.80.2.7 2006.04.30.03.57.46 kris
> >
> >
> >  So I stand corrected on my last post, the issue is in fact in this module, as
> > just taking that module back to 1.80.2.6 fixes the problem with my server.   I
> > even took multiple NFS clients and gave them a heavy workload, and CPU still
> > remained reasonable, and very responsive.  As soon as I rev to the new
> > version, NFS breaks badly and even a single client doing something like a du
> > of a directory structure results in sluggishness and extreme CPU usage.
>
> Yep, unfortunately this commit was necessary to fix other bugs.  Jeff
> said he should have time to look at it next week.
>
> Kris

I tried to debug the problem. First, I have to admit that I cannot
reproduce the problem on a GENERIC kernel. Only after QUOTA was added,
and UFS correspondingly started to require Giant,
did I get the described behaviour. Below are the changes to the GENERIC
config file that I made to reproduce the problem.

Index: amd64/conf/GENERIC
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/amd64/conf/GENERIC,v
retrieving revision 1.439.2.11
diff -u -r1.439.2.11 GENERIC
--- amd64/conf/GENERIC	30 Apr 2006 17:39:43 -0000	1.439.2.11
+++ amd64/conf/GENERIC	25 May 2006 14:44:14 -0000
@@ -26,6 +26,19 @@
 #hints		"GENERIC.hints"		# Default places to look for devices.
 
 makeoptions	DEBUG=-g		# Build kernel with gdb(1) debug symbols
+options 	KDB
+options 	KDB_TRACE
+#options 	KDB_UNATTENDED
+options 	DDB
+options 	DDB_NUMSYM
+options 	BREAK_TO_DEBUGGER
+options 	INVARIANTS
+options 	INVARIANT_SUPPORT
+options 	WITNESS
+options 	DEBUG_LOCKS
+options 	DEBUG_VFS_LOCKS
+options 	DIAGNOSTIC
+options 	MUTEX_PROFILING
 
 #options 	SCHED_ULE		# ULE scheduler
 options 	SCHED_4BSD		# 4BSD scheduler
@@ -34,6 +47,7 @@
 options 	INET6			# IPv6 communications protocols
 options 	FFS			# Berkeley Fast Filesystem
 options 	SOFTUPDATES		# Enable FFS soft updates support
+options 	QUOTA
 options 	UFS_ACL			# Support for access control lists
 options 	UFS_DIRHASH		# Improve performance on big directories
 options 	MD_ROOT			# MD is a potential root device

After that, server machine easily panics on 

	KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
	    ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));

from nfsserver/nfs_syscalls.c, line 570.

As I understand the problem, kern/vfs_lookup.c:lookup() can
acquire additional locks on Giant, indicating this with the GIANTHELD
flag in nd. All processing in nfsserver already runs with Giant held,
so I just drop those extra locks after lookup() returns.
A system with the patch applied survived a smoke test (a client ran
du on the mounted dir, the patch was generated from the exported fs, etc.).
nfsd eats no more than 25% of CPU (with INVARIANTS).

Would users who reported the problem and are willing to help please
try the patch (generated against STABLE) and give feedback?
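
For readers following the locking argument, here is a toy userland model
of it. It is only a sketch and not kernel code: the recursive mutex is
reduced to a counter, and the names model_mtx, model_cn, MODEL_GIANTHELD,
model_lookup() and model_request() are invented for illustration. It
assumes, as described above, that the server already holds Giant once and
that lookup() may take it a second time and record that in a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GIANTHELD bit in cn_flags. */
#define MODEL_GIANTHELD 0x1u

/* Toy recursive mutex: only counts how often one thread has locked it. */
struct model_mtx {
	int depth;
};

/* Toy componentname: just the flags word. */
struct model_cn {
	unsigned int flags;
};

static void
model_lock(struct model_mtx *m)
{
	m->depth++;
}

static void
model_unlock(struct model_mtx *m)
{
	assert(m->depth > 0);	/* unlocking an unowned mutex is a bug */
	m->depth--;
}

/*
 * Models lookup() reaching a filesystem that needs Giant: it takes
 * Giant once more and records that in the flags.
 */
static void
model_lookup(struct model_mtx *giant, struct model_cn *cn, bool fs_needs_giant)
{
	if (fs_needs_giant && (cn->flags & MODEL_GIANTHELD) == 0) {
		model_lock(giant);
		cn->flags |= MODEL_GIANTHELD;
	}
}

/* Models the step the patch adds after each lookup() call. */
static void
model_drop_extra_giant(struct model_mtx *giant, struct model_cn *cn)
{
	if (cn->flags & MODEL_GIANTHELD) {
		model_unlock(giant);
		cn->flags &= ~MODEL_GIANTHELD;
	}
}

/*
 * Runs one modelled request and returns how many Giant acquisitions
 * are left over afterwards; anything above 0 means Giant leaked.
 */
static int
model_request(bool fs_needs_giant, bool patched)
{
	struct model_mtx giant = { 0 };
	struct model_cn cn = { 0 };

	model_lock(&giant);		/* the server already runs under Giant */
	model_lookup(&giant, &cn, fs_needs_giant);
	if (patched)
		model_drop_extra_giant(&giant, &cn);
	model_unlock(&giant);		/* the server's own matching unlock */
	return (giant.depth);
}
```

In this model, model_request(true, false) leaves one acquisition behind,
which is the situation the KASSERT above complains about, while
model_request(true, true) ends with Giant fully released.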

Index: nfsserver/nfs_serv.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/nfsserver/nfs_serv.c,v
retrieving revision 1.156.2.2
diff -u -r1.156.2.2 nfs_serv.c
--- nfsserver/nfs_serv.c	13 Mar 2006 03:06:49 -0000	1.156.2.2
+++ nfsserver/nfs_serv.c	25 May 2006 14:44:25 -0000
@@ -569,6 +569,10 @@
 
 	error = lookup(&ind);
 	ind.ni_dvp = NULL;
+	if (ind.ni_cnd.cn_flags & GIANTHELD) {
+		mtx_unlock(&Giant);
+		ind.ni_cnd.cn_flags &= ~GIANTHELD;
+	}
 
 	if (error == 0) {
 		/*
@@ -1915,6 +1919,10 @@
 
 	error = lookup(&nd);
 	nd.ni_dvp = NULL;
+	if (nd.ni_cnd.cn_flags & GIANTHELD) {
+		mtx_unlock(&Giant);
+		nd.ni_cnd.cn_flags &= ~GIANTHELD;
+	}
 	if (error)
 		goto ereply;
 
@@ -2141,6 +2149,10 @@
 
 	error = lookup(&nd);
 	nd.ni_dvp = NULL;
+	if (nd.ni_cnd.cn_flags & GIANTHELD) {
+		mtx_unlock(&Giant);
+		nd.ni_cnd.cn_flags &= ~GIANTHELD;
+   

Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-25 Thread Rong-en Fan

On 5/25/06, Konstantin Belousov [EMAIL PROTECTED] wrote:

> On Thu, May 25, 2006 at 01:19:26AM -0400, Kris Kennaway wrote:
> > On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:
> >
> > > So what's changed at that delta, under the one that works vfs_lookup.c is:
> > >
> > >  Edit src/sys/kern/vfs_lookup.c
> > >   Add delta 1.80.2.6 2006.03.31.07.39.24 kris
> > >
> > >
> > > Under the one that fails the vfs_lookup.c is:
> > >
> > >  Edit src/sys/kern/vfs_lookup.c
> > >   Add delta 1.80.2.7 2006.04.30.03.57.46 kris
> > >
> > >
> > >  So I stand corrected on my last post, the issue is in fact in this module, as
> > > just taking that module back to 1.80.2.6 fixes the problem with my server.   I
> > > even took multiple NFS clients and gave them a heavy workload, and CPU still
> > > remained reasonable, and very responsive.  As soon as I rev to the new
> > > version, NFS breaks badly and even a single client doing something like a du
> > > of a directory structure results in sluggishness and extreme CPU usage.
> >
> > Yep, unfortunately this commit was necessary to fix other bugs.  Jeff
> > said he should have time to look at it next week.
> >
> > Kris
>
> I tried to debug the problem. First, I have to admit that I cannot
> reproduce the problem on a GENERIC kernel. Only after QUOTA was added,
> and UFS correspondingly started to require Giant,
> did I get the described behaviour. Below are the changes to the GENERIC
> config file that I made to reproduce the problem.

[...]

> After that, server machine easily panics on
>
> 	KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
> 	    ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
>
> from nfsserver/nfs_syscalls.c, line 570.
>
> As I understand the problem, kern/vfs_lookup.c:lookup() can
> acquire additional locks on Giant, indicating this with the GIANTHELD
> flag in nd. All processing in nfsserver already runs with Giant held,
> so I just drop those extra locks after lookup() returns.
> A system with the patch applied survived a smoke test (a client ran
> du on the mounted dir, the patch was generated from the exported fs, etc.).
> nfsd eats no more than 25% of CPU (with INVARIANTS).
>
> Would users who reported the problem and are willing to help please
> try the patch (generated against STABLE) and give feedback?


[...]

Hi Konstantin and others,

I'm now running RELENG_6_1 as of Apr 30 04:00 UTC source + your
patch. The nfsd is quite happy! After client's du finishes, it
stays idle as expected (eats 0.00% CPU).

Thank you very much.

Regards,
Rong-En Fan


Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-25 Thread Kris Kennaway
On Thu, May 25, 2006 at 05:58:09PM +0300, Konstantin Belousov wrote:

> +options 	QUOTA
>  options 	UFS_ACL			# Support for access control lists
>  options 	UFS_DIRHASH		# Improve performance on big directories
>  options 	MD_ROOT			# MD is a potential root device
>
> After that, server machine easily panics on
>
> 	KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
> 	    ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
>
> from nfsserver/nfs_syscalls.c, line 570.

OK, I am also seeing this panic when I try and export a non-mpsafe
filesystem (e.g. cd9660).  I can't test the patch because my NFS
server subsequently blew up :-(

Kris




Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Joerg Lehners


Rong-en Fan [EMAIL PROTECTED] wrote:

> On 5/14/06, Kris Kennaway [EMAIL PROTECTED] wrote:
>
> > On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:

[...]

> > Use tcpdump and related tools to find out what traffic is being sent.
> >
> > Also verify that you did not change your system configuration in any
> > way: there have been no changes to NFS since the release, so it is
> > unclear why an update would cause the problem to suddenly occur.
> >
> > Kris
>
> Hi Kris and Howard,
>
> As I posted a few days ago, I have problems similar to Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After binary searching
> the source tree, I found that
>
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
>
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> the same problem occurs.

[...]

Confirmed!

I can create the problem here at will.

Setup 1: NFS server 'testido' FreeBSD 6.1-STABLE as of 15. May 2006
with sys/kern/vfs_lookup.c 1.80.2.7, NFS client 'schurks' FreeBSD 6.1-STABLE
as of 15. May 2006.

/usr/src from testido mounted on /mnt on schurks.
Running 'cd /mnt ; du >/dev/null' two times (first after a fresh boot of
testido, second when all served data is in the memory of testido):

joerg @ schurks cd /mnt
joerg @ schurks time du >/dev/null
   86.09s real 0.14s user 1.91s system
joerg @ schurks time du >/dev/null
  205.10s real 0.20s user 1.92s system
joerg @ schurks

Screenful of top output on testido AFTER both tests (testido sometimes
stopped responding to screen output, especially during the
second test):

last pid:   329;  load averages:  4.14,  2.77,  1.25    up 0+00:07:30  18:44:47
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8420K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
  201 root        1   4    0  1232K   792K -        4:42 116.31% nfsd
  329 joerg       1  96    0  2404K  1676K RUN      0:00   0.00% top
  168 root        1 115    0  2456K  1760K select   0:00   0.00% sshd
  313 root        1  96    0  1428K  1168K select   0:00   0.00% rlogind
  194 root        1 115    0  1556K  1256K select   0:00   0.00% mountd
  299 root        1   8    0  1720K  1436K wait     0:00   0.00% login
  314 root        1   8    0  1748K  1460K wait     0:00   0.00% login
  298 root        1  96    0  1304K  1048K select   0:00   0.00% rlogind
  199 root        1   4    0  1356K  1040K accept   0:00   0.00% nfsd
  256 root        1  96    0  2892K  1760K select   0:00   0.00% ntpd
  315 joerg       1  20    0  1448K  1020K pause    0:00   0.00% ksh
  300 root        1   5    0  1448K   996K ttyin    0:00   0.00% ksh
  158 root        1  96    0  1332K   940K select   0:00   0.00% syslogd
  163 root        1  96    0  1448K  1128K select   0:00   0.00% inetd
  176 root        1  96    0  1408K  1044K select   0:00   0.00% rpcbind
  185 root        1  96    0  1476K  1148K select   0:00   0.00% ypbind
  261 root        1 115    0  1304K   952K select   0:00   0.00% lpd

Setup 2: NFS server 'testido' FreeBSD 6.1-STABLE as of 15. May 2006
with sys/kern/vfs_lookup.c 1.80.2.6, NFS client 'schurks' FreeBSD 6.1-STABLE
as of 15. May 2006.

Same tests as before:

joerg @ schurks time du >/dev/null
   22.63s real 0.15s user 1.82s system
joerg @ schurks time du >/dev/null
   16.52s real 0.17s user 1.68s system
joerg @ schurks

Screenful of top output on testido AFTER both tests (testido responded
fine during both tests):

last pid:   329;  load averages:  0.49,  0.26,  0.10    up 0+00:01:50  18:35:30
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8424K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
  201 root        1   4    0  1232K   792K -        0:03   3.76% nfsd
  168 root        1 115    0  2456K  1760K select   0:00   0.00% sshd
  329 joerg       1  96    0  2404K  1676K RUN      0:00   0.00% top
  313 root        1  96    0  1428K  1168K select   0:00   0.00% rlogind
  194 root        1 115    0  1556K  1256K select   0:00   0.00% mountd
  299 root        1   8    0  1720K  1440K wait     0:00   0.00% login
  314 root        1   8    0  1748K  1464K wait     0:00   0.00% login
  298 root        1  96    0  1304K  1048K select   0:00   0.00% rlogind
  199 root        1   4    0  1356K  1040K accept   0:00   0.00% nfsd
  315 joerg       1  20    0  1448K  1020K pause    0:00   0.00% ksh
  256 root        1  96    0  2892K  1760K select   0:00   0.00% ntpd
  300 root        1   5    0  1448K   996K ttyin    0:00   0.00% ksh
  158 root        1  96    0  1332K   940K select   0:00   0.00% syslogd
  163 root1  960  1448K  1128K select   0:00  

RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Howard Leadmon

   Hello Rong-en,


As an update, I did the below, and I still had the issue with either version
of vfs_lookup.c compiled in and running.

On the bright side, I didn't realize you could step through the cvs by date,
guess I just never paid attention.  So I just stepped back to 'tag=RELENG_6
date=2006.04.20.00.00.00' on my server, rebuilt, and voilà, NFS is now running
perfectly.

 So backing out something has fixed my problem, now to figure out just what it
was.  As I don't know what has caused this, I have done complete buildworlds
to make sure everything updates, which takes a few hours.  I am going to
start moving the cvs date forward till I get the problem back; once I nail
this down a bit more, I'll let you know what I come up with.



---
Howard Leadmon 
http://www.leadmon.net

 

> -----Original Message-----
> From: Rong-en Fan [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 23, 2006 3:09 PM
> To: Howard Leadmon
> Cc: freebsd-stable@freebsd.org
> Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
>
> On 5/23/06, Howard Leadmon [EMAIL PROTECTED] wrote:
> >
> > Hello Rong-en,
> >
> >  Thanks for the info on getting the debugger configured, and on the serial
> > console.   I will have to try and play with the serial console thing more, I
> > just tried putting in the flags and the damn thing hung, I had to boot
> > from CD and take the stuff back out.
> >
> >  One thing you mention below that concerns me is that you have version 1.90 of
> > the vfs_lookup.c file.   I just did a less on /usr/src/sys/kern/vfs_lookup.c
> > and I see the following:
> >
> > FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp
> >
> >
> >  I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the
> > current stuff before rebuilding the kernel just now, and still I see the
> > same thing.
> > Is something fishy going on here, or did you by chance make a typo??
>
> Sorry for the confusion. rev 1.90 is the number for -HEAD. To
> back out this MFC'ed change for RELENG_6_1, please cvsup to
> RELENG_6_1 date=2006.04.30.03.57.00. Then you should see it is
>
> 1.80.2.6 2006/03/31 07:39:24 kris
>
> To verify the effect of this revision, please run RELENG_6_1
> with 2006.04.30.03.57.00 and 2006.04.30.04.00.00.
>
> Regards,
> Rong-En Fan
 




RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Mark Morley
Another data point:

One of our NFS servers is an amd64 based system serving a cluster of web and
email servers.  Under 6.1-RCx it gave us the same (or better) performance than
the server it replaced (which was 4.11).  The server load hovered between 0.x
and 1.x

But after upping it to 6.1-STABLE the load now hovers between 5.x and 6.x with
spikes as high as 8.x, and there has been no change at all in the NFS client
traffic or other loading factors that we can tell.  This in turn makes for
slower NFS client accesses.

I am going to try reverting to an earlier src tree and see if that helps.

Mark

--
Mark Morley
Owner / Administrator
Islandnet.com




RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Howard Leadmon

 I need to follow up on the below. I am not sure why the test with
vfs_lookup.c below didn't pan out the first time, but with my newfound
knowledge of cvs I was determined to regress the system till I found the
smoking gun, so to speak, which I have done.

 First let me say that instead of running RELENG_6_1 like Rong-en is, I am
running the RELENG_6 tree, which I know updates more often but seems to work
well for me.

 OK, so as I said above, I started to regress the system a couple of days at a
time, till suddenly NFS started working again, so I knew at that point that a
change had caused it.  Then I started to narrow the time range, till I got
to the point where it broke.  Sure enough, under the RELENG_6 branch, this time
was as follows:

*default tag=RELENG_6 date=2006.04.30.03.57.00   (Works OK)

*default tag=RELENG_6 date=2006.04.30.03.58.00   (Broken)


So what's changed at that delta, under the one that works vfs_lookup.c is:

 Edit src/sys/kern/vfs_lookup.c
  Add delta 1.80.2.6 2006.03.31.07.39.24 kris


Under the one that fails the vfs_lookup.c is:

 Edit src/sys/kern/vfs_lookup.c
  Add delta 1.80.2.7 2006.04.30.03.57.46 kris
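
The date-stepping search described above can also be run as a binary
search, which needs fewer rebuilds than moving a few days at a time. The
following is only a sketch under stated assumptions: first_bad_date() and
demo_is_bad() are invented names, the list of dates is assumed to be
sorted oldest first, and every date after the first bad one is assumed to
be bad too. In real use the is_bad callback would stand for "cvsup to
that date, rebuild, and run du over NFS"; here it simply encodes the
result reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Caller-supplied test: non-zero if a tree from this date misbehaves. */
typedef int (*is_bad_fn)(const char *cvs_date);

/*
 * Returns the index of the first "bad" date, or n if none is bad.
 * Assumes dates[] is sorted oldest first and that every date after
 * the first bad one is bad as well.
 */
static size_t
first_bad_date(const char *const dates[], size_t n, is_bad_fn is_bad)
{
	size_t lo = 0, hi = n;	/* all before lo are good, all from hi on are bad */

	assert(dates != NULL && is_bad != NULL);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (is_bad(dates[mid]))
			hi = mid;
		else
			lo = mid + 1;
	}
	return (lo);
}

/*
 * Stand-in for the manual rebuild-and-test step. It only encodes the
 * result reported above: good at 03.57.00, broken at 03.58.00. Dates in
 * this fixed-width format compare correctly as plain strings.
 */
static int
demo_is_bad(const char *cvs_date)
{
	return (strcmp(cvs_date, "2006.04.30.03.58.00") >= 0);
}
```

With n candidate dates this needs about log2(n) rebuilds, which matters
when each buildworld takes a few hours.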



 So I stand corrected on my last post: the issue is in fact in this module, as
just taking that module back to 1.80.2.6 fixes the problem with my server.  I
even took multiple NFS clients and gave them a heavy workload, and CPU usage
still remained reasonable and the server very responsive.  As soon as I rev to
the new version, NFS breaks badly, and even a single client doing something like
a du of a directory structure results in sluggishness and extreme CPU usage.

 I am not a coder, so I am not sure why this module was changed, but unless
there is some good reason why the changes were needed, I would suspect it needs
to be rolled back, or something fixed.  So Rong-en Fan, I think you were dead on
with your analysis that the issue is in fact inside the vfs_lookup.c module.


I hope this helps...



---
Howard Leadmon - [EMAIL PROTECTED]
http://www.leadmon.net

 



Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-24 Thread Kris Kennaway
On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:

> So what's changed at that delta, under the one that works vfs_lookup.c is:
>
>  Edit src/sys/kern/vfs_lookup.c
>   Add delta 1.80.2.6 2006.03.31.07.39.24 kris
>
>
> Under the one that fails the vfs_lookup.c is:
>
>  Edit src/sys/kern/vfs_lookup.c
>   Add delta 1.80.2.7 2006.04.30.03.57.46 kris
>
>
>  So I stand corrected on my last post, the issue is in fact in this module, as
> just taking that module back to 1.80.2.6 fixes the problem with my server.   I
> even took multiple NFS clients and gave them a heavy workload, and CPU still
> remained reasonable, and very responsive.  As soon as I rev to the new
> version, NFS breaks badly and even a single client doing something like a du
> of a directory structure results in sluggishness and extreme CPU usage.

Yep, unfortunately this commit was necessary to fix other bugs.  Jeff
said he should have time to look at it next week.

Kris




Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Konstantin Belousov
On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:
> On 5/14/06, Kris Kennaway [EMAIL PROTECTED] wrote:
> > On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> > >
> > > Hello All,
> > >
> > >  I have been running FBSD a long while, and actually running since the 5.x
> > > releases on the server I am having troubles with.   I basically have a small
> > > network and just use NIS/NFS to link my various FBSD and Solaris machines
> > > together.
> > >
> > >  This has all been running fine up till a few days ago, when all of a sudden
> > > NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > > When I had 6.1-RC running all seemed well, then came the announcement for the
> > > official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > > mergemaster to get everything up to the 6.1 stable version.
> > >
> > >  Now after doing this, something is wrong with NFS.   It works, it will return
> > > information and open files, just it's very very slow, and while performing a
> > > request the CPU spike is astounding.  A simple du of my home directory can
> > > take minutes, and machine all but locks up if the request is done over NFS.
> > > Here is top snip:
> > >
> > >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
> > >   497 root        1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> > >
> > >
> > >  This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > > disk array, and locally it screams, heck NFS used to scream till I updated.  I
> > > am not really sure what info would be useful in debugging, so won't post tons
> > > of misc junk in this eMail, but if anyone has any ideas as to how best to
> > > figure out and resolve this issue it would sure be appreciated...
> >
> > Use tcpdump and related tools to find out what traffic is being sent.
> >
> > Also verify that you did not change your system configuration in any
> > way: there have been no changes to NFS since the release, so it is
> > unclear why an update would cause the problem to suddenly occur.
> >
> > Kris
>
> Hi Kris and Howard,
>
> As I posted a few days ago, I have problems similar to Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After binary searching
> the source tree, I found that
>
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
>
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> the same problem occurs.
>
> Let me refresh what problems I'm seeing:
>
> 1. a client (no matter Linux 2.6.16 or FreeBSD 6.1) runs du on
>    an nfs directory
> 2. on the server side, nfsd starts to eat lots of CPU
> 3. the du finishes
> 4. on the server side, nfsd still eats lots of CPU, but there is no
>    nfs traffic. Wait for 5 minutes, and you can still see that nfsd is
>    running and eating lots of CPU.
>
> On the FreeBSD 6.1R client, it uses a UDP mount and fstab is like
> rw,-L,nosuid,bg,nodev. On the Linux client, it uses a UDP mount and
> fstab is like defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192.
> The server's kernel conf is at
>
> http://www.rafan.org/FreeBSD/nfs/KERNEL
>
> Some related configuration files:
>
> /etc/exports
>  /export/dir1 host1 host2...
>  /export/dir2 host1 host2...
>
> /etc/rc.conf
> nfs_server_enable="YES"
> nfs_server_flags="-u -t -n 16"
> mountd_enable="YES"
> mountd_flags="-r -l -n"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> rpcbind_enable="YES"
>
> /etc/fstab:
> /dev/...  /export/dir1 ufs rw,nosuid,noexec 2 2
> /dev/...  /export/dir2 ufs rw,nosuid,noexec,userquota 2 2
>
> The NFS server is also using amd to mount some backup directories
> from another NFS server. The amd.conf is
>
> [global]
> browsable_dirs = yes
> map_type = file
> mount_type = nfs
> auto_dir = /nfs
> fully_qualified_hosts = no
> log_file = syslog
> nfs_proto = udp
> nfs_allow_insecure_port = no
> nfs_vers = 3
> # plock = yes
> selectors_on_default = yes
> restart_mounts = yes
>
> [/backup]
> map_options = type:=direct
> map_name = /etc/amd.direct
>
> /etc/amd.direct:
> /defaults
> opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
> backup  type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}
>
>
> If there is anything I can provide to help track this down, please
> let me know. By the way, I tried truss/kdump to see what happens
> when nfsd eats lots of CPU, but in vain. They do not return anything.
 
I tried your recipe on 7-CURRENT with a locally exported fs, remounted
over nfs. I did not get the behaviour you described.

Could you please provide the backtrace for the nfsd that
eats the CPU (from ddb)? I think it would be helpful to get several
backtraces (i.e., bt <nfsd pid>, cont, bt <nfsd pid> ...) to
see where it is running.

Also, just in case, does the filesystem that is exported and shows the problem
have quotas enabled? One line of your fstab has userquota, the other does not.




RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Howard Leadmon


> > > If there is anything I can provide to help track this down, please
> > > let me know. By the way, I tried truss/kdump to see what happens
> > > when nfsd eats lots of CPU, but in vain. They do not return anything.
> >
> > I tried your recipe on 7-CURRENT with a locally exported fs, remounted
> > over nfs. I did not get the behaviour you described.
>
> As noted in my previous thread, I have another 6.1-RELEASE
> nfs server, which does not have this problem.
>
> > Could you please provide the backtrace for the nfsd that eats the
> > CPU (from ddb)? I think it would be helpful to get several
> > backtraces (i.e., bt <nfsd pid>, cont, bt <nfsd pid> ...) to see where
> > it is running.
>
> I'm afraid that I cannot do that. Last time I tried breaking
> into ddb (on 5.x), it hung my serial console and the server
> is miles away :-( . Perhaps we can ask Howard to do that?

 I am more than willing to do that, as this machine runs here with me, so if
needed I can easily get on a console, or perform a reboot.  Can one of you
shed a little light on exactly what I need to do, and how to do this?  I ask
as I have never used this ddb stuff, so I have no clue how to go about
getting the information you're looking to find.  Guess I have been lucky, and
just never had an issue that took things to this level.


> > Also, just in case, does the filesystem that is exported and shows the
> > problem have quotas enabled? One line of your fstab has userquota, the
> > other does not.

As to userquotas, I just tried accessing the NFS mounts here, as some have
filesystems with quotas, and some don't, and both are exhibiting the exact same
problem.  So using quotas is for sure not the problem, or should I say not the
trigger to the problem.


> Regards,
> Rong-En Fan



---
Howard Leadmon
http://www.leadmon.net
 




Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Rong-en Fan

On 5/23/06, Konstantin Belousov [EMAIL PROTECTED] wrote:

On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:
 On 5/14/06, Kris Kennaway [EMAIL PROTECTED] wrote:
 On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
 
 Hello All,
 
   I have been running FBSD a long while, and actually running since the
 5.x
  releases on the server I am having troubles with.   I basically have a
 small
  network and just use NIS/NFS to link my various FBSD and Solaris machines
  together.
 
   This has all been running fine up till a few days ago, when all of a
 sudden
  NFS came to a crawl, and CPU usage so high the box appears to freeze
 almost.
  When I had 6.1-RC running all seemed well, then came the announcement
 for the
  official 6.1 release, so I did the cvs updates, made world, kernel, and
 ran
  mergemaster to get everything up to the 6.1 stable version.
 
   Now after doing this, something is wrong with NFS.   It works, it will
 return
  information and open files, just it's very very slow, and while
 performing a
  request the CPU spike is astounding.  A simple du of my home directory
 can
  take minutes, and machine all but locks up if the request is done over
 NFS.
  Here is top snip:
 
PID USERNAME   THR PRI NICE   SIZERES STATE  C   TIME   WCPU
 COMMAND
497 root 1   40  1252K   780K -  2  50:42 188.48% nfsd
 
 
   This is a nice IBM eServer with dual P4-XEON's and a couple GB or RAM
 on a
  disk array, and locally is screams, heck NFS used to scream till I
 updated.  I
  am not really sure what info would be useful in debugging, so won't post
 tons
  of misc junk in this eMail, but if anyone has any ideas as to how best to
  figure out and resolve this issue it would sure be appreicated...
 
 Use tcpdump and related tools to find out what traffic is being sent.
 
 Also verify that you did not change your system configuration in any
 way: there have been no changes to NFS since the release, so it is
 unclear why an update would cause the problem to suddenly occur.
 
 Kris

 Hi Kris and Howard,

 As I posted few days ago, I have similar problems like Howard's
 (some details in the thread 6.1-RELEASE, em0 high interrupt rate
 and nfsd eats lots of cpu on stable@). After binary searching
 the source tree, I found that

 RELENG_6_1, 2006.04.30.03.57 ok
 RELENG_6_1, 2006.04.30.04.00 bad

 The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
 With 04.30 03.57's source + manaully patched vfs_lookup.c rev 1.90,
 the same problem occurs.

 Let me refresh what problems I'm seeing

 1. a client (no matter Linux 2.6.16 or FreeBSD 6.1) runs du on
   a nfs directory
 2. on server-side, nfsd starts to eats lots of CPU
 3. the du finishes
 4. on server-side, nfsd still eats lots of CPU, but there is no
   nfs traffic. Wait for 5 minutes, you can still see that nfsd is
   running and eats lots of CPU.

 On FreeBSD 6.1R client, it uses UDP mount and fstab is like
 rw,-L,nosuid,bg,nodev. On Linux cleint, it uses UDP mount and
 fstab is like defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192.
 The server's kernel conf is at

 http://www.rafan.org/FreeBSD/nfs/KERNEL

 Some related configuration files:

 /etc/export
  /export/dir1 host1 host2...
  /export/dir2 host1 host2...

 /etc/rc.conf
 nfs_server_enable=YES
 nfs_server_flags=-u -t -n 16
 mountd_enable=YES
 mountd_flags=-r -l -n
 rpc_lockd_enable=YES
 rpc_statd_enable=YES
 rpcbind_enable=YES

 /etc/fstab:
 /dev/...  /export/dir1 ufs rw,nosuid,noexec 2 2
 /dev/...  /export/dir2 ufs rw,nosuid,noexec,userquota 2 2

 The NFS server is also using amd to mount some backup directories
 from another NFS server. the amd.conf is

 [global]
 browsable_dirs = yes
 map_type = file
 mount_type = nfs
 auto_dir = /nfs
 fully_qualified_hosts = no
 log_file = syslog
 nfs_proto = udp
 nfs_allow_insecure_port = no
 nfs_vers = 3
 # plock = yes
 selectors_on_default = yes
 restart_mounts = yes

 [/backup]
 map_options = type:=direct
 map_name = /etc/amd.direct

 /etc/amd.direct:
 /defaults
 opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
 backup  type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}


 If there are any thing I can provide to help tracking this down. Please
 let me know. By the way, I tried with truss/kdump to see what happens
 when nfsd eats lot of CPUs, but in vain. They do not return anything.

I tried your recipe on 7-CURRENT with locally exported fs, remounted
over nfs. I did not get the behaviour your described.


As noted in my previous thread, I have another 6.1-RELEASE nfs server,
which does not have this problem.


Could you, please, provide the backtrace for the nfsd that
eats the CPU (from the ddb). I think it would be helpful to get several
backtraces (i.e., bt nfsd pid, cont, bt nfsd pid ...) to
see where it running.


I'm afraid that I cannot do that. Last time I tried breaking into ddb (on 5.x),
it hung my serial console, and the server is miles away :-( . Perhaps we
can ask Howard to do that?

Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Rong-en Fan

On 5/23/06, Howard Leadmon [EMAIL PROTECTED] wrote:



   If there are any thing I can provide to help tracking this down.
   Please let me know. By the way, I tried with truss/kdump
 to see what
   happens when nfsd eats lot of CPUs, but in vain. They do
 not return anything.
  
  I tried your recipe on 7-CURRENT with locally exported fs,
 remounted
  over nfs. I did not get the behaviour your described.

 As noted in my previous thread, I have another 6.1-RELEASE
 nfs server, which does not have this problem.

  Could you, please, provide the backtrace for the nfsd that eats the
  CPU (from the ddb). I think it would be helpful to get several
  backtraces (i.e., bt nfsd pid, cont, bt nfsd pid ...)
 to see where
  it running.

 I'm afraid that I can not do that. Last time I tried breaking
 into ddb (on 5.x), it hangs my serial console and the server
 is miles away :-( . Perhaps we can ask Howard to do that?

 I am more than willing to do that, as this machine runs here with me, so if
needed I can easily get on a console, or perform a reboot.  Can one of you
shed a little light on exactly what I need to do, and how to do this?  I ask
as I have never used this ddb stuff, so not clue one on how to go about
getting the information your looking to find.  Guess I have been lucky, and
just never had an issue that took things to this level.


At the very least, you have to add the following to your kernel configuration:

options KDB
options DDB

Recompile it and reboot. You had better set up a serial console
so you can easily copy things from the ddb output. To do that, you have
to put "device sio" in your kernel configuration and edit the files
below:

/boot.config
-Dh

/boot/loader.conf
comconsole_speed=115200
machdep.conspeed=115200

/etc/ttys
ttyd0   "/usr/libexec/getty std.115200" cons25  on secure

On the other machine, /etc/remote:
com1:dv=/dev/cuad0:br#115200:pa=none:

Then use "tip com1" to attach to the nfs server. The above settings
assume the serial console on the nfs server is on COM1 and that the
client side also uses COM1. If that's not the case, please follow the
Handbook on how to set up a serial console on a port other than COM1. To
break into ddb, either use ctrl+alt+esc or send a BREAK (I think ^b
will do) via the serial line. After that, you should see

db>

Then first use ps to find the nfsd pid (better to note the pid that
eats lots of CPU before entering ddb). After that, do
what Konstantin suggests. I have never tried cont in ddb. I guess
it returns execution to the kernel and you need to break
into ddb again to do another bt <pid>.

By the way, could you verify that backing out vfs_lookup.c rev 1.90
helps in your situation? If not, maybe we are seeing different problems,
and then I have to figure out how to make my serial console work
here.

Thanks,
Rong-En Fan


RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Howard Leadmon

   Hello Rong-en,

 Thanks for the info on getting the debugger configured, and on the serial
console.   I will have to play with the serial console thing more; I
just tried putting in the flags and the damn thing hung, so I had to boot from CD
and take the stuff back out.

 One thing you mention below that concerns me is that you have version 1.90 of
the vfs_lookup.c file.   I just did a less on /usr/src/sys/kern/vfs_lookup.c
and I see the following:

FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp


 I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the current
stuff before rebuilding the kernel just now, and still I see the same thing.
Is something fishy going on here, or did you by chance make a typo??


---
Howard Leadmon - [EMAIL PROTECTED]
http://www.leadmon.net

 

 -Original Message-
 From: Rong-en Fan [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, May 23, 2006 10:19 AM
 To: Howard Leadmon
 Cc: Konstantin Belousov; Kris Kennaway; freebsd-stable@freebsd.org
 Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
 
 On 5/23/06, Howard Leadmon [EMAIL PROTECTED] wrote:
 
 
 If there are any thing I can provide to help tracking 
 this down.
 Please let me know. By the way, I tried with truss/kdump
   to see what
 happens when nfsd eats lot of CPUs, but in vain. They do
   not return anything.

I tried your recipe on 7-CURRENT with locally exported fs,
   remounted
over nfs. I did not get the behaviour your described.
  
   As noted in my previous thread, I have another 6.1-RELEASE nfs 
   server, which does not have this problem.
  
Could you, please, provide the backtrace for the nfsd that eats 
the CPU (from the ddb). I think it would be helpful to 
 get several 
backtraces (i.e., bt nfsd pid, cont, bt nfsd pid ...)
   to see where
it running.
  
   I'm afraid that I can not do that. Last time I tried 
 breaking into 
   ddb (on 5.x), it hangs my serial console and the server is miles 
   away :-( . Perhaps we can ask Howard to do that?
 
   I am more than willing to do that, as this machine runs 
 here with me, 
  so if needed I can easily get on a console, or perform a 
 reboot.  Can 
  one of you shed a little light on exactly what I need to 
 do, and how 
  to do this?  I ask as I have never used this ddb stuff, so not clue 
  one on how to go about getting the information your looking 
 to find.  
  Guess I have been lucky, and just never had an issue that 
 took things to this level.
 
 At least you have to add the following to your kernel:
 
 options KDB
 options DDB
 
 Recompile it, reboot. You would better to setup a serial 
 console so you can easily copy thing from ddb output. To do 
 it, you have to put device sio in your kernel configuration 
 and some files
 below:
 
 /boot.config
 -Dh
 
 /boot/loader.conf
 comconsole_speed=115200
 machdep.conspeed=115200
 
 /etc/ttys
 ttyd0   /usr/libexec/getty std.115200 cons25  on secure
 
 On the other machine, /etc/remote:
 com1:dv=/dev/cuad0:br#115200:pa=none:
 
 Then, use tip com1 to attach the nfs server. The above 
 settings assume your serial console on nfs server is on COM1 
 and on the client side is also COM1. If that's not the case, 
 please follow Handbook for howto setup a serial console other 
 than COM1. To break into ddb, either use ctrl+alt+esc or send 
 a BREAK (I think ^b will do) via serial line. After that, you 
 should see
 
 db
 
 Then you first use ps to find out the nfsd pid (better to 
 remember the pid which eats  lots of cpu before enter ddb). 
 After that, do what Konstantin suggests. I have never tried 
 cont in db. I guess that will return the execution back to 
 kernel and you need to break into ddb again to do another bt pid.
 
 By the way, could you verify that backing out vfs_lookup.c 
 rev 1.90 helps in your situation? If not, maybe we are seeing 
 different problems, and then I have to figure out how to make 
 my serial console work here.
 
 Thanks,
 Rong-En Fan
 




Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-23 Thread Rong-en Fan

On 5/23/06, Howard Leadmon [EMAIL PROTECTED] wrote:


   Hello Rong-en,

 Thanks for the info on getting the debugger configured, and on the serial
console.   I will have to try and play with the serial console thing more, I
just tried putting in the flags and the damn thing hung, I had to boot from CD
and take the stuff back out.

 One thing you mention below that concerns me is that you have version 1.90 of
the vfs_lookup.c file.   I just did a less on /usr/src/sys/kern/vfs_lookup.c
and I see the following:

FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp


 I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the current
stuff before rebuilding the kernel just now, and still I see the same thing.
Is something fishy going on here, or did you by chance make a typo??


Sorry for the confusion. rev 1.90 is the number for -HEAD. To back
out this MFC'ed change for RELENG_6_1, please cvsup to
RELENG_6_1 date=2006.04.30.03.57.00. Then you should see it
is

1.80.2.6 2006/03/31 07:39:24 kris

To verify the effect of this revision, please run RELENG_6_1 at both
2006.04.30.03.57.00 and 2006.04.30.04.00.00.
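
A quick way to check which side of the change a source tree is on is to read the revision out of the ident line at the top of vfs_lookup.c. The snippet below is only a rough sketch: parse_ident and has_mfc are made-up helper names, and treating 1.80.2.7 as the first revision carrying the MFC is an assumption based on the two ident lines quoted in this thread.

```python
import re

# Matches an ident line such as the one Howard quoted from vfs_lookup.c.
IDENT_RE = re.compile(
    r"FreeBSD: (?P<path>\S+),v (?P<rev>[\d.]+) "
    r"(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<author>\S+)"
)

def parse_ident(line):
    """Return (revision, date, time) from an ident line, or None."""
    m = IDENT_RE.search(line)
    if m is None:
        return None
    return m.group("rev"), m.group("date"), m.group("time")

def has_mfc(line, first_rev="1.80.2.7"):
    """True if the file is at or past `first_rev` on the same branch.

    Assumption: `first_rev` is the revision that brought in the MFC.
    Comparing dotted numbers as integer tuples only makes sense for
    revisions on the same CVS branch.
    """
    parsed = parse_ident(line)
    if parsed is None:
        raise ValueError("no FreeBSD ident found in line")
    as_tuple = lambda rev: tuple(int(part) for part in rev.split("."))
    return as_tuple(parsed[0]) >= as_tuple(first_rev)

NEW = "FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp"
OLD = "FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.6 2006/03/31 07:39:24 kris Exp"
print(parse_ident(NEW), has_mfc(NEW), has_mfc(OLD))
```

If has_mfc reports True for your tree, you are running with the change; after cvsup'ing to the 03.57.00 date it should report False.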

Regards,
Rong-En Fan


Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-22 Thread Rong-en Fan

On 5/14/06, Kris Kennaway [EMAIL PROTECTED] wrote:

On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:

Hello All,

  I have been running FBSD a long while, and actually running since the 5.x
 releases on the server I am having troubles with.   I basically have a small
 network and just use NIS/NFS to link my various FBSD and Solaris machines
 together.

  This has all been running fine up till a few days ago, when all of a sudden
 NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
 When I had 6.1-RC running all seemed well, then came the announcement for the
 official 6.1 release, so I did the cvs updates, made world, kernel, and ran
 mergemaster to get everything up to the 6.1 stable version.

  Now after doing this, something is wrong with NFS.   It works, it will return
 information and open files, just it's very very slow, and while performing a
 request the CPU spike is astounding.  A simple du of my home directory can
 take minutes, and machine all but locks up if the request is done over NFS.
 Here is top snip:

   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd


  This is a nice IBM eServer with dual P4-XEON's and a couple GB or RAM on a
 disk array, and locally is screams, heck NFS used to scream till I updated.  I
 am not really sure what info would be useful in debugging, so won't post tons
 of misc junk in this eMail, but if anyone has any ideas as to how best to
 figure out and resolve this issue it would sure be appreicated...

Use tcpdump and related tools to find out what traffic is being sent.

Also verify that you did not change your system configuration in any
way: there have been no changes to NFS since the release, so it is
unclear why an update would cause the problem to suddenly occur.

Kris


Hi Kris and Howard,

As I posted a few days ago, I have problems similar to Howard's
(some details are in the thread "6.1-RELEASE, em0 high interrupt rate
and nfsd eats lots of cpu" on stable@). After binary searching
the source tree, I found that

RELENG_6_1, 2006.04.30.03.57 ok
RELENG_6_1, 2006.04.30.04.00 bad

The only commit in between is to kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
With the 04.30 03.57 source plus a manually patched vfs_lookup.c (rev 1.90),
the same problem occurs.
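
For anyone who wants to repeat this kind of date bisection, the search itself can be sketched as below. This is only an illustration, not the exact procedure used here: first_bad and demo_is_bad are hypothetical names, the intermediate dates are invented, and in practice the "is it bad?" check means cvsup'ing RELENG_6_1 to that date, rebuilding the kernel, and running the du test by hand.

```python
from datetime import datetime

def first_bad(dates, is_bad):
    """Binary-search a sorted list of checkout dates.

    Assumes dates[0] is known good, dates[-1] is known bad, and that
    every date after the first bad one is also bad.
    Returns (last_good, first_bad).
    """
    lo, hi = 0, len(dates) - 1          # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(dates[mid]):
            hi = mid                    # breakage is at mid or earlier
        else:
            lo = mid                    # breakage is after mid
    return dates[lo], dates[hi]

# Stand-in for "build this checkout and test nfsd"; the cutoff is an
# assumption made up for the demo.
FMT = "%Y.%m.%d.%H.%M"
BREAKING = datetime.strptime("2006.04.30.03.58", FMT)

def demo_is_bad(stamp):
    return datetime.strptime(stamp, FMT) >= BREAKING

candidates = [
    "2006.04.01.00.00",
    "2006.04.15.00.00",
    "2006.04.30.03.57",
    "2006.04.30.04.00",
    "2006.05.07.00.00",
]
print(first_bad(candidates, demo_is_bad))
```

Each round halves the remaining date range, so even a month of commits needs only a handful of kernel rebuilds.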

Let me restate the problem I'm seeing:

1. a client (no matter whether Linux 2.6.16 or FreeBSD 6.1) runs du on
  an nfs directory
2. on the server side, nfsd starts to eat lots of CPU
3. the du finishes
4. on the server side, nfsd still eats lots of CPU, but there is no
  nfs traffic. After waiting 5 minutes, you can still see that nfsd is
  running and eating lots of CPU.
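
To make step 4 easier to confirm, the server can be polled for busy nfsd processes after the client has gone idle. The following is a minimal sketch under stated assumptions: busy_nfsd and sample_ps are hypothetical helper names, the 50% threshold is arbitrary, and the sample text is made up for illustration rather than taken from a real server.

```python
import subprocess

def busy_nfsd(ps_text, threshold=50.0):
    """Return (pid, cpu) pairs for nfsd processes above `threshold` %CPU.

    `ps_text` is expected to look like the output of
    `ps -axo pid,pcpu,comm`: a header line, then one row per process.
    """
    hits = []
    for line in ps_text.splitlines()[1:]:      # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue                           # ignore malformed rows
        pid, pcpu, comm = fields
        if comm.strip() == "nfsd" and float(pcpu) > threshold:
            hits.append((int(pid), float(pcpu)))
    return hits

def sample_ps():
    """Run ps once; only meaningful on the NFS server itself."""
    return subprocess.run(
        ["ps", "-axo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout

# Made-up sample, not real output from this thread.
SAMPLE = """\
  PID %CPU COMMAND
  480  0.0 rpcbind
  497 94.2 nfsd
  498  0.0 nfsd
"""
print(busy_nfsd(SAMPLE))
```

Calling busy_nfsd(sample_ps()) a few minutes after the du has finished should return an empty list on a healthy server; a non-empty list at that point matches the behaviour described in step 4.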

On the FreeBSD 6.1R client, the mount uses UDP and the fstab options are
rw,-L,nosuid,bg,nodev. On the Linux client, the mount also uses UDP and the
fstab options are defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192.
The server's kernel config is at

http://www.rafan.org/FreeBSD/nfs/KERNEL

Some related configuration files:

/etc/exports
 /export/dir1 host1 host2...
 /export/dir2 host1 host2...

/etc/rc.conf
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 16"
mountd_enable="YES"
mountd_flags="-r -l -n"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
rpcbind_enable="YES"

/etc/fstab:
/dev/...  /export/dir1 ufs rw,nosuid,noexec 2 2
/dev/...  /export/dir2 ufs rw,nosuid,noexec,userquota 2 2

The NFS server also uses amd to mount some backup directories
from another NFS server. The amd.conf is

[global]
browsable_dirs = yes
map_type = file
mount_type = nfs
auto_dir = /nfs
fully_qualified_hosts = no
log_file = syslog
nfs_proto = udp
nfs_allow_insecure_port = no
nfs_vers = 3
# plock = yes
selectors_on_default = yes
restart_mounts = yes

[/backup]
map_options = type:=direct
map_name = /etc/amd.direct

/etc/amd.direct:
/defaults
opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
backup  type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}


If there is anything I can provide to help track this down, please
let me know. By the way, I tried truss/kdump to see what happens
when nfsd eats lots of CPU, but in vain: they do not return anything.

Regards,
Rong-En Fan


Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-22 Thread Kris Kennaway
On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:

 As I posted few days ago, I have similar problems like Howard's
 (some details in the thread 6.1-RELEASE, em0 high interrupt rate
 and nfsd eats lots of cpu on stable@). After binary searching
 the source tree, I found that
 
 RELENG_6_1, 2006.04.30.03.57 ok
 RELENG_6_1, 2006.04.30.04.00 bad
 
 The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
 With 04.30 03.57's source + manaully patched vfs_lookup.c rev 1.90,
 the same problem occurs.

Thanks for tracking this down, I'll see what Jeff has to say.

Kris




RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-20 Thread Howard Leadmon

 Sorry for the delay, I ended up sick.. :(

 You say use tcpdump; is there something I should be looking out for?   NFS
is serving files; even stranger, if I kill off the nfsd process it's
zippy fast for a moment, and then the CPU load goes through the roof and it
starts serving files slowly.   So it's actually working, except that it
consumes all available CPU and quickly brings the machine to its knees.
It doesn't matter whether I access it from my Solaris box, my other FBSD boxes,
and so on; it still bogs down terribly, which it never used to.

 If anyone can think of anything config-wise that might cause that, it would be
nice to know.  These are the settings I can think of that affect NFS:

#
# NFS
#
nfs_client_enable="NO"          # This host is an NFS client (or NO).
nfs_access_cache="2"            # Client cache timeout in seconds
nfs_server_enable="YES"         # This host is an NFS server (or NO).
nfs_server_flags="-u -t -n 5"   # Flags to nfsd (if enabled).
mountd_enable="YES"             # Run mountd (or NO).
mountd_flags="-r"               # Flags to mountd (if NFS server enabled).
weak_mountd_authentication="NO" # Allow non-root mount requests to be served.
nfs_reserved_port_only="YES"    # Provide NFS only on secure port (or NO).
nfs_bufpackets=""               # bufspace (in packets) for client
rpc_lockd_enable="YES"          # Run NFS rpc.lockd needed for client/server.
rpc_statd_enable="YES"          # Run NFS rpc.statd needed for client/server.
rpcbind_enable="YES"            # Run the portmapper service (YES/NO).
rpcbind_program="/usr/sbin/rpcbind" # path to rpcbind, if you want a differe
rpcbind_flags=""                # Flags to rpcbind (if enabled).



 I can't think of anything that should have changed, unless mergemaster
updating the default files might have changed something that would have an
effect.



---
Howard Leadmon
http://www.leadmon.net

 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Kris Kennaway
 Sent: Sunday, May 14, 2006 10:50 PM
 To: Howard Leadmon
 Cc: freebsd-stable@freebsd.org
 Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
 
 On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
  
 Hello All,
  
   I have been running FBSD a long while, and actually 
 running since the 5.x
  releases on the server I am having troubles with.   I 
 basically have a small
  network and just use NIS/NFS to link my various FBSD and Solaris 
  machines together.
  
   This has all been running fine up till a few days ago, 
 when all of a 
  sudden NFS came to a crawl, and CPU usage so high the box 
 appears to freeze almost.
  When I had 6.1-RC running all seemed well, then came the 
 announcement 
  for the official 6.1 release, so I did the cvs updates, made world, 
  kernel, and ran mergemaster to get everything up to the 6.1 
 stable version.
  
   Now after doing this, something is wrong with NFS.   It 
 works, it will return
  information and open files, just it's very very slow, and while 
  performing a request the CPU spike is astounding.  A simple 
 du of my 
  home directory can take minutes, and machine all but locks 
 up if the request is done over NFS.
  Here is top snip:
  
PID USERNAME   THR PRI NICE   SIZERES STATE  C   TIME 
   WCPU COMMAND
497 root 1   40  1252K   780K -  2  50:42 
 188.48% nfsd
  
  
   This is a nice IBM eServer with dual P4-XEON's and a 
 couple GB or RAM 
  on a disk array, and locally is screams, heck NFS used to 
 scream till 
  I updated.  I am not really sure what info would be useful in 
  debugging, so won't post tons of misc junk in this eMail, but if 
  anyone has any ideas as to how best to figure out and 
 resolve this issue it would sure be appreicated...
 
 Use tcpdump and related tools to find out what traffic is being sent.
 
 Also verify that you did not change your system configuration in any
 way: there have been no changes to NFS since the release, so 
 it is unclear why an update would cause the problem to suddenly occur.
 
 Kris
 
 




Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Howard Leadmon

   Hello All, 

 I have been running FBSD a long while, and actually running since the 5.x
releases on the server I am having troubles with.   I basically have a small
network and just use NIS/NFS to link my various FBSD and Solaris machines
together.

 This has all been running fine up till a few days ago, when all of a sudden
NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
When I had 6.1-RC running all seemed well, then came the announcement for the
official 6.1 release, so I did the cvs updates, made world, kernel, and ran
mergemaster to get everything up to the 6.1 stable version.

 Now after doing this, something is wrong with NFS.   It works, it will return
information and open files, just it's very very slow, and while performing a
request the CPU spike is astounding.  A simple du of my home directory can
take minutes, and machine all but locks up if the request is done over NFS.
Here is top snip:

  PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd


 This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
disk array, and locally it screams; heck, NFS used to scream till I updated.  I
am not really sure what info would be useful in debugging, so won't post tons
of misc junk in this eMail, but if anyone has any ideas as to how best to
figure out and resolve this issue it would sure be appreciated...



---
Howard Leadmon
http://www.leadmon.net





Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Stephen Hurd

Howard Leadmon wrote:
   Hello All, 


 I have been running FBSD a long while, and actually running since the 5.x
releases on the server I am having troubles with.   I basically have a small
network and just use NIS/NFS to link my various FBSD and Solaris machines
together.

 This has all been running fine up till a few days ago, when all of a sudden
NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
When I had 6.1-RC running all seemed well, then came the announcement for the
official 6.1 release, so I did the cvs updates, made world, kernel, and ran
mergemaster to get everything up to the 6.1 stable version.

 Now after doing this, something is wrong with NFS.   It works, it will return
information and open files, just it's very very slow, and while performing a
request the CPU spike is astounding.  A simple du of my home directory can
take minutes, and machine all but locks up if the request is done over NFS.
Here is top snip:

  PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd


 This is a nice IBM eServer with dual P4-XEON's and a couple GB or RAM on a
disk array, and locally is screams, heck NFS used to scream till I updated.  I
am not really sure what info would be useful in debugging, so won't post tons
of misc junk in this eMail, but if anyone has any ideas as to how best to
figure out and resolve this issue it would sure be appreicated...
  
Are you running rpc.lockd?  I've had very bad luck with it since 
sometime in the 5.x series... especially with it interoperating with 
Solaris.  I submitted a PR on it, but it's apparently broken in about X 
ways.  If possible, I would suggest living without rpc.lockd for now (if 
you're currently living with it that is)


Other than that issue, NFS itself has been working nicely for me.


Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Michel Talon
 Are you running rpc.lockd?  I've had very bad luck with it since 
 sometime in the 5.x series... especially with it interoperating with 
 Solaris.  I submitted a PR on it, but it's apparently broken in about X 
 ways.  If possible, I would suggest living without rpc.lockd for now (if 
 you're currently living with it that is)

On the contrary, my NFS problems interoperating with Linux have cleared up
since upgrading Linux to Fedora Core 5 and FreeBSD to 6.1. In particular,
rpc.lockd works, everything is OK, and performance is fine. I had very bad
problems in the past, when we were running Fedora Core 3.


-- 

Michel TALON



Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Kris Kennaway
On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
 
Hello All, 
 
  I have been running FBSD a long while, and actually running since the 5.x
 releases on the server I am having troubles with.   I basically have a small
 network and just use NIS/NFS to link my various FBSD and Solaris machines
 together.
 
  This has all been running fine up till a few days ago, when all of a sudden
 NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
 When I had 6.1-RC running all seemed well, then came the announcement for the
 official 6.1 release, so I did the cvs updates, made world, kernel, and ran
 mergemaster to get everything up to the 6.1 stable version.
 
  Now after doing this, something is wrong with NFS.   It works, it will return
 information and open files, just it's very very slow, and while performing a
 request the CPU spike is astounding.  A simple du of my home directory can
 take minutes, and machine all but locks up if the request is done over NFS.
 Here is top snip:
 
   PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
   497 root         1   4    0  1252K   780K -      2  50:42 188.48% nfsd
 
 
  This is a nice IBM eServer with dual P4-XEON's and a couple GB or RAM on a
 disk array, and locally is screams, heck NFS used to scream till I updated.  I
 am not really sure what info would be useful in debugging, so won't post tons
 of misc junk in this eMail, but if anyone has any ideas as to how best to
 figure out and resolve this issue it would sure be appreicated...

Use tcpdump and related tools to find out what traffic is being sent.

Also verify that you did not change your system configuration in any
way: there have been no changes to NFS since the release, so it is
unclear why an update would cause the problem to suddenly occur.
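
As a starting point for the tcpdump suggestion, something like the sketch below builds a capture command limited to NFS traffic. It is only an illustration: nfs_capture_cmd is a made-up helper, the interface name em0 and the client address 192.0.2.10 are placeholders, and it assumes NFS is on its usual port 2049.

```python
import shlex

def nfs_capture_cmd(iface, count=200, client=None):
    """Build a tcpdump argument list that captures NFS traffic.

    -n skips name resolution, -i selects the interface and -c stops
    after `count` packets. Port 2049 is assumed to be the NFS port.
    """
    filt = "port 2049"
    if client:
        filt = f"host {client} and {filt}"
    return ["tcpdump", "-n", "-i", iface, "-c", str(count), filt]

# Placeholder interface and client address; substitute your own.
cmd = nfs_capture_cmd("em0", client="192.0.2.10")
print(shlex.join(cmd))
```

Run the printed command as root on the server while a client does its du, and again once the client is idle; comparing the two captures shows whether nfsd is busy answering requests or spinning with no traffic at all.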

Kris





RE: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Howard Leadmon

Would this just be lockd, or should I disable both lockd and statd?  I notice
in the rc.conf it claims they are both supposed to be enabled, so not sure
what issues I run into if I disable them, if any.

As to my servers, I have a bunch of stuff running actually.  The Dual XEON
server being my fastest machine by far is my main server, off of that I have a
SPARC running Solaris10, I have another SPARC running FreeBSD 6.1, I have a
DEC Alpha running FreeBSD 4.11, and another old Dual XEON machine running
FreeBSD 4.11, and finally another x86 machine running FreeBSD 7 current. 

This stuff has been doing great, I personally love FBSD, as you can tell from
what is loaded on most of the servers.   It just blows my mind that now after
years of running the network like this, now something breaks.  I get no errors
in syslog, and it IS serving requests, as I have some web pages on the old
4.11 machine and they come up, just slowly compared to what they used to do.
Oh well as they say, in the world of computers it's never easy..  LOL



---
Howard Leadmon - [EMAIL PROTECTED]
http://www.leadmon.net

 

 -Original Message-
 From: Stephen Hurd [mailto:[EMAIL PROTECTED] 
 Sent: Sunday, May 14, 2006 5:54 PM
 To: Howard Leadmon
 Cc: freebsd-stable@freebsd.org
 Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
 
 Howard Leadmon wrote:
 Hello All,
 
   I have been running FBSD a long while, and actually 
 running since the 5.x
  releases on the server I am having troubles with.   I 
 basically have a small
  network and just use NIS/NFS to link my various FBSD and Solaris 
  machines together.
 
   This has all been running fine up till a few days ago, when all of a
  sudden NFS came to a crawl, and CPU usage so high the box appears to freeze
  almost.  When I had 6.1-RC running all seemed well, then came the
  announcement for the official 6.1 release, so I did the cvs updates, made
  world, kernel, and ran mergemaster to get everything up to the 6.1 stable
  version.
 
   Now after doing this, something is wrong with NFS.   It works, it will
  return information and open files, just it's very very slow, and while
  performing a request the CPU spike is astounding.  A simple du of my home
  directory can take minutes, and machine all but locks up if the request is
  done over NFS.  Here is top snip:
 
    PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
    497 root        1   4    0  1252K   780K -      2  50:42 188.48% nfsd
 
 
   This is a nice IBM eServer with dual P4-XEONs and a couple GB of RAM on a
  disk array, and locally it screams, heck NFS used to scream till I updated.
  I am not really sure what info would be useful in debugging, so won't post
  tons of misc junk in this eMail, but if anyone has any ideas as to how best
  to figure out and resolve this issue it would sure be appreciated...

 Are you running rpc.lockd?  I've had very bad luck with it since sometime in
 the 5.x series... especially with it interoperating with Solaris.  I
 submitted a PR on it, but it's apparently broken in about X ways.  If
 possible, I would suggest living without rpc.lockd for now (if you're
 currently living with it that is).
 
 Other than that issue, NFS itself has been working nicely for me.
 




Re: Trouble with NFSd under 6.1-Stable, any ideas?

2006-05-14 Thread Stephen Hurd

Howard Leadmon wrote:

Would this just be lockd, or should I disable both lockd and statd?  I notice
in the rc.conf it claims they are both supposed to be enabled, so not sure
what issues I run into if I disable them, if any.
  
No need to disable rpc.statd, though I don't know if any other programs 
request monitoring.  The issue you'll run into is simply a lack of any 
locks on the mounted drive.  This can easily lead to file corruption if 
multiple programs or multiple instances of a single program change the 
same file at the same time.  Many programs will use NFS-safe lockfiles 
if configured to do so, which is often a usable workaround in a 
lockd-free world.  If you're only using the files read-only, or only a 
single process uses files on the mount at a time (on *all* systems), then 
there's no issue at all.
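
For readers unfamiliar with the NFS-safe lockfile idea mentioned above, here
is a rough Python sketch of the classic hard-link approach.  It is only an
illustration, not taken from any particular program: the names acquire_lock
and release_lock and the temp-file naming scheme are made up for this
example.  The idea is to create a uniquely named file, hard-link it to the
lock name, and then trust the link count rather than the return value of
link(), since a reply can get lost over NFS even when the operation worked.

```python
import os
import socket


def acquire_lock(lock_path):
    """Try once to take the lock; return True if we got it, else False.

    Hypothetical helper for illustration only.
    """
    # Host name plus PID keeps the temp name unique across NFS clients.
    tmp_path = "%s.%s.%d" % (lock_path, socket.gethostname(), os.getpid())
    # O_EXCL only guards against a stale temp file left by this same
    # host/PID pair; the lock itself is decided by link() below.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        try:
            os.link(tmp_path, lock_path)
        except OSError:
            # Either someone else holds the lock, or the reply was lost.
            # The link count check below tells the two cases apart.
            pass
        # Two links means lock_path now points at our temp file.
        return os.stat(tmp_path).st_nlink == 2
    finally:
        # The temp name is no longer needed whether we won or lost.
        os.unlink(tmp_path)


def release_lock(lock_path):
    """Drop the lock by removing the lockfile."""
    os.unlink(lock_path)
```

A caller would typically retry acquire_lock() with a short sleep until it
returns True, do its work on the shared file, then call release_lock().
Note that this sketch does nothing about stale locks left behind by a
client that crashed while holding the lock.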
