Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?
On Thu, Jun 01, 2006 at 01:06:44AM +0300, Dmitry Pryanishnikov wrote:
>
> Hello!
>
> On Thu, 25 May 2006, Konstantin Belousov wrote:
> >    KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
> >        ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
> >
> >from nfsserver/nfs_syscalls.c, line 570.
> >
> >As I understand the problem, kern/vfs_lookup.c:lookup() could
> >acquire additional locks on Giant, indicating this by GIANTHELD
> >flag in nd. All processing in nfsserver already goes with Giant held,
> >so, I just dropped that excessive locks after return from lookup.
> >System with patch applied survived smoke test (client did
> >du on mounted dir, patch was generated from exported fs, etc.).
> >nfsd eats no more than 25% of CPU (with INVARIANTS).
> >
> >Please, users who reported the problem and willing to help,
> >try the patch (generated against STABLE) and give the feedback.
>
> Thank you very much. Your patch actually fixes "nfssvc_nfsd():
> debug.mpsafenet=1 && Giant" panic during NFS mount of server's "/usr".
> Oddly enough, NFS mount of server's "/" doesn't panic the server.

Because the conditions leading to the Giant leak usually hold true for
lookup of ".." :)
Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?
On Thu, Jun 01, 2006 at 01:06:44AM +0300, Dmitry Pryanishnikov wrote:
>
> Hello!
>
> On Thu, 25 May 2006, Konstantin Belousov wrote:
> >    KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
> >        ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
> >
> >from nfsserver/nfs_syscalls.c, line 570.
> >
> >As I understand the problem, kern/vfs_lookup.c:lookup() could
> >acquire additional locks on Giant, indicating this by GIANTHELD
> >flag in nd. All processing in nfsserver already goes with Giant held,
> >so, I just dropped that excessive locks after return from lookup.
> >System with patch applied survived smoke test (client did
> >du on mounted dir, patch was generated from exported fs, etc.).
> >nfsd eats no more than 25% of CPU (with INVARIANTS).
> >
> >Please, users who reported the problem and willing to help,
> >try the patch (generated against STABLE) and give the feedback.
>
> Thank you very much. Your patch actually fixes "nfssvc_nfsd():
> debug.mpsafenet=1 && Giant" panic during NFS mount of server's "/usr".
> Oddly enough, NFS mount of server's "/" doesn't panic the server.
> My kernel config contains "options QUOTA", however quotas are not enabled.
> Please commit the fix, IMHO long-term breakage of such a basic functionality
> (NFS server + quotas) in -STABLE branch isn't a Good Thing (TM).

FYI, if you're not using quotas then you should remove the option from
your kernel config to avoid trashing your performance.

Kris
Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?
Hello!

On Thu, 25 May 2006, Konstantin Belousov wrote:
>    KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
>        ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
>
> from nfsserver/nfs_syscalls.c, line 570.
>
> As I understand the problem, kern/vfs_lookup.c:lookup() could
> acquire additional locks on Giant, indicating this by GIANTHELD
> flag in nd. All processing in nfsserver already goes with Giant held,
> so, I just dropped that excessive locks after return from lookup.
> System with patch applied survived smoke test (client did
> du on mounted dir, patch was generated from exported fs, etc.).
> nfsd eats no more than 25% of CPU (with INVARIANTS).
>
> Please, users who reported the problem and willing to help,
> try the patch (generated against STABLE) and give the feedback.

Thank you very much. Your patch actually fixes the "nfssvc_nfsd():
debug.mpsafenet=1 && Giant" panic during NFS mount of the server's "/usr".
Oddly enough, NFS mount of the server's "/" doesn't panic the server.
My kernel config contains "options QUOTA"; however, quotas are not enabled.
Please commit the fix. IMHO long-term breakage of such basic functionality
(NFS server + quotas) in the -STABLE branch isn't a Good Thing (TM).

Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail: [EMAIL PROTECTED]
nic-hdl: LYNX-RIPE
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
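The locking problem described in the quoted analysis can be modelled in user space. This is only a sketch, not kernel code: `RecursiveLock`, `lookup_model` and `nfs_handler_model` are hypothetical names, and the flag value is a stand-in. Only the idea comes from the thread: the NFS server already holds Giant, lookup may take it once more and report that through GIANTHELD, and the patch drops the extra hold and clears the flag after lookup returns.

```python
GIANTHELD = 0x1  # stand-in value; the real flag is defined in the kernel headers


class RecursiveLock:
    """Counts recursive acquisitions, loosely modelling a recursive mutex such as Giant."""

    def __init__(self):
        self.depth = 0

    def lock(self):
        self.depth += 1

    def unlock(self):
        if self.depth == 0:
            raise RuntimeError("unlock of a lock that is not held")
        self.depth -= 1

    def owned(self):
        return self.depth > 0


def lookup_model(giant, cn_flags, needs_giant):
    """Model of a lookup that may take Giant once more and report it via a flag."""
    if needs_giant:
        giant.lock()
        cn_flags |= GIANTHELD
    return cn_flags


def nfs_handler_model(giant, needs_giant, apply_fix):
    """Run one request with Giant held; return True if a hold leaked."""
    giant.lock()                                  # the handler's own hold
    cn_flags = lookup_model(giant, 0, needs_giant)
    if apply_fix and cn_flags & GIANTHELD:
        giant.unlock()                            # release the hold lookup added
        cn_flags &= ~GIANTHELD
    giant.unlock()                                # the handler's own release
    return giant.owned()
```

In this model a leaked hold corresponds to the condition the KASSERT checks for: Giant is still owned when the handler has finished.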
Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?
Hi!

On Thu, May 25, 2006 at 05:58:09PM +0300, Konstantin Belousov wrote:
> Please, users who reported the problem and willing to help,
> try the patch (generated against STABLE) and give the feedback.

I tested it with RELENG_6 from 25 May 2006. It works fine.
Thank you.

WBR
--
Dmitriy Kirhlarov
OILspace, 26 Leninskaya sloboda, bld. 2, 2nd floor, 115280 Moscow, Russia
P:+7 495 105 7247 ext.203 F:+7 495 105 7246 E:[EMAIL PROTECTED]
OILspace - The resource enriched - www.oilspace.com
Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?
On Thu, May 25, 2006 at 05:58:09PM +0300, Konstantin Belousov wrote:
> +options         QUOTA
>  options         UFS_ACL         # Support for access control lists
>  options         UFS_DIRHASH     # Improve performance on big directories
>  options         MD_ROOT         # MD is a potential root device
>
> After that, server machine easily panics on
>
>    KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
>        ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
>
> from nfsserver/nfs_syscalls.c, line 570.

OK, I am also seeing this panic when I try and export a non-mpsafe
filesystem (e.g. cd9660). I can't test the patch because my NFS
server subsequently blew up :-(

Kris
Re: [patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/25/06, Konstantin Belousov <[EMAIL PROTECTED]> wrote:
> On Thu, May 25, 2006 at 01:19:26AM -0400, Kris Kennaway wrote:
> > On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:
> > >
> > > So what's changed at that delta, under the one that works vfs_lookup.c is:
> > >
> > >  Edit src/sys/kern/vfs_lookup.c
> > >   Add delta 1.80.2.6 2006.03.31.07.39.24 kris
> > >
> > > Under the one that fails the vfs_lookup.c is:
> > >
> > >  Edit src/sys/kern/vfs_lookup.c
> > >   Add delta 1.80.2.7 2006.04.30.03.57.46 kris
> > >
> > > So I stand corrected on my last post, the issue is in fact in this module, as
> > > just taking that module back to 1.80.2.6 fixes the problem with my server. I
> > > even took multiple NFS clients and gave them a heavy workload, and CPU still
> > > remained reasonable, and very responsive. As soon as I rev to the new
> > > version, NFS breaks badly and even a single client doing something like a du
> > > of a directory structure results in sluggishness and extreme CPU usage.
> >
> > Yep, unfortunately this commit was necessary to fix other bugs. Jeff
> > said he should have time to look at it next week.
> >
> > Kris
>
> I tried to debug the problem. First, I have to admit that I cannot
> reproduce the problem on GENERIC kernel. Only after QUOTAS were added,
> and, correspondingly, UFS started to require Giant, I get described
> behaviour. Below are the changes to GENERIC config file I made to
> reproduce problem.
>
> [...]
>
> After that, server machine easily panics on
>
>    KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
>        ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));
>
> from nfsserver/nfs_syscalls.c, line 570.
>
> As I understand the problem, kern/vfs_lookup.c:lookup() could
> acquire additional locks on Giant, indicating this by GIANTHELD
> flag in nd. All processing in nfsserver already goes with Giant held,
> so, I just dropped that excessive locks after return from lookup.
> System with patch applied survived smoke test (client did
> du on mounted dir, patch was generated from exported fs, etc.).
> nfsd eats no more than 25% of CPU (with INVARIANTS).
>
> Please, users who reported the problem and willing to help,
> try the patch (generated against STABLE) and give the feedback.
>
> [...]

Hi Konstantin and others,

I'm now running RELENG_6_1 as of Apr 30 04:00 UTC source + your patch.
The nfsd is quite happy! After the client's du finishes, it stays idle as
expected (eats 0.00% CPU).

Thank you very much.

Regards,
Rong-En Fan
[patch, try 1] Re: Trouble with NFSd under 6.1-Stable, any ideas?
On Thu, May 25, 2006 at 01:19:26AM -0400, Kris Kennaway wrote:
> On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:
> >
> > So what's changed at that delta, under the one that works vfs_lookup.c is:
> >
> >  Edit src/sys/kern/vfs_lookup.c
> >   Add delta 1.80.2.6 2006.03.31.07.39.24 kris
> >
> > Under the one that fails the vfs_lookup.c is:
> >
> >  Edit src/sys/kern/vfs_lookup.c
> >   Add delta 1.80.2.7 2006.04.30.03.57.46 kris
> >
> > So I stand corrected on my last post, the issue is in fact in this module, as
> > just taking that module back to 1.80.2.6 fixes the problem with my server. I
> > even took multiple NFS clients and gave them a heavy workload, and CPU still
> > remained reasonable, and very responsive. As soon as I rev to the new
> > version, NFS breaks badly and even a single client doing something like a du
> > of a directory structure results in sluggishness and extreme CPU usage.
>
> Yep, unfortunately this commit was necessary to fix other bugs. Jeff
> said he should have time to look at it next week.
>
> Kris

I tried to debug the problem. First, I have to admit that I cannot
reproduce the problem on a GENERIC kernel. Only after QUOTA was added
and, correspondingly, UFS started to require Giant, did I get the
described behaviour. Below are the changes to the GENERIC config file
I made to reproduce the problem.

Index: amd64/conf/GENERIC
===
RCS file: /usr/local/arch/ncvs/src/sys/amd64/conf/GENERIC,v
retrieving revision 1.439.2.11
diff -u -r1.439.2.11 GENERIC
--- amd64/conf/GENERIC 30 Apr 2006 17:39:43 - 1.439.2.11
+++ amd64/conf/GENERIC 25 May 2006 14:44:14 -
@@ -26,6 +26,19 @@
 #hints          "GENERIC.hints"         # Default places to look for devices.
 makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols
+options         KDB
+options         KDB_TRACE
+#options        KDB_UNATTENDED
+options         DDB
+options         DDB_NUMSYM
+options         BREAK_TO_DEBUGGER
+options         INVARIANTS
+options         INVARIANT_SUPPORT
+options         WITNESS
+options         DEBUG_LOCKS
+options         DEBUG_VFS_LOCKS
+options         DIAGNOSTIC
+options         MUTEX_PROFILING
 #options        SCHED_ULE               # ULE scheduler
 options         SCHED_4BSD              # 4BSD scheduler
@@ -34,6 +47,7 @@
 options         INET6                   # IPv6 communications protocols
 options         FFS                     # Berkeley Fast Filesystem
 options         SOFTUPDATES             # Enable FFS soft updates support
+options         QUOTA
 options         UFS_ACL                 # Support for access control lists
 options         UFS_DIRHASH             # Improve performance on big directories
 options         MD_ROOT                 # MD is a potential root device

After that, the server machine easily panics on

    KASSERT(!(debug_mpsafenet == 1 && mtx_owned(&Giant)),
        ("nfssvc_nfsd(): debug.mpsafenet=1 && Giant"));

from nfsserver/nfs_syscalls.c, line 570.

As I understand the problem, kern/vfs_lookup.c:lookup() could
acquire additional locks on Giant, indicating this by the GIANTHELD
flag in nd. All processing in nfsserver already goes with Giant held,
so I just dropped those excess locks after return from lookup.
A system with the patch applied survived a smoke test (client did
du on mounted dir, patch was generated from exported fs, etc.).
nfsd eats no more than 25% of CPU (with INVARIANTS).

Users who reported the problem and are willing to help, please
try the patch (generated against STABLE) and give feedback.

Index: nfsserver/nfs_serv.c
===
RCS file: /usr/local/arch/ncvs/src/sys/nfsserver/nfs_serv.c,v
retrieving revision 1.156.2.2
diff -u -r1.156.2.2 nfs_serv.c
--- nfsserver/nfs_serv.c 13 Mar 2006 03:06:49 - 1.156.2.2
+++ nfsserver/nfs_serv.c 25 May 2006 14:44:25 -
@@ -569,6 +569,10 @@
 	error = lookup(&ind);
 	ind.ni_dvp = NULL;
+	if (ind.ni_cnd.cn_flags & GIANTHELD) {
+		mtx_unlock(&Giant);
+		ind.ni_cnd.cn_flags &= ~GIANTHELD;
+	}
 	if (error == 0) {
 		/*
@@ -1915,6 +1919,10 @@
 	error = lookup(&nd);
 	nd.ni_dvp = NULL;
+	if (nd.ni_cnd.cn_flags & GIANTHELD) {
+		mtx_unlock(&Giant);
+		nd.ni_cnd.cn_flags &= ~GIANTHELD;
+	}
 	if (error)
 		goto ereply;
@@ -2141,6 +2149,10 @@
 	error = lookup(&nd);
 	nd.ni_dvp = NULL;
+	if (nd.ni_cnd.cn_flags & GIANTHELD) {
+		mtx_unlock(&
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On Wed, May 24, 2006 at 11:48:53PM -0400, Howard Leadmon wrote:
>
> So what's changed at that delta, under the one that works vfs_lookup.c is:
>
>  Edit src/sys/kern/vfs_lookup.c
>   Add delta 1.80.2.6 2006.03.31.07.39.24 kris
>
> Under the one that fails the vfs_lookup.c is:
>
>  Edit src/sys/kern/vfs_lookup.c
>   Add delta 1.80.2.7 2006.04.30.03.57.46 kris
>
> So I stand corrected on my last post, the issue is in fact in this module, as
> just taking that module back to 1.80.2.6 fixes the problem with my server. I
> even took multiple NFS clients and gave them a heavy workload, and CPU still
> remained reasonable, and very responsive. As soon as I rev to the new
> version, NFS breaks badly and even a single client doing something like a du
> of a directory structure results in sluggishness and extreme CPU usage.

Yep, unfortunately this commit was necessary to fix other bugs. Jeff
said he should have time to look at it next week.

Kris
RE: Trouble with NFSd under 6.1-Stable, any ideas?
I need to follow up to the below, as I am not sure why the below test with
vfs_lookup.c didn't pan out the first time, but with my newfound knowledge
of cvs I was determined to regress the system till I found the smoking gun,
so to speak, which I have done.

First let me say that instead of running RELENG_6_1 like Rong-en is, I am
running the RELENG_6 tree, which I know updates more often but seems to work
well for me.

OK, so as I said above I started to regress the system a couple of days at a
time, till suddenly NFS started working again, so I knew at that point it was
a change that had been made. Then I started to narrow the time range, till I
got to the point where it broke. Sure enough, under the RELENG_6 branch, this
time was as follows:

 *default tag=RELENG_6 date=2006.04.30.03.57.00   (Works OK)
 *default tag=RELENG_6 date=2006.04.30.03.58.00   (Broken)

So what's changed at that delta? Under the one that works, vfs_lookup.c is:

 Edit src/sys/kern/vfs_lookup.c
  Add delta 1.80.2.6 2006.03.31.07.39.24 kris

Under the one that fails, vfs_lookup.c is:

 Edit src/sys/kern/vfs_lookup.c
  Add delta 1.80.2.7 2006.04.30.03.57.46 kris

So I stand corrected on my last post: the issue is in fact in this module, as
just taking that module back to 1.80.2.6 fixes the problem with my server. I
even took multiple NFS clients and gave them a heavy workload, and CPU still
remained reasonable, and very responsive. As soon as I rev to the new
version, NFS breaks badly and even a single client doing something like a du
of a directory structure results in sluggishness and extreme CPU usage.

I am not a coder, so I am not sure why this module was changed, but unless
there is some good reason why the changes were needed I would suspect it needs
to be rolled back, or something fixed.

So Rong-en Fan, I think you were dead on with your analysis that the issue is
in fact inside the vfs_lookup.c module. I hope this helps...
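The date-narrowing procedure described above is a binary search over checkout dates. The sketch below illustrates only that search: `bisect_by_date` and `is_broken` are hypothetical names, the start and end dates are assumed for illustration, and the check is faked by comparing against the commit timestamp quoted in the message. In practice each check would mean fetching the tree at that date, rebuilding, and testing NFS by hand.

```python
from datetime import datetime, timedelta


def bisect_by_date(good, bad, is_broken, resolution=timedelta(minutes=1)):
    """Narrow (good, bad) until the two dates are at most `resolution` apart.

    `good` must be a date known to work and `bad` a date known to fail.
    """
    while bad - good > resolution:
        mid = good + (bad - good) / 2
        if is_broken(mid):
            bad = mid      # breakage was introduced at or before mid
        else:
            good = mid     # breakage was introduced after mid
    return good, bad


# Timestamp of delta 1.80.2.7 as quoted in the thread.
commit_time = datetime(2006, 4, 30, 3, 57, 46)

# Assumed search window for illustration only.
good, bad = bisect_by_date(
    datetime(2006, 4, 20),
    datetime(2006, 5, 24),
    is_broken=lambda date: date >= commit_time,  # stand-in for rebuild and test
)
print("last good:", good)
print("first bad:", bad)
```

Halving the window each time keeps the number of rebuilds small: a window of about a month shrinks to one minute in roughly sixteen steps.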
--- Howard Leadmon - [EMAIL PROTECTED] http://www.leadmon.net > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Howard Leadmon > Sent: Wednesday, May 24, 2006 1:23 PM > To: 'Rong-en Fan' > Cc: 'Konstantin Belousov'; freebsd-stable@freebsd.org > Subject: RE: Trouble with NFSd under 6.1-Stable, any ideas? > > >Hello Rong-en, > > > As an update, I did the below, and I still had the issue with > either version > of vfs_lookup.c compiled in and running. > > On the bright side, I didn't realize you could step through > the cvs by date, guess I just never paid attention. So I > just stepped back to 'tag=RELENG_6 date=2006.04.20.00.00.00' > on my server, rebuilt and violla nfs is now running > perfect. > > So backing out something has fixed my problem, now to figure > out just what it > was. As I don't know what has caused this, I have done > complete buildworlds > to make sure everything updates which takes a few hours.I > am going to > start moving the cvs date forward till I get the problem > back, once I nail this down a bit more, I'll let you know > what I come up with. > > > > --- > Howard Leadmon > http://www.leadmon.net > > > > > -Original Message- > > From: Rong-en Fan [mailto:[EMAIL PROTECTED] > > Sent: Tuesday, May 23, 2006 3:09 PM > > To: Howard Leadmon > > Cc: freebsd-stable@freebsd.org > > Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas? > > > > On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote: > > > > > >Hello Rong-en, > > > > > > Thanks for the info on getting the debugger configured, > > and on the serial > > > console. I will have to try and play with the serial > > console thing more, I > > > just tried putting in the flags and the damn thing hung, I > > had to boot > > > from CD and take the stuff back out. > > > > > > One thing you mention below that concerns me is that you > > have version 1.90 of > > > the vfs_lookup.c file. 
I just did a less on > > /usr/src/sys/kern/vfs_lookup.c > > > and I see the following: > > > > > > FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 > > 03:57:46 kris > > > Exp > > > > > > > > > I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure > > I had the > > > current stuff before rebuilding the kernel just now, and > > still I see the same thing. > > > Is something fishy going on here, or did you by chance > make a typo?? > > > > Sorry for the confusion. rev 1.90 is the number for -HEAD. > To back out > > this MFC'ed change for RELE
RE: Trouble with NFSd under 6.1-Stable, any ideas?
Another data point:

One of our NFS servers is an amd64 based system serving a cluster of web and
email servers. Under 6.1-RCx it gave us the same (or better) performance than
the server it replaced (which was 4.11). The server load hovered between 0.x
and 1.x.

But after upping it to 6.1-STABLE the load now hovers between 5.x and 6.x
with spikes as high as 8.x, and there has been no change at all in the NFS
client traffic or other loading factors that we can tell. This in turn makes
for slower NFS client accesses.

I am going to try reverting to an earlier src tree and see if that helps.

Mark
--
Mark Morley
Owner / Administrator
Islandnet.com
RE: Trouble with NFSd under 6.1-Stable, any ideas?
Hello Rong-en,

As an update, I did the below, and I still had the issue with either version
of vfs_lookup.c compiled in and running.

On the bright side, I didn't realize you could step through the cvs by date,
guess I just never paid attention. So I just stepped back to
'tag=RELENG_6 date=2006.04.20.00.00.00' on my server, rebuilt, and voilà, nfs
is now running perfectly.

So backing out something has fixed my problem, now to figure out just what it
was. As I don't know what has caused this, I have done complete buildworlds
to make sure everything updates, which takes a few hours. I am going to
start moving the cvs date forward till I get the problem back. Once I nail
this down a bit more, I'll let you know what I come up with.

---
Howard Leadmon
http://www.leadmon.net

> -----Original Message-----
> From: Rong-en Fan [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, May 23, 2006 3:09 PM
> To: Howard Leadmon
> Cc: freebsd-stable@freebsd.org
> Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
>
> On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote:
> >
> > Hello Rong-en,
> >
> > Thanks for the info on getting the debugger configured, and on the serial
> > console. I will have to try and play with the serial console thing more, I
> > just tried putting in the flags and the damn thing hung, I had to boot
> > from CD and take the stuff back out.
> >
> > One thing you mention below that concerns me is that you have version 1.90 of
> > the vfs_lookup.c file. I just did a less on /usr/src/sys/kern/vfs_lookup.c
> > and I see the following:
> >
> > FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp
> >
> > I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the
> > current stuff before rebuilding the kernel just now, and still I see the
> > same thing. Is something fishy going on here, or did you by chance make a
> > typo??
>
> Sorry for the confusion. rev 1.90 is the number for -HEAD. To
> back out this MFC'ed change for RELENG_6_1, please cvsup to
> RELENG_6_1 date=2006.04.30.03.57.00. Then you should see it is
>
> 1.80.2.6 2006/03/31 07:39:24 kris
>
> To verify the effect of this revision. Please run RELENG_6_1
> with 2006.04.30.03.57.00 and 2006.04.30.04.00.00.
>
> Regards,
> Rong-En Fan
Re: Trouble with NFSd under 6.1-Stable, any ideas?
"Rong-en Fan" <[EMAIL PROTECTED]> wrote:
> On 5/14/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> > On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> > [...]
> > Use tcpdump and related tools to find out what traffic is being sent.
> >
> > Also verify that you did not change your system configuration in any
> > way: there have been no changes to NFS since the release, so it is
> > unclear why an update would cause the problem to suddenly occur.
> >
> > Kris
>
> Hi Kris and Howard,
>
> As I posted few days ago, I have similar problems like Howard's (some
> details in the thread "6.1-RELEASE, em0 high interrupt rate and nfsd
> eats lots of cpu" on stable@). After binary searching the source tree,
> I found that
>
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
>
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90, the
> same problem occurs.
> [...]

Confirmed! I can create the problem here at will.

Setup 1: NFS server 'testido' FreeBSD 6.1-STABLE as of 15. May 2006 with
sys/kern/vfs_lookup.c 1.80.2.7, NFS client schurks FreeBSD 6.1-STABLE as of
15. May 2006. /usr/src from testido mounted on /mnt on schurks.

Running 'cd /mnt ; du >/dev/null' two times (first after fresh boot of
testido, second when all served data is in memory of testido):

joerg @ schurks> cd /mnt
joerg @ schurks> time du >/dev/null
   86.09s real     0.14s user     1.91s system
joerg @ schurks> time du >/dev/null
  205.10s real     0.20s user     1.92s system
joerg @ schurks>

Screenful of top output on testido AFTER both tests (testido stopped
responding to screen output sometimes, especially during the second test):

last pid:   329;  load averages:  4.14,  2.77,  1.25    up 0+00:07:30  18:44:47
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8420K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
  201 root        1   4    0  1232K   792K -        4:42 116.31% nfsd
  329 joerg       1  96    0  2404K  1676K RUN      0:00   0.00% top
  168 root        1 115    0  2456K  1760K select   0:00   0.00% sshd
  313 root        1  96    0  1428K  1168K select   0:00   0.00% rlogind
  194 root        1 115    0  1556K  1256K select   0:00   0.00% mountd
  299 root        1   8    0  1720K  1436K wait     0:00   0.00% login
  314 root        1   8    0  1748K  1460K wait     0:00   0.00% login
  298 root        1  96    0  1304K  1048K select   0:00   0.00% rlogind
  199 root        1   4    0  1356K  1040K accept   0:00   0.00% nfsd
  256 root        1  96    0  2892K  1760K select   0:00   0.00% ntpd
  315 joerg       1  20    0  1448K  1020K pause    0:00   0.00% ksh
  300 root        1   5    0  1448K   996K ttyin    0:00   0.00% ksh
  158 root        1  96    0  1332K   940K select   0:00   0.00% syslogd
  163 root        1  96    0  1448K  1128K select   0:00   0.00% inetd
  176 root        1  96    0  1408K  1044K select   0:00   0.00% rpcbind
  185 root        1  96    0  1476K  1148K select   0:00   0.00% ypbind
  261 root        1 115    0  1304K   952K select   0:00   0.00% lpd

Setup 2: NFS server 'testido' FreeBSD 6.1-STABLE as of 15. May 2006 with
sys/kern/vfs_lookup.c 1.80.2.6, NFS client schurks FreeBSD 6.1-STABLE as of
15. May 2006.

Same tests as before:

joerg @ schurks> time du >/dev/null
   22.63s real     0.15s user     1.82s system
joerg @ schurks> time du >/dev/null
   16.52s real     0.17s user     1.68s system
joerg @ schurks>

Screenful of top output on testido AFTER both tests (testido responded fine
during both tests):

last pid:   329;  load averages:  0.49,  0.26,  0.10    up 0+00:01:50  18:35:30
29 processes:  1 running, 28 sleeping
CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 8424K Active, 28M Inact, 72M Wired, 110M Buf, 880M Free
Swap: 4000M Total, 4000M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
  201 root        1   4    0  1232K   792K -        0:03   3.76% nfsd
  168 root        1 115    0  2456K  1760K select   0:00   0.00% sshd
  329 joerg       1  96    0  2404K  1676K RUN      0:00   0.00% top
  313 root        1  96    0  1428K  1168K select   0:00   0.00% rlogind
  194 root        1 115    0  1556K  1256K select   0:00   0.00% mountd
  299 root        1   8    0  1720K  1440K wait     0:00   0.00% login
  314 root        1   8    0  1748K  1464K wait     0:00   0.00% login
  298 root        1  96    0  1304K  1048K select   0:00   0.00% rlogind
  199 root        1   4    0  1356K  1040K accept   0:00   0.00% nfsd
  315 joerg       1  20    0  1448K  1020K pause    0:00   0.00% ksh
  256 root        1  96    0  2892K  1760K select   0:00   0.00% ntpd
  300 root        1   5    0  1448K   996K ttyin    0:00   0.00% ksh
  158 root        1  96    0  1332K   940K select   0:00   0.00% syslogd
  163 root        1  96    0  1448K  1128
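To put the two sets of `du` timings side by side, the ratio of wall-clock times can be worked out directly. The snippet below is only arithmetic on the four "real" figures reported above; the dictionary layout and the `slowdown` helper are illustrative names, not anything from the thread.

```python
# Wall-clock ("real") times in seconds for `du`, as reported in the message above.
timings = {
    "first run": {"1.80.2.7": 86.09, "1.80.2.6": 22.63},
    "second run": {"1.80.2.7": 205.10, "1.80.2.6": 16.52},
}


def slowdown(run):
    """Ratio of the time with vfs_lookup.c 1.80.2.7 to the time with 1.80.2.6."""
    return run["1.80.2.7"] / run["1.80.2.6"]


for name, run in timings.items():
    print(f"{name}: {slowdown(run):.1f}x slower with 1.80.2.7")
```

The first run comes out a little under four times slower and the second run a little over twelve times slower, so the regression is worst once the data is already cached on the server.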
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote:
>
> Hello Rong-en,
>
> Thanks for the info on getting the debugger configured, and on the serial
> console. I will have to try and play with the serial console thing more, I
> just tried putting in the flags and the damn thing hung, I had to boot
> from CD and take the stuff back out.
>
> One thing you mention below that concerns me is that you have version 1.90 of
> the vfs_lookup.c file. I just did a less on /usr/src/sys/kern/vfs_lookup.c
> and I see the following:
>
> FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp
>
> I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the
> current stuff before rebuilding the kernel just now, and still I see the
> same thing. Is something fishy going on here, or did you by chance make a
> typo??

Sorry for the confusion. Rev 1.90 is the number for -HEAD. To
back out this MFC'ed change for RELENG_6_1, please cvsup to
RELENG_6_1 date=2006.04.30.03.57.00. Then you should see it is

1.80.2.6 2006/03/31 07:39:24 kris

To verify the effect of this revision, please run RELENG_6_1
with 2006.04.30.03.57.00 and 2006.04.30.04.00.00.

Regards,
Rong-En Fan
RE: Trouble with NFSd under 6.1-Stable, any ideas?
Hello Rong-en, Thanks for the info on getting the debugger configured, and on the serial console. I will have to try and play with the serial console thing more, I just tried putting in the flags and the damn thing hung, I had to boot from CD and take the stuff back out. One thing you mention below that concerns me is that you have version 1.90 of the vfs_lookup.c file. I just did a less on /usr/src/sys/kern/vfs_lookup.c and I see the following: FreeBSD: src/sys/kern/vfs_lookup.c,v 1.80.2.7 2006/04/30 03:57:46 kris Exp I even did a cvsup (I use cvsup2.FreeBSD.org) to make sure I had the current stuff before rebuilding the kernel just now, and still I see the same thing. Is something fishy going on here, or did you by chance make a typo?? --- Howard Leadmon - [EMAIL PROTECTED] http://www.leadmon.net > -Original Message- > From: Rong-en Fan [mailto:[EMAIL PROTECTED] > Sent: Tuesday, May 23, 2006 10:19 AM > To: Howard Leadmon > Cc: Konstantin Belousov; Kris Kennaway; freebsd-stable@freebsd.org > Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas? > > On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote: > > > > > > > > > If there are any thing I can provide to help tracking > this down. > > > > > Please let me know. By the way, I tried with truss/kdump > > > to see what > > > > > happens when nfsd eats lot of CPUs, but in vain. They do > > > not return anything. > > > > > > > > > I tried your recipe on 7-CURRENT with locally exported fs, > > > remounted > > > > over nfs. I did not get the behaviour your described. > > > > > > As noted in my previous thread, I have another 6.1-RELEASE nfs > > > server, which does not have this problem. > > > > > > > Could you, please, provide the backtrace for the nfsd that eats > > > > the CPU (from the ddb). I think it would be helpful to > get several > > > > backtraces (i.e., bt , cont, bt ...) > > > to see where > > > > it running. > > > > > > I'm afraid that I can not do that. 
Last time I tried > breaking into > > > ddb (on 5.x), it hangs my serial console and the server is miles > > > away :-( . Perhaps we can ask Howard to do that? > > > > I am more than willing to do that, as this machine runs > here with me, > > so if needed I can easily get on a console, or perform a > reboot. Can > > one of you shed a little light on exactly what I need to > do, and how > > to do this? I ask as I have never used this ddb stuff, so not clue > > one on how to go about getting the information your looking > to find. > > Guess I have been lucky, and just never had an issue that > took things to this level. > > At least you have to add the following to your kernel: > > options KDB > options DDB > > Recompile it, reboot. You would better to setup a serial > console so you can easily copy thing from ddb output. To do > it, you have to put "device sio" in your kernel configuration > and some files > below: > > /boot.config > -Dh > > /boot/loader.conf > comconsole_speed=115200 > machdep.conspeed=115200 > > /etc/ttys > ttyd0 "/usr/libexec/getty std.115200" cons25 on secure > > On the other machine, /etc/remote: > com1:dv=/dev/cuad0:br#115200:pa=none: > > Then, use "tip com1" to attach the nfs server. The above > settings assume your serial console on nfs server is on COM1 > and on the client side is also COM1. If that's not the case, > please follow Handbook for howto setup a serial console other > than COM1. To break into ddb, either use ctrl+alt+esc or send > a BREAK (I think ^b will do) via serial line. After that, you > should see > > db> > > Then you first use "ps" to find out the nfsd pid (better to > remember the pid which eats lots of cpu before enter ddb). > After that, do what Konstantin suggests. I have never tried > "cont" in db. I guess that will return the execution back to > kernel and you need to break into ddb again to do another "bt ". > > By the way, could you verify that backing out vfs_lookup.c > rev 1.90 helps in your situation? 
If not, maybe we are seeing > different problems, and then I have to figure out how to make > my serial console work here. > > Thanks, > Rong-En Fan > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/23/06, Howard Leadmon <[EMAIL PROTECTED]> wrote:
> > > > If there are any thing I can provide to help tracking this down.
> > > > Please let me know. By the way, I tried with truss/kdump to see what
> > > > happens when nfsd eats lot of CPUs, but in vain. They do not return
> > > > anything.
> > >
> > > I tried your recipe on 7-CURRENT with locally exported fs, remounted
> > > over nfs. I did not get the behaviour you described.
> >
> > As noted in my previous thread, I have another 6.1-RELEASE
> > nfs server, which does not have this problem.
> >
> > > Could you, please, provide the backtrace for the nfsd that eats the
> > > CPU (from the ddb). I think it would be helpful to get several
> > > backtraces (i.e., bt <pid>, cont, bt <pid> ...) to see where
> > > it is running.
> >
> > I'm afraid that I can not do that. Last time I tried breaking
> > into ddb (on 5.x), it hangs my serial console and the server
> > is miles away :-( . Perhaps we can ask Howard to do that?
>
> I am more than willing to do that, as this machine runs here with me, so
> if needed I can easily get on a console, or perform a reboot. Can one of
> you shed a little light on exactly what I need to do, and how to do this?
> I ask as I have never used this ddb stuff, so not clue one on how to go
> about getting the information you're looking to find. Guess I have been
> lucky, and just never had an issue that took things to this level.

At a minimum, you have to add the following to your kernel configuration:

  options KDB
  options DDB

Recompile it and reboot. You had better set up a serial console so that
you can easily copy the ddb output. To do that, put "device sio" in your
kernel configuration and add the following to these files:

/boot.config:
  -Dh

/boot/loader.conf:
  comconsole_speed=115200
  machdep.conspeed=115200

/etc/ttys:
  ttyd0 "/usr/libexec/getty std.115200" cons25 on secure

On the other machine, /etc/remote:
  com1:dv=/dev/cuad0:br#115200:pa=none:

Then use "tip com1" to attach to the nfs server.

The above settings assume that the serial console on the nfs server is on
COM1 and that the client side also uses COM1. If that's not the case,
please follow the Handbook on how to set up a serial console on a port
other than COM1.

To break into ddb, either use ctrl+alt+esc or send a BREAK (I think ^b
will do) via the serial line. After that, you should see

  db>

Then first use "ps" to find the nfsd pid (it is better to note the pid
that eats lots of cpu before entering ddb). After that, do what Konstantin
suggests. I have never tried "cont" in ddb. I guess that it will return
execution to the kernel and that you need to break into ddb again to do
another "bt <pid>".

By the way, could you verify that backing out vfs_lookup.c rev 1.90
helps in your situation? If not, maybe we are seeing different problems,
and then I have to figure out how to make my serial console work here.

Thanks,
Rong-En Fan
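Before rebuilding, it may help to confirm that the kernel configuration really carries the two debugger options named in the message above. The following is only a sketch: the function name has_ddb_options and the throwaway sample file are made up for illustration, and the check does nothing more than look for uncommented "options KDB" and "options DDB" lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): succeed only if the given
# kernel config file contains both "options KDB" and "options DDB"
# as uncommented lines.
has_ddb_options() {
    conf="$1"
    grep -Eq '^[[:space:]]*options[[:space:]]+KDB([[:space:]]|$)' "$conf" &&
    grep -Eq '^[[:space:]]*options[[:space:]]+DDB([[:space:]]|$)' "$conf"
}

# Example run against a throwaway file (not a real kernel config).
sample=$(mktemp)
printf 'device\t\tsio\noptions \tKDB\noptions \tDDB\n' > "$sample"
if has_ddb_options "$sample"; then
    echo "KDB and DDB are enabled"
else
    echo "add options KDB and options DDB, then rebuild"
fi
rm -f "$sample"
```

In practice the argument would be the path to your own kernel configuration file; that path is not given in the thread, so it is left to the reader.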
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/23/06, Konstantin Belousov <[EMAIL PROTECTED]> wrote: On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote: > On 5/14/06, Kris Kennaway <[EMAIL PROTECTED]> wrote: > >On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote: > >> > >>Hello All, > >> > >> I have been running FBSD a long while, and actually running since the > >5.x > >> releases on the server I am having troubles with. I basically have a > >small > >> network and just use NIS/NFS to link my various FBSD and Solaris machines > >> together. > >> > >> This has all been running fine up till a few days ago, when all of a > >sudden > >> NFS came to a crawl, and CPU usage so high the box appears to freeze > >almost. > >> When I had 6.1-RC running all seemed well, then came the announcement > >for the > >> official 6.1 release, so I did the cvs updates, made world, kernel, and > >ran > >> mergemaster to get everything up to the 6.1 stable version. > >> > >> Now after doing this, something is wrong with NFS. It works, it will > >return > >> information and open files, just it's very very slow, and while > >performing a > >> request the CPU spike is astounding. A simple du of my home directory > >can > >> take minutes, and machine all but locks up if the request is done over > >NFS. > >> Here is top snip: > >> > >> PID USERNAME THR PRI NICE SIZERES STATE C TIME WCPU > >COMMAND > >> 497 root 1 40 1252K 780K - 2 50:42 188.48% nfsd > >> > >> > >> This is a nice IBM eServer with dual P4-XEON's and a couple GB or RAM > >on a > >> disk array, and locally is screams, heck NFS used to scream till I > >updated. I > >> am not really sure what info would be useful in debugging, so won't post > >tons > >> of misc junk in this eMail, but if anyone has any ideas as to how best to > >> figure out and resolve this issue it would sure be appreicated... > > > >Use tcpdump and related tools to find out what traffic is being sent. 
> > > >Also verify that you did not change your system configuration in any > >way: there have been no changes to NFS since the release, so it is > >unclear why an update would cause the problem to suddenly occur. > > > >Kris > > Hi Kris and Howard, > > As I posted few days ago, I have similar problems like Howard's > (some details in the thread "6.1-RELEASE, em0 high interrupt rate > and nfsd eats lots of cpu" on stable@). After binary searching > the source tree, I found that > > RELENG_6_1, 2006.04.30.03.57 ok > RELENG_6_1, 2006.04.30.04.00 bad > > The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91. > With 04.30 03.57's source + manaully patched vfs_lookup.c rev 1.90, > the same problem occurs. > > Let me refresh what problems I'm seeing > > 1. a client (no matter Linux 2.6.16 or FreeBSD 6.1) runs du on > a nfs directory > 2. on server-side, nfsd starts to eats lots of CPU > 3. the du finishes > 4. on server-side, nfsd still eats lots of CPU, but there is no > nfs traffic. Wait for 5 minutes, you can still see that nfsd is > "running" and eats lots of CPU. > > On FreeBSD 6.1R client, it uses UDP mount and fstab is like > "rw,-L,nosuid,bg,nodev". On Linux cleint, it uses UDP mount and > fstab is like "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192". > The server's kernel conf is at > > http://www.rafan.org/FreeBSD/nfs/KERNEL > > Some related configuration files: > > /etc/export > /export/dir1 host1 host2... > /export/dir2 host1 host2... > > /etc/rc.conf > nfs_server_enable="YES" > nfs_server_flags="-u -t -n 16" > mountd_enable="YES" > mountd_flags="-r -l -n" > rpc_lockd_enable="YES" > rpc_statd_enable="YES" > rpcbind_enable="YES" > > /etc/fstab: > /dev/... /export/dir1 ufs rw,nosuid,noexec 2 2 > /dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2 > > The NFS server is also using amd to mount some backup directories > from another NFS server. 
the amd.conf is > > [global] > browsable_dirs = yes > map_type = file > mount_type = nfs > auto_dir = /nfs > fully_qualified_hosts = no > log_file = syslog > nfs_proto = udp > nfs_allow_insecure_port = no > nfs_vers = 3 > # plock = yes > selectors_on_default = yes > restart_mounts = yes > > [/backup] > map_options = type:=direct > map_name = /etc/amd.direct > > /etc/amd.direct: > /defaults > opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192 > backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host} > > > If there are any thing I can provide to help tracking this down. Please > let me know. By the way, I tried with truss/kdump to see what happens > when nfsd eats lot of CPUs, but in vain. They do not return anything. > I tried your recipe on 7-CURRENT with locally exported fs, remounted over nfs. I did not get the behaviour your described. As noted in my previous thread, I have another 6.1-RELEASE nfs server, which does not have this problem. Could you, please, provide the backtrace for the nfsd that eats the CPU (from the ddb). I think it would be helpful to get seve
RE: Trouble with NFSd under 6.1-Stable, any ideas?
> > > If there are any thing I can provide to help tracking this down.
> > > Please let me know. By the way, I tried with truss/kdump to see what
> > > happens when nfsd eats lot of CPUs, but in vain. They do not return
> > > anything.
> >
> > I tried your recipe on 7-CURRENT with locally exported fs, remounted
> > over nfs. I did not get the behaviour you described.
>
> As noted in my previous thread, I have another 6.1-RELEASE
> nfs server, which does not have this problem.
>
> > Could you, please, provide the backtrace for the nfsd that eats the
> > CPU (from the ddb). I think it would be helpful to get several
> > backtraces (i.e., bt <pid>, cont, bt <pid> ...) to see where
> > it is running.
>
> I'm afraid that I can not do that. Last time I tried breaking
> into ddb (on 5.x), it hangs my serial console and the server
> is miles away :-( . Perhaps we can ask Howard to do that?

I am more than willing to do that, as this machine runs here with me, so
if needed I can easily get on a console, or perform a reboot. Can one of
you shed a little light on exactly what I need to do, and how to do this?
I ask as I have never used this ddb stuff, so not clue one on how to go
about getting the information you're looking to find. Guess I have been
lucky, and just never had an issue that took things to this level.

> > Also, just in case, does filesystem that is exported and shows
> > problem, have quotas enabled ?
>
> One line of your fstab has userquotas, other does not.

As to userquotas, I just tried accessing the NFS mounts here, as some
have filesystems with quotas, and some don't, and both are exhibiting the
exact same problem. So using quotas is for sure not the problem, or
should I say not the trigger to the problem.

> Regards,
> Rong-En Fan

---
Howard Leadmon
http://www.leadmon.net
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote: > On 5/14/06, Kris Kennaway <[EMAIL PROTECTED]> wrote: > >On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote: > >> > >>Hello All, > >> > >> I have been running FBSD a long while, and actually running since the > >5.x > >> releases on the server I am having troubles with. I basically have a > >small > >> network and just use NIS/NFS to link my various FBSD and Solaris machines > >> together. > >> > >> This has all been running fine up till a few days ago, when all of a > >sudden > >> NFS came to a crawl, and CPU usage so high the box appears to freeze > >almost. > >> When I had 6.1-RC running all seemed well, then came the announcement > >for the > >> official 6.1 release, so I did the cvs updates, made world, kernel, and > >ran > >> mergemaster to get everything up to the 6.1 stable version. > >> > >> Now after doing this, something is wrong with NFS. It works, it will > >return > >> information and open files, just it's very very slow, and while > >performing a > >> request the CPU spike is astounding. A simple du of my home directory > >can > >> take minutes, and machine all but locks up if the request is done over > >NFS. > >> Here is top snip: > >> > >> PID USERNAME THR PRI NICE SIZERES STATE C TIME WCPU > >COMMAND > >> 497 root 1 40 1252K 780K - 2 50:42 188.48% nfsd > >> > >> > >> This is a nice IBM eServer with dual P4-XEON's and a couple GB or RAM > >on a > >> disk array, and locally is screams, heck NFS used to scream till I > >updated. I > >> am not really sure what info would be useful in debugging, so won't post > >tons > >> of misc junk in this eMail, but if anyone has any ideas as to how best to > >> figure out and resolve this issue it would sure be appreicated... > > > >Use tcpdump and related tools to find out what traffic is being sent. 
> > > >Also verify that you did not change your system configuration in any > >way: there have been no changes to NFS since the release, so it is > >unclear why an update would cause the problem to suddenly occur. > > > >Kris > > Hi Kris and Howard, > > As I posted few days ago, I have similar problems like Howard's > (some details in the thread "6.1-RELEASE, em0 high interrupt rate > and nfsd eats lots of cpu" on stable@). After binary searching > the source tree, I found that > > RELENG_6_1, 2006.04.30.03.57 ok > RELENG_6_1, 2006.04.30.04.00 bad > > The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91. > With 04.30 03.57's source + manaully patched vfs_lookup.c rev 1.90, > the same problem occurs. > > Let me refresh what problems I'm seeing > > 1. a client (no matter Linux 2.6.16 or FreeBSD 6.1) runs du on > a nfs directory > 2. on server-side, nfsd starts to eats lots of CPU > 3. the du finishes > 4. on server-side, nfsd still eats lots of CPU, but there is no > nfs traffic. Wait for 5 minutes, you can still see that nfsd is > "running" and eats lots of CPU. > > On FreeBSD 6.1R client, it uses UDP mount and fstab is like > "rw,-L,nosuid,bg,nodev". On Linux cleint, it uses UDP mount and > fstab is like "defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192". > The server's kernel conf is at > > http://www.rafan.org/FreeBSD/nfs/KERNEL > > Some related configuration files: > > /etc/export > /export/dir1 host1 host2... > /export/dir2 host1 host2... > > /etc/rc.conf > nfs_server_enable="YES" > nfs_server_flags="-u -t -n 16" > mountd_enable="YES" > mountd_flags="-r -l -n" > rpc_lockd_enable="YES" > rpc_statd_enable="YES" > rpcbind_enable="YES" > > /etc/fstab: > /dev/... /export/dir1 ufs rw,nosuid,noexec 2 2 > /dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2 > > The NFS server is also using amd to mount some backup directories > from another NFS server. 
the amd.conf is > > [global] > browsable_dirs = yes > map_type = file > mount_type = nfs > auto_dir = /nfs > fully_qualified_hosts = no > log_file = syslog > nfs_proto = udp > nfs_allow_insecure_port = no > nfs_vers = 3 > # plock = yes > selectors_on_default = yes > restart_mounts = yes > > [/backup] > map_options = type:=direct > map_name = /etc/amd.direct > > /etc/amd.direct: > /defaults > opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192 > backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host} > > > If there are any thing I can provide to help tracking this down. Please > let me know. By the way, I tried with truss/kdump to see what happens > when nfsd eats lot of CPUs, but in vain. They do not return anything. > I tried your recipe on 7-CURRENT with locally exported fs, remounted over nfs. I did not get the behaviour your described. Could you, please, provide the backtrace for the nfsd that eats the CPU (from the ddb). I think it would be helpful to get several backtraces (i.e., bt , cont, bt ...) to see where it running. Also, just in case, does filesystem that is exported and shows probl
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On Mon, May 22, 2006 at 05:43:32PM -0400, Rong-en Fan wrote:

> As I posted few days ago, I have similar problems like Howard's
> (some details in the thread "6.1-RELEASE, em0 high interrupt rate
> and nfsd eats lots of cpu" on stable@). After binary searching
> the source tree, I found that
>
> RELENG_6_1, 2006.04.30.03.57 ok
> RELENG_6_1, 2006.04.30.04.00 bad
>
> The only commit is kern/vfs_lookup.c, an MFC of rev 1.90 and 1.91.
> With 04.30 03.57's source + manually patched vfs_lookup.c rev 1.90,
> the same problem occurs.

Thanks for tracking this down, I'll see what Jeff has to say.

Kris
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On 5/14/06, Kris Kennaway <[EMAIL PROTECTED]> wrote:
> On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
> >
> > Hello All,
> >
> > I have been running FBSD a long while, and actually running since the 5.x
> > releases on the server I am having troubles with. I basically have a small
> > network and just use NIS/NFS to link my various FBSD and Solaris machines
> > together.
> >
> > This has all been running fine up till a few days ago, when all of a sudden
> > NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> > When I had 6.1-RC running all seemed well, then came the announcement for the
> > official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> > mergemaster to get everything up to the 6.1 stable version.
> >
> > Now after doing this, something is wrong with NFS. It works, it will return
> > information and open files, just it's very very slow, and while performing a
> > request the CPU spike is astounding. A simple du of my home directory can
> > take minutes, and machine all but locks up if the request is done over NFS.
> > Here is top snip:
> >
> >   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
> >   497 root        1   4    0  1252K   780K -      2  50:42 188.48% nfsd
> >
> > This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> > disk array, and locally it screams, heck NFS used to scream till I updated. I
> > am not really sure what info would be useful in debugging, so won't post tons
> > of misc junk in this eMail, but if anyone has any ideas as to how best to
> > figure out and resolve this issue it would sure be appreciated...
>
> Use tcpdump and related tools to find out what traffic is being sent.
>
> Also verify that you did not change your system configuration in any
> way: there have been no changes to NFS since the release, so it is
> unclear why an update would cause the problem to suddenly occur.
>
> Kris

Hi Kris and Howard,

As I posted a few days ago, I have problems similar to Howard's
(some details are in the thread "6.1-RELEASE, em0 high interrupt rate
and nfsd eats lots of cpu" on stable@). After binary searching
the source tree, I found that

  RELENG_6_1, 2006.04.30.03.57 ok
  RELENG_6_1, 2006.04.30.04.00 bad

The only commit in between is to kern/vfs_lookup.c, an MFC of rev 1.90
and 1.91. With the 04.30 03.57 source + a manually patched vfs_lookup.c
rev 1.90, the same problem occurs.

Let me restate the problems I'm seeing:

1. a client (no matter whether Linux 2.6.16 or FreeBSD 6.1) runs du on
   an nfs directory
2. on the server side, nfsd starts to eat lots of CPU
3. the du finishes
4. on the server side, nfsd still eats lots of CPU, but there is no
   nfs traffic. Wait for 5 minutes, and you can still see that nfsd is
   "running" and eating lots of CPU.

The FreeBSD 6.1R client uses a UDP mount and its fstab options are like
"rw,-L,nosuid,bg,nodev". The Linux client uses a UDP mount and its
fstab options are like
"defaults,udp,hard,intr,nfsvers=3,rsize=8192,wsize=8192".
The server's kernel conf is at

  http://www.rafan.org/FreeBSD/nfs/KERNEL

Some related configuration files:

/etc/exports:
  /export/dir1 host1 host2...
  /export/dir2 host1 host2...

/etc/rc.conf:
  nfs_server_enable="YES"
  nfs_server_flags="-u -t -n 16"
  mountd_enable="YES"
  mountd_flags="-r -l -n"
  rpc_lockd_enable="YES"
  rpc_statd_enable="YES"
  rpcbind_enable="YES"

/etc/fstab:
  /dev/... /export/dir1 ufs rw,nosuid,noexec 2 2
  /dev/... /export/dir2 ufs rw,nosuid,noexec,userquota 2 2

The NFS server is also using amd to mount some backup directories
from another NFS server. The amd.conf is

  [global]
  browsable_dirs = yes
  map_type = file
  mount_type = nfs
  auto_dir = /nfs
  fully_qualified_hosts = no
  log_file = syslog
  nfs_proto = udp
  nfs_allow_insecure_port = no
  nfs_vers = 3
  # plock = yes
  selectors_on_default = yes
  restart_mounts = yes

  [/backup]
  map_options = type:=direct
  map_name = /etc/amd.direct

/etc/amd.direct:
  /defaults
      opts:=rw,grpid,resvport,vers=3,proto=udp,nosuid,nodev,rsize=8192,wsize=8192
  backup type:=nfs;rhost:=nfs2;rfs:=/nfs2/${host}

If there is anything I can provide to help track this down, please
let me know. By the way, I tried truss/kdump to see what happens
when nfsd eats lots of CPU, but in vain. They do not return anything.

Regards,
Rong-En Fan
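The binary search over checkout times described in this message can be sketched as follows. This is an illustration only, not the procedure actually used: the timestamps are plain integers, FIRST_BAD is a made-up value, and is_bad is a stand-in for the manual work of checking out the tree at a given time, building, booting and running du over NFS.

```shell
#!/bin/sh
# Sketch of a binary search over checkout timestamps (plain integers here).
FIRST_BAD=1000   # made-up value, used only by the stand-in predicate below

# Stand-in predicate: in real use this step is manual (check out the
# source at time $1, rebuild, reboot, see whether nfsd spins).
is_bad() {
    [ "$1" -ge "$FIRST_BAD" ]
}

# bisect GOOD BAD: print the earliest timestamp that is bad, assuming
# GOOD is known good, BAD is known bad, and everything after the first
# bad timestamp is also bad.
bisect() {
    good=$1
    bad=$2
    while [ $((bad - good)) -gt 1 ]; do
        mid=$((good + (bad - good) / 2))
        if is_bad "$mid"; then
            bad=$mid
        else
            good=$mid
        fi
    done
    echo "$bad"
}

bisect 0 4096   # prints 1000
```

Each round halves the interval, so even a window of several weeks of commits needs only a handful of rebuilds to narrow down to a pair like the 03.57 / 04.00 timestamps quoted above.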
RE: Trouble with NFSd under 6.1-Stable, any ideas?
Sorry for the delay, ended up sick.. :(

You say use tcpdump, is there something I should be looking out for? NFS
is serving files; even more strange, if I kill off the nfsd process it's
zippy fast for a moment, and then the CPU load goes through the roof and
it starts serving files slowly. So it's actually working, apart from the
fact that it consumes all available CPU and brings the machine to its
knees quickly.

It doesn't matter if I access it from my Solaris box, my other FBSD
boxes, and so on: it still bogs down terribly, and it never used to.
If anyone can think of anything config-wise that might cause that, it
would be nice to know. These are the settings I can think of that affect
the NFS config:

#
# NFS
#
nfs_client_enable="NO"          # This host is an NFS client (or NO).
nfs_access_cache="2"            # Client cache timeout in seconds
nfs_server_enable="YES"         # This host is an NFS server (or NO).
nfs_server_flags="-u -t -n 5"   # Flags to nfsd (if enabled).
mountd_enable="YES"             # Run mountd (or NO).
mountd_flags="-r"               # Flags to mountd (if NFS server enabled).
weak_mountd_authentication="NO" # Allow non-root mount requests to be served.
nfs_reserved_port_only="YES"    # Provide NFS only on secure port (or NO).
nfs_bufpackets=""               # bufspace (in packets) for client
rpc_lockd_enable="YES"          # Run NFS rpc.lockd needed for client/server.
rpc_statd_enable="YES"          # Run NFS rpc.statd needed for client/server.
rpcbind_enable="YES"            # Run the portmapper service (YES/NO).
rpcbind_program="/usr/sbin/rpcbind" # path to rpcbind, if you want a differe
rpcbind_flags=""                # Flags to rpcbind (if enabled).

I can't think of anything that should have changed, unless mergemaster
updating the default files changed something that would have an effect.
--- Howard Leadmon http://www.leadmon.net > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Kris Kennaway > Sent: Sunday, May 14, 2006 10:50 PM > To: Howard Leadmon > Cc: freebsd-stable@freebsd.org > Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas? > > On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote: > > > >Hello All, > > > > I have been running FBSD a long while, and actually > running since the 5.x > > releases on the server I am having troubles with. I > basically have a small > > network and just use NIS/NFS to link my various FBSD and Solaris > > machines together. > > > > This has all been running fine up till a few days ago, > when all of a > > sudden NFS came to a crawl, and CPU usage so high the box > appears to freeze almost. > > When I had 6.1-RC running all seemed well, then came the > announcement > > for the official 6.1 release, so I did the cvs updates, made world, > > kernel, and ran mergemaster to get everything up to the 6.1 > stable version. > > > > Now after doing this, something is wrong with NFS. It > works, it will return > > information and open files, just it's very very slow, and while > > performing a request the CPU spike is astounding. A simple > du of my > > home directory can take minutes, and machine all but locks > up if the request is done over NFS. > > Here is top snip: > > > > PID USERNAME THR PRI NICE SIZERES STATE C TIME > WCPU COMMAND > > 497 root 1 40 1252K 780K - 2 50:42 > 188.48% nfsd > > > > > > This is a nice IBM eServer with dual P4-XEON's and a > couple GB or RAM > > on a disk array, and locally is screams, heck NFS used to > scream till > > I updated. I am not really sure what info would be useful in > > debugging, so won't post tons of misc junk in this eMail, but if > > anyone has any ideas as to how best to figure out and > resolve this issue it would sure be appreicated... > > Use tcpdump and related tools to find out what traffic is being sent. 
> > Also verify that you did not change your system configuration in any > way: there have been no changes to NFS since the release, so > it is unclear why an update would cause the problem to suddenly occur. > > Kris > > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Trouble with NFSd under 6.1-Stable, any ideas?
Howard Leadmon wrote:
> Would this just be lockd, or should I disable both lockd and statd? I
> notice in the rc.conf it claims they are both supposed to be enabled, so
> not sure what issues I run into if I disable them, if any.

No need to disable rpc.statd, though I don't know if any other programs
request monitoring.

The only issue you'll run into is the lack of any locks on the mounted
drive. This can easily lead to file corruption if multiple programs, or
multiple instances of a single program, change the same file at the same
time. Many programs will use NFS-safe lockfiles if configured to do so,
which is often a usable workaround in a lockd-free world.

If you're only using the files read-only, or only a single process uses
files on the mount at a time (on *all* systems), then there's no issue
at all.
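One common way to build an NFS-safe lockfile without rpc.lockd is the hard-link trick: write a uniquely named temporary file, then try to link it to the lock name, which fails if the lock already exists. The sketch below is an illustration of that general technique, not what any particular program mentioned in this thread does; acquire_lock and release_lock are names made up here.

```shell
#!/bin/sh
# acquire_lock LOCKFILE: try once to take the lock; return 0 on success.
# Relies on ln(1) without -f refusing to replace an existing file.
acquire_lock() {
    lock="$1"
    tmp="$lock.$(uname -n).$$"      # unique per host and process
    echo "$$" > "$tmp" || return 1
    if ln "$tmp" "$lock" 2>/dev/null; then
        rm -f "$tmp"
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# release_lock LOCKFILE: drop the lock.
release_lock() {
    rm -f "$1"
}

# Demonstration in a scratch directory (a local one here, not an NFS mount).
lockdir=$(mktemp -d)
acquire_lock "$lockdir/demo.lock" && echo "got lock"
acquire_lock "$lockdir/demo.lock" || echo "already locked"
release_lock "$lockdir/demo.lock"
rm -rf "$lockdir"
```

A more careful version would also compare the link count of the temporary file after a failed ln, since over NFS the reply to a successful link request can be lost; that refinement is left out to keep the sketch short.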
RE: Trouble with NFSd under 6.1-Stable, any ideas?
Would this just be lockd, or should I disable both lockd and statd? I
notice in the rc.conf it claims they are both supposed to be enabled, so
not sure what issues I run into if I disable them, if any.

As to my servers, I have a bunch of stuff running actually. The Dual
XEON server, being my fastest machine by far, is my main server. Off of
that I have a SPARC running Solaris 10, another SPARC running FreeBSD
6.1, a DEC Alpha running FreeBSD 4.11, another old Dual XEON machine
running FreeBSD 4.11, and finally another x86 machine running FreeBSD 7
current.

This stuff has been doing great. I personally love FBSD, as you can tell
from what is loaded on most of the servers. It just blows my mind that
after years of running the network like this, now something breaks. I
get no errors in syslog, and it IS serving requests, as I have some web
pages on the old 4.11 machine and they come up, just slowly compared to
what they used to do.

Oh well, as they say, in the world of computers it's never easy.. LOL

---
Howard Leadmon - [EMAIL PROTECTED]
http://www.leadmon.net

> -----Original Message-----
> From: Stephen Hurd [mailto:[EMAIL PROTECTED]
> Sent: Sunday, May 14, 2006 5:54 PM
> To: Howard Leadmon
> Cc: freebsd-stable@freebsd.org
> Subject: Re: Trouble with NFSd under 6.1-Stable, any ideas?
>
> Howard Leadmon wrote:
> > Hello All,
> >
> > I have been running FBSD a long while, and actually running since the
> > 5.x releases on the server I am having troubles with. I basically have
> > a small network and just use NIS/NFS to link my various FBSD and
> > Solaris machines together.
> >
> > This has all been running fine up till a few days ago, when all of a
> > sudden NFS came to a crawl, and CPU usage so high the box appears to
> > freeze almost.
> > When I had 6.1-RC running all seemed well, then came the > announcement > > for the official 6.1 release, so I did the cvs updates, made world, > > kernel, and ran mergemaster to get everything up to the 6.1 > stable version. > > > > Now after doing this, something is wrong with NFS. It > works, it will return > > information and open files, just it's very very slow, and while > > performing a request the CPU spike is astounding. A simple > du of my > > home directory can take minutes, and machine all but locks > up if the request is done over NFS. > > Here is top snip: > > > > PID USERNAME THR PRI NICE SIZERES STATE C TIME > WCPU COMMAND > > 497 root 1 40 1252K 780K - 2 50:42 > 188.48% nfsd > > > > > > This is a nice IBM eServer with dual P4-XEON's and a > couple GB or RAM > > on a disk array, and locally is screams, heck NFS used to > scream till > > I updated. I am not really sure what info would be useful in > > debugging, so won't post tons of misc junk in this eMail, but if > > anyone has any ideas as to how best to figure out and > resolve this issue it would sure be appreicated... > > > Are you running rpc.lockd? I've had very bad luck with it > since sometime in the 5.x series... especially with it > interoperating with Solaris. I submitted a PR on it, but > it's apparently broken in about X ways. If possible, I would > suggest living without rpc.lockd for now (if you're currently > living with it that is) > > Other than that issue, NFS itself has been working nicely for me. > ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: Trouble with NFSd under 6.1-Stable, any ideas?
On Sun, May 14, 2006 at 02:28:55PM -0400, Howard Leadmon wrote:
>
> Hello All,
>
> I have been running FBSD a long while, and actually running since the 5.x
> releases on the server I am having troubles with. I basically have a small
> network and just use NIS/NFS to link my various FBSD and Solaris machines
> together.
>
> This has all been running fine up till a few days ago, when all of a sudden
> NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> When I had 6.1-RC running all seemed well, then came the announcement for the
> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> mergemaster to get everything up to the 6.1 stable version.
>
> Now after doing this, something is wrong with NFS. It works, it will return
> information and open files, just it's very very slow, and while performing a
> request the CPU spike is astounding. A simple du of my home directory can
> take minutes, and machine all but locks up if the request is done over NFS.
> Here is top snip:
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
>   497 root        1   4    0  1252K   780K -      2  50:42 188.48% nfsd
>
> This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> disk array, and locally it screams, heck NFS used to scream till I updated. I
> am not really sure what info would be useful in debugging, so won't post tons
> of misc junk in this eMail, but if anyone has any ideas as to how best to
> figure out and resolve this issue it would sure be appreciated...

Use tcpdump and related tools to find out what traffic is being sent.

Also verify that you did not change your system configuration in any
way: there have been no changes to NFS since the release, so it is
unclear why an update would cause the problem to suddenly occur.

Kris
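As a starting point for the tcpdump suggestion, a capture limited to NFS traffic might look like the sketch below. The helper name nfs_capture_cmd, the interface name em0 and the output file nfs.pcap are assumptions made for illustration (none of them come from Howard's setup); 2049 is the standard NFS port. The function only prints the command line so that it can be reviewed before being run as root.

```shell
#!/bin/sh
# nfs_capture_cmd IFACE [OUTFILE]: print a tcpdump command line that
# records full packets to or from the NFS port on the given interface.
#   -n    do not resolve addresses to names
#   -s 0  capture whole packets rather than just the headers
#   -w    write raw packets to a file for later analysis
nfs_capture_cmd() {
    iface="$1"
    out="${2:-nfs.pcap}"
    echo "tcpdump -n -i $iface -s 0 -w $out port 2049"
}

# em0 is only an example interface name.
nfs_capture_cmd em0
```

Note that with NFS over UDP and large read/write sizes, requests are split into IP fragments, and a "port 2049" filter only matches the first fragment of each datagram; filtering on the client's address instead (for example "host" followed by its IP) would capture the rest as well.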
Re: Trouble with NFSd under 6.1-Stable, any ideas?
> Are you running rpc.lockd? I've had very bad luck with it since
> sometime in the 5.x series... especially with it interoperating with
> Solaris. I submitted a PR on it, but it's apparently broken in about X
> ways. If possible, I would suggest living without rpc.lockd for now (if
> you're currently living with it that is)

On the contrary NFS problems interoperating with Linux have been cleared
for me since upgrading Linux to Fedora Core 5 and FreeBSD to 6.1. In
particular rpc.lockd works, everything is OK, performance is fine. I had
very bad problems in the past, when we were running Fedora Core 3.

--
Michel TALON
Re: Trouble with NFSd under 6.1-Stable, any ideas?
Howard Leadmon wrote:
> Hello All,
>
> I have been running FBSD a long while, and actually running since the 5.x
> releases on the server I am having troubles with. I basically have a small
> network and just use NIS/NFS to link my various FBSD and Solaris machines
> together.
>
> This has all been running fine up till a few days ago, when all of a sudden
> NFS came to a crawl, and CPU usage so high the box appears to freeze almost.
> When I had 6.1-RC running all seemed well, then came the announcement for the
> official 6.1 release, so I did the cvs updates, made world, kernel, and ran
> mergemaster to get everything up to the 6.1 stable version.
>
> Now after doing this, something is wrong with NFS. It works, it will return
> information and open files, just it's very very slow, and while performing a
> request the CPU spike is astounding. A simple du of my home directory can
> take minutes, and machine all but locks up if the request is done over NFS.
> Here is top snip:
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    WCPU COMMAND
>   497 root        1   4    0  1252K   780K -      2  50:42 188.48% nfsd
>
> This is a nice IBM eServer with dual P4-XEON's and a couple GB of RAM on a
> disk array, and locally it screams, heck NFS used to scream till I updated. I
> am not really sure what info would be useful in debugging, so won't post tons
> of misc junk in this eMail, but if anyone has any ideas as to how best to
> figure out and resolve this issue it would sure be appreciated...

Are you running rpc.lockd? I've had very bad luck with it since
sometime in the 5.x series... especially with it interoperating with
Solaris. I submitted a PR on it, but it's apparently broken in about X
ways. If possible, I would suggest living without rpc.lockd for now (if
you're currently living with it, that is).

Other than that issue, NFS itself has been working nicely for me.