Re: rpc.lockd stalls
On Sep 7, 2006, at 10:34 AM, Tom Ierna wrote:
> For the purposes of ease of software and hardware management, I'm attempting to run a set of PXE-booted Client machines as web/db or mail servers.

It is perhaps reasonable to run a diskless webserver, especially if it is serving mainly dynamically generated content. Trying to run a database server or mail server without a disk strikes me as a very bad idea.

I am surprised that rpc.lockd is holding up well enough to only go down about once a month; simply running the locking tests which come with sendmail used to be enough to cause rpc.lockd to crash...

Best of luck,
--
-Chuck
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: rpc.lockd stalls
On Thu, Sep 07, 2006 at 01:34:08PM -0400, Tom Ierna wrote:
> Hello, list.
>
> For the purposes of ease of software and hardware management, I'm attempting to run a set of PXE-booted Client machines as web/db or mail servers. The NFS/DHCP/YP servers are running on a 5.4-STABLE Server. I mostly followed the PXE guide when building these systems. All of the disk (except for swap) sits on the master Server (which has a bunch of external drive sleds), and all of the Client machines boot via Gig-E.
>
> Client machines are running 5.4-STABLE as well, but it is not compiled with the same kernel configuration as the master Server, as the hardware is slightly different. Client machines share userland with the Server.
>
> At the moment I have one Client machine running about 40 domains of web and db, with reasonably low traffic (less than 3Mbit/sec total) and one Client machine booted from the master Server, but not doing anything. Resource utilization on the master Server seems pretty low.
>
> Sporadically, there appear to be stalls on some locks with rpc.lockd.

rpc.lockd is unreliable in all versions of FreeBSD (although it may be worse in 5.x), see the mailing list archives for extensive discussion of this. Try turning it off and using mount_nfs -L instead to fake the lock traffic (See the manpage).

Kris
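For readers following along, "turning it off" is normally done in /etc/rc.conf. The fragment below is a minimal sketch, assuming the stock rc.d scripts honor the rpc_lockd_enable variable described in rc.conf(5); it is not taken from the thread, so check the manpage on your release before relying on it.

```sh
# /etc/rc.conf fragment (assumption: stock rc.d scripts, see rc.conf(5)).
# Do not start rpc.lockd at boot.
rpc_lockd_enable="NO"
```

With the daemon off, clients would then mount with -L as Kris suggests, so that lock requests are handled locally instead of being forwarded to the server. Locks taken this way are not visible to other clients of the same export.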
Re: rpc.lockd stalls
On Sep 7, 2006, at 1:40 PM, Kris Kennaway wrote:
> On Thu, Sep 07, 2006 at 01:34:08PM -0400, Tom Ierna wrote:
>> Sporadically, there appear to be stalls on some locks with rpc.lockd.
>
> rpc.lockd is unreliable in all versions of FreeBSD (although it may be worse in 5.x), see the mailing list archives for extensive discussion of this. Try turning it off and using mount_nfs -L instead to fake the lock traffic (See the manpage).

Kris,

Is there a way to note -L via fstab? Since these machines are PXE booted, unmounting and re-mounting with -L will be problematic, and I'd like them to inherit this property at reboot.

Thanks,
-Tom
--
Tom Ierna
President
Shockergroup, Inc.
Re: rpc.lockd stalls
On Thu, Sep 07, 2006 at 02:12:26PM -0400, Tom Ierna wrote:
> On Sep 7, 2006, at 1:40 PM, Kris Kennaway wrote:
>> On Thu, Sep 07, 2006 at 01:34:08PM -0400, Tom Ierna wrote:
>>> Sporadically, there appear to be stalls on some locks with rpc.lockd.
>>
>> rpc.lockd is unreliable in all versions of FreeBSD (although it may be worse in 5.x), see the mailing list archives for extensive discussion of this. Try turning it off and using mount_nfs -L instead to fake the lock traffic (See the manpage).
>
> Kris,
>
> Is there a way to note -L via fstab? Since these machines are PXE booted, unmounting and re-mounting with -L will be problematic, and I'd like them to inherit this property at reboot.

Yes, use the -o format, see the manpage.

Kris
rpc.lockd stalls
Hello, list.

For the purposes of ease of software and hardware management, I'm attempting to run a set of PXE-booted Client machines as web/db or mail servers. The NFS/DHCP/YP servers are running on a 5.4-STABLE Server. I mostly followed the PXE guide when building these systems. All of the disk (except for swap) sits on the master Server (which has a bunch of external drive sleds), and all of the Client machines boot via Gig-E.

Client machines are running 5.4-STABLE as well, but it is not compiled with the same kernel configuration as the master Server, as the hardware is slightly different. Client machines share userland with the Server.

At the moment I have one Client machine running about 40 domains of web and db, with reasonably low traffic (less than 3Mbit/sec total) and one Client machine booted from the master Server, but not doing anything. Resource utilization on the master Server seems pretty low.

Sporadically, there appear to be stalls on some locks with rpc.lockd. These lock stalls exhibit interesting behavior on the Client machines:

- Slots will fill up on Apache in the W state.
- SSH login attempts to the client machine (passwd files get some user data via YP) will hang and time out.
- When I find a file (via Apache's extended status) which appears to be one of the stalled locks, and I attempt to do anything with the file via a shell on the client machine, such as cat it, that shell will become unresponsive.
- Any process which is stalled on one of these files cannot be killed.

On the server, the only symptom I've witnessed is that rpc.lockd starts using a bit more CPU than it usually does. Normal utilization is 0.0, and when the problem is happening, it might go up to 3.0 or so. Running cat on the Server against a file which appears stalled on the Client works fine.

A stop and start of nfslocking on the server seems to clear things up. Apache on the client will recover on its own, I'm guessing after each stalled lock reaches a timeout. I usually gracefully restart Apache, which forces the recovery to happen faster.

As far as timing, it doesn't appear to be consistently periodic. It doesn't appear to be load related - I suffered through a Digg of one of the sites, and while the client machine served more bandwidth in that couple of days than it had in a month, this particular problem did not occur. Over the past three months or so, this issue has probably cropped up three or four times.

What can I do to troubleshoot this? I would like to add more client machines, but I can't until this problem is resolved. Changing OS builds at this point, unless absolutely necessary, is not something I want to do.

Thanks for any insight!

--
Tom Ierna
President
Shockergroup, Inc.
Re: rpc.lockd stalls
On Sep 7, 2006, at 1:44 PM, Chuck Swiger wrote:
> Trying to run a database server or mail server without a disk strikes me as a very bad idea.

This is unfortunate - the client machines I have chosen have no front-panel disk sleds. Hardware administration will be a bear if they each have to have their own disks. Software-wise, I was hoping to have them all share a common Kernel and userland too, so I only have to update software in one place.

> I am surprised that rpc.lockd is holding up well enough to only go down about once a month; simply running the locking tests which come with sendmail used to be enough to cause rpc.lockd to crash...

I will be using qmail, when I get to that stage. qmail is supposed to be rather safe, even over NFS.

> Best of luck,
> --
> -Chuck

Thanks, it sounds like you think I need it :) I'm open to suggestions on a better method of accomplishing my goals.

Best,
-Tom
--
Tom Ierna
President
Shockergroup, Inc.
Re: rpc.lockd stalls
On Sep 7, 2006, at 11:16 AM, Tom Ierna wrote:
> On Sep 7, 2006, at 1:44 PM, Chuck Swiger wrote:
>> Trying to run a database server or mail server without a disk strikes me as a very bad idea.
>
> This is unfortunate - the client machines I have chosen have no front-panel disk sleds. Hardware administration will be a bear if they each have to have their own disks. Software-wise, I was hoping to have them all share a common Kernel and userland too, so I only have to update software in one place.

I can see your reasoning; however, it's not especially difficult to keep many FreeBSD systems updated against a single machine configured to build out new versions of the kernel, userland, and installed ports when needed. [1]

The thing is, software like mail servers and databases is usually I/O-bound, not CPU-bound; when you get under enough load to matter, usually what you need to do is add more disk spindles and spread DB tables or logfiles or mailspool/queuedir locations amongst the extra disks.

>> I am surprised that rpc.lockd is holding up well enough to only go down about once a month; simply running the locking tests which come with sendmail used to be enough to cause rpc.lockd to crash...
>
> I will be using qmail, when I get to that stage. qmail is supposed to be rather safe, even over NFS.

Yes, agreed: qmail + maildir rather than mbox format is probably your best bet for doing operations over NFS.

>> Best of luck,
>> --
>> -Chuck
>
> Thanks, it sounds like you think I need it :)

Well, yes. But I wouldn't be unhappy if you found something that works for your needs, even if it isn't what I would recommend myself. At least some of the time, I even learn things from people who configure things strangely from my perspective...

> I'm open to suggestions on a better method of accomplishing my goals.
[1]: Mount /usr/src and /usr/obj from the buildserver on each machine, do the update process, and then rsync over or mount /usr/ports/packages, and use portupgrade or whatever to update or install from the precompiled packages.

--
-Chuck
Re: rpc.lockd stalls
On Thu, Sep 07, 2006 at 03:19:51PM -0400, Tom Ierna wrote:
> On Sep 7, 2006, at 2:39 PM, Kris Kennaway wrote:
>> On Thu, Sep 07, 2006 at 02:12:26PM -0400, Tom Ierna wrote:
>>> Is there a way to note -L via fstab? Since these machines are PXE booted, unmounting and re-mounting with -L will be problematic, and I'd like them to inherit this property at reboot.
>>
>> Yes, use the -o format, see the manpage.
>
> Under the man page for mount_nfs, I have the following:
>
>   -o  Options are specified with a -o flag followed by a comma separated string of options. See the mount(8) man page for possible options and their meanings. The following NFS specific options are also available:
>   ...
>   Historic -o Options
>   ...
>   lockd  Same as not specifying -L.
>   ...
>
> It doesn't have any other reference to -L. Are mounts specified in fstab automatically non-locking, or is the man page incorrect?

Prefixing with 'no' negates an option.

Kris
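Putting Kris's hint together with the man page excerpt: lockd is listed as the -o spelling of "not -L", so the negated form, nolockd, should correspond to -L. A hedged sketch of what the /etc/fstab entry might look like follows; the server name and both paths are placeholders rather than details from the thread, and the option should be verified against mount_nfs(8) on the release in use.

```
# Device              Mountpoint      FStype  Options      Dump  Pass#
# "server" and both paths are hypothetical placeholders.
server:/export/www    /usr/local/www  nfs     rw,nolockd   0     0
```

The same option can be tried by hand before committing it to fstab, for example mount_nfs -o nolockd server:/export/www /usr/local/www (again with placeholder names).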
Re: rpc.lockd stalls
On Sep 7, 2006, at 2:39 PM, Kris Kennaway wrote:
> On Thu, Sep 07, 2006 at 02:12:26PM -0400, Tom Ierna wrote:
>> Is there a way to note -L via fstab? Since these machines are PXE booted, unmounting and re-mounting with -L will be problematic, and I'd like them to inherit this property at reboot.
>
> Yes, use the -o format, see the manpage.

Under the man page for mount_nfs, I have the following:

  -o  Options are specified with a -o flag followed by a comma separated string of options. See the mount(8) man page for possible options and their meanings. The following NFS specific options are also available:
  ...
  Historic -o Options
  ...
  lockd  Same as not specifying -L.
  ...

It doesn't have any other reference to -L. Are mounts specified in fstab automatically non-locking, or is the man page incorrect?

Thanks,
-Tom
--
Tom Ierna
President
Shockergroup, Inc.