Re: rpc.lockd stalls

2006-09-07 Thread Chuck Swiger

On Sep 7, 2006, at 10:34 AM, Tom Ierna wrote:
> For the purposes of ease of software and hardware management, I'm
> attempting to run a set of PXE-booted Client machines as web/db or
> mail servers.


It is perhaps reasonable to run a diskless webserver, especially if  
it is serving mainly dynamically generated content.


Trying to run a database server or mail server without a disk strikes  
me as a very bad idea.  I am surprised that rpc.lockd is holding up  
well enough to only go down about once a month; simply running the  
locking tests which come with sendmail used to be enough to cause  
rpc.lockd to crash...


Best of luck,
--
-Chuck

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: rpc.lockd stalls

2006-09-07 Thread Kris Kennaway
On Thu, Sep 07, 2006 at 01:34:08PM -0400, Tom Ierna wrote:
> Hello, list.
>
> For the purposes of ease of software and hardware management, I'm
> attempting to run a set of PXE-booted Client machines as web/db or
> mail servers.
>
> The NFS/DHCP/YP servers are running on a 5.4-STABLE Server. I mostly
> followed the PXE guide when building these systems.
>
> All of the disk (except for swap) sits on the master Server (which
> has a bunch of external drive sleds), and all of the Client machines
> boot via Gig-E.
>
> Client machines are running 5.4-STABLE as well, but it is not
> compiled with the same kernel configuration as the master Server, as
> the hardware is slightly different. Client machines share userland
> with the Server.
>
> At the moment I have one Client machine running about 40 domains of
> web and db, with reasonably low traffic (less than 3Mbit/sec total)
> and one Client machine booted from the master Server, but not doing
> anything.
>
> Resource utilization on the master Server seems pretty low.
>
> Sporadically, there appear to be stalls on some locks with rpc.lockd.

rpc.lockd is unreliable in all versions of FreeBSD (although it may be
worse in 5.x); see the mailing list archives for extensive discussion
of this.  Try turning it off and using mount_nfs -L instead to fake
the lock traffic (see the manpage).

Kris




Re: rpc.lockd stalls

2006-09-07 Thread Tom Ierna


On Sep 7, 2006, at 1:40 PM, Kris Kennaway wrote:


> On Thu, Sep 07, 2006 at 01:34:08PM -0400, Tom Ierna wrote:
>> Sporadically, there appear to be stalls on some locks with rpc.lockd.
>
> rpc.lockd is unreliable in all versions of FreeBSD (although it may be
> worse in 5.x), see the mailing list archives for extensive discussion
> of this.  Try turning it off and using mount_nfs -L instead to fake
> the lock traffic (See the manpage).


Kris,

Is there a way to note -L via fstab? Since these machines are PXE  
booted, unmounting and re-mounting with -L will be problematic, and  
I'd like them to inherit this property at reboot.


Thanks,
-Tom

--
Tom Ierna
President
Shockergroup, Inc.



Re: rpc.lockd stalls

2006-09-07 Thread Kris Kennaway
On Thu, Sep 07, 2006 at 02:12:26PM -0400, Tom Ierna wrote:
 
> On Sep 7, 2006, at 1:40 PM, Kris Kennaway wrote:
>
>> On Thu, Sep 07, 2006 at 01:34:08PM -0400, Tom Ierna wrote:
>>
>>> Sporadically, there appear to be stalls on some locks with rpc.lockd.
>>
>> rpc.lockd is unreliable in all versions of FreeBSD (although it may be
>> worse in 5.x), see the mailing list archives for extensive discussion
>> of this.  Try turning it off and using mount_nfs -L instead to fake
>> the lock traffic (See the manpage).
>
> Kris,
>
> Is there a way to note -L via fstab? Since these machines are PXE
> booted, unmounting and re-mounting with -L will be problematic, and
> I'd like them to inherit this property at reboot.

Yes, use the -o format, see the manpage.

Kris




rpc.lockd stalls

2006-09-07 Thread Tom Ierna

Hello, list.

For the purposes of ease of software and hardware management, I'm  
attempting to run a set of PXE-booted Client machines as web/db or  
mail servers.


The NFS/DHCP/YP servers are running on a 5.4-STABLE Server. I mostly  
followed the PXE guide when building these systems.


All of the disk (except for swap) sits on the master Server (which  
has a bunch of external drive sleds), and all of the Client machines  
boot via Gig-E.


Client machines are running 5.4-STABLE as well, but their kernel is not
compiled with the same configuration as the master Server's, as
the hardware is slightly different. Client machines share userland
with the Server.


At the moment I have one Client machine running about 40 domains of  
web and db, with reasonably low traffic (less than 3Mbit/sec total)  
and one Client machine booted from the master Server, but not doing  
anything.


Resource utilization on the master Server seems pretty low.

Sporadically, there appear to be stalls on some locks with rpc.lockd.
These lock stalls exhibit interesting behavior on the Client
machines: Slots will fill up on Apache in the W state. SSH login
attempts to the client machine (passwd files get some user data via
YP) will hang and time out. When I find a file (via Apache's extended
status) which appears to be one of the stalled locks, and I attempt
to do anything with the file via a shell on the client machine, such
as cat it, that shell will become unresponsive. Any process which
is stalled on one of these files cannot be killed.


On the server, the only symptom I've witnessed is that rpc.lockd
starts using a bit more CPU than it usually does. Normal utilization
is 0.0, and when the problem is happening, it might go up to 3.0 or
so. Running cat on the Server against a file which appears stalled on
the Client works fine.


A stop and start of nfslocking on the server seems to clear things  
up. Apache on the client will recover on its own, I'm guessing after  
each stalled lock reaches a timeout. I usually gracefully restart  
Apache, which forces the recovery to happen faster.


As far as timing, it doesn't appear to be consistently periodic. It  
doesn't appear to be load related - I suffered through a Digg of one  
of the sites, and while the client machine served more bandwidth that  
couple of days than it had in a month, this particular problem did  
not occur.


Over the past three months or so, this issue has probably cropped up  
three or four times.


What can I do to troubleshoot this? I would like to add more client  
machines, but I can't until this problem is resolved.


Changing OS builds at this point, unless absolutely necessary, is not  
something I want to do.


Thanks for any insight!

--
Tom Ierna
President
Shockergroup, Inc.



Re: rpc.lockd stalls

2006-09-07 Thread Tom Ierna

On Sep 7, 2006, at 1:44 PM, Chuck Swiger wrote:
> Trying to run a database server or mail server without a disk
> strikes me as a very bad idea.


This is unfortunate - the client machines I have chosen have no  
front-panel disk sleds. Hardware administration will be a bear if  
they each have to have their own disks. Software-wise, I was hoping  
to have them all share a common Kernel and userland too, so I only  
have to update software in one place.


> I am surprised that rpc.lockd is holding up well enough to only go
> down about once a month; simply running the locking tests which
> come with sendmail used to be enough to cause rpc.lockd to crash...


I will be using qmail, when I get to that stage. qmail is supposed to  
be rather safe, even over NFS.



> Best of luck,
> --
> -Chuck


Thanks, it sounds like you think I need it :)

I'm open to suggestions on a better method of accomplishing my goals.

Best,
-Tom

--
Tom Ierna
President
Shockergroup, Inc.



Re: rpc.lockd stalls

2006-09-07 Thread Chuck Swiger

On Sep 7, 2006, at 11:16 AM, Tom Ierna wrote:

> On Sep 7, 2006, at 1:44 PM, Chuck Swiger wrote:
>> Trying to run a database server or mail server without a disk
>> strikes me as a very bad idea.
>
> This is unfortunate - the client machines I have chosen have no
> front-panel disk sleds. Hardware administration will be a bear if
> they each have to have their own disks. Software-wise, I was hoping
> to have them all share a common Kernel and userland too, so I only
> have to update software in one place.


I can see your reasoning; however, it's not especially difficult to
keep many FreeBSD systems updated against a single machine configured
to build out new versions of the kernel, userland, and installed
ports when needed. [1]


The thing is, software like mail servers and databases is usually
I/O-bound, not CPU-bound; when you get under enough load to matter,
usually what you need to do is add more disk spindles and spread DB
tables or logfiles or mailspool/queuedir locations amongst the extra
disks.


>> I am surprised that rpc.lockd is holding up well enough to only go
>> down about once a month; simply running the locking tests which
>> come with sendmail used to be enough to cause rpc.lockd to crash...
>
> I will be using qmail, when I get to that stage. qmail is supposed
> to be rather safe, even over NFS.


Yes, agreed-- qmail + maildir rather than mbox format is probably  
your best bet for doing operations over NFS.



>> Best of luck,
>> --
>> -Chuck
>
> Thanks, it sounds like you think I need it :)


Well, yes.  But I wouldn't be unhappy if you found something that  
works for your needs, even if it isn't what I would recommend myself.
At least some of the time, I even learn things from people who  
configure things strangely from my perspective...



> I'm open to suggestions on a better method of accomplishing my goals.


[1]: Mount /usr/src and /usr/obj from the buildserver on each machine,
do the update process, and then rsync over or mount
/usr/ports/packages, and use portupgrade or whatever to update or
install from the precompiled packages.


--
-Chuck



Re: rpc.lockd stalls

2006-09-07 Thread Kris Kennaway
On Thu, Sep 07, 2006 at 03:19:51PM -0400, Tom Ierna wrote:
 
> On Sep 7, 2006, at 2:39 PM, Kris Kennaway wrote:
>
>> On Thu, Sep 07, 2006 at 02:12:26PM -0400, Tom Ierna wrote:
>>> Is there a way to note -L via fstab? Since these machines are PXE
>>> booted, unmounting and re-mounting with -L will be problematic, and
>>> I'd like them to inherit this property at reboot.
>>
>> Yes, use the -o format, see the manpage.
>
> Under the man page for mount_nfs, I have the following:
>
>      -o      Options are specified with a -o flag followed by a comma
>              separated string of options.  See the mount(8) man page for
>              possible options and their meanings.  The following NFS
>              specific options are also available:
>      ...
>      Historic -o Options
>      ...
>              lockd   Same as not specifying -L.
>      ...
>
> It doesn't have any other reference to -L. Are mounts specified in
> fstab automatically non-locking, or is the man page incorrect?

Prefixing with 'no' negates an option.
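
As a rough sketch of what that could look like in /etc/fstab: this
assumes the installed mount_nfs accepts the negated spelling "nolockd"
(nothing in this thread shows it being tested), and the server name,
export path, and mount point below are hypothetical placeholders.

```
# /etc/fstab on a client (hypothetical server name and paths)
# Device               Mountpoint  FStype  Options     Dump  Pass
master:/export/data    /data       nfs     rw,nolockd  0     0
```

Note that -L keeps all locks local to the client, so they are not seen
by the server or by other NFS clients; that matters if several Client
machines are expected to lock the same files.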

Kris




Re: rpc.lockd stalls

2006-09-07 Thread Tom Ierna


On Sep 7, 2006, at 2:39 PM, Kris Kennaway wrote:


>> On Thu, Sep 07, 2006 at 02:12:26PM -0400, Tom Ierna wrote:
>>
>> Is there a way to note -L via fstab? Since these machines are PXE
>> booted, unmounting and re-mounting with -L will be problematic, and
>> I'd like them to inherit this property at reboot.
>
> Yes, use the -o format, see the manpage.


Under the man page for mount_nfs, I have the following:

     -o      Options are specified with a -o flag followed by a comma
             separated string of options.  See the mount(8) man page for
             possible options and their meanings.  The following NFS
             specific options are also available:
     ...
     Historic -o Options
     ...
             lockd   Same as not specifying -L.
     ...

It doesn't have any other reference to -L. Are mounts specified in  
fstab automatically non-locking, or is the man page incorrect?


Thanks,
-Tom


--
Tom Ierna
President
Shockergroup, Inc.
