Re: Monitoring for a hung NFS mount?

2008-04-02 Thread Augie Schwer
I hear Jim's going to release the nfscheck.monitor he wrote into the
mon-contrib tree; it uses the same basic logic as what I wrote, but is
implemented in a far cleaner way.

On the topic of NFS; the next step would be to compare mtab against
fstab and alert if anything you thought was mounted actually wasn't;
seems pretty trivial, but does anyone already have something written up?
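
A minimal sketch of that comparison, assuming both tables use the
whitespace-separated fstab layout (device, mount point, type, options).
The function names and the "nfs" type prefix are illustrative, not taken
from any monitor mentioned in this thread:

```python
def mount_points(table_text, fstype_prefix="nfs", skip_noauto=False):
    """Collect mount points whose filesystem type starts with fstype_prefix."""
    points = set()
    for line in table_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete table entry
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if not fstype.startswith(fstype_prefix):
            continue
        if skip_noauto and "noauto" in options.split(","):
            continue  # listed, but not expected to be mounted
        points.add(mount_point)
    return points


def missing_mounts(fstab_text, mtab_text, fstype_prefix="nfs"):
    """Return mount points listed in fstab but absent from the mount table."""
    expected = mount_points(fstab_text, fstype_prefix, skip_noauto=True)
    mounted = mount_points(mtab_text, fstype_prefix)
    return sorted(expected - mounted)
```

On a Linux host the inputs could be the contents of /etc/fstab and
/proc/mounts (both paths are assumptions); the function only parses
text, so it never touches the mounts themselves.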

--Augie

On Tue, Apr 1, 2008 at 5:02 PM, Augie Schwer [EMAIL PROTECTED] wrote:
  Thanks everyone who replied (privately and on the list); attached is
  what I finally went with; it works well, doesn't stack procs. for hung
  mounts and works great using the snmpvar monitor.

  --Augie

  On Thu, Mar 27, 2008 at 3:27 PM, Augie Schwer [EMAIL PROTECTED] wrote:
    Anyone have a good way to monitor for a hung NFS mount on a remote machine?

    I've been at it all day trying to come up with a clever way to check
    the hung mount, not let the monitor get hung and return some useful
    information; like what mount is hung, but I've come to a dead end and
    I think the best that can be done is to let the monitor timeout and
    then sound an alarm based on that timeout.

    Anyone else have ideas?
  
  
-- 
Augie Schwer - [EMAIL PROTECTED] - http://schwer.us
Key fingerprint = 9815 AE19 AFD1 1FE7 5DEE 2AC3 CB99 2784 27B0 C072

___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon


Re: Monitoring for a hung NFS mount?

2008-04-02 Thread Ed Ravin
On Wed, Apr 02, 2008 at 10:49:00AM -0700, Augie Schwer wrote:
 On the topic of NFS; the next step would be to compare mtab against
 fstab and alert if anything you thought was mounted actually wasn't;
 seems pretty trivial, but does anyone already have something written up?

No, but remember that the location and semantics of mount tables vary
drastically with the operating system - Solaris, for example (and IIRC),
keeps the mount table in-kernel, and you need to call an API to see what's
mounted.  The equivalent of mtab is actually a device driver that calls
the API, not a regular file.  So don't hard-code any paths, and use
test -e (exists), not test -f (exists and is a regular file), when
scripting the sanity checks.
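
The same advice carries over to a monitor written in a scripting
language. A short sketch, in which the candidate paths are assumptions
chosen for illustration and would need extending per platform:

```python
import os

# Illustrative candidates only; none of these is guaranteed to exist.
MOUNT_TABLE_CANDIDATES = ("/proc/mounts", "/etc/mtab", "/etc/mnttab")


def find_mount_table(candidates=MOUNT_TABLE_CANDIDATES):
    """Return the first candidate path that exists, or None."""
    for path in candidates:
        # os.path.exists corresponds to `test -e`. os.path.isfile
        # (`test -f`) would reject a table that is not a regular file.
        if os.path.exists(path):
            return path
    return None
```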



Re: Monitoring for a hung NFS mount?

2008-04-02 Thread Augie Schwer
On Wed, Apr 2, 2008 at 11:19 AM, Ed Ravin [EMAIL PROTECTED] wrote:
  On Wed, Apr 02, 2008 at 10:49:00AM -0700, Augie Schwer wrote:
    On the topic of NFS; the next step would be to compare mtab against
    fstab and alert if anything you thought was mounted actually wasn't;
    seems pretty trivial, but does anyone already have something written up?

  No, but remember that the location and semantics of mount tables vary
  drastically with the operating system - Solaris, for example (and IIRC),
  keeps the mount table in-kernel, and you need to call an API to see what's
  mounted.  The equivalent of mtab is actually a device driver that calls
  the API, not a regular file.  So don't hard-code any paths, and use
  test -e (exists), not test -f (exists and is a regular file), when
  scripting the sanity checks.

Noted; I think Jim's monitor does this already, so maybe I'll use that
as a basis.




Re: Monitoring for a hung NFS mount?

2008-04-01 Thread Augie Schwer
Thanks everyone who replied (privately and on the list); attached is
what I finally went with; it works well, doesn't stack procs. for hung
mounts and works great using the snmpvar monitor.

--Augie


nfs_monitor.pl
Description: Perl program


Re: Monitoring for a hung NFS mount?

2008-03-28 Thread Augie Schwer
On Fri, Mar 28, 2008 at 8:17 AM, Jeff Price [EMAIL PROTECTED] wrote:
  Can you cat a file on the mounted directory and maybe do a checksum on
  it?  If you can't open the file, consider it hung.  I am not sure of a
  direct way to get at the NFS; maybe the NFS control port is unavailable
  when it hangs, but I think since those get spawned on demand that might
  not work.

The problem is that the mount is hung, so any procs. trying to do a
read on that mount hang as well. That could be your indicator of a
failure scenario, but then you have hung procs. stacking up and you
can't communicate back to your monitor agent which mount is hung.




Re: Monitoring for a hung NFS mount?

2008-03-28 Thread Andrew Ryan
You'd probably need 2 processes ; one to drive and another process to go
off and stat the mount point. The driver would invoke the stat'ers, and
if the stat doesn't come back in some seconds, declare the mount hung.
Because if the mount really is hung, the stat process is going to hang
forever too, so you don't want your driver process to get hung too.
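
A rough sketch of the driver/stat'er split described above. The names
check_mount and hung_mounts, the 5-second default, and the use of a
second Python interpreter as the stat'er are all assumptions made for
illustration:

```python
import subprocess
import sys

# The stat'er runs in its own process, so a stat that blocks in the
# kernel blocks that process and not the driver.
STAT_SNIPPET = "import os, sys; os.stat(sys.argv[1])"


def check_mount(path, timeout=5.0, command=None):
    """Return 'ok', 'error' or 'hung' for one mount point."""
    if command is None:
        command = [sys.executable, "-c", STAT_SNIPPET, path]
    child = subprocess.Popen(command)
    try:
        status = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Ask for cleanup but do not wait afterwards: a process blocked
        # on a hung mount may not act on the signal until the server
        # comes back, and waiting would hang the driver too.
        child.kill()
        return "hung"
    return "ok" if status == 0 else "error"


def hung_mounts(paths, timeout=5.0):
    """Check each path in turn and return the ones that timed out."""
    return [p for p in paths if check_mount(p, timeout) == "hung"]
```

Because the paths are checked one after another, the worst case takes
roughly the timeout multiplied by the number of mount points.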

--andrew



Re: Monitoring for a hung NFS mount?

2008-03-28 Thread Augie Schwer
On Fri, Mar 28, 2008 at 11:00 AM, Andrew Ryan [EMAIL PROTECTED] wrote:
 You'd probably need 2 processes ; one to drive and another process to go
  off and stat the mount point. The driver would invoke the stat'ers, and
  if the stat doesn't come back in some seconds, declare the mount hung.
  Because if the mount really is hung, the stat process is going to hang
  forever too, so you don't want your driver process to get hung too.

This is the path I went down, but the problem I ran into was that the
forked child inherits process info., file descriptors, etc. from the
parent, so running my monitor remotely (ssh or snmp) hangs the
session while ssh or snmp waits for the child to release the
inherited resources.

Attached is where I got stuck; you can see I try to divorce the
child from the parent as much as possible by closing all file handles,
but when I run it via snmp (exec) it still hangs when I walk the tree,
hanging on the STDOUT OID. I do get the correct return code, though,
so my next step is to try to grab just the return value OID
and see if I can alert on that.

Of course I still leave hung procs. around which is not really desirable.
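
For reference, one common way to start a child that shares no terminal
or pipe with its parent is sketched below. The helper name
spawn_detached is hypothetical, and the thread does not establish
whether this cures the snmp (exec) hang described above; it only
illustrates the descriptor and session handling:

```python
import subprocess


def spawn_detached(command):
    """Start command without sharing the caller's stdio or session."""
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,   # child reads nothing from the caller
        stdout=subprocess.DEVNULL,  # child holds no copy of the caller's
        stderr=subprocess.DEVNULL,  # output pipe or terminal
        close_fds=True,             # drop other inherited descriptors
        start_new_session=True,     # setsid(): leave the caller's session
    )
```

The caller gets a Popen object back immediately and is free to exit
without waiting for the child.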




nfs_monitor.pl
Description: Perl program


Re: Monitoring for a hung NFS mount?

2008-03-28 Thread Andrew Ryan
Yeah, a plain fork/alarm isn't going to help you here because it's going
to block waiting on the IO from the hung mount.

A low-tech way around this would be, instead of forking, to exec off your
stat process and have it write/touch some file marker somewhere after it
finishes the stat. The only way it won't finish the stat is if the mount
hangs.

Then sleep for the requisite number of seconds, wake up and check the mtime
on the marker files. There are other ways too, but you get the idea.
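
The marker-file idea could look roughly like this. The function names,
the marker layout and the wait time are assumptions, and the markers are
assumed to live on local disk rather than on one of the NFS mounts:

```python
import os
import subprocess
import sys
import time

# Probe: stat the mount point, then touch the marker. A stat that fails
# has still returned; only a stat that never returns skips the touch.
PROBE = """
import os, sys
try:
    os.stat(sys.argv[1])
except OSError:
    pass
with open(sys.argv[2], "a"):
    os.utime(sys.argv[2], None)
"""


def launch_probe(mount_point, marker):
    """Start one probe process and return without waiting for it."""
    return subprocess.Popen(
        [sys.executable, "-c", PROBE, mount_point, marker],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def stale_mounts(markers, started):
    """Return mount points whose marker is missing or older than started."""
    stale = []
    for mount_point, marker in markers.items():
        try:
            fresh = os.stat(marker).st_mtime >= started
        except FileNotFoundError:
            fresh = False
        if not fresh:
            stale.append(mount_point)
    return sorted(stale)


def check_mounts(markers, wait_seconds=5.0):
    """markers maps each mount point to its marker file path."""
    # Round down so coarse filesystem timestamps do not look stale.
    started = int(time.time())
    for mount_point, marker in markers.items():
        launch_probe(mount_point, marker)
    time.sleep(wait_seconds)
    return stale_mounts(markers, started)
```

Probes for hung mounts are simply left behind, so something still has
to keep them from piling up across runs.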

(greets from bolt.sonic.net, which I see has 71 NFS mounts. I can see why
you have this need :) )



Re: Monitoring for a hung NFS mount?

2008-03-28 Thread Michael Alan Dorman
On Fri, 28 Mar 2008 11:00:54 -0700 (PDT)
Andrew Ryan [EMAIL PROTECTED] wrote:

 You'd probably need 2 processes ; one to drive and another process to
 go off and stat the mount point. The driver would invoke the
 stat'ers, and if the stat doesn't come back in some seconds, declare
 the mount hung. Because if the mount really is hung, the stat process
 is going to hang forever too, so you don't want your driver process
 to get hung too.

Why not implement this as a heartbeat trap? You have a process that
just stats the mount point every so often and sends a trap to
mon.  If it freezes because the mount freezes, it won't send the trap
and mon will alert.

Seems a lot cleaner, IMHO.
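
A sketch of such a heartbeat process. How the trap actually reaches mon
is left to a caller-supplied send_heartbeat function, which is
hypothetical here, as are the interval and the beats limit (added only
so the loop can be stopped):

```python
import os
import time


def heartbeat_loop(mount_point, send_heartbeat, interval=60.0, beats=None):
    """Stat mount_point and report after every stat that returns.

    If the stat blocks on a hung mount, no report is sent, and the
    missing heartbeat is what the receiving side alerts on.
    """
    sent = 0
    while beats is None or sent < beats:
        try:
            os.stat(mount_point)  # blocks here if the mount is hung
            status = "ok"
        except OSError as exc:
            status = "error: %s" % exc.strerror
        send_heartbeat(mount_point, status)
        sent += 1
        if beats is None or sent < beats:
            time.sleep(interval)
```

One such process per mount point keeps a hang on one mount from
silencing the heartbeats for the others.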

Mike.
