Current version is a mess

2006-05-09 Thread Hans Kinwel

I thought to download a more or less current version of mon.

According to mon-0.38.20 is
the current version, with a timestamp of 2000-08-29, it even has an

According to it says
LATEST-STABLE-IS-0.99.2 with a timestamp of 09/08/2001

For the daredevils there's a devel tree, with loads of mon-0.99*, all
timestamped 2003/2004, and I have personally been running mon-1.0pre4 for
years without any significant problem.

I also see loads of mon-1.0pre* files but nowhere a mon-1.0.tar.gz, yet I
*do* see mon-1.1.0pre? files.  So I guess mon went from 1.0pre5 to 1.1pre1,
skipping any mention of a stable release or of a current baseline.

This is all rather disheartening.

Cleaning out the tree would take no more than fifteen minutes, and would
consist of deleting all references (maybe under obsolete) to 0.38 and
0.99 versions, and making a symlink from mon.1.0.gz to mon.1.0pre5.gz.

  |Hans Kinwel

mon mailing list

Re: fping.monitor output problems

2005-12-13 Thread Hans Kinwel

On 09/01/2005 03:50 AM, Ed Ravin wrote:

On Thu, Jul 07, 2005 at 01:05:13PM +0200, Kevin Ivory wrote:

On 2005-07-07 13:00, Kevin Ivory wrote:

the fping.monitor included with mon-1.0.0pre5 doesn't seem to parse
the output of fping correctly.


# ./fping.monitor ICMP ICMP ICMP ICMP

some more extra information: the problematic code must have gone in
between pre3 and pre4: pre3's output looks fine.

I've been getting this too - it looks like fping sends some of its
error messages to stderr, which confuse fping.monitor.  Try the patch
below, which discards the messages, which fping.monitor would ignore

I finally got to the bottom of this.  Not that it is rocket science.

When I do fping I get

ICMP Host Unreachable from for ICMP Echo sent to
ICMP Host Unreachable from for ICMP Echo sent to
ICMP Host Unreachable from for ICMP Echo sent to is unreachable

In the (new, broken) fping.monitor I see:

   if (/^(\S+).*unreachable/i) {
       push (@unreachable, $1);
   }

Whereas the (old, good) fping monitor says:

   if (/^(\S+).*unreachable/) {
       push (@unreachable, $1);
   }

It is evident now.  Some well-intentioned person (probably Jim, judging
from the RCSid) added a /i, and now the pattern also matches the "ICMP Host
Unreachable" lines, which it is not supposed to match.  Those lines are
supposed to fall through to a later do-nothing clause, which is indeed the
right thing to do with them.

So if somebody would be so kind as to remove that /i, I would be much obliged.
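
The effect of the /i flag can be reproduced in a few lines.  The sketch
below is in Python rather than the monitor's Perl, and the sample lines and
the 192.0.2.x addresses are made up for illustration (the real addresses are
missing from the output above), so take it as a demonstration of the
matching behaviour only, not as fping.monitor's actual code.

```python
import re

# Hypothetical sample of mixed fping output; the addresses are illustrative only.
SAMPLE_LINES = [
    "ICMP Host Unreachable from 192.0.2.1 for ICMP Echo sent to 192.0.2.7",
    "192.0.2.7 is unreachable",
    "192.0.2.8 is alive",
]


def unreachable_hosts(lines, ignore_case):
    """Return the first word of every line that passes the 'unreachable' test."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"^(\S+).*unreachable", flags)
    hosts = []
    for line in lines:
        match = pattern.match(line)
        if match:
            hosts.append(match.group(1))
    return hosts


if __name__ == "__main__":
    # Case-sensitive, as in the old monitor: only the real host is collected.
    print(unreachable_hosts(SAMPLE_LINES, ignore_case=False))
    # Case-insensitive, as in the new monitor: "ICMP" is collected as a host too.
    print(unreachable_hosts(SAMPLE_LINES, ignore_case=True))
```

With the case-insensitive pattern the word "ICMP" ends up in the list of
unreachable hosts, which would explain the stray ICMP entries in the
monitor's output.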

Another thing is that fping.monitor prints out a start time, end time
and duration, where the duration is always on the order of seconds.  It
turns out that these times refer to the runtime of the fping script.  I
find this extremely confusing.  When I, or one of my colleagues, get an
alert regarding a ping problem which includes a start time, end time and
duration, one is naturally inclined to think that the ping problem only
lasted a couple of seconds.  In reality the ping problem still remains,
and that means that connectivity to some host is still lost.

I find that if I comment out those three print statements the resulting
alert gets much more readable.  So, I would appreciate it if someone
would remove these statements from the repository as well.

Thank you and greetings,
  |Hans Kinwel


Re: Patch for snmpdiskspace.monitor (ext2/ext3)

2005-11-17 Thread Hans Kinwel

On 16/11/05 20:17, Ed Ravin wrote:

Hans, thanks for posting the two patches!  I hate to look a gift horse
in the mouth, but I have a couple of concerns:

No worry about the gift horse.  I happen to disagree, though.  Not that
it's a big issue.

This is not unique to ext2/ext3 filesystems - it's implemented on most
Unix filesystems, and as the OP said, the percentage of reserved space
is configurable.

Well, let he who has another type of filesystem add his own if clause.
And yes, it is configurable.  I've never seen anybody actually configure
it, though.  There's no need.  The amount of diskspace you win by
configuring it is obsolete in two months, when the next generation of
diskdrives comes out with double the capacity.  It *could* be
configured, yes, but I'm not here to make life easier for some
hypothetical person who tweaks a setting that nobody ever tweaks.  I'm
here to solve my own problems.  And it's keeping me pretty busy at that.

Thus, I recommend that you make the reservation compensation a
configurable option, turned off by default, so that people who upgrade
to the new version aren't surprised by the change in behavior.

Well, they *should* be surprised.  Only 1/2 a :-) here.  They do not 
have as much diskspace available as they thought they had.  If they've 
configured a limit of 5 % they only get an alert when the disk is 100 % 
full.   That's not good behaviour to me.  It actually looks more like a 
bug, though technically it's not.  To me free diskspace is what df 
reports.  But YMMV.
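
The arithmetic behind that remark, as a small sketch.  It is in Python for
brevity, the function name is made up, and it assumes (as described in this
thread) that the SNMP figures count the reserved blocks as free while df
does not.  Percentages are relative to the whole filesystem.

```python
def df_free_percent_when_alert_fires(threshold_percent, reserved_percent=5.0):
    """Free space that df would still show when an uncorrected monitor alerts.

    The uncorrected monitor alerts once the SNMP-reported free space drops to
    threshold_percent.  The reserved blocks are included in that figure, so
    ordinary users have reserved_percent less than that (never below zero).
    """
    return max(threshold_percent - reserved_percent, 0.0)


if __name__ == "__main__":
    # A 5 % limit on a filesystem with 5 % reserved: nothing is left for users.
    print(df_free_percent_when_alert_fires(5.0))
    # A 10 % limit leaves users 5 % at the moment the alert goes off.
    print(df_free_percent_when_alert_fires(10.0))
```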

Likewise for the swap space patch - that too should be
an option, perhaps a more generic option like

   --include-filesystems regexp

which would check space on any filesystem whose description matched
the regexp.

Well, apart from the trivial nitpick that swapspace is not a filesystem,
*and* that under Windows it's called Virtual Memory while under Linux it's
called Swap Space, this could still be called --include-swapspace.  But
then again, who wouldn't appreciate the extra, relevant, if not to say
important information the monitor now reports?  So, if anything, I would
be in favour of an --exclude-swapspace, to accommodate some hypothetical
person who would be annoyed by the extra information the monitor is
giving out.  I'm not overwhelmed.

Come to think of it, it could be as simple as a default setting in

*Swap Space  0%
*Virtual Memory  0%

But it's only a few lines of code.  You can do with it whatever you
want.  I won't be traumatized if you don't see fit to use them.
Neither will I if you decide to surround them with some sort of if
clause.  These are just two hacks that work for me, and I just posted
them back to the list.

To me mon is a day-to-day lifesaver, giving me, on an almost daily
basis, alerts of the most important kind.  I wouldn't know what to do
without it.  Competing products don't cut it.

Oh, yeah, I also have a file.monitor which monitors (log) files for
certain strings.  That has grown from a rudimentary hack into quite an
elaborate script.  I'll post it after I've brushed it up.

   |Hans Kinwel


Patch for snmpdiskspace.monitor (ext2/ext3)

2005-11-16 Thread Hans Kinwel

As a long-time user of the snmpdiskspace.monitor, I was misled every time
an alert went off, as it reported significantly more disk free than

The cause was that ext2/ext3 filesystems reserve 5 % for emergencies, i.e.
root usage only.  This diskspace is not available to the general user
(df doesn't see it), but according to SNMP this diskspace is unused
and hence snmpdiskspace.monitor reports much more disk free than is
available in reality.  The problem is of course that recipients of the
alert think the amount of diskspace still suffices, while in reality the
situation is more urgent than it looks.

The solution turns out to be really simple: check if the filesystem is
ext2/ext3 and reduce the total diskspace by 5 % in the
snmpdiskspace.monitor.  This is indeed the correct way to calculate the
true numbers.  Alerts now generated correspond with what one sees with df.
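
As a worked example of that correction: the sketch below mirrors the
calculation in the patch (shrink the size by 5 %, then subtract what is
used), but it is written in Python rather than the monitor's Perl, and the
function and parameter names as well as the numbers are made up for
illustration.

```python
def free_bytes(size_units, used_units, allocation_unit_bytes, reserved_fraction=0.0):
    """Free space in bytes, optionally discounting a reserved share of the size.

    size_units and used_units are counted in allocation units, as the HOST MIB
    reports them; reserved_fraction is 0.05 for the 5 % reservation.
    """
    usable_units = size_units * (1.0 - reserved_fraction)
    return (usable_units - used_units) * allocation_unit_bytes


if __name__ == "__main__":
    # Hypothetical filesystem: 1000 units of 4096 bytes, 900 of them in use.
    print(free_bytes(1000, 900, 4096))                          # uncorrected
    print(free_bytes(1000, 900, 4096, reserved_fraction=0.05))  # corrected
```

For these made-up numbers the uncorrected figure is 100 units (409600
bytes) and the corrected one about 50 units (204800 bytes), so the
uncorrected monitor reports twice as much free space as users can
actually write to.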

The patch is against Ed's version 1.5 2005/01/13.  Attached is the 
context diff.  It might look impressive, but it's only four lines of code.

Something else: in the new SNMP::Session call I NEED to add a Version
=> 2 parameter.  Otherwise the monitor crashes with "could not get SNMP
info: Unknown user name".  Debugging this turns out to be quite obscure.  I
would prefer it if the monitor came with a parameter or comment or
somesuch that would remind me of its existence.

   |Hans Kinwel

*** snmpdiskspace.monitor.old   2005-11-16 14:38:10.0 +0100
---   2005-11-16 15:07:05.0 +0100
*** 331,337 
  sub get_values {
  my ($host) = @_;
! my 
  my ($v,$s);
--- 331,337 
  sub get_values {
  my ($host) = @_;
! my 
  my ($v,$s);
*** 361,366 
--- 361,367 
+   ['hrFSType'],
*** 372,377 
--- 373,388 
$AllocationUnits= $v->[3]->val;
$Size   = $v->[4]->val;
$Used   = $v->[5]->val;
+   $FSType = $v->[6]->val;
+   # if filesystem == ext2/ext3 then...
+   # ext2/ext3 filesystems reserve 5 % of diskspace for 
+   # Substract from total, and get an outcome much more in line 
+   # what df tells you
+   # . is OID: 
+   if ($FSType eq . {
+   $Size = $Size * 0.95;
+   }
$Freespace = (($Size - $Used) * $AllocationUnits);
print STDERR Found HOST MIB filesystem: Type=$Type, 
Descr=$Descr, AllocationUnits=$AllocationUnits, Size=$Size, Used=$Used\n if 

Another patch for snmpdiskspace.monitor (swapspace)

2005-11-16 Thread Hans Kinwel

Here's another patch for snmpdiskspace.monitor; this one's even more 
impressive: two lines of code changed.

Yet I consider it quite an important patch myself: with this patch
swapspace also gets monitored for filling up.  I don't know about you guys,
but when one of my production servers runs out of swapspace it is just
as bad as any other partition filling up, maybe even more so.

The change is extremely trivial, and reporting and configuration in the 
.cf file works straight out of the box without any extra effort needed.
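
What the widened pattern accepts can be checked in isolation.  The sketch
below is Python rather than the monitor's Perl and the helper name is made
up.  The OIDs are the storage types from the HOST-RESOURCES-MIB, and the two
patterns are the ones from the first hunk of the patch below.

```python
import re

# Storage type OIDs from the HOST-RESOURCES-MIB (hrStorageTypes subtree).
HR_STORAGE_RAM = ".1.3.6.1.2.1.25.2.1.2"
HR_STORAGE_VIRTUAL_MEMORY = ".1.3.6.1.2.1.25.2.1.3"
HR_STORAGE_FIXED_DISK = ".1.3.6.1.2.1.25.2.1.4"

# Before the patch: fixed disks only.  After: virtual memory as well.
OLD_PATTERN = re.compile(r"\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4")
NEW_PATTERN = re.compile(r"\.1\.3\.6\.1\.2\.1\.25\.2\.1\.[34]")


def is_monitored(type_oid, pattern):
    """True if an entry of this storage type is kept (not skipped by 'next')."""
    # Perl's =~ and !~ match anywhere in the string, hence search().
    return pattern.search(type_oid) is not None


if __name__ == "__main__":
    for oid in (HR_STORAGE_RAM, HR_STORAGE_VIRTUAL_MEMORY, HR_STORAGE_FIXED_DISK):
        print(oid, is_monitored(oid, OLD_PATTERN), is_monitored(oid, NEW_PATTERN))
```

Fixed disks are kept by both patterns, virtual memory only by the new
one, and RAM by neither.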

   |Hans Kinwel
*** snmpdiskspace.monitor.old   2005-11-16 15:36:48.0 +0100
---   2005-11-16 15:43:06.0 +0100
*** 393,399 
# Using the Empire agent, this will eliminate drive types other
# than hard disks. The UCD agent is not as good as determining
# drive types under the HOST mib.
!   next if ($Type !~ /\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4/);

if ($Size != 0) {
$Percent= ($Used / $Size) * 100.0;
--- 393,402 
# Using the Empire agent, this will eliminate drive types other
# than hard disks. The UCD agent is not as good as determining
# drive types under the HOST mib.
!   # We do not only monitor FixedDevice type (4), but also 
!   # Virtual Memory type (3), as running out of swap is as bad as 
!   # running out of other diskspace.
!   next if ($Type !~ /\.1\.3\.6\.1\.2\.1\.25\.2\.1\.[34]/);

if ($Size != 0) {
$Percent= ($Used / $Size) * 100.0;
*** 431,437 
while (defined $s->getnext($v)) {
# Make sure we are still in relevant portion of MIB
!   last if ($v->[1]->val !~ /^\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4/);
last if ($v->[0]->val =~ /Total/);

$Descr  = ( $v->[0]->val =~ /.*:.*:(\w+:)$/gi)[-1] ;
--- 434,440 
while (defined $s->getnext($v)) {
# Make sure we are still in relevant portion of MIB
!   last if ($v->[1]->val !~ /^\.1\.3\.6\.1\.2\.1\.25\.2\.1\.[34]/);
last if ($v->[0]->val =~ /Total/);

$Descr  = ( $v->[0]->val =~ /.*:.*:(\w+:)$/gi)[-1] ;