Current version is a mess
I thought to download a more or less current version of mon.

According to http://sourceforge.net/projects/mon/ mon-0.38.20 is the current version, with a timestamp of 2000-08-29; it even has an ANNOUNCE file.

According to ftp://ftp.kernel.org/pub/software/admin/mon/ it says LATEST-STABLE-IS-0.99.2 with a timestamp of 09/08/2001.

For the daredevils there's a devel tree, with loads of mon-0.99*, all timestamped 2003/2004, and I personally have been running mon-1.0pre4 for years without any significant problem.

Also I see loads of mon-1.0pre* but nowhere a mon-1.0.tar.gz, but I *do* see mon-1.1.0pre? files. So I guess mon went from 1.0pre5 to 1.1pre1, whilst skipping any "stable" or "this is our current baseline" mention.

This is all rather disheartening. Cleaning out the tree would take no more than fifteen minutes, and would consist of deleting all references to the 0.38 and 0.99 versions (or maybe moving them under obsolete), and making a symlink from mon.1.0.gz to mon.1.0pre5.gz.

--
|Hans Kinwel | [EMAIL PROTECTED]
___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
Re: fping.monitor output problems
On 09/01/2005 03:50 AM, Ed Ravin wrote:
> On Thu, Jul 07, 2005 at 01:05:13PM +0200, Kevin Ivory wrote:
>> On 2005-07-07 13:00, Kevin Ivory wrote:
>>> the fping.monitor included with mon-1.0.0pre5 doesn't seem to parse
>>> the output of fping correctly.
>>> ...
>>> # ./fping.monitor 192.168.140.3
>>> 192.168.140.3 ICMP ICMP ICMP ICMP
>> some more extra information: the problematic code must have gone in
>> between pre3 and pre4: pre3's output looks fine.
> I've been getting this too - it looks like fping sends some of its error
> messages to stderr, which confuse fping.monitor. Try the patch below,
> which discards the messages, which fping.monitor would ignore anyway.

I finally got to the bottom of this. Not that it is rocket science. When I do fping 1.2.3.4 I get

ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4
ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4
ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4
1.2.3.4 is unreachable

In the (new, broken) fping.monitor I see:

    if (/^(\S+).*unreachable/i) { push (@unreachable, $1); }

Whereas the (old, good) fping.monitor says:

    if (/^(\S+).*unreachable/) { push (@unreachable, $1); }

It is evident now. Some well-intending person (probably Jim, judging from the RCSid) added a /i, and now that pattern also matches the "ICMP Host Unreachable" lines, which it is not supposed to match. Those lines are supposed to fall through to a later do-nothing clause, which is indeed the right thing to do. So if somebody would be so kind as to remove that /i I would be much obliged.

Another thing is that fping.monitor prints out a start time, an end time and a duration, where the duration is always in the order of seconds. It turns out that these times refer to the runtime of the fping script. I find this extremely confusing. When I, or one of my colleagues, get an alert regarding a ping problem which includes a start time, end time and duration, one is naturally inclined to think that the ping problem only lasted a couple of seconds. In reality the ping problem still remains, and that means that connectivity to some host is still lost.

I find that if I comment out those three print statements the resulting alert gets much more readable. So, I would appreciate it if someone would remove these statements from the repository as well.

Thank you and greetings,

--
|Hans Kinwel | [EMAIL PROTECTED]
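The effect of the /i flag described in the message above can be reproduced in isolation. The following is only a minimal sketch: the helper name unreachable_hosts is hypothetical and not part of fping.monitor, and the input lines are the sample fping output quoted in the message, fed in as a plain list rather than read from fping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample fping output as quoted in the message (ICMP errors plus result line).
my @lines = (
    "ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4",
    "ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4",
    "ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4",
    "1.2.3.4 is unreachable",
);

# Hypothetical helper (not in fping.monitor): collect the first token of
# every input line that matches the given pattern.
sub unreachable_hosts {
    my ($re, @input) = @_;
    my @unreachable;
    for (@input) {
        push @unreachable, $1 if /$re/;
    }
    return @unreachable;
}

# Case-insensitive pattern (the "new, broken" variant) versus the
# case-sensitive one (the "old, good" variant).
my @with_i    = unreachable_hosts(qr/^(\S+).*unreachable/i, @lines);
my @without_i = unreachable_hosts(qr/^(\S+).*unreachable/,  @lines);

print "with /i:    @with_i\n";
print "without /i: @without_i\n";
```

With /i the three "ICMP Host Unreachable" lines each contribute the token ICMP as a bogus host name, which is consistent with the "ICMP ICMP ICMP ICMP" output reported earlier in the thread; without /i only 1.2.3.4 is collected.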
Re: Patch for snmpdiskspace.monitor (ext2/ext3)
On 16/11/05 20:17, Ed Ravin wrote:
> Hans, thanks for posting the two patches! I hate to look a gift horse
> in the mouth, but I have a couple of concerns:

No worry about the gift horse. I happen to disagree though. Not that it's a big issue.

> This is not unique to ext2/ext3 filesystems - it's implemented on most
> Unix filesystems, and as the OP said, the percentage of reserved space
> is configurable.

Well, let he who has another type of filesystem add his own if clause. And yes, it is configurable. I've never seen anybody configure it though. There's no need. The amount of diskspace you win by configuring it is obsolete in two months, when the next generation of disk drives comes out with double the capacity. It *could* be configured, yes, but I'm not here to make life easier for some hypothetical person who tweaks a setting that nobody ever tweaks. I'm here to solve my own problems. And it's keeping me pretty busy at that.

> Thus, I recommend that you make the reservation compensation a
> configurable option, turned off by default, so that people who upgrade
> to the new version aren't surprised by the change in behavior.

Well, they *should* be surprised. Only 1/2 a :-) here. They do not have as much diskspace available as they thought they had. If they've configured a limit of 5 %, they only get an alert when the disk is 100 % full. That's not good behaviour to me. It actually looks more like a bug, though technically it's not. To me free diskspace is what df reports. But YMMV.

> Likewise for the swap space patch - that too should be an option,
> perhaps a more generic option like --include-filesystems regexp which
> would check space on any filesystem whose description matched the
> regexp.

Well, apart from the trivial nitpick that swap space is not a filesystem, *and* that under Windows it's called Virtual Memory while under Linux it's called Swap Space, this could still be called --include-swapspace. But then again, who wouldn't appreciate the extra, relevant, if not to say important, information the monitor now reports? So, if anything, I would be in favour of an --exclude-swapspace, to accommodate some hypothetical person who would be annoyed by the extra information the monitor is giving out. I'm not overwhelmed.

Come to think of it, it could be as simple as a default setting in snmpdiskspace.cf:

*Swap Space 0%
*Virtual Memory 0%

But it's only a few lines of code. You can do with it whatever you want. I won't be traumatized if you don't see fit to use them. Neither will I if you decide to surround them with some sort of if clause. These are just two hacks that work for me, and I just posted them back to the list.

To me mon is a day-to-day lifesaver, giving me, on an almost daily basis, alerts of the most important kind. I wouldn't know what to do without it. Competing products don't cut it.

Oh, yeah, I also have a file.monitor which monitors (log) files for certain strings. That has grown from a rudimentary hack into quite an elaborate script. I'll post it after I've brushed it up.

Cheers,

--
|Hans Kinwel | [EMAIL PROTECTED]
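For readers weighing the --include-filesystems idea discussed in this exchange, here is a rough sketch of what such an opt-in could look like. Everything in it is an assumption: the option exists only as a proposal in the thread, description_filter is a hypothetical helper, and the descriptions are simply the ones mentioned above. It uses Getopt::Long's GetOptionsFromArray so that it can run without touching @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse the proposed --include-filesystems option
# (not an existing snmpdiskspace.monitor flag) and return a predicate that
# says whether a storage description is pulled in by that option.
sub description_filter {
    my @argv = @_;
    my $include;
    GetOptionsFromArray(\@argv, 'include-filesystems=s' => \$include)
        or die "usage: [--include-filesystems regexp]\n";

    # Option absent: nothing extra is included, so behaviour is unchanged.
    return sub { 0 } unless defined $include;

    my $re = qr/$include/;
    return sub { $_[0] =~ $re ? 1 : 0 };
}

my $extra = description_filter('--include-filesystems',
                               '^(Swap Space|Virtual Memory)$');

for my $descr ('/', '/home', 'Swap Space', 'Virtual Memory') {
    printf "%-15s %s\n", $descr,
        $extra->($descr) ? 'included by option' : 'not matched by option';
}
```

The design point under debate is only the default: with the option absent the sketch adds nothing, which is the "off by default" behaviour Ed asks for, whereas Hans would rather have swap space reported unless someone explicitly excludes it.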
Patch for snmpdiskspace.monitor (ext2/ext3)
As a long-time user of the snmpdiskspace.monitor, I was misled every time an alert went off, as it reported significantly more disk free than was available. The cause was that ext2/ext3 filesystems reserve 5 % for emergencies, i.e. root usage only. This diskspace is not available to the general user (df doesn't see it), but according to SNMP this diskspace is unused, and hence snmpdiskspace.monitor reports much more disk free than is available in reality.

The problem is of course that recipients of the alert think the amount of diskspace still suffices, while in reality the situation is more urgent than it looks.

The solution turns out to be really simple: check if the filesystem is ext2/ext3 and reduce the total diskspace by 5 % in the snmpdiskspace.monitor. This is indeed the correct way of calculating the true numbers. Alerts now generated correspond with what one sees with df.

The patch is against Ed's version 1.5 2005/01/13. Attached is the context diff. It might look impressive, but it's only four lines of code.

Something else: in the new SNMP::Session call I NEED to add a Version => 2 parameter. Else the monitor crashes with "could not get SNMP info: Unknown user name". Debugging this turns out to be quite obscure. I would prefer it if the monitor came with a parameter or comment or somesuch that would remind me of its existence.

Cheers,

--
|Hans Kinwel | [EMAIL PROTECTED]

*** snmpdiskspace.monitor.old 2005-11-16 14:38:10.0 +0100
--- snmpdiskspace.monitor.new 2005-11-16 15:07:05.0 +0100
***************
*** 331,337 ****
  sub get_values {
  my ($host) = @_;
! my (@disklist,$Type,$Descr,$AllocationUnits,$Size,$Used,$Freespace,$Percent,$InodePercent);
  my ($v,$s);
--- 331,337 ----
  sub get_values {
  my ($host) = @_;
! my (@disklist,$Type,$FSType,$Descr,$AllocationUnits,$Size,$Used,$Freespace,$Percent,$InodePercent);
  my ($v,$s);
***************
*** 361,366 ****
--- 361,367 ----
  ['hrStorageAllocationUnits'],
  ['hrStorageSize'],
  ['hrStorageUsed'],
+ ['hrFSType'],
  );
***************
*** 372,377 ****
--- 373,388 ----
  $AllocationUnits= $v->[3]->val;
  $Size = $v->[4]->val;
  $Used = $v->[5]->val;
+ $FSType = $v->[6]->val;
+
+ # if filesystem == ext2/ext3 then...
+ # ext2/ext3 filesystems reserve 5 % of diskspace for emergencies.
+ # Subtract from total, and get an outcome much more in line with
+ # what df tells you
+ # .1.3.6.1.2.1.25.3.9.23 is OID: HOST-RESOURCES-TYPES::hrFSLinuxExt2
+ if ($FSType eq ".1.3.6.1.2.1.25.3.9.23") {
+ $Size = $Size * 0.95;
+ }
  $Freespace = (($Size - $Used) * $AllocationUnits);
  print STDERR "Found HOST MIB filesystem: Type=$Type, Descr=$Descr, AllocationUnits=$AllocationUnits, Size=$Size, Used=$Used\n" if $DEBUG;
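To make the arithmetic behind this patch concrete, here is a small worked example. It is a sketch under assumptions: free_percent is a hypothetical helper rather than code from snmpdiskspace.monitor, the sizes are made-up allocation-unit counts, and the 5 % reserve is taken as a fixed figure as in the patch, not read from the filesystem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the patch's idea: shrink the total size by
# the reserved fraction, then compute the free percentage against that.
sub free_percent {
    my ($size, $used, $reserved_fraction) = @_;
    my $usable = $size * (1 - $reserved_fraction);
    return 0 if $usable <= 0;
    return (($usable - $used) / $usable) * 100.0;
}

# Made-up numbers: a 1000-unit filesystem at two fill levels.
for my $used (930, 950) {
    printf "used=%d  raw SNMP view: %.1f%% free  with 5%% reserve: %.1f%% free\n",
        $used,
        free_percent(1000, $used, 0),
        free_percent(1000, $used, 0.05);
}
```

At 950 units used the raw SNMP figures still show 5 % free while an ordinary user has nothing left, which is the situation described in the thread: a 5 % alert threshold on the raw numbers only fires once df already reports the disk as full.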
Another patch for snmpdiskspace.monitor (swapspace)
Here's another patch for snmpdiskspace.monitor; this one's even more impressive: two lines of code changed. Yet I consider it quite an important patch myself: with this patch swap space also gets monitored for filling up. I don't know about you guys, but when one of my production servers runs out of swap space it is just as bad as any other partition filling up, maybe even more so.

The change is extremely trivial, and reporting and configuration in the .cf file work straight out of the box without any extra effort needed.

Cheers,

--
|Hans Kinwel | [EMAIL PROTECTED]

*** snmpdiskspace.monitor.old 2005-11-16 15:36:48.0 +0100
--- snmpdiskspace.monitor.new 2005-11-16 15:43:06.0 +0100
***************
*** 393,399 ****
  # Using the Empire agent, this will eliminate drive types other
  # than hard disks. The UCD agent is not as good as determining
  # drive types under the HOST mib.
! next if ($Type !~ /\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4/);
  if ($Size != 0) {
  $Percent= ($Used / $Size) * 100.0;
--- 393,402 ----
  # Using the Empire agent, this will eliminate drive types other
  # than hard disks. The UCD agent is not as good as determining
  # drive types under the HOST mib.
! # We do not only monitor FixedDevice type (4), but also
! # Virtual Memory type (3), as running out of swap is as bad as
! # running out of other diskspace.
! next if ($Type !~ /\.1\.3\.6\.1\.2\.1\.25\.2\.1\.[34]/);
  if ($Size != 0) {
  $Percent= ($Used / $Size) * 100.0;
***************
*** 431,437 ****
  while (defined $s->getnext($v)) {
  # Make sure we are still in relevant portion of MIB
! last if ($v->[1]->val !~ /^\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4/);
  last if ($v->[0]->val =~ /Total/);
  $Descr = ( $v->[0]->val =~ /.*:.*:(\w+:)$/gi)[-1] ;
--- 434,440 ----
  while (defined $s->getnext($v)) {
  # Make sure we are still in relevant portion of MIB
! last if ($v->[1]->val !~ /^\.1\.3\.6\.1\.2\.1\.25\.2\.1\.[34]/);
  last if ($v->[0]->val =~ /Total/);
  $Descr = ( $v->[0]->val =~ /.*:.*:(\w+:)$/gi)[-1] ;
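What the two-line change does to the storage-type filter can be checked on its own. This is just an illustrative sketch: the %types table and the monitored_types helper are hypothetical, with the OID-to-name pairs taken from the hrStorageTypes branch of HOST-RESOURCES-TYPES, and the two patterns copied from the patch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few hrStorageTypes OIDs from HOST-RESOURCES-TYPES (illustrative subset).
my %types = (
    '.1.3.6.1.2.1.25.2.1.2' => 'hrStorageRam',
    '.1.3.6.1.2.1.25.2.1.3' => 'hrStorageVirtualMemory',
    '.1.3.6.1.2.1.25.2.1.4' => 'hrStorageFixedDisk',
    '.1.3.6.1.2.1.25.2.1.5' => 'hrStorageRemovableDisk',
);

# Patterns as they appear before and after the patch.
my $old = qr/\.1\.3\.6\.1\.2\.1\.25\.2\.1\.4/;
my $new = qr/\.1\.3\.6\.1\.2\.1\.25\.2\.1\.[34]/;

# Hypothetical helper: names of the storage types that pass the filter,
# i.e. those the monitor would NOT skip with "next if ($Type !~ ...)".
sub monitored_types {
    my ($re) = @_;
    my @oids = grep { $_ =~ $re } sort keys %types;
    return map { $types{$_} } @oids;
}

my @before = monitored_types($old);
my @after  = monitored_types($new);

print "before patch: @before\n";
print "after patch:  @after\n";
```

Before the patch only fixed disks pass the filter; afterwards virtual memory (swap) passes as well, while RAM and removable disks are still skipped.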