SSL Certificate expiration monitor

2011-01-31 Thread Ed Ravin
So I finally got around to writing an SSL certificate monitor for
POP/IMAP/SMTP servers, using the OpenSSL s_client -starttls feature.

As noted in the script, the timeout is not user-settable since it
seems to be buried in the OpenSSL client somewhere.  If you use
this script on multiple hosts, make sure that the total run time
will not exceed the period of the test even if all of them time
out.  On my system, the openssl s_client command took 75 seconds
to time out when the host was not available.
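
To make the arithmetic concrete, here is a rough sketch of the worst-case calculation.  It assumes the hosts are checked one after another and that each unreachable host costs about 75 seconds (the figure measured above, which may differ on other systems).  The function names are made up for illustration and are not part of the script.

```perl
use strict;
use warnings;

# Worst case: every host is down and each check waits out the full timeout.
sub worst_case_seconds {
    my ($nhosts, $per_host_timeout) = @_;
    return $nhosts * $per_host_timeout;
}

# How many hosts can one monitor run cover without overrunning the interval?
sub max_hosts_per_run {
    my ($interval_seconds, $per_host_timeout) = @_;
    return int($interval_seconds / $per_host_timeout);
}

# Example: a 5 minute test interval and the 75 second timeout seen above
printf "4 hosts, worst case: %d seconds\n", worst_case_seconds(4, 75);
printf "hosts that fit in 300 seconds: %d\n", max_hosts_per_run(300, 75);
```

If the host list is longer than that, splitting it across several watch entries is one way to stay inside the interval.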

-- Ed
#!/usr/local/bin/perl5.8.8
#
# $Id: sslcert.monitor,v 1.2 2011/01/31 21:56:52 root Exp $
#

my $usage="Usage: sslcert.monitor  --expiry NN --port NNN [--starttls {imap|smtp|pop3|ftp}] hostname [...]";

# check SSL certs of direct SSL-wrapped services or STARTTLS services
# alarm if certificate expires within --expiry days.

# Written by Ed Ravin era...@panix.com January 2011
# Code made available courtesy of Public Access Networks, http://panix.com
# License is GNU

use Getopt::Long;
use Date::Parse;

my @details=();
my @failures=();

GetOptions( \%options, "port=i", "expiry=i", "starttls=s", "debug" )
    or die $usage;

my $port= $options{port} || die $usage;
my $expiry= $options{expiry} || die $usage;
my $starttls= $options{starttls} || "";
die $usage unless ( $starttls =~ /^(imap|smtp|pop3|ftp)$/ or $starttls eq "");
$starttls= "-starttls $starttls" unless $starttls eq "";
my $debug= $options{debug} || 0;
my $now = time;
my $expiredeadline= $now + ($expiry * 3600 * 24);

#openssl s_client -connect mail.panix.com:993  2>/dev/null </dev/null | openssl x509 -noout -enddate; echo $?
# notAfter=Jan 28 20:18:05 2013 GMT

for $host( @ARGV ) {

    my $cmdline= sprintf("openssl s_client -connect %s:%d %s 2>/dev/null </dev/null | openssl x509 -noout -enddate 2>&1",
        $host, $port, $starttls);

    print "Command: $cmdline\n" if $debug;

    my $ssloutput= `$cmdline`;
    my $rc= $? >> 8;

    if( $rc != 0 ) {
        push( @failures, $host);
        push( @details, "$host: openssl return code $rc, output: $ssloutput\n");
        next;
    }

    chomp $ssloutput;
    if ( $ssloutput !~ /^\s*notAfter\s*=\s*(.*)/) {
        push( @failures, $host);
        push( @details, "$host: unexpected result from openssl command line: $ssloutput\n");
        next;
    }

    my $certexpiretime= str2time($1);
    if (!defined($certexpiretime)) {
        push( @failures, $host);
        push( @details, "$host: unable to parse openssl command line output: $ssloutput\n");
        next;
    }

    if ($certexpiretime < $expiredeadline) {
        push( @failures, $host);
        push( @details, "$host: certificate expires within $expiry days: $ssloutput is " . int(($certexpiretime - $now) / 3600 / 24) . " days away\n");
        next;
    }

    print "$host: $ssloutput - notAfter=$certexpiretime - deadline=$expiredeadline\n" if $debug;

}

if (@failures == 0) {
exit 0;
}

print join (" ", sort @failures), "\n";
print sort @details if (scalar @details > 0);

exit 1;

__END__

=head1 NAME

sslcert.monitor - alarm when SSL certificates approach expiration date

=head1 SYNOPSIS

B<sslcert.monitor> --expiry I<days> --port I<portnum> [--starttls {imap|smtp|pop3|ftp}] hostname [...]

=head1 DESCRIPTION

B<sslcert.monitor> checks the requested server(s) at the requested
port number and alarms if the SSL certificate of the server will
expire within the specified deadline.

=head1 OPTIONS

=over 4

=item B<--expiry> I<days>

Alarm if the server certificate expires within the specified number of days.

=item B<--port> I<port-number>

The numeric port number to test.

=item B<--starttls> I<protocol>

Connect to the specified port number without encryption and issue a
STARTTLS command to switch to encrypted mode.  The OpenSSL client
supports the protocols pop3, smtp, imap, and ftp.

=item B<--debug>

List out debugging information for each server.

=back

=head1 BUGS

B<sslcert.monitor> is a wrapper around the B<openssl s_client> command
and has the same limitations, including a non-adjustable timeout.
If the host is not answering it could take 75 seconds or longer for
B<sslcert.monitor> to detect the error for each host, possibly causing
long delays for this command when testing multiple hosts.
___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon


Re: AMANDA Monitor

2011-01-07 Thread Ed Ravin
On Fri, Jan 07, 2011 at 06:52:19PM -0500, Nathan Gibbs wrote:
... 
> > I suppose, if you didn't want to read your reports, you could have
> > procmail or something route them all to a folder where something like
> > SEC could scan them and then only generate an alert when there was
> > something out of the ordinary or a specific problem. But that seems like
> > a lot of effort to avoid looking at your reports.
>
> Agreed, way too much work to avoid work.
> :-)
> What is SEC?
> If I'm chasing shadows, I might as well learn something in the process.

Simple Event Correlator

This looks like the site for it:

   http://simple-evcorr.sourceforge.net/



Re: multi RBL monitor

2010-05-22 Thread Ed Ravin
On Sat, May 22, 2010 at 11:40:44AM +1000, Noel Butler wrote:
> On Fri, 2010-05-21 at 20:15 -0400, Jim Trocki wrote:
>
> > On Sat, 22 May 2010, Noel Butler wrote:
> >
> > > Hrmm, it is not there now, that's why you can't find it. The original was
> > > from 2003, I don't know why it was pulled, I'll find out.
> >
> > http://linux.kernel.org/pipermail/mon/2007-July/001645.html
> >
> > Not sure what happened to it in the repo. Maybe I fatfingered something.
> > I know in the past some other things Ed has posted to the list haven't
> > made it into the repo, so sorry for that.
>
> Ahh excellent, maybe it should be modified to give original credit to
> Tim Hanes as well?
> It's likely Tim's version I'm using, it's served us well anyway :)

Looking back over past posts to the Mon list, I see that the script
I just re-submitted to the list was inspired by Tim Hanes's work, but
I had to do a total rewrite to use asynchronous I/O.

Speaking of credit, I see my author line was edited out of the version
Jim put into CVS.  What's that about?

-- Ed


-- 
Ed Ravin   |  Warning - this email may contain rhetorical
           |  devices, metaphors, analogies, typographical
eravin@    |  errors, or just plain snarkiness.  A sense of
panix.com  |  humor may be required for proper interpretation.



Re: multi RBL monitor

2010-05-20 Thread Ed Ravin
On Thu, May 20, 2010 at 08:33:45PM -0400, Nathan Gibbs wrote:
> Anybody already done this?

Yes, see attached.  Note that it queries all the RBL lists in parallel,
so if you run it with a big list and/or a lot of IPs to check, it can run
out of file descriptors.  If that happens, adjust the max number of FDs
with ulimit -n or the like.
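
As a rough sanity check before running with a big list, something like the sketch below estimates whether the queries will fit.  It assumes one socket per (IP, RBL) pair, which is how the attached script issues its background queries, and it reads the per-process limit with POSIX::sysconf.  The helper names and the allowance of 3 descriptors for stdin/stdout/stderr are illustrative, not part of rbl.monitor.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_OPEN_MAX);

# One background DNS query (and so one socket) per host per RBL.
sub sockets_needed {
    my ($nhosts, $nrbls) = @_;
    return $nhosts * $nrbls;
}

# True if the queries fit under the descriptor limit.  $limit defaults to
# the current process limit; $reserved covers stdin/stdout/stderr.
sub fits_fd_limit {
    my ($nhosts, $nrbls, $limit, $reserved) = @_;
    $limit    = sysconf(_SC_OPEN_MAX) unless defined $limit;
    $reserved = 3                     unless defined $reserved;
    return 0 unless defined $limit;
    return sockets_needed($nhosts, $nrbls) + $reserved <= $limit ? 1 : 0;
}

print "100 hosts x 5 RBLs needs ", sockets_needed(100, 5), " sockets\n";
print "fits in current limit: ", (fits_fd_limit(100, 5) ? "yes" : "no"), "\n";
```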

> If, so where can I find it?

Argh, did I not upload it to the config section on Sourceforge?  Do I
still even remember how?

-- Ed
#!/usr/bin/perl

# rbl.monitor - check RBL blacklists for an IP address.  Uses asynch I/O
# to send all the requests simultaneously

# Copyright (c) 2007, 2008 by Ed Ravin era...@panix.com.  License is GNU.
# Available to the public courtesy of Public Access Networks http://panix.com

my $usage="\
Usage: rbl.monitor [options] hostname [...]

Options [and default values]:

    --listfile list of RBL domains [preset list, see script]
    --rbllist comma separated list of RBL domains
    --timeout  master timeout  [60 seconds]
    --debug  [off]
";


use strict;

use Net::DNS;
use IO::Select;
use Getopt::Long;

my %opt;
GetOptions(\%opt,
    "listfile=s",
    "rbllist=s",
    "timeout=i",
    "debug",
) or die $usage;

my $listfile= $opt{listfile} || "";
my $rbllist= $opt{rbllist} || "";
my $selecttimeout = 5;
my $timeout= ($opt{timeout} || 60) + ($selecttimeout * 2);
my $debug= $opt{debug} || 0;


# Default RBLs to check - just a few of the lists most likely to block mail
# Sites with specific needs should customize via the command line
my @rbls2check=(
    "bl.spamcop.net",
    "relays.mail-abuse.org",
    "zen.spamhaus.org",
    "dnsbl.sorbs.net",
    "dnsbl-1.uceprotect.net",
);

if ($listfile) {
    open(LIST, "<$listfile") ||
        die "$0: cannot open list file \"$listfile\": $!\n";
    @rbls2check= grep !/^\s*#/, <LIST>;
    @rbls2check= grep !/^\s*$/, @rbls2check;
    map {chomp} @rbls2check;
    close LIST;
    die "$0: no RBL names found in \"$listfile\"\n" unless @rbls2check;
}

if ($rbllist) {
    @rbls2check= split(',', $rbllist);
}

print "*** checking these RBLs:\n". join("\n   ", @rbls2check) . "\n"
    if $debug;

my (@summary, @detail);
my @sockets;


my $res  = Net::DNS::Resolver->new;
my $sel  = IO::Select->new();
my $starttime= time;

my %revip2host;

# gethostbyname is non-reentrant, so parse the hostnames to test up front
foreach my $host (@ARGV) {
    my $hostdata= gethostbyname($host);
    if (!defined($hostdata)) {
        push @summary, $host;
        push @detail, "$host: bad hostname";
        next;
    }
    my $revip= join(".", reverse(unpack("C4", $hostdata)));
    $revip2host{$revip}= $host;
}

# start all the queries
foreach my $revip (keys %revip2host) {
    foreach my $rbl (@rbls2check) {
        my $dnssock=  $res->bgsend(join(".", $revip, $rbl));
        die "$0: Net::DNS::Resolver::bgsend returns undef - too many open files?\n"
            unless defined($dnssock);
        push @sockets, $dnssock;
        $sel->add($dnssock);
    }
}

MAINLOOP:
while ($sel->handles > 0) {
    my @ready = $sel->can_read($selecttimeout);
    if ( (time - $starttime) > $timeout) { # waited too long?
        push @detail, "TIMEOUT: " . scalar($sel->handles) . " responses still pending";
        last MAINLOOP;
    }
    foreach my $sock (@ready) {
        my ($authority, $ipaddress, $revip, $forwardip, $host);
        my $packet = $res->bgread($sock);
        foreach my $rr ($packet->answer) {
            if ($rr->type eq "A") {
                $ipaddress= $rr->address;
                $authority= $rr->name;
                my $q= \$packet->question;
                my @qquads= split('\.',${$$q}{qname});
                splice(@qquads, 4);
                $revip= join('.', @qquads);
                $forwardip= join('.', reverse(@qquads));
                $host= $revip2host{$revip} || $forwardip;
                push @summary, $host
                    unless grep /^$host$/, @summary;
                push @detail, "$host: $authority: " . $rr->address;
            }
        }
        $sel->remove($sock);
    }
}

print join(" ", (sort @summary)) if (@summary);
print "\n";

print join("\n", (sort @detail)), "\n"  if @detail;

exit 1 if @summary;
exit 0;


spamd.monitor

2009-11-01 Thread Ed Ravin
On Sun, Nov 01, 2009 at 07:57:34AM -0500, Nathan Gibbs wrote:
> * Ed Ravin wrote:
> > We use a similar monitor for SpamAssassin that uses the corresponding
> > fake spam signature to test whether spamd is checking messages - if
> > anyone's interested, let me know.
> >
> > -- Ed
>
> Sure, I could use that.

See attached.

#!/usr/bin/perl -w
#
# test spamd by sending a test spam string.  Will alarm if socket does
# not answer or if spamd fails to recognize the test string as spam.

# copyright(2004) by Ed Ravin era...@panix.com.  License is GPL
# this software is made available courtesy of PANIX, http://www.panix.com
# based on code snatched from nntp.monitor by Jim Trocki and
# http.monitor by Jon Meek.
#
#
my $usage=
  "spamd.monitor [-d] [-p port] [-t timeout] host [host...]\n";
#  -d for debug
#

use strict;
use Getopt::Std;
use English;

use vars qw($opt_p $opt_t $opt_d);

getopts ("m:p:t:d") || die $usage;
my $PORT = $opt_p || 783;
my $TIMEOUT = $opt_t || 30;
my $DEBUG = $opt_d || "";

my @failures = ();
my @details=   ();

# WARNING - this spam test string is broken up to avoid getting trapped by
# spam filters if the program is sent via mail.
my $GTUBE=
"XJS*C4JDBQADN1.NSBN3*2IDNEN*" .
"GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X";


foreach my $host (@ARGV) {

    if (! spamdTEST($host, $PORT)) {
        push (@failures, $host);
    }
}

if (@failures == 0) {
    exit 0;
}

print join (" ", sort @failures), "\n";
print sort @details if (scalar @details) > 0;

exit 1;


sub spamdTEST {
    use Socket;
    use Sys::Hostname;

    my($Server, $Port) = @_;
    my($ServerOK, $TheContent);

    $ServerOK = 0;

    $TheContent = '';

    ###
    my $TransactionOK= eval {

        local $SIG{ALRM} = sub { die "Timeout Alarm" };
        alarm $TIMEOUT;
        my $result = OpenSocket($Server, $Port); # Open a connection to the server
        if (!$result) { # Failure to open the socket
            print "$Server: Unable to open socket\n" if $DEBUG;
            return '';
        }

        my $now= time;
        my $testmessage="Subject: Mon test of spamd at $now\r\n\r\n$GTUBE";
        my $testlength= length($testmessage) + 2;


#  52 45 50 4f 52 54 20 53  50 41 4d 43 2f 31 2e 33    REPORT SPAMC/1.3
#  0d 0a 55 73 65 72 3a 20  72 6f 6f 74 0d 0a 43 6f    ..User: root..Co
#  6e 74 65 6e 74 2d 6c 65  6e 67 74 68 3a 20 31 31    ntent-length: 11
#  31 0d 0a 0d 0a                                      1....

        transact("REPORT SPAMC/1.3\r\nUser: netmon\r\nContent-length: $testlength\r\n", '', "$Server: failed sending REPORT request") || return 0;

#  Expected reply to test message:
#  53 50 41 4d 44 2f 31 2e  31 20 30 20 45 58 5f 4f    SPAMD/1.1 0 EX_O
#  4b 0d 0a                                            K..

        transact($testmessage, '^SPAMD/.*\b0\b', "$Server: no response (or incorrect response) to test message") || return 0;
        my $inputline=<S>;

        my @spamcresults= <S>;

        if (grep /^1000\s+GTUBE\b/, @spamcresults)
        {
            push @details, "$Server: spamd OK, found test spam\n" if $DEBUG;
            return 1;
        }
        else
        {
            push @details, "$Server: spamd responded but didn't find test spam\n";
            map {push @details, "$Server: $_" } @spamcresults;
            return 0;
        }
    };

    close(S);
    alarm 0; # Cancel the alarm

    if ($EVAL_ERROR and ($EVAL_ERROR =~ /^Timeout Alarm/)) {
        push(@details, "$Server: timeout($TIMEOUT)\n");
        return 0;
    }

    return 0 unless $TransactionOK;

    $ServerOK = 1;
    return $ServerOK;

}

sub OpenSocket {
#
# Make a Berkeley socket connection between this program and a TCP port
#  on another (or this) host. Port can be a number or a named service
#
    my($OtherHostname, $Port) = @_;
    my($OurHostname, $sockaddr, $proto, $type, $len,
       $ThisAddr, $that, $OtherHostAddr, $result);
    $OurHostname = hostname;

    $proto = getprotobyname('tcp');
    $Port = getservbyname($Port, 'tcp') unless $Port =~ /^\d+$/;
    $ThisAddr = gethostbyname($OurHostname);
    $OtherHostAddr = gethostbyname($OtherHostname);
    if (!defined $OtherHostAddr)
    {
        push (@details, "$OtherHostname: cannot resolve hostname\n");
        return undef
    }

    $that = sockaddr_in ($Port, $OtherHostAddr);

    if (! ($result = socket(S, PF_INET, SOCK_STREAM, $proto)) ||
        (! ($result = connect(S, $that))) )
    {
        push (@details, "$OtherHostname: $!\n");  return undef;
    }

    select(S); $| = 1; select(STDOUT);  # set S to be un-buffered
    return 1;   # success
}

sub transact # string to send, pattern to expect, error message
{
    my($sendstr, $rxpattern, $errormsg) = @_;
    my($rxstr);

    warn "DEBUG: sending data: $sendstr<CR LF>\n" if $DEBUG;
    print S $sendstr . "\r\n" unless

Re: Updated Clam AV monitor

2009-11-01 Thread Ed Ravin
On Sun, Nov 01, 2009 at 04:39:03PM -0500, Nathan Gibbs wrote:
> AAAHHH!
>
> Every minute run clamd.monitor against our servers.
>
> Later that day...
> A few hundred emails to our noc with the subject line
> VIRUS ALERT: Eicar-Test-Signature
...
> If I'm going to use this code, emailing the noc every minute per server
> running clamd won't work.

Indeed.  It all depends on what you want to do - in my opinion, an incoming
virus is hardly worth reporting if it's been identified and the email is
being quarantined.  I'd rather get email about the viruses that haven't
been ID'd and that are about to start running on the network when someone
clicks on them :-(.

Since VirusEvent accepts a command line, you can replace the command
you have there now with a script that filters out the Eicar-Test-Signature
before sending any mail.  You could also not bother with VirusEvent and
look at the syslogs at the end of the day to see what clamd's been up
to.



Re: Updated Clam AV monitor

2009-10-31 Thread Ed Ravin
Sorry, I should have posted the clamd.monitor used at my shop.

The one from http://www.cmpublishers.com/oss/ checks the TCP
banner, complains if the socket isn't answered or if you're running
an outdated clamd (the latter a nice feature which is not in the
one I've been using).

However, the clamd monitor attached to this message goes through
the steps to actually submit a piece of email for virus scanning,
and uses the EICAR fake virus to test whether clamd is actually
going through the message.  That goes a bit deeper into the internals
and might turn up problems that a simple socket open/close wouldn't.

We use a similar monitor for SpamAssassin that uses the corresponding
fake spam signature to test whether spamd is checking messages - if
anyone's interested, let me know.

-- Ed
#!/usr/local/bin/perl5.6.1

# clamd.monitor - make sure clamd recognizes the EICAR test virus

# Written by Jed Davis.  Released to public (license is GPL) courtesy of
# PANIX Public Access Networks, http://www.panix.com

require 5.006;
use strict;
use Getopt::Std;
use ClamAV::Client;
use IO::String;

my $usage = "clamd.monitor [-d] [-p port] [-t timeout] host [host...]\n";
our ($opt_t, $opt_p, $opt_d);
getopts("p:t:d") || die $usage;
my $tcpport = $opt_p || 9001;
my $timeout = $opt_t || 30;
my $debugp = $opt_d;

# Standard test virus - broken up into two lines to avoid triggering
# anti-virus systems (cough, cough)
my $virus = 'x5o...@ap[4\pzx54(P^)7CC)7}$EICAR-STANDARD-' .
'ANTIVIRUS-TEST-FILE!$H+H*';

my (@failures);
for my $host (@ARGV) {
    my $result = undef;
    eval {
        alarm $timeout;
        $SIG{ALRM} = sub { die "Timeout ($timeout seconds)\n" };
        my $scanner = ClamAV::Client->new(
            socket_host => $host,
            socket_port => $tcpport);
        $result = $scanner->scan_stream(IO::String->new($virus));
        print STDERR "DEBUG: $host: $result\n" if $debugp;
    };
    if ($@) {
        chomp $@;
        $@ =~ s/^(Could not establish socket connection), tried UNIX domain and TCP sockets at .*/$1/;
        push @failures, [$host, "Exception: $@"];
    } elsif (!$result) {
        push @failures, [$host, "Responded, but failed to recognize test virus"];
    } elsif ($result ne "Eicar-Test-Signature") {
        push @failures, [$host, "Unexpected response: $result"];
    }
}

print join( ,map{$$_[...@failures).\n;
print join(,map{$$_[0]: $$_[1]\n}...@failures);

exit ($#failures>=0);


Re: problem with syslog mon 1.2.0

2009-01-16 Thread Ed Ravin
On Fri, Jan 16, 2009 at 08:36:22AM -0500, Jim Trocki wrote:
> On Thu, 15 Jan 2009, Tom Lieuallen wrote:
>
> > A while ago, I upgraded from mon-0.99.3-47 to mon-1.2.0.  I believe that
> > was the time when I stopped getting syslog output from mon.
>
> try the attached patch.
>
> there's a wrapper func for syslog which is meant to catch an exception
> if syslog throws one.

The wrapper is for the Perl Sys::Syslog module, which unwisely throws
an exception if it can't reach the syslog daemon.

Recent versions of that syslog module include a nofatal option when
opening up a syslog connection.  We could switch to that and make
Sys-Syslog-0.15 a prerequisite for installing Mon.
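
For anyone who wants to try it, here is a minimal sketch of what logging with that option might look like.  It assumes a Sys::Syslog new enough to understand "nofatal"; the ident, facility and helper name are placeholders rather than what Mon actually uses.  The eval is kept only as a belt-and-braces guard.

```perl
use strict;
use warnings;
use Sys::Syslog qw(openlog syslog closelog);

# Send one message to syslog without letting a missing syslog daemon
# kill the caller.  With "nofatal", failures become warnings.
sub log_quietly {
    my ($ident, $message) = @_;
    my $ok = eval {
        openlog($ident, "pid,nofatal", "daemon");
        syslog("info", "%s", $message);
        closelog();
        1;
    };
    return $ok ? 1 : 0;
}

log_quietly("mon-example", "syslog test message");
```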



Re: CUPS monitor?

2008-07-09 Thread Ed Ravin
On Wed, Jul 09, 2008 at 03:02:39PM -0700, Don Forbes wrote:
> I'm looking for a monitor to test our CUPS server for problems;
> disabled printers (printer state)... the usual statistics here.

Try using one of the http monitors with a URL that lists out the
printer state, and look for a match (or non-match) for the strings
you're looking for.

If you're going to roll your own, I see that there's a Net::Cups
module on CPAN.



Re: bash-based snmpdiskspace.monitor

2008-05-08 Thread Ed Ravin
On Thu, May 08, 2008 at 12:22:40PM -0400, Jonathan Baxter wrote:
> I recently installed mon on some opensuse 10.3 machines, but could not
> get the snmpdiskspace.monitor to work. I am not a perl guy, but it
> seems the information coming back from the perl Net::SNMP module is
> not correct, in that it assumes all fields are contiguous when they
> are not.

You're almost right on the money.  snmpdiskspace.monitor isn't using
Net::SNMP, it's using G.S. Marzot's SNMP.pm, which is substantially older
and less wise.  But that doesn't matter - the problem is, as you surmised,
in snmpdiskspace.monitor itself, which assumes there aren't any
discontinuities in the MIB and fails miserably when there are.  I don't
think this is the fault of the SNMP library, it's only doing what it's
being asked to do.

I'm going to try to redo snmpdiskspace.monitor's SNMP fetching so it
will work with discontinuous tables.
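
To illustrate the difference (this is not the monitor's actual code, just a sketch with made-up function names and a plain hash standing in for the fetched table), compare walking only the row indexes that exist with assuming the rows run 1..N without gaps:

```perl
use strict;
use warnings;

# Robust: visit whatever row indexes are present, in numeric order.
sub rows_in_index_order {
    my ($table) = @_;    # hashref: row index => value
    return map { [ $_, $table->{$_} ] } sort { $a <=> $b } keys %$table;
}

# Fragile: assume rows 1, 2, 3, ... and stop at the first missing index.
sub rows_assuming_contiguous {
    my ($table) = @_;
    my @rows;
    for (my $i = 1; exists $table->{$i}; $i++) {
        push @rows, [ $i, $table->{$i} ];
    }
    return @rows;
}

# Hypothetical storage table with a gap between index 2 and index 5
my %storage = (1 => "/", 2 => "/usr", 5 => "/home");
print "index order: ", scalar(my @all  = rows_in_index_order(\%storage)), " rows\n";
print "contiguous:  ", scalar(my @some = rows_assuming_contiguous(\%storage)), " rows\n";
```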

-- Ed



Re: Monitoring for a hung NFS mount?

2008-04-02 Thread Ed Ravin
On Wed, Apr 02, 2008 at 10:49:00AM -0700, Augie Schwer wrote:
> On the topic of NFS; the next step would be to do a compare between
> mtab and fstab and alert if everything you thought was mounted
> actually wasn't; seems pretty trivial, but anyone already have
> something written up?

No, but remember that the location and semantics of mount tables vary
drastically with the operating system - Solaris, for example (and IIRC),
keeps the mount table in-kernel, and you need to call an API to see what's
mounted.  The equivalent of mtab is actually a device driver that calls
the API, not a regular file.  So don't hard code any paths and use
test -e (existence), not test -f (exists and is a regular file) when
scripting in the sanity checks.
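
The same distinction exists in Perl's file tests.  A small sketch, with the candidate paths supplied by the caller rather than hard-coded (the function name and the example paths in the comment are only illustrations):

```perl
use strict;
use warnings;

# Return the first candidate mount table that exists and is readable.
# Uses -e rather than -f, so a table backed by a device or pseudo-file
# is accepted as well as a regular file.
sub usable_mount_table {
    my @candidates = @_;
    for my $path (@candidates) {
        return $path if -e $path && -r _;
    }
    return;
}

# Example call; which paths make sense depends on the operating system:
#   my $table = usable_mount_table("/etc/mtab", "/etc/mnttab");

# /dev/null shows the difference: it exists, but is not a regular file.
print "-e /dev/null: ", (-e "/dev/null" ? "true" : "false"), "\n";
print "-f /dev/null: ", (-f "/dev/null" ? "true" : "false"), "\n";
```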



Re: Putting a server into maintance

2008-03-12 Thread Ed Ravin
On Wed, Mar 12, 2008 at 09:44:59AM -0600, Michael Osburn wrote:
> > How are most of you managing planned system
> > downtime?

In most cases, our engineers log into Mon and use the host disable
or service disable to stop monitoring the stuff that's about to go
down, and re-enable them when the maintenance is over.

Sometimes, we just ACK whatever's broken when Mon starts alarming.

If I had a really big planned outage I would comment out big chunks of
the config file and restore it after the window.

> Or am I missing a feature to have mon check its configuration
> file and reload if it changes?

You are - look up "reset Mon" in the CGI or the API.  You can also
send Mon a HUP signal to make it reload its config.
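
From a script, sending the signal might look like the sketch below.  How you find the Mon process ID (a pidfile, ps, etc.) is site-specific and left out here; the function name is made up.

```perl
use strict;
use warnings;

# Send SIGHUP to the given process ID.  Returns the number of processes
# signalled (1 on success, 0 if the PID is invalid or the signal failed).
sub send_hup {
    my ($pid) = @_;
    return 0 unless defined $pid && $pid =~ /^\d+$/ && $pid > 0;
    return kill('HUP', $pid);
}

# Example: send_hup($mon_pid) after editing the config file, where
# $mon_pid is however your site records the running Mon's process ID.
```

The shell equivalent is simply kill -HUP followed by the PID.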



Re: mon not reporting on localhost

2007-12-06 Thread Ed Ravin
On Thu, Dec 06, 2007 at 04:18:01PM -0700, Osburn, Michael wrote:
...
### group definitions (hostnames or IP addresses)
hostgroup servers 127.0.0.1
watch servers
    service http
        interval 5m
        monitor http.monitor
        period wd {Mon-Fri} hr {7am-10pm}
            alert mail.alert [EMAIL PROTECTED]
            alertevery 1m
        period wd {Sat-Sun}
            alert mail.alert [EMAIL PROTECTED]
...


You need a blank line after every hostgroup definition, as described
in the mon man page:

 Hostgroup Entries
 Hostgroup entries begin with the keyword hostgroup, and are followed by
 a hostgroup tag and one or more hostnames or IP addresses, separated by
 whitespace.  The hostgroup tag must be composed of alphanumeric
 characters, a dash (-), a period (.), or an underscore (_).  Non-blank
 lines following the first hostgroup line are interpreted as more
 hostnames.  The hostgroup definition ends with a blank line. For example:

     hostgroup servers nameserver smtpserver nntpserver
             nfsserver httpserver smbserver

     hostgroup router_group cisco7000 agsplus



Re: mon project

2007-08-26 Thread Ed Ravin
On Sat, Aug 25, 2007 at 11:55:10AM -0400, Allan Wind wrote:
> On 2007-08-25T06:15:31-0700, Augie Schwer wrote:
> > I thought the same way you did a few months ago, that the Mon project
> > was dead, but it's not, it's just not very visibly alive. ;)
>
> This is bad when someone evaluates this project.  Can we set up a cvs
> mailing list that gets the patches, or some type of status report
> summarizing changes over the last month?

I believe the mon-devel list receives copies of all patches.



Re: Monitor for SSL Certificate expiration date

2007-07-16 Thread Ed Ravin
On Mon, Jul 16, 2007 at 10:41:15AM -0500, Owen Crow wrote:
> I've seen some tests mentioned in this list, but they point to broken links.
>
> It seems like this can be done with the openssl command line, but I
> can only get certificate date information _after_ the certificate
> expires.  If anyone knows how to extract an SSL certificate's
> expiration date remotely, I'd be happy to convert that into a monitor
> script.

Yes, I've wanted to do this for a long time.  You just inspired me to
read the man pages and it looks pretty straightforward to use the
openssl command line:

   # download the certificate:
   openssl s_client -connect server.example.com:443 < /dev/null > testme.pem

   # print out the expiration date:
   openssl x509 -noout -in testme.pem  -enddate

The output showing the expiration date looks like this:

   notAfter=Nov  3 18:58:34 1999 GMT

Which should be easy to feed to Date::Parse::str2time() to turn into a
Unix timestamp.
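
As a rough sketch of that parsing step: the code below uses the core Time::Piece module in place of Date::Parse (a substitution on my part, so it runs without extra modules), and assumes the date always has the month-day-time-year-GMT layout shown above.  The function names are illustrative only.

```perl
use strict;
use warnings;
use Time::Piece;

# Turn a line like "notAfter=Nov  3 18:58:34 1999 GMT" into seconds
# since the epoch (UTC).  Returns undef if the line does not parse.
sub parse_not_after {
    my ($line) = @_;
    return undef unless $line =~ /^\s*notAfter\s*=\s*(.*?)\s*$/;
    my $when = $1;
    my $t = eval { Time::Piece->strptime($when, "%b %d %H:%M:%S %Y GMT") };
    return undef unless defined $t;
    return $t->epoch;
}

# Whole days from $now until $expiry_epoch (negative once expired).
sub days_until {
    my ($expiry_epoch, $now) = @_;
    return int(($expiry_epoch - $now) / 86400);
}

my $expiry = parse_not_after("notAfter=Nov  3 18:58:34 1999 GMT");
print "expires in ", days_until($expiry, time), " days\n" if defined $expiry;
```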

> I'm primarily interested in HTTPS, but it seems like this would be
> generic for any SSL/TLS-protected service.

The openssl command line man page says it also supports SMTP and POP
protocol for downloading certificates:

  openssl s_client -connect mail.example.com:25 -starttls smtp < /dev/null > testme.pem

Or -starttls pop3 for a POP server.  No IMAP support, unfortunately.

Here's a possible starting point:

   sslcert.monitor [--protocol {https|smtp|pop3}] [--port NNN]
   [--expirewarn NN] host [...]

Where the port number defaults to 443, and expirewarn defaults to 30 days
(i.e. alarm if the server's certificate expiration date is within 30 days).
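
A possible shape for that option parsing, using Getopt::Long.  The option names and the port/expirewarn defaults come from the proposal above; defaulting --protocol to https is my assumption, and parse_args is just a name picked for the sketch.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse the proposed command line.  Returns (\%options, @hosts) on
# success, or an empty list if the arguments are not acceptable.
sub parse_args {
    my @args = @_;
    my %opt = (protocol => "https", port => 443, expirewarn => 30);
    GetOptionsFromArray(\@args, \%opt,
        "protocol=s", "port=i", "expirewarn=i") or return;
    return unless $opt{protocol} =~ /^(?:https|smtp|pop3)$/;
    return unless @args;    # need at least one host
    return (\%opt, @args);
}

my ($opt, @hosts) = parse_args("--protocol", "smtp", "--port", "25",
                               "mail.example.com");
print "checking @hosts on port $opt->{port}, warn at $opt->{expirewarn} days\n";
```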

Later on we could add bells and whistles to check the verification chain,
warn on self-signed certs, etc.

If you start the script I'll help you finish it.  I suggest writing it
in Perl since I know it'll have no problem parsing the expiration date
output.



Re: Monitor for SSL Certificate expiration date

2007-07-16 Thread Ed Ravin
On Mon, Jul 16, 2007 at 07:14:38PM +0200, Jan-Frode Myklebust wrote:
> On 2007-07-16, Owen Crow <[EMAIL PROTECTED]> wrote:
> >
> > It seems like this can be done with the openssl command line, but I
> > can only get certificate date information _after_ the certificate
> > expires.  If anyone knows how to extract an SSL certificate's
> > expiration date remotely, I'd be happy to convert that into a monitor
> > script.
>
> Thanks for the offer, I could use something like that :-)
>
> $ echo "" | openssl s_client -connect mail.altibox.no:443 2>/dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' | openssl x509 -text | grep "Not After :"

No need to parse out the certificate with sed - as implied in my previous
message, openssl seems to be able to ignore the non-certificate portions
of the file:

    openssl s_client -connect www.example.com:443 2>/dev/null </dev/null | openssl x509 -noout -enddate

But if I were scripting this, I would call the two openssl commands
separately and save the output to a file, so that I could detect failures
more reliably...
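
Roughly what I have in mind, as an untested-against-real-servers sketch: each step is run on its own and its exit status checked before moving on.  The helper names are made up, and the host and file name are interpolated straight into a shell command, so this assumes they come from a trusted config rather than from user input.

```perl
use strict;
use warnings;

# Run one shell command; return its exit status and captured stdout.
sub run_step {
    my ($cmd) = @_;
    my $output = `$cmd`;
    $output = "" unless defined $output;
    my $status = $? >> 8;    # high byte of $? holds the exit code
    return ($status, $output);
}

# Step 1: save the certificate to $pemfile.  Step 2: read its end date.
# Returns (1, $enddate_line) on success or (0, $error_text) on failure.
sub fetch_enddate {
    my ($hostport, $pemfile) = @_;
    my ($rc, $out) = run_step(
        "openssl s_client -connect $hostport </dev/null >$pemfile 2>/dev/null");
    return (0, "s_client exited with status $rc") if $rc != 0;
    ($rc, $out) = run_step("openssl x509 -noout -enddate -in $pemfile 2>&1");
    return (0, "x509 exited with status $rc: $out") if $rc != 0;
    chomp $out;
    return (1, $out);
}

# Example:
#   my ($ok, $text) = fetch_enddate("www.example.com:443", "/tmp/testme.pem");
```

With the pipeline version, only the exit status of the last command is visible, which is why the steps are split here.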



mon.cgi: patch to untaint monitor output

2007-07-12 Thread Ed Ravin
One of my custom monitors was printing the output of a syslog entry
as its summary output.  The syslog entry was from a mail program,
so it had stuff like to=<[EMAIL PROTECTED]>.  But in mon.cgi's
output in my Web browser, it just said to=.

This is because mon.cgi is just dropping the output into a web page,
where the browser parses it for HTML.  Years ago we added a call
to HTML::Entities::encode_entities to mon.cgi so that messages typed
in with the ACK command would not get confused with HTML - the attached
patch extends that functionality to monitor output.

There are security implications here - if an outside party could get
control of the output of a Mon script (easy in my case, since the data
comes from syslog and includes error messages from a remote host), that
outside party could cobble together some HTML that eventually gets
executed in the Mon user's browser (i.e. cross-site scripting).

The attached patch renames the untaint_ack_msgs parameter to
untaint_all_msgs and, when set, not only untaints ACK messages,
but also untaints last_summary and last_detail output before displaying
it.

I recommend that we remove this parameter completely and always untaint
messages before displaying them in the CGI interface - it's the right
thing to do.  The cost is one more Perl module dependency, but there
are already a host of Perl modules needed to run mon and one more is
not going to hurt much.
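
To show the idea in isolation, here is a small sketch.  escape_html is a hand-rolled stand-in so the example runs without HTML::Entities installed (the patch itself calls HTML::Entities::encode_entities), and summary_cell is a made-up helper, not a function in mon.cgi.  Note the parentheses around the conditional: in Perl "." binds more tightly than "?:", so they are needed when the result is concatenated.

```perl
use strict;
use warnings;

# Minimal stand-in for HTML::Entities::encode_entities: escapes the
# characters a browser would otherwise treat as markup.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Build the summary markup, escaping the monitor output when asked to.
sub summary_cell {
    my ($summary, $untaint) = @_;
    return "<font size=+1>"
         . ($untaint ? escape_html($summary) : $summary)
         . "</font>\n";
}

print summary_cell('delivery failed: to=<user@example.com>', 1);
```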

I haven't worked on this code for several years so I may have missed
something - an extra pair of eyes on this patch would be appreciated.

-- Ed
--- mon.cgi.pl  2005-04-21 16:56:29.0 -0400
+++ mon2.cgi.pl 2007-07-12 10:32:59.0 -0400
@@ -143,7 +143,7 @@
             $monhost_and_port_args $monhost_and_port_args_meta
             $has_read_config $moncgi_config_file $cf_file_mtime
             $mon11
-            $untaint_ack_msgs @show_watch @no_watch $show_watch_strict
+            $untaint_all_msgs @show_watch @no_watch $show_watch_strict
             $required_mon_client_version);
 # Formatting-related global vars
 use vars qw($BGCOLOR $TEXTCOLOR $LINKCOLOR $VLINKCOLOR 
@@ -224,7 +224,7 @@
 $cookie_name = "mon-cookie";   #name of cookie given to browser for auth
 $cookie_path = "/";  # path for auth cookie
                       # Set this to "" for auto-path set
-$untaint_ack_msgs = "yes";   # Use HTML::Entities to scrub user-supplied ack messages (recommended!)
+$untaint_all_msgs = "yes";   # Use HTML::Entities to scrub user-supplied ack messages (recommended!)
 # Define optional regexes in the @show_watch variable,
 # and only hostgroups which match one of these regexes
 # will be shown.
@@ -367,10 +367,10 @@
 #
 # Used to escape HTML in ack's
 #
-if ($untaint_ack_msgs =~ /^y(es)?$/i) {
+if ($untaint_all_msgs =~ /^y(es)?$/i) {
     eval "use HTML::Entities" ;
 } else {
-    undef $untaint_ack_msgs;
+    undef $untaint_all_msgs;
 }
 
 
@@ -1104,7 +1104,7 @@
         # user requested it, otherwise, just pass it on through 
         # as is.
         if ( $op{$group}{$service}{'ack'} != 0 ) {
-            if ($untaint_ack_msgs) {
+            if ($untaint_all_msgs) {
                 #
                 # We untaint
                 #
@@ -1186,7 +1186,8 @@
         $webpage->print("<td align=left bgcolor=\"$td_bg_color\">");
         $webpage->print("$service_disabled_string<a href=\"$url?${monhost_and_port_args}command=svc_details&amp;args=$group,$service\">");
         $webpage->print("<font size=+1><b>${service}</b></font></a>${desc_string} : \n");
-        $webpage->print("<font size=+1>$s->{last_summary}</font>\n");
+        $webpage->print("<font size=+1>" .
+            $untaint_all_msgs ?  HTML::Entities::encode_entities($s->{last_summary}) : $s->{last_summary} . "</font>\n");
         $webpage->print("<br>($failure_string)");
         $webpage->print(" ${service_acked_string}") if $service_acked_string ne "";
         $webpage->print("</td>\n");
@@ -1642,9 +1643,18 @@
     $webpage->print("</td></tr>");
 
     # Now print the detail and summary information for the failed service
-    $op{$group}->{$service}->{'last_summary'} = "&lt;not specified&gt;" if $op{$group}->{$service}->{'last_summary'} eq "" ;
-    $op{$group}->{$service}->{'last_detail'} = "&lt;not specified&gt;" if $op{$group}->{$service}->{'last_detail'} eq "" ;
-    $op{$group}->{$service}->{'last_detail'} =~ s/\n/<BR>/g;
+    if ($op{$group}->{$service}->{'last_summary'} eq "") {
+        $op{$group}->{$service}->{'last_summary'} = "&lt;not specified&gt;";
+    } elsif ($untaint_all_msgs) {
+        $op{$group}->{$service}->{'last_summary'} = HTML::Entities::encode_entities($op{$group}{$service}{'last_summary'})
+    }
+    if ($op{$group}->{$service}->{'last_detail'} eq "" ) {
+        $op{$group}->{$service}->{'last_detail'} = "&lt;not specified&gt;"
+    } elsif ($untaint_all_msgs) {
+

rbl.monitor - warn if mailservers are in a blacklist

2007-07-03 Thread Ed Ravin
I've rewritten the prototype rbl.monitor that was submitted by Tim
Hanes a while back.  This version uses asynchronous DNS requests
(using Net::DNS), and allows for an external list of the RBL zones
to check.  It also has a master timeout in case it gets stuck on
any of the DNS queries.
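
For readers unfamiliar with how an RBL lookup is formed, here is a minimal sketch (not taken from the attached script): the IPv4 octets are reversed and the RBL zone is appended, and the resulting name is what gets sent to DNS. The helper name dnsbl_query_name is invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of rbl.monitor): build the DNS name that
# is looked up to ask whether an IPv4 address is listed in an RBL zone.
sub dnsbl_query_name {
    my ($ip, $zone) = @_;
    my @octets = split /\./, $ip;
    return undef unless @octets == 4;
    for my $o (@octets) {
        return undef unless $o =~ /^\d+$/ && $o <= 255;
    }
    # 192.0.2.10 checked against bl.example becomes 10.2.0.192.bl.example
    return join(".", reverse(@octets), $zone);
}

# 127.0.0.2 is the conventional test entry for DNS blacklists.
print dnsbl_query_name("127.0.0.2", "zen.spamhaus.org"), "\n";
```

An A record coming back for such a name means the address is listed; no answer means it is not.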

Please let me know what you think of it.  I haven't put this in
Sourceforge yet, I want to make sure it runs in at least one other
environment besides mine.

-- Ed

#!/usr/bin/perl

# rbl.monitor - check RBL blacklists for an IP address.  Uses asynch I/O
# to send all the requests simultaneously

my $usage="\
Usage: rbl.monitor [options] hostname [...]

Options [and default values]:

    --listfile list of RBL domains [preset list, see script]
    --rbllist comma separated list of RBL domains
    --timeout  master timeout  [60 seconds]
    --debug  [off]
";


use strict;

use Net::DNS;
use IO::Select;
use Getopt::Long;

my %opt;
GetOptions(\%opt,
    "listfile=s",
    "rbllist=s",
    "timeout=i",
    "debug",
) or die $usage;

my $listfile= $opt{listfile} || "";
my $rbllist= $opt{rbllist} || "";
my $selecttimeout = 5;
my $timeout= ($opt{timeout} || 60) + ($selecttimeout * 2);
my $debug= $opt{debug} || 0;


# Default RBLs to check - just a few of the lists most likely to block mail
# Sites with specific needs should customize via the command line
my @rbls2check=(
    "bl.spamcop.net",
    "relays.mail-abuse.org",
    "zen.spamhaus.org",
    "dnsbl.sorbs.net",
    "dnsbl-1.uceprotect.net",
);

if ($listfile) {
    open(LIST, "<", $listfile) ||
        die "$0: cannot open list file \"$listfile\": $!\n";
    # skip comment lines and blank lines
    @rbls2check= grep !/^\s*#/, <LIST>;
    @rbls2check= grep !/^\s*$/, @rbls2check;
    chomp @rbls2check;
    close LIST;
    die "$0: no RBL names found in \"$listfile\"\n" unless @rbls2check;
}

if ($rbllist) {
    @rbls2check= split(',', $rbllist);
}

print "*** checking these RBLs:\n". join("\n   ", @rbls2check) . "\n"
    if $debug;

my (@summary, @detail);
my @sockets;


my $res  = Net::DNS::Resolver->new;
my $sel  = IO::Select->new();
my $starttime= time;

my %hostpart2host;

# gethostbyname is non-reentrant, so do all the queries up front
foreach my $host (@ARGV) {
    my $hostdata= gethostbyname($host);
    if (!defined($hostdata)) {
        push @summary, $host;
        push @detail, "$host: bad hostname";
        next;
    }
    # reversed dotted quad, e.g. 1.2.3.4 becomes 4.3.2.1
    my $hostpart= join(".", reverse(unpack("C4", $hostdata)));
    $hostpart2host{$hostpart}= $host;
}

# start all the queries
foreach my $hostpart (keys %hostpart2host) {
    foreach my $rbl (@rbls2check) {
        my $dnssock=  $res->bgsend(join(".", $hostpart, $rbl));
        push @sockets, $dnssock;
        $sel->add($dnssock);
    }
}

MAINLOOP:
while ($sel->handles > 0) {
    my @ready = $sel->can_read($selecttimeout);
    if ( (time - $starttime) > $timeout) { # waited too long?
        push @detail, "TIMEOUT: " . scalar($sel->handles) . " responses still pending";
        last MAINLOOP;
    }
    foreach my $sock (@ready) {
        my ($authority, $ipaddress, $hostpart, $host);
        my $packet = $res->bgread($sock);
        foreach my $rr ($packet->answer) {
            if ($rr->type eq "A") {
                $ipaddress= $rr->address;
                $authority= $rr->name;
                if ($authority=~ /^(\d+\.\d+\.\d+\.\d+)\./) {
                    $hostpart= $1;
                    $host= $hostpart2host{$hostpart};
                } else { $host= "???" }
                # compare as strings: $host may contain regexp metacharacters
                push @summary, $host
                    unless grep { $_ eq $host } @summary;
                push @detail, "$host: $authority: " . $rr->address;
            }
        }
        $sel->remove($sock);
    }
}

print join(" ", (sort @summary)) if (@summary);
print "\n";

print join("\n", (sort @detail)), "\n"  if @detail;

exit 1 if @summary;
exit 0;
___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon


Re: Mon not logging to Syslog

2007-06-19 Thread Ed Ravin
On Mon, Jun 18, 2007 at 04:31:33PM +0200, Radosław Kozłowski wrote:
> I'm running mon r1.22 on RHEL 4 with perl 5.8.5 and the following line:
>
>    my @log = map { s/\%//mg; } @_;
>
> in
>
> no warnings; # Redefining syslog
> sub syslog {
>     eval {
>         local $SIG{__DIE__}= sub { };
>         my @log = map { s/\%//mg; } @_;
>         Sys::Syslog::syslog(@log);
>     }
> }
> use warnings;
>
> breaks logging to syslogd for me (ie. nothing is being logged). If I
> pass @_ to Sys::Syslog::syslog directly, everything works fine.
>
> What's the idea behind this substitution, why is it needed?

I just realized, I didn't fully answer your question.

There are two things happening in Mon's local definition of syslog().

One, as I mentioned earlier, was wrapping it with an eval block so
if Sys::Syslog decided to call die() it wouldn't kill the Mon server.

The other is a workaround for a possible syslog vulnerability - syslog
has printf-like processing with % arguments, but Mon doesn't use
that feature.  If somehow Mon syslogged something with % signs in it,
you might see garbage in the log, or worse yet, a Perl error or crash.
There was a Perl advisory on this a year or two ago, so the
"my @log = map { s/\%//mg; } @_;" was added to delete any % signs that
may have ended up in strings intended for syslog.

If that map {} statement is giving you trouble, you could try replacing
these two lines:

    my @log = map { s/\%//mg; } @_;
    Sys::Syslog::syslog(@log);

with:

    Sys::Syslog::syslog($_[0], "%s", $_[1])

This should work as long as all the Mon syslog statements in the
current release use only two arguments (priority, string) to syslog()
(which I believe they do, but haven't the time to check at the moment).
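
A small sketch of one way that map line can end up with nothing useful to log (an illustration only, not a tested diagnosis of the reported setup): inside map, the value of the block is the return value of s///, which is a substitution count or a false value, not the modified string. The function names below are made up for the example.

```perl
use strict;
use warnings;

# What the quoted line collects: the return values of s///, not the text.
sub strip_percent_buggy {
    my @in  = @_;                       # copy so the caller's data is untouched
    my @log = map { s/\%//mg; } @in;    # each element is a count or a false value
    return @log;
}

# One way to get the cleaned strings instead: copy, substitute, return the copy.
sub strip_percent {
    return map { my $s = $_; $s =~ s/%//g; $s } @_;
}

my @buggy = strip_percent_buggy("info", "disk at 100% full");
my @fixed = strip_percent("info", "disk at 100% full");
print "buggy: [", join("][", @buggy), "]\n";
print "fixed: [", join("][", @fixed), "]\n";
```

The second form keeps the priority and message intact while still removing the % signs.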



Re: Mon not logging to Syslog

2007-06-18 Thread Ed Ravin
On Mon, Jun 18, 2007 at 04:31:33PM +0200, Radosław Kozłowski wrote:
> I'm running mon r1.22 on RHEL 4 with perl 5.8.5 and the following line:
>
>    my @log = map { s/\%//mg; } @_;
>
> in
>
> no warnings; # Redefining syslog
> sub syslog {
>     eval {
>         local $SIG{__DIE__}= sub { };
>         my @log = map { s/\%//mg; } @_;
>         Sys::Syslog::syslog(@log);
>     }
> }
> use warnings;
>
> breaks logging to syslogd for me (ie. nothing is being logged). If I
> pass @_ to Sys::Syslog::syslog directly, everything works fine.
>
> What's the idea behind this substitution, why is it needed?

Older versions of the Sys::Syslog module had an interesting feature -
they would crash the calling Perl program if your syslog server was
not running.  As of Sys::Syslog-0.15, you can supply the option "nofatal"
when calling openlog(), which will only cause a warning to be emitted
rather than crashing your program.

The workaround above was written before the maintainers of Sys::Syslog
agreed to do anything about the problem - originally when I filed
a bug report they refused to accept it, claiming that it wasn't erroneous
behavior and that to change it now might break someone else's code.



mon disable/enable bug when one host is in two groups?

2007-05-18 Thread Ed Ravin
I have something like this in my Mon config:

hostgroup misc_hosts fnord

hostgroup herd_hosts fnord word bord lord dord

When I try to disable fnord in mon.cgi - it doesn't get disabled -
instead the hostgroup misc_hosts gets disabled.  I think this code
(lines 2829-2842) is to blame:

    foreach my $h (@hosts)
    {
        if (my $g=host_singleton_group($h) ) {
            disen_watch($g, 0);
            $stchanged++;
            mysystem("$CF{MONREMOTE} disable watch $g") if ($CF{MONREMOTE});
        } else {
            disen_host ($h, 0);
            $stchanged++;
            mysystem("$CF{MONREMOTE} disable host $h") if ($CF{MONREMOTE});
        }
    }
    sock_write ($fh, "220 disable host completed\n");
}

Note how if a host is in a singleton group that watch gets disabled,
and then the code assumes there's nothing more to do, even though the
host might be in another group as well.
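
One possible shape for a fix, as an untested sketch: disable the host itself in every case, and additionally disable the watch when the host is the only member of some group. The subs below are toy stand-ins that merely share names with mon's internals so the example runs on its own; whether the real disen_host behaves well for a singleton group is an assumption.

```perl
use strict;
use warnings;

# Toy stand-ins for mon's internals, only so the sketch is self-contained.
my %groups = (
    misc_hosts => [qw(fnord)],
    herd_hosts => [qw(fnord word bord lord dord)],
);
my @actions;    # records what would have been disabled

sub host_singleton_group {      # name of a one-host group containing $h, or ""
    my ($h) = @_;
    for my $g (sort keys %groups) {
        my @members = @{ $groups{$g} };
        return $g if @members == 1 && $members[0] eq $h;
    }
    return "";
}
sub disen_watch { push @actions, "watch $_[0]" }
sub disen_host  { push @actions, "host $_[0]" }

# Possible fix: the host is always disabled; the singleton watch is
# disabled in addition, not instead.
sub disable_hosts {
    for my $h (@_) {
        if (my $g = host_singleton_group($h)) {
            disen_watch($g, 0);
        }
        disen_host($h, 0);
    }
}

disable_hosts("fnord");
print join(", ", @actions), "\n";
```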



Re: Leave alert's ack'd through restart

2007-03-07 Thread Ed Ravin
On Thu, Mar 08, 2007 at 01:40:04PM +1030, Ben Ragg wrote:
> We've noticed when we reread the config files that our ack'd devices
> become unack'd. Is there a way to avoid this?

Assuming you're using one of the recent versions of mon, your
man page should mention this command-line option:

  -l statetype
  Load state from the last saved state file. The  supported saved
  state  types  are  disabled  for disabled watches, services, and
  hosts, opstatus for failure/alert/ack status  of  all services,
  and  all  for  both.   If  no statetype is provided, disabled is
  assumed.



Re: fping.monitor output problems

2006-12-20 Thread Ed Ravin
On Tue, Dec 13, 2005 at 04:53:21PM +0100, Hans Kinwel wrote:
>
> I finally went to the bottom of this.  Not that it is rocket science.
>
> When I do "fping 1.2.3.4" I get
>
> ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4
> ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4
> ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4
> 1.2.3.4 is unreachable
>
> In the (new, broken) fping.monitor I see:
>
>     if (/^(\S+).*unreachable/i)
>     {
>         push (@unreachable, $1);
>     }
...
> It is evident now.  Some well intending person (probably Jim, from
> the RCSid) added a /i and now that string matches with the ICMP Host
> Unreachable, to which it is not supposed to match. It is supposed later
> to match with a "do nothing" clause that is indeed the right thing to do.
>
> So if somebody would be so kind to remove that /i I will be much obliged.

I just got around to dealing with fping.monitor on my site.  I removed
the /i, got rid of my dumb idea to redirect stdout to /dev/null (it
loses the "address not found" error messages), and rearranged how
fping.monitor treats unidentified output, with this simple patch:

--- fping.monitor   2006/12/20 22:46:26 1.3
+++ fping.monitor   2006/12/20 23:07:36
@@ -63,6 +63,7 @@
     die "could not open pipe to fping: $!\n";
 
 my @unreachable;
+my @unidentified;
 my @alive;
 my @addr_not_found;
 my @slow;
@@ -122,7 +123,7 @@
 
     else
     {
-        print STDERR "unidentified output from fping: [$_]\n";
+        push @unidentified, $_;
     }
 }
 
@@ -216,6 +217,17 @@
     }
 }
 
+if (@unidentified != 0)
+{
+    print <<EOF;
+
+--
+unidentified output from fping
+--
+EOF
+    print join("\n", @unidentified), "\n";
+}
+
 #
 # traceroute
 #
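
To see the effect of the /i flag in isolation, here is a small self-contained sketch that runs the two variants of the pattern over the fping output quoted above. The helper name unreachable_hosts is invented for the example and is not part of fping.monitor.

```perl
use strict;
use warnings;

my @fping_output = (
    "ICMP Host Unreachable from 194.178.10.133 for ICMP Echo sent to 1.2.3.4",
    "1.2.3.4 is unreachable",
);

# Collect the first word of every line that matches the given pattern.
sub unreachable_hosts {
    my ($re, @lines) = @_;
    return map { /$re/ ? $1 : () } @lines;
}

my @with_i    = unreachable_hosts(qr/^(\S+).*unreachable/i, @fping_output);
my @without_i = unreachable_hosts(qr/^(\S+).*unreachable/,  @fping_output);

print "with /i:    @with_i\n";       # also picks up "ICMP" from the ICMP notice
print "without /i: @without_i\n";    # only the real host
```

With the flag, the word "ICMP" is reported as if it were an unreachable host; without it, only 1.2.3.4 is.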



Re: netsnmp-freespace.monitor - No log handling enabled

2006-11-29 Thread Ed Ravin
Did you upgrade or downgrade the net-snmp libraries?  Googling for
"No log handling enabled - turning on stderr logging" suggests that
there is a Perl / NetSNMP library mismatch.

Another possibility is that someone installed another copy of net-snmp
/ ucd-snmp in a directory in the library path, or the library path
was changed and points to the wrong place.

On Wed, Nov 29, 2006 at 09:35:27AM -0700, Jeff Montagna wrote:
> Anyone have an idea of why logging would suddenly be disabled?
> netsnmp-freespace.monitor was working fine yesterday then broke today.
>
> [EMAIL PROTECTED]:mon.d$ ./netsnmp-freespace.monitor sts1
> No log handling enabled - turning on stderr logging
> truncating integer value to 32 bits
> truncating integer value to 32 bits
> truncating integer value to 32 bits
> [EMAIL PROTECTED]:mon.d$ ./netsnmp-freespace.monitor ltxnic7
> No log handling enabled - turning on stderr logging
> truncating integer value to 32 bits
> ltxnic7
>
> ltxnic7:/(/dev/dsk/c0t0d0s0) total=7468 used=6002(89%) free=718
> err=/: less than 90 free (= 736195)



Re: Contrib: RBL lookup monitor

2006-10-13 Thread Ed Ravin
On Fri, Oct 13, 2006 at 10:28:06AM +0100, Tim Haynes wrote:
> I've implemented an RBL monitor for work - checks to see if hosts are
> listed in a blacklist, so I thought I'd contribute it back to mon -
> see attached. (Work have approved its release under the GPL.)
>
> Usage: rbl.monitor host [...host...]
>
> Bugs: it would be more elegant if the list of RBL domains were a
> parameter; as it is, it's obvious what to change in the script.

Thanks, I've wanted one of these for a while, can't wait to try it out!

Looking over the code, I have a couple of questions - you don't seem to
set server timeouts anywhere, what if a blacklist isn't responding?
Sometimes DNS queries can hang for 30 seconds or more, we don't want
that to bog down the monitoring script.  Of course, that would probably
require using Net::DNS and fine-tuning the lookups.

Have you seen the blacklist checker at http://www.dnsstuff.com/
(center column, "Spam database lookup")?  I've been using that from
time to time to see if any of my mail servers are in the hall of fame.
They check a whopping 271 blacklists, and we've found our servers
caught every now and then by some of the more obscure lists.

I hope to try out your script in the next few days.  I will probably
be unable to refrain from adding features to it - besides the
timeout stuff mentioned above, I'd like the option to load the
blacklists from an external file - no way to put 271 blacklists
on the command line or into the script!


-- Ed



Re: Getting %20 instead of spaces

2006-07-26 Thread Ed Ravin
On Wed, Jul 26, 2006 at 03:52:55PM -0500, Tim Carr wrote:
> OK, I checked my Client.pm module, and found this as a version:
>
> #
> # Perl module for interacting with a mon server
> #
> # $Id: Client.pm 1.4 Thu, 11 Jan 2001 08:42:17 -0800 trockij $

That looks kind of old.  You want to use this one:

  # $Id: Client.pm,v 1.1.1.1 2004/06/18 14:25:16 trockij Exp $

> I've done a perl -MCPAN -e shell and tried to install Mon::Client, and
> it said I was running the latest and greatest.
>
> Is there a later version somewhere?

Yes, probably the same place you got the new mon.  It should also be on
Sourceforge.  My version is mon-client-1-0-0pre2 .



New version of monfailures client

2006-05-22 Thread Ed Ravin
On Tue, May 16, 2006 at 02:46:54PM -0400, Ed Ravin wrote:
> I need to automate the "kick something when it falls over" stage of
> system management.  Mon is the way we detect that things have fallen
> over, but the host Mon runs on is not the host that has the privileges
> to kick things.  So here's my question:
>
> Has anyone built a Mon client that can make decisions (or invoke scripts)
> based on the status of a particular service?  I can cook something up
> if needed, but thought it would be wise to see what other people are doing.
> I think what I need is a client that will return a non-zero exit status
> if a particular watch/service is down for N seconds and not acked.

Didn't get any responses.  I ended up updating the monfailures client
to give it a few new features to dump out the individual fields in
Mon's entry for the service, and to control listing based on the value
of a field.  A bit primitive, but useful for when you need simple
information that would otherwise require putting the Mon API interface
into some other script.  The new version of monfailures is attached.
I also improved the -include and -exclude features to work in more cases,
and to work for service names as well as watch names, and added a perldoc
man page.
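
To give an idea of what a test such as --testfield 'fieldname op value' involves, here is a minimal standalone sketch. It is not the code from the attached script: it looks operators up in a table of code refs rather than evaluating a string, and the record's field names (ack, downtime) are placeholders, not Mon's real ones.

```perl
use strict;
use warnings;

# Comparison operators handled by this sketch, each as a plain code ref.
my %compare = (
    "==" => sub { $_[0] == $_[1] },
    "!=" => sub { $_[0] != $_[1] },
    "<"  => sub { $_[0] <  $_[1] },
    ">"  => sub { $_[0] >  $_[1] },
);

# Returns 1 or 0 for a test like "ack == 0" applied to one service record;
# dies on anything that does not look like "fieldname op integer".
sub field_test {
    my ($record, $teststr) = @_;
    my ($field, $op, $value) = split ' ', $teststr;
    die "bad test '$teststr'\n"
        unless defined $value
            && $field =~ /^\w+$/
            && exists $compare{$op}
            && $value =~ /^-?\d+$/;
    die "no such field '$field'\n" unless exists $record->{$field};
    return $compare{$op}->($record->{$field}, $value) ? 1 : 0;
}

# Made-up record; the field names are placeholders for illustration.
my %service = ( ack => 0, downtime => 900 );
print field_test(\%service, "downtime > 600"), "\n";
print field_test(\%service, "ack != 0"), "\n";
```

A wrapper script could turn the 1/0 result into an exit status and kick the service when the test succeeds.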

-- Ed

#!/usr/local/bin/perl5.6.1 -w

# Quickly show Mon failure status from command line.

# to configure, hard-code the user and password for either
# your public Mon username or a username that is only allowed
# to use the list command and nothing else.  I run this
# script out of inetd on the mon server so the people who can
# see its results can't read the script (and see the hard-coded
# password).

# use --exclude or --include (or set their default values in the
# script) to exclude or include only particular regexp matches of
# watches.

# other features (-fields, -match) for getting out more data
# or for testing for failed services via command line

# Written by Ed Ravin [EMAIL PROTECTED] Jan  2002.
# made available to the public by courtesy of PANIX (http://www.panix.com).
# This script is licensed under the GPL.

# Updated May 2006 with field control and other features


# $Header: /devel/build/NetBSD/mon/mon-1.1-devel/mon/clients/RCS/monfailures,v 1.8 2006/05/20 01:23:52 root Exp $

use strict;


my %opt;
use Getopt::Long;

my $usage="Usage: monfailures [--server host] [--port port] [--user user] [--password pw] [--timeout n] [--include watch-regexp] [--exclude watch-regexp] [--fields {ALL|f1,f2,...}] [--testfield 'fieldname op value']\n";

die $usage unless
  GetOptions (\%opt, "debug",  "testfield=s", "fields=s", "server=s", "port=s",
     "timeout=i", "user=s", "password=s", "include=s", "exclude=s");

# configurable stuff - or put in defaults file
my $defaults_file= "/etc/mon/monfailures.cf";

my $default_user="public";
my $default_password= "readonly";
my $default_server= "localhost";
my $default_timeout= 120;

my $default_include= ".*";
my $default_exclude= "";



my $debug= $opt{'debug'} || 0;

my @fields= ();
if (exists($opt{'fields'}))
{
    @fields= split ',' , $opt{'fields'};
}

my $teststr= $opt{'testfield'} || "";
my ($testfield, $testop, $testval)= ("", "", "");
if (length($teststr))
{
    ($testfield, $testop, $testval)= split(' ', $teststr);
    warn "testfield=$testfield, testop=$testop, testval=$testval\n" if $debug;

    die "$0: illegal characters in --testfield option\n"
        if $testfield =~ /[`'$ ]/;

    die "$0: illegal fieldname in --testfield option\n"
        unless $testfield =~ /^\w+$/;

    die "$0: illegal test operator in --testfield option\n"
        unless ($testop eq "+" or $testop eq "-" or $testop eq "=="
            or $testop eq "!=" or $testop eq "<" or $testop eq ">");

    die "$0: illegal integer value in --testfield option\n" unless
        $testval =~ /^-?\d+$/;
}

my (%failures, %disabled);
my ($now);

use Mon::Client;

# format of defaults file:
#  keyword = VALUE (no spaces allowed in VALUE)
#  leading # sign for comments
#  valid keywords: user, password, server, include, exclude, timeout

if (-f $defaults_file)
{
    if ( open(DEF, "<", $defaults_file))
    {
        my @defaults= <DEF>;
        close DEF;

        foreach $_ (@defaults)
        {
            next if /^\s*#/;
            next if /^$/;
            $default_user= $1 if /^\s*user\s*=\s*(\S+)/;
            $default_password= $1 if /^\s*password\s*=\s*(\S+)/;
            $default_server= $1 if   /^\s*server\s*=\s*(\S+)/;
            $default_include= $1 if   /^\s*include\s*=\s*(\S+)/;
            $default_exclude= $1 if   /^\s*exclude\s*=\s*(\S+)/;
            # the timeout keyword was being matched against "exclude"
            $default_timeout= $1 if   /^\s*timeout\s*=\s*(\S+)/;
        }
    }
    else
    {
        warn "monfailures: cannot open defaults file $defaults_file: $!\n";
    }
}
my

Re: 'Sunnyvale, You Have a Problem'

2006-05-18 Thread Ed Ravin
On Thu, May 18, 2006 at 06:53:38PM -0400, Jim Trocki wrote:
> Think of a worldwide network of mon servers

And think of your job, and mine, reduced to waiting around for someone
in India to call us and tell us to reboot the server...

And then think of all those idle network engineers in India, after
their jobs move to China or the Philippines or whichever country is
next in the race to undercut high labor costs...



Using mon to kick things?

2006-05-16 Thread Ed Ravin
I need to automate the "kick something when it falls over" stage of
system management.  Mon is the way we detect that things have fallen
over, but the host Mon runs on is not the host that has the privileges
to kick things.  So here's my question:

Has anyone built a Mon client that can make decisions (or invoke scripts)
based on the status of a particular service?  I can cook something up
if needed, but thought it would be wise to see what other people are doing.
I think what I need is a client that will return a non-zero exit status
if a particular watch/service is down for N seconds and not acked.



Updates to snmpdiskspace.monitor

2006-03-20 Thread Ed Ravin
 an SNMP timeout.
#                  Default is retry 5 times.
#
#  --timeout       Seconds to wait before declaring a timeout on an SNMP get.
#                  Default is 20 seconds.
#
#  --free          The default minimum free space, in a percentage or absolute
#                  quantity, as per the config file. Thus, arguments of, for
#                  example, "20%", "1gb", "50mb" are all valid.
#                  Default is 5% free on every partition checked.
#
#  --ifree         The default minimum free inode percentage, specified as
#                  a percentage.  Default is 5% free.
#
#  --list          Give a verbose listing of all partitions checked on all
#                  specified hosts.
#
#  --listall       like --list, but also lists the thresholds defined for
#                  each filesystem, so you can doublecheck the config file
#
#  --usemib        Choose which MIB to use: one or more of host, perf, ucd
#                  Default tries all three, in that order
#
#  --ucddiskmatch  regexp to specify which disk devices in the UCD MIB should
#                  be considered as physical disks.
#
#  --disktypematch Specify (by the last component of the OID, a decimal
#                  number) the disk types to be monitored when using the MS
#                  Perf MIB or the HOST MIB.
#
#  --debug         enable debug output for config file parsing, MIB fetching,
#                  and which filesystems are selected for monitoring.
#
#
# EXIT STATUS
#  Exit status is as follows:
#    0     No problems detected.
#    1     Free space on any host was below the supplied parameter.
#    2     A "soft" error occurred, either a SNMP library error,
#          or could not get a response from the server.
#
#  In the case where both a soft error and a freespace violation are
#  detected, exit status is 1.
#
# BUGS
# When using the net-snmp agent, you must build it with --with-dummy-values
# or the monitor may not parse the Host Resources MIB properly.
#
# The disk types for the HOST/Perf MIBs should be configured in English
# rather than exposing the OIDs to the end user.
#
#
# NOTES
# $Id: snmpdiskspace.monitor,v 1.5 2005/01/13 23:40:35 root Exp root $

#  * disk types now configurable via command line.  Added SNMP
#  version, updated comments. [EMAIL PROTECTED] March 2006
#
#  * Added support for inode status via UCD-SNMP MIB.  Fourth column in config
#  file (optional) is for inode%.
#  * added --debug and --usemib options.  Latter needed so you can force use
#  of UCD mib if you want inode status.
#  * rearranged the error messages to be more Mon-like (hostname first)
#  * added code to synchronize instance numbers when using UCD MIB.  This
#  could solve the sparse MIB problem usually fixed by the
#  --with-dummy-values option in net-snmp if needed for other agents
#  Ed Ravin ([EMAIL PROTECTED]), January 2005
#
#  Added support for regex hostnames and partition names in the config file,
#  'use strict' by andrew ryan [EMAIL PROTECTED].
#
#  Generalised to handle multiple mibs by jens persson [EMAIL PROTECTED]
#  Changes Copyright (C) 2000, jens persson
#
#  Modified for use with UCD-SNMP by Johannes Walch for 
#  NWE GmbH ([EMAIL PROTECTED])
#
#  Support for UCD's disk MIB added by Matt Simonsen [EMAIL PROTECTED]
#
#
# SEE ALSO
#  mon: http://www.kernel.org/software/mon/
#
#  This requires the UCD SNMP library and G.S. Marzot's Perl SNMP
#  module. (http://ucd-snmp.ucdavis.edu and CPAN, respectively).
#
#  The Empire SystemEdge SNMP agent: http://www.empire.com
#
#
# COPYRIGHT
#
#Copyright (C) 1998, Jim Trocki
#
#This program is free software; you can redistribute it and/or modify
#it under the terms of the GNU General Public License as published by
#the Free Software Foundation; either version 2 of the License, or
#(at your option) any later version.
#
#This program is distributed in the hope that it will be useful,
#but WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#GNU General Public License for more details.
#
#You should have received a copy of the GNU General Public License
#along with this program; if not, write to the Free Software
#Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
use strict;
use SNMP;
use Getopt::Long;

sub readcf;
sub toBytes;
sub get_values;

# setup what mibs to use
# $ENV{MIBS} = 'RFC1213-MIB:HOST-RESOURCES-MIB:WINDOWS-NT-PERFORMANCE:UCD-SNMP-MIB';
$ENV{MIBS} = 'RFC1213-MIB:HOST-RESOURCES-MIB:UCD-SNMP-MIB';

my %opt;

# parse the commandline
GetOptions (\%opt, "community=s", "timeout=i", "retries=i", "config=s",
    "list", "listall", "free=i", "ifree=n", "usemib=s", "debug",
    "ucddiskmatch=s", "disktypematch=s", "version=i");

die "No host arguments given!\n" if (@ARGV == 0);

my $RET = 0;   #exit value of script
my @ERRS = (); # array holding detail output
my @HOSTS = ();  # array holding summary output
my @cfgfile = ();  #array holding contents of config file


# Read
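
The option notes above allow thresholds such as 1gb or 50mb alongside percentages. As a rough illustration of how such a value can be turned into a byte count, here is a hypothetical helper; it is not the script's own toBytes routine, and the 1024-based multipliers are an assumption.

```perl
use strict;
use warnings;

# Hypothetical parser for absolute size thresholds such as "50mb" or "1gb".
# Returns a byte count, or undef if the string is not an absolute size
# (for example "20%", which has to be handled as a percentage instead).
sub size_to_bytes {
    my ($spec) = @_;
    my %mult = ( "" => 1, kb => 1024, mb => 1024**2, gb => 1024**3 );
    return undef unless $spec =~ /^(\d+(?:\.\d+)?)\s*(kb|mb|gb)?$/i;
    my $num  = $1;
    my $unit = defined $2 ? lc $2 : "";
    return $num * $mult{$unit};
}

for my $spec ("50mb", "1gb", "20%") {
    my $bytes = size_to_bytes($spec);
    print "$spec => ", (defined $bytes ? $bytes : "not an absolute size"), "\n";
}
```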

Re: opt_d

2006-02-13 Thread Ed Ravin
On Mon, Feb 13, 2006 at 02:07:25PM -0600, Nate Reed wrote:
> How do I set $opt_d for my monitors/alerts?  Is there something in the mon.cf
> file?

Yes, but -d is usually for debugging options.  If you want to debug
something, it's better to run it from the command line, as some of
the monitors will issue extra messages that will confuse your Mon displays.

What is the actual problem you're trying to solve?



Announcing ospf.monitor (beta)

2006-02-05 Thread Ed Ravin
Attached is my first try at an OSPF monitoring script meant for use in Mon.
It's loosely based on bgp.monitor.

The script walks through your OSPF interface table, and lists all enabled
OSPF interfaces and whether you have adjacencies on them.  It complains
if it finds any interfaces that OSPF has enabled but hasn't been able
to form an adjacency.

Usage is as follows:

  ospf.monitor [--exclude regexp] [--community blah] router1 router2 ...

Where router1, router2, etc. are the routers you are polling, blah
is the SNMP community, and regexp is either a single IP address of
an interface to ignore, or a regexp in the form ip1|ip2|ip3 if you
want to ignore multiple interfaces.
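
To make the --exclude handling concrete, here is a minimal sketch of turning the argument into an anchored pattern, following the steps described in the script's comments (escape the dots, wrap in ^(...)). The function name build_exclude is invented for this example; it returns undef when nothing is to be excluded.

```perl
use strict;
use warnings;

# Build the anchored pattern from an --exclude argument; returns undef
# when the argument is empty, so that nothing is excluded by accident.
sub build_exclude {
    my ($arg) = @_;
    return undef unless defined $arg && length $arg;
    my $escaped = $arg;
    $escaped =~ s/\./\\./g;     # dots should match only literal dots
    return qr/^($escaped)/;
}

my $re = build_exclude("10.99.37.8|10.99.37.12");
for my $ip ("10.99.37.8", "10.99.37.12", "10.99.37.9", "10199.37.8") {
    printf "%-12s %s\n", $ip, ($ip =~ $re ? "excluded" : "checked");
}
```

Note that the pattern is anchored only at the start, so an entry such as 10.99.37.8 would also match 10.99.37.80.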

I was rather surprised when I ran this script on my routers - a few of
my routers were trying to start adjacencies on dead interfaces or in
other places that they shouldn't have been.  Here's a sample output:

$ ./ospf.monitor router1
router1

router1 (ID 10.99.99.99)
router1:   Interface 10.99.37.8.0     state : down              [NO ADJACENCY]
router1:   Interface 10.99.37.9.0     state : pointToPoint
router1:   Interface 10.99.37.10.0    state : pointToPoint
router1:   Interface 10.99.37.11.0    state : pointToPoint
router1:   Interface 10.99.37.12.0    state : down              [NO ADJACENCY]
router1:   Interface 10.99.38.88.0    state : designatedRouter
router1:   Interface 10.99.38.89.0    state : loopback          [NO ADJACENCY]
router1:   Interface 10.99.38.90.0    state : designatedRouter
router1:   Interface 10.99.38.91.0    state : designatedRouter

For this router, the two "down" links are lines that have been disconnected
and I forgot to tell OSPF not to listen on them anymore, and the "loopback"
line is a loopback interface that I've incorrectly told OSPF to listen on.

The script will also tell you about addressless interfaces that have
OSPF enabled - I need to put a bit more code in the script to identify
them (currently you just see an ifIndex without explanation).

This script is fresh out of the oven, so consider it beta quality or
worse - but I'd be pleased if folks with the right environment (Perl and
the SNMP_Session module from http://www.switch.ch/misc/leinen/snmp/perl)
could give it a spin and let me know what they think of it.  It sticks
to the RFC1253 OSPF MIB, so it should work on any router, not just
Cisco.

-- Ed
#!/usr/bin/perl
#
# Router ospf (Open Shortest Path First) monitor
# Look at each router and get the status of all OSPF neighbors.
# Issue alarm if any interfaces configured for neighbors do not
#   have full adjacencies
# Detail log shows status of all enabled OSPF interfaces.

# Usage:
# ospf.monitor [--exclude pattern] [--community str] router1 [...]
#
# --exclude - don't alarm for IP addresses that match pattern.  Periods
# in the IP address will be escaped so that they only match periods.  Use
# [0-9] or the like if you need character class matching.  Use 'ip|ip|ip'
# to exclude multiple peers.
#
# --community - SNMPv1 community name to use.  But it's more secure
# to pass the community in via the environment variable COMMUNITY.


#
# Edit history below
# Version 0.1
# 
# By Ed Ravin [EMAIL PROTECTED]  This code is made available courtesy of
# PANIX http://www.panix.com.
# Copyright 2005, by Ed Ravin
#
# License: GNU GPL v2, see http://www.gnu.org/copyleft/gpl.html
#
# Loosely based on bgp.monitor which is:
#   Copyright 2002, by Marc Hauswirth, Safe Host SA [EMAIL PROTECTED]
#
# Some inspiration is taken from other mon monitors and from
# routerinfo.pl by Ben Buxton ([EMAIL PROTECTED]), also under GPL, see http://www.zipworld.com.au/~bb/linux/
# and from routerint.monitor by P. Strauss ([EMAIL PROTECTED]) and myself ([EMAIL PROTECTED]).
#

# This script needs the SNMP Session module from Simon Leinen [EMAIL PROTECTED],
#   which can be found at http://www.switch.ch/misc/leinen/snmp/perl/
#   It is also part of MRTG (http://people.ee.ethz.ch/~oetiker/webtools/mrtg/)

use SNMP;
use SNMP_Session;
use Getopt::Long;
use strict;

my %opt;

$opt{'community'}= undef;
$opt{'exclude'}= "";
$opt{'debug'}= undef;
my $usage="Usage: [COMMUNITY=str] ospf.monitor [--exclude regexp] [--community str] router [...]\n";
GetOptions(\%opt, "exclude=s", "community=s", "debug") or die $usage;

# It's highly unlikely someone wants dots in an IP address to be treated
# as a regexp pattern, so we'll escape them to make behavior more predictable.
# If you really want to use pattern matching, use a character class like
# [0-9] instead.
$opt{'exclude'} =~ s/\./\\./g;
$opt{'exclude'}= '^(' . $opt{exclude} . ')';


## --
my $community = $opt{'community'} || $ENV{'COMMUNITY'} || "public";

## --

my @failures;
my @details;

$ENV{'MIBS'}= ""; # all OIDs needed are specified in script

# OID's to the SNMP elements that I want to show...
# From Cisco's MIB and RFC's
# http://sunsite.cnlab-switch.ch/ftp/doc/standard/rfc/16xx/1657
# http://www.telecomm.uh.edu/stats/rfc/BGP4-MIB.html

my %oids = ( 
SysUptime

Re: Announcing ospf.monitor (beta-1 :-)

2006-02-05 Thread Ed Ravin
On Mon, Feb 06, 2006 at 01:53:58AM -0500, Ed Ravin wrote:
> Attached is my first try at an OSPF monitoring script meant for use in Mon.
> It's loosely based on bgp.monitor.
...
> The script will also tell you about addressless interfaces that have
> OSPF enabled - I need to put a bit more code in the script to identify
> them (currently you just see an ifIndex without explanation).

I started feeling guilty about not including that feature, so here's
an updated version that properly displays addressless interfaces, and
in addition fixes the --exclude option so the monitor is actually useful.

I'll drop this into the contrib directory after I've had some time to
shake it out and get feedback.

-- Ed
#!/usr/bin/perl
#
# Router ospf (Open Shortest Path First) monitor
# Look at each router and get the status of all OSPF neighbors.
# Issue alarm if any interfaces configured for neighbors do not
#   have full adjacencies
# Detail log shows status of all enabled OSPF interfaces.

# Usage:
# ospf.monitor [--exclude pattern] [--community str] router1 [...]
#
# --exclude - don't alarm for IP addresses that match pattern.  Periods
# in the IP address will be escaped so that they only match periods.  Use
# [0-9] or the like if you need character class matching.  Use 'ip|ip|ip'
# to exclude multiple peers.
#
# --community - SNMPv1 community name to use.  But it's more secure
# to pass the community in via the environment variable COMMUNITY.


#
# Edit history below
# Version 0.1
# 
# By Ed Ravin [EMAIL PROTECTED]  This code is made available courtesy of
# PANIX http://www.panix.com.
# Copyright 2005, by Ed Ravin
#
# License: GNU GPL v2, see http://www.gnu.org/copyleft/gpl.html
#
# Loosely based on bgp.monitor which is:
#   Copyright 2002, by Marc Hauswirth, Safe Host SA [EMAIL PROTECTED]
#
# Some inspiration is taken from other mon monitors and from
# routerinfo.pl by Ben Buxton ([EMAIL PROTECTED]), also under GPL, see http://www.zipworld.com.au/~bb/linux/
# and from routerint.monitor by P. Strauss ([EMAIL PROTECTED]) and myself ([EMAIL PROTECTED]).
#

# This script needs the SNMP Session module from Simon Leinen [EMAIL PROTECTED],
#   which can be found at http://www.switch.ch/misc/leinen/snmp/perl/
#   It is also part of MRTG (http://people.ee.ethz.ch/~oetiker/webtools/mrtg/)

use SNMP;
use SNMP_Session;
use Getopt::Long;
use strict;

my %opt;

$opt{'community'}= undef;
$opt{'exclude'}= "";
$opt{'debug'}= undef;
my $usage="Usage: [COMMUNITY=str] ospf.monitor [--exclude regexp] [--community str] router [...]\n";
GetOptions(\%opt, "exclude=s", "community=s", "debug") or die $usage;

# It's highly unlikely someone wants dots in an IP address to be treated
# as a regexp pattern, so we'll escape them to make behavior more predictable.
# If you really want to use pattern matching, use a character class like
# [0-9] instead.
$opt{exclude} =~ s/\./\\./g;
$opt{exclude}= '^(' . $opt{exclude} . ')';
$opt{exclude}= "NOT_USED" if $opt{exclude} eq "^()";


## --
my $community = $opt{'community'} || $ENV{'COMMUNITY'} || "public";

## --

my @failures;
my @details;

$ENV{'MIBS'}= ""; # all OIDs needed are specified in script

# OID's to the SNMP elements that I want to show...
# From Cisco's MIB and RFC's
# http://sunsite.cnlab-switch.ch/ftp/doc/standard/rfc/16xx/1657
# http://www.telecomm.uh.edu/stats/rfc/BGP4-MIB.html

my %oids = (
SysUptime         => "1.3.6.1.2.1.1.3.0",
ifDescr           => "1.3.6.1.2.1.2.2.1.2",
ospfRouterId      => "1.3.6.1.2.1.14.1.1",
ospfIfIpAddress   => "1.3.6.1.2.1.14.7.1.1",
ospfAddressLessIf => "1.3.6.1.2.1.14.7.1.2",
ospfIfAdminStat   => "1.3.6.1.2.1.14.7.1.5",
ospfIfState       => "1.3.6.1.2.1.14.7.1.12",
);


my %ospfIfStates = (
1 => "down",
2 => "loopback",
3 => "waiting",
4 => "pointToPoint",
5 => "designatedRouter",
6 => "backupDesignatedRouter",
7 => "otherDesignatedRouter",
);

my %ospfAdminStatus = (
1 => "enabled",
2 => "disabled",
);


my %state;
my $router;

sub snmpget1 # session, oid-hashstr, instance
{
my $session= shift;
my $oidstr= shift;
my $instance = shift;
my $result= $session->get(".$oids{$oidstr}.$instance");

if ($session->{ErrorNum})
{
push @failures, $router;
push @details, "$router: error on SNMP get of $oidstr.$instance: $session->{ErrorStr}";
return 0;
}
return $result;
}
foreach $router (@ARGV) {
# Get some infos about this router
my $sess = new SNMP::Session ( DestHost => $router, Community => $community );
if (!defined($sess))
{
push @failures, $router;
push @details, "$router: cannot create SNMP session";
next;
}

my $ospfRouterID = snmpget1

Re: Monitor works from the command-line but not from mon

2006-01-27 Thread Ed Ravin
On Fri, Jan 27, 2006 at 02:54:40PM -0500, Brian Landers wrote:
 I have a tacacs+ monitoring script that successfully detects failures
 when run from the command-line but not from within mon.  I'm at a loss
 to troubleshoot this one and am hoping the list can help.  It seems to
 be printing the failure in the proper format and setting the exit code
 properly, but mon never sees the service as down.

You've reset Mon and the service shows up in the listing, right?  Just
checking, sometimes folks skip that step.

Are you using mon.cgi?  Drill down into the list for the tacacs+ test.
You should see a list of things like last time monitor was run and
other timestamps related to when the monitor was run and what the results
were.  Is Mon even running the monitor in the first place?

 -bash-2.05b$  ./tacacs.monitor username password key server1 server2
 server1
 server1: Attempt timed out!
 
 -bash-2.05b$ echo $?
 1

That looks right.

 watch acs_servers
 service tacacs
 description Make sure TACACS is working
 interval 15m
 monitor /usr/local/mon/tacacs.monitor username password key
 period wd {Sun-Sat}
 alert mail.alert [EMAIL PROTECTED]
 upalert mail.alert -S "Service is back up" [EMAIL PROTECTED]
 alertevery 15m

That looks right too.

BTW, can you share the tacacs+ monitoring script with us?  I know at least
one other Mon user (cough cough) who is interested in it.

___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon


Re: pass failed host from a hostgroup to an alert script?

2005-12-07 Thread Ed Ravin
On Wed, Dec 07, 2005 at 01:30:15PM +0100, J Paston wrote:
 Afternoon,
 I have a hostgroup with 20 hosts. One fails. How can the alert script
 know which one failed? Thanks.

The monitor script is supposed to send the name of the failing host
as the first line of its output (the summary).  Please read over the
mon man page - it covers how information is shared between monitors,
mon, and alerts.
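
A minimal sketch of that output convention, for illustration only: the
first line printed is the summary (the failing host names), detail lines
follow, and the exit status is non-zero on failure.  The check_host()
routine is a made-up stand-in for a real test, not part of mon.

```perl
#!/usr/bin/perl
# Sketch of the mon monitor output convention (not a real monitor).
use strict;
use warnings;

# Hypothetical check: pretend any host whose name starts with "bad" is down.
sub check_host {
    my ($host) = @_;
    return $host =~ /^bad/ ? "connection refused" : "";
}

# Returns (exit status, output text) for the given list of hosts.
sub run_monitor {
    my @hosts = @_;
    my (@failures, @details);
    foreach my $host (@hosts) {
        my $err = check_host($host);
        next unless length $err;
        push @failures, $host;
        push @details, "$host: $err\n";
    }
    return (0, "") unless @failures;
    # First line is the summary naming the failed hosts; details follow.
    return (1, join(" ", sort @failures) . "\n" . join("", @details));
}

my ($status, $output) = run_monitor(@ARGV);
print $output;
exit $status if $status;    # otherwise fall off the end with status 0
```

The alert script then receives that summary line, so it can tell which
member of the hostgroup failed.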

Depending upon what you're doing, you might want to upgrade to the current
mon 1.1 in CVS, since it has a couple of new features for caching of
output from monitors.



Re: montrap

2005-11-22 Thread Ed Ravin
On Tue, Nov 22, 2005 at 03:08:56PM -, Julian Cable wrote:
   I started using mon about a week ago and got very confused when I
 wanted to send traps and got different answers with different versions
 of mon when using the montrap contributed script. I worked out there
 were a number of issues:
 
 1) montrap didn't actually use the port parameter
 2) mon changed from the opstatus to the retval parameter being
 important between the stable and development versions
 3) montrap didn't set the retval parameter properly
 
 The modified version below seems to fix these problems. Hope it helps
 someone else.

Thanks!  I didn't notice these problems because I was keying off the
presence or absence of the traps rather than looking at the retval.

Also, in the future, please send scripts like this as an attachment
to your message rather than in the main message text - I had to
undo a few changes that were introduced by mail formatting.

-- Ed



Re: eval error for dependency starting

2005-11-17 Thread Ed Ravin
On Thu, Nov 17, 2005 at 10:36:47PM +0100, Frank Isemann wrote:
 
 Nov 17 22:28:57 lagoon mon[14598]: eval error for dependency starting at
 ubcom~dns:ping
 
 for each group that have a depend entry: like depend SELF:ping
 
 this error blocks the alert mechanism ...
 
 is that a general error or is that because i have added the ~?? char to
 the group/watch entrys?

Yes - only alphanumeric, dot, dash, and underscore are permitted:

   Hostgroup Entries
   Hostgroup entries begin with the keyword hostgroup, and are followed by
   a hostgroup tag and one or more hostnames or IP addresses, separated by
   whitespace.  The hostgroup tag must be composed of alphanumeric charac-
   ters, a dash (-), a period (.), or an underscore  (_).  Non-blank
   lines  following the first hostgroup line are interpreted as more host-
   names.  The hostgroup definition ends with a blank line. For example:

  hostgroup servers nameserver smtpserver nntpserver
   nfsserver httpserver smbserver

  hostgroup router_group cisco7000 agsplus
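
As a quick way to check tags before putting them in mon.cf, the rule
quoted above can be sketched as a regular expression.  This is one
reading of the man page text, not code taken from mon's own parser, and
valid_hostgroup_tag() is a made-up helper name.

```perl
use strict;
use warnings;

# True if $tag uses only the characters the man page allows:
# alphanumerics, dash, period and underscore.
sub valid_hostgroup_tag {
    my ($tag) = @_;
    return $tag =~ /\A[A-Za-z0-9._-]+\z/ ? 1 : 0;
}

foreach my $tag ("router_group", "ubcom~dns") {
    printf "%-14s %s\n", $tag, valid_hostgroup_tag($tag) ? "ok" : "invalid";
}
```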




Re: Prefixing the alert subject

2005-11-03 Thread Ed Ravin
On Thu, Nov 03, 2005 at 12:12:00PM -0500, Bill wrote:
 
 I have two questions:
 
  I am monitoring from a bunch of systems and would love to have a way
 to have the ALERT on my mail alerts have something defined before or
 after it I can set from the mon.cf file.

Quick clip from the Mon man page:

 As  with  monitor programs, alert programs are invoked with environment
 variables defined by the user in the service definition, in addition to
 the following which are explicitly set by the server:

Here's an example from one of my configs:

service freespace
description Is there 5GB free? Enough inodes?
depend SELF:ping
MIBDIRS=/usr/local/share/snmp/local-mibs:/usr/local/share/snmp/mibs
interval 7m
monitor netappfree.monitor

In this case, the monitor script won't work properly without MIBDIRS
defined.  You can use this feature to pass environment vars into your
script, so the same alert script could take different actions or send
different messages based on the contents of an environment var.
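
A rough sketch of the alert side, assuming the service definition
carries a line such as SUBJECT_PREFIX=siteA next to the other settings.
The variable name SUBJECT_PREFIX and the alert_subject() helper are
invented for this example; the stock mail.alert does not know about
them.

```perl
use strict;
use warnings;

# Build a mail subject, prefixed with SUBJECT_PREFIX if the service
# definition set one.  $env is a hash reference, normally \%ENV.
sub alert_subject {
    my ($env, $group, $service, $summary) = @_;
    my $prefix  = $env->{SUBJECT_PREFIX} || "";
    my $subject = "ALERT $group/$service: $summary";
    return length($prefix) ? "[$prefix] $subject" : $subject;
}

print alert_subject(\%ENV, "servers", "freespace", "filer1"), "\n";
```

Passing the environment in as a hash reference keeps the helper easy to
try out by hand with different settings.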



Re: Prefixing the alert subject

2005-11-03 Thread Ed Ravin
On Thu, Nov 03, 2005 at 01:44:21PM -0500, Bill wrote:
[about including environment variables in the alert subject]

[...]
  In this case, the monitor script won't work properly without MIBDIRS
  defined.  You can use this feature to pass environment vars into your
  script, so the same alert script could take different actions or send
  different messages based on the contents of an environment var.
 
 Yeah, I was hoping that there was already someway to do it without
 creating another alert program - I loath re-inventing the wheel so to
 speak.  But have no problem doing so if it has not been done.  

I have something like it on my site in our locally modified copy of
mail.alert:

   $desc= $ENV{'MON_DESCRIPTION'} || "";

   [...]

$ALERT= "ALERT";
$t= localtime($failtime);
$downmsg= "Down for $downtime seconds";
$downmsg .= "\n\nNotes: $desc\n" if length($desc);

In this case, I'm using the description field in the mon config,
which is also viewable in the GUI.  I use this field for suggestions
on what to do if the service goes down, which helps a lot when someone
less familiar with the system has to handle an alarm at 3 AM.

Anyway, this is just straightforward scripting, nothing to be afraid of.

If anyone wants, I'll post our mail.alert.  It has a bunch of fixes in
it for Mon 1.1 compatibility (for new alert types like ackalerts),
which should be available soon from a CVS repository near you.



Re: A few bugfixes missing from fantabulous mon...

2005-10-21 Thread Ed Ravin
On Wed, Oct 12, 2005 at 11:08:49AM -0400, Ed Ravin wrote:
...
 I also contributed a few fixes to the alerts that don't seem to
 be in mon-1-1-0pre2 - none of the alerts knew about the options for the
 new forms of alerts (like ackalerts and trapalerts).  Here are my local
 patches to snpp.alert:

Oh, but just ignore that last patch set, that's totally the wrong one.  I'm
surprised no one tweaked me on it - maybe no one ever reads my mail all
the way down to the bottom?

Anyway, I've attached a patch set that demonstrates the problem.  Note
the extra options I added for the new alert types, the verbose descriptions
of the alarms more worthy of an IBM operating system manual than a nifty
project like Mon, and the addition of the monitor description (very handy
for reminding folks about what broke, and you can put a hint as to how to
fix it in the description field in mon.cf), and the last monitor output
(which will only work if you commit the fix I submitted in my previous
email).

This patch is against my custom copy of mail.alert, so it won't apply
directly to any of the regular alerts, but you get the idea.

-- Ed
--- mymail.alert	2005/04/27 21:18:48	1.1
+++ mymail.alert	2005/10/21 20:46:10
@@ -8,7 +8,7 @@
 # By Ed Ravin, [EMAIL PROTECTED], based on the mail.alert by Jim Trocki
 # in mon-0.38.18
 
-# $Header: /devel/build/NetBSD/mon/mon-1.1-devel/mon/alert.d/RCS/mymail.alert,v 1.1 2005/04/27 21:18:48 root Exp $
+# $Header: /devel/build/NetBSD/mon/mon-1.1-devel/mon/alert.d/RCS/mymail.alert,v 1.4 2005/10/21 20:46:03 root Exp $
 
 #
 #This program is free software; you can redistribute it and/or modify
@@ -25,11 +25,11 @@
 #along with this program; if not, write to the Free Software
 #Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 #
-$RCSID='$Id: mymail.alert,v 1.1 2005/04/27 21:18:48 root Exp $';
+$RCSID='$Id: mymail.alert,v 1.4 2005/10/21 20:46:03 root Exp $';
 use Getopt::Std;
 use Text::Wrap;
 
-getopts ("S:s:g:h:t:l:u");
+getopts ("S:s:g:h:t:l:aDTOu");
 
 $summary=<STDIN>;
 chomp $summary;
@@ -39,16 +39,34 @@
 $mailaddrs = join (',', @ARGV);
 
 $t = localtime($opt_t);
-$failtime= $ENV{MON_FIRST_FAILURE};
+$failtime= $ENV{'MON_FIRST_FAILURE'};
 $downtime= time - $failtime;
 
+$desc= $ENV{'MON_DESCRIPTION'} || "";
+
 if ($opt_u) {
     $ALERT = "UPALERT";
     $downmsg="Service is back up, was down for $downtime seconds.";
+} elsif ($opt_a) {
+   $ALERT= "ACKALERT";
+   $downmsg= "Problem has been acknowledged: $summary\n" .
+       "Last error output: $ENV{'MON_LAST_SUMMARY'}";
+} elsif ($opt_D) {
+   $ALERT= "DISABLEALERT";
+   $downmsg=
+   "A host in this group has been disabled.  The monitoring system will show " .
+   "successful recovery even if that host's service is still down.";
+} elsif ($opt_T) {
+   $ALERT= "TRAPALERT";
+   $downmsg= "A trap has been received for this service.";
+} elsif ($opt_O) {
+   $ALERT= "TRAPTIMEOUTALERT";
+   $downmsg= "A heartbeat trap has failed to arrive for this service.";
 } else {
     $ALERT= "ALERT";
     $t= localtime($failtime);
     $downmsg= "Down for $downtime seconds";
+   $downmsg .= "\n\nNotes: $desc\n" if length($desc);
 }
 
 ($wday,$mon,$day,$tm) = split (/\s+/, $t);
@@ -78,13 +96,13 @@
 
 Detailed text (if any) follows:
 ---
+
 EOF
 
 #
 # The remaining lines normally contain more detailed information,
 # but this is monitor-dependent.
 #
-while (<STDIN>) {
-print MAIL;
-}
+
+print MAIL $ENV{'MON_LAST_OUTPUT'};
 close (MAIL);


A few bugfixes missing from fantabulous mon...

2005-10-12 Thread Ed Ravin
On Wed, Oct 12, 2005 at 09:54:00AM -0400, David Nolan wrote:
 --On Wednesday, October 12, 2005 3:06 AM -0400 Jim Trocki 
 [EMAIL PROTECTED] wrote:
 I think we should just fork the cvs tree and call mon-1-1-0pre2 the
 super fantabulous mon 1.2 (tag it as mon-1-2-0), then the head will
 be 1.3, the work-in-progress, possibly unstable, possibly stable,
 experimental-feature-laden code.
...
 The only thing committed to CVS right now that I don't think belongs in 1.2 
 is the global exclude_period feature I added yesterday.  That's the only tag 
 since mon-1-1-0pre2, so we can just re-tag those versions as 1.2.

There are a couple of bugfixes that I reported to mon-devel that didn't
seem to make it into mon-1-1-0pre2 - I think they ought to be included
in mon 1.2:

This fix is to prevent an upalert from being issued for an ack'd watch
that sent out an ackalert:

@@ -626,12 +626,12 @@
        my $pref = \%{$sref->{periods}->{$periodlabel}};
 
        #
-       # skip upalerts not paired with down alerts
+       # skip upalerts/ackalerts not paired with down alerts
        # disable by setting no_comp_alerts in period section
        #
-       if (!$pref->{no_comp_alerts} && ($flags & $FL_UPALERT) && !$pref->{_alert_sent})
+       if (!$pref->{no_comp_alerts} && ($flags & ($FL_UPALERT | $FL_ACKALERT)) && !$pref->{_alert_sent})
        {
-           syslog ('debug', "$group/$service/$periodlabel: Suppressing upalert since no down alert was sent.");
+           syslog ('debug', "$group/$service/$periodlabel: Suppressing upalert or ackalert since no down alert was sent.");
            next;
        }
 
--
This bugfix below makes sure upalerts have the right message from the
last failed monitor.  I forget whether this was related to getting ackalerts
working properly, but it clearly fixes a feature that wasn't doing what
it was supposed to:


@@ -3295,6 +3296,8 @@
            (!defined($sref->{upalertafter})
             || (($tmnow - $sref->{_first_failure}) >= $sref->{upalertafter}
        {
+           # Save the last failing monitor's output for posterity
+           $sref->{_upalertoutput}= $sref->{_last_output};
            do_alert ($group, $service, $sref->{_upalertoutput}, 0, $FL_UPALERT);
        }
 


--
I also contributed a few fixes to the alerts that don't seem to
be in mon-1-1-0pre2 - none of the alerts knew about the options for the
new forms of alerts (like ackalerts and trapalerts).  Here are my local
patches to snpp.alert:

28c28
< use vars qw /$opt_g $opt_q $opt_s $opt_t $opt_u/;
---
> use vars qw /$opt_g $opt_q $opt_s $opt_t/;
50c50
< my $t = localtime ($opt_t || time);
---
> my $t = localtime ($opt_t);
55,57c55
< my $ALERT=   $opt_u ? "UPALERT" : "ALERT";
< my $GROUP=   $opt_g || $ENV{MON_GROUP};
< my $SERVICE= $opt_s || $ENV{MON_SERVICE};
---
> $ALERT = $opt_u ? "UPALERT" : "ALERT";
59c57
< $snpp->send ( Pager => [ @ARGV ], Message => "$ALERT $GROUP/$SERVICE: $summary ($wday $mon $day $tm)" );
---
> $snpp->send ( Pager => [ @ARGV ], Message => "$ALERT $opt_g/$opt_s: $summary ($wday $mon $day $tm)" );

-

And finally, none of the new alert types (startupalert, ackalert, disablealert)
are documented in the Mon man page.

-- Ed



Anyone monitoring SMART disk status?

2005-06-17 Thread Ed Ravin
Is anyone using Mon to monitor their IDE disks' onboard SMART
monitoring features?

On NetBSD, one can use the atactl command as shown below.  If the
"reliability" column says "negative", the drive may be in trouble.
If an entry marked "yes" in the critical column is negative, data
loss is imminent.

 # atactl wd0 smart status
 SMART supported, SMART enabled
  id value thresh crit collect reliability description                raw
   1   200     51 yes  online  positive    Raw read error rate        0
   3    94     21 yes  online  positive    Spin-up time               2691
   4   100      0 no   online  positive    Start/stop count           9
   5   200    140 yes  online  positive    Reallocated sector count   0
   7   200     51 yes  online  positive    Seek error rate            0
   9    91      0 no   online  positive    Power-on hours count       7079
  10   100     51 yes  online  positive    Spin retry count           0
  11   100     51 no   online  positive    Calibration retry count    0
  12   100      0 no   online  positive    Device power cycle count   9
 194   101      0 no   online  positive    Temperature                42
 196   200      0 no   online  positive    Reallocated event count    0
 197   200      0 no   online  positive    Current pending sector     0
 198   200      0 no   offline positive    Offline uncorrectable      0
 199   200      0 no   online  positive    Ultra DMA CRC error count  0
 200   200     51 yes  offline positive    Unknown                    0
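
One way such a monitor might look, as an untested sketch: it assumes
atactl prints the whitespace-separated columns id, value, thresh, crit,
collect, reliability, description and raw, as in the header above.  The
smart_problems() helper and the idea of passing the disk name as the
argument are inventions for this example, not an existing mon monitor.

```perl
#!/usr/bin/perl
# Sketch of a SMART check built on "atactl <disk> smart status" (NetBSD).
use strict;
use warnings;

# Scan atactl output lines; return one message per attribute whose
# reliability column reads "negative".
sub smart_problems {
    my ($disk, @lines) = @_;
    my @problems;
    foreach my $line (@lines) {
        # id value thresh crit collect reliability description raw
        next unless $line =~
            /^\s*(\d+)\s+\d+\s+\d+\s+(yes|no)\s+\w+\s+(positive|negative)\s+(.*\S)\s+\d+\s*$/;
        my ($id, $crit, $rel, $desc) = ($1, $2, $3, $4);
        next unless $rel eq "negative";
        my $level = $crit eq "yes" ? "CRITICAL" : "warning";
        push @problems, "$disk: $level: attribute $id ($desc) is negative";
    }
    return @problems;
}

# Only runs when a disk name such as wd0 is given on the command line.
if (my $disk = shift @ARGV) {
    open(my $fh, "-|", "atactl", $disk, "smart", "status")
        or die "$disk: cannot run atactl: $!\n";
    my @problems = smart_problems($disk, <$fh>);
    close $fh;
    if (@problems) {
        print "$disk\n", map("$_\n", @problems);   # summary line, then details
        exit 1;
    }
}
```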



Re: radius monitoring

2005-06-03 Thread Ed Ravin
On Thu, Jun 02, 2005 at 12:07:50PM +0200, Sylvain Clerc wrote:
 I managed to get freeradius.monitor working when I run it alone like this:
 ./freeradius.monitor -S /etc/raddb/sec.radclient -f
 /etc/raddb/attr.radclient -hosts localhost
  
 I can read in the freeradius log that it sends an access-accept, but on
 the freeradius.monitor side, nothing appears. First, I
 would like to know if that's normal

Yes, freeradius.monitor, like most of the monitors in Mon, will exit with
a zero return status and print nothing to indicate that the service being
monitored passes the test.

 and next, when I use this file with mon, I
 haven't any request received in freeradius logs

There's probably some environment difference between the command line you
used and that of the Mon user.  Maybe the permissions on
/etc/raddb/sec.radclient or /etc/raddb/attr.radclient?

Try "su - monuser" and run the freeradius.monitor command from there.
Also, have you turned on monerrfile in mon.cf so that the output to stderr
is captured somewhere?  radclient may be spitting out error messages and
you're losing them.  If you get desperate, temporarily modify
freeradius.monitor to call radclient with strace/ktrace/truss or whatever
your system uses, then review the trace file for clues.

-- Ed



Re: ps.monitor

2005-06-02 Thread Ed Ravin
On Thu, Jun 02, 2005 at 10:50:57AM -0400, Allan Wind wrote:

 Was there anything wrong with the ps.monitor?  Been a month now, and I do
 not see it in contrib yet.

My bad, I haven't gotten a chance to try it out and put it into CVS.
Will do so shortly.

BTW, are you also running net-snmp?  You can add statements into
snmpd.conf to have snmpd tell you whether a process by a particular
name is running.



Re: About radius.monitor

2005-05-24 Thread Ed Ravin
On Tue, May 24, 2005 at 08:20:36AM +0200, Sylvain Clerc wrote:
 I'm trying to use mon 0.99 with freeradius on a debian sarge but I
 think I have a problem with radius.monitor (or my mon configuration
 file). My freeradius doesn't receive any request of mon and I don't
 find why.

I gave up trying to use Mon's radius scripts (and the various Perl modules
needed to support them) and wrote my own instead.  See:

   http://acsys.anu.edu.au/~tpot/hypermail/mon/feb2000/1012.html

The script is very simple - it uses radclient, a test program included
in the freeradius distribution, to do all the dirty work.

I believe the patch I submitted for radclient.c eventually went into
the distribution, so don't apply that unless needed.

Hmm, I should ask the guy who maintains the contrib section in CVS
to add this monitor into the archive.

-- Ed



Re: h2ph problem with freespace.monitor

2005-05-04 Thread Ed Ravin
On Tue, May 03, 2005 at 10:48:09AM +0200, Joerg Hartmann wrote:
 i would like to monitor the diskspace on some servers with mon and 
 freespace.monitor.
 mon runs just fine, unfortunately freespace.monitor does not.
 I installed Filesys::DiskSpace from CPAN several times

Did you do "make test" when installing the module?

 and i run
 (cd /usr/include; h2ph *.h libnet/*.h) several times.

I thought that h2ph business had been deprecated.  Read the prerequisites
for Filesys::DiskSpace carefully and see if there's anything else you
need to install.



Re: Socket-related prolem running mon on Solaris

2005-04-11 Thread Ed Ravin
On Mon, Apr 11, 2005 at 03:55:01PM +0100, Alex David Shadrach Hooper wrote:
 I'm trying to run Mon on solaris 
 (SunOS unified-ext 5.8 Generic_108528-13 sun4u sparc SUNW,UltraAX-e2)
 and keep getting the error
 
   Bad arg length for Socket::pack_sockaddr_in, length is 0, should be 4
   at /usr/local/lib/perl5/5.8.0/sun4-solaris/Socket.pm line 373.
 
 which is caused by (I think) implicit calls to Sys::Syslog::setlogsock
 on log attempts.  I just wanted to check whether this was a known
 problem (with a known fix) before scrambling around in the code...

What type of socket does Solaris use for syslog?  Is it one of those
funky door things, a Unix domain socket, or a UDP socket?  You might
want to start by playing with this code in Mon (line numbers approximate):

229
230 setlogsock('unix')
231 if grep /^ $^O $/xo, ("linux", "openbsd", "freebsd", "netbsd");
232
233 openlog ("mon", "cons,pid", $CF{"SYSLOG_FACILITY"});

And do something with the setlogsock() that's appropriate for Solaris
and/or the way you have syslog set up there.

If that doesn't help, write a quick sample program that sends a few log
entries using the example code in the man page for Sys::Syslog.
Does that reproduce the problem?
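
A minimal sketch of such a test program.  It tries a few setlogsock()
types in turn and reports which ones work; the ident "montest", the
"daemon" facility and the list of socket types are arbitrary choices
for this example (the Sys::Syslog documentation suggests "stream" as
the type some Solaris systems prefer).

```perl
#!/usr/bin/perl
# Try several Sys::Syslog socket types and report which ones can log.
use strict;
use warnings;
use Sys::Syslog qw(:DEFAULT setlogsock);

# Send one test entry via the given socket type.
# Returns "" on success, or the error text on failure.
sub try_logsock {
    my ($type) = @_;
    my $ok = eval {
        setlogsock($type) or die "setlogsock('$type') failed\n";
        openlog("montest", "pid", "daemon");
        syslog("info", "test entry via %s socket", $type);
        closelog();
        1;
    };
    return $ok ? "" : ($@ || "unknown error");
}

foreach my $type (qw(unix udp stream)) {
    my $err = try_logsock($type);
    chomp $err;
    print "$type: ", (length $err ? "FAILED: $err" : "ok"), "\n";
}
```

Whichever type reports "ok" is a reasonable candidate for the
setlogsock() call in Mon on that system.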

-- Ed



Re: ORing hosts in a hostgroup (instead of ANDing) for a monitor

2005-03-26 Thread Ed Ravin
On Fri, Mar 25, 2005 at 10:57:26PM -0300, Raul Dias wrote:
 Is it possible to have an monitor to OR the hosts in a hostgroup and if
 one SUCCEED the service is considered SUCCESS?
 
 An example for this is to have a hostgroup with a few internet hosts
 and fping them.  If one of them succeeds then the internet connection is
 ok.  Some may fail and the connection still be ok. 
  
 However if all of them fail, then the internet connection is supposed to
 be considered down.

Yes, you can do this, but you'll need to code it yourself in the monitor
script.  At the beginning of the script, get a count of the hosts being
polled, then at the end of the script, count the failures - if you still
have at least one host up, adjust the exit status appropriately.

You could also write this as a wrapper script that would invoke the
desired monitor script.  Your wrapper would need to know something
about the options syntax for the monitor script, but this could
be done generically by passing the options or a flag of some kind
(like --) to the wrapper.
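
A sketch of the wrapper approach, under these assumptions: the wrapper
is called as "or.wrapper monitor [monitor-options] -- host [...]", and
the wrapped monitor follows the usual convention of printing the failed
host names on its first output line.  The name or.wrapper and the
helper functions are made up for this example.

```perl
#!/usr/bin/perl
# Sketch: succeed if at least one host in the group passes the test.
use strict;
use warnings;

# Exit status for an "any host up means success" test.
sub or_exit_status {
    my ($total, $failed) = @_;
    return ($total > 0 && $failed >= $total) ? 1 : 0;
}

# Run the wrapped monitor; returns (exit status, monitor output).
sub run_wrapped {
    my @args = @_;
    my ($sep) = grep { $args[$_] eq "--" } 0 .. $#args;
    die "usage: or.wrapper monitor [opts] -- host [...]\n"
        unless defined $sep && $sep > 0 && $sep < $#args;
    my @hosts = @args[$sep + 1 .. $#args];
    my @cmd   = (@args[0 .. $sep - 1], @hosts);

    open(my $fh, "-|", @cmd) or die "cannot run $cmd[0]: $!\n";
    my @output = <$fh>;
    close $fh;

    # The first output line is the summary: the names of the failed hosts.
    my %failed  = map { $_ => 1 } split ' ', (@output ? $output[0] : "");
    my $nfailed = grep { $failed{$_} } @hosts;
    return (or_exit_status(scalar @hosts, $nfailed), join("", @output));
}

if (@ARGV) {
    my ($status, $output) = run_wrapped(@ARGV);
    print $output;
    exit $status;
}
```

Counting failures from the summary line, rather than trusting the
wrapped monitor's exit status, is what lets the wrapper tell "some
hosts down" apart from "all hosts down".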



Re: monitoring email capability for monitor alerts

2005-03-22 Thread Ed Ravin
On Tue, Mar 22, 2005 at 11:48:50AM -0500, David Nolan wrote:
 You might find that your cellular providers provide a way to verify text 
 message delivery, if you're using their web message submission forms.  But 
 thats problematic because they're likely to redesign their web pages on a 
 whim, so scripting the web interaction will be problematic.

I wrote the pageomat script around four or five years ago, which
automates transactions with Web sites of cell phone and pager providers
for sending pages - the only changes I've had to make are to the
Web site addresses when the providers buy each other up, the actual
transaction code has been very stable.  This was a surprise to me, since
I also expected the vendors to redesign things on a whim.  It's not a big
deal anymore since I added the fallback code - if the vendor changes their
web page, pageomat errors off (the same way it would if the web site was
unreachable) and sends the page via dialup modem instead (i.e. gives it to
qpage).

 SkyTel's SNPP server provides delivery confirmation information if you use 
 two-way pagers.

InfoRad maintains a good list of paging methods for most providers:

   http://www.inforad.com/snpp_adds.html

And Googling for the name of your provider and "TAP" or "IXO" usually
turns up their dialup number, if they have one.  With some persistent
searching, I discovered a TAP dialup for one provider who claimed up and
down that they didn't have one.

It also occurs to me that since cell phones almost always show the number
of the calling party, just seeing a phone call from the data center
outgoing line might be enough of an alarm in most cases.  Not as
user-friendly as text-to-speech, but a lot simpler to implement.



new and improved nntp.monitor

2005-02-11 Thread Ed Ravin
I've been comparing my Mon setup with 1.0pre5, and turned up a few
fixes that I made years ago that never made it into the mainstream.

One of them was a whole bunch of fixes to nntp.monitor.  In particular,
error messages that formerly were only available with debug options
(like "no welcome message") are delivered to Mon as you would expect.
The debug option is still there, but now it dumps the transaction with
the news server so you can see what's going on.  I've also added file
support for the user/password for auth testing, so that you don't need
to let your Mon users see the login info in mon.cgi.  The option
for testing news feeding hosts without "mode reader" now actually works,
and it even passes "perl -w" / "use strict" checks.

The new monitor is attached.

-- Ed
#!/usr/bin/perl -w
#
# Try to connect to an NNTP server, and
# wait for the right output.
#
# For use with mon.
#
my $usage="Usage: nntp.monitor [-p port] [-t timeout] [-g group] [-f] [-A authfile] [-u user] [-a password] [-d] host [host...]\n";
#
#  -A authfile # authfile has two lines, first is username, then password
#  # script will use them with authinfo
# -u user -a pass # if you want to pass auth info on command line
#
#  -d for debugging
#
# This monitor connects to the NNTP server(s), checks for a greeting,
# then performs a "mode reader" and a "group (groupname)", and then disconnects.
# If the group is not specified by the -g option, then "control" is assumed.
#
# if -f is supplied, then it is assumed that a feeder is being tested,
# and the "mode reader" and "group (groupname)" commands are not executed.
#
# Adapted from http.monitor by
# Jim Trocki, [EMAIL PROTECTED]

# minor bugfixes by [EMAIL PROTECTED] on Thu Mar  1 18:26:43 EST 2001
# news authentication added by [EMAIL PROTECTED] on Wed Jun 27 20:02:08 EDT 2001
# Added auth syntax from Kai Schaetzl/conactive.com to mainstream this version

#
# http.monitor written by
#
# Jon Meek
# American Cyanamid Company
# Princeton, NJ
#
# $Id: nntp.monitor,v 1.2 2005/02/12 03:20:15 root Exp $
#
#Copyright (C) 1998, Jim Trocki
#
#This program is free software; you can redistribute it and/or modify
#it under the terms of the GNU General Public License as published by
#the Free Software Foundation; either version 2 of the License, or
#(at your option) any later version.
#
#This program is distributed in the hope that it will be useful,
#but WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#GNU General Public License for more details.
#
#You should have received a copy of the GNU General Public License
#along with this program; if not, write to the Free Software
#Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#

use Getopt::Std;
use English;
use strict;

use vars qw($opt_g $opt_p $opt_t $opt_f $opt_d $opt_A $opt_u $opt_a);
getopts ("a:fg:p:t:u:dA:") || die $usage;

my $GROUP = $opt_g || 'control';
my $PORT = $opt_p || 119;
my $TIMEOUT = $opt_t || 30;
my $FEEDER = $opt_f;
my $DEBUG = $opt_d || "";
my $AUTHFILE = $opt_A || "";
my ($username, $password);

my @failures = ();
my @details=   ();


if ($AUTHFILE)
{
open(AUTH, $AUTHFILE) || die "nntp.monitor: (local error) cannot open authfile $AUTHFILE: $!\n";
$username= <AUTH> || die "nntp.monitor: (local error) cannot read username from authfile $AUTHFILE: $!\n";
$password= <AUTH> || die "nntp.monitor: (local error) cannot read password from authfile $AUTHFILE: $!\n";
close AUTH;
}

# if user or pass specified on commandline, override auth file
$username= $opt_u if $opt_u;
$password= $opt_a if $opt_a;

foreach my $host (@ARGV) {

if (! nntpGET($host, $PORT)) {
push (@failures, $host);
}
}

if (@failures == 0) {
exit 0;
}

print join (" ", sort @failures), "\n";
print sort @details if (scalar @details) > 0;

exit 1;


sub nntpGET {
use Socket;
use Sys::Hostname;

my($Server, $Port) = @_;
my($ServerOK, $TheContent);

$ServerOK = 0;

$TheContent = '';

###
eval {

local $SIG{ALRM} = sub { die "Timeout Alarm" };
alarm $TIMEOUT;
my $result = OpenSocket($Server, $Port); # Open a connection to the server
if (!$result) { # Failure to open the socket
print "$Server: Unable to open socket\n" if $DEBUG;
return '';
}

#
# welcome message
#
transact("", '^2\d\d', "$Server: No welcome message\n") || return 0;

if (!$FEEDER) {
#
# mode reader, wait for OK response
#
transact("mode reader", '^2\d\d', "$Server: Unable to perform 'mode reader'") || return 0;

if ($AUTHFILE)
{
transact("authinfo user $username", '^38\d', "$Server: unexpected response to 'authinfo user'") || return 0;
   

new snmpdisk.monitor - monitors inodes, other improvements

2005-01-13 Thread Ed Ravin
On Tue, Jan 11, 2005 at 08:56:49PM -0500, Ed Ravin wrote:
 Has anyone hacked any of the disk space monitors to monitor inode
 consumption? [...]

Oh well, I had to do my own hacking.  Along the way, I fixed a few
things and added a few new features.  If you're using this monitor,
I encourage you to try this version and let me know how it works
for you.  Even if you don't need the inode monitoring feature, you
will probably like the new --listall feature that shows exactly
which filesystems are monitored and what the thresholds are for
that filesystem (i.e. a good way to debug your config file).

To monitor inodes, you need a recent net-snmp (I'm using 5.2.1.rc2,
5.2.1.rc3 just came out and should be just as good), you need to add
the line "includeAllDisks" into snmpd.conf (or manually add a "disk"
entry for each filesystem you want monitored, but who's got time for
that?), and you need to run this monitor with the "--usemib ucd" option.
To change the inode monitoring threshold (default 5%), add a 4th column
into the config file as needed.

Summary of changes:

New stuff:

* monitors inode usage (with --usemib ucd option and 4th column in config
file)

* choose which MIB you want to use (--usemib option)

* list out all monitored filesystems, with the parameters for alarming.
This shows you exactly what the thresholds are for each filesystem.

* debug output (--debug option)

Changed stuff:

* all failure messages now prefixed with the hostname as per normal Mon
style.  Also, failure messages now also display the threshold so you
can tell from the failure message what level you were testing for.

* when using the UCD MIB, no longer fails if the agent has a sparse MIB

* recognize more devices as local disk in UCD MIB.  Bug: this should be
a configurable option.

* added --ifree option to set default inode threshold if needed.

The new snmpdiskspace.monitor and .cf are attached.

-- Ed


#!/usr/local/bin/perl
#
# NAME
#  snmpdiskspace.monitor
#
#
# SYNOPSIS
#  snmpdiskspace.monitor [--list] [--timeout seconds] [--config filename]
#      [--community string] [--free minfree]
#      [--retries retries] [--usemib mibtype] host...
#
#
# DESCRIPTION
#  This script uses the Host Resources MIB (RFC1514), and optionally
#  the MS Windows NT Performance MIB, or UCD-SNMP extensions
#  (enterprises.ucdavis.dskTable.dskEntry) to monitor diskspace on hosts
#  via SNMP.
#
#  snmpdiskspace.monitor uses a config file to allow the specification of
#  minimum free space on a per-host and per-partition basis. The config 
#  file allows the use of regular expressions, so it is quite flexible in
#  what it can allow. See the sample config file for more details and
#  syntax.
#
#  The script only checks disks marked as FixedDisks by the Host MIB,
#  which should help cut down on the number of CD-ROM drives 
#  erroneously reported as being full! Since the drive classification
#  portion of the UCD Host MIB isn't too great on many OS'es, though,
#  this won't buy you a lot. Empire's SNMP agent gets this right on
#  all the hosts that I checked, though. Not sure about the MS MIB.
#  UCD-SNMP only checks specific partition types (md, hd, sd, ida)
# 
#  snmpdiskspace.monitor is intended for use as a monitor for the mon
#  network monitoring package.
#
#
# OPTIONS
#  --community   The SNMP community string to use. Default is "public".
#  --config      The config file to use. Default is either
#                /etc/mon/snmpdiskspace.cf or
#                /usr/lib/mon/mon.d/snmpdiskspace.cf, in that order.
#  --retries     The number of retries to use, if we get an SNMP timeout.
#                Default is retry 5 times.
#  --timeout     Seconds to wait before declaring a timeout on an SNMP get.
#                Default is 20 seconds.
#  --free        The default minimum free space, in a percentage or absolute
#                quantity, as per the config file. Thus, arguments of, for
#                example, "20%", "1gb", "50mb" are all valid.
#                Default is 5% free on every partition checked.
#
#  --ifree       The default minimum free inode percentage, specified as
#                a percentage.  Default is 5% free.
#
#  --list        Give a verbose listing of all partitions checked on all
#                specified hosts.
#
#  --listall     like --list, but also lists the thresholds defined for
#                each filesystem, so you can doublecheck the config file
#
#  --usemib      Choose which MIB to use: one or more of host, perf, ucd
#                Default tries all three, in that order
#
#  --debug       enable debug output for config file parsing and MIB fetching
#
#
# EXIT STATUS
#  Exit status is as follows:
#    0 No problems detected.
#    1 Free space on any host was below the supplied parameter.
#    2 A soft error occurred, either a SNMP library error,
#      or could not get a response from the server.
#
#  In the case where both a soft error

minor fix for freespace monitors to hide community name

2005-01-11 Thread Ed Ravin
Two patches for free space monitors, to give them the same feature: the
ability to specify an SNMP community name in the environment, and thus
not display it in the command line for prying eyes to see in the Mon
interface (with the name of the monitor program in the details view).
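
The precedence the patches implement (command-line option first, then the
COMMUNITY environment variable, then "public") can be sketched in shell
like this.  resolve_community is a hypothetical helper for illustration
only; it is not part of either monitor.

```shell
# resolve_community [OPTION_VALUE]
# Prints the community string to use: an explicit option value wins,
# then $COMMUNITY from the environment, then the default "public".
resolve_community() {
    opt_value=${1:-}
    if [ -n "$opt_value" ]; then
        printf '%s\n' "$opt_value"
    elif [ -n "${COMMUNITY:-}" ]; then
        printf '%s\n' "$COMMUNITY"
    else
        printf '%s\n' public
    fi
}
```

With COMMUNITY set in the environment of the service, the monitor can be
run without a community option, so the string never appears in its
argument list.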

-- Ed

--- snmpdiskspace.monitor   2004/05/05 13:34:42 1.1
+++ snmpdiskspace.monitor   2005/01/12 01:32:13
@@ -130,7 +130,7 @@


 # Read in defaults
-my $COMM   = $opt{community} || "public";
+my $COMM   = $opt{community} || $ENV{COMMUNITY} || "public";
 my $TIMEOUT= $opt{timeout} * 10 || 200;   #default timeout is 20 seconds
 my $RETRIES= $opt{retries} || 5;
 my $CONFIG = $opt{config} || (-d "/etc/mon" ? "/etc/mon" : "/usr/lib/mon/mon.d")


--
--- netsnmp-freespace.monitor   2005/01/12 01:46:25 1.1
+++ netsnmp-freespace.monitor   2005/01/12 01:46:36
@@ -43,7 +43,7 @@
 $ENV{'MIBS'} = "UCD-SNMP-MIB";

 getopts("c:");
-$community = $opt_c || 'public';
+$community = $opt_c || $ENV{'COMMUNITY'} || 'public';

 $RETVAL = 0;

___
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon


monitoring inodes

2005-01-11 Thread Ed Ravin
Has anyone hacked any of the disk space monitors to monitor inode
consumption?  I've already hacked this into the netapp monitor,
but would much prefer to steal someone else's code if this has been
done already.

Thanks,

-- Ed



fping.monitor improvement

2004-11-29 Thread Ed Ravin
I needed fping to use a larger packet size in order to monitor when a
tunnel loses the ability to pass full-sized packets.  Hence the patches
below to fping.monitor to pass on the -b size option.  Patch is also
attached to avoid mail client munging.
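
A sketch of how the patched command line comes together, with the same
defaults as the patch (3 retries, 2000 ms, 56 bytes).  build_fping_cmd is
a made-up name; the 1472 figure assumes a 1500-byte MTU with a 20-byte IP
header and an 8-byte ICMP header, so adjust it for your tunnel.

```shell
# build_fping_cmd [RETRIES] [TIMEOUT_MS] [NUMBYTES]
# Prints the fping command line the monitor would run.
build_fping_cmd() {
    retries=${1:-3}
    timeout=${2:-2000}
    numbytes=${3:-56}
    printf 'fping -e -r %s -t %s -b %s\n' "$retries" "$timeout" "$numbytes"
}

# Full-sized packets on a 1500-byte MTU path: 1500 - 20 - 8 = 1472
# bytes of ping data.
build_fping_cmd 3 2000 1472
```

In mon.cf the equivalent would be passing "-b 1472" as a monitor argument.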

-- Ed

--- fping.monitor   2004/11/24 23:20:36 1.1
+++ fping.monitor   2004/11/24 23:23:53
@@ -4,7 +4,7 @@
 #
 # Jim Trocki, [EMAIL PROTECTED]
 #
-# $Id: fping.monitor,v 1.1 2004/11/24 23:20:36 root Exp root $
+# $Id: fping.monitor,v 1.1.1.1.2.1 2004/07/29 21:10:19 trockij Exp $
 #
 #Copyright (C) 1998, Jim Trocki
 #
@@ -27,7 +27,7 @@
 use Getopt::Std;
 
 my %opt;
-getopts ("ahr:s:t:T", \%opt);
+getopts ("ab:hr:s:t:T", \%opt);
 
 sub usage
 {
@@ -35,6 +35,7 @@
 usage: fping.monitor [-a] [-r num] [-s num] [-t num] [-T] host [host...]
 
 -a only report failure if all hosts are unreachable
+-b num  send num bytes of ping data (default 56, like regular ping)
 -r num retry num times for each host before reporting failure
 -s num consider hosts which respond in over num msecs failures
 -t num wait num msecs before sending retries
@@ -50,7 +51,8 @@
 
 my $TIMEOUT = $opt{t} || 2000;
 my $RETRIES = $opt{r} || 3;
-my $CMD = "fping -e -r $RETRIES -t $TIMEOUT";
+my $NUMBYTES = $opt{b} || 56;
+my $CMD = "fping -e -r $RETRIES -t $TIMEOUT -b $NUMBYTES";
 my $START_TIME = time;
 my $END_TIME;
 my %details;


Re: more mon.cgi login problems

2004-11-14 Thread Ed Ravin
On Sun, Nov 14, 2004 at 06:46:14PM -0800, Joubin Moshrefzadeh wrote:

> mon is running fine, I can login from the commandline and execute
> commands with moncmd/monshow. But for some reason mon.cgi doesn't
> want to login. every time i click on the log in link, enter my
> id and password, and voila, I'm returned to the main summary page
> with no indication that I'm now logged in. All the admin commands
> remain grayed out.

Have you looked for error messages from mon.cgi in your web server logs?

Also, does your password have a space in it?  One of my users discovered that
with certain browsers a password with a space wouldn't work.  One of the
browsers was iCab for the Mac - I forget if that was the only one where
it worked or the only one where it didn't work.



Re: Generating mon trap for use as heartbeat

2004-11-09 Thread Ed Ravin
On Tue, Nov 09, 2004 at 01:52:32PM -0800, Michael Vogt wrote:
>
> I am planning to monitor some application servers on a datacenter with
> a custom monitor plugin.  I want to have another monitor running at a
> remote location to monitor the main monitor at the datacenter (and
> vice-versa).  It looks like I should use mon traps in heartbeat mode.
> How do I create the heartbeats.

Feel free to play around with my mon trap client, text is below.  I've
also attached it in case people's mail clients do naughty things with
the formatting.

-- Ed

---

#!/usr/bin/perl

use strict;
use Getopt::Std;
use Mon::Client;

my @opstrings= (
    "fail", "ok", "coldstart", "warmstart", "linkdown",
    "unknown", "timeout", "untested",
);

my $usage= "montrap [-p port] [-r retval] -o opstatus -s summary [-d detail] host group:service\n";

use vars qw($opt_p $opt_r $opt_o $opt_s $opt_d);
getopts("p:r:o:s:d:");


die $usage unless @ARGV == 2 and $ARGV[1] =~ /[^:]+:[^:]+/;

my $host= $ARGV[0];
my ($group, $service)= $ARGV[1] =~ /^([^:]+):([^:]+)/;

# note: $port is parsed here but is never handed to the client below
my $port= $opt_p || 2583;
my $retval= $opt_r || 255;
my $opstatus= $opt_o || die "montrap: '-o opstatus' required\n";

# reject any opstatus that is not one of the strings in @opstrings
die "montrap: unrecognized opstatus: $opstatus\n" unless
    grep { $_ eq $opstatus } @opstrings;


my $summary= $opt_s  || die "montrap: '-s summary' required\n";
my $detail= $opt_d || "";

my $mon;

if (!defined ($mon = Mon::Client->new)) {
    die "$0: could not create client object: $@";
}
$mon->host($host);

$mon->send_trap(
    group => $group,
    service => $service,
    retval => $retval,
    opstatus => $opstatus,
    summary => $summary,
    detail => $detail,
);
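
To turn the trap client into a heartbeat, one option is to call it on a
schedule from the side being watched.  A minimal sketch, assuming the
script above is installed as "montrap" somewhere in PATH; send_heartbeat,
the host name and the group:service pair are placeholders.

```shell
# send_heartbeat MONHOST GROUP:SERVICE
# Sends an "ok" trap whose summary names the sending host.
send_heartbeat() {
    montrap -o ok -s "heartbeat from $(uname -n)" "$1" "$2"
}

# Example crontab entry (every 5 minutes; wrapper path is hypothetical):
#   */5 * * * * /usr/local/bin/heartbeat.sh
# where heartbeat.sh defines the function above and ends with:
#   send_heartbeat mon.example.com datacenter:heartbeat
```

The receiving Mon then needs a matching service configured to alarm when
the traps stop arriving; see the trap settings in the mon man page.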



Re: mon.cf in a sql database ?

2004-09-30 Thread Ed Ravin
On Thu, Sep 30, 2004 at 05:34:04PM -0400, David Nolan wrote:

> --On Thursday, September 30, 2004 3:36 PM +0200 Brice Beauvillain
> [EMAIL PROTECTED] wrote:
>
>> Is it possible for mon to have the mon.cf file in a database ?
>
> There's no way to do that directly,

But you can do it indirectly.  Use the esyscmd macro in m4:

 esyscmd Pass its first argument to a shell and returns the
 shell's standard output.  Note that the shell shares its
 standard input and standard error with m4

So write a script that queries your database and retrieves Mon's
configuration, and then call it in mon.m4 with esyscmd.
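
A sketch of what such a script might look like.  The database query is
site-specific and left out; this assumes it can be made to print one
"group host" pair per line, and emit_hostgroups (a made-up name) folds
those pairs into hostgroup lines.

```shell
# emit_hostgroups: read "group host" pairs on stdin, print one
# "hostgroup GROUP host1 host2 ..." line per group, in first-seen order.
emit_hostgroups() {
    awk '
        NF >= 2 {
            if (!($1 in hosts)) order[++n] = $1
            hosts[$1] = hosts[$1] " " $2
        }
        END {
            for (i = 1; i <= n; i++)
                printf "hostgroup %s%s\n", order[i], hosts[order[i]]
        }
    '
}

# In mon.m4, with the script saved under a path of your choosing
# (the one below is a placeholder):
#   esyscmd(`/usr/local/sbin/mon-hostgroups')
```

Since m4 splices the script's standard output into the config verbatim,
it is worth running the script by hand first and checking the result.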



Re: status of Mon development failures?

2004-06-03 Thread Ed Ravin
On Thu, Jun 03, 2004 at 08:07:43AM -0400, David Nolan wrote:
> Plus the complete lack of any non-development releases on Mon for almost 3
> years makes me a bit wary of spending large amounts of effort on rewriting
> the internals on Mon.  Especially since all of the patches I've previously
> submitted haven't been integrated into the development branch yet.

I'm rather upset by this too.  Things seem to be all but frozen - Jim is
making limited releases, and doesn't seem to want to open the project up
to other developers.  I know I've submitted several improvements
to the monitor scripts in recent months and haven't gotten any feedback.



Re: status of Mon development failures?

2004-06-03 Thread Ed Ravin
On Thu, Jun 03, 2004 at 10:30:06AM -0700, Jim Trocki wrote:
> On Thu, 3 Jun 2004, Ed Ravin wrote:
>
> > to other developers.  I know I've submitted several improvements
> > to the monitor scripts in recent months and haven't gotten any feedback.
>
> netapfree.monitor?
>
> (0) uplift /home/trockij/mon/patches $ perl -c netappfree.monitor
> Format not terminated at netappfree.monitor line 307, at end of line
> netappfree.monitor had compilation errors.

That's odd: line 307 is a single period that ends the format definition.
Jim, perhaps the SMTP daemon decided to eat that period?  Please
try again with the attached; if you're still having trouble, let me know.

-- Ed
#!/usr/bin/perl -w
#
# Use SNMP to get free disk space or inode status from a Network Appliance
#
# exit values:
#  1 - free space or inodes on any host dropped below the supplied parameter
#  2 - network or SNMP error (SNMP library error, no response from server)
#  3 - config error - (filesystem in config file does not exist on filer)

# USAGE
#  [--community=SNMP COMMUNITY] [--timeout=seconds]
#  [--config=/path/to/configfile] [--list]  host1 host2 ...

# EXAMPLES
# --list option will dump current status from requested hosts:
#  netappfree.monitor --list filer1 filer2 filer3
# sample output:
# filer  ONTAP   filesystem KB total KB avail   Inode%
# 
# filer1 6.1.2R3 /vol/vol0/   61092616  677341686
# filer1 6.1.2R3 /vol/vol0/.snaps  2545524  1260240 0

# sample invocation in mon.cf, with local MIB directory for the Netapp MIB
# NETWORK-APPLIANCE-MIB.txt (copy from /etc/mib/netapp.mib on filer):
#service freespace
#description test freespace and inodes on Netapp filers
#depend SELF:ping
#MIBDIRS=/usr/local/share/snmp/mibs
#interval 7m
#monitor netappfree.monitor


# CONFIG FILE FORMAT
#
#  Run netappfree --list host1 host2 ... first to get list of filesystems
# and whether inodes are properly reported.  If you don't want to monitor
# inodes for a particular FS, leave that column blank.
#
#
# host    filesystem    freespace            [InodeThreshold]
#                       (in kb, gb, or mb)   (in % or k)
#
# filer1  /vol/main/    5gb                  90%
# filer2  /vol/vol0/    5gb                  500k


#
# This requires the UCD SNMP library and G.S. Marzot's Perl SNMP
# module.
#
# Originally by Jim Trocki.  Modified by Theo Van Dinter
# ([EMAIL PROTECTED], [EMAIL PROTECTED]) to add verbose error output,
# more error checking, etc.  Can be used in conjunction with
# snapdelete.alert to auto-remove snapshots if needed.
# Modified December 2003 by Ed Ravin ([EMAIL PROTECTED]) to add inode
# checking, detect nonexistent filesystem in config file, pass perl -w
# checks, added more info to error messages for clarity, updated doc comments
# above.


# $Id: netappfree.monitor,v 1.2 2003/12/20 20:31:05 root Exp root $
#
#Copyright (C) 1998, Jim Trocki
#Copyright (C) 1999-2001, Theo Van Dinter
#
#This program is free software; you can redistribute it and/or modify
#it under the terms of the GNU General Public License as published by
#the Free Software Foundation; either version 2 of the License, or
#(at your option) any later version.
#
#This program is distributed in the hope that it will be useful,
#but WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#GNU General Public License for more details.
#
#You should have received a copy of the GNU General Public License
#along with this program; if not, write to the Free Software
#Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
#
use SNMP;
use Getopt::Long;

sub list;
sub readcf;
sub toKB;

$ENV{MIBS} = 'RFC1213-MIB:NETWORK-APPLIANCE-MIB';

GetOptions (\%opt, "community=s", "timeout=i", "retries=i", "config=s", "list");

die "no host arguments\n" if (@ARGV == 0);

$RET = 0;
@ERRS = ();
%HOSTS = ();

$COMM = $opt{community} || "public";
# --timeout is given in seconds; scale it to microseconds
$TIMEOUT = $opt{timeout} || 2; $TIMEOUT *= 1000 * 1000;
$RETRIES = $opt{retries} || 5;
$CONFIG = $opt{config} || (-d "/etc/mon" ? "/etc/mon" : "/usr/lib/mon/etc")
    . "/netappfree.cf";

($dfIndex, $dfFileSys, $dfKBytesTotal, $dfKBytesAvail,
    $dfInodesFree, $dfPerCentInodeCapacity) = (0..5);

list (@ARGV) if ($opt{list});

readcf ($CONFIG) || die "could not read config: $!\n";

foreach $host (@ARGV) {
    next if (!defined $FREE{$host});

    if (!defined($s = new SNMP::Session (DestHost => $host,
            Timeout => $TIMEOUT, Community => $COMM,
            Retries => $RETRIES))) {
        $RET = ($RET == 1) ? 1 : 2;
        $HOSTS{$host} ++;
        push (@ERRS, "could not create session to $host: " . $SNMP::Session::ErrorStr);
        next;
    }

$v = new SNMP::VarList (
['dfIndex'],
['dfFileSys'],
['dfKBytesTotal

Re: status of Mon development failures?

2004-06-03 Thread Ed Ravin
On Thu, Jun 03, 2004 at 11:02:22AM -0700, Jim Trocki wrote:
> wtf, the one you just sent has the same problem. maybe it's an mua
> problem on your end?

Curiouser and curiouser.  I don't have any problems with the message
from the mailing list that I received in my mailbox, so I don't think
my MUA is at fault.

> i had a look at the mail as delivered by the mta
> on my end and it doesn't have an ending --BXVAT5kNtrzKuDFl. maybe that
> line after the format STDOUT thing which begins with --- is messing
> things up somehow, since there's a blank line after that and nothing else.

Sounds like you have a buggy MIME decoder somewhere.  That looks fine in
the copy I got from the mailing list.

> try gzipping the thing first then sending that as an attachment.

OK, see attached.  Also, try this:

  wget http://www.panix.com/~eravin/netappfree.monitor

Which should get an unadulterated version.




netappfree.monitor.gz
Description: application/gunzip


Re: david nolan's patches

2004-06-03 Thread Ed Ravin
On Thu, Jun 03, 2004 at 10:52:03AM -0700, Jim Trocki wrote:
> this is a matter of historical record which should be public. rather
> than post his patched version to the mailing list for everyone to have a
> gander at and do something with if they chose, he sent them only to me

Sounds like he wanted to respect your role as maintainer of Mon, and run
major changes by you before releasing them to anyone else.  The patches
probably arrived at a moment when you didn't have time to look at them,
allowing the misunderstanding and subsequent miscommunication to fester.

These kinds of problems would be less likely to happen if we were using
Sourceforge or the like, since both the latest development version and
submitted patches would be publicly visible to all.


-- Ed



Re: Multi-line descriptions?

2004-05-31 Thread Ed Ravin
On Mon, May 31, 2004 at 07:33:53PM -0500, Tim Klein wrote:
> I'd like to be able to put a whole paragraph in the
> description, rather than just one line.  I could put
> the paragraph in comments instead, but then alerts and
> clients couldn't make use of it.

There's a little-known feature of Mon that might help you
here.  You can set environment variables in the Mon entry
for a service:

watch mumble
  service diskspace
  description less than 5% disk space free?
  MIBDIRS=/usr/local/share/snmp/mibs
  COMMUNITY=ims03l33t
  interval 15m
  monitor snmpdiskspace.monitor

The example above sets the environment variables MIBDIRS and
COMMUNITY, which will be available to the monitor program.
The mon man page says they will also be available to alert
programs.
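
On the alert side, a script could pick such a variable up and append it to
whatever Mon hands it.  A minimal sketch, assuming the alert text arrives
on standard input and that a variable named LONGDESC (a made-up name) was
set in the service definition; whether Mon accepts values containing
spaces there is something to test, not something this sketch guarantees.

```shell
# format_alert: copy the alert text from stdin, then append the contents
# of $LONGDESC as a separate paragraph if the variable is set.
format_alert() {
    cat
    if [ -n "${LONGDESC:-}" ]; then
        printf '\n%s\n' "$LONGDESC"
    fi
}
```

The output of format_alert can then be piped to whatever the alert
normally uses to deliver its message.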



Re: alert scripts

2004-05-11 Thread Ed Ravin
On Tue, May 11, 2004 at 11:26:08AM +0200, [EMAIL PROTECTED] wrote:
> Mon works properly with scripts that came with the program, but isn't able
> to launch my handmade scripts.

Try adding this to the top of the script:

  exec >/tmp/script.debug.out 2>&1
  set -x

And review /tmp/script.debug.out after Mon calls the script.

Scripts that work off the command line, but don't work when invoked
by a daemon like Mon, usually have some problem with an environment
variable like PATH or a permissions problem with the script or something
the script is calling.
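
Beyond the trace, it can help to record the environment the daemon
actually gives the script, for comparison with your login shell.  A small
sketch; log_environment and the output path are placeholders.

```shell
# log_environment FILE
# Writes the search path, numeric user id and working directory the
# script is running with to FILE.
log_environment() {
    {
        echo "PATH=$PATH"
        echo "uid=$(id -u)"
        echo "cwd=$(pwd)"
    } > "$1"
}

# Typical use near the top of an alert script:
#   log_environment /tmp/script.env.out
```

If PATH turns out to be the culprit, setting it explicitly at the top of
the script (or calling programs by full path) is the usual fix.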



Re: mon.cgi + auth

2004-01-06 Thread Ed Ravin
On Tue, Jan 06, 2004 at 11:06:25AM +0200, Eugene Filatov wrote:
> Hello!
>
> I just configured mon (mon-0-99-3.39)  for my site and it works well. But
> I have problems  with auth trough mon.cgi.
>
> I have default auth.cf with one uncommented line:
> * mon monpassword

That looks like an entry for the trap section - the user passwords
go in monusers.cf, in standard Unix encrypted format.  Suggest you
go over the mon manpage again, and read up on monusers.cf and auth.cf.
Remember, auth.cf is just authorization, not authentication.



updated netappfree.monitor - inode checking, other improvements

2003-12-20 Thread Ed Ravin
I needed to be able to alarm if we ran out of inodes on our Netapps, so
I took the axe to netappfree.monitor.  I've attached the new version.
Improvements are:

  * inode checking - can check for exceeding NN%, or less than N
  * complain if a non-existent filesystem is in config file.  This
one hit me - we upgraded the Netapp and the name of the filesystem
changed on the new box, it turned out netappfree.monitor had silently
stopped testing it.
  * improved error messages to list the thresholds and other info
  * perl -w compliant.
  * updated internal docs in program

-- Ed



#!/usr/bin/perl -w
#
# Use SNMP to get free disk space or inode status from a Network Appliance
#
# exit values:
#  1 - free space or inodes on any host dropped below the supplied parameter
#  2 - network or SNMP error (SNMP library error, no response from server)
#  3 - config error - (filesystem in config file does not exist on filer)

# USAGE
#  [--community=SNMP COMMUNITY] [--timeout=seconds]
#  [--config=/path/to/configfile] [--list]  host1 host2 ...

# EXAMPLES
# --list option will dump current status from requested hosts:
#  netappfree.monitor --list filer1 filer2 filer3
# sample output:
# filer  ONTAP   filesystem KB total KB avail   Inode%
# 
# filer1 6.1.2R3 /vol/vol0/   61092616  677341686
# filer1 6.1.2R3 /vol/vol0/.snaps  2545524  1260240 0

# sample invocation in mon.cf, with local MIB directory for the Netapp MIB
# NETWORK-APPLIANCE-MIB.txt (copy from /etc/mib/netapp.mib on filer):
#service freespace
#description test freespace and inodes on Netapp filers
#depend SELF:ping
#MIBDIRS=/usr/local/share/snmp/mibs
#interval 7m
#monitor netappfree.monitor


# CONFIG FILE FORMAT
#
#  Run netappfree --list host1 host2 ... first to get list of filesystems
# and whether inodes are properly reported.  If you don't want to monitor
# inodes for a particular FS, leave that column blank.
#
#
# host    filesystem    freespace            [InodeThreshold]
#                       (in kb, gb, or mb)   (in % or k)
#
# filer1  /vol/main/    5gb                  90%
# filer2  /vol/vol0/    5gb                  500k


#
# This requires the UCD SNMP library and G.S. Marzot's Perl SNMP
# module.
#
# Originally by Jim Trocki.  Modified by Theo Van Dinter
# ([EMAIL PROTECTED], [EMAIL PROTECTED]) to add verbose error output,
# more error checking, etc.  Can be used in conjunction with
# snapdelete.alert to auto-remove snapshots if needed.
# Modified December 2003 by Ed Ravin ([EMAIL PROTECTED]) to add inode
# checking, detect nonexistent filesystem in config file, pass perl -w
# checks, added more info to error messages for clarity, updated doc comments
# above.


# $Id: netappfree.monitor,v 1.2 2003/12/20 20:31:05 root Exp root $
#
#Copyright (C) 1998, Jim Trocki
#Copyright (C) 1999-2001, Theo Van Dinter
#
#This program is free software; you can redistribute it and/or modify
#it under the terms of the GNU General Public License as published by
#the Free Software Foundation; either version 2 of the License, or
#(at your option) any later

mon, mon.cgi, and funny passwords

2003-10-27 Thread Ed Ravin
One of my co-workers finally got around to telling me that he hadn't
been using Mon for the last three years because mon.cgi kept demanding
that he re-enter his password every time he went to a new screen.

At first I thought the problem was due to the fact that he was using
a Mac, and we'd hit on some incompatibility with the iCab browser.  But
IE and Netscape on the same box also had the problem.  Furthermore,
lynx on a Unix host had the same problem.

Then the co-worker asks me, "I've got a space character in my password.
Could that be part of the problem?"  I reset his password and lo, it
works now.

Andrew, could this be a mon.cgi bug?

-- Ed


patch to bgp.monitor to correct peer address display

2003-10-21 Thread Ed Ravin
The April 5, 2002 version of bgp.monitor currently on the Mon contrib
site does not seem to properly display the IP address of the remote
peer when there is an error condition.  At least that's the case for
me right now, with a Cisco 7513 running IOS 12.0.23s.  Here is a patch
that corrects the problem, by displaying bgpPeerRemoteAddr rather
than bgpPeerIdentifier.  Comments welcome, since I'm not all that
familiar with this MIB.  It seems that bgpPeerIdentifier is set to
0.0.0.0 in my Cisco's MIB when the session goes down, which is not
all that useful in a Mon alarm.

--- bgp.monitor 2003/10/21 17:10:52 1.3
+++ bgp.monitor 2003/10/21 17:11:33
@@ -128,7 +128,7 @@
    $details .= "Router: $router  (AS $bgpLocalAs) Id : $bgpIdentifier\n";
 
    # Get trougnt the SNMP tree to fetch all peer infos
-   my $vars  = new SNMP::VarList([$oids{bgpPeerIdentifier}],[$oids{bgpPeerRemoteAs}],[$oids{bgpPeerState}],[$oids{bgpPeerFsmEstablishedTime}],[$oids{bgpPeerAdminStatus}]);
+   my $vars  = new SNMP::VarList([$oids{bgpPeerIdentifier}],[$oids{bgpPeerRemoteAs}],[$oids{bgpPeerState}],[$oids{bgpPeerFsmEstablishedTime}],[$oids{bgpPeerAdminStatus}], [$oids{bgpPeerRemoteAddr}]);
    for (my @vals = $sess->getnext($vars);
    $vars->[0]->tag =~ /15\.3\.1\.1/   # still in table (Did you have a cleaner solutions ?)
    and 
@@ -137,11 +137,11 @@
    {
    my $textState = $BgpPeerState{$vals[2]};
    my $texttime = sectotime($vals[3]);
-   $details .= sprintf("   Neighbor %-16s  AS %-5u   status : %-15s   since : %-16s\n",$vals[0], $vals[1], $textState, $texttime); 
+   $details .= sprintf("   Neighbor %-16s  AS %-5u   status : %-15s   since : %-16s\n",$vals[5], $vals[1], $textState, $texttime); 
 
    # if bgpPeerState != established and bgpPeerAdminStatus == start
    if ($vals[2] != 6 and $vals[4] == 2) {
-   $summary .= "Neighbor relation : $router - $vals[0] (AS $vals[1]) is in state $textState ";
+   $summary .= "Neighbor relation : $router - $vals[5] (AS $vals[1]) is in state $textState ";
    };
    }
    $details .= "\n";


Re: can i start eth0 and eth0:0 sepeartly on boot

2003-07-06 Thread Ed Ravin
On Mon, Jul 07, 2003 at 03:05:58AM +, Eric Bond wrote:
> One question:I want to setup alias ip of eth0, becuase sometime the
> original ip of eth0 has already been used by some other computer.

Whoa, hold it right there.  Who's managing this network?  You should solve
the problem whereby machines steal each other's IP addresses - either
with better policies, or by using a DHCP server.

> As I know
> when i start the computer in this case,
> eth0 will fail to get ip and therefore eth0:0 can't get ip either although
> eth0:0 has no confliction.

The complete answer to this question depends on what operating system
you are using, which you haven't specified.  But if you really think you
need to do this, assign eth0 a private IP address that no one else is
likely to steal, and then you can use whatever you need for the eth0:0,
eth0:1, etc.

But like I said before, you should solve the real problem - well-behaved
hosts do not steal each other's IP numbers.


monitoring cobalt qube3

2003-06-16 Thread Ed Ravin
A while ago I asked on the list if anyone was using Mon to keep track
of their Cobalt Qubes or other Cobalt boxes.  I've since implemented
something that exports Cobalt's Active Monitor status so that Mon
can figure out if the Cobalt is complaining about any problems.  Please
write to me if you're interested in trying it out.

-- Ed


Re: NFS monitor?

2003-06-09 Thread Ed Ravin
On Mon, Jun 09, 2003 at 06:23:37PM -0700, Hugh Caley wrote:
> I have a script for checking space on my mounts.  It parses the output
> from the df command

If the NFS server is not available, won't that cause df to hang,
eventually clogging up your mon server?

> I don't see a NFS monitor anywhere. I was wondering if any of you had
> one you used?
>
> In thinking about it I suspect the easiest way to test it may be to
> mount each share and run `cat $file` to ensure a known good file is
> readable.

Same problem.  Alas, most NFS failure modes cause the server to hang.
I would never mount any NFS filesystems on my mon server.
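
One way to probe an NFS server without mounting anything is to ask its
RPC service for a reply.  A rough sketch using rpcinfo; nfs_responds is a
made-up name, and this only shows that the NFS service answers, not that
any particular export is readable or has free space.

```shell
# nfs_responds HOST
# Succeeds if the NFS program on HOST answers an RPC null call over TCP.
# Nothing is mounted, so a dead server cannot leave the monitoring host
# with a hung filesystem.
nfs_responds() {
    rpcinfo -t "$1" nfs >/dev/null 2>&1
}

# Example use in a monitor script (host name is a placeholder):
#   nfs_responds filer1.example.com || echo "filer1.example.com"
```

rpcinfo applies its own timeout to the call, so check how long that takes
on your system against the interval of the test.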


Re: Mon causing system to freeze

2003-06-06 Thread Ed Ravin
On Thu, Jun 05, 2003 at 10:23:28AM -0700, Donald MacDougall wrote:
> Whenever I run mon it causes the system it's running on to freeze
> up instantly anywhere from a few minutes after it is started to
> several days later.  It's instantaneous when it happens, with no
> sign of building memory usage or CPU usage before it happens.
>
> I've run mon on two different debian systems.  One is the current
> stable the other is the current testing.  stable uses perl 5.6.1,
> testing 5.8.0.  Testing is running on an AMD Athlon processor,
> stable on a Pentium 4.
>
> Both systems are workstations with X and all sorts of other things
> running. [...]
> It often happens while I'm doing something and I'll be running the
> mouse across the screen and the mouse cursor will just suddenly
> stop and that is it.

My guess is that there are some bugs in X that are being provoked
by Mon's use of CPU and memory that the X server used to have for
itself.  Do you have a box that isn't running X, or can you upgrade
to a newer X server?



Re: New release of mon?

2003-03-19 Thread Ed Ravin
On Wed, Mar 19, 2003 at 02:49:43PM +0100, Hans Kinwel wrote:
> Why not put mon on sourceforge?

It's already there - see sourceforge.net/projects/mon .  Alas,
the latest version there is 0.38.20.  And Jim is both the admin
and the only developer registered.

Jim, have you thought about reactivating the SourceForge project and
delegating responsibility for various parts of Mon?  At the very least,
you could let others maintain the monitors and alerts, where a lot of
the useful patches have taken place.

-- Ed


Anyone monitoring Cobalt servers?

2003-01-23 Thread Ed Ravin
Is anyone monitoring Cobalt servers with Mon?  I'm particularly
curious whether anyone has tried to make Cobalt's Active Monitor
outputs visible in Mon.

-- Ed



Re: restricting views with mon.cgi or mon?

2002-12-12 Thread Ed Ravin
Andrew Ryan writes:
 
> On Wed, 11 Dec 2002, Ed Ravin wrote:
> > I'm playing with using watch = xxx in mon.cgi.cf but it seems to
> > be an all-or-nothing deal - there doesn't seem to be a way to restrict
> > views based on individual users.
>
> Since no one had really asked for it, I thought I'd add that to mon.cgi
> and see if anyone actually used it, at least it could provide some basic
> access control. Using separate directories like this, each with their own
> mon.cgi.cf file, you could accomplish some neat things with apache
> authentication and access control directives, with URL's like this:
> http://your.mon.server/customer1/mon.cgi
> http://your.mon.server/customer2/mon.cgi

The issue is when customer1 tries to use customer2's URL - since Mon
is doing the authentication, not mon.cgi, how do we keep customer1
from using his password to view customer2's information?

> BTW, there are some patches to mon.cgi needed for the watch keyword to
> work as intended. Contact me if you need them.

Perhaps the answer is in the patches?  Please send them...



Re: Very Simple Question (Hopefully)

2002-09-24 Thread Ed Ravin

William Bartholomew writes:
 
> Having very little experience with Perl, what is the best way to start
> mon as a non-root user under rc.local (if this is indeed possible).

In rc.local, something like:

   su MONUSER -c "/usr/local/mon/mon OPTIONS ARGUMENTS"



Re: Very Simple Question (Hopefully)

2002-09-24 Thread Ed Ravin

William Bartholomew writes:
 
> MONUSER obviously needs access to the mon directory and all
> subdirectories, does it need any special permissions to the perl or perl
> library directories or will the defaults suffice?

The defaults are sufficient.  I run mon this way (as a regular
user) - it needs read access to the mon stuff and configuration
files, write access to the mon state directory, and that's it.



m4 eats the word include from mon.m4?

2002-09-19 Thread Ed Ravin

In my mon.m4, I have a router monitoring script with an argument like
this:

   -include=WAN

Without any problems.  Yesterday I added another invocation of that
script with the argument:

   -include=Serial10/4

And much to my surprise, the script didn't run - it turns out that
m4 was turning the second line into:

   -=Serial10/4

Since the script in question uses Perl's Getopt::Long, which is
case-insensitive for long option names, I changed the line to:

   -Include=Serial10/4

Which still worked for the script but didn't trigger m4's line eater.

I know include is special to m4, but why would it eat one of those
lines but not the other?



montrap, a command line client

2002-06-08 Thread Ed Ravin

I suspect I'm not the first to write something like this, but I don't
recall anyone posting it to the mon list.  This is a simple command
line client for sending mon traps.

  montrap [-p port] [-r retval] -o opstatus -s summary [-d detail]
  host group:service

  host - Mon host to send trap to
  opstatus - one of fail, ok, coldstart, warmstart, etc.

The original plan was to have a Mon watch for every critical
cron job, have those critical cron jobs run montrap at the end
of every successful run to indicate that they were OK, and have
Mon page if the trap isn't received in the specified interval.
I haven't put this together yet, but I'm sure someone can think
of other interesting uses for it.

Note that it doesn't support passwords - I'd rather run with
no password than specify it on the command line, but it's easy
enough to hard-code it or read from a file if someone wants.
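
The cron-job plan described above could be sketched as a small shell wrapper.
This is only an illustration under stated assumptions: the function name
cron_with_trap and the default host name "monhost" are made up here, and it
assumes montrap is on the PATH and takes the arguments shown in the usage
line above.

```shell
# MONTRAP and MONHOST are placeholders; point them at your own install.
MONTRAP=${MONTRAP:-montrap}
MONHOST=${MONHOST:-monhost}

# cron_with_trap GROUP:SERVICE COMMAND [ARGS...]
# Runs the job and sends an "ok" trap only if the job succeeded.
cron_with_trap() {
    target=$1
    shift
    if "$@"; then
        "$MONTRAP" -o ok -s "$target completed" "$MONHOST" "$target"
    else
        rc=$?
        # No trap is sent on failure; the missing trap is what Mon notices.
        return "$rc"
    fi
}
```

A crontab entry would then source this and run something like
"cron_with_trap cron:backup /usr/local/sbin/nightly-backup" (both names are
examples only).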

-- Ed


#!/usr/bin/perl

use strict;
use Getopt::Std;
use Mon::Client;

# opstatus names accepted by -o
my @opstrings= (
    "fail", "ok", "coldstart", "warmstart", "linkdown",
    "unknown", "timeout", "untested",
);

my $usage= "montrap [-p port] [-r retval] -o opstatus -s summary [-d detail] host group:service\n";

use vars qw($opt_p $opt_r $opt_o $opt_s $opt_d);
getopts("p:r:o:s:d:") or die $usage;


die $usage unless @ARGV == 2 and $ARGV[1] =~ /[^:]+:[^:]+/;

my $host= $ARGV[0];
my ($group, $service)= $ARGV[1] =~ /^([^:]+):([^:]+)/;

my $port= $opt_p || 2583;
my $retval= $opt_r || 255;
my $opstatus= $opt_o || die "montrap: '-o opstatus' required\n";

# compare against each known name (a bare "grep $opstatus, ..." is always true)
die "montrap: unrecognized opstatus: $opstatus\n" unless
    grep { $_ eq $opstatus } @opstrings;


my $summary= $opt_s  || die "montrap: '-s summary' required\n";
my $detail= $opt_d || "";

my $mon;

if (!defined ($mon = Mon::Client->new)) {
    die "$0: could not create client object: $@";
}
$mon->host($host);
$mon->port($port);    # honor -p; it was parsed but never used

$mon->send_trap(
    group    => $group,
    service  => $service,
    retval   => $retval,
    opstatus => $opstatus,
    summary  => $summary,
    detail   => $detail,
);




Re: monitor a pid?

2002-05-21 Thread Ed Ravin

Scott Prater writes:
 
> I believe the unary operator expected error is occurring in the line:
>
> if [ $9 = -u ]; then
>
> Whenever I do shell scripting, I always use the trick explained in _Unix
> Power Tools_:  instead of
> if [ $9 = -u ]
> do
> if [ X$9 = X-u ]
>
> That way, the test will never fail, even if $9 doesn't exist.

If you're using quotes, then you don't need the X kludge.  For example:

if [ "$9" = "-u" ]

will work just fine.  As long as you have quotes around them, a null
string will be passed as an argument to the test.

BTW, I always use "set -u" at the top of my shell scripts, and the
default construction to avoid errors:

set -u
if [ "${9:-}" = "-u" ]

That way, misspelt shell variables will turn up a LOT sooner.
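
A small self-contained illustration of the quoted default form, assuming a
POSIX shell.  The function name flag_is_u is made up for this example, and
its body runs in a subshell only so that "set -u" does not affect the caller:

```shell
# flag_is_u ARGS...
# Prints "yes" when the ninth argument is -u, otherwise "no".
flag_is_u() (
    set -u
    # "${9:-}" expands to an empty string when there is no ninth argument,
    # so the test always receives exactly three operands.
    if [ "${9:-}" = "-u" ]; then
        echo yes
    else
        echo no
    fi
)
```

Calling it with three arguments prints "no" rather than an error; calling it
with -u as the ninth argument prints "yes".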



Re: New BGP Sessions monitor

2002-04-08 Thread Ed Ravin

Marc Hauswirth writes:
> I wrote a small monitor usefull if you run BGP (Border
> Gateway Protocol) on your network.
> This monitor permit you to check that all your BGP peer are up and
> running.

Marc, this is extremely useful!  Thank you for the contribution!

However, in my environment, we have several BGP sessions that are
in shutdown status, i.e. the line

  neighbor 10.2.2.2 shutdown

is in the Cisco config.  bgp.monitor, as originally released, thinks
those sessions are broken, rather than ignoring them.  Attached
is a patch that will check for:

  bgp.bgpPeerTable.bgpPeerEntry.bgpPeerAdminStatus = stop

and not complain about a session that has been administratively
shut down.  Note that if you have a buggy version of Cisco IOS
(like some 12.0(S) versions) where this MIB variable is not supported
properly, this patch will only make things worse for you.  Marc, perhaps
it should be an option?

One other quibble, I think the summary output of the script is too verbose -
how about something like:

  routername(BGP 20.2.3.44/AS 1234/idle)

so it can fit on a pager screen?

Thanks again,

-- Ed



--- bgp.monitor 2002/04/08 21:25:21 1.1
+++ bgp.monitor 2002/04/08 21:25:39
@@ -109,6 +109,11 @@
        6 => "established"
        );
 
+my %BgpAdminStatus = (
+       1 => "stop",
+       2 => "start",
+       );
+
 
 my %state;
 
@@ -121,7 +126,7 @@
        $details .= "Router: $router  (AS $bgpLocalAs) Id : $bgpIdentifier\n";
 
        # Get trougnt the SNMP tree to fetch all peer infos
-       my $vars  = new SNMP::VarList([$oids{bgpPeerIdentifier}],[$oids{bgpPeerRemoteAs}],[$oids{bgpPeerState}],[$oids{bgpPeerFsmEstablishedTime}]);
+       my $vars  = new SNMP::VarList([$oids{bgpPeerIdentifier}],[$oids{bgpPeerRemoteAs}],[$oids{bgpPeerState}],[$oids{bgpPeerFsmEstablishedTime}],[$oids{bgpPeerAdminStatus}]);
        for (my @vals = $sess->getnext($vars);
             $vars->[0]->tag =~ /15\.3\.1\.1/   # still in table (Did you have a cleaner solutions ?)
             and 
@@ -132,8 +137,9 @@
                my $texttime = sectotime($vals[3]);
                $details .= sprintf("   Neighbor %-16s  AS %-5u   status : %-15s   since : %-16s\n",$vals[0], $vals[1], $textState, $texttime); 
 
-               if ($vals[2] != 6) {
-                       $summary .= "Neighbor relation : $router - $vals[0] (AS $vals[1]) is in state $textState";
+               # if bgpPeerState != established and bgpAdminStatus == start
+               if ($vals[2] != 6 and $vals[4] == 2) {
+                       $summary .= "Neighbor relation : $router - $vals[0] (AS $vals[1]) is in state $textState ";
                };
        }
        $details .= "\n";
@@ -178,4 +184,4 @@
 
        $texttime .= int($sec/60) . "min";
        return ($texttime);
-};
\ No newline at end of file
+};



Re: traptimeout

2002-03-19 Thread Ed Ravin

TORRESANI, Roberto writes:
 
> I'm trying to do a thing like :
> - do something and let me know when you have finished (with a mon trap)
> - if mon doesn't get the trap in a reasonable period of time,
> the service is considered failed.

Check the mon man page for traptimeout and trapduration.
That will let you do what you want.  Here's a snippet from my config:

 watch trapthing
     service whereareyou
         description go red if we don't hear from you
         traptimeout 5m
         trapduration 1s

The above service will go into failure mode if a trap is not
received in 5 minutes (traptimeout).  After the trap is received,
the service will be marked OK after 1 second (trapduration).



Re: fping.monitor 0.99.2

2002-02-21 Thread Ed Ravin

Steven F Siirila writes:
 
> Bug Report:
> If fping.monitor parses an unrecognized line from fping, it writes that
> line to stderr BEFORE writing anything else to stderr/stdout.  This causes
> the resulting error message to appear in the summary line instead of the
> sorted list of hosts appearing there.

Do you have the patch I posted a while ago for fping.monitor?  It adds
code to separate hosts and details the way most monitors do.  I see
that I didn't fix the STDERR problem though.  Try applying my patch
and then making this additional change:


delete the line:
print STDERR "unidentified output from fping: [$_]\n";

and replace with:
$details{unknown}= $_;
push @unreachable, "unknown";

Which should print all unknown messages at the end, with the fake
hostname "unknown".  If this isn't sufficient, please post the messages
you're getting.

-- Ed


--- fping.monitor   2001/10/12 04:37:52 1.1
+++ fping.monitor   2001/10/12 04:39:13
@@ -53,6 +53,7 @@
 my $CMD = "fping -e -r $RETRIES -t $TIMEOUT";
 my $START_TIME = time;
 my $END_TIME;
+my %details;
 
 exit 0 if (@ARGV == 0);
 
@@ -91,6 +92,16 @@
push @unreachable, $1;
 }
 
+# ICMP Host Unreachable from 1.2.3.4 for ICMP Echo sent to 2.4.6.8
+
+   elsif (/^ICMP (.*) for ICMP Echo sent to (\S+)/)
+   {
+   if (! exists $details{$2})
+   {
+   $details{$2}= $_;
+   }
+   }
+
 else
 {
        print STDERR "unidentified output from fping: [$_]\n";
@@ -149,6 +160,11 @@
 }
 
 print "\n";
+
+   foreach my $ipnum (@unreachable)
+   {
+   print $ipnum, " : ", $details{$ipnum}, "\n" if exists $details{$ipnum};
+   }
 }
 
 



Re: suggestion: dependency check on error

2002-02-14 Thread Ed Ravin

[EMAIL PROTECTED] writes:
 
> Currently, when I use the depend keyword to suppress alerts after a
> failure, I occasionally get an alert from the dependent monitor because the
> service it depends on goes down before its monitor is next run.

This problem has been discussed several times on the list in the past.
The typical fix is to check the lower-level services at least
twice as often as the upper-level ones - for example, I check ping
every 30 or 45 seconds, and TCP layer services no more than every
3 minutes.  Of course, this just drastically shortens the window
for the extra alarms, when we'd rather eliminate it entirely.

How to fix this properly?  Here's one suggestion (clipped from
a mon discussion in March 2001):

Ed Ravin wrote:
> *) make sure all dependencies are current before alerting - before
> sending an alert for a problem, make sure that all tests that the currently
> failed test depends on have been tested AFTER the item in question.  For
> example, if mailserver:smtp depends on mailserver:ping and mailrouter:ping,
> and mailserver:smtp has failed at 10:00 PM, suppress the alert for
> mailserver:smtp until the other two tests have been run.

Jim Trocki wrote:
> Good behavior. This minimizes alerts, imposes no extra burden on
> the network, and minimal burden on the CPU (a few extra conditionals
> evaluated, depending on how elaborate your dependencies are).  The
> trade-off is that it delays the alert for a failure in order to be sure
> all the dependencies are satisfied. If you do not carefully configure
> your poll intervals, you'll surely get delayed alerts.  This isn't bad,
> however. It's probably a good practice to sort your dependencies into
> a tree, where the services which are root dependencies get checked more
> frequently than the lower levels.

Back in the present [EMAIL PROTECTED] wrote:
> It would be nice if there was some way to force mon to run all of the
> monitors in a dependency tree, starting from the top of the tree, when an
> error is detected. This would completely eliminate any spurious alerts and
> make it clear what the underlying problem is.

That was also discussed back in March:

Jim Trocki wrote:
> *) force all dependencies to be tested before alerting
>
> Not good behavior. It accomplishes the task of minimizing alerts due
> to dependency problems, but at an extreme cost when failures occur.
> I can imagine a not-too-complicated setup where frequently failing leaf
> nodes trigger large system load because all the service checks of the
> dependency graph get scheduled. The trade-off is that it minimizes the
> delay between the time a failure is detected and the alert is sent,
> at the expense of lots of extra CPU cycles.

I had also proposed a compromise, of sorts:

> *) depend_maxage parameter - allow user to specify in mon.cf how old
> a dependency can be before it should be re-tested (or waited for) - for
> example, depend_maxage 30 means that if a dependency for a currently
> failed test has not been tested in the last 30 seconds, make sure it is
> re-tested before alerting.

I think this parameter, if implemented, would let the system
administrator specify exactly how much tradeoff they wanted
between prompt alerts and accurate alerts.  Here's a possible
specification:

depend_maxage SECONDS  - when a service fails, do not send
  an alert unless all of the services it depends on have also
  been tested within the past SECONDS seconds.  If SECONDS is zero,
  then the alert for this service is not sent until the scheduler
  detects that all dependent tests have been run at least once
  since the first_failure time of this service.

Now all we need is someone who wants to code it!
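
As a starting point, the decision rule in that specification might look like
the sketch below.  This is not mon code: the function name alert_allowed and
its argument order are invented for illustration, all times are epoch
seconds, and treating a check made in the same second as first_failure as
"since" the failure is an assumption.

```shell
# alert_allowed NOW MAXAGE FIRST_FAILURE LAST_CHECK...
# Succeeds (status 0) when every dependency's last check is fresh enough
# for the alert to go out, and fails (status 1) when the alert should wait.
alert_allowed() {
    now=$1
    maxage=$2
    first_failure=$3
    shift 3
    for last_check in "$@"; do
        if [ "$maxage" -eq 0 ]; then
            # zero: the dependency must have run since the failure began
            [ "$last_check" -ge "$first_failure" ] || return 1
        else
            # non-zero: the dependency must have run within MAXAGE seconds
            [ $((now - last_check)) -le "$maxage" ] || return 1
        fi
    done
    return 0
}
```

With a depend_maxage of 30, a failure seen at time 1000 whose dependencies
were last checked at 990 and 950 would be held back until the second
dependency is tested again.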




Re: Mon under HPUX anyone?

2002-01-08 Thread Ed Ravin

Jerry Grooms writes:
> After following the routine setup, including the creation of the
> perl *.ph files, a simple start-up test resulted in a flame-out...
>
> apparently something is amiss in the generated *.ph files, which, of
> course, is a perl/HPUX problem and apparently no fault of Mon's.

This is a common setup problem - ISTR that you can just delete
the erroneous lines from the *.ph files and then everything
will work swimmingly.

> ./mon -f -c mon.cf -b `pwd`
> Bareword found where operator expected at
> /opt/local/apps/perl5/lib/site_perl/5.005/PA-RISC2.0/sys/stdsyms.ph line 193,
> near _FILE_OFFSET_BITS
> (Missing operator before _FILE_OFFSET_BITS?)
>
> blah, blah, repeat infinatum
> ...



odd dependency problem?

2002-01-08 Thread Ed Ravin

I just upgraded a host to mon 0.99.2.  One of my services was as follows:

service router_bgp
    description BGP session with ISP
    depend SELF:interfaces
    interval 8m
    monitor snmpvar.monitor

This syntax worked in mon 0.38.21, but in 0.99.2, it said the service
was unchecked and there was no time scheduled for the next check.
It also said that the result of the last dependency was 0.  Does
that have anything to do with not having any alerts defined?  When
I removed the "depend SELF:interfaces" line it worked properly.

PS: Jim, thanks for the extra syntax checking, it uncovered a
handful of m4 typos that I had made that mon had just silently
ignored until now.

-- Ed




New mon client: monfailures

2002-01-02 Thread Ed Ravin

Attached is monfailures, a simple command line Mon client that does
a "list failures" and outputs a short summary.  It's meant for the staff
at my shop who are too lazy to fire up their Web browsers to check
Mon.  I can add this command to their .profile so at least they
will see any failures when they log in each morning.

A typical invocation of monfailures would look like this (we hope):

  $ monfailures
  No failures found.

And if something Mon was watching had failed, you might see a list
like this:

   Hostgroup:Service       Down Since            Error Summary
   -----------------       ----------            -------------
   frog-servers:ping       Fri Jan 11 04:58:40   [acked] frog1
   msmail:smtp999          Fri Jan 11 04:58:54   msmail33
Note the [acked] flag for an acknowledged failure.

Since in my shop, the Mon username and password are hardcoded in
the script (got to make it easy for those lazy operators :-), I
put the script into inetd on a high-numbered port so the staff can
do something like "netcat monhost 12345" to see the current status
(actually, I wrote a short script to do that to further ease the
burden on the command-line challenged folks).

-- Ed


#!/usr/bin/perl -w

# Quickly show Mon failure status from command line.

# to configure, hard-code the user and password for either
# your public Mon username or a username that is only allowed
# to use the list command and nothing else.  I run this
# script out of inetd on the mon server so the people who can
# see its results can't read the script (and see the hard-coded
# password).

# Written by Ed Ravin [EMAIL PROTECTED] Wed Jan  2 12:23:44 EST 2002
# Release Version: 1.2


# $Header$

use strict;


my %opt;
use Getopt::Long;
GetOptions (\%opt, "debug", "server=s", "port=s", "user=s", "password=s");

# ---- configurable stuff ----
my $default_user= "readonly";
my $default_password= "public";
# ----------------------------


my $debug= $opt{'debug'} || 0; 

my (%failures);
my ($now);


use Mon::Client;

my $mon;

# find the client

if (!defined ($mon = Mon::Client->new)) {
    die "$0: could not create client object: $@";
}

if (defined $opt{'server'}) {
    $mon->host ($opt{'server'});
}
else {
    $mon->host ("localhost");
}

$mon->port ($opt{'port'})   if (defined $opt{'port'});
$mon->username($opt{'user'} || $default_user);
$mon->password($opt{'password'} || $default_password);

$mon->connect;
die "$0: Could not connect to server: " . $mon->error . "\n"
    unless $mon->connected;

$mon->login;
die "$0: login failure: " . $mon->error . "\n" if $mon->error;


# Load data from Mon


%failures = $mon->list_failures;
die "$0: Error doing list_failures : " . $mon->error
    if ($mon->error);

$now= time;  # time mon data was fetched


# group=thathost service=port opstatus=0 last_opstatus=0 exitval=1 timer=11
# last_success=0 last_trap=0 last_check=955058065 ack=0 ackcomment=''
# alerts_sent=0 depstatus=0 depend='' monitor='tcp.monitor -p '
# last_summary='thathost'
# last_detail='\0athathost could not connect: Connection refused\0a'
# last_failure=955058067 interval=60 first_failure=955055062
# failure_duration=3052

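my ($watch, $service, $downtime, $summary, $acked);

# The column widths in the two formats below were lost in the archive;
# the ones shown are a reconstruction sized to the sample output.
format STDOUT_TOP =

Hostgroup:Service       Down Since            Error Summary
-----------------       ----------            -------------
.

format STDOUT =
@<<<<<<<<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$watch . ":" . $service,   $downtime, $summary
.

# list out any failures
if (%failures)
{
    foreach my $w (keys %failures) {
        foreach my $s (keys %{$failures{$w}}) {
            # copy into the file-level variables that the format reads
            ($watch, $service)= ($w, $s);
            my $sref= \%{$failures{$watch}->{$service}};
            $downtime= localtime $sref->{'first_failure'};
            $acked= $sref->{'ack'} != 0;
            $summary= $sref->{'last_summary'};

            $summary= "[acked] $summary" if $acked;
            write;
        }
    }
    print "\n";
    exit(1);
}
else
{
    print "No failures found.\n";
    exit(0);
}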



Patch for minor startup nits in Mon 0.99.2

2001-11-16 Thread Ed Ravin

The patch below fixes two nits in Mon 0.99.2:

(1) NetBSD startup scripts (known as rc.d scripts) work much
better when the name of the daemon is in the command line.

(2) NetBSD, like Linux and OpenBSD and FreeBSD, defaults to
a Unix socket for syslog.  Without this patch, mon's syslog
messages go into the bit bucket on NetBSD (and probably FreeBSD).

-- Ed



--- mon 2001/10/12 03:37:32 1.2
+++ mon 2001/11/17 01:07:27
@@ -29,6 +29,9 @@
 my $AUTHOR='[EMAIL PROTECTED]';
 my $RELEASE='$ProjectVersion: mon-0-99-2.6 $';
 
+$0= "mon" . " " . join(" ", @ARGV)
+   if $^O eq "netbsd"; # NetBSD rc.d script compatibility
+
 #
 # modules in the perl distribution
 #
@@ -224,7 +227,7 @@
 }
 }
 
-($^O eq "linux" || $^O eq "openbsd") && setlogsock ('unix');
+($^O eq "linux" || $^O =~ "^(open|free|net)bsd\$") && setlogsock ('unix');
 
 openlog ("mon", "cons,pid", $CF{SYSLOG_FACILITY});