On Tue, May 16, 2006 at 02:46:54PM -0400, Ed Ravin wrote:
> I need to automate the "kick something when it falls over" stage of
> system management. Mon is the way we detect that things have fallen
> over, but the host Mon runs on is not the host that has the privileges
> to kick things. So here's my question:
>
> Has anyone built a Mon client that can make decisions (or invoke scripts)
> based on the status of a particular service? I can cook something up
> if needed, but thought it would be wise to see what other people are doing.
> I think what I need is a client that will return a non-zero exit status
> if a particular watch/service is down for N seconds and not acked.
Didn't get any responses. I ended up updating the "monfailures" client, giving it a few new features: it can dump out the individual fields in Mon's entry for a service, and it can control listing based on the value of a field. A bit primitive, but useful when you need simple information that would otherwise require putting the Mon API into some other script.

The new version of monfailures is attached. I also improved the -include and -exclude features to work in more cases and to work for service names as well as watch names, and added a perldoc man page.

--
Ed
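Because the script accepts only one --testfield, a single run cannot check both conditions from the original question (down for N seconds and not acked). A minimal sketch, assuming monfailures is on $PATH and relying on the --fields output format of the attached script ("watch:service: " followed by tab-separated name=value pairs), fetches the ack and failure_duration fields and makes the decision in the calling script. The sub names and the "kick" policy are hypothetical, not part of Mon.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one line of "monfailures --fields ..." output, which the script
# prints as "watch:service: " followed by tab-separated name=value pairs.
# Returns (watch, service, \%fields), or an empty list for any other line
# (blank lines, "No failures found.").
sub parse_fields_line {
    my ($line) = @_;
    chomp $line;
    my ($watch, $service, $rest) = $line =~ /^([^:\s]+):([^:\s]+): (.*)$/
        or return;
    my %fields;
    for my $pair (split /\t/, $rest) {
        my ($name, $value) = $pair =~ /^(\w+)=(.*)$/ or next;
        $fields{$name} = $value;
    }
    return ($watch, $service, \%fields);
}

# Hypothetical policy: kick a service only if it is not acked (ack == 0)
# and has been down at least $min_seconds (Mon's failure_duration field).
# A missing field means "don't kick".
sub should_kick {
    my ($fields, $min_seconds) = @_;
    return 0 unless defined $fields->{ack} && defined $fields->{failure_duration};
    return ($fields->{ack} == 0 && $fields->{failure_duration} >= $min_seconds) ? 1 : 0;
}

# Run monfailures (assumed to be on $PATH) for the services matching
# $pattern and return the "watch:service" names that should be kicked.
sub services_to_kick {
    my ($pattern, $min_seconds) = @_;
    my @cmd = ('monfailures', '--include', $pattern,
               '--fields', 'ack,failure_duration');
    open(my $fh, '-|', @cmd) or die "cannot run monfailures: $!\n";
    my @kick;
    while (my $line = <$fh>) {
        my ($watch, $service, $fields) = parse_fields_line($line) or next;
        push @kick, "$watch:$service" if should_kick($fields, $min_seconds);
    }
    close $fh;
    # monfailures exits 0 (nothing listed) or 1 (failures listed); any other
    # status, or death by a signal such as its own alarm() timeout, is an error.
    die "monfailures failed (wait status $?)\n" unless $? == 0 || $? == 256;
    return @kick;
}
```

For example, a cron job on the privileged host could pass each name returned by services_to_kick('^mailservers:smtp$', 600) to a local restart command; that step is left out because it is site-specific. Note that an exit status of 1 from a die inside monfailures cannot be told apart from "failures listed", which is why the sketch decides from the parsed output rather than from the status alone.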
#!/usr/local/bin/perl5.6.1 -w

# Quickly show Mon failure status from command line.

# to configure, hard-code the user and password for either
# your public Mon username or a username that is only allowed
# to use the "list" command and nothing else. I run this
# script out of inetd on the mon server so the people who can
# see its results can't read the script (and see the hard-coded
# password).

# use --exclude or --include (or set their default values in the
# script) to exclude or include only particular regexp matches of
# watches.

# other features (--fields, --testfield) for getting out more data
# or for testing for failed services via command line

# Written by Ed Ravin <[EMAIL PROTECTED]> Jan 2002.
# made available to the public by courtesy of PANIX (http://www.panix.com).
# This script is licensed under the GPL.
# Updated May 2006 with field control and other features

# $Header: /devel/build/NetBSD/mon/mon-1.1-devel/mon/clients/RCS/monfailures,v 1.8 2006/05/20 01:23:52 root Exp $

use strict;

my %opt;
use Getopt::Long;
my $usage="Usage: monfailures [--server host] [--port port] [--user user] [--password pw] [--timeout n] [--include watch-regexp] [--exclude watch-regexp] [--fields {ALL|f1,f2,...}] [--testfield 'fieldname op value']\n";
die $usage unless GetOptions (\%opt, "debug", "testfield=s", "fields=s",
    "server=s", "port=s", "timeout=i", "user=s", "password=s",
    "include=s", "exclude=s");

############################ configurable stuff - or put in defaults file
my $defaults_file= "/etc/mon/monfailures.cf";
my $default_user="public";
my $default_password= "readonly";
my $default_server= "localhost";
my $default_timeout= 120;
my $default_include= ".*";
my $default_exclude= "";
############################

my $debug= $opt{'debug'} || 0;

my @fields= ();
if (exists($opt{'fields'})) {
    @fields= split ',' , $opt{'fields'};
}

my $teststr= $opt{'testfield'} || "";
my ($testfield, $testop, $testval)= ("", "", "");
if (length($teststr)) {
    ($testfield, $testop, $testval)= split(' ', $teststr);
    warn "testfield=$testfield, testop=$testop, testval=$testval\n" if $debug;
    die "$0: illegal characters in --testfield option\n" if $testfield =~ /[`'"$ ]/;
    die "$0: illegal fieldname in --testfield option\n" unless $testfield =~ /^\w+$/;
    die "$0: illegal test operator in --testfield option\n"
        unless ($testop eq "+" or $testop eq "-" or $testop eq "==" or
                $testop eq "!=" or $testop eq ">" or $testop eq "<");
    die "$0: illegal integer value in --testfield option\n" unless $testval =~ /^-?\d+$/;
}

my (%failures, %disabled);
my ($now);

use Mon::Client;

# format of defaults file:
# keyword = VALUE (no spaces allowed in VALUE)
# leading # sign for comments
# valid keywords: user, password, server, include, exclude, timeout

if (-f $defaults_file) {
    if ( open(DEF, "<$defaults_file")) {
        my @defaults= <DEF>;
        close DEF;
        foreach $_ (@defaults) {
            next if /^\s*#/;
            next if /^$/;
            $default_user= $1 if /^\s*user\s*=\s*(\S+)/;
            $default_password= $1 if /^\s*password\s*=\s*(\S+)/;
            $default_server= $1 if /^\s*server\s*=\s*(\S+)/;
            $default_include= $1 if /^\s*include\s*=\s*(\S+)/;
            $default_exclude= $1 if /^\s*exclude\s*=\s*(\S+)/;
            $default_timeout= $1 if /^\s*timeout\s*=\s*(\S+)/;
        }
    } else {
        warn "monfailures: cannot open defaults file $defaults_file: $!\n";
    }
}

my $include_filter= $opt{'include'} || $default_include;
my $exclude_filter= $opt{'exclude'} || $default_exclude;
my $timeout= $opt{'timeout'} || $default_timeout;

my $mon;

# find the client
if (!defined ($mon = Mon::Client->new)) {
    die "$0: could not create client object: $@";
}

$mon->host ($opt{'server'} || $default_server);
$mon->port ($opt{'port'}) if (defined $opt{'port'});
$mon->username($opt{'user'} || $ENV{'MONFAILURES_USER'} || $default_user);
$mon->password($opt{'password'} || $ENV{'MONFAILURES_PASSWORD'} || $default_password);

alarm($timeout);   # die if we get stuck talking to Mon

$mon->connect;
die "$0: Could not connect to server: " . $mon->error . "\n" unless $mon->connected;

$mon->login;
die "$0: login failure: " . $mon->error . "\n" if $mon->error;

# Load data from Mon
%disabled= $mon->list_disabled;
die "$0: Error doing list_disabled : " . $mon->error if ($mon->error);
%failures = $mon->list_failures;
die "$0: Error doing list_failures : " . $mon->error if ($mon->error);
$now= time;   # time mon data was fetched

# group=thathost service=port8888 opstatus=0 last_opstatus=0 exitval=1 timer=11
# last_success=0 last_trap=0 last_check=955058065 ack=0 ackcomment=''
# alerts_sent=0 depstatus=0 depend='' monitor='tcp.monitor -p 8888'
# last_summary='thathost'
# last_detail='\0athathost could not connect: Connection refused\0a'
# last_failure=955058067 interval=60 first_failure=955055062
# failure_duration=3052

my ($watch, $service, $downtime, $summary, $acked);

format STDOUT_TOP =
Hostgroup:Service               Down Since          Error Summary
-----------------               ----------          -------------
.

format STDOUT =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<<<<<<<<<<<<
$watch . ":" . $service, $downtime, $summary
.

# list out any failures
my $failures_shown= 0;
if (%failures) {
    foreach $watch (keys %failures) {
        next if exists($disabled{"watches"}{$watch});
        foreach $service (keys %{$failures{$watch}}) {
            next if length($exclude_filter) and "$watch:$service" =~ $exclude_filter;
            next unless "$watch:$service" =~ $include_filter;
            next if exists($disabled{"services"}{$watch}{$service});
            my $sref= \%{$failures{$watch}->{$service}};
            # It's on the include list, and it's down.
            # Now we test an individual field if asked on command line
            if (length($teststr)) {
                warn "$0: testing $teststr\n" if $debug;
                die "$0: field $testfield does not exist, aborting.\n"
                    unless exists($sref->{$testfield});
                # only integer values are compared; this keeps Mon data
                # out of the string passed to eval
                next unless $sref->{$testfield} =~ /^-?\d+$/;
                next unless eval "($sref->{$testfield} $testop $testval)";
            }
            # print out the summary failure info for the service
            # or print specific field-based info as per command-line args
            if (@fields == 0) {
                $downtime= localtime $sref->{'first_failure'};
                $acked= $sref->{'ack'} !=0;
                $summary= $sref->{'last_summary'};
                $summary= "[acked] $summary" if $acked;
                write;
            } else {
                print "$watch:$service: ";
                if (@fields == 1 && $fields[0] eq "ALL") {
                    foreach my $field (keys %{$sref}) {
                        print "$field=$sref->{$field}\t";
                    }
                } else {
                    foreach my $field (@fields) {
                        print "$field=$sref->{$field}\t" if exists($sref->{$field});
                    }
                }
                print "\n";
            }
            $failures_shown= 1;
        }
    }
    if ($failures_shown) {
        print "\n";
        exit(1);
    }
}

print "No failures found.\n";
exit(0);

__END__

=head1 NAME

monfailures - display failed services in Mon

=head1 SYNOPSIS

B<monfailures> [--server I<host>] [--port I<port>] [--timeout I<seconds>]
[--user I<username>] [--password I<password>]
[--include I<watch-regexp>] [--exclude I<watch-regexp>]
[--fields { ALL | f1,f2,f3 [...] } ]
[--testfield "I<fieldname op value>" ]

=head1 DESCRIPTION

B<monfailures> queries a Mon server and displays a quick summary of failed
services. With the available options, you may display only a subset of the
services being monitored, exclude a subset of services, display the
individual fields in Mon's record for a service, and only display a service
if a particular field's value passes a numeric test.

B<monfailures> will attempt to read a configuration file from
/etc/mon/monfailures.cf. Several of its options can be set there, one per
line, in the form:

    keyword = value

where B<keyword> is one of B<user>, B<password>, B<server>, B<timeout>,
B<include>, or B<exclude>, and I<value> may not contain spaces. Blank lines
and lines that begin with a # sign are ignored. Options specified on the
command line override any found in the configuration file.

=head1 OPTIONS

=head2 Connecting to the Mon server

=over 4

=item B<--user I<username>>

=item B<--password I<password>>

Specify the username and/or password to connect to the Mon server. If they
are not given on the command line, the MONFAILURES_USER and
MONFAILURES_PASSWORD environment variables are used, and after those the
defaults (public/readonly, hard-coded in the script or set in the
configuration file). The username used by B<monfailures> needs permission
only for the "list" command in Mon.

=item B<--server I<hostname>>

=item B<--port I<port-number>>

Specify the hostname and port number of the Mon server. The defaults are
"localhost" and 2583.

=item B<--timeout I<seconds>>

Abort if the transaction with the Mon server takes longer than the
specified number of seconds. The default value is 120.

=back

=head2 Filter options

=over 4

=item B<--include I<watch-regexp>>

Only list failed services whose names match the Perl regular expression
I<watch-regexp>. The watch name and service name are joined with a colon,
as they are in dependency clauses in the Mon configuration file, and the
combined I<watch>:I<service> string is matched against I<watch-regexp>. If
the failed service matches, it is considered for display by
B<monfailures>; otherwise it is skipped. The default is C<.*>, which
matches every service.

=item B<--exclude I<watch-regexp>>

Do not list failed services whose I<watch>:I<service> names match the Perl
regular expression I<watch-regexp>. This option overrides any matches from
the B<--include> option.

=back

=head2 Field options

=over 4

=item B<--fields> { ALL | I<f1>,I<f2>,I<f3>[, ...] }

Instead of displaying a quick summary of the failed service, display all
the raw fields used by Mon to track the service (the B<--fields ALL>
option) or just the raw fields specified in a comma-separated list. Each
service is printed on one line as I<watch>:I<service>: followed by
tab-separated I<field>=I<value> pairs; requested fields that are not in
Mon's record are omitted.

=item B<--testfield> "I<fieldname operator value>"

Only display the failed service if field I<fieldname> passes the numeric
test specified. The three parts must be separated by spaces, so quote the
whole argument. I<operator> must be one of the four relational operators:
== (test for equality), != (test for inequality), < (test if the field is
less than I<value>), or > (test if the field is greater than I<value>).
I<value> must be an integer. If the expression evaluates to true,
B<monfailures> displays the service. If a failed service being tested has
no field named I<fieldname>, B<monfailures> aborts.

This feature is intended to let scripts that call B<monfailures> make
decisions (for example, to reboot a server after a service has been down
for longer than N seconds).

=back

=head1 RETURN VALUE

B<monfailures> returns zero if no failures were found that matched the
requested criteria (and passed any specified test), or 1 if any failures
were displayed. Errors, such as a failure to connect or log in, also
produce a nonzero exit status, so a calling script should not treat every
nonzero status as a failed service.

=head1 EXAMPLES

=over 4

=item B<monfailures --include ^mailservers:smtp --testfield "exitval == 1">

Check Mon for failed services with the watch name "mailservers" and a
service name beginning with "smtp", and report on the failure if the Mon
field "exitval" has a value of 1.

=item B<monfailures --fields ALL>

Report on all failed services in raw format. This lets you see the field
names to choose for the B<--fields> or B<--testfield> options.

=back

=head1 NOTES

Only one instance of any option may be specified; if an option is
repeated, only its last value is used. Disabled watches and services are
not listed.

=head1 AUTHOR

B<monfailures> was written by Ed Ravin <[EMAIL PROTECTED]>, and has been
made available to the public under the GNU General Public License courtesy
of PANIX (http://panix.com).
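The selection rules documented above (the include and exclude regexps applied to the watch:service string, then an optional numeric field test) can be sketched without building a string for eval, by using a dispatch table of the four documented operators. This is an illustrative alternative, not the script's code: the script as posted also accepts + and - in --testfield and aborts when the tested field is missing, whereas this sketch rejects those operators and simply skips such a service. The sub names parse_testfield and selected are hypothetical.

```perl
use strict;
use warnings;

# The four relational operators documented for --testfield, mapped to
# code refs so the test never builds Perl source from Mon's data.
my %TEST_OPS = (
    '==' => sub { $_[0] == $_[1] },
    '!=' => sub { $_[0] != $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
);

# Parse a "fieldname op value" string (parts separated by whitespace),
# applying the checks the documentation describes.
sub parse_testfield {
    my ($spec) = @_;
    my ($field, $op, $value) = split ' ', $spec;
    die "illegal fieldname\n"     unless defined $field && $field =~ /^\w+$/;
    die "illegal test operator\n" unless defined $op && exists $TEST_OPS{$op};
    die "illegal integer value\n" unless defined $value && $value =~ /^-?\d+$/;
    return { field => $field, op => $op, value => $value };
}

# Decide whether one failed service would be listed:
#  - "watch:service" must not match $exclude (empty means exclude nothing),
#  - it must match $include,
#  - if a parsed test is given, the field must hold an integer that passes it.
sub selected {
    my ($watch, $service, $record, $include, $exclude, $test) = @_;
    my $name = "$watch:$service";
    return 0 if length($exclude) && $name =~ /$exclude/;
    return 0 unless $name =~ /$include/;
    return 1 unless defined $test;
    my $actual = $record->{ $test->{field} };
    return 0 unless defined $actual && $actual =~ /^-?\d+$/;
    return $TEST_OPS{ $test->{op} }->($actual, $test->{value}) ? 1 : 0;
}
```

With a dispatch table, Mon-supplied field values are passed to the comparison as data and are never compiled as Perl code.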
_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon