On Tue, May 16, 2006 at 02:46:54PM -0400, Ed Ravin wrote:
> I need to automate the "kick something when it falls over" stage of
> system management.  Mon is the way we detect that things have fallen
> over, but the host Mon runs on is not the host that has the privileges
> to kick things.  So here's my question:
> 
> Has anyone built a Mon client that can make decisions (or invoke scripts)
> based on the status of a particular service?  I can cook something up
> if needed, but thought it would be wise to see what other people are doing.
> I think what I need is a client that will return a non-zero exit status
> if a particular watch/service is down for N seconds and not acked.

Didn't get any responses.  I ended up updating the "monfailures" client
to give it a few new features to dump out the individual fields in
Mon's entry for the service, and to control listing based on the value
of a field.  A bit primitive, but useful for when you need simple
information that would otherwise require putting the Mon API interface
into some other script.  The new version of monfailures is attached.
I also improved the -include and -exclude features to work in more cases,
and to work for service names as well as watch names, and added a perldoc
man page.

        -- Ed

#!/usr/local/bin/perl5.6.1 -w

# Quickly show Mon failure status from command line.

# to configure, hard-code the user and password for either
# your public Mon username or a username that is only allowed
# to use the "list" command and nothing else.  I run this
# script out of inetd on the mon server so the people who can
# see its results can't read the script (and see the hard-coded
# password).

# use --exclude or --include (or set their default values in the
# script) to exclude or include only particular regexp matches of
# watches.

# other features (--fields, --testfield) for getting out more data
# or for testing for failed services via command line

# Written by Ed Ravin <[EMAIL PROTECTED]> Jan  2002.
# made available to the public by courtesy of PANIX (http://www.panix.com).
# This script is licensed under the GPL.

# Updated May 2006 with field control and other features


# $Header: /devel/build/NetBSD/mon/mon-1.1-devel/mon/clients/RCS/monfailures,v 1.8 2006/05/20 01:23:52 root Exp $

use strict;


my %opt;
use Getopt::Long;

my $usage= "Usage: monfailures [--server host] [--port port] [--user user]\n" .
        "  [--password pw] [--timeout n] [--include watch-regexp]\n" .
        "  [--exclude watch-regexp] [--fields {ALL|f1,f2,...}]\n" .
        "  [--testfield 'fieldname op value']\n";

die $usage unless
  GetOptions (\%opt, "debug",  "testfield=s", "fields=s", "server=s", "port=s", 
 "timeout=i", "user=s", "password=s", "include=s", "exclude=s");

############################  configurable stuff  - or put in defaults file
my $defaults_file= "/etc/mon/monfailures.cf";

my $default_user="public";
my $default_password= "readonly";
my $default_server= "localhost";
my $default_timeout= 120;

my $default_include= ".*";
my $default_exclude= "";
############################ 


my $debug= $opt{'debug'} || 0; 

my @fields= ();
if (exists($opt{'fields'}))
{
        @fields= split ',' , $opt{'fields'};
}

my $teststr= $opt{'testfield'} || "";
my ($testfield, $testop, $testval)= ("", "", "");
if (length($teststr))
{
        ($testfield, $testop, $testval)= split(' ', $teststr);
        die "$0: --testfield needs three words: 'fieldname op value'\n"
                unless defined($testval);
        warn "testfield=$testfield, testop=$testop, testval=$testval\n"
                if $debug;

        die "$0: illegal fieldname in --testfield option\n"
                unless $testfield =~ /^\w+$/;

        # only the four relational operators documented in the man page
        die "$0: illegal test operator in --testfield option\n"
                unless ($testop eq "==" or $testop eq "!="
                        or $testop eq ">" or $testop eq "<");

        die "$0: illegal integer value in --testfield option\n" unless
                $testval =~ /^-?\d+$/;
}

my (%failures, %disabled);
my ($now);

use Mon::Client;

# format of defaults file:
#  keyword = VALUE (no spaces allowed in VALUE)
#  leading # sign for comments
#  valid keywords: user, password, server, include, exclude, timeout

if (-f $defaults_file)
{
        if ( open(DEF, "<$defaults_file"))
        {
                my @defaults= <DEF>;
                close DEF;

                foreach $_ (@defaults)
                {
                        next if /^\s*#/;
                        next if /^$/;
                        $default_user= $1 if     /^\s*user\s*=\s*(\S+)/;
                        $default_password= $1 if /^\s*password\s*=\s*(\S+)/;
                        $default_server= $1 if   /^\s*server\s*=\s*(\S+)/;
                        $default_include= $1 if   /^\s*include\s*=\s*(\S+)/;
                        $default_exclude= $1 if   /^\s*exclude\s*=\s*(\S+)/;
                        $default_timeout= $1 if   /^\s*timeout\s*=\s*(\S+)/;
                }
        }
        else
        {
                warn "monfailures: cannot open defaults file $defaults_file: $!\n";
        }
}

my $include_filter= $opt{'include'} || $default_include;
my $exclude_filter= $opt{'exclude'} || $default_exclude;
my $timeout= $opt{'timeout'} || $default_timeout;

my $mon;

# create the client object and log in to the Mon server

    if (!defined ($mon = Mon::Client->new)) {
                die "$0: could not create client object: $@";
    }

        $mon->host ($opt{'server'} || $default_server);
        $mon->port ($opt{'port'})   if (defined $opt{'port'});
        $mon->username($opt{'user'} ||
                $ENV{'MONFAILURES_USER'} ||
                        $default_user);
        $mon->password($opt{'password'} ||
                $ENV{'MONFAILURES_PASSWORD'} ||
                        $default_password);

        alarm($timeout);        # die if we get stuck talking to Mon

        $mon->connect;
        die "$0: Could not connect to server: " . $mon->error . "\n"
                unless $mon->connected;

        $mon->login;
        die "$0: login failure: " . $mon->error . "\n" if $mon->error;


        # Load data from Mon

        %disabled= $mon->list_disabled;
        die "$0: Error doing list_disabled: " . $mon->error . "\n"
                if ($mon->error);

        %failures = $mon->list_failures;
        die "$0: Error doing list_failures: " . $mon->error . "\n"
                if ($mon->error);

        $now= time;  # time mon data was fetched


# group=thathost service=port8888 opstatus=0 last_opstatus=0 exitval=1 timer=11
# last_success=0 last_trap=0 last_check=955058065 ack=0 ackcomment=''
# alerts_sent=0 depstatus=0 depend='' monitor='tcp.monitor -p 8888'
# last_summary='thathost'
# last_detail='\0athathost could not connect: Connection refused\0a'
# last_failure=955058067 interval=60 first_failure=955055062
# failure_duration=3052

my ($watch, $service, $downtime, $summary, $acked);
format STDOUT_TOP =

Hostgroup:Service               Down Since           Error Summary
-----------------               ----------           -------------
.

format STDOUT =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<<<<<  @<<<<<<<<<<<<<<<<<<<<<<<<<
$watch . ":" . $service,   $downtime,             $summary
.

# list out any failures
my $failures_shown= 0;
if (%failures)
{
        foreach $watch (keys %failures) {
           next if exists($disabled{"watches"}{$watch});
           foreach $service (keys %{$failures{$watch}}) {
                        next if length($exclude_filter) and
                                "$watch:$service" =~ $exclude_filter;
                        next unless "$watch:$service" =~ $include_filter;
                        next if exists($disabled{"services"}{$watch}{$service});
                        my $sref= \%{$failures{$watch}->{$service}};

                        # It's on the include list, and it's down.
                        # Now we test an individual field if asked on command 
line

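                        if (length($teststr)) {
                                warn "$0: testing $teststr\n" if $debug;
                                die "$0: field $testfield does not exist, aborting.\n"
                                        unless exists($sref->{$testfield});
                                # only a numeric field value may reach the eval,
                                # so data from the Mon server is never run as Perl code
                                my $fieldval= $sref->{$testfield};
                                die "$0: field $testfield is not numeric, aborting.\n"
                                        unless defined($fieldval)
                                                and $fieldval =~ /^-?\d+(?:\.\d+)?$/;
                                next unless eval "($fieldval $testop $testval)";
                        }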

                        # print out the summary failure info for the service
                        # or print specific field-based info as per command-line args

                        if (@fields == 0) {
                                $downtime= localtime $sref->{'first_failure'};
                                $acked= $sref->{'ack'} != 0;
                                $summary= $sref->{'last_summary'};

                                $summary= "[acked] $summary" if $acked;
                                write;
                        } else {
                                print "$watch:$service: ";
                                if (@fields == 1 && $fields[0] eq "ALL") {
                                        foreach my $field (keys %{$sref}) {
                                                print "$field=$sref->{$field}\t";
                                        }
                                } else {
                                        foreach my $field (@fields) {
                                                print "$field=$sref->{$field}\t"
                                                        if exists($sref->{$field});
                                        }
                                }
                                print "\n";
                        }
                        $failures_shown= 1;
                        }
        }
        if ($failures_shown)
        {
                print "\n";
                exit(1);
        }
}
print "No failures found.\n";
exit(0);

__END__

=head1 NAME

monfailures - display failed services in Mon

=head1 SYNOPSIS

B<monfailures> [--server I<host>] [--port I<port>] [--timeout I<seconds>]
[--user I<username>] [--password I<password>]
[--include I<watch-regexp>] [--exclude I<watch-regexp>]
[--fields { ALL | f1,f2,f3 [...] } ]
[--testfield "I<fieldname op value>" ]

=head1 DESCRIPTION

B<monfailures> queries a Mon server and displays a quick summary of
failed services.  With the available options, you may display only
a subset of the services being monitored, exclude a subset of services,
display the individual fields in Mon's record for a service, and only
display a service if a particular field's value passes a numeric test.

B<monfailures> will attempt to read a configuration file from
F</etc/mon/monfailures.cf>.  Several of its options can be set there,
one per line, in the form:

    keyword = value

where I<keyword> is one of B<user>, B<password>, B<server>, B<timeout>,
B<include>, or B<exclude>, and I<value> may not contain spaces.  Blank
lines and lines that begin with a # sign are ignored.  Options specified
on the command line override any found in the configuration file.
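
To illustrate the configuration file rules above, the following standalone Perl sketch applies them to a block of text and returns the settings it finds. C<parse_defaults> and the sample file contents are hypothetical. The script itself does the same job with one regular expression per keyword in its defaults-file loop.

```perl
use strict;
use warnings;

# Keywords accepted in monfailures.cf, per the man page.
my @KEYWORDS = qw(user password server timeout include exclude);

# Hypothetical helper (not part of monfailures): apply the documented
# "keyword = value" rules to a block of text and return the settings found.
sub parse_defaults {
    my ($text) = @_;
    my %settings;
    foreach my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/;     # comment line
        next if $line =~ /^\s*$/;     # blank line
        foreach my $key (@KEYWORDS) {
            # (\S+) stops at the first whitespace, so a value cannot contain
            # spaces; a later line for the same keyword replaces an earlier one
            $settings{$key} = $1 if $line =~ /^\s*\Q$key\E\s*=\s*(\S+)/;
        }
    }
    return %settings;
}

# Made-up sample file contents
my %cf = parse_defaults(<<'EOF');
# read-only account for monfailures
user = public
password = readonly

timeout = 30
EOF

print "$_=$cf{$_}\n" foreach sort keys %cf;
```

Looping over a fixed keyword list ties each keyword to its own regular expression, so one keyword cannot be stored under another by mistake.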

=head1 OPTIONS

=head2 Connecting to the Mon server

=over 4

=item B<--user> I<username>

=item B<--password> I<password>

Specify the username and/or password used to connect to the Mon server.
The defaults (public/readonly) are hard-coded in the script and may be
changed in the configuration file.  The B<MONFAILURES_USER> and
B<MONFAILURES_PASSWORD> environment variables, if set, take precedence
over those defaults.  The username used by B<monfailures> needs
permission only for the "list" command in Mon.

=item B<--server> I<hostname>

=item B<--port> I<port-number>

Specify the hostname and port number of the Mon server.  The defaults
are "localhost" and 2583.

=item B<--timeout> I<seconds>

Abort if the transaction with the Mon server takes longer than the
specified number of seconds.  The default value is 120.

=back
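
The script chooses the connection username and password with a chain of C<||> operators. It takes the command-line option first, then the C<MONFAILURES_USER> or C<MONFAILURES_PASSWORD> environment variable, then the default (which the configuration file may have replaced). The sketch below isolates that rule. C<pick_setting> and the C<opsbot> user are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the script's precedence:
# command line, then environment variable, then default.
sub pick_setting {
    my ($cmdline, $env_name, $default) = @_;
    return $cmdline || $ENV{$env_name} || $default;
}

$ENV{MONFAILURES_USER} = 'opsbot';   # pretend the caller exported this
delete $ENV{MONFAILURES_PASSWORD};   # and did not set a password

my $user     = pick_setting(undef,   'MONFAILURES_USER',     'public');
my $password = pick_setting(undef,   'MONFAILURES_PASSWORD', 'readonly');
my $override = pick_setting('admin', 'MONFAILURES_USER',     'public');

print "user=$user password=$password override=$override\n";
```

C<||> tests for truth rather than definedness. As a result, an empty string or C<0> given on the command line or in the environment falls through to the next source.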


=head2 Filter options

=over 4

=item B<--include> I<watch-regexp>

Only list failed services whose names match the Perl regular expression
I<watch-regexp>.  The watch and service names are joined with a colon,
the same way they are referenced in dependency clauses in the Mon
configuration file, and the combined I<watch>:I<service> string is compared
against I<watch-regexp>.  If it matches, the failed service is considered
for display by B<monfailures>; otherwise it is skipped.  The built-in
default, C<.*>, matches every service.

=item B<--exclude> I<watch-regexp>

Do not list failed services whose combined I<watch>:I<service> names
match the Perl regular expression I<watch-regexp>.  This option overrides
any matches from the B<--include> option.

=back
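
The include/exclude rule can be sketched as a small standalone Perl function. C<is_listed> is a hypothetical name, and the watch and service names are examples only. It follows the script's order of checks: it tests a non-empty exclude pattern first, then the include pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a failed service would be listed,
# given include and exclude regexps applied to "watch:service".
sub is_listed {
    my ($watch, $service, $include, $exclude) = @_;
    my $name = "$watch:$service";
    return 0 if length($exclude) && $name =~ /$exclude/;   # exclude wins
    return $name =~ /$include/ ? 1 : 0;
}

my @candidates = (
    [ 'mailservers', 'smtp' ],
    [ 'mailservers', 'smtp-backup' ],
    [ 'webservers',  'http' ],
);

foreach my $c (@candidates) {
    my ($w, $s) = @$c;
    printf "%-25s %s\n", "$w:$s",
        is_listed($w, $s, '^mailservers:', 'backup') ? 'listed' : 'skipped';
}
```

Both patterns are unanchored, so C<smtp> also matches C<smtp-backup>. Add C<^> and C<$> to match a single service exactly.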

=head2 Field options

=over 4

=item B<--fields> { ALL | I<f1>,I<f2>,I<f3>[, ...] }

Instead of displaying a quick summary of each failed service, display all
the raw fields Mon uses to track the service (B<--fields ALL>), or only
the raw fields named in a comma-separated list.  Each service is printed
on one line as I<watch>:I<service>: followed by tab-separated
I<field>=I<value> pairs; named fields that do not exist are skipped.

=item B<--testfield> "I<fieldname operator value>"

Only display the failed service if field I<fieldname> passes the numeric
test specified.  I<operator> must be one of the four relational operators:
C<==> (test for equality), C<!=> (test for inequality),
C<< < >> (test if the field is less than I<value>), or
C<< > >> (test if the field is greater than I<value>).
I<value> must be an integer, and the three words must be separated by
spaces and passed as a single quoted argument.  If the expression
evaluates to true, B<monfailures> displays the service.  This feature is
intended to allow scripts calling B<monfailures> to make decisions (for
example, to reboot a server after a service has been down for longer than
N seconds).

=back
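
The script checks B<--testfield> by building a string and passing it to C<eval>. The sketch below shows an alternative under the same rules that never evaluates text as code. It looks the operator up in a table of code references and requires both the test value and the field value to be integers. C<passes_test> and C<%compare> are hypothetical names. The sample record uses field names and values from the Mon record shown in the script's comments.

```perl
use strict;
use warnings;

# The four documented relational operators, as code references.
my %compare = (
    '==' => sub { $_[0] == $_[1] },
    '!=' => sub { $_[0] != $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
);

# Hypothetical helper: return 1 if the record passes "fieldname op value",
# 0 if not; die on a malformed test or a missing/non-numeric field.
sub passes_test {
    my ($record, $test) = @_;
    my ($field, $op, $value) = split ' ', $test;
    die "malformed test '$test'\n"
        unless defined $value && $field =~ /^\w+$/
            && exists $compare{$op} && $value =~ /^-?\d+$/;
    my $fv = $record->{$field};
    die "field $field missing or not numeric\n"
        unless defined $fv && $fv =~ /^-?\d+$/;
    return $compare{$op}->($fv, $value) ? 1 : 0;
}

# Values taken from the sample Mon record in the script's comments.
my %rec = (exitval => 1, ack => 0, failure_duration => 3052,
           monitor => 'tcp.monitor -p 8888');

print passes_test(\%rec, 'failure_duration > 300'), "\n";
print passes_test(\%rec, 'ack != 0'), "\n";
```

A dispatch table keeps the permitted operators in one place. A non-numeric field such as C<monitor> is rejected instead of being spliced into Perl code.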

=head1 RETURN VALUE

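B<monfailures> exits with status 0 if no failures were found that matched
the requested criteria (and passed any specified test), or 1 if any
failures were displayed.  If it cannot complete its conversation with the
Mon server (for example, the connection, login, or a query fails, or the
B<--timeout> expires), it aborts with a nonzero status, so a calling
script should not treat every nonzero status as a confirmed failure.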
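
A script that uses B<monfailures> to decide whether to act can inspect Perl's C<$?> after C<system()>. The sketch below separates the documented statuses (0 and 1) from everything else, including death by a signal such as the SIGALRM used for B<--timeout>. C<classify_status> and C<kick_webserver> are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: map a raw $? value from system() to
# 'ok' (exit 0), 'down' (exit 1), or 'error' (anything else).
sub classify_status {
    my ($wait_status) = @_;
    return 'error' if $wait_status == -1;    # command could not be started
    return 'error' if $wait_status & 127;    # killed by a signal
    my $code = $wait_status >> 8;
    return 'ok'   if $code == 0;
    return 'down' if $code == 1;
    return 'error';
}

# Typical use (not run here):
#   system('monfailures', '--include', '^webservers:http$',
#          '--testfield', 'failure_duration > 600');
#   kick_webserver() if classify_status($?) eq 'down';   # hypothetical

# 1 << 8 is a normal exit with status 1; 14 is SIGALRM on most Unix systems.
print classify_status(0), " ", classify_status(1 << 8), " ",
      classify_status(14), "\n";
```

Perl's C<die> exits with the current C<$!> value when it is nonzero. An error exit of exactly 1 is therefore possible in principle, though unusual.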

=head1 EXAMPLES

=over 4

=item B<monfailures --include ^mailservers:smtp --testfield "exitval == 1">

Check Mon for failed services with the watch name "mailservers" and a
service name beginning with "smtp", and report on the failure if the Mon
field "exitval" has a value of 1.

=item B<monfailures --fields ALL>

Report on all failed services in raw format; this shows the field names
available for the B<--fields> and B<--testfield> options.

=back
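
Only one B<--testfield> can be given per run. To check a condition such as "down for more than N seconds and not acknowledged", a script can request both fields with B<--fields failure_duration,ack> and parse the output. That output has one line per service: I<watch>:I<service>: followed by tab-separated I<field>=I<value> pairs. The sketch below parses made-up sample lines in that format. C<parse_fields_line> and the 300-second threshold are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split one "watch:service: f=v<TAB>f=v<TAB>" line
# into the service name and a hash of fields; empty list if it doesn't parse.
sub parse_fields_line {
    my ($line) = @_;
    chomp $line;
    my ($name, $rest) = $line =~ /^(\S+):\s(.*)$/ or return;
    my %f;
    foreach my $pair (split /\t/, $rest) {
        my ($k, $v) = split /=/, $pair, 2;
        $f{$k} = $v if defined $v;
    }
    return ($name, \%f);
}

my $max_down = 300;    # hypothetical threshold, in seconds

# Made-up output of: monfailures --fields failure_duration,ack
my @output = (
    "mailservers:smtp: failure_duration=3052\tack=0\t\n",
    "mailservers:pop3: failure_duration=120\tack=0\t\n",
    "webservers:http: failure_duration=900\tack=1\t\n",
);

my @to_kick;
foreach my $line (@output) {
    my ($name, $f) = parse_fields_line($line) or next;
    push @to_kick, $name
        if $f->{failure_duration} > $max_down && $f->{ack} == 0;
}
print "kick: @to_kick\n";
```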


=head1 NOTES

Only one instance of any option may be specified.

Disabled services are not listed.

=head1 AUTHOR

B<monfailures> was written by Ed Ravin <[EMAIL PROTECTED]>, and has been
made available to the public under the GNU General Public License courtesy
of PANIX (http://panix.com).

_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
