David, off-list reply.

Have a look at this script.  I use it on my border router to monitor the
health of my clients' sites.  You might get some clues from it and adapt it
to your needs.  Run swatch over the /var/log/messages file, or use the
script's output directly to trigger your qpage server.  It runs from root's
crontab every 6 minutes.
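
As a rough sketch of the cron and log-watching side: the install path
/usr/local/sbin/linkcheck and the helper name alarm_events are made up for
illustration, and the awk patterns assume the usual syslog line layout that
logger produces, with the host name as the last field.

```shell
#!/bin/sh
# Assumed root crontab entry (Vixie cron step syntax); the path is hypothetical:
#   */6 * * * * /usr/local/sbin/linkcheck

# Hypothetical helper: read syslog lines on stdin and print one
# "DOWN host" or "UP host" line per alarm message the script logged.
alarm_events() {
  awk '
    /LINK DOWN ALARM/ { print "DOWN", $NF }
    /CANCEL ALARM/    { print "UP", $NF }
  '
}
```

Running "alarm_events < /var/log/messages" would list every alarm and
cancel in the log; each line could then be handed to whatever does your
paging.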

Please do not publish it.

-- 
Howard.
______________________________________________________
LANNet Computing Associates <http://www.lannet.com.au>

#!/bin/sh

LINKS=" gw.alb2.Albury.telstra.net"
LINKS=${LINKS}" janus.lannet.com.au"
LINKS=${LINKS}" keep.lannet.com.au"
LINKS=${LINKS}" hero.lannet.com.au"
LINKS=${LINKS}" scribe.lannet.com.au"
# LINKS=${LINKS}" guru.lannet.com.au"
LINKS=${LINKS}" acay.aircentre.com"
LINKS=${LINKS}" ac.aircentre.com"
LINKS=${LINKS}" acbth.aircentre.com"
# LINKS=${LINKS}" saxon.af.com.au"
# LINKS=${LINKS}" murray.af.com.au"
LINKS=${LINKS}" scout.auf.asn.au"
LINKS=${LINKS}" stratos.auf.asn.au"
LINKS=${LINKS}" skydart.auf.asn.au"
# LINKS=${LINKS}" gw.skills.asn.au"
# LINKS=${LINKS}" rsi.skills.asn.au"
# LINKS=${LINKS}" sit.skills.asn.au"
LINKS=${LINKS}" cwsvr.caterworld.com.au"
LINKS=${LINKS}" cway.caterworld.com.au"
LINKS=${LINKS}" cwbdg.caterworld.com.au"
LINKS=${LINKS}" cwglg.caterworld.com.au"
LINKS=${LINKS}" cwml.caterworld.com.au"
LINKS=${LINKS}" cwsht.caterworld.com.au"
LINKS=${LINKS}" cwwg.caterworld.com.au"
# LINKS=${LINKS}" atelal.atel.com.au"
# LINKS=${LINKS}" atelwd.atel.com.au"
# LINKS=${LINKS}" atelwg.atel.com.au"
# LINKS=${LINKS}" atelwn.atel.com.au"

##### Add additional sites above here (note leading space in string) ######
###########################################################################
##### Nothing to change below here ########################################

# Mods register
# 25 Feb 00 - change crontab cycle from 5 to 6 minutes
# 25 Feb 00 - change log count before blowing whistle from 3 to 2
# 25 Feb 00 - the link down alarm response time is now reduced
#             from >10<15 mins to >6<12 mins
#

PATH=/bin:/usr/bin:/sbin:/usr/sbin

docheck() {
  # uses $i from the calling loop; try twice more before we log it
  ping -qnc1 $i >/dev/null
  PINGERROR=$?
  if [ ${PINGERROR}x != 0x ]; then
    # first retry
    ping -qnc1 $i >/dev/null
    PINGERROR=$?
    if [ ${PINGERROR}x != 0x ]; then
      # time to take some action like logging it
      date -u +"%Y-%m-%d %H:%M:%S %Z" >> /tmp/$i
      if [ `wc -l /tmp/$i | awk '{print $1}'`x = 2x ]; then
        # this is the second logging (6 failures) so blow the whistle
        MSG="LINK DOWN ALARM $i"
        logger -i -t `basename $0` ${MSG}
        mail -s "** ${MSG}" [EMAIL PROTECTED] </dev/null >/dev/null
        sendmail -q
      fi
    fi
  fi
}

for i in ${LINKS}; do
  ping -qnc1 $i >/dev/null
  PINGERROR=$?
  if [ ${PINGERROR}x = 0x ]; then
    # the machine is talking to us
    if [ -e /tmp/$i ]; then
      # we had a problem but all is now well
      if [ `wc -l /tmp/$i | awk '{print $1}'`x = 2x ]; then
        # if we sent an alarm we must cancel the alarm
        MSG="CANCEL ALARM $i"
        logger -i -t `basename $0` ${MSG}
        mail -s "** ${MSG}" [EMAIL PROTECTED] </dev/null >/dev/null
        sendmail -q
      fi
      # get rid of the logging file
      rm -f /tmp/$i
    fi
  else
    if [ -e /tmp/$i ]; then
      # we have already had some failures
      if [ `wc -l /tmp/$i | awk '{print $1}'`x != 2x ]; then
        # but not yet enough to blow the whistle
        docheck
      fi
    else
      docheck
    fi
  fi
done
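
The response times quoted in the mods register follow from the cron
interval and the log count: the Nth log entry lands somewhere between
(N-1) and N intervals after the link actually drops.  A small sketch of
that arithmetic (alarm_window is a made-up name, not part of the script
above):

```shell
#!/bin/sh
# Hypothetical helper: print the earliest and latest delay, in minutes,
# between a link dropping and the alarm being raised.
#   $1 = cron interval in minutes
#   $2 = number of log entries needed before the alarm fires
alarm_window() {
  interval=$1
  count=$2
  echo "$(( interval * (count - 1) )) $(( interval * count ))"
}

alarm_window 5 3   # old settings: prints "10 15"
alarm_window 6 2   # current settings: prints "6 12"
```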



--
SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/
More Info: http://slug.org.au/lists/listinfo/slug
