David, off-list reply.
Have a look at this script. I use it on my border router to monitor the
health of my clients' sites. You might get some clues from it that you can
adapt to your needs. Use swatch on the /var/log/messages file, or use the
output directly to trigger your qpage server. It runs from root's crontab
every 6 minutes.
Please do not publish it.
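
For the cron side, something along these lines should give the 6-minute
cycle. This is only a sketch: the install path /usr/local/sbin/linkcheck is
an assumption, so substitute wherever you keep the script. It just prints
the crontab line rather than installing it.

```shell
#!/bin/sh
# Hypothetical install path; substitute wherever the script actually lives.
SCRIPT=/usr/local/sbin/linkcheck

# Crontab line for a 6-minute cycle (fires at minutes 0, 6, 12, ... 54).
CRON_LINE="*/6 * * * * ${SCRIPT}"

# Quoted so the shell does not expand the asterisks.
echo "${CRON_LINE}"
```

Paste the printed line into root's crontab with `crontab -e`.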
--
Howard.
______________________________________________________
LANNet Computing Associates <http://www.lannet.com.au>
#!/bin/sh
LINKS=" gw.alb2.Albury.telstra.net"
LINKS=${LINKS}" janus.lannet.com.au"
LINKS=${LINKS}" keep.lannet.com.au"
LINKS=${LINKS}" hero.lannet.com.au"
LINKS=${LINKS}" scribe.lannet.com.au"
# LINKS=${LINKS}" guru.lannet.com.au"
LINKS=${LINKS}" acay.aircentre.com"
LINKS=${LINKS}" ac.aircentre.com"
LINKS=${LINKS}" acbth.aircentre.com"
# LINKS=${LINKS}" saxon.af.com.au"
# LINKS=${LINKS}" murray.af.com.au"
LINKS=${LINKS}" scout.auf.asn.au"
LINKS=${LINKS}" stratos.auf.asn.au"
LINKS=${LINKS}" skydart.auf.asn.au"
# LINKS=${LINKS}" gw.skills.asn.au"
# LINKS=${LINKS}" rsi.skills.asn.au"
# LINKS=${LINKS}" sit.skills.asn.au"
LINKS=${LINKS}" cwsvr.caterworld.com.au"
LINKS=${LINKS}" cway.caterworld.com.au"
LINKS=${LINKS}" cwbdg.caterworld.com.au"
LINKS=${LINKS}" cwglg.caterworld.com.au"
LINKS=${LINKS}" cwml.caterworld.com.au"
LINKS=${LINKS}" cwsht.caterworld.com.au"
LINKS=${LINKS}" cwwg.caterworld.com.au"
# LINKS=${LINKS}" atelal.atel.com.au"
# LINKS=${LINKS}" atelwd.atel.com.au"
# LINKS=${LINKS}" atelwg.atel.com.au"
# LINKS=${LINKS}" atelwn.atel.com.au"
##### Add additional sites above here (note leading space in string) ######
###########################################################################
##### Nothing to change below here ########################################
# Mods register
# 25 Feb 00 - change crontab cycle from 5 to 6 minutes
# 25 Feb 00 - change log count before blowing whistle from 3 to 2
# 25 Feb 00 - the link down alarm response time is now reduced
# from >10<15 mins to >6<12 mins
#
PATH=/bin:/usr/bin:/sbin:/usr/sbin
docheck() {
    # the main loop's ping has already failed; try twice more before we log it
    # first retry
    ping -qnc1 "$i" >/dev/null
    PINGERROR=$?
    if [ ${PINGERROR}x != 0x ]; then
        # second retry
        ping -qnc1 "$i" >/dev/null
        PINGERROR=$?
        if [ ${PINGERROR}x != 0x ]; then
            # time to take some action like logging it
            date -u +"%Y-%m-%d %H:%M:%S %Z" >> "/tmp/$i"
            if [ `wc -l "/tmp/$i" | awk '{print $1}'`x = 2x ]; then
                # this is the second logging (6 failures) so blow the whistle
                MSG="LINK DOWN ALARM $i"
                logger -i -t `basename $0` ${MSG}
                mail -s "** ${MSG}" [EMAIL PROTECTED] <. >/dev/null
                sendmail -q
            fi
        fi
    fi
}
for i in ${LINKS}; do
    ping -qnc1 "$i" >/dev/null
    PINGERROR=$?
    if [ ${PINGERROR}x = 0x ]; then
        # the machine is talking to us
        if [ -e "/tmp/$i" ]; then
            # we had a problem but all is now well
            if [ `wc -l "/tmp/$i" | awk '{print $1}'`x = 2x ]; then
                # if we sent an alarm we must cancel the alarm
                MSG="CANCEL ALARM $i"
                logger -i -t `basename $0` ${MSG}
                mail -s "** ${MSG}" [EMAIL PROTECTED] <. >/dev/null
                sendmail -q
            fi
            # get rid of the logging file
            rm -f "/tmp/$i"
        fi
    else
        if [ -e "/tmp/$i" ]; then
            # we have already had some failures
            if [ `wc -l "/tmp/$i" | awk '{print $1}'`x != 2x ]; then
                # but not yet enough to blow the whistle
                docheck
            fi
        else
            docheck
        fi
    fi
done
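
The per-host decision the script makes on each cron run, and the alarm
timing noted in the mods register, can be sketched separately from the
pinging. This is only an illustration, not part of the script: the function
names next_action and alarm_window are made up for the example, and the
"count" argument stands for the number of lines already in /tmp/<host>.

```shell
#!/bin/sh
# Hypothetical helper: what happens for one host on one cron run.
#   $1 = "up" or "down" (outcome of this run's pings, retries included)
#   $2 = failure lines already recorded for the host (0 if no file)
#   $3 = count at which the alarm is raised (defaults to 2, as in the script)
# Prints one of: none, log, alarm, clear, cancel
next_action() {
    state=$1
    count=$2
    threshold=${3:-2}
    if [ "$state" = up ]; then
        if [ "$count" -eq 0 ]; then
            echo none        # nothing was wrong, nothing to tidy up
        elif [ "$count" -ge "$threshold" ]; then
            echo cancel      # an alarm went out: cancel it, remove the file
        else
            echo clear       # failures were logged but no alarm: remove the file
        fi
    else
        if [ "$count" -ge "$threshold" ]; then
            echo none        # alarm already sent, stay quiet
        elif [ $((count + 1)) -eq "$threshold" ]; then
            echo alarm       # this failure reaches the threshold
        else
            echo log         # record the failure, no alarm yet
        fi
    fi
}

# Hypothetical helper: bounds (in minutes) on how long a link can be down
# before the alarm fires, given the cron cycle and the log-count threshold.
#   $1 = cron cycle in minutes, $2 = threshold
alarm_window() {
    cycle=$1
    threshold=$2
    echo "$(( (threshold - 1) * cycle )) $(( threshold * cycle ))"
}
```

With the old settings, `alarm_window 5 3` prints `10 15`; with the current
ones, `alarm_window 6 2` prints `6 12`, which matches the mods register.
Likewise `next_action down 0` prints `log`, `next_action down 1` prints
`alarm`, and `next_action up 2` prints `cancel`.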
--
SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/
More Info: http://slug.org.au/lists/listinfo/slug