On a RHEL client host that is monitored, I have a scheduled cron job that
starts every five minutes, runs, and completes in under a minute.
Every once in a while the job hangs.
I would like to create a test for this process in Zenoss, so that if the job is
up for a long time (e.g. an hour), an alert is raised.
I have searched the forums, the how-tos, and the process monitoring section of
the admin guide
(http://www.zenoss.com/community/docs/zenoss-guide/2.2.3/ch12s03.html).
My case is a little different from httpd or mysqld process
monitoring, in that this is not a daemon that is "always up." The job is
supposed to start and stop. Under correct conditions it is "not there" more
often than it is there.
Has anyone done this before?
One thought I had was to test for the process showing up for 20 consecutive
counts in the process monitoring on Zenoss. If Zenoss polls every 3 minutes
and sees the process on 20 consecutive polls (20 x 3 minutes), then that
process has been up for about 60 minutes. I don't see how to set that with a
zProperty. Also, the process comes and goes, which might confuse Zenoss
process counting.
A second thought is to create a shell script that takes the start time from
ps -eo bsdstart and compares it to the current HH:MM time; if the difference
is too large, it sets a flag in a file. The flag is then reported through
net-snmpd to Zenoss.
As an example, this is what I would try on the RHEL client:
Code:
client$ ps -eo pid,bsdtime,bsdstart,args | grep process-name | grep -v grep
12007 0:02 11:15 /opt/process-name
client$ ps -eo pid,bsdtime,bsdstart,args | grep process-name | grep -v grep |
awk ' { print $3 }'
11:15
client$ date +%H:%M
11:23
client$
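The comparison of those two HH:MM values could be sketched roughly as below. This is only a sketch under assumptions: elapsed_minutes is a made-up helper name, it works in minutes rather than whole hours, and it assumes the job started less than 24 hours ago (as far as I can tell, ps stops printing HH:MM for bsdstart once a process is older than that, so a real script would need to treat any other format as "too old").

```shell
#!/bin/sh
# Hypothetical helper: minutes elapsed from a HH:MM start time to a HH:MM "now".
# Assumes both times lie within the same 24-hour window; if "now" is earlier
# than the start, the start is taken to be on the previous day.
elapsed_minutes() (
    start=$1
    now=$2
    start_h=${start%%:*}; start_m=${start##*:}
    now_h=${now%%:*};     now_m=${now##*:}
    # Strip one leading zero so "08" and "09" are not read as invalid octal.
    start_h=${start_h#0}; start_m=${start_m#0}
    now_h=${now_h#0};     now_m=${now_m#0}
    diff=$(( (now_h * 60 + now_m) - (start_h * 60 + start_m) ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( diff + 1440 ))   # wrapped past midnight
    fi
    echo "$diff"
)

# Using the values from the output above:
elapsed_minutes 11:15 11:23
```

With the values shown above this prints 8, so the job would not yet be flagged against a 60-minute limit.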
I'd compute the difference in hours and, if it was 1 or higher, write that to
a file, then have net-snmpd report it back to Zenoss. I've done that before
with mailq output. Run from the Zenoss server, the following shows that the
mail queue count is 36 (the value in the file /var/net-snmp/mailqcount):
Code:
[EMAIL PROTECTED] ~ $ snmpwalk -v2c -c itsasecret client .1.3.6.1.4.1.2021.8.1
UCD-SNMP-MIB::extIndex.1 = INTEGER: 1
UCD-SNMP-MIB::extNames.1 = STRING: getMailQ
UCD-SNMP-MIB::extCommand.1 = STRING: /bin/cat /var/net-snmp/mailqcount
UCD-SNMP-MIB::extResult.1 = INTEGER: 0
UCD-SNMP-MIB::extOutput.1 = STRING: 36
UCD-SNMP-MIB::extErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::extErrFixCmd.1 = STRING:
[EMAIL PROTECTED] ~ $
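For the hung-job case, the client-side piece that writes the flag might look something like the sketch below. The function name write_hang_flag, the 60-minute limit, and the extend name and path in the comment are all placeholders I made up for illustration; the snmpd.conf line simply mirrors the exec setup that produces the mailq output above.

```shell
#!/bin/sh
# Hypothetical helper: write 1 to the flag file when the job has been up for
# at least limit_min minutes, otherwise write 0.
write_hang_flag() {
    elapsed_min=$1
    limit_min=$2
    flag_file=$3
    if [ "$elapsed_min" -ge "$limit_min" ]; then
        echo 1 > "$flag_file"
    else
        echo 0 > "$flag_file"
    fi
}

# Assumed snmpd.conf entry (name and path are placeholders), in the same
# style as the getMailQ entry:
#   exec jobHung /bin/cat /var/net-snmp/jobhung

# Example: a job up for 75 minutes against a 60-minute limit.
flag_file=$(mktemp)
write_hang_flag 75 60 "$flag_file"
cat "$flag_file"
rm -f "$flag_file"
```

Zenoss could then poll the corresponding extOutput value and alert whenever it is 1.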
Any advice or suggestions would be appreciated. Thank you in advance.
David