On Sep 4, 2008, at 2:45 PM, david_sloboda wrote:
On a RHEL client host that is monitored, I have a scheduled cron job that starts every five minutes, runs, and completes in under a minute.
Every once in a while the job hangs.
I would like to create a test for this process in Zenoss, so that if the job is up for a long time (e.g. an hour), an alert is raised. I have searched the forums, howtos and admin guide http://www.zenoss.com/community/docs/zenoss-guide/2.2.3/ch12s03.html WRT process monitoring. This is a little different from httpd or mysqld process monitoring, in that this is not a daemon that is "always up." The job is supposed to start and stop. Under correct conditions it is "not there" more often than it is there.

Has anyone done this before?
One thought I had was to test for the process showing up for 20 consecutive counts in the process monitoring on Zenoss. If Zenoss polls every 3 minutes and sees 20x3 minutes worth of counts, then that process has been up for 60 minutes. I don't see how to set that zProperty. As well, the process comes and goes, which might screw up Zenoss process counting.

A second thought is to create a shell script that tests the output of ps -eo bsdstart, and compares it to the current HH:MM time; if the value is too large, set a flag in a file. Report the flag through net-snmpd to zenoss.

As an example, this is what I would try on the RHEL client:

Code:

client$ ps -eo pid,bsdtime,bsdstart,args | grep process-name | grep -v grep
12007   0:02  11:15 /opt/process-name
client$ ps -eo pid,bsdtime,bsdstart,args | grep process-name | grep -v grep | awk '{ print $3 }'
11:15
client$ date +%H:%M
11:23
client$
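
A rough sketch of how the two HH:MM values above could be compared. The function name elapsed_minutes is made up for illustration, and the midnight wrap-around handling is an assumption about what the check would need. As far as I can tell from the ps man page, bsdstart only prints HH:MM for processes started less than 24 hours ago, so this would only be meaningful within that window.

```shell
# Hypothetical helper: minutes between a start time and the current time,
# both given as HH:MM (as printed by `ps -eo bsdstart` and `date +%H:%M`).
elapsed_minutes() {
    em_start_h=${1%%:*}; em_start_m=${1##*:}
    em_now_h=${2%%:*};   em_now_m=${2##*:}
    # Strip one leading zero so values like "08" are not parsed as octal.
    em_start_h=${em_start_h#0}; em_start_m=${em_start_m#0}
    em_now_h=${em_now_h#0};     em_now_m=${em_now_m#0}
    em_diff=$(( (${em_now_h:-0} * 60 + ${em_now_m:-0}) - (${em_start_h:-0} * 60 + ${em_start_m:-0}) ))
    # A negative difference means the job started before midnight.
    if [ "$em_diff" -lt 0 ]; then
        em_diff=$(( em_diff + 1440 ))
    fi
    echo "$em_diff"
}

elapsed_minutes 11:15 11:23   # prints 8
```

Working in minutes rather than whole hours avoids the case where a job that started at 11:59 looks an hour old at 12:00.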




I'd subtract the differences in hours and if it was 1 or higher, write that to a file, then have net-snmpd report that back to Zenoss. I've done that before with mailq output. From the zenoss server, this shows that the output of mailq is 36 (the value in the file /var/net-snmp/mailqcount)


Code:

[EMAIL PROTECTED] ~ $ snmpwalk -v2c -c itsasecret client .1.3.6.1.4.1.2021.8.1
UCD-SNMP-MIB::extIndex.1 = INTEGER: 1
UCD-SNMP-MIB::extNames.1 = STRING: getMailQ
UCD-SNMP-MIB::extCommand.1 = STRING: /bin/cat /var/net-snmp/mailqcount
UCD-SNMP-MIB::extResult.1 = INTEGER: 0
UCD-SNMP-MIB::extOutput.1 = STRING: 36
UCD-SNMP-MIB::extErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::extErrFixCmd.1 = STRING:
[EMAIL PROTECTED] ~ $
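
The same pattern could carry the hung-job flag. Below is a minimal sketch of the client side, assuming a cron-driven script has already worked out how many minutes the job has been up. The function name write_flag, the entry name jobHung and the path /var/net-snmp/jobhung are all hypothetical. The snmpd.conf line in the comment uses the exec directive, which is what populates the extTable walked above.

```shell
# Hypothetical helper: write 1 to the flag file if the job has been up for
# at least <limit> minutes, otherwise 0.
write_flag() {
    wf_minutes=$1
    wf_limit=$2
    wf_file=$3
    if [ "$wf_minutes" -ge "$wf_limit" ]; then
        wf_value=1
    else
        wf_value=0
    fi
    # Write to a temp file and rename it, so snmpd never reads a half-written flag.
    echo "$wf_value" > "$wf_file.tmp.$$" && mv "$wf_file.tmp.$$" "$wf_file"
}

# Example call from the cron script (path is an assumption):
#   write_flag "$minutes" 60 /var/net-snmp/jobhung
#
# Matching snmpd.conf entry (name and path are assumptions):
#   exec jobHung /bin/cat /var/net-snmp/jobhung
```

Zenoss would then see 0 or 1 in extOutput for that entry and could threshold on it.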




Any advice or suggestions would be appreciated.  Thank you in advance.

You could have snmpd invoke this proctime.py script I just put up. It returns the number of seconds the given process has been running.

http://chet.crashed.net/proctime.py
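
For anyone who would rather stay in shell, here is a rough sketch of the same idea. It is not the contents of proctime.py, just an illustration. It assumes the elapsed time comes from the ps etime field, which prints [[DD-]hh:]mm:ss, and the function name etime_to_seconds is made up.

```shell
# Hypothetical helper: convert a ps "etime" value ([[DD-]hh:]mm:ss) to seconds.
etime_to_seconds() {
    et_rest=$1
    et_days=0
    # Peel off the optional "DD-" day prefix.
    case $et_rest in
        *-*) et_days=${et_rest%%-*}; et_rest=${et_rest#*-} ;;
    esac
    # Split what is left (hh:mm:ss or mm:ss) on colons.
    et_old_ifs=$IFS; IFS=:
    set -- $et_rest
    IFS=$et_old_ifs
    if [ $# -eq 3 ]; then
        et_h=$1; et_m=$2; et_s=$3
    else
        et_h=0; et_m=$1; et_s=$2
    fi
    # Strip one leading zero so "08" and "09" are not parsed as octal.
    et_h=${et_h#0}; et_m=${et_m#0}; et_s=${et_s#0}
    echo $(( ((${et_days:-0} * 24 + ${et_h:-0}) * 60 + ${et_m:-0}) * 60 + ${et_s:-0} ))
}

# Possible use, with $pid being the process of interest:
#   etime_to_seconds "$(ps -o etime= -p "$pid" | tr -d ' ')"
```

Unlike bsdstart, etime keeps counting past 24 hours, so a job that has been hung for days still produces a usable number.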

_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users
