I'm not sure this patch fixes anything. Do you have a test case in
which the original code was failing?
You have a good point about the heartbeat interval being very long.
Sending a heartbeat after every plugin run, however, could be too
frequent if you are running a few hundred commands every 60 seconds,
several dozen of them in parallel at any one time.
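A middle ground would be to call heartbeat() after each plugin run but forward at most one heartbeat per interval. The following is a minimal sketch that assumes a hypothetical `send` callback that emits the actual heartbeat event. `ThrottledHeartbeat` and the 30-second interval are illustrative only and are not part of zenagios.py.

```python
import time
from typing import Callable, Optional


class ThrottledHeartbeat:
    """Forward heartbeat() calls to `send` at most once per `min_interval` seconds.

    Hypothetical helper: `send` stands in for whatever emits the daemon's
    heartbeat event; it is not a Zenoss API.
    """

    def __init__(self, send: Callable[[], None], min_interval: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send
        self.min_interval = min_interval
        self.clock = clock
        self.last_sent: Optional[float] = None  # time of last forwarded heartbeat

    def heartbeat(self) -> bool:
        """Call after every plugin run; returns True if a heartbeat was sent."""
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False  # too soon after the previous one: drop it
        self.last_sent = now
        self.send()
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    sent = []
    hb = ThrottledHeartbeat(lambda: sent.append(fake_now[0]), 30.0,
                            clock=lambda: fake_now[0])
    for i in range(300):           # 300 plugin runs spread over 60 seconds
        fake_now[0] = i * 0.2
        hb.heartbeat()
    print(len(sent))               # → 2 (around t=0 and t=30)
```

With this approach, the heartbeat cost no longer depends on how many commands run per cycle. It stays bounded by the chosen interval.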
I filed a bug for it:
http://dev.zenoss.org/trac/ticket/250
-Eric
Willi Langenberger wrote:
According to Antonio Paneiro:
I did find some errors with a test plugin:
2006-08-09 23:00:11 ERROR zen.zenagios: Command timed out on device SNFEX01:
/usr/local/zenoss/libexec/check_tcp -H SNFEX01 -p 80
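Running the same command line outside the daemon with an explicit timeout can show whether the timeout comes from zenagios or from the plugin itself. The following is a minimal sketch that uses Python's subprocess module. `run_plugin` and the timeout value in the usage line are illustrative assumptions, not zenagios settings.

```python
import subprocess
from typing import List, Optional, Tuple


def run_plugin(argv: List[str], timeout: float) -> Tuple[Optional[int], str]:
    """Run a Nagios-style plugin and report (exit_code, first output line).

    exit_code is None if the command did not finish within `timeout`
    seconds; subprocess.run kills the child in that case.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return None, ""
    lines = proc.stdout.splitlines()
    # zenagios keeps only the first (stripped) line of output; mirror that
    return proc.returncode, (lines[0].strip() if lines else "")
```

For example, `run_plugin(["/usr/local/zenoss/libexec/check_tcp", "-H", "SNFEX01", "-p", "80"], timeout=10.0)` returns `(None, "")` if the plugin hangs. Otherwise, it returns the plugin's exit code and first line of output.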
When I issue the same command at the Linux prompt:
[EMAIL PROTECTED] log]$ /usr/local/zenoss/libexec/check_tcp -H SNFEX01 -p 80
TCP OK - 0.001 second response time on port 80|time=0.001391s;0.000000;0.000000;0.000000;10.000000
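The plugin output above follows the Nagios plugin convention. The line contains status text, then `|`, then performance data of the form `label=value[UOM];warn;crit;min;max`. The exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) carries the state. The following is a simplified parser sketch. The function names are hypothetical, and the parser does not handle quoted labels that contain spaces.

```python
from typing import Dict, List, Optional, Tuple

# Exit-code convention from the Nagios plugin guidelines
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

PerfData = Dict[str, List[Optional[str]]]


def parse_perfdata(perf: str) -> PerfData:
    """Split 'label=value[UOM];warn;crit;min;max ...' into a dict.

    Simplified: quoted labels containing spaces are not supported.
    """
    result: PerfData = {}
    for item in perf.split():
        label, _, rest = item.partition("=")
        fields = rest.split(";")
        fields += [""] * (5 - len(fields))  # pad missing warn/crit/min/max
        result[label] = [f or None for f in fields[:5]]
    return result


def parse_plugin_line(line: str) -> Tuple[str, PerfData]:
    """Return (status text, perfdata) for the first line of plugin output."""
    text, _, perf = line.partition("|")
    return text.strip(), parse_perfdata(perf)
```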
There is a bug in the current zenagios.py version: it doesn't handle
failures in process execution correctly. I patched zenagios.py in the
following way (but surely there are better ways):
-8<-------------------------------------------------------------------
Index: zenagios.py
===================================================================
--- zenagios.py (revision 2073)
+++ zenagios.py (working copy)
@@ -87,6 +87,7 @@
 
     def processEnded(self, reason):
         "notify the starter that their process is complete"
+        self.reason = reason # can be a failure.Failure instance
         self.exitCode = reason.value.exitCode
         self.output = [s.strip() for s in self.output.split('\n')][0]
         if self.stopped:
@@ -264,15 +265,19 @@
 
 
     def processEnded(self, pr):
+        """ return value goes to zenagios.finished
+        can be a Cmd or failure.Failure instance"""
+        reason, pr.reason = pr.reason, None # del attribute; needed?
         self.result = pr
         self.lastStop = time.time()
-        if not isinstance(pr, failure.Failure):
+        if isinstance(reason, failure.Failure) and pr.exitCode != 0:
+            return reason
+        else:
             log.debug('Process %s stopped (%s), %f elapsed' % (
                 self.name(),
                 pr.exitCode,
                 self.lastStop - self.lastStart))
             return self
-        return pr
 
 
     def updateConfig(self,device,ipAddress, username, password,
-8<-------------------------------------------------------------------
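In Twisted, ProcessProtocol.processEnded is always called with a Failure. That Failure wraps ProcessDone on a clean exit and ProcessTerminated otherwise. If zenagios passes that reason through unchanged, the isinstance() test in the patch is always true, and the exitCode check alone decides the branch. The following sketch exercises the control flow of the two hunks without Twisted. All classes below are simplified stand-ins, not the real Twisted or zenagios classes. For example, the real ProcessDone takes a status argument.

```python
class Failure:
    """Stand-in for twisted.python.failure.Failure: wraps an exception as .value."""

    def __init__(self, value: Exception) -> None:
        self.value = value


class ProcessDone(Exception):
    """Stand-in for twisted.internet.error.ProcessDone (exit code 0)."""
    exitCode = 0


class ProcessTerminated(Exception):
    """Stand-in for twisted.internet.error.ProcessTerminated (non-zero exit)."""

    def __init__(self, exitCode: int) -> None:
        super().__init__(exitCode)
        self.exitCode = exitCode


class ProcessRunner:
    """Stand-in for the process protocol patched in the first hunk."""

    def __init__(self) -> None:
        self.reason = None
        self.exitCode = None

    def processEnded(self, reason: Failure) -> None:
        # Twisted passes a Failure here even for a clean exit
        self.reason = reason
        self.exitCode = reason.value.exitCode


class Cmd:
    """Stand-in for the command object patched in the second hunk."""

    def processEnded(self, pr: ProcessRunner):
        reason, pr.reason = pr.reason, None
        self.result = pr
        # Hand the Failure on only for a non-zero exit code; otherwise succeed
        if isinstance(reason, Failure) and pr.exitCode != 0:
            return reason
        return self


if __name__ == "__main__":
    ok = ProcessRunner()
    ok.processEnded(Failure(ProcessDone()))
    bad = ProcessRunner()
    bad.processEnded(Failure(ProcessTerminated(2)))
    print(type(Cmd().processEnded(ok)).__name__)   # → Cmd
    print(type(Cmd().processEnded(bad)).__name__)  # → Failure
```

With this check, a plugin that legitimately reports WARNING (exit 1) or CRITICAL (exit 2) would also be returned as a Failure.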
I can't see any heartbeat errors; however, it seems to reset (clear)
every 1800 sec.
Another bug in zenagios.py: the heartbeat() method is only called in
"updateConfig" (default cycle time: every 30 min). It should probably
also be called in ProcessRunner or somewhere similar...
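Another option is to run the heartbeat on its own short timer so that it no longer depends on the 30-minute updateConfig cycle. The following sketch uses the standard library's sched module with a simulated clock instead of Twisted's reactor, so the schedule can be checked instantly. The 60-second heartbeat interval, `FakeClock`, and `simulate` are assumptions for illustration, not zenagios settings or APIs.

```python
import sched
from typing import List, Tuple


class FakeClock:
    """Simulated clock so a long schedule runs instantly (illustration only)."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds  # "sleeping" just advances simulated time


def simulate(hours: float, heartbeat_interval: float = 60.0,
             config_interval: float = 1800.0) -> List[Tuple[float, str]]:
    """Return (time, event) pairs for two independent periodic timers."""
    clock = FakeClock()
    scheduler = sched.scheduler(clock.time, clock.sleep)
    events: List[Tuple[float, str]] = []
    end = hours * 3600

    def periodic(interval: float, name: str, priority: int) -> None:
        def tick() -> None:
            events.append((clock.now, name))
            if clock.now + interval <= end:  # reschedule until the end time
                scheduler.enter(interval, priority, tick)
        scheduler.enter(0, priority, tick)

    periodic(heartbeat_interval, "heartbeat", 0)
    periodic(config_interval, "updateConfig", 1)
    scheduler.run()
    return events
```

Under these assumptions, the longest gap between heartbeats shrinks from the 1800-second config cycle to the chosen heartbeat interval.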
\wlang{}
_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users