I'm not sure this patch fixes anything. Do you have a test case in which the original code was failing?

You have a good point about the heartbeat being very slow. Sending a heartbeat with every plugin run could be a little too frequent if you are running a few hundred commands every 60 seconds, several dozen in parallel at any one time.
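
One way to get a faster heartbeat without sending one per command would be to throttle it. Below is a rough sketch of that idea only, not what zenagios.py does today. The names HeartbeatThrottle, poke and send are made up for illustration, and send stands in for whatever actually emits the heartbeat:

```python
import time


class HeartbeatThrottle:
    """Forward a heartbeat at most once per `interval` seconds.

    `send` is a hypothetical callable standing in for the real heartbeat
    call; `clock` is injectable so the logic can be exercised in a test.
    """

    def __init__(self, send, interval=60.0, clock=time.time):
        self.send = send
        self.interval = interval
        self.clock = clock
        self.last_sent = None  # no heartbeat has gone out yet

    def poke(self):
        """Call after every finished command; True if a heartbeat was sent."""
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return False  # too soon since the last heartbeat, skip this one
        self.last_sent = now
        self.send()
        return True
```

With something like this, every finished command could call poke(), and only the first call in each interval would actually send anything, no matter how many commands run in parallel.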

I filed a ticket for it:

   http://dev.zenoss.org/trac/ticket/250

-Eric

Willi Langenberger wrote:
According to Antonio Paneiro:
I did find some errors on a testing plugin:

2006-08-09 23:00:11 ERROR zen.zenagios: Command timed out on device SNFEX01: /usr/local/zenoss/libexec/check_tcp -H SNFEX01 -p 80

When I issue the same command on linux prompt:

[EMAIL PROTECTED] log]$  /usr/local/zenoss/libexec/check_tcp -H SNFEX01 -p 80 
TCP OK - 0.001 second response time on port 80|time=0.001391s;0.000000;0.000000;0.000000;10.000000

There is a bug in the current zenagios.py version: it doesn't handle
failures in the process execution correctly. I patched zenagios.py in
the following way (but surely there are better ways):

-8<-------------------------------------------------------------------

Index: zenagios.py
===================================================================
--- zenagios.py (revision 2073)
+++ zenagios.py (working copy)
@@ -87,6 +87,7 @@
     def processEnded(self, reason):
         "notify the starter that their process is complete"
+        self.reason = reason    # can be a failure.Failure instance
         self.exitCode = reason.value.exitCode
         self.output = [s.strip() for s in self.output.split('\n')][0]
         if self.stopped:
@@ -264,15 +265,19 @@
     def processEnded(self, pr):
+        """ return value goes to zenagios.finished
+            can be a Cmd or failure.Failure instance"""
+        reason, pr.reason = pr.reason, None        # del attribute; needed?
         self.result = pr
         self.lastStop = time.time()
-        if not isinstance(pr, failure.Failure):
+        if isinstance(reason, failure.Failure) and pr.exitCode != 0:
+           return reason
+        else:
             log.debug('Process %s stopped (%s), %f elapsed' % (
                 self.name(),
                 pr.exitCode,
                 self.lastStop - self.lastStart))
             return self
-        return pr
     def updateConfig(self,device,ipAddress, username, password,

-8<-------------------------------------------------------------------

I can't see any heartbeat errors; however, it seems to reset (clear)
every 1800 seconds.

That is another bug in zenagios.py. The heartbeat() method is only
called in "updateConfig" (default cycle time: every 30 minutes). It
should probably also be called in ProcessRunner or somewhere similar...


\wlang{}


_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users
