Paneiro

Eric Newton Thu, 10 Aug 2006 06:06:27 -0700

I'm not sure this patch fixes anything. Do you have a test case in which the original code was failing?

You have a good point about the heartbeat being very slow. Sending a heartbeat with every plugin run could be a little too frequent if you are running a few hundred commands every 60 seconds, several dozen in parallel at any one time.
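
One way to reconcile the two concerns is to request a heartbeat after every plugin run but actually send one at most once per interval. The following is only a minimal sketch of that idea, not code from zenagios.py: the class name ThrottledHeartbeat, the send callback, and the 60-second default are all assumptions made for illustration, and time.monotonic needs a modern Python.

```python
import time


class ThrottledHeartbeat:
    """Hypothetical helper: forward heartbeats at most once per min_interval."""

    def __init__(self, send, min_interval=60.0, clock=time.monotonic):
        self.send = send                  # callable that really emits the heartbeat
        self.min_interval = min_interval  # minimum seconds between two heartbeats
        self.clock = clock                # injectable clock, handy for testing
        self.last_sent = None             # time of the last heartbeat actually sent

    def beat(self):
        """Request a heartbeat; return True only if one was actually sent."""
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False                  # too soon since the last one, skip it
        self.last_sent = now
        self.send()
        return True
```

Each finished command would call beat(); with several hundred commands per minute, only the first call in each interval reaches send, so the heartbeat stays fresher than the 30-minute config cycle without being sent for every command.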


I made a bug for it:

   http://dev.zenoss.org/trac/ticket/250

-Eric

Willi Langenberger wrote:

According to Antonio Paneiro:

I did find some errors on a testing plugin:

2006-08-09 23:00:11 ERROR zen.zenagios: Command timed out on device SNFEX01:
/usr/local/zenoss/libexec/check_tcp -H SNFEX01 -p 80

When I issue the same command at the Linux prompt:

[EMAIL PROTECTED] log]$  /usr/local/zenoss/libexec/check_tcp -H SNFEX01 -p 80
TCP OK - 0.001 second response time on port 80|time=0.001391s;0.000000;0.000000;0.000000;10.000000


There is a bug in the current zenagios.py version. It doesn't handle
failures in the process execution correctly. I patched zenagios.py in the
following way (but surely there are better ways):

-8<-------------------------------------------------------------------

Index: zenagios.py
===================================================================
--- zenagios.py (revision 2073)
+++ zenagios.py (working copy)
@@ -87,6 +87,7 @@

     def processEnded(self, reason):
         "notify the starter that their process is complete"
+        self.reason = reason    # can be a failure.Failure instance
         self.exitCode = reason.value.exitCode
         self.output = [s.strip() for s in self.output.split('\n')][0]
         if self.stopped:
@@ -264,15 +265,19 @@

     def processEnded(self, pr):
+        """ return value goes to znagios.finished
+            can be a Cmd or failure.Failure instance"""
+        reason, pr.reason = pr.reason, None        # del attribute; needed?
         self.result = pr
         self.lastStop = time.time()
-        if not isinstance(pr, failure.Failure):
+        if isinstance(reason, failure.Failure) and pr.exitCode != 0:
+           return reason
+        else:
             log.debug('Process %s stopped (%s), %f elapsed' % (
                 self.name(),
                 pr.exitCode,
                 self.lastStop - self.lastStart))
             return self
-        return pr

     def updateConfig(self,device,ipAddress, username, password,


-8<-------------------------------------------------------------------
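
The decision made in the second hunk can be tried out without Twisted. The sketch below is not the patched zenagios.py: Failure and ProcessExit are simplified stand-ins (assumptions) for failure.Failure and for the exception whose exitCode the patch reads, and Cmd and finished_result are hypothetical names used only for illustration.

```python
class Failure:
    """Stand-in for failure.Failure; only the .value attribute is modelled."""

    def __init__(self, value):
        self.value = value


class ProcessExit(Exception):
    """Stand-in for the exception wrapped by the Failure, carrying exitCode."""

    def __init__(self, exit_code):
        super().__init__("process exited with code %r" % (exit_code,))
        self.exitCode = exit_code


class Cmd:
    """Hypothetical command object that remembers why its process ended."""

    def __init__(self, name):
        self.name = name
        self.reason = None
        self.exitCode = None

    def process_ended(self, reason):
        # Mirrors the first hunk: keep the reason next to the exit code.
        self.reason = reason
        self.exitCode = reason.value.exitCode


def finished_result(cmd):
    """Mirror the second hunk: hand on the Failure for a failed run, else cmd."""
    reason, cmd.reason = cmd.reason, None   # take the reason and clear it
    if isinstance(reason, Failure) and cmd.exitCode != 0:
        return reason                       # non-zero exit: propagate the failure
    return cmd                              # clean exit: propagate the command
```

A run that ends with exit code 0 yields the command object itself, while any other exit code yields the Failure, so the callback that receives the result can tell the two cases apart with an isinstance check.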

I can't see any heartbeat errors; however, it seems to reset (clear) every
1800 sec.


Another bug in zenagios.py: the heartbeat() method is only called in
"updateConfig" (default cycle time: every 30 min). It should probably
also be called in ProcessRunner or somewhere similar.


\wlang{}


_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users

Re: [zenoss-users] RE: [Zen-WebForm]contribute_0000407 [email protected]/Antonio/Paneiro
