Hi all,
I have observed a very similar issue with a fresh 2.3.3 install when using SSH
to monitor Linux servers. Performance data gathering and uptime updates just
stop until a restart. This happens randomly.
The first clue to the root cause was that the zencommand run -v10 log
showed negative elapsed times for commands.
What I found is that on one specific server the system clock, for some reason,
sometimes steps backward by a few seconds.
If this happens during command execution, the zencommand scheduler gets
stuck.

More precisely, when it happens, the processEnded() function in
$ZENHOME/Products/ZenRRD/zencommand.py stores a process completion time that is
earlier than the process start time. The processSchedule() function then treats
this process as "running" because lastStop < lastStart (which it reads as "the
process ended in the past but is running again"). As a result, the process
never completes from the scheduler's point of view.
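
The failure mode can be reproduced in isolation. The following is only a
minimal sketch, not the real zencommand code: the Cmd class, its running()
method, the clamp flag, and the timestamps are all made up for illustration.
It assumes nothing more than the lastStop < lastStart test described above.

```python
class Cmd(object):
    """Toy stand-in for a scheduled command (hypothetical, not zencommand's class)."""

    def __init__(self):
        self.lastStart = 0.0
        self.lastStop = 0.0

    def running(self):
        # Same comparison as described above: a stop time older than the
        # start time is read as "started again and not finished yet".
        return self.lastStop < self.lastStart

    def processStarted(self, now):
        self.lastStart = now

    def processEnded(self, now, clamp=False):
        self.lastStop = now
        if clamp and self.lastStop < self.lastStart:
            # Clamp: never record a stop time earlier than the start time.
            self.lastStop = self.lastStart


def simulate(clamp):
    """Start a command, step the wall clock back 2.5 s, then end the command."""
    cmd = Cmd()
    cmd.processStarted(1000.0)             # wall clock reads 1000.0 at start
    cmd.processEnded(997.5, clamp=clamp)   # clock stepped back before the end
    return cmd.running()


stuck_without_fix = simulate(clamp=False)  # command still looks "running"
stuck_with_fix = simulate(clamp=True)      # command is seen as finished
print(stuck_without_fix, stuck_with_fix)
```

Without the clamp the toy command stays "running" forever, which matches the
stuck scheduler; with the clamp it is seen as finished.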

I have a simple workaround for that:
--- zencommand.py.orig  2009-04-29 09:32:35.000000000 -0500
+++ zencommand.py       2009-04-29 17:56:31.000000000 -0500
@@ -322,6 +323,9 @@
     def processEnded(self, pr):
         self.result = pr
         self.lastStop = time.time()
+        if self.lastStop < self.lastStart:
+            log.debug('System clock went back?')
+            self.lastStop = self.lastStart
         if not isinstance(pr, failure.Failure):
             log.debug('Process %s stopped (%s), %f elapsed' % (
                 self.name(),

This fixed the issue in my case, but the weird behavior of the system clock is
still a mystery to me ;)
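
To catch the clock in the act, one option is to sample the wall clock next to
a monotonic clock and flag any interval where the wall clock falls behind.
This is only a sketch under assumptions: backward_steps(), sample_clocks() and
the 0.5 s tolerance are made-up names and values, and time.monotonic() needs
Python 3.3 or later, so it would have to run as a separate script rather than
inside zencommand.

```python
import time


def backward_steps(samples, tolerance=0.5):
    """Return (index, drift) for each sample where the wall clock lost time.

    samples is a list of (wall, mono) pairs taken at the same moments.
    drift is the wall-clock advance minus the monotonic advance, in seconds;
    a value below -tolerance means the wall clock stepped backward.
    """
    steps = []
    for i in range(1, len(samples)):
        wall_delta = samples[i][0] - samples[i - 1][0]
        mono_delta = samples[i][1] - samples[i - 1][1]
        drift = wall_delta - mono_delta
        if drift < -tolerance:
            steps.append((i, drift))
    return steps


def sample_clocks(count, interval):
    """Collect count (wall, mono) pairs, sleeping interval seconds between them."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), time.monotonic()))
        time.sleep(interval)
    return samples


# Synthetic data: the wall clock jumps back 2 s between samples 1 and 2
# while the monotonic clock keeps advancing by 1 s per sample.
fake = [(100.0, 10.0), (101.0, 11.0), (99.0, 12.0), (100.0, 13.0)]
found = backward_steps(fake)
print(found)
```

On the affected server, something like backward_steps(sample_clocks(60, 1.0))
run in a loop and logged with timestamps might show whether the steps line up
with a time-sync job or something else adjusting the clock.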