Hi all,
I have observed a very similar issue with a fresh 2.3.3 install when using SSH
to monitor Linux servers. Performance data gathering and uptime updates just
stop until a restart. This happens randomly.
The first clue to the root cause was that the zencommand run -v10 log showed
negative elapsed times for commands.
What I found is that the system clock on one specific server sometimes steps
backward by a few seconds, for reasons I have not identified.
If this happens during command execution, the zencommand scheduler gets
stuck.
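To check whether a server's clock really steps backward, something like the following could be used. It is only a rough sketch, not part of Zenoss: find_backward_steps() and sample_wall_clock() are names I made up for illustration, and it assumes a modern Python 3 interpreter.

```python
import time


def find_backward_steps(samples):
    """Return (index, delta) pairs where a wall-clock sample is earlier
    than the sample taken just before it."""
    steps = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if delta < 0:
            steps.append((i, delta))
    return steps


def sample_wall_clock(count=5, interval=0.01):
    """Collect `count` readings of time.time(), `interval` seconds apart."""
    samples = []
    for _ in range(count):
        samples.append(time.time())
        time.sleep(interval)
    return samples
```

Running sample_wall_clock() over a long period and passing the result to find_backward_steps() would list every point where the clock went back, along with the size of the step.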
More precisely, when it happens, the processEnded() function in
$ZENHOME/Products/ZenRRD/zencommand.py stores a process completion time that
is earlier than the process start time. The processSchedule() function then
considers this process "running" because lastStop < lastStart (which it reads
as "the process ended in the past but is running again"). As a result, the
process never completes from the scheduler's point of view.
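The failure mode can be reproduced with a toy model. This is a simplified sketch and not the actual zencommand code: the Cmd class, its running() method and fake_clock() are hypothetical, and only the lastStart/lastStop comparison is taken from the description above.

```python
class Cmd:
    """Toy stand-in for a scheduled command (not the real zencommand class)."""

    def __init__(self, clock):
        self.clock = clock      # callable returning the current wall-clock time
        self.lastStart = 0.0
        self.lastStop = 0.0

    def start(self):
        self.lastStart = self.clock()

    def processEnded(self):
        # Stores whatever the wall clock says, even if it went backward.
        self.lastStop = self.clock()

    def running(self):
        # Same comparison as described above: a stop time earlier than
        # the start time is read as "started again, still running".
        return self.lastStop < self.lastStart


def fake_clock(values):
    """Return a clock that yields the given readings one call at a time."""
    readings = iter(values)
    return lambda: next(readings)
```

With fake_clock([1000.0, 997.0]) the command starts at 1000.0 and ends at 997.0, so running() stays true after processEnded() and a scheduler relying on it would never pick the command up again.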
I have a simple workaround for that:
--- zencommand.py.orig 2009-04-29 09:32:35.000000000 -0500
+++ zencommand.py 2009-04-29 17:56:31.000000000 -0500
@@ -322,6 +323,9 @@
     def processEnded(self, pr):
         self.result = pr
         self.lastStop = time.time()
+        if self.lastStop < self.lastStart:
+            log.debug('System clock went back?')
+            self.lastStop = self.lastStart
         if not isinstance(pr, failure.Failure):
             log.debug('Process %s stopped (%s), %f elapsed' % (
                 self.name(),
This fixed the issue in my case, but the weird behavior of the system clock
is still a mystery to me ;)
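As a side note, on a modern Python 3 the elapsed time could be measured with a monotonic clock, which never goes backward. The sketch below is only an illustration under that assumption: time.monotonic() exists since Python 3.3 and was not available in the Python shipped with Zenoss at the time, and the Timing class is hypothetical.

```python
import time


class Timing:
    """Tracks start/stop so that lastStop can never precede lastStart."""

    def start(self):
        self._t0 = time.monotonic()   # immune to wall-clock adjustments
        self.lastStart = time.time()  # wall-clock time, kept for reporting

    def stop(self):
        elapsed = time.monotonic() - self._t0
        # Derive the stop time from the start time plus the monotonic
        # duration instead of reading the wall clock a second time.
        self.lastStop = self.lastStart + elapsed
        return elapsed
```

Because the elapsed value comes from the monotonic clock, it is never negative, so lastStop >= lastStart holds even if the wall clock steps back during the command.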