It may not be you. Zenwin has some conditions that cause it to hang. These
conditions only seem to occur in certain network/server configurations, and the
Zenoss team hasn't been able to reproduce the error.

My network has been suffering from this phenomenon, and I managed to eliminate
one source of the error as follows:

The Zenoss machine monitors two networks through two separate NICs. Some of
the monitored servers are also connected to both networks, and a Zenoss agent
would sometimes go down when polling one of these servers.

I believe the problem was that Zenoss would send a query packet on one network
and the answer would come back on the other. The solution was to change the
interface metrics on the servers so that the answer always comes back on the
main network. Before this change, the OS had automatically assigned the same
metric to both NICs.
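
To illustrate why equal metrics make the reply path unpredictable, here is a
minimal Python sketch of how a host typically picks an outgoing route: the most
specific (longest) prefix wins, and among equally specific routes the lowest
metric wins. The interface names, addresses, and metric values are hypothetical,
and real operating systems break exact metric ties in their own ways; the sketch
simply takes the first route listed.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Route:
    network: ipaddress.IPv4Network  # destination prefix
    interface: str                  # hypothetical NIC name
    metric: int                     # lower is preferred


def select_route(routes: List[Route], destination: str) -> Optional[Route]:
    """Return the route a host would use to send to `destination`.

    Longest prefix wins; among equal prefixes the lowest metric wins.
    Exact ties go to the first route listed (min() keeps the first minimum),
    standing in for whatever OS-specific tie-break really applies.
    """
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in r.network]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (-r.network.prefixlen, r.metric))


# Hypothetical monitored server with two NICs, each with a default gateway
# and the same automatically assigned metric.
default_secondary = Route(ipaddress.ip_network("0.0.0.0/0"), "nic_secondary", 10)
default_main = Route(ipaddress.ip_network("0.0.0.0/0"), "nic_main", 10)
before = [default_secondary, default_main]

# The fix: give the secondary NIC a worse (higher) metric.
after = [replace(default_secondary, metric=20), default_main]

if __name__ == "__main__":
    zenoss_host = "172.16.0.5"  # hypothetical address of the Zenoss machine
    print("equal metrics:", select_route(before, zenoss_host).interface)
    print("after fix:    ", select_route(after, zenoss_host).interface)
```

With both default routes at the same metric, the secondary NIC wins only because
it happens to be listed first; raising its metric makes the main NIC the
deterministic choice, which is the effect the metric change described above aims for.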

This solved some of the agent problems, but I still have frequent heartbeat 
failures on zenwin.

I believe Zenoss 2.2 is going to have a watchdog process that will restart any 
hung agent as a stopgap measure.  But who's going to watch the watchdog? ;))

JM




-------------------- m2f --------------------

Read this topic online here:
http://community.zenoss.com/forums/viewtopic.php?p=16192#16192

-------------------- m2f --------------------



_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users
