It may not be you. Zenwin has some conditions that cause it to hang. They seem to occur only in certain network/server configurations, and the team hasn't been able to reproduce the error.
My network has been suffering from this phenomenon, and I managed to eliminate one source of error as follows. The Zenoss machine monitors two networks through two separate NICs. Some of the monitored servers are also connected to both networks, and sometimes a Zenoss agent would go down when polling one of these servers. I believe the problem was that Zenoss would send a query packet on one network and the answer would come back on the other. The solution was to alter the NIC metrics on the servers so that the answer always comes back on the main network. Before this, the OS had automatically assigned the same metric to both NICs.

This solved some of the agent problems, but I still have frequent heartbeat failures on zenwin. I believe Zenoss 2.2 is going to have a watchdog process that will restart any hung agent as a stopgap measure. But who's going to watch the watchdog? ;))

JM
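P.S. For anyone wondering why the metric matters, here is a rough sketch of the idea in Python. It is only a simplified model, not how any particular OS implements route selection: it assumes the lowest metric wins and that a tie leaves the choice of NIC open. The interface labels and metric values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    interface: str  # hypothetical NIC label, not a real adapter name
    metric: int     # lower value is preferred in this simplified model


def candidate_interfaces(routes):
    """Return every interface tied for the lowest metric."""
    best = min(r.metric for r in routes)
    return [r.interface for r in routes if r.metric == best]


# Before the fix: both NICs have the same automatic metric (value assumed),
# so a reply could leave through either one.
before = [Route("main", 20), Route("secondary", 20)]

# After the fix: the main network is given a lower metric by hand.
after = [Route("main", 10), Route("secondary", 20)]

print(candidate_interfaces(before))
print(candidate_interfaces(after))
```

With equal metrics both NICs remain candidates, which matches the symptom of answers coming back on the wrong network. Once the main NIC has the lower metric, it is the only candidate.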
