On Dec 19, 2008, at 12:59 AM, nixinfo wrote:

> We are evaluating Zenoss in an active/passive clustered
> environment. The current setup is as follows
>
> CentOS 4.4
> HeartBeat 2.1
> Zenoss 2.1.3
>
> Since there is only one instance active at a time, we get lot of
> following messages in event console, with count increasing every
> minute:
>
> ---- 8< ----
>
> zenwinmodeler  /Status/Heartbeat  zenoss02  zenwinmodeler heartbeat failure  28:47.0  29:47.0  2
> zenwin         /Status/Heartbeat  zenoss02  zenwin heartbeat failure         28:47.0  29:47.0  2
> zentrap        /Status/Heartbeat  zenoss02  zentrap heartbeat failure        28:47.0  29:47.0  2
> zensyslog      /Status/Heartbeat  zenoss02  zensyslog heartbeat failure      28:47.0  29:47.0  2
> zenstatus      /Status/Heartbeat  zenoss02  zenstatus heartbeat failure      28:47.0  29:47.0  2
> zenprocess     /Status/Heartbeat  zenoss02  zenprocess heartbeat failure     28:47.0  29:47.0  2
> zenping        /Status/Heartbeat  zenoss02  zenping heartbeat failure        28:47.0  29:47.0  2
> zenperfsnmp    /Status/Heartbeat  zenoss02  zenperfsnmp heartbeat failure    28:47.0  29:47.0  2
> zenmodeler     /Status/Heartbeat  zenoss02  zenmodeler heartbeat failure     28:47.0  29:47.0  2
> zeneventlog    /Status/Heartbeat  zenoss02  zeneventlog heartbeat failure    28:47.0  29:47.0  2
> zencommand     /Status/Heartbeat  zenoss02  zencommand heartbeat failure     28:47.0  29:47.0  2
> zenactions     /Status/Heartbeat  zenoss02  zenactions heartbeat failure     28:47.0  29:47.0  2
>
> ---- 8< ----
>
> We have set zEventAction to drop in /Events/Heartbeat for a quick
> fix but there could be some side effects since all the Heartbeat
> messages are now going to be dropped. Is there any other way to fix
> such problem like only dropping the message with certain string?

Because of this specific problem, we changed the way heartbeats are
handled in Zenoss 2.3. The "device" field of a heartbeat now carries the
name of the collector (i.e. localhost) instead of the FQDN of the Zenoss
server, so failover can occur without generating a full set of heartbeat
failures.

I'd recommend trying the latest version (2.3.2) if you really want to
solve this problem. Otherwise, you're going to have to clear all
heartbeats as part of your failover process.
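
To illustrate why the "device" field matters here, a toy model in Python may help. This is not Zenoss code: the per-(device, daemon) table, the 180-second timeout, and the peer hostname zenoss01 are all assumptions made up for the sketch. It only shows that heartbeats keyed by server FQDN go stale after failover, while heartbeats keyed by collector name keep being refreshed by whichever node is active.

```python
# Toy model only, not Zenoss code. Heartbeats are tracked per (device, daemon).
TIMEOUT = 180  # seconds; hypothetical value chosen for illustration


def record(table, device, daemon, now):
    """Store the time of the latest heartbeat for this (device, daemon)."""
    table[(device, daemon)] = now


def failures(table, now, timeout=TIMEOUT):
    """Return the keys whose last heartbeat is older than the timeout."""
    return sorted(key for key, last_seen in table.items()
                  if now - last_seen > timeout)


def simulate(device_for_node):
    """Run one failover and report which heartbeats look failed afterwards.

    device_for_node maps the active node's hostname to the value that is
    written into the heartbeat "device" field.
    """
    table = {}
    daemons = ["zenping", "zentrap"]
    # zenoss02 is active at t=0 and its daemons send heartbeats.
    for daemon in daemons:
        record(table, device_for_node("zenoss02"), daemon, now=0)
    # Failover: zenoss01 (hypothetical peer) is active at t=600.
    for daemon in daemons:
        record(table, device_for_node("zenoss01"), daemon, now=600)
    return failures(table, now=600)


# Pre-2.3 behaviour as described above: device is the server's own name.
by_fqdn = simulate(lambda node: node)
# 2.3 behaviour as described above: device is the collector name.
by_collector = simulate(lambda node: "localhost")

print(by_fqdn)
print(by_collector)
```

In this model the first run reports stale entries for zenoss02 and the second reports none, which is the same pattern as the event console listing in the original question.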
