On Dec 19, 2008, at 12:59 AM, nixinfo wrote:
> We are evaluating Zenoss in an active/passive clustered
> environment.  The current setup is as follows:
>
> CentOS 4.4
> HeartBeat 2.1
> Zenoss 2.1.3
>
> Since there is only one instance active at a time, we get a lot of
> the following messages in the event console, with the count
> increasing every minute:
>
> ---- 8< ----
>
> Component      Event Class        Device    Summary                          First    Last     Count
> zenwinmodeler  /Status/Heartbeat  zenoss02  zenwinmodeler heartbeat failure  28:47.0  29:47.0  2
> zenwin         /Status/Heartbeat  zenoss02  zenwin heartbeat failure         28:47.0  29:47.0  2
> zentrap        /Status/Heartbeat  zenoss02  zentrap heartbeat failure        28:47.0  29:47.0  2
> zensyslog      /Status/Heartbeat  zenoss02  zensyslog heartbeat failure      28:47.0  29:47.0  2
> zenstatus      /Status/Heartbeat  zenoss02  zenstatus heartbeat failure      28:47.0  29:47.0  2
> zenprocess     /Status/Heartbeat  zenoss02  zenprocess heartbeat failure     28:47.0  29:47.0  2
> zenping        /Status/Heartbeat  zenoss02  zenping heartbeat failure        28:47.0  29:47.0  2
> zenperfsnmp    /Status/Heartbeat  zenoss02  zenperfsnmp heartbeat failure    28:47.0  29:47.0  2
> zenmodeler     /Status/Heartbeat  zenoss02  zenmodeler heartbeat failure     28:47.0  29:47.0  2
> zeneventlog    /Status/Heartbeat  zenoss02  zeneventlog heartbeat failure    28:47.0  29:47.0  2
> zencommand     /Status/Heartbeat  zenoss02  zencommand heartbeat failure     28:47.0  29:47.0  2
> zenactions     /Status/Heartbeat  zenoss02  zenactions heartbeat failure     28:47.0  29:47.0  2
>
> ---- 8< ----
>
> We have set zEventAction to "drop" on /Events/Heartbeat as a quick
> fix, but there could be side effects since all heartbeat messages
> are now dropped.  Is there another way to fix this problem, such as
> dropping only the messages that contain a certain string?


Because of this specific problem, we changed the way heartbeats are
handled in Zenoss 2.3. The "device" field of the heartbeat now comes in
as the name of the collector (i.e. localhost) instead of the FQDN of
the Zenoss server. This allows failover to occur without generating a
full set of heartbeat failures.
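
If upgrading isn't an option right away, a narrower alternative to
setting zEventAction = drop on the whole class might be an event
transform on the heartbeat event class that drops only the heartbeats
coming from the node that is currently standby. I haven't verified on
2.1.3 that heartbeat failures actually pass through transforms, but
since zEventAction took effect for you, the same event-context step
should apply. Treat this as a sketch; "zenoss02" is just the hostname
from your paste:

    # Event-class transform sketch (untested on 2.1.3).  Zenoss supplies
    # "evt" when the transform runs against an incoming event.
    #
    # Hostname taken from the event console paste above; adjust to
    # whichever node is currently passive in your cluster.
    STANDBY_NODES = ('zenoss02',)

    if evt.device in STANDBY_NODES and 'heartbeat failure' in evt.summary:
        # Setting _action to "drop" tells the event manager to discard
        # the event instead of writing it to the status table.
        evt._action = 'drop'

That way heartbeat failures from the active node still page you, and
only the expected noise from the idle node is suppressed.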

I'd recommend trying the latest version (2.3.2) if you want to really  
solve this problem. Otherwise you're going to have to clear all  
heartbeats as part of your failover process.
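
For the clear-on-failover route, the heartbeats live in the events
MySQL database; on the stock 2.x schema there is a heartbeat table and
a status table, though you should verify the names and credentials on
your own install before relying on this. A rough sketch of a cleanup
step you could call from your HeartBeat resource script on the node
taking over:

    #!/usr/bin/env python
    # Rough sketch only: clears stale daemon heartbeats and the
    # /Status/Heartbeat events so the node taking over starts clean.
    # Database, table, and credential names are assumptions based on a
    # default 2.x install; check your own events schema first.
    import MySQLdb

    conn = MySQLdb.connect(host='localhost', user='zenoss',
                           passwd='zenoss', db='events')
    try:
        cur = conn.cursor()
        # Stale heartbeat rows are what get turned into the
        # "heartbeat failure" events you are seeing.
        cur.execute('DELETE FROM heartbeat')
        # Also clear any heartbeat failure events already raised.
        cur.execute("DELETE FROM status WHERE eventClass = '/Status/Heartbeat'")
        conn.commit()
    finally:
        conn.close()

Run it from the resource's start action, just before the Zenoss
daemons come up on the newly active node.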
_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users
