I am attempting to configure device dependencies in ZenOSS 2.2.3 and have run 
into some issues.

I am following the directions from this page:

http://www.zenoss.com/Members/netdata/create-a-device-dependency/

The main question I have is what the return value of device.pingStatus() 
represents.  If a device is up and operational, the pingStatus() should be zero 
according to the logic in the link above.  It is fairly trivial for me to 
construct a situation where this is not the case.

I have two machines configured in ZenOSS and a dependency configured using 
event transformations as the link above suggests.  If machine-B goes down while 
machine-A is down, the alert should be suppressed.

The event transformation code:


Code:
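import logging
log = logging.getLogger("zen.ZenTcpClient")

# Suppress events for machine-B while its parent, machine-A, looks down.
if device.id == 'machine-B':
    parent_host = device.findDevice('machine-A')

    # Guard against the parent lookup returning nothing.
    if parent_host is not None:
        parent_status = parent_host.pingStatus()

        log.info("DEBUG>> child_host = %s : %s" % (device.id, device.pingStatus()))
        log.info("DEBUG>> parent_host = %s : %s" % (parent_host.id, parent_status))

        # A non-zero pingStatus() is treated as "parent is down".
        if parent_status > 0:
            log.info("DEBUG>> setting event state to 2 (INFO)")
            evt.eventState = 2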



The event for machine-B is suppressed if machine-A is down.  The questionable 
behavior happens when machine-A comes back up while machine-B stays down.  
Alerts are never sent for machine-B because the pingStatus() for machine-A 
stays at a non-zero value.

The progression of events:

1.  Take machine-A down, get an alert from ZenOSS.  Machine-A now has a 
non-zero pingStatus().
2.  Take machine-B down, see the alert transform rule suppress the alert due to 
the non-zero pingStatus() of machine-A.
3.  Bring machine-A back up, get a CLEAR alert from ZenOSS.  Machine-A should 
have a pingStatus() of zero.  It doesn't.
4.  No alerts are sent for machine-B, even though it is still down and marked 
as such in ZenOSS.  Machine-A has a non-zero pingStatus() even though ZenOSS 
reports it as being 'up' with all services active.
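
To make the progression above concrete, here is a minimal stand-alone sketch of the same logic.  StubDevice and alert_sent are hypothetical names made up for illustration; they are not part of ZenOSS, and the sketch only mimics the transform rule.  It does not explain why pingStatus() stays non-zero.  The value 4 is the one that shows up in my zenhub.log.

```python
# Stand-alone simulation. StubDevice is a hypothetical stand-in,
# NOT the real ZenOSS Device class.
class StubDevice(object):
    def __init__(self, dev_id):
        self.id = dev_id
        self.status = 0  # value returned by pingStatus(); 0 is assumed to mean "up"

    def pingStatus(self):
        return self.status


def alert_sent(child, parent):
    """Mirror the transform rule: the child's alert goes out only when
    the child is down and the parent's pingStatus() is zero."""
    return child.pingStatus() > 0 and not parent.pingStatus() > 0


machine_a = StubDevice("machine-A")
machine_b = StubDevice("machine-B")

# Step 1: machine-A goes down.
machine_a.status = 4

# Step 2: machine-B goes down; the rule suppresses its alert.
machine_b.status = 4
step2 = alert_sent(machine_b, machine_a)

# Step 3, what I expect: machine-A recovers and its status returns to zero.
machine_a.status = 0
expected = alert_sent(machine_b, machine_a)

# Steps 3 and 4, what I observe: machine-A's status stays at 4.
machine_a.status = 4
observed = alert_sent(machine_b, machine_a)

print(step2, expected, observed)
```

With a parent status that never returns to zero, the last case keeps suppressing the alert for machine-B, which matches what I see.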

After running through the steps above, machine-A is up, machine-B is down.  
This is what my zenhub.log file reads:


Code:

2008-09-23 10:14:44 INFO zen.ZenTcpClient: DEBUG>> child_host = machine-B : 4
2008-09-23 10:14:44 INFO zen.ZenTcpClient: DEBUG>> parent_host = machine-A : 4
2008-09-23 10:14:44 INFO zen.ZenTcpClient: DEBUG>> setting event state to 2 (INFO)




Machine-A still has a pingStatus() of 4.  Why?  The host is reported as being 
up by the ZenOSS web interface.

What does the return value of pingStatus() represent?

If I am supposed to be using this value to determine whether or not a host is 
in an 'up' state, why does my recovered host still have a non-zero pingStatus() 
value?

This behavior means that I will never get alerted for devices that stay down 
after their parent (device they depend on) comes back up.



