I am attempting to configure device dependencies in ZenOSS 2.2.3 and have run into some issues.
I am following the directions from this page: http://www.zenoss.com/Members/netdata/create-a-device-dependency/

The main question I have is what the return value of device.pingStatus() represents. If a device is up and operational, pingStatus() should be zero according to the logic in the link above. It is fairly trivial for me to construct a situation where this is not the case.

I have two machines configured in ZenOSS and a dependency configured using event transformations as the link above suggests. If machine-B goes down while machine-A is down, the alert should be suppressed.

The event transformation code:

Code:
    import logging
    log = logging.getLogger("zen.ZenTcpClient")

    # Only events for the child device (machine-B) are considered.
    if (device.id == 'machine-B'):
        parent_host = device.findDevice('machine-A')
        log.info("DEBUG>> child_host = %s : %s" % (device.id, device.pingStatus()))
        log.info("DEBUG>> parent_host = %s : %s" % (parent_host.id, parent_host.pingStatus()))
        # A non-zero pingStatus() on the parent is treated as "parent is down".
        if (parent_host.pingStatus() > 0):
            log.info("DEBUG>> setting event state to 2 (INFO)")
            evt.eventState = 2

The event for machine-B is suppressed if machine-A is down. The questionable behavior happens when machine-A comes back up while machine-B stays down. Alerts are never sent for machine-B because the pingStatus() for machine-A stays at a non-zero value.

The progression of events:

1. Take machine-A down, get an alert from ZenOSS. Machine-A now has a non-zero pingStatus().
2. Take machine-B down, see the alert transform rule suppress the alert due to the non-zero pingStatus() of machine-A.
3. Bring machine-A back up, get a CLEAR alert from ZenOSS. Machine-A should have a pingStatus() of zero. It doesn't.
4. No alerts are sent for machine-B, even though it is still down and marked as such in ZenOSS. Machine-A has a non-zero pingStatus() even though ZenOSS reports it as being 'up' with all services active.

After running through the steps above, machine-A is up, machine-B is down.
This is what my zenhub.log file reads:

Code:
    2008-09-23 10:14:44 INFO zen.ZenTcpClient: DEBUG>> child_host = machine-B : 4
    2008-09-23 10:14:44 INFO zen.ZenTcpClient: DEBUG>> parent_host = machine-A : 4
    2008-09-23 10:14:44 INFO zen.ZenTcpClient: DEBUG>> setting event state to 2 (INFO)

Machine-A still has a pingStatus() of 4. Why? The host is reported as being up by the ZenOSS web interface.

What does the return value of pingStatus() represent? If I am supposed to be using this value to determine whether or not a host is in an 'up' state, why does my recovered host still have a non-zero pingStatus() value?

This behavior means that I will never get alerted for devices that stay down after their parent (the device they depend on) comes back up.
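To separate the transform logic from whatever pingStatus() is actually returning, here is a minimal stand-alone sketch of the same decision, runnable outside ZenOSS. FakeDevice, FakeEvent, apply_transform and find_device are hypothetical stand-ins I made up for illustration; they are not ZenOSS classes or API, and the status values are hard-coded rather than read from a real system. Only the comparison `pingStatus() > 0` and the assignment `evt.eventState = 2` are taken from the transform in this post.

```python
# Stand-alone sketch of the suppression decision. Nothing here talks to ZenOSS:
# FakeDevice and FakeEvent are hypothetical stubs, not real ZenOSS classes.

class FakeDevice:
    """Stub exposing only the two members the transform uses."""

    def __init__(self, device_id, ping_status=0):
        self.id = device_id
        self._ping_status = ping_status

    def pingStatus(self):
        # Assumption: returns whatever value we seeded the stub with.
        return self._ping_status


class FakeEvent:
    """Stub event carrying only the eventState attribute."""

    def __init__(self, event_state=0):
        self.eventState = event_state


def apply_transform(evt, device, find_device):
    """Same decision as the transform: suppress machine-B if machine-A looks down."""
    if device.id == 'machine-B':
        parent_host = find_device('machine-A')
        if parent_host.pingStatus() > 0:
            evt.eventState = 2
    return evt


child = FakeDevice('machine-B', ping_status=4)

# Case 1: the parent reports 4, as in the log excerpt. The event is suppressed.
stale_parent = FakeDevice('machine-A', ping_status=4)
suppressed = apply_transform(FakeEvent(), child, lambda name: stale_parent)

# Case 2: the parent reports 0, which is what I expected after it recovered.
healthy_parent = FakeDevice('machine-A', ping_status=0)
alerted = apply_transform(FakeEvent(), child, lambda name: healthy_parent)

print("parent status 4 -> eventState", suppressed.eventState)
print("parent status 0 -> eventState", alerted.eventState)
```

With a parent status of 0 the event state is left untouched, so the transform's own logic behaves as intended. The open question remains why the real pingStatus() for machine-A still returns 4 after the CLEAR.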
