From: Yujun Zhang <[email protected]>
Date: Sunday, 15 January 2017 at 17:53


About faults and alarms: what I had in mind was the causal/deduction chain in 
root cause analysis.

A fault state means the resource is not fully functional, and it is evaluated 
by related indicators. There are alarms on events like power loss, or on 
measurands like high CPU, low memory and high temperature. There are also 
alarms based on deduced state, such as "host fault" and "instance fault".

So example chains would be:
·         "FAULT: power line cut off" =(monitor)=> "ALARM: host power loss" 
=(inspect)=> "FAULT: host is unavailable" =(action)=> "ALARM: host fault"
·         "FAULT: power line cut off" =(monitor)=> "ALARM: host power loss" 
=(inspect)=> "FAULT: host is unavailable" =(inspect)=> "FAULT: instance is 
unavailable" =(action)=> "ALARM: instance fault"
If we omit the resource fault states, we get the causal chains as they are 
modeled in Vitrage:
·         "ALARM: host power loss" =(causes)=> "ALARM: host fault"
·         "ALARM: host power loss" =(causes)=> "ALARM: instance fault"
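To make the "omit the resource" step concrete, here is a minimal sketch, 
assuming a hypothetical encoding where each chain is a list of (kind, label) 
nodes and kind is either "FAULT" or "ALARM". It is not Vitrage's data model; 
it only shows that dropping the FAULT nodes and linking consecutive alarms 
reproduces the two "causes" edges above.

```python
from typing import List, Tuple

# Hypothetical encoding (not Vitrage's model): a chain is a list of
# (kind, label) nodes, where kind is "FAULT" or "ALARM". The relations
# between nodes (monitor, inspect, action) are not needed for this step.
Node = Tuple[str, str]

CHAINS: List[List[Node]] = [
    [("FAULT", "power line cut off"), ("ALARM", "host power loss"),
     ("FAULT", "host is unavailable"), ("ALARM", "host fault")],
    [("FAULT", "power line cut off"), ("ALARM", "host power loss"),
     ("FAULT", "host is unavailable"), ("FAULT", "instance is unavailable"),
     ("ALARM", "instance fault")],
]


def alarm_causes(chain: List[Node]) -> List[Tuple[str, str]]:
    """Drop FAULT nodes and link each remaining ALARM to the next one."""
    alarms = [label for kind, label in chain if kind == "ALARM"]
    return list(zip(alarms, alarms[1:]))


if __name__ == "__main__":
    for chain in CHAINS:
        for src, dst in alarm_causes(chain):
            print(f'"ALARM: {src}" =(causes)=> "ALARM: {dst}"')
```

Note that the projection loses exactly the information discussed next: the 
upstream fault that started each chain.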
But what the user cares about might be that "FAULT: power line cut off" causes 
all these alarms. What I haven't made clear yet is the equivalence between 
fault and alarm.
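One way to surface what the user cares about is to keep the fault nodes and 
group every alarm under the most upstream fault of its chain. This is a rough 
sketch using the same hypothetical (kind, label) encoding as above; it assumes 
each chain has a single upstream cause, which is exactly the simplification 
that breaks down with multiple upstream causes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical encoding: (kind, label) nodes, kind is "FAULT" or "ALARM".
Node = Tuple[str, str]

CHAINS: List[List[Node]] = [
    [("FAULT", "power line cut off"), ("ALARM", "host power loss"),
     ("FAULT", "host is unavailable"), ("ALARM", "host fault")],
    [("FAULT", "power line cut off"), ("ALARM", "host power loss"),
     ("FAULT", "host is unavailable"), ("FAULT", "instance is unavailable"),
     ("ALARM", "instance fault")],
]


def alarms_by_root_fault(chains: List[List[Node]]) -> Dict[str, List[str]]:
    """Map the first (most upstream) FAULT of each chain to its alarms.

    Assumes one upstream cause per chain; alarms shared by several chains
    are listed once, in first-seen order.
    """
    result: Dict[str, List[str]] = defaultdict(list)
    for chain in chains:
        faults = [label for kind, label in chain if kind == "FAULT"]
        if not faults:
            continue  # no fault recorded, nothing to attribute
        root = faults[0]
        for kind, label in chain:
            if kind == "ALARM" and label not in result[root]:
                result[root].append(label)
    return dict(result)


if __name__ == "__main__":
    for root, alarms in alarms_by_root_fault(CHAINS).items():
        print(f'"FAULT: {root}" explains: {alarms}')
```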

I may have made it more complex with my immature thoughts. It could get even 
more complex if we consider multiple upstream causes and downstream outcomes. 
It may be an interesting topic to discuss in a design session.


[Ifat] I agree. Let’s discuss this in our next design session.


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
