Hi Yinliyin,
There are two use cases:
One is yours, where you have a single monitor that generates “real” alarms, and
Vitrage that generates deduced alarms.
Another is where someone has a few monitors, and there might be a
collision/equivalence between their alarms.
The solution that you suggested might solve the first use case, but I wouldn’t
want to ignore the second one, which is also valid.
Regarding some of your specific suggestions:
1. In templates, we only define the alarm entity for the datasource that
the alarm is reported by, such as Nagios.
[Ifat] This will only work for a single monitor.
2. When the evaluator deduces an alarm, it would raise the alarm with the
type set to the datasource that would report the alarm, not to vitrage.
[Ifat] I don’t think this is right. In the Vitrage Alarm view in the UI, displaying
the deduced alarm as “Nagios” is misleading, since Nagios did not report this
alarm.
I can think of a solution that is specific to the deduced alarms case, where we
will replace a Vitrage alarm with a “real” alarm whenever there is a collision.
This solution is easier, but we should carefully examine all use cases to make
sure there is no ambiguity. However, for the more general use case I would
prefer the option that we discussed in a previous mail, of having two (or more)
alarms connected with an ‘equivalent’ relationship.
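
To make the ‘equivalent’ option concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: the `EquivalenceGraph` class, its method names, and the rule that two alarms are equivalent when they share a name and a resource are hypothetical, and none of this is Vitrage's actual graph API.

```python
from collections import defaultdict


class EquivalenceGraph:
    """Toy graph: every alarm stays a separate vertex (hypothetical API)."""

    def __init__(self):
        self.alarms = {}                    # alarm_id -> properties
        self.equivalent = defaultdict(set)  # alarm_id -> ids of equivalent alarms

    def add_alarm(self, alarm_id, source, name, resource, severity):
        self.alarms[alarm_id] = {
            "source": source,      # e.g. "vitrage", "nagios"
            "name": name,
            "resource": resource,
            "severity": severity,  # each alarm keeps its own severity
        }
        # Assumption: same name on the same resource means "equivalent".
        for other_id, other in self.alarms.items():
            if other_id == alarm_id:
                continue
            if other["name"] == name and other["resource"] == resource:
                self.equivalent[alarm_id].add(other_id)
                self.equivalent[other_id].add(alarm_id)

    def remove_alarm(self, alarm_id):
        # Deleting one alarm leaves its equivalents untouched.
        self.alarms.pop(alarm_id, None)
        for peer in self.equivalent.pop(alarm_id, set()):
            self.equivalent[peer].discard(alarm_id)


graph = EquivalenceGraph()
graph.add_alarm("a1", "vitrage", "CPU load", "host-1", "WARNING")
graph.add_alarm("a2", "nagios", "CPU load", "host-1", "CRITICAL")
graph.remove_alarm("a1")  # the Nagios alarm "a2" is still in the graph
```

In this sketch each alarm keeps its own source and severity and is deleted independently, so no alarm has to be overwritten when a collision is detected.
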
What do you think?
Ifat.
From: "[email protected]" <[email protected]>
Date: Saturday, 14 January 2017 at 09:57
· It won’t solve the general problem of two different monitors that
raise the same alarm
· [yinliyin] Generally, we would deploy only one monitor for any given
alarm.
· It won’t solve possible conflicts of timestamp and severity between
different monitors
· [yinliyin] Please see the proposal below.
· It will make the decision of when to delete the alarm more complex
(delete it when the deduced alarm is deleted? When the Nagios alarm is deleted?
Both? And how to change the timestamp and severity in these cases?)
· [yinliyin] Please see the proposal below.
The following is the basic idea of solving the problem in this situation:
1. In templates, we only define the alarm entity for the datasource
that the alarm is reported by, such as Nagios.
2. When the evaluator deduces an alarm, it would raise the alarm with the
type set to the datasource that would report the alarm, not to vitrage.
3. When entity_graph gets events from the "evaluator_queue" (all the
alarms in the "evaluator_queue" are deduced alarms), it queries the graph to
find out whether the same alarm has already been reported by the datasource.
If so, it discards the deduced alarm.
4. When entity_graph gets events from the "queue", it queries the graph
to find out whether the same alarm has already been deduced by the evaluator.
If so, it replaces the alarm in the graph with the newly arrived alarm
reported by the datasource.
5. When the evaluator deduces that an alarm should be deleted, it deletes
the alarm regardless of how the alarm was generated (reported by the datasource
or deduced by the evaluator).
6. When the datasource reports a recovery event for an alarm, entity_graph
queries the graph to find out whether the alarm exists. If it does not,
entity_graph discards the event.
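
Steps 3 to 6 above can be sketched roughly as follows. This is only an illustration of the proposed rules: the `Alarm` and `AlarmGraph` classes, the `origin` field, and the returned status strings are hypothetical names and are not Vitrage's actual entity graph API.

```python
from dataclasses import dataclass

# Hypothetical markers for where an event came from.
DEDUCED = "deduced"    # event arrived on the "evaluator_queue"
REPORTED = "reported"  # event arrived on the datasource "queue"


@dataclass
class Alarm:
    name: str        # e.g. "CPU load"
    resource: str    # id of the resource the alarm is raised on
    datasource: str  # e.g. "nagios" (step 2: never "vitrage")
    origin: str      # DEDUCED or REPORTED
    severity: str = "WARNING"


class AlarmGraph:
    """Toy stand-in for the entity graph, keyed by alarm identity."""

    def __init__(self):
        self._alarms = {}

    @staticmethod
    def _key(alarm):
        return (alarm.name, alarm.resource, alarm.datasource)

    def get(self, alarm):
        return self._alarms.get(self._key(alarm))

    def on_raise(self, alarm):
        existing = self.get(alarm)
        if alarm.origin == DEDUCED:
            # Step 3: drop a deduced alarm if the datasource already reported it.
            if existing is not None and existing.origin == REPORTED:
                return "discarded"
            self._alarms[self._key(alarm)] = alarm
            return "added" if existing is None else "updated"
        # Step 4: a reported alarm replaces a previously deduced one.
        self._alarms[self._key(alarm)] = alarm
        if existing is None:
            return "added"
        return "replaced" if existing.origin == DEDUCED else "updated"

    def on_clear(self, alarm):
        # Step 5: delete regardless of how the stored alarm was generated.
        # Step 6: discard the event if the alarm is not in the graph.
        if self._alarms.pop(self._key(alarm), None) is None:
            return "discarded"
        return "deleted"


graph = AlarmGraph()
deduced = Alarm("CPU load", "host-1", "nagios", DEDUCED)
reported = Alarm("CPU load", "host-1", "nagios", REPORTED, "CRITICAL")
graph.on_raise(deduced)   # stored as a deduced alarm
graph.on_raise(reported)  # replaces the deduced alarm (step 4)
graph.on_clear(deduced)   # removes it whatever its origin (step 5)
graph.on_clear(reported)  # nothing left, so the event is discarded (step 6)
```
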
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)