I have just realized abstract alarm is not a good term. What I was talking about is *fault* and *alarm*.
Fault is what actually happens, and alarm is how it is detected (or deduced).

On Wed, Jan 11, 2017 at 5:13 PM Yujun Zhang <[email protected]> wrote:

> Yes, if we consider the Vitrage scenario evaluator as a pseudo monitor.
>
> I think YinLiYin's idea is a reasonable requirement from the end user. They
> care more about the *real faults* in the system, not how they are
> detected. Though it will bring many design and engineering challenges, it
> creates value for customers. I'm quite positive about this evolution.
>
> One possible solution would be to introduce a high-level (abstract)
> template from the user's view, then convert it to Vitrage scenario templates
> (or directly to the graph). The *more sources* (Nagios, Vitrage deduction) for
> an abstract alarm we get from the system, the *more confidence* we get for a
> real fault. And the confidence of an alarm could be included in the
> scenario condition.
>
> On Wed, Jan 11, 2017 at 4:08 PM Afek, Ifat (Nokia - IL) <
> [email protected]> wrote:
>
> You are right. But as I see it, the case of a Vitrage suspect vs. the real
> Nagios alarm is just one example of the more general case of two monitors
> reporting the same alarm.
>
> Don’t you think so?
>
> *From:* Yujun Zhang <[email protected]>
> *Reply-To:* "OpenStack Development Mailing List (not for usage
> questions)" <[email protected]>
> *Date:* Wednesday, 11 January 2017 at 09:46
> *To:* "OpenStack Development Mailing List (not for usage questions)" <
> [email protected]>, "[email protected]" <[email protected]>
> *Cc:* "[email protected]" <[email protected]>, "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>, "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>
> *Subject:* Re: [openstack-dev] [Vitrage] About alarms reported by
> datasource and the alarms generated by vitrage evaluator
>
> Hi, Ifat
>
> If I understand it correctly, your concerns are mainly about the same alarm
> coming from different monitors, not the "suspect" status discussed in
> another thread.
>
> On Tue, Jan 10, 2017 at 10:21 PM Afek, Ifat (Nokia - IL) <
> [email protected]> wrote:
>
> Hi Yinliyin,
>
> At first I thought that changing "deduced" to be a property of the alarm
> might help in solving your use case. But now I think most of the problems
> will remain the same:
>
> · It won’t solve the general problem of two different monitors that
>   raise the same alarm
> · It won’t solve possible conflicts of timestamp and severity between
>   different monitors
> · It will make the decision of when to delete the alarm more complex
>   (delete it when the deduced alarm is deleted? When the Nagios alarm is
>   deleted? Both? And how to change the timestamp and severity in these
>   cases?)
>
> So I don’t think that making this change is beneficial.
>
> What do you think?
>
> Best Regards,
> Ifat.
>
> *From:* "[email protected]" <[email protected]>
> *Date:* Monday, 9 January 2017 at 05:29
> *To:* "Afek, Ifat (Nokia - IL)" <[email protected]>
> *Cc:* "[email protected]" <[email protected]>, "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>, "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
> *Subject:* Re: [openstack-dev] [Vitrage] About alarms reported by
> datasource and the alarms generated by vitrage evaluator
>
> Hi Ifat,
>
> I think there is a situation in which all the alarms are reported by
> the monitored system. We use Vitrage to:
>
> 1. Find the relationships between the alarms, and find the root cause.
>
> 2. Deduce an alarm before it really occurs. This comprises two aspects:
>
>    1) A causes B: when A occurs, we deduce that B will occur
>
>    2) B is caused by A: when B occurs, we deduce that A must have occurred
>
> In "2", we do expect Vitrage to raise the alarm before the alarm is
> reported, because the alarm might be lost or delayed for some reason. So we
> would write "raise alarm" actions in the scenarios of the template. I think
> that whether the alarm is reported or deduced should be a state property of
> the alarm. The reported vertex and the deduced vertex of the same alarm
> should be merged into one vertex.
>
> Best Regards,
> Yinliyin.
>
> Original Mail
>
> *From:* <[email protected]>;
> *To:* <[email protected]>;
> *Cc:* 韩静00006838; 王维雅00042110; 章宇军10200531; 贾培源10101785; 龚亚辉6092001895;
> *Date:* 2017-01-07 02:18
> *Subject:* Re: [openstack-dev] [Vitrage] About alarms reported by
> datasource and the alarms generated by vitrage evaluator
>
> Hi YinLiYin,
>
> This is an interesting question. Let me divide my answer into two parts.
>
> First, the case that you described with Nagios and Vitrage.
> This problem depends on the specific Nagios tests that you configure in
> your system, as well as on the Vitrage templates that you use. For example,
> you can use Nagios/Zabbix to monitor the physical layer, and Vitrage to
> raise deduced alarms on the virtual and application layers. This way you
> will never have duplicated alarms. If you want to use Nagios to monitor the
> other layers as well, you can simply modify the Vitrage templates so they
> don’t raise the deduced alarms that Nagios may generate, and use the
> templates to show RCA between different Nagios alarms.
>
> Now let’s talk about the more general case. Vitrage can receive alarms
> from different monitors, including Nagios, Zabbix, collectd and Aodh. If
> you are using more than one monitor, it is possible that the same alarm
> (maybe with a different name) will be raised twice. We need to create a
> mechanism to identify such cases and create a single alarm with the
> properties of both monitors. This has not been designed in detail yet, so
> if you have any suggestions we will be happy to hear them.
>
> Best Regards,
> Ifat.
>
> *From:* "[email protected]" <[email protected]>
> *Reply-To:* "OpenStack Development Mailing List (not for usage
> questions)" <[email protected]>
> *Date:* Friday, 6 January 2017 at 03:27
> *To:* "[email protected]" <[email protected]>
> *Cc:* "[email protected]" <[email protected]>, "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>, "[email protected]" <[email protected]>,
> "[email protected]" <[email protected]>
> *Subject:* [openstack-dev] [Vitrage] About alarms reported by datasource
> and the alarms generated by vitrage evaluator
>
> Hi all,
>
> Vitrage generates alarms according to the templates. All the alarms
> raised by Vitrage have the type "vitrage". Suppose Nagios has an alarm A.
> Alarm A is raised by the Vitrage evaluator according to the action part of a
> scenario, so the type of alarm A is "vitrage".
> If Nagios reports alarm A later, a new alarm A with type "Nagios" will be
> generated in the entity graph. There would be two vertices for the same
> alarm in the graph. And we have to define two alarm entities, two
> relationships, and two scenarios in the template file to make the alarm
> propagation procedure work.
>
> It is inconvenient to describe the fault model of a system with a lot of
> alarms. How can we solve this problem?
>
> 殷力殷 YinLiYin
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
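
As a rough illustration of the fault/alarm split proposed at the top of this thread, here is a minimal Python sketch. It is not Vitrage code and uses no Vitrage API. The names `Alarm`, `Fault`, `FaultTable` and `fault_key` are hypothetical, and so are the merge policies: highest severity wins, earliest timestamp is kept, the fault is deleted only when every source has cleared, and confidence is simply the fraction of known sources currently reporting. Those policies are assumptions made for the example, not decisions taken in the discussion.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alarm:
    """One detection of a fault, as reported by a single source."""
    fault_key: str    # hypothetical identifier shared by all reports of one fault
    resource: str     # entity the fault is on, e.g. a host name
    source: str       # e.g. "nagios", "zabbix", or "vitrage" for a deduced alarm
    severity: int     # assumed scale: larger means more severe
    timestamp: float  # seconds since the epoch


@dataclass
class Fault:
    """What actually happened, backed by alarms from one or more sources."""
    fault_key: str
    resource: str
    alarms: dict = field(default_factory=dict)  # source name -> latest Alarm

    @property
    def severity(self):
        # Assumed policy: the most severe report wins.
        return max(a.severity for a in self.alarms.values())

    @property
    def first_seen(self):
        # Assumed policy: keep the earliest timestamp across sources.
        return min(a.timestamp for a in self.alarms.values())

    def confidence(self, known_sources):
        # More sources reporting the same fault means more confidence.
        return len(self.alarms) / known_sources


class FaultTable:
    """Keeps a single Fault per (fault_key, resource) pair."""

    def __init__(self):
        self._faults = {}

    def on_alarm(self, alarm):
        key = (alarm.fault_key, alarm.resource)
        fault = self._faults.setdefault(key, Fault(alarm.fault_key, alarm.resource))
        fault.alarms[alarm.source] = alarm
        return fault

    def on_clear(self, fault_key, resource, source):
        key = (fault_key, resource)
        fault = self._faults.get(key)
        if fault is None:
            return None
        fault.alarms.pop(source, None)
        if not fault.alarms:
            # Assumed policy: delete the fault once no source reports it.
            del self._faults[key]
            return None
        return fault


table = FaultTable()
deduced = table.on_alarm(Alarm("host_down", "host-1", "vitrage", 2, 100.0))
merged = table.on_alarm(Alarm("host_down", "host-1", "nagios", 3, 130.0))
print(merged is deduced, merged.severity, merged.first_seen, merged.confidence(2))
```

With this shape, the deduced alarm and the later Nagios alarm land on the same `Fault` object instead of two vertices, and a scenario condition could test `confidence` rather than the alarm type.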
