My instinct is to prefer 2.b. I am not sure whether it could be linked to the vitrage_id[1] evolution. If a UUID is created for each alarm, the implementation could be quite straightforward.
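
To make the idea concrete, here is a rough sketch of option 2.b in Python. It is not Vitrage code: the `Alarm` and `EntityGraph` classes, the `alarm_id` field and the `link_equivalent` helper are hypothetical names made up for illustration. The only assumption taken from the discussion is that each alarm vertex gets its own UUID, so the 'suspect' alarm of type 'vitrage' and the 'real' alarm of type 'nagios' can coexist and be joined by an 'equivalent' relationship.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Alarm:
    """Hypothetical alarm vertex; not Vitrage's actual data model."""
    name: str
    alarm_type: str            # e.g. "vitrage" (deduced) or "nagios" (reported)
    resource: str
    properties: dict = field(default_factory=dict)
    # Assumption: every alarm gets its own UUID, independent of its type.
    alarm_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class EntityGraph:
    """Toy graph: vertices keyed by UUID, edges as (source, label, target)."""

    def __init__(self):
        self.vertices = {}
        self.edges = set()

    def add_alarm(self, alarm):
        self.vertices[alarm.alarm_id] = alarm
        return alarm.alarm_id

    def link_equivalent(self, id_a, id_b):
        # Store the pair in sorted order so the edge is effectively undirected.
        first, second = sorted((id_a, id_b))
        self.edges.add((first, "equivalent", second))

    def equivalents_of(self, alarm_id):
        result = []
        for source, label, target in self.edges:
            if label != "equivalent":
                continue
            if source == alarm_id:
                result.append(self.vertices[target])
            elif target == alarm_id:
                result.append(self.vertices[source])
        return result


graph = EntityGraph()
suspect = Alarm("host_alarm", "vitrage", "host-1", {"suspect": True})
real = Alarm("host_alarm", "nagios", "host-1")
s_id = graph.add_alarm(suspect)
r_id = graph.add_alarm(real)

# Both vertices stay in the graph; the edge records that they are one fault.
graph.link_equivalent(s_id, r_id)
```

With distinct UUIDs neither alarm has to be deleted or merged, and a UI could later group the two vertices by following the 'equivalent' edge.
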
[1]: https://blueprints.launchpad.net/vitrage/+spec/standard-vitrage-id

On Tue, Jan 10, 2017 at 1:55 AM Afek, Ifat (Nokia - IL) <ifat.a...@nokia.com> wrote:

> Hi Yujun,
>
> I understand the use case now, thanks for the detailed explanation.
>
> Supporting this use case will require some development in Vitrage. Let me
> try to list the requirements and options that we have.
>
> 1. Requirement: Raise ‘suspect’ deduced alarms in Vitrage.
>
>    Implementation: Quite straightforward. There is no way to set a ‘suspect’
>    property in Vitrage right now, but it should be easy to add this option.
>
> 2. Requirement: Change a ‘suspect’ alarm of type ‘vitrage’ to a ‘real’
>    alarm of type ‘nagios’.
>
>    Implementation: There are a few alternatives for achieving this goal:
>
>    a. Delete the ‘suspect’ alarm and create the ‘real’ alarm. This will
>       require supporting a ‘not’ condition in the templates. An example
>       scenario:
>
>       condition: vm_alarm and not nagios_alarm:
>       (action: create vitrage alarm)
>
>       condition: nagios_alarm and vitrage_alarm:
>       (action: delete vitrage_alarm)
>
>    b. Have both the ‘suspect’ alarm and the ‘real’ alarm, and create an
>       ‘equivalent’ relationship between them. Configuring the template
>       should be easy, however it won’t look nice in the UI. In past
>       discussions we mentioned an option to group some vertices together in
>       the UI. If we have this option, we might want to group these two
>       alarms together.
>
>    c. Merge the two alarms. This solution seems the most reasonable one at
>       first, but it is not trivial. For example: suppose one alarm is
>       defined as ‘critical’ and was raised at 10:01, and the other alarm was
>       defined as ‘warning’ and was raised at 10:02. How will you combine the
>       two? And what if the ‘critical’ alarm then goes down, will you know
>       how to change the severity back to ‘warning’? In the case of vitrage
>       vs. nagios we would prefer nagios; but let’s think of the more general
>       case of two different monitors.
>
> 3. In one of your emails you mentioned an option of having two ‘suspects’.
>    Suppose vm_alarm is raised, will you raise two suspect vitrage alarms,
>    e.g. host_alarm and switch_alarm? And if you then receive host_alarm
>    from nagios, would you like to delete the suspect switch_alarm, or keep
>    it? If you would like to delete it, it will require supporting ‘not’ in
>    the template condition.
>
> Personally I would go for option 2b, but I will be happy to hear your
> thoughts about it.
>
> Hope I helped, but I suspect I just made things more complicated ;-)
>
> Ifat.
>
>
> *From: *Yujun Zhang <zhangyujun+...@gmail.com>
> *Reply-To: *"OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev@lists.openstack.org>
> *Date: *Sunday, 8 January 2017 at 17:38
> *To: *"OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev@lists.openstack.org>
> *Cc: *"han.jin...@zte.com.cn" <han.jin...@zte.com.cn>,
> "wang.we...@zte.com.cn" <wang.we...@zte.com.cn>,
> "gong.yah...@zte.com.cn" <gong.yah...@zte.com.cn>,
> "jia.peiy...@zte.com.cn" <jia.peiy...@zte.com.cn>,
> "zhang.yuj...@zte.com.cn" <zhang.yuj...@zte.com.cn>
> *Subject: *Re: [openstack-dev] [Vitrage] About alarms reported by
> datasource and the alarms generated by vitrage evaluator
>
> Maybe I have missed something in the scenario template, but it seems you
> have understood my idea quite correctly :-)
>
> See further explanation inline
>
> On Sun, Jan 8, 2017 at 3:06 PM Afek, Ifat (Nokia - IL)
> <ifat.a...@nokia.com> wrote:
>
> Hi Yujun,
>
> Thanks for the explanation, but I still don’t fully understand.
>
> Let me start with the current state:
>
> 1. Introduce a flexible `metadata` dict into the ALARM entity
>
> [Ifat] Already exists. An alarm is represented as a vertex in the entity
> graph, with a dictionary of properties.
>
> [yujunz] Can the alarm vertex be updated by a scenario action? E.g. raise
> an alarm and set the property `suspect` to true.
>
> 2. Allow generating update event[1] on metadata change
>
> 3. Allow using ALARM metadata in scenario condition
>
> [Ifat] Already exists. You can define properties in the ‘entities’ section
> in Vitrage templates
>
> [yujunz] How do I specify the condition that one specified alarm is
> 'suspicious', e.g. condition: host_alarm.suspect ?
>
> 4. Allow setting ALARM metadata in scenario action
>
> If I understand correctly, you are suggesting that one scenario will add
> metadata to an existing alarm, which will trigger an event, and as a result
> another scenario might be executed?
>
> [yujunz] Exactly
>
> Can you describe a use case where this behavior will help in calculating
> the root cause?
>
> [yujunz] Here's the simplified case derived from YinLiYin's example.
> Suppose we add a causal relationship from `host_alarm` to `instance_alarm`,
> i.e. a host alarm will cause an instance alarm. If an instance alarm is
> detected (but no host alarm), it is "suspicious": it may have been caused
> by a host alarm. The reason could be a delayed or lost event. Instead of
> waiting for the snapshot service to update the host status, we want to run
> a diagnostic action to check it proactively.
>
> In this case, we want to set the upstream (host) of a confirmed alarm
> (instance) to "suspect" and trigger a diagnostic action on this change.
>
> Hope that I have made the use case clear.
>
> Thanks,
>
> Ifat.
>
>
> *From: *Yujun Zhang <zhangyujun+...@gmail.com>
> *Reply-To: *"OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev@lists.openstack.org>
> *Date: *Saturday, 7 January 2017 at 09:27
> *To: *"OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev@lists.openstack.org>
> *Cc: *"han.jin...@zte.com.cn" <han.jin...@zte.com.cn>,
> "wang.we...@zte.com.cn" <wang.we...@zte.com.cn>,
> "gong.yah...@zte.com.cn" <gong.yah...@zte.com.cn>,
> "jia.peiy...@zte.com.cn" <jia.peiy...@zte.com.cn>,
> "zhang.yuj...@zte.com.cn" <zhang.yuj...@zte.com.cn>
> *Subject: *Re: [openstack-dev] [Vitrage] About alarms reported by
> datasource and the alarms generated by vitrage evaluator
>
> The two questions raised by YinLiYin are actually one, i.e. *how to enrich
> the alarm properties* that can be used as a condition in root cause
> deduction.
>
> Both 'suspect' and 'datasource' are additional information that may be
> referred to as a condition in a general fault model, a.k.a. a scenario in
> vitrage.
>
> It seems it could be done by
>
> 1. Introduce a flexible `metadata` dict into the ALARM entity
> 2. Allow generating update event[1] on metadata change
> 3. Allow using ALARM metadata in scenario condition
> 4. Allow setting ALARM metadata in scenario action
>
> This will leave flexibility for continued development through complex
> scenario templates while keeping the vitrage evaluator simple and generic.
>
> My two cents.
>
> [1]:
> http://docs.openstack.org/developer/vitrage/scenario-evaluator.html#concepts-and-guidelines
>
>
> On Sat, Jan 7, 2017 at 2:23 AM Afek, Ifat (Nokia - IL)
> <ifat.a...@nokia.com> wrote:
>
> Hi YinLiYin,
>
> This is an interesting question. Let me divide my answer into two parts.
>
> First, the case that you described with Nagios and Vitrage.
> This problem depends on the specific Nagios tests that you configure in
> your system, as well as on the Vitrage templates that you use. For example,
> you can use Nagios/Zabbix to monitor the physical layer, and Vitrage to
> raise deduced alarms on the virtual and application layers. This way you
> will never have duplicated alarms. If you want to use Nagios to monitor the
> other layers as well, you can simply modify the Vitrage templates so they
> don’t raise the deduced alarms that Nagios may generate, and use the
> templates to show RCA between different Nagios alarms.
>
> Now let’s talk about the more general case. Vitrage can receive alarms
> from different monitors, including Nagios, Zabbix, collectd and Aodh. If
> you are using more than one monitor, it is possible that the same alarm
> (maybe with a different name) will be raised twice. We need to create a
> mechanism to identify such cases and create a single alarm with the
> properties of both monitors. This has not been designed in detail yet, so
> if you have any suggestions we will be happy to hear them.
>
> Best Regards,
>
> Ifat.
>
>
> *From: *"yinli...@zte.com.cn" <yinli...@zte.com.cn>
> *Reply-To: *"OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev@lists.openstack.org>
> *Date: *Friday, 6 January 2017 at 03:27
> *To: *"openstack-dev@lists.openstack.org"
> <openstack-dev@lists.openstack.org>
> *Cc: *"gong.yah...@zte.com.cn" <gong.yah...@zte.com.cn>,
> "han.jin...@zte.com.cn" <han.jin...@zte.com.cn>,
> "wang.we...@zte.com.cn" <wang.we...@zte.com.cn>,
> "jia.peiy...@zte.com.cn" <jia.peiy...@zte.com.cn>,
> "zhang.yuj...@zte.com.cn" <zhang.yuj...@zte.com.cn>
> *Subject: *[openstack-dev] [Vitrage] About alarms reported by datasource
> and the alarms generated by vitrage evaluator
>
> Hi all,
>
> Vitrage generates alarms according to the templates. All the alarms
> raised by vitrage have the type "vitrage". Suppose Nagios has an alarm A.
> If alarm A is raised by the vitrage evaluator according to the action part
> of a scenario, the type of alarm A is "vitrage". If Nagios reports alarm A
> later, a new alarm A with type "Nagios" will be generated in the entity
> graph. There will then be two vertices for the same alarm in the graph, and
> we have to define two alarm entities, two relationships and two scenarios
> in the template file to make the alarm propagation procedure work.
>
> This makes it inconvenient to describe the fault model of a system with a
> lot of alarms. How can we solve this problem?
>
>
> 殷力殷 YinLiYin
>
> D502, ZTE Corporation R&D Center, 889# Bibo Road,
> Zhangjiang Hi-tech Park, Shanghai, P.R.China, 201203
> T: +86 21 68896229
> M: +86 13641895907
> E: yinli...@zte.com.cn
> www.zte.com.cn
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev