Re: [openstack-dev] [Vitrage] About alarms reported by datasource and the alarms generated by vitrage evaluator

2017-01-18 Thread Afek, Ifat (Nokia - IL)
From: Yujun Zhang Date: Tuesday, 17 January 2017 at 02:41 Sounds good. Have you created an etherpad page for collecting topics, Ifat? Here: https://etherpad.openstack.org/p/vitrage-pike-design-sessions

2017-01-16 Thread Yujun Zhang
Sounds good. Have you created an etherpad page for collecting topics, Ifat? On Mon, Jan 16, 2017 at 10:43 PM Afek, Ifat (Nokia - IL) <ifat.a...@nokia.com> wrote: > From: Yujun Zhang > Date: Sunday, 15 January 2017 at 17:53 > About fault and alarm, what

2017-01-16 Thread Afek, Ifat (Nokia - IL)
From: Yujun Zhang Date: Sunday, 15 January 2017 at 17:53 About fault and alarm, what I was thinking about is the causal/deducing chain in root cause analysis. Fault state means the resource is not fully functional, and it is evaluated by related indicators. There are

2017-01-15 Thread Yujun Zhang
About fault and alarm, what I was thinking about is the causal/deducing chain in root cause analysis. Fault state means the resource is not fully functional, and it is evaluated by related indicators. There are alarms on events like power loss or measurands like CPU high, memory low, temperature

2017-01-15 Thread Afek, Ifat (Nokia - IL)
From: Yujun Zhang Date: Thursday, 12 January 2017 at 17:37 On Thu, Jan 12, 2017 at 5:12 PM Afek, Ifat (Nokia - IL) wrote: 'deduced' vs 'monitored' would be good enough for most cases. Unless we have identified some real

2017-01-12 Thread Yujun Zhang
Hi, Ifat. Your comments are quite right. See my additional explanation inline. On Thu, Jan 12, 2017 at 5:12 PM Afek, Ifat (Nokia - IL) wrote: > One possible solution would be introducing a high-level (abstract) template from the user's view, then converting it to Vitrage

2017-01-12 Thread Afek, Ifat (Nokia - IL)
Hi Yujun, See my comments inline. Ifat. From: Yujun Zhang Date: Wednesday, 11 January 2017 at 12:12 I have just realized abstract alarm is not a good term. What I was talking about is fault and alarm. Fault is what actually happens, and alarm is how it is detected

2017-01-11 Thread Yujun Zhang
I have just realized abstract alarm is not a good term. What I was talking about is *fault* and *alarm*. Fault is what actually happens, and alarm is how it is detected (or deduced). On Wed, Jan 11, 2017 at 5:13 PM Yujun Zhang wrote: > Yes, if we consider the Vitrage
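To make the fault/alarm distinction concrete, here is a minimal sketch: one fault (what actually happens) observed through several alarms (how it is detected or deduced). The class names `Fault` and `Alarm`, the `origin` field, and the `is_confirmed` rule are hypothetical illustrations, not part of Vitrage; the 'monitored'/'deduced' values borrow the terms used elsewhere in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Alarm:
    """How a fault is observed: reported by a monitor or deduced by a rule."""
    name: str
    origin: str  # hypothetical field: "monitored" or "deduced"
    source: str  # e.g. "nagios" or "vitrage"


@dataclass
class Fault:
    """What actually happens on a resource, independent of how it is detected."""
    resource_id: str
    description: str
    alarms: List[Alarm] = field(default_factory=list)

    def is_confirmed(self) -> bool:
        # Hypothetical rule: confirmed once at least one monitor has reported it.
        return any(a.origin == "monitored" for a in self.alarms)


fault = Fault("host-1", "host unreachable")
fault.alarms.append(Alarm("host_down", "deduced", "vitrage"))
print(fault.is_confirmed())  # False
fault.alarms.append(Alarm("host_down", "monitored", "nagios"))
print(fault.is_confirmed())  # True
```

The point of the split is that an end user can ask about the fault while root cause analysis still reasons over the individual alarms.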

2017-01-11 Thread Yujun Zhang
Yes, if we consider the Vitrage scenario evaluator as a pseudo monitor. I think YinLiYin's idea is a reasonable requirement from end users. They care more about the *real faults* in the system, not how they are detected. Though it will pose significant challenges to design and engineering, it creates

2017-01-11 Thread Afek, Ifat (Nokia - IL)
You are right. But as I see it, the case of Vitrage suspect vs. the real Nagios alarm is just one example of the more general case of two monitors reporting the same alarm. Don’t you think so? From: Yujun Zhang Reply-To: "OpenStack Development Mailing List (not for

2017-01-10 Thread Yujun Zhang
Hi, Ifat. If I understand it correctly, your concerns are mainly about the same alarm coming from different monitors, not the "suspect" status discussed in another thread. On Tue, Jan 10, 2017 at 10:21 PM Afek, Ifat (Nokia - IL) <ifat.a...@nokia.com> wrote: Hi Yinliyin, At first I thought that changing

2017-01-10 Thread Afek, Ifat (Nokia - IL)
Hi Yinliyin, At first I thought that making 'deduced' a property of the alarm might help solve your use case. But now I think most of the problems will remain the same: * It won’t solve the general problem of two different monitors that raise the same alarm * It won’t
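The general problem of two monitors raising the same alarm can be illustrated with a small merge step. This is a hedged sketch, not Vitrage code: it assumes that "the same alarm" means the same alarm name on the same resource, and the monitor, resource, and alarm names are purely illustrative. In practice different monitors rarely use identical names, which is part of why the problem is hard.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical raw report: (monitor, resource_id, alarm_name)
Event = Tuple[str, str, str]


def merge_alarms(events: Iterable[Event]) -> Dict[Tuple[str, str], Set[str]]:
    """Collapse reports of the same alarm on the same resource into one entry,
    remembering every monitor that reported it."""
    merged: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for monitor, resource_id, alarm_name in events:
        merged[(resource_id, alarm_name)].add(monitor)
    return dict(merged)


events = [
    ("nagios", "host-1", "cpu_high"),
    ("zabbix", "host-1", "cpu_high"),
    ("vitrage", "host-1", "cpu_high"),  # a deduced alarm is just one more source
    ("nagios", "host-2", "disk_full"),
]
for key, monitors in merge_alarms(events).items():
    print(key, sorted(monitors))
```

Treating the Vitrage evaluator as one more source in this way is what makes the "suspect vs. real Nagios alarm" case a special case of the general one.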

2017-01-09 Thread Yujun Zhang
I instinctively prefer 2.b. Not sure whether it could be linked to the vitrage_id[1] evolution. If a UUID is created for the alarm, the implementation could be quite straightforward. [1]: https://blueprints.launchpad.net/vitrage/+spec/standard-vitrage-id On Tue, Jan 10, 2017 at 1:55 AM Afek, Ifat

2017-01-09 Thread Afek, Ifat (Nokia - IL)
Hi Yujun, I understand the use case now, thanks for the detailed explanation. Supporting this use case will require some development in Vitrage. Let me try to list the requirements and options that we have. 1. Requirement: Raise ‘suspect’ deduced alarms in Vitrage. Implementation:

2017-01-09 Thread yinliyin
Hi Ifat, I think there is a situation in which all the alarms are reported by the monitored system. We use Vitrage to: 1. Find the relationships between the alarms and find the root cause. 2. Deduce the alarm before it actually occurs. This comprises two aspects:

2017-01-08 Thread Yujun Zhang
Maybe I have missed something in the scenario template, but it seems you have understood my idea quite correctly :-) See further explanation inline. On Sun, Jan 8, 2017 at 3:06 PM Afek, Ifat (Nokia - IL) wrote: > Hi Yujun, > Thanks for the explanation, but I still

2017-01-07 Thread Afek, Ifat (Nokia - IL)
Hi Yujun, Thanks for the explanation, but I still don’t fully understand. Let me start with the current state: 1. introduce a flexible `metadata` dict into the ALARM entity [Ifat] Already exists. An alarm is represented as a vertex in the entity graph, with a dictionary of properties. 2.
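Because an alarm vertex already carries an open dictionary of properties, extra attributes such as 'suspect' or 'datasource' can be matched on as conditions without changing the vertex type. The sketch below uses a plain dict as a stand-in for the entity graph; the property keys and the `find_alarms` helper are hypothetical and do not reflect Vitrage's actual schema or template engine.

```python
from typing import Any, Callable, Dict, List

# Stand-in for an entity graph: vertex id -> property dictionary.
# Property keys are hypothetical, not Vitrage's real schema.
graph: Dict[str, Dict[str, Any]] = {
    "alarm-1": {"category": "ALARM", "name": "cpu_high",
                "resource_id": "host-1", "datasource": "nagios"},
    "alarm-2": {"category": "ALARM", "name": "host_down",
                "resource_id": "host-1", "datasource": "vitrage",
                "suspect": True},
    "host-1": {"category": "RESOURCE", "type": "nova.host"},
}


def find_alarms(graph: Dict[str, Dict[str, Any]],
                condition: Callable[[Dict[str, Any]], bool]) -> List[str]:
    """Return ids of ALARM vertices whose properties satisfy the condition."""
    return [vid for vid, props in graph.items()
            if props.get("category") == "ALARM" and condition(props)]


# Extra keys act as conditions, much like a scenario condition in a template.
print(find_alarms(graph, lambda p: p.get("suspect", False)))          # ['alarm-2']
print(find_alarms(graph, lambda p: p.get("datasource") == "nagios"))  # ['alarm-1']
```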

2017-01-06 Thread Yujun Zhang
The two questions raised by YinLiYin are actually one, i.e. *how to enrich the alarm properties* that can be used as a condition in root cause deducing. Both 'suspect' and 'datasource' are additional information that may be referred to as a condition in the general fault model, a.k.a. scenario in

2017-01-06 Thread Afek, Ifat (Nokia - IL)
Hi YinLiYin, This is an interesting question. Let me divide my answer into two parts. First, the case that you described with Nagios and Vitrage. This problem depends on the specific Nagios tests that you configure in your system, as well as on the Vitrage templates that you use. For example,

[openstack-dev] [Vitrage] About alarms reported by datasource and the alarms generated by vitrage evaluator

2017-01-05 Thread yinliyin
Hi all, Vitrage generates alarms according to the templates. All the alarms raised by Vitrage have the type "vitrage". Suppose Nagios has an alarm A. Alarm A is raised by the Vitrage evaluator according to the action part of a scenario, and the type of alarm A is "vitrage". If Nagios reported alarm A
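One possible policy for the situation above is for a monitored report to supersede a deduced alarm with the same key, so the user sees one alarm A instead of two. This is a hedged sketch of that idea only, not what Vitrage does: it assumes alarms are keyed by (resource, name) and treats type "vitrage" as the marker of a deduced alarm, as described in the message. `raise_alarm` and the property keys are hypothetical.

```python
from typing import Dict, Tuple

# Active alarms keyed by (resource_id, alarm_name); values are hypothetical properties.
Key = Tuple[str, str]


def raise_alarm(active: Dict[Key, dict], resource_id: str, name: str,
                alarm_type: str) -> None:
    """Record an alarm. A monitored report supersedes a deduced one with the
    same key, so only a single alarm is kept."""
    key = (resource_id, name)
    if key not in active:
        active[key] = {"type": alarm_type, "deduced": alarm_type == "vitrage"}
    elif alarm_type != "vitrage":
        # The real monitor confirms (or repeats) the alarm: keep one, now monitored.
        active[key] = {"type": alarm_type, "deduced": False}
    # Otherwise a deduced report for an alarm already present changes nothing.


active: Dict[Key, dict] = {}
raise_alarm(active, "host-1", "A", "vitrage")  # deduced by the evaluator
raise_alarm(active, "host-1", "A", "nagios")   # Nagios reports the same alarm
print(active)  # {('host-1', 'A'): {'type': 'nagios', 'deduced': False}}
```

Whether the deduced alarm should be replaced, kept and linked, or marked as confirmed is exactly the design question the thread goes on to discuss.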