From: Yujun Zhang
Date: Tuesday, 17 January 2017 at 02:41
Sounds good.
Have you created an etherpad page for collecting topics, Ifat?
Here: https://etherpad.openstack.org/p/vitrage-pike-design-sessions
Sounds good.
Have you created an etherpad page for collecting topics, Ifat?
On Mon, Jan 16, 2017 at 10:43 PM Afek, Ifat (Nokia - IL) <ifat.a...@nokia.com> wrote:
> From: Yujun Zhang
> Date: Sunday, 15 January 2017 at 17:53
>
> About fault and alarm, what
From: Yujun Zhang
Date: Sunday, 15 January 2017 at 17:53
About fault and alarm, what I was thinking about is the causal/deducing chain in
root cause analysis.
Fault state means the resource is not fully functional and it is evaluated by
related indicators. There are
About fault and alarm, what I was thinking about is the causal/deducing chain
in root cause analysis.
Fault state means the resource is not fully functional and it is evaluated
by related indicators. There are alarms on events like power loss or
measurands like CPU high, memory low, temperature
From: Yujun Zhang
Date: Thursday, 12 January 2017 at 17:37
On Thu, Jan 12, 2017 at 5:12 PM Afek, Ifat (Nokia - IL) wrote:
'deduced' vs 'monitored' would be good enough for most cases. Unless we have
identified some real
Hi, Ifat
Your comments are quite right. See my additional explanation inline.
On Thu, Jan 12, 2017 at 5:12 PM Afek, Ifat (Nokia - IL) wrote:
>
> One possible solution would be introducing a high-level (abstract)
> template from the user's view. Then convert it to Vitrage
Hi Yujun,
See my comments inline.
Ifat.
From: Yujun Zhang
Date: Wednesday, 11 January 2017 at 12:12
I have just realized abstract alarm is not a good term. What I was talking
about is fault and alarm.
Fault is what actually happens, and alarm is how it is detected
I have just realized abstract alarm is not a good term. What I was talking
about is *fault* and *alarm*.
Fault is what actually happens, and alarm is how it is detected (or
deduced).
On Wed, Jan 11, 2017 at 5:13 PM Yujun Zhang
wrote:
> Yes, if we consider the Vitrage
Yes, if we consider the Vitrage scenario evaluator as a pseudo monitor.
I think YinLiYin's idea is a reasonable requirement from end users. They
care more about the *real faults* in the system, not how they are detected.
Though it will pose many design and engineering challenges, it creates
You are right. But as I see it, the case of Vitrage suspect vs. the real Nagios
alarm is just one example of the more general case of two monitors reporting
the same alarm.
Don’t you think so?
From: Yujun Zhang
Reply-To: "OpenStack Development Mailing List (not for
Hi, Ifat
If I understand it correctly, your concern is mainly about the same alarm coming
from different monitors, not the "suspect" status discussed in another thread.
On Tue, Jan 10, 2017 at 10:21 PM Afek, Ifat (Nokia - IL) <ifat.a...@nokia.com> wrote:
Hi Yinliyin,
At first I thought that changing
Hi Yinliyin,
At first I thought that changing the deduced to be a property on the alarm
might help in solving your use case. But now I think most of the problems will
remain the same:
* It won’t solve the general problem of two different monitors that raise
the same alarm
* It won’t
Instinctively, I prefer 2.b.
I am not sure whether it could be linked to the vitrage_id[1] evolution. If a UUID is
created for the alarm, the implementation could be quite straightforward.
[1]: https://blueprints.launchpad.net/vitrage/+spec/standard-vitrage-id
On Tue, Jan 10, 2017 at 1:55 AM Afek, Ifat
Hi Yujun,
I understand the use case now, thanks for the detailed explanation.
Supporting this use case will require some development in Vitrage. Let me try
to list the requirements and options that we have.
1. Requirement: Raise ‘suspect’ deduced alarms in Vitrage.
Implementation:
Hi Ifat,
I think there is a situation in which all the alarms are reported by the
monitored system. We use Vitrage to:
1. Find the relationships between the alarms and find the root cause.
2. Deduce the alarm before it actually occurs. This comprises two
aspects:
Maybe I have missed something in the scenario template, but it seems you
have understood my idea quite correctly :-)
See further explanation inline
On Sun, Jan 8, 2017 at 3:06 PM Afek, Ifat (Nokia - IL) wrote:
> Hi Yujun,
>
> Thanks for the explanation, but I still
Hi Yujun,
Thanks for the explanation, but I still don’t fully understand.
Let me start with the current state:
1. introduce a flexible `metadata` dict into the ALARM entity
[Ifat] Already exists. An alarm is represented as a vertex in the entity graph,
with a dictionary of properties.
2.
The two questions raised by YinLiYin are actually one, i.e. *how to enrich
the alarm properties* that can be used as a condition in root cause
deduction.
Both 'suspect' and 'datasource' are additional information that may be
referred to as a condition in the general fault model, a.k.a. scenario in
Hi YinLiYin,
This is an interesting question. Let me divide my answer into two parts.
First, the case that you described with Nagios and Vitrage. This problem
depends on the specific Nagios tests that you configure in your system, as well
as on the Vitrage templates that you use. For example,
Hi all,
Vitrage generates alarms according to the templates. All the alarms raised by
Vitrage have the type "vitrage". Suppose Nagios has an alarm A. When alarm A is
raised by the Vitrage evaluator according to the action part of a scenario, the type of
alarm A is "vitrage". If Nagios reported alarm A