Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-02-02 Thread Afek, Ifat (Nokia - IL)
On 02/02/2017, 15:43, "gordon chung" wrote:
>
> On 02/02/17 06:30 AM, Afek, Ifat (Nokia - IL) wrote:
> > I understand. So clearly the use case of Vitrage raising alarms in Aodh is
> > not relevant at the moment.
> > We will have to think it over and see how Panko fits in the use case.
>
> if the use case is that you wanted to store history of Vitrage alarms, 
> yes, i believe it's better/easier stored in Panko. i hope i understand 
> your requirement now. Aodh does have the ability to create composite 
> alarms which are basically an alarm consisting of multiple sub alarms. i 
> don't know if that will help you guys?

The history is not what we are looking for. Our idea was to provide visibility 
of the instances status, as known by Vitrage, through Aodh alarms API (so users 
that use Aodh will be aware of it).

Regarding the composite alarms, I believe that Vitrage templates are more 
complex, as the condition depends on the resources topology and not only on 
other alarms. 

> i apologise this took so long to clarify.

I thought I should be the one to apologize ;-)



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-02-02 Thread gordon chung


On 02/02/17 06:30 AM, Afek, Ifat (Nokia - IL) wrote:
> I understand. So clearly the use case of Vitrage raising alarms in Aodh is 
> not relevant at the moment.
> We will have to think it over and see how Panko fits in the use case.

if the use case is that you wanted to store history of Vitrage alarms, 
yes, i believe it's better/easier stored in Panko. i hope i understand 
your requirement now. Aodh does have the ability to create composite 
alarms which are basically an alarm consisting of multiple sub alarms. i 
don't know if that will help you guys?

i apologise this took so long to clarify.

cheers,
-- 
gord


Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-02-02 Thread Afek, Ifat (Nokia - IL)
On 31/01/2017, 18:43, "gordon chung" wrote:

> On 31/01/17 08:34 AM, Afek, Ifat (Nokia - IL) wrote:
> > If you query Vitrage (or get a notification from Vitrage) and then you 
> > query Aodh, then Aodh will not return any additional information. But – if 
> > you query only Aodh, you will be aware of the fact that the instances are 
> > at risk. Without the integration, you will see that all instances are OK 
> > performance-wise, and you might mistakenly conclude that everything else is
> > also fine.
> >
> i see. so the proposal was to have Aodh be the place where we collate 
> alarms from Aodh AND Vitrage. i agree, that's probably not what Aodh 
> should be doing (i'll still push that to Panko)
>
> would a possible workflow be to maybe have Vitrage send alert to Aodh 
> and for Aodh to listen to that event and reraise if needed? or if 
> vitrage can just reraise, then it can send that event to Panko so we can 
> see all information on that resource. 

I understand. So clearly the use case of Vitrage raising alarms in Aodh is not 
relevant at the moment. 
We will have to think it over and see how Panko fits in the use case.

Thanks,
Ifat.




Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-31 Thread gordon chung


On 31/01/17 08:34 AM, Afek, Ifat (Nokia - IL) wrote:
> If you query Vitrage (or get a notification from Vitrage) and then you query 
> Aodh, then Aodh will not return any additional information. But – if you 
> query only Aodh, you will be aware of the fact that the instances are at 
> risk. Without the integration, you will see that all instances are OK 
> performance-wise, and you might mistakenly conclude that everything else is
> also fine.

i see. so the proposal was to have Aodh be the place where we collate 
alarms from Aodh AND Vitrage. i agree, that's probably not what Aodh 
should be doing (i'll still push that to Panko)

would a possible workflow be to maybe have Vitrage send alert to Aodh 
and for Aodh to listen to that event and reraise if needed? or if 
vitrage can just reraise, then it can send that event to Panko so we can 
see all information on that resource.

my assumption right now is Vitrage itself is listening for a bunch of 
alerts (from zabbix, etc...) and has a set of 'composite' alarms which 
when it receives alert x and alert y, it 'deduces' that it should send 
an alert z?

cheers,
-- 
gord


Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-31 Thread Afek, Ifat (Nokia - IL)
On 30/01/2017, 19:11, "gordon chung" wrote:
>   
> On 29/01/17 08:52 AM, Afek, Ifat (Nokia - IL) wrote:
> >
> > Vitrage could be enhanced to become an alarm orchestrator.
> > The question is – do you want Vitrage to be one?
> > And how would you describe the role of an alarm orchestrator/manager?
> >
>
> i don't really have an opinion on the orchestrator role although it 
> seems to be leaning that way.
>
> i'll re-ask a question i had earlier since i'm not entirely clear of 
> proposal (if it's still relevant):
>
> if we store a vitrage alarm in aodh, what would the use case be for
> querying it? the alarm occurred and vitrage has already sent a
> notification warning. if i were to query aodh, what additional
> information would i be retrieving?
>

If you query Vitrage (or get a notification from Vitrage) and then you query 
Aodh, then Aodh will not return any additional information. But – if you query 
only Aodh, you will be aware of the fact that the instances are at risk. 
Without the integration, you will see that all instances are OK 
performance-wise, and you might mistakenly conclude that everything else is
also fine.

Did I answer your question?

Ifat.




Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-30 Thread gordon chung


On 29/01/17 08:52 AM, Afek, Ifat (Nokia - IL) wrote:
> On 26/01/2017, 20:09, "Julien Danjou" wrote:
>
>> On Thu, Jan 26 2017, gordon chung wrote:
>>
>>> On 26/01/17 11:41 AM, Julien Danjou wrote:
>>
>>> and vitrage would be an alarm orchestrator?
>>
>> Yup, something like that. It could be the one driving Zabbix and
>> creating alarms for Zabbix in Aodh when a new host is plugged for
>> example.
>
> Vitrage could be enhanced to become an alarm orchestrator.
> The question is – do you want Vitrage to be one?
> And how would you describe the role of an alarm orchestrator/manager?
>
>

i don't really have an opinion on the orchestrator role although it 
seems to be leaning that way.

i'll re-ask a question i had earlier since i'm not entirely clear of 
proposal (if it's still relevant):

if we store a vitrage alarm in aodh, what would the use case be for
querying it? the alarm occurred and vitrage has already sent a
notification warning. if i were to query aodh, what additional
information would i be retrieving?

cheers,
-- 
gord


Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-29 Thread Afek, Ifat (Nokia - IL)
On 26/01/2017, 20:09, "Julien Danjou" wrote:

> On Thu, Jan 26 2017, gordon chung wrote:
> 
> > On 26/01/17 11:41 AM, Julien Danjou wrote:
>
> > and vitrage would be an alarm orchestrator?
> 
> Yup, something like that. It could be the one driving Zabbix and
> creating alarms for Zabbix in Aodh when a new host is plugged for
> example.

Vitrage could be enhanced to become an alarm orchestrator. 
The question is – do you want Vitrage to be one? 
And how would you describe the role of an alarm orchestrator/manager? 




Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-26 Thread Julien Danjou
On Thu, Jan 26 2017, gordon chung wrote:

> On 26/01/17 11:41 AM, Julien Danjou wrote:
>> So here's another question then: why wouldn't there be a "zabbix" alarm
>> type in Aodh that could be created by a user (or another program) and
>> that would be triggered by Aodh when Zabbix does something?
>> Which is something that is really like the event alarm mechanism which
>> already exists. Maybe all that's missing is a
>> Zabbix-to-OpenStack-notification converter to have that feature?
>
> and vitrage would be an alarm orchestrator?

Yup, something like that. It could be the one driving Zabbix and
creating alarms for Zabbix in Aodh when a new host is plugged for
example.

Just thinking out loud. :)

-- 
Julien Danjou
# Free Software hacker
# https://julien.danjou.info




Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-26 Thread gordon chung


On 26/01/17 11:41 AM, Julien Danjou wrote:
> So here's another question then: why wouldn't there be a "zabbix" alarm
> type in Aodh that could be created by a user (or another program) and
> that would be triggered by Aodh when Zabbix does something?
> Which is something that is really like the event alarm mechanism which
> already exists. Maybe all that's missing is a
> Zabbix-to-OpenStack-notification converter to have that feature?

and vitrage would be an alarm orchestrator?

-- 
gord


Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-26 Thread Julien Danjou
On Thu, Jan 26 2017, Afek, Ifat (Nokia - IL) wrote:

> I’ll try to answer your question from a user perspective. 

Thanks for your explanation, it helped me a lot to understand how you
view things. :)

> Suppose a bridge has a bond of two physical ports, and Zabbix detects a signal
> loss in one of them. This failure has no immediate effect on the host,
> instances or applications, and will not be reflected anywhere in OpenStack.
>
> Vitrage will receive an alarm from Zabbix, identify the instances that will be
> affected if the entire bond fails, and create deduced alarms that they are at
> risk (if the other port fails they will become unreachable). Similarly, it will
> create alarms on the relevant applications.

So when you say "create deduced alarms"… What does it mean? I understand
the deduction, but I am not sure what it "creates" – 'cause then you
say:

> A user that checks Aodh will see that all alarms are in ‘ok’ state, which might
> be misleading.

Which alarms? Could you be more precise? Where do these alarms come from?
Are they created by the users or by Vitrage automatically?
If it's a CPU usage of its instance there's no reason for it to become
red.

If I recall correctly what you explained to me a while back, there are
alarms created by Vitrage based on some rules, so I imagine these are
the ones you talk about?

> The user might determine that everything is ok with the instances that
> Aodh is monitoring. If the user then checks Vitrage, he will see the
> deduced alarms and understand that the instances and the applications
> are at risk.

From what I understood the user can't really check Vitrage (IIRC it does
not really have a full API for users yet), right?

> Does it make sense that the user will check Aodh *and* Vitrage? A standard user
> would like to see all of the alarms in one place, no matter which monitor was
> responsible for triggering them.

Yes: it does make sense for the user to check both because of the way
Aodh+Vitrage are architected right now. Does it make sense in terms of
user experience? I think we both agree that no, it does not. Having a
central place for alerting would be awesome.

But does it make sense to force-feed Vitrage alarms and its data model
into Aodh? I am not sure right now. If I circle back again to UX, when a
user queries Aodh, he only sees alarms he created and manages. With
generic alarms, the way it's pushed right now, there's going to be a
bunch of generic things the user has barely any clue about, doing things
he has no idea about – because he can't really do anything on Vitrage.

And even if Vitrage had an API to manipulate the rules and all (I can
easily imagine it's on the roadmap), that means he would manipulate
deduction rules through the Vitrage API and then see things magically
happen in his Aodh account. I find that… weird. It sounds very prone to
failure and to Aodh and Vitrage getting out of sync.

Let's imagine another scenario/solution (which I am *not* advocating,
it's just an exercise for thought): Vitrage would store its alarms
(defined and created based on its rules) in a database. It would then
offer access to it to Aodh (e.g. via an HTTP API). Then Aodh could
query it.
For example, when a user asks Aodh to list the alarms, Aodh would
return the alarms that are stored in its own database (created by the
user) and would also query Vitrage to return the list of alarms created
by Vitrage rules (and their deduced state).
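
A minimal sketch of that listing behaviour, under stated assumptions: the
`Alarm` type, the `list_alarms` function and the `fetch_vitrage_alarms`
callable are all hypothetical names for illustration and are not part of the
Aodh or Vitrage code bases. The Vitrage side is passed in as a plain callable
so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Alarm:
    alarm_id: str
    name: str
    state: str   # e.g. "ok" or "alarm"
    source: str  # "aodh" for user-created alarms, "vitrage" for deduced ones


def list_alarms(
    local_alarms: Iterable[Alarm],
    fetch_vitrage_alarms: Callable[[], Iterable[Alarm]],
) -> List[Alarm]:
    """Return the user's own alarms plus whatever Vitrage reports right now.

    Nothing from Vitrage is copied into the local store, so there is no
    second copy that could drift out of sync.
    """
    merged = list(local_alarms)
    try:
        merged.extend(fetch_vitrage_alarms())
    except ConnectionError:
        # Hypothetical policy: if Vitrage is unreachable, still return the
        # user's own alarms instead of failing the whole listing.
        pass
    return merged


def fake_vitrage() -> List[Alarm]:
    # Stand-in for an HTTP call to a (hypothetical) Vitrage alarm endpoint.
    return [Alarm("v-1", "instance at risk", "alarm", "vitrage")]


alarms = list_alarms([Alarm("a-1", "cpu_high", "ok", "aodh")], fake_vitrage)
```

Here `alarms` holds the user-created alarm followed by the deduced one, which
is the "central listing" effect without Aodh storing any Vitrage data.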

What's the point of such a design? Well it's less prone to
out-of-sync-ness and does not force any data model in Aodh that it has
no use for. It also solves the problem of "having a central listing of
alarms" for the user – the user does not have to be aware of Vitrage. Is
it a good technical design? Probably not. It seems weird to make Aodh a
bridge to Vitrage.

And I think that's the whole thing I do not like about the current
proposal and the one I just invented. The way Aodh and Vitrage are
bridged, the way Vitrage is built on top and outside of Aodh right now
feels wobbly to me.

So here's another question then: why wouldn't there be a "zabbix" alarm
type in Aodh that could be created by a user (or another program) and
that would be triggered by Aodh when Zabbix does something?
Which is something that is really like the event alarm mechanism which
already exists. Maybe all that's missing is a
Zabbix-to-OpenStack-notification converter to have that feature?
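
A rough sketch of what such a converter could look like. Everything here is an
assumption for illustration: the input keys (`trigger_id`, `host`, `status`,
`description`) are a made-up stand-in for a Zabbix trigger, the `zabbix.*`
event type names are invented, and the envelope is only modelled loosely on
the usual OpenStack notification fields. Actually publishing the result on the
message bus is left out.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict


def zabbix_to_notification(trigger: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap a Zabbix-style trigger dict in a notification-style envelope."""
    is_problem = trigger["status"] == "PROBLEM"
    return {
        "message_id": str(uuid.uuid4()),
        "publisher_id": "zabbix.%s" % trigger["host"],
        # Hypothetical event type names; an event alarm could match on them.
        "event_type": "zabbix.trigger.problem" if is_problem else "zabbix.trigger.ok",
        "priority": "ERROR" if is_problem else "INFO",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": {
            "trigger_id": trigger["trigger_id"],
            "host": trigger["host"],
            "description": trigger["description"],
        },
    }


notification = zabbix_to_notification({
    "trigger_id": "13926",
    "host": "compute-1",
    "status": "PROBLEM",
    "description": "Signal loss on eth0",
})
```

An event alarm created by a user (or another program) could then be triggered
on the resulting event type, which is the mechanism the question above is
pointing at.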

I'll stop that for now to let you reply or my mail is going to be way
too long lol.

> And a side note – you said that Aodh and Zabbix are exactly the same. I agree.
> You can implement in Aodh everything that is implemented in Zabbix. But why do
> that instead of just using the alarms that are already created by another
> monitor?

Oh no point, I was just making a point to be sure we were on the same
line in terms of understanding, and it seems we are. :)

> Well… is this awesome enough? ;-)

Yes thanks, I think this is a good example that will help us 

Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-26 Thread Afek, Ifat (Nokia - IL)
On 25/01/2017, 17:12, "Julien Danjou" wrote:

> On Wed, Jan 25 2017, Afek, Ifat (Nokia - IL) wrote:
>  
> To circle back to the original point, the main question that I asked and
> started this thread is: why, why should Aodh store Vitrage alarms? What
> are the advantages, for both Aodh and Vitrage?
>
> So far the only answer I read is "well we thought Aodh would be a central
> storage place for alarms". So far it seems it has more drawbacks than
> benefits: worse performance for Vitrage, confusion for users and more
> complexity in Aodh.
> 
> As I already said, I'm trying to be really objective on this. I just
> really want someone to explain to me how awesome this will be and why we
> should totally go toward this direction. :-)

I’ll try to answer your question from a user perspective. 

Suppose a bridge has a bond of two physical ports, and Zabbix detects a signal 
loss in one of them. This failure has no immediate effect on the host, 
instances or applications, and will not be reflected anywhere in OpenStack. 

Vitrage will receive an alarm from Zabbix, identify the instances that will be 
affected if the entire bond fails, and create deduced alarms that they are at 
risk (if the other port fails they will become unreachable). Similarly, it will 
create alarms on the relevant applications.
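
The deduction described above can be sketched roughly as follows. This is only
an illustration: the topology dictionaries, the function name `deduce_alarms`
and the alarm names are hypothetical and do not reflect Vitrage's template
format or API.

```python
from typing import Dict, List, Set

# Hypothetical topology: which ports form each bond, and which instances
# depend on that bond for connectivity.
BOND_PORTS: Dict[str, List[str]] = {"bond0": ["eth0", "eth1"]}
BOND_INSTANCES: Dict[str, List[str]] = {"bond0": ["vm-1", "vm-2"]}


def deduce_alarms(
    failed_ports: Set[str],
    bond_ports: Dict[str, List[str]],
    bond_instances: Dict[str, List[str]],
) -> List[Dict[str, str]]:
    """Turn port-level alarms into deduced alarms on dependent instances."""
    deduced = []
    for bond, ports in bond_ports.items():
        down = [p for p in ports if p in failed_ports]
        if not down:
            continue  # bond fully healthy, nothing to deduce
        # One port down: instances still work but have lost redundancy.
        # All ports down: instances are actually unreachable.
        all_down = len(down) == len(ports)
        name = "instance unreachable" if all_down else "instance at risk"
        severity = "critical" if all_down else "warning"
        for instance in bond_instances.get(bond, []):
            deduced.append(
                {"resource": instance, "name": name, "severity": severity}
            )
    return deduced


# Zabbix reports signal loss on one of the two ports in the bond.
at_risk = deduce_alarms({"eth0"}, BOND_PORTS, BOND_INSTANCES)
```

With a single failed port, `at_risk` contains one "instance at risk" warning
per instance on the bond, even though nothing has failed for the instances
themselves yet.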

A user that checks Aodh will see that all alarms are in ‘ok’ state, which might 
be misleading. The user might determine that everything is ok with the 
instances that Aodh is monitoring. If the user then checks Vitrage, he will see 
the deduced alarms and understand that the instances and the applications are 
at risk. 

Does it make sense that the user will check Aodh *and* Vitrage? A standard user 
would like to see all of the alarms in one place, no matter which monitor was 
responsible for triggering them.

And a side note – you said that Aodh and Zabbix are exactly the same. I agree. 
You can implement in Aodh everything that is implemented in Zabbix. But why do 
that instead of just using the alarms that are already created by another
monitor?

Well… is this awesome enough? ;-)
Ifat.





Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-25 Thread Julien Danjou
On Wed, Jan 25 2017, Afek, Ifat (Nokia - IL) wrote:

> As we see it, alarms can be generated by different sources – Aodh, Vitrage,
> Nagios, Zabbix, etc.

I think "generated" is the wrong word here. Aodh does not generate any
alarms: it allows users to create them. And then it evaluates them and
triggers them.

Nagios and Zabbix do *exactly* the same thing: users define alarms and
they are evaluated and triggered by Nagios/Zabbix. The particularity of
Aodh is that it does not gather or store data itself (as Nagios and
Zabbix do) but only handles the definition and evaluation of alarms.

So you can implement what Nagios and Zabbix do in Aodh. And you could
use Nagios instead of Aodh (except that it has no REST API so…).

Vitrage seems to me to be a middle man, which indeed seems to
*generate* (create) alarms based on things it sees triggered by Nagios,
Zabbix or Aodh. IIUC.

> Each source has its own expertise and internal
> implementation. Nagios and Zabbix can raise alarms about the physical layer,
> Aodh can raise threshold alarms and event alarms, and Vitrage can raise deduced
> alarms (e.g. if there is an alarm on a host, Vitrage will raise alarms on the
> relevant instances and applications). I would prefer that you view Vitrage the
> way you view Zabbix, as a project that has a way of evaluating some kinds of
> problems in the system, and notify about them.

This "specialization" you describe is entirely artificial. Aodh can
trigger alarms on the physical layer. It already does if you monitor
your hardware with e.g. SNMP or IPMI, put the data in Gnocchi and create
alarm rules based on those metrics. And it could be extended to do more
(that'd be cool :)

What Vitrage does is use the existing software that might be (already)
deployed (Nagios, Zabbix) and consolidate things.

> The question is should there be a central place that provides information about
> *all* alarms gathered in the system, and this includes an API, database,
> notification mechanism and history. We can implement these in Vitrage (as we
> already integrate with different datasources and monitors), but we always had
> in mind that this is part of Aodh project definition.

I don't see in the case of Vitrage why alarms should be stored by Aodh
and not by Nagios, for example. What's the rationale?

To circle back to the original point, the main question that I asked and
started this thread is: why, why should Aodh store Vitrage alarms? What
are the advantages, for both Aodh and Vitrage?

So far the only answer I read is "well we thought Aodh would be a central
storage place for alarms". So far it seems it has more drawbacks than
benefits: worse performance for Vitrage, confusion for users and more
complexity in Aodh.

As I already said, I'm trying to be really objective on this. I just
really want someone to explain to me how awesome this will be and why we
should totally go toward this direction. :-)

Cheers,
-- 
Julien Danjou
;; Free Software hacker
;; https://julien.danjou.info




Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-25 Thread gordon chung


On 25/01/17 08:39 AM, Afek, Ifat (Nokia - IL) wrote:
> As we see it, alarms can be generated by different sources – Aodh, Vitrage, 
> Nagios, Zabbix, etc. Each source has its own expertise and internal 
> implementation. Nagios and Zabbix can raise alarms about the physical layer, 
> Aodh can raise threshold alarms and event alarms, and Vitrage can raise 
> deduced alarms (e.g. if there is an alarm on a host, Vitrage will raise 
> alarms on the relevant instances and applications). I would prefer that you 
> view Vitrage the way you view Zabbix, as a project that has a way of 
> evaluating some kinds of problems in the system, and notify about them.

so the purpose of 'generic alarms' proposal was just to 'log' the alarm 
from vitrage in a central place? tbh, i don't know if that's what we 
want to store in aodh. i think it should ideally be handling active 
alarms, not past alarms.

if we store a vitrage alarm in aodh, what would the use case be for 
querying it? the alarm occurred and vitrage has already sent a 
notification warning. if i were to query aodh, what additional 
information would i be retrieving?

it would seem much more useful to send that information to panko so you 
can see that alarm event with other past events relating to the resource.


cheers,
-- 
gord


Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-25 Thread Afek, Ifat (Nokia - IL)
Hi,

Alarm history and a database are definitely important, but they are not the 
main issue here.

As we see it, alarms can be generated by different sources – Aodh, Vitrage, 
Nagios, Zabbix, etc. Each source has its own expertise and internal 
implementation. Nagios and Zabbix can raise alarms about the physical layer, 
Aodh can raise threshold alarms and event alarms, and Vitrage can raise deduced 
alarms (e.g. if there is an alarm on a host, Vitrage will raise alarms on the 
relevant instances and applications). I would prefer that you view Vitrage the 
way you view Zabbix, as a project that has a way of evaluating some kinds of 
problems in the system, and notify about them.

The question is should there be a central place that provides information about 
*all* alarms gathered in the system, and this includes an API, database, 
notification mechanism and history. We can implement these in Vitrage (as we 
already integrate with different datasources and monitors), but we always had 
in mind that this is part of Aodh project definition.

What do you say?

Best Regards,
Ifat.


On 25/01/2017, 13:19, "Julien Danjou" wrote:

> On Tue, Jan 24 2017, gordon chung wrote:
>
> > you mean, keep alarm history in aodh and also in panko if needed? i'm ok
> > with that.
>
> Yeah, IIRC there's an expirer in Aodh for alarm history based on TTL –
> that's enough. That should probably be replaced with just a hard limit on
> the number of history items you have (e.g. 100), with the oldest ones
> being dropped when the limit is hit.
>
> And if somebody wants a full audit control of what's done, Panko is the
> way to go (you know, bread crumbs ;-).
>
> -- 
> Julien Danjou
> -- Free Software hacker
> -- https://julien.danjou.info




Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-25 Thread Julien Danjou
On Tue, Jan 24 2017, gordon chung wrote:

> you mean, keep alarm history in aodh and also in panko if needed? i'm ok 
> with that.

Yeah, IIRC there's an expirer in Aodh for alarm history based on TTL –
that's enough. That should probably be replaced with just a hard limit on
the number of history items you have (e.g. 100), with the oldest ones
being dropped when the limit is hit.
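
The hard-limit idea can be illustrated with a bounded queue. This is a sketch
only: `AlarmHistory` is a hypothetical class and not how Aodh stores alarm
history; it just shows the "keep the newest N, drop the oldest" behaviour in
memory.

```python
from collections import deque
from typing import Any, Deque, Dict, List


class AlarmHistory:
    """Keep at most `limit` history items for one alarm, newest last."""

    def __init__(self, limit: int = 100) -> None:
        # A deque with maxlen silently discards the oldest entry when a new
        # one is appended to a full queue.
        self._items: Deque[Dict[str, Any]] = deque(maxlen=limit)

    def record(self, change: Dict[str, Any]) -> None:
        self._items.append(change)

    def items(self) -> List[Dict[str, Any]]:
        return list(self._items)


history = AlarmHistory(limit=100)
for seq in range(105):
    history.record({"seq": seq, "type": "state transition"})
```

After recording 105 changes, `history.items()` holds 100 entries, starting at
`seq` 5; the five oldest have been dropped without any TTL-based expirer.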

And if somebody wants a full audit control of what's done, Panko is the
way to go (you know, bread crumbs ;-).

-- 
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info




Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-24 Thread gordon chung


On 24/01/17 03:05 PM, Julien Danjou wrote:
> I think Aodh emits notifications when something happens so it can be in
> Panko indeed. I don't think it'd be fair to force Panko to have (a
> recent) history though. :)

i'm going to add a work item (for anyone): allow multiple notification 
topics on alarmchange... i actually have no idea what is consuming those 
alarm change notifications currently.

you mean, keep alarm history in aodh and also in panko if needed? i'm ok 
with that.

cheers,
-- 
gord



Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-24 Thread Julien Danjou
On Tue, Jan 24 2017, gordon chung wrote:

> just curious, why doesn't vitrage send an event to aodh (on the error 
> topic) in this case rather than get nova to do it? if you created an 
> event alarm in aodh to check for vitrage error events could it solve the 
> use case? i don't know if we store the event details but i imagine we 
> could? or we could store the event id which can be linked to Panko 
> (Event storage) for full metadata information?

I imagine they are or they could be forwarded as a payload when
triggering the alarm action?

> tbh, i think the alarm history should be in panko since it seems like a 
> pretty common use case to correlate an alarm event with the other events 
> in the system.

I think Aodh emits notifications when something happens so it can be in
Panko indeed. I don't think it'd be fair to force Panko to have (a
recent) history though. :)

-- 
Julien Danjou
# Free Software hacker
# https://julien.danjou.info




Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-24 Thread gordon chung


On 24/01/17 03:01 AM, Afek, Ifat (Nokia - IL) wrote:
> We understood that Aodh aims to be OpenStack alarming service, which is much 
> more than an ‘engine of alarm evaluation’ (as you wrote in your comment in 
> gerrit). If I may describe another use case for generic alarms - of OPNFV 
> Doctor: A monitor notifies about an alarm, e.g. a NIC failure. The inspector 
> (Vitrage in this case) receives the alarm, understands that the host is 
> affected, and raises an alarm on the host. This is currently implemented by 
> Vitrage calling nova force-down, and Nova sending a notification that is 
> converted to an event and then consumed by an Aodh event-alarm.
>

just curious, why doesn't vitrage send an event to aodh (on the error 
topic) in this case rather than get nova to do it? if you created an 
event alarm in aodh to check for vitrage error events could it solve the 
use case? i don't know if we store the event details but i imagine we 
could? or we could store the event id which can be linked to Panko 
(Event storage) for full metadata information?

tbh, i think the alarm history should be in panko since it seems like a 
pretty common use case to correlate an alarm event with the other events 
in the system.

> In his first commit, alexey_weyl suggested to add metadata, and you asked him 
> to call it ‘userdata’. Personally, I think that metadata is more accurate. It 
> is legitimate for an alarm to have additional data, in our example we need to 
> hold the resource id and an external alarm id. When you call it userdata, it 
> indeed sounds like ‘a user datastore’ (in your words), which is not the 
> purpose at all.
> How about renaming it back to metadata? and how about adding it only to the 
> generic alarm, instead of to all alarms?

i had no idea what 'userdata' field was... i'd much prefer it be 
'metadata' even though it's a bit ambiguous.

cheers,
-- 
gord


Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-24 Thread Julien Danjou
On Tue, Jan 24 2017, Afek, Ifat (Nokia - IL) wrote:

> Vitrage, and I assume that other projects, needs an “Alarm Manager”. The role
> of an Alarm Manager is to store all alarms in the system, keep their history,
> notify on changes, etc. Vitrage does not declare itself as an Alarm Manager,
> mainly because we understood that this is the role of Aodh.

What you describe is a database. There are tons of databases out there
that you can use to store data. Define a model and Vitrage can have its
own data storage for alarms, metadata, whatever you want. Plus, it will
be way more performant than using Aodh! :-)

Creating alarms via the Aodh API just to make it store them in a SQL
database is kinda… pointless. Let Vitrage just use a database directly.
Because in that case what are the perks of using Aodh, except saying
that it uses Aodh?

But indeed, I think what you describe is some kind of centralizer
database. I don't think being a database has any interest for Aodh (nor
for Vitrage). The only upside of using Aodh instead of a database
directly would be to make alarms readable by the user. But that
therefore exposes Vitrage's internal datastore. And since users should
not manipulate alarms created by Vitrage directly, I don't see any gain
in that either.

Aodh is not /just/ an alarm data store. Its real features are in the
evaluators and notifiers.

So the whole "generic" alarm approach is about keeping the "define and
store and notify alarms" part in Aodh while having the
"evaluate/trigger alarms" part outside Aodh (in this case in Vitrage).

As I already said a while back, I think it's OK to have that and
externalize the evaluator. But you also have to keep in mind the
original use case and design of Aodh: being a user accessible API that
provides alarm definition, evaluation and triggering.
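
To make that split concrete, here is a minimal sketch. It is not Aodh or
Vitrage code: AlarmService, external_evaluator and every other name below
are hypothetical and exist only for illustration. The service side defines,
stores and notifies; the evaluator lives outside and only pushes state
changes in. The 'userdata' payload is stored but never interpreted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alarm:
    alarm_id: str
    name: str
    # State names follow the usual ok / alarm / insufficient data triple.
    state: str = "insufficient data"
    # Opaque payload set by the API user; the service never reads it.
    userdata: Dict[str, str] = field(default_factory=dict)


# A notifier receives the alarm and its previous state.
Notifier = Callable[[Alarm, str], None]


class AlarmService:
    """Hypothetical 'define, store, notify' side (no evaluation here)."""

    def __init__(self) -> None:
        self._alarms: Dict[str, Alarm] = {}
        self._notifiers: List[Notifier] = []

    def add_notifier(self, notifier: Notifier) -> None:
        self._notifiers.append(notifier)

    def define(self, alarm: Alarm) -> None:
        self._alarms[alarm.alarm_id] = alarm

    def set_state(self, alarm_id: str, new_state: str) -> None:
        alarm = self._alarms[alarm_id]
        previous = alarm.state
        if previous == new_state:
            return  # no transition, nothing to notify
        alarm.state = new_state
        for notify in self._notifiers:
            notify(alarm, previous)


def external_evaluator(service: AlarmService, alarm_id: str,
                       nic_is_down: bool) -> None:
    """Hypothetical evaluator living outside the service."""
    service.set_state(alarm_id, "alarm" if nic_is_down else "ok")
```

The only thing the evaluator needs from the service is a way to set the
state; everything it knows about topology stays on its own side.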

I hope that clarifies things a bit more! :-)

Cheers,
-- 
Julien Danjou
/* Free Software hacker
   https://julien.danjou.info */


Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-24 Thread Afek, Ifat (Nokia - IL)
Hi Julien,

Before I reply to everything you wrote, I would like to ask a question that 
seems to be the core issue here.

On 24/01/2017, 12:58, "Julien Danjou" wrote:
> On Tue, Jan 24 2017, Afek, Ifat (Nokia - IL) wrote:
>
> > We understood that Aodh aims to be OpenStack alarming service, which is much
> > more than an ‘engine of alarm evaluation’ (as you wrote in your comment in
> > gerrit).
>
> Well, currently it's not really more than that. We've been to the path
> of "more more more and more" in Ceilometer and I don't think anybody can
> say it had great results – so you can understand how unadventurous and
> cautious we are in adding more things in Aodh.

Vitrage, and I assume that other projects, needs an “Alarm Manager”. The role 
of an Alarm Manager is to store all alarms in the system, keep their history, 
notify on changes, etc.  Vitrage does not declare itself as an Alarm Manager, 
mainly because we understood that this is the role of Aodh. 

From what you wrote, I understand that you do not see Aodh as an Alarm Manager. 
Is this correct? If so, how would you define the Aodh role? Also, if Aodh is 
not meant to serve as an Alarm Manager, where does this functionality belong in 
your opinion? Is there a need for another project for this purpose, or perhaps 
you disagree with the need for such a central alarming repository?

Best Regards,
Ifat.


Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms

2017-01-24 Thread Julien Danjou
On Tue, Jan 24 2017, Afek, Ifat (Nokia - IL) wrote:

Hi Ifat,

> We understood that Aodh aims to be OpenStack alarming service, which is much
> more than an ‘engine of alarm evaluation’ (as you wrote in your comment in
> gerrit).

Well, currently it's not really more than that. We've been down the path
of "more more more and more" in Ceilometer and I don't think anybody can
say it had great results – so you can understand how unadventurous and
cautious we are about adding more things to Aodh.

> If I may describe another use case for generic alarms - of OPNFV
> Doctor: A monitor notifies about an alarm, e.g. a NIC failure. The inspector
> (Vitrage in this case) receives the alarm, understands that the host is
> affected, and raises an alarm on the host.
> This is currently implemented by Vitrage calling nova force-down, and
> Nova sending a notification that is converted to an event and then
> consumed by an Aodh event-alarm.

I don't see why Vitrage must be involved in this scenario. If a
"monitor" sees something, e.g. a NIC failure, it should send an event
stating that, and Aodh could trigger an alarm.
This alarm could call nova force-down, etc…

> As the next phase in Doctor use case, for performance reasons, they might want
> Vitrage to raise alarms also on the instances and applications [3]. We know how
> to raise these alarms, and we can send them directly to a VNFM or another
> consumer. But we thought the right thing to do was to raise these alarms in
> Aodh, and let the VNFM connect to Aodh. This is what I mean by ‘Aodh as the
> alarming service of OpenStack’.

Part of the problem is that Vitrage is a different evaluation engine –
external to Aodh – and wants to use Aodh as a data storage (to store
alarms, metadata and then trigger actions). So since the evaluation
engine (Vitrage) is external to Aodh, it tends to bend Aodh into something
it's not (a data storage for alarms).

You mention performance reasons, but if you really want performance,
the real way to achieve it is to:
Option 1: provide Vitrage functionalities embedded into Aodh as an
 evaluation engine
Option 2: manage Vitrage alarms inside Vitrage directly

It seems Vitrage decided not to pick option 2 because Aodh exists, which
I think is a really good thing. Option 1 has not been picked, not because
of technical issues, but because of the social challenge that it represents.
It means implementing (part of) Vitrage's features in Aodh directly, which
can be complicated as it means joining an existing project. :)

> What do you think about this use case? do you want Aodh to take this role, as
> the place where all OpenStack alarms are gathered and managed?

I think that particular use case is valid, but the way I understand it,
it barely needs Vitrage. It could/should be just Aodh doing this.
(Or maybe I just misunderstood your use case, feel free to explain
further :)

> Now, about the details.
>
> In his first commit, alexey_weyl suggested to add metadata, and you asked him
> to call it ‘userdata’. Personally, I think that metadata is more accurate. It
> is legitimate for an alarm to have additional data, in our example we need to
> hold the resource id and an external alarm id. When you call it userdata, it
> indeed sounds like ‘a user datastore’ (in your words), which is not the purpose
> at all.

The Aodh API is used by _users_. The data set in this
field are set by _users_. Vitrage is a _user_ of the Aodh API. That's
why I think it should be called userdata: Aodh has no use for this
data. It's just an arbitrary payload that Aodh never interprets.

Though it's interesting that you mention it because I think it
highlights how we might differ on how Aodh/Vitrage should interact.
You're on the Vitrage side, so you basically see Aodh as being
completely encompassed and "absorbed" by Vitrage and its
use-cases. I guess it's normal, but that would lead to terrible design
decisions and generally bad UX for Aodh.

I would agree for this field to be metadata, if it was used by Aodh as
metadata used to _evaluate_ the alarm. But that's not the case, unless
you move Vitrage evaluation engine inside Aodh, which could be
interesting, but is a different way of building things.

I hope I made things clearer. :) I have no intention of blocking our
cooperation whatsoever; I'm actually just trying to bring the two
projects closer, as I am not even sure there should be two entirely
distinct projects. But I don't think we should do technical bending to
bypass social or political issues – we've done that before, and it blew
up in our face later.

Cheers,
-- 
Julien Danjou
# Free Software hacker
# https://julien.danjou.info

