Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 02/02/2017, 15:43, "gordon chung" wrote:

> On 02/02/17 06:30 AM, Afek, Ifat (Nokia - IL) wrote:
> > I understand. So clearly the use case of Vitrage raising alarms in Aodh is
> > not relevant at the moment.
> > We will have to think it over and see how Panko fits in the use case.
>
> if the use case is that you wanted to store history of Vitrage alarms,
> yes, i believe it's better/easier stored in Panko. i hope i understand
> your requirement now. Aodh does have the ability to create composite
> alarms which are basically an alarm consisting of multiple sub alarms. i
> don't know if that will help you guys?

The history is not what we are looking for. Our idea was to provide visibility of the instances' status, as known by Vitrage, through the Aodh alarms API (so users that use Aodh will be aware of it).

Regarding the composite alarms, I believe that Vitrage templates are more complex, as the condition depends on the resource topology and not only on other alarms.

> i apologise this took so long to clarify.

I thought I should be the one to apologize ;-)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 02/02/17 06:30 AM, Afek, Ifat (Nokia - IL) wrote:
> I understand. So clearly the use case of Vitrage raising alarms in Aodh is
> not relevant at the moment.
> We will have to think it over and see how Panko fits in the use case.

if the use case is that you wanted to store history of Vitrage alarms, yes, i believe it's better/easier stored in Panko. i hope i understand your requirement now. Aodh does have the ability to create composite alarms which are basically an alarm consisting of multiple sub alarms. i don't know if that will help you guys?

i apologise this took so long to clarify.

cheers,

--
gord
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 31/01/2017, 18:43, "gordon chung" wrote:

> On 31/01/17 08:34 AM, Afek, Ifat (Nokia - IL) wrote:
> > If you query Vitrage (or get a notification from Vitrage) and then you
> > query Aodh, then Aodh will not return any additional information. But – if
> > you query only Aodh, you will be aware of the fact that the instances are
> > at risk. Without the integration, you will see that all instances are OK
> > performance-wise, and you might mistakenly conclude that everything else is
> > also fine.
>
> i see. so the proposal was to have Aodh be the place where we collate
> alarms from Aodh AND Vitrage. i agree, that's probably not what Aodh
> should be doing (i'll still push that to Panko)
>
> would a possible workflow be to maybe have Vitrage send alert to Aodh
> and for Aodh to listen to that event and reraise if needed? or if
> vitrage can just reraise, then it can send that event to Panko so we can
> see all information on that resource.

I understand. So clearly the use case of Vitrage raising alarms in Aodh is not relevant at the moment.
We will have to think it over and see how Panko fits in the use case.

Thanks,
Ifat.
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 31/01/17 08:34 AM, Afek, Ifat (Nokia - IL) wrote:
> If you query Vitrage (or get a notification from Vitrage) and then you query
> Aodh, then Aodh will not return any additional information. But – if you
> query only Aodh, you will be aware of the fact that the instances are at
> risk. Without the integration, you will see that all instances are OK
> performance-wise, and you might mistakenly conclude that everything else is
> also fine.

i see. so the proposal was to have Aodh be the place where we collate alarms from Aodh AND Vitrage. i agree, that's probably not what Aodh should be doing (i'll still push that to Panko)

would a possible workflow be to maybe have Vitrage send alert to Aodh and for Aodh to listen to that event and reraise if needed? or if vitrage can just reraise, then it can send that event to Panko so we can see all information on that resource.

my assumption right now is Vitrage itself is listening for a bunch of alerts (from zabbix, etc...) and has a set of 'composite' alarms which when it receives alert x and alert y, it 'deduces' that it should send an alert z?

cheers,

--
gord
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 30/01/2017, 19:11, "gordon chung" wrote:

> On 29/01/17 08:52 AM, Afek, Ifat (Nokia - IL) wrote:
> >
> > Vitrage could be enhanced to become an alarm orchestrator.
> > The question is – do you want Vitrage to be one?
> > And how would you describe the role of an alarm orchestrator/manager?
>
> i don't really have an opinion on the orchestrator role although it
> seems to be leaning that way.
>
> i'll re-ask a question i had earlier since i'm not entirely clear of
> proposal (if it's still relevant):
>
> if we store a vitrage alarm in aodh, what would the use case be for
> querying it? the alarm occurred and vitrage has already sent a
> notification warning. if i were to query aodh, what additional
> information would i be retrieving?

If you query Vitrage (or get a notification from Vitrage) and then you query Aodh, then Aodh will not return any additional information. But – if you query only Aodh, you will be aware of the fact that the instances are at risk. Without the integration, you will see that all instances are OK performance-wise, and you might mistakenly conclude that everything else is also fine.

Did I answer your question?

Ifat.
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 29/01/17 08:52 AM, Afek, Ifat (Nokia - IL) wrote:
> On 26/01/2017, 20:09, "Julien Danjou" wrote:
>
>> On Thu, Jan 26 2017, gordon chung wrote:
>>
>>> On 26/01/17 11:41 AM, Julien Danjou wrote:
>>
>>> and vitrage would be an alarm orchestrator?
>>
>> Yup, something like that. It could be the one driving Zabbix and
>> creating alarms for Zabbix in Aodh when a new host is plugged for
>> example.
>
> Vitrage could be enhanced to become an alarm orchestrator.
> The question is – do you want Vitrage to be one?
> And how would you describe the role of an alarm orchestrator/manager?

i don't really have an opinion on the orchestrator role although it seems to be leaning that way.

i'll re-ask a question i had earlier since i'm not entirely clear of proposal (if it's still relevant):

if we store a vitrage alarm in aodh, what would the use case be for querying it? the alarm occurred and vitrage has already sent a notification warning. if i were to query aodh, what additional information would i be retrieving?

cheers,

--
gord
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 26/01/2017, 20:09, "Julien Danjou" wrote:

> On Thu, Jan 26 2017, gordon chung wrote:
>
> > On 26/01/17 11:41 AM, Julien Danjou wrote:
>
> > and vitrage would be an alarm orchestrator?
>
> Yup, something like that. It could be the one driving Zabbix and
> creating alarms for Zabbix in Aodh when a new host is plugged for
> example.

Vitrage could be enhanced to become an alarm orchestrator.
The question is – do you want Vitrage to be one?
And how would you describe the role of an alarm orchestrator/manager?
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On Thu, Jan 26 2017, gordon chung wrote:

> On 26/01/17 11:41 AM, Julien Danjou wrote:
>> So here's another question then: why wouldn't there be a "zabbix" alarm
>> type in Aodh that could be created by a user (or another program) and
>> that would be triggered by Aodh when Zabbix does something?
>> Which is something that is really like the event alarm mechanism which
>> already exists. Maybe all that's missing is a
>> Zabbix-to-OpenStack-notification converter to have that feature?
>
> and vitrage would be an alarm orchestrator?

Yup, something like that. It could be the one driving Zabbix and creating alarms for Zabbix in Aodh when a new host is plugged for example.

Just thinking out loud. :)

--
Julien Danjou
# Free Software hacker
# https://julien.danjou.info
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 26/01/17 11:41 AM, Julien Danjou wrote:
> So here's another question then: why wouldn't there be a "zabbix" alarm
> type in Aodh that could be created by a user (or another program) and
> that would be triggered by Aodh when Zabbix does something?
> Which is something that is really like the event alarm mechanism which
> already exists. Maybe all that's missing is a
> Zabbix-to-OpenStack-notification converter to have that feature?

and vitrage would be an alarm orchestrator?

--
gord
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On Thu, Jan 26 2017, Afek, Ifat (Nokia - IL) wrote:

> I’ll try to answer your question from a user perspective.

Thanks for your explanation, it helped me a lot to understand how you view things. :)

> Suppose a bridge has a bond of two physical ports, and Zabbix detects a signal
> loss in one of them. This failure has no immediate effect on the host,
> instances or applications, and will not be reflected anywhere in OpenStack.
>
> Vitrage will receive an alarm from Zabbix, identify the instances that will be
> affected if the entire bond fails, and create deduced alarms that they are at
> risk (if the other port fails they will become unreachable). Similarly, it will
> create alarms on the relevant applications.

So when you say "create deduced alarms"… What does it mean? I understand the deduction, but I am not sure what it "creates" – 'cause then you say:

> A user that checks Aodh will see that all alarms are in ‘ok’ state, which might
> be misleading.

Which alarms? Could you be more precise? Where do these alarms come from? Are they created by the users or by Vitrage automatically? If it's a CPU usage alarm on an instance, there's no reason for it to become red.

If I recall correctly what you explained to me a while back, there are alarms created by Vitrage based on some rules, so I imagine these are the ones you talk about?

> The user might determine that everything is ok with the instances that
> Aodh is monitoring. If the user then checks Vitrage, he will see the
> deduced alarms and understand that the instances and the applications
> are at risk.

From what I understood the user can't really check Vitrage (IIRC it does not really have a full API for users yet), right?

> Does it make sense that the user will check Aodh *and* Vitrage? A standard user
> would like to see all of the alarms in one place, no matter which monitor was
> responsible for triggering them.
Yes: it does make sense for the user to check both because of the way Aodh+Vitrage are architected right now. Does it make sense in terms of user experience? I think we both agree that no, it does not. Having a central place for alerting would be awesome.

But does it make sense to force-feed Vitrage alarms and data model into Aodh? I am not sure right now.

If I circle back again to UX, when a user queries Aodh, he only sees alarms he created and manages. With generic alarms, the way it's pushed right now, there's going to be a bunch of generic things the user has barely any clue about, doing things he has no idea about – because he can't really do anything on Vitrage.

And even if Vitrage had an API to manipulate the rules and all (I can easily imagine it's in the roadmap) that means he would manipulate deduction rules on the Vitrage API and then see things magically happen in his Aodh account. I find that… weird. It sounds very prone to failure and to Aodh and Vitrage getting out of sync.

Let's imagine another scenario/solution (which I am *not* advocating, it's just an exercise for thought): Vitrage would store its alarms (defined and created based on its rules) in a database. It would then offer access to it to Aodh (e.g. via an HTTP API). Then Aodh could query it. For example, when a user asks Aodh to list the alarms, Aodh would return the alarms that are stored in its own database (created by the user) and would also query Vitrage to return the list of alarms created by Vitrage rules (and their deduced state).

What's the point of such a design? Well, it's less prone to out-of-sync-ness and does not force any data model on Aodh that it has no use for. It also solves the problem of "having a central listing of alarms" for the user – the user does not have to be aware of Vitrage.

Is it a good technical design? Probably not. It seems weird to make Aodh a bridge to Vitrage.
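A minimal sketch of the thought-exercise design above, assuming alarms are plain dicts and that Vitrage exposes some alarm-listing call. `list_alarms` and `fetch_vitrage_alarms` are hypothetical names for illustration; neither is an existing Aodh or Vitrage API, and the fallback on connection failure is an added assumption, not something the thread specifies.

```python
def list_alarms(own_alarms, fetch_vitrage_alarms):
    """Return user-created alarms plus the alarms deduced by Vitrage.

    own_alarms: alarms stored in Aodh's own database (plain dicts here).
    fetch_vitrage_alarms: hypothetical callable standing in for an HTTP
    query to a Vitrage alarm-listing endpoint.
    """
    merged = [dict(alarm, source="aodh") for alarm in own_alarms]
    try:
        merged += [dict(alarm, source="vitrage")
                   for alarm in fetch_vitrage_alarms()]
    except ConnectionError:
        # Assumption: if Vitrage is unreachable, still return what Aodh knows.
        pass
    return merged


# Usage with stand-in data; nothing is copied into Aodh's own store,
# so there is no second copy that could drift out of sync.
aodh_db = [{"name": "cpu_high", "state": "ok"}]


def fake_vitrage():
    return [{"name": "instance_at_risk", "state": "alarm"}]


alarms = list_alarms(aodh_db, fake_vitrage)
print([(a["name"], a["source"]) for a in alarms])
```

The point of the sketch is only that the Vitrage alarms are fetched at query time rather than written into Aodh, which is what makes this variant less prone to the two services disagreeing.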
And I think that's the whole thing I am not liking about the current proposal and the one I just invented. The way Aodh and Vitrage are bridged, the way Vitrage is built on top and outside of Aodh right now feels wobbly to me.

So here's another question then: why wouldn't there be a "zabbix" alarm type in Aodh that could be created by a user (or another program) and that would be triggered by Aodh when Zabbix does something? Which is something that is really like the event alarm mechanism which already exists. Maybe all that's missing is a Zabbix-to-OpenStack-notification converter to have that feature?

I'll stop here for now to let you reply, or my mail is going to be way too long lol.

> And a side note – you said that Aodh and Zabbix are exactly the same. I agree.
> You can implement in Aodh everything that is implemented in Zabbix. But why do
> that instead of just using the alarms that are already created by another
> monitor?

Oh, no point; I was just making a point to be sure we were on the same page in terms of understanding, and it seems we are. :)

> Well… is this awesome enough? ;-)

Yes thanks, I think this is a good example that will help us
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 25/01/2017, 17:12, "Julien Danjou" wrote:

> On Wed, Jan 25 2017, Afek, Ifat (Nokia - IL) wrote:
>
> To circle back to the original point, the main question that I asked and
> started this thread is: why, why Aodh should store Vitrage alarms? What
> are the advantages, for both Aodh and Vitrage?
>
> So far the only answer I read is "well we thought Aodh would be a central
> storage place for alarm". So far it seems it has more drawbacks than
> benefits: worse performance for Vitrage, confusion for users and more
> complexity in Aodh.
>
> As I already said, I'm trying to be really objective on this. I just
> really want someone to explain to me how awesome this will be and why we
> should totally go toward this direction. :-)

I’ll try to answer your question from a user perspective.

Suppose a bridge has a bond of two physical ports, and Zabbix detects a signal loss in one of them. This failure has no immediate effect on the host, instances or applications, and will not be reflected anywhere in OpenStack.

Vitrage will receive an alarm from Zabbix, identify the instances that will be affected if the entire bond fails, and create deduced alarms that they are at risk (if the other port fails they will become unreachable). Similarly, it will create alarms on the relevant applications.

A user that checks Aodh will see that all alarms are in ‘ok’ state, which might be misleading. The user might determine that everything is ok with the instances that Aodh is monitoring. If the user then checks Vitrage, he will see the deduced alarms and understand that the instances and the applications are at risk.

Does it make sense that the user will check Aodh *and* Vitrage? A standard user would like to see all of the alarms in one place, no matter which monitor was responsible for triggering them.

And a side note – you said that Aodh and Zabbix are exactly the same. I agree. You can implement in Aodh everything that is implemented in Zabbix.
But why do that instead of just using the alarms that are already created by another monitor?

Well… is this awesome enough? ;-)

Ifat.
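A minimal sketch of the bond example, assuming the topology is already known as plain Python data. The function name, the `at_risk`/`unreachable` labels and the data layout are hypothetical illustrations, not Vitrage's actual templates or data model.

```python
def deduce_alarms(bond_ports, port_status, instances_on_bond):
    """Deduce instance alarms from the state of the ports in one bond.

    bond_ports: names of the physical ports forming the bond.
    port_status: mapping of port name to "up" or "down" (as a monitor
    such as Zabbix might report it).
    instances_on_bond: instances whose connectivity depends on the bond.
    """
    failed = [port for port in bond_ports if port_status[port] == "down"]
    if not failed:
        return []
    # One port down: instances still work but have lost redundancy.
    # All ports down: the instances can no longer be reached.
    state = "at_risk" if len(failed) < len(bond_ports) else "unreachable"
    return [
        {"resource": instance, "alarm": state, "caused_by": failed}
        for instance in instances_on_bond
    ]


deduced = deduce_alarms(
    bond_ports=["eth0", "eth1"],
    port_status={"eth0": "down", "eth1": "up"},
    instances_on_bond=["vm-1", "vm-2"],
)
print(deduced)
```

With one of the two ports down, both instances get an `at_risk` alarm even though nothing in their own metrics has changed, which is the situation a user looking only at threshold alarms would miss.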
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On Wed, Jan 25 2017, Afek, Ifat (Nokia - IL) wrote:

> As we see it, alarms can be generated by different sources – Aodh, Vitrage,
> Nagios, Zabbix, etc.

I think "generated" is the wrong word here. Aodh does not generate any alarms: it allows users to create them. And then it evaluates them and triggers them.

Nagios and Zabbix do *exactly* the same thing: users define alarms and they are evaluated and triggered by Nagios/Zabbix.

The particularity of Aodh is that it does not gather or store data itself (as Nagios and Zabbix do) but only handles the definition and evaluation of alarms.

So you can implement what Nagios and Zabbix do in Aodh. And you could use Nagios instead of Aodh (except that it has no REST API so…).

Vitrage seems to me to be a middle man, which indeed seems to *generate* (create) alarms based on things it sees triggered by Nagios, Zabbix or Aodh. IIUC.

> Each source has its own expertise and internal
> implementation. Nagios and Zabbix can raise alarms about the physical layer,
> Aodh can raise threshold alarms and event alarms, and Vitrage can raise deduced
> alarms (e.g. if there is an alarm on a host, Vitrage will raise alarms on the
> relevant instances and applications). I would prefer that you view Vitrage the
> way you view Zabbix, as a project that has a way of evaluating some kinds of
> problems in the system, and notify about them.

This "specialization" you describe is entirely artificial. Aodh can trigger alarms on the physical layer. It already does if you monitor your hardware with e.g. SNMP or IPMI, put data in Gnocchi and create alarm rules based on those metrics. And it could be extended to do more (that'd be cool :)

What Vitrage does is use the existing software that might be (already) deployed (Nagios, Zabbix) and consolidate things.

> The question is should there be a central place that provides information about
> *all* alarms gathered in the system, and this includes an API, database,
> notification mechanism and history.
> We can implement these in Vitrage (as we
> already integrate with different datasources and monitors), but we always had
> in mind that this is part of Aodh project definition.

I don't see in the case of Vitrage why alarms should be stored by Aodh and not by Nagios, for example. What's the rationale?

To circle back to the original point, the main question that I asked and started this thread is: why, why Aodh should store Vitrage alarms? What are the advantages, for both Aodh and Vitrage?

So far the only answer I read is "well we thought Aodh would be a central storage place for alarm". So far it seems it has more drawbacks than benefits: worse performance for Vitrage, confusion for users and more complexity in Aodh.

As I already said, I'm trying to be really objective on this. I just really want someone to explain to me how awesome this will be and why we should totally go toward this direction. :-)

Cheers,

--
Julien Danjou
;; Free Software hacker
;; https://julien.danjou.info
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 25/01/17 08:39 AM, Afek, Ifat (Nokia - IL) wrote:
> As we see it, alarms can be generated by different sources – Aodh, Vitrage,
> Nagios, Zabbix, etc. Each source has its own expertise and internal
> implementation. Nagios and Zabbix can raise alarms about the physical layer,
> Aodh can raise threshold alarms and event alarms, and Vitrage can raise
> deduced alarms (e.g. if there is an alarm on a host, Vitrage will raise
> alarms on the relevant instances and applications). I would prefer that you
> view Vitrage the way you view Zabbix, as a project that has a way of
> evaluating some kinds of problems in the system, and notify about them.

so the purpose of 'generic alarms' proposal was just to 'log' the alarm from vitrage in a central place? tbh, i don't know if that's what we want to store in aodh. i think it should ideally be handling active alarms, not past alarms.

if we store a vitrage alarm in aodh, what would the use case be for querying it? the alarm occurred and vitrage has already sent a notification warning. if i were to query aodh, what additional information would i be retrieving? it would seem much more useful to send that information to panko so you can see that alarm event with other past events relating to the resource.

cheers,

--
gord
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
Hi,

Alarm history and a database are definitely important, but they are not the main issue here.

As we see it, alarms can be generated by different sources – Aodh, Vitrage, Nagios, Zabbix, etc. Each source has its own expertise and internal implementation. Nagios and Zabbix can raise alarms about the physical layer, Aodh can raise threshold alarms and event alarms, and Vitrage can raise deduced alarms (e.g. if there is an alarm on a host, Vitrage will raise alarms on the relevant instances and applications). I would prefer that you view Vitrage the way you view Zabbix, as a project that has a way of evaluating some kinds of problems in the system, and notify about them.

The question is should there be a central place that provides information about *all* alarms gathered in the system, and this includes an API, database, notification mechanism and history. We can implement these in Vitrage (as we already integrate with different datasources and monitors), but we always had in mind that this is part of Aodh project definition.

What do you say?

Best Regards,
Ifat.

On 25/01/2017, 13:19, "Julien Danjou" wrote:

> On Tue, Jan 24 2017, gordon chung wrote:
>
> > you mean, keep alarm history in aodh and also in panko if needed? i'm ok
> > with that.
>
> Yeah, IIRC there's an expirer in Aodh for alarm history based on TTL –
> that's enough. That should probably be replaced with just a hard limit
> on the number of history items you have (e.g. 100), with the older ones
> being dropped when the limit is hit.
>
> And if somebody wants a full audit control of what's done, Panko is the
> way to go (you know, bread crumbs ;-).
>
> --
> Julien Danjou
> -- Free Software hacker
> -- https://julien.danjou.info
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On Tue, Jan 24 2017, gordon chung wrote:

> you mean, keep alarm history in aodh and also in panko if needed? i'm ok
> with that.

Yeah, IIRC there's an expirer in Aodh for alarm history based on TTL – that's enough. That should probably be replaced with just a hard limit on the number of history items you have (e.g. 100), with the older ones being dropped when the limit is hit.

And if somebody wants a full audit control of what's done, Panko is the way to go (you know, bread crumbs ;-).

--
Julien Danjou
-- Free Software hacker
-- https://julien.danjou.info
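A minimal sketch of the hard-limit idea, assuming history items are kept per alarm in memory. `BoundedAlarmHistory` is a hypothetical illustration, not Aodh's actual expirer or storage code.

```python
from collections import deque


class BoundedAlarmHistory:
    """Hypothetical per-alarm history capped at a fixed number of items."""

    def __init__(self, limit=100):
        # A deque with maxlen discards the oldest entry once it is full,
        # so no separate TTL-based expirer is needed.
        self._items = deque(maxlen=limit)

    def record(self, change):
        self._items.append(change)

    def items(self):
        # Oldest first, newest last.
        return list(self._items)


history = BoundedAlarmHistory(limit=3)
for state in ["ok", "alarm", "ok", "alarm"]:
    history.record({"state": state})

# Only the three most recent changes are kept; the first "ok" was dropped.
print([change["state"] for change in history.items()])
```

Anything that needs the full audit trail would, as suggested above, be read from Panko rather than from this bounded history.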
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 24/01/17 03:05 PM, Julien Danjou wrote:
> I think Aodh emits notifications when something happens so it can be in
> Panko indeed. I don't think it'd be fair to force Panko to have (a
> recent) history though. :)

i'm going to add a work item (for anyone): allow multiple notification topics on alarmchange... i actually have no idea what is consuming those alarm change notifications currently.

you mean, keep alarm history in aodh and also in panko if needed? i'm ok with that.

cheers,

--
gord
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On Tue, Jan 24 2017, gordon chung wrote:

> just curious, why doesn't vitrage send an event to aodh (on the error
> topic) in this case rather than get nova to do it? if you created an
> event alarm in aodh to check for vitrage error events could it solve the
> use case? i don't know if we store the event details but i imagine we
> could? or we could store the event id which can be linked to Panko
> (Event storage) for full metadata information?

I imagine they are or they could be forwarded as a payload when triggering the alarm action?

> tbh, i think the alarm history should be in panko since it seems like a
> pretty common use case to correlate an alarm event with the other events
> in the system.

I think Aodh emits notifications when something happens so it can be in Panko indeed. I don't think it'd be fair to force Panko to have (a recent) history though. :)

--
Julien Danjou
# Free Software hacker
# https://julien.danjou.info
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On 24/01/17 03:01 AM, Afek, Ifat (Nokia - IL) wrote:
> We understood that Aodh aims to be OpenStack alarming service, which is much
> more than an ‘engine of alarm evaluation’ (as you wrote in your comment in
> gerrit). If I may describe another use case for generic alarms - of OPNFV
> Doctor: A monitor notifies about an alarm, e.g. a NIC failure. The inspector
> (Vitrage in this case) receives the alarm, understands that the host is
> affected, and raises an alarm on the host. This is currently implemented by
> Vitrage calling nova force-down, and Nova sending a notification that is
> converted to an event and then consumed by an Aodh event-alarm.

just curious, why doesn't vitrage send an event to aodh (on the error topic) in this case rather than get nova to do it? if you created an event alarm in aodh to check for vitrage error events could it solve the use case? i don't know if we store the event details but i imagine we could? or we could store the event id which can be linked to Panko (Event storage) for full metadata information?

tbh, i think the alarm history should be in panko since it seems like a pretty common use case to correlate an alarm event with the other events in the system.

> In his first commit, alexey_weyl suggested to add metadata, and you asked him
> to call it ‘userdata’. Personally, I think that metadata is more accurate. It
> is legitimate for an alarm to have additional data, in our example we need to
> hold the resource id and an external alarm id. When you call it userdata, it
> indeed sounds like ‘a user datastore’ (in your words), which is not the
> purpose at all.
> How about renaming it back to metadata? and how about adding it only to the
> generic alarm, instead of to all alarms?

i had no idea what 'userdata' field was... i'd much prefer it be 'metadata' even though it's a bit ambiguous.
cheers,

--
gord
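A rough sketch of what the event-alarm suggestion above could look like as an alarm definition. The field names (`type`, `event_rule`, `event_type`, `query`, `alarm_actions`) follow the Aodh v2 event alarm format as commonly documented and should be checked against the Aodh API reference; the event type `vitrage.alarm.error`, the trait name and the webhook URL are purely hypothetical, since the thread does not say what events Vitrage would emit.

```python
import json


def build_event_alarm(name, event_type, resource_id, webhook_url):
    """Build the request body for an event alarm (a sketch, not verified
    against a running Aodh)."""
    return {
        "name": name,
        "type": "event",
        "event_rule": {
            # Assumption: Vitrage would emit notifications of this type.
            "event_type": event_type,
            "query": [
                {
                    # Hypothetical trait carrying the affected resource.
                    "field": "traits.resource_id",
                    "type": "string",
                    "op": "eq",
                    "value": resource_id,
                }
            ],
        },
        "alarm_actions": [webhook_url],
    }


body = build_event_alarm(
    name="host-error-from-vitrage",
    event_type="vitrage.alarm.error",           # hypothetical event type
    resource_id="host-1",                       # hypothetical resource id
    webhook_url="http://consumer.example/hook", # hypothetical consumer
)
print(json.dumps(body, indent=2))
```

With a definition like this the evaluation stays in Aodh's existing event alarm mechanism, and Vitrage only has to emit the event rather than go through nova force-down.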
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On Tue, Jan 24 2017, Afek, Ifat (Nokia - IL) wrote:

> Vitrage, and I assume that other projects, needs an “Alarm Manager”. The role
> of an Alarm Manager is to store all alarms in the system, keep their history,
> notify on changes, etc. Vitrage does not declare itself as an Alarm Manager,
> mainly because we understood that this is the role of Aodh.

What you describe is a database. There are tons of databases out there that you can use to store data. Define a model and Vitrage can have its own data storage for alarms, metadata, whatever you want. Plus, it will be way more performant than using Aodh! :-)

Creating alarms via the Aodh API just to make it store the alarms in a SQL database is kinda… pointless. Let Vitrage just use a database directly. Because in that case what are the perks of using Aodh, except saying that it uses Aodh?

But indeed, I think what you describe is some kind of centralizer database. I don't think being a database has any interest for Aodh (nor for Vitrage).

The only upside of using Aodh instead of a database directly would be to make alarms readable by the user. But that therefore exposes Vitrage's internal datastore. And since users should not directly manipulate alarms created by Vitrage, I don't see any gain in that either.

Aodh is not /just/ an alarm data store. Its real features are in the evaluators and notifiers.

So the whole "generic" alarm approach is about keeping the "define and store and notify alarms" part in Aodh while having the "evaluate/trigger alarms" part outside Aodh (in this case in Vitrage).

As I already said a while back, I think it's OK to have that and externalize the evaluator. But you also have to keep in mind the original use case and design of Aodh: being a user-accessible API that provides alarm definition, evaluation and triggering.

I hope that enlightens things a bit more!
:-)

Cheers,

--
Julien Danjou
/* Free Software hacker
   https://julien.danjou.info */
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
Hi Julien,

Before I reply to everything you wrote, I would like to ask a question that seems to be the core issue here.

On 24/01/2017, 12:58, "Julien Danjou" wrote:

> On Tue, Jan 24 2017, Afek, Ifat (Nokia - IL) wrote:
>
> > We understood that Aodh aims to be OpenStack alarming service, which is much
> > more than an ‘engine of alarm evaluation’ (as you wrote in your comment in
> > gerrit).
>
> Well, currently it's not really more than that. We've been to the path
> of "more more more and more" in Ceilometer and I don't think anybody can
> say it had great results – so you can understand how unadventurous and
> cautious we are in adding more things in Aodh.

Vitrage, and I assume that other projects, needs an “Alarm Manager”. The role of an Alarm Manager is to store all alarms in the system, keep their history, notify on changes, etc. Vitrage does not declare itself as an Alarm Manager, mainly because we understood that this is the role of Aodh.

From what you wrote, I understand that you do not see Aodh as an Alarm Manager. Is this correct? If so, how would you define the Aodh role?

Also, if Aodh is not meant to serve as an Alarm Manager, where does this functionality belong in your opinion? Is there a need for another project for this purpose, or perhaps you disagree with the need for such a central alarming repository?

Best Regards,
Ifat.
Re: [openstack-dev] [aodh][vitrage] Aodh generic alarms
On Tue, Jan 24 2017, Afek, Ifat (Nokia - IL) wrote:

Hi Ifat,

> We understood that Aodh aims to be OpenStack alarming service, which is much
> more than an ‘engine of alarm evaluation’ (as you wrote in your comment in
> gerrit).

Well, currently it's not really more than that. We've been down the path of "more more more and more" in Ceilometer and I don't think anybody can say it had great results, so you can understand how unadventurous and cautious we are in adding more things in Aodh.

> If I may describe another use case for generic alarms - of OPNFV
> Doctor: A monitor notifies about an alarm, e.g. a NIC failure. The inspector
> (Vitrage in this case) receives the alarm, understands that the host is
> affected, and raises an alarm on the host.
> This is currently implemented by Vitrage calling nova force-down, and
> Nova sending a notification that is converted to an event and then
> consumed by an Aodh event-alarm.

I don't see why Vitrage must be involved in this scenario. If a "monitor" sees something, e.g. a NIC failure, it should send an event stating that and Aodh could trigger an alarm. This alarm could call nova force-down, etc…

> As the next phase in Doctor use case, for performance reasons, they might want
> Vitrage to raise alarms also on the instances and applications [3]. We know how
> to raise these alarms, and we can send them directly to a VNFM or another
> consumer. But we thought the right thing to do was to raise these alarms in
> Aodh, and let the VNFM connect to Aodh. This is what I mean by ‘Aodh as the
> alarming service of OpenStack’.

Part of the problem is that Vitrage is a different evaluation engine (external to Aodh) and wants to use Aodh as a data storage (to store alarms and metadata, and then trigger actions). So since the evaluation engine (Vitrage) is external to Aodh, it tends to bend Aodh into something it's not (a data storage for alarms).
You mention performance reasons, but if you really want performance, the real way to achieve it is to:

Option 1: provide Vitrage functionalities embedded into Aodh as an evaluation engine
Option 2: manage Vitrage alarms inside Vitrage directly

It seems Vitrage decided not to pick option 2 because Aodh exists, which I think is a really good thing. Option 1 has not been picked, not based on technical issues, but on the social challenge that it represents. It means implementing (part of) Vitrage features in Aodh directly, which can be complicated as it means joining an existing project. :)

> What do you think about this use case? do you want Aodh to take this role, as
> the place where all OpenStack alarms are gathered and managed?

I think that particular use case is valid, but the way I understand it, it barely needs Vitrage. It could/should be just Aodh doing this. (Or maybe I just misunderstood your use case, feel free to explain further :)

> Now, about the details.
>
> In his first commit, alexey_weyl suggested to add metadata, and you asked him
> to call it ‘userdata’. Personally, I think that metadata is more accurate. It
> is legitimate for an alarm to have additional data, in our example we need to
> hold the resource id and an external alarm id. When you call it userdata, it
> indeed sounds like ‘a user datastore’ (in your words), which is not the
> purpose at all.

The Aodh API is used by _users_. The data that are set in this field are set by _users_. Vitrage is a _user_ of the Aodh API. That's why I think they should be called userdata: Aodh has no use for this data. It's just a random payload that has no usage for Aodh.

Though it's interesting that you mention it because I think it highlights how we might differ on how Aodh/Vitrage should interact. You're on the Vitrage side, so you basically see Aodh as being completely encompassed and "absorbed" by Vitrage and its use-cases.
I guess it's normal, but that would lead to terrible design decisions and generally bad UX for Aodh.

I would agree for this field to be metadata if it was used by Aodh as metadata to _evaluate_ the alarm. But that's not the case, unless you move the Vitrage evaluation engine inside Aodh, which could be interesting, but is a different way of building things.

I hope I made things clearer. :)

I have no intention of blocking our cooperation whatsoever; I'm actually just trying to bring the two projects closer, as I am not even sure there should be two entirely distinct projects. But I don't think we should do technical bending to bypass social or political issues: we've done that before, and it blew up in our face later.

Cheers,
-- 
Julien Danjou
# Free Software hacker
# https://julien.danjou.info
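Two points from the message above, a monitor event triggering an alarm action directly and userdata being a payload the alarm service never interprets, can be illustrated with a small sketch. All names (`EventAlarm`, `dispatch`, the `monitor.nic.failure` event type, the `external_alarm_id` key) are hypothetical and are not taken from Aodh or Vitrage.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class EventAlarm:
    """Hypothetical event alarm: fires its action on a matching event type."""
    name: str
    event_type: str
    action: Callable[[Dict[str, Any], Dict[str, Any]], None]
    # Opaque payload set by the API user; stored and passed along only.
    userdata: Dict[str, Any] = field(default_factory=dict)


def dispatch(event: Dict[str, Any], alarms: List[EventAlarm]) -> List[str]:
    """Fire every alarm whose event_type matches; return the fired names."""
    fired = []
    for alarm in alarms:
        # Matching looks only at the event type, never at userdata.
        if alarm.event_type == event["event_type"]:
            alarm.action(event, alarm.userdata)
            fired.append(alarm.name)
    return fired


forced_down: List[str] = []


def force_down(event: Dict[str, Any], userdata: Dict[str, Any]) -> None:
    # Stand-in for an action such as calling nova force-down on the host.
    forced_down.append(event["host"])


alarms = [
    EventAlarm(
        name="nic-failure",
        event_type="monitor.nic.failure",
        action=force_down,
        userdata={"external_alarm_id": "abc"},
    )
]
```

Because `dispatch` never reads `userdata`, changing that payload cannot change which alarms fire, which is the distinction the message draws between userdata and metadata used for evaluation.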