Hi Greg,

First, I think that supporting alarm suppression in Vitrage is a very good idea.

One question that I have: I understand that you plan to support it both in 
the UI and in the CLI. Do you want the suppression to be per-user, 
per-tenant, or global?

Regarding adding vitrage_alarm_type, my main concern is how the different 
datasources will fill in this information. A monitor like Zabbix can have a lot of 
different alarms, and we will have to find a way to map them to the different 
alarm types. Aodh could also have its own alarm types, etc. I believe that some 
monitors will not use this property at all, which will cause:

·         No way to suppress some of the alarms by vitrage_alarm_type

·         Empty column in Vitrage alarms list

I think that your second suggestion, of the vitrage_type and regex, could work 
better. Is there any other reason to add the vitrage_alarm_type property, other 
than for suppression purposes?
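
A minimal sketch of how the vitrage_type plus regex option could look, to make 
the comparison concrete. The SuppressionRule class and its matches() method are 
hypothetical names used only for illustration; they are not part of Vitrage. 
Alarms are represented as plain dicts carrying the 'vitrage_type' and 'name' 
fields mentioned in this thread.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SuppressionRule:
    """Hypothetical rule; not an existing Vitrage class."""
    vitrage_type: Optional[str]  # e.g. "zabbix"; None matches any datasource
    name_pattern: str            # regular expression applied to the alarm name

    def matches(self, alarm: dict) -> bool:
        # The datasource must match first (when the rule names one).
        if (self.vitrage_type is not None
                and alarm.get("vitrage_type") != self.vitrage_type):
            return False
        # Then the regex is searched anywhere in the alarm name.
        return re.search(self.name_pattern, alarm.get("name", "")) is not None


rule = SuppressionRule(vitrage_type="zabbix", name_pattern=r"(?i)ntp server")

alarms = [
    {"vitrage_type": "zabbix",
     "name": "Loss of connectivity with NTP Server 10.0.0.1"},
    {"vitrage_type": "zabbix",
     "name": "CPU Threshold exceeded on compute-0"},
    {"vitrage_type": "nagios",
     "name": "NTP server unreachable"},
]

suppressed = [a["name"] for a in alarms if rule.matches(a)]
```

With this rule only the Zabbix NTP alarm is suppressed. No datasource has to 
fill in an extra property, but the user needs to know how each monitor words 
its alarm names.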

Best Regards,
Ifat.


From: "Waines, Greg" <greg.wai...@windriver.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Date: Monday, 4 December 2017 at 15:34
To: "OpenStack Development Mailing List (not for usage questions)" 
<openstack-dev@lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] Feedback on ability to 'suppress' alarms 
by type and/or resource in Vitrage

Thinking about this more ...
·         Any thoughts on adding a ‘vitrage_alarm_type (enum or short string)’ 
as a mechanism to identify the general type of problem or alarm being reported 
in order to address this ?
o   could be an optional field
o   but we’d display in the alarm list
o   and we’d use it as the mechanism to suppress alarms by ‘type’


Other option:

·         wrt specifying which alarms to suppress,

o   could use combination of

§  ‘vitrage_type (enum)’ field - e.g. collectd, nagios, zabbix, vitrage, ...

§  and

§  a regexp on the ‘name (string)’ field

Thoughts ?
Greg.


From: Greg Waines <greg.wai...@windriver.com>
Reply-To: "openstack-dev@lists.openstack.org" 
<openstack-dev@lists.openstack.org>
Date: Friday, December 1, 2017 at 8:45 AM
To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>
Subject: [openstack-dev] [vitrage] Feedback on ability to 'suppress' alarms by 
type and/or resource in Vitrage

Hey,

I am interested in getting some feedback on a proposed blueprint for Vitrage.

BLUEPRINT:

TITLE: Add the ability to ‘suppress’ alarms by Alarm Type and/or Resource

When managing a cloud, there are situations where a particular alarm, or a set 
of alarms from a particular resource, occurs frequently but identifies issues 
that are not of concern, at least for the time being.  For example, new 
hardware is in the process of being installed and is causing alarms, or remote 
servers (e.g. NTP Servers) are unreliable and result in frequent connectivity 
alarms.  In these situations, the irrelevant alarms clutter the alarm displays, 
and it would be helpful to be able to suppress them.

Suppressed alarms would not be shown in Active Alarm lists or Historical Alarm 
lists, and would not be included in alarm counts.
There would be a CLI option / Horizon button to enable viewing the alarms that 
are currently suppressed 
(i.e. the idea is that suppressed alarms would still be tracked; they 
just would not be displayed by default).
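
The intended display behaviour can be sketched roughly as follows. This is only 
an illustration under assumptions: list_alarms(), the only_suppressed flag and 
the set of suppressed resource ids are hypothetical names, not existing Vitrage 
code, and alarms are shown as plain dicts.

```python
from typing import Callable, Iterable, List

Alarm = dict


def list_alarms(alarms: Iterable[Alarm],
                is_suppressed: Callable[[Alarm], bool],
                only_suppressed: bool = False) -> List[Alarm]:
    """Return the alarms to display.

    By default suppressed alarms are hidden; with only_suppressed=True
    (the hypothetical CLI option / Horizon button) only they are shown.
    The input collection is never modified, so every alarm stays tracked.
    """
    return [a for a in alarms if is_suppressed(a) == only_suppressed]


tracked = [
    {"name": "Loss of connectivity with NTP Server",
     "vitrage_resource_id": "host-1"},
    {"name": "CPU Threshold exceeded",
     "vitrage_resource_id": "host-2"},
]

# Assumption for the example: suppression by resource id only.
suppressed_resources = {"host-1"}


def is_suppressed(alarm: Alarm) -> bool:
    return alarm["vitrage_resource_id"] in suppressed_resources


active = list_alarms(tracked, is_suppressed)
hidden = list_alarms(tracked, is_suppressed, only_suppressed=True)
alarm_count = len(active)  # suppressed alarms are not counted
```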

Thoughts on usefulness ?



Questions on how to implement this in Vitrage

·         from an end user’s point of view, alarms have the following fields

o   vitrage_id (uuid) - unique identifier of an instance of an alarm

o   vitrage_type (enum) - e.g. collectd, nagios, zabbix, vitrage, ...
    (really an identifier of the general entity reporting the alarm)

o   name (string) - the alarm description

o   vitrage_resource_type (enum) - e.g. nova.instance, nova.host, port, ...

o   vitrage_resource_id (uuid) - resource instance

o   vitrage_aggregated_severity

o   vitrage_operational_severity

o   update_timestamp

·         there definitely is a specific resource identifier in order to be 
able to suppress alarms from a particular resource

·         BUT there doesn’t seem to be a general alarm type field, 
i.e. one that would classify the type of problem that’s occurring, 
e.g.

o   communication failure with compute host

o   loss-of-signal on port of compute host

o   loss of connectivity with NTP Server

o   CPU Threshold exceeded on compute host

o   Memory Threshold exceeded on compute host

o   File System Threshold exceeded on compute host

o   etc.

·         ... which would be the type/granularity of ‘Alarm Type’ that I would 
think the user would want to suppress alarms by.

·         i.e. it seems like the ‘name’ field is a combination of this general 
Alarm Type and details on the particular alarm.

·         Any thoughts on adding a ‘vitrage_alarm_type (enum or short string)’ 
as a mechanism to identify the general type of problem or alarm being reported 
in order to address this ?

o   could be an optional field

o   but we’d display in the alarm list

o   and we’d use it as the mechanism to suppress alarms by ‘type’
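
As a rough sketch of the proposal, assuming an alarm record with the fields 
listed above plus the optional vitrage_alarm_type: the Alarm dataclass and the 
is_suppressed() helper below are hypothetical illustrations, not Vitrage code, 
and the severity and timestamp fields are left out for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Alarm:
    vitrage_id: str
    vitrage_type: str             # entity reporting the alarm, e.g. "zabbix"
    name: str                     # free-text alarm description
    vitrage_resource_type: str    # e.g. "nova.host"
    vitrage_resource_id: str
    vitrage_alarm_type: Optional[str] = None  # proposed field, optional


def is_suppressed(alarm: Alarm,
                  alarm_types: FrozenSet[str] = frozenset(),
                  resource_ids: FrozenSet[str] = frozenset()) -> bool:
    """Suppress by alarm type and/or by resource."""
    by_type = (alarm.vitrage_alarm_type is not None
               and alarm.vitrage_alarm_type in alarm_types)
    by_resource = alarm.vitrage_resource_id in resource_ids
    return by_type or by_resource


# The type and id values below are made up for the example.
ntp = Alarm("a1", "zabbix", "Loss of connectivity with NTP Server 10.0.0.1",
            "nova.host", "host-1", vitrage_alarm_type="ntp-connectivity")
untyped = Alarm("a2", "nagios", "NTP server unreachable",
                "nova.host", "host-2")  # datasource left the type empty

types = frozenset({"ntp-connectivity"})
ntp_hidden = is_suppressed(ntp, alarm_types=types)
untyped_hidden = is_suppressed(untyped, alarm_types=types)
untyped_hidden_by_resource = is_suppressed(
    untyped, resource_ids=frozenset({"host-2"}))
```

Because the field is optional, an alarm whose datasource leaves it empty can 
only be suppressed by resource in this sketch.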

Let me know what you think.


Greg.







__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
