Re: [openstack-dev] [Horizon] [UX] Design for Alarming and Alarm Management

Martinez, Christian Tue, 10 Jun 2014 13:08:15 -0700

Here is my feedback regarding the designs:

Page 2:


*         I think that the admin would probably want to filter alarms per user, 
project, name, meter_name, current_alarm_state ("ok" = "alarm ready"; 
"insufficient data" = "alarm not ready"; "alarm" = "alarm triggered"), but we 
don't have all those columns in the table. Maybe it would be better just to add 
columns for those fields, or to have other tables or tabs that allow the 
admin to see the alarms based on those parameters.

*         I would add a "delete alarm" button as a table action

*         Nice to have: if we are thinking about "combining alarms", maybe 
having a "combine alarm" button as table action that gets activated when the 
admin selects two or more alarms.

o   When the button is clicked, it should show something like the "Add Alarm" 
dialog, allowing the user to create a new combined alarm, based on their 
previous alarm selection
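
As a rough sketch of the filtering idea from the first bullet above (not actual 
Horizon code): the AlarmRow class and the filter_alarms helper are hypothetical 
names made up for illustration, and the state labels are simply the ones 
proposed in this mail.

```python
from dataclasses import dataclass

# Labels proposed in this mail for the three alarm states.
STATE_LABELS = {
    "ok": "alarm ready",
    "insufficient data": "alarm not ready",
    "alarm": "alarm triggered",
}


@dataclass
class AlarmRow:
    """Hypothetical row shape; not Horizon's real table class."""
    name: str
    user: str
    project: str
    meter_name: str
    state: str


def filter_alarms(rows, **criteria):
    """Return the rows whose attributes equal every given criterion."""
    return [
        row for row in rows
        if all(getattr(row, field) == value
               for field, value in criteria.items())
    ]


rows = [
    AlarmRow("cpu_high", "alice", "X", "cpu_util", "alarm"),
    AlarmRow("mem_high", "bob", "Y", "memory", "ok"),
]
# Alarms currently triggered on project "X".
triggered_in_x = filter_alarms(rows, project="X", state="alarm")
```

Each keyword passed to filter_alarms would correspond to one filterable column, 
which is why the table needs those columns (or an equivalent tab) first.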

Page 3-5:

*         Love the workflow!

*         A couple of things related to the "Alarm When" setup:

o   Depending on the resource that is "selected" (from page 2) you would have a 
list of the possible meters to be considered. For example, if your resource is 
an instance, you would have the following list of meters: number of instances, 
cpu time used, Average CPU utilization, memory, etc. This will also affect the 
"threshold" unit to be used. In the design, there is a textbox that has a 
percentage label ("%") right next to it. The thing is that this "threshold" 
could be a percentage (for example, CPU utilization), but it could be a flat 
number as well (for example, number of instances on the project).

o   (Related to your point 5) There are two things related to combined alarms 
that we need to consider.

-  1) The combination can be between any type of alarm: you could combine 
alarms associated with different resources, meters, users? (A Ceilometer expert 
will know.) You could even combine combined alarms with other alarms as well. 
The AND and OR operations between the alarms can be used for combined alarms. 
For instance, you can combine two alarms with an OR operator.

-  2) Adding two rules to a single alarm is not supported by 
Ceilometer. For that, you use combined alarms :). The idea of adding triggering 
rules to the alarm creation dialog sounds great to me, but I'm not sure whether 
Ceilometer supports that.
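
To make the two "Alarm When" points above concrete, here is a minimal sketch. 
The unit table is illustrative only (a real UI would look the unit up from the 
meter instead of hard-coding it), the function names are hypothetical, and the 
rule that any "insufficient data" input makes the combined result 
"insufficient data" is an assumption, not confirmed Ceilometer behaviour.

```python
# Illustrative meter -> unit table (an assumption for this sketch).
METER_UNITS = {
    "cpu_util": "%",
    "cpu": "ns",
    "memory": "MB",
    "instance": "instance",
}


def threshold_label(meter_name):
    """Unit to show next to the threshold textbox; empty if unknown."""
    return METER_UNITS.get(meter_name, "")


def combine_states(states, operator):
    """Combine the states of underlying alarms with 'and' or 'or'.

    Assumption: if any underlying alarm reports 'insufficient data',
    the combined alarm reports 'insufficient data' as well.
    """
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    if not states:
        raise ValueError("at least one underlying alarm is required")
    if "insufficient data" in states:
        return "insufficient data"
    fired = [state == "alarm" for state in states]
    triggered = all(fired) if operator == "and" else any(fired)
    return "alarm" if triggered else "ok"
```

Because the result is itself one of the three alarm states, the output of one 
combination can be fed into another, which matches the idea of combining 
combined alarms.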

Page 6:

*         I really liked the way that actions and state could be set, but we 
should see how the notifications will be handled. Maybe these actions could be 
set "by default" in our first version, and we could start thinking about 
custom actions for alarm states in the future (same for the email add-on 
in the user settings).

Page 7:  "Viewing Alarm History" A.K.A: the alarms that have occurred.

*         Same as page 2: I think that the admin would probably want to filter 
alarms per user, project, name, meter_name, etc. (for instance, to see what 
alarms have been triggered on the project "X"), but we don't have those columns 
in the table. Maybe it would be better just to add columns for those fields, or 
to have other tables or tabs that allow the admin to see the alarms based 
on those parameters.

*         Is the alarm date column referring to the date on which the alarm was 
created or the date on which the alarm was triggered?

*         Is the alarm name content a link or plain text? What would happen 
when the admin selects an alarm? Is it going to show the "update alarm dialog"? 
Are there any actions associated with the rows?

*         Maybe change the name of the tab to "Activated alarms" or something 
that is actually interpreted as "in here you can see the alarms that have 
occurred".

Hope it helps

Cheers,
H

From: Liz Blanchard
Sent: Monday, June 9, 2014 2:36 PM
To: Eoghan Glynn
Cc: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Horizon] [UX] Design for Alarming and Alarm 
Management

Hi all,

Thanks again for the great comments on the initial cut of wireframes. I've 
updated them a fair amount based on feedback in this e-mail thread along with 
the feedback written up here:
https://etherpad.openstack.org/p/alarm-management-page-design-discussion

Here is a link to the new version:
http://people.redhat.com/~lsurette/OpenStack/Alarm%20Management%20-%202014-06-05.pdf

And a quick explanation of the updates that I made from the last version:

1) Removed severity.

2) Added Status column. I also added details around the fact that users can 
enable/disable alerts.

3) Updated Alarm creation workflow to include choosing the project and user 
(optionally for filtering the resource list), choosing the resource, and 
choosing the amount of time to monitor for alarming.
     -Perhaps we could be even more sophisticated for how we let users filter 
down to find the right resources that they want to monitor for alarms?

4) As for notifying users...I've updated the "Alarms" section to be "Alarms 
History". The point here is to show any Alarms that have occurred to notify the 
user. Other notification ideas could be to allow users to get notified of 
alerts via e-mail (perhaps a user setting?). I've added a wireframe for this 
update in User Settings. Then the Alarms Management section would just be where 
the user creates, deletes, enables, and disables alarms. Do you still think we 
don't need the "alarms" tab? Perhaps this just becomes iteration 2 and is left 
out for now as you mention in your etherpad.

5) Question about combined alarms...currently I've designed it so that a user 
could create multiple levels in the "Alarm When..." section. They could combine 
these with AND/ORs. Is this going far enough? Or do we actually need to allow 
users to combine Alarms that might watch different resources?

6) I updated the Actions column to have the "More" drop down which is 
consistent with other tables in Horizon.

7) Added in a section in the "Add Alarm" workflow for "Actions after Alarm". 
I'm thinking we could have some sort of If State is X, do X type selections, 
but I'm looking to understand more details about how the backend works for this 
feature. Eoghan gave examples of logging and potentially scaling out via Heat. 
Would simple drop downs support these events?

8) I can definitely add in a "scheduling" feature with respect to Alarms. I 
haven't added it in yet, but I could see this being very useful in future 
revisions of this feature.

9) Another thought is that we could add in some padding for outlier data as 
Eoghan mentioned. Perhaps a setting for "This has happened 3 times over the 
last minute, so now send an alarm."?
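
One way the "3 times over the last minute" setting from point 9 could behave 
is sketched below. This is only an illustration of the idea: BreachWindow and 
its parameters are hypothetical names, timestamps are plain seconds, and this 
is not a description of how the Ceilometer backend evaluates alarms.

```python
from collections import deque


class BreachWindow:
    """Fire only after enough threshold breaches inside a sliding window."""

    def __init__(self, min_breaches=3, window_seconds=60.0):
        self.min_breaches = min_breaches
        self.window_seconds = window_seconds
        self._breaches = deque()  # timestamps of recent breaches

    def record(self, timestamp, breached):
        """Record one evaluation; return True if the alarm should fire."""
        if breached:
            self._breaches.append(timestamp)
        # Forget breaches that are older than the window.
        while (self._breaches
               and timestamp - self._breaches[0] > self.window_seconds):
            self._breaches.popleft()
        return len(self._breaches) >= self.min_breaches
```

A single outlier never reaches the count of three, so it would not send an 
alarm on its own.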

A new round of feedback is of course welcome :)

Best,
Liz

On Jun 4, 2014, at 1:27 PM, Liz Blanchard wrote:


Thanks for the excellent feedback on these, guys! I'll be working on making 
updates over the next week and will send a fresh link out when done. Anyone 
else with feedback, please feel free to fire away.

Best,
Liz
On Jun 4, 2014, at 12:33 PM, Eoghan Glynn wrote:



Hi Liz,

Two further thoughts occurred to me after hitting send on
my previous mail.

First is the concept of alarm dimensioning; see my RDO Ceilometer
getting started guide[1] for an explanation of that notion.

"A key associated concept is the notion of dimensioning which defines the set 
of matching meters that feed into an alarm evaluation. Recall that meters are 
per-resource-instance, so in the simplest case an alarm might be defined over a 
particular meter applied to all resources visible to a particular user. More 
useful, however, would be the option to explicitly select which specific resources 
we're interested in alarming on. On one extreme we would have narrowly 
dimensioned alarms where this selection would have only a single target 
(identified by resource ID). On the other extreme, we'd have widely dimensioned 
alarms where this selection identifies many resources over which the statistic 
is aggregated, for example all instances booted from a particular image or all 
instances with matching user metadata (the latter is how Heat identifies 
autoscaling groups)."

We'd have to think about how that concept is captured in the
UX for alarm creation/update.
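
To illustrate the narrow versus wide dimensioning described in the quote, here
is a small sketch. The sample dictionaries, their field names, and the
dimensioned_average helper are all made up for the example; they are not the
Ceilometer data model.

```python
from statistics import mean


def dimensioned_average(samples, **match):
    """Average 'value' over the samples whose fields equal every match."""
    selected = [
        sample["value"] for sample in samples
        if all(sample.get(key) == wanted for key, wanted in match.items())
    ]
    return mean(selected) if selected else None


samples = [
    {"resource_id": "vm-1", "image": "fedora", "value": 90.0},
    {"resource_id": "vm-2", "image": "fedora", "value": 70.0},
    {"resource_id": "vm-3", "image": "cirros", "value": 10.0},
]
# Narrowly dimensioned: a single target identified by resource ID.
narrow = dimensioned_average(samples, resource_id="vm-1")
# Widely dimensioned: aggregated over all instances from one image.
wide = dimensioned_average(samples, image="fedora")
```

The same threshold could then be compared against either result, which is the
choice the creation UX would need to expose.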

Second, there are a couple of more advanced alarming features
that were added in Icehouse:

1. The ability to constrain alarms on time ranges, such that they
 would only fire say during 9-to-5 on a weekday. This would
 allow for example different autoscaling policies to be applied
 out-of-hours, when resource usage is likely to be cheaper and
 manual remediation less straight-forward.

2. The ability to exclude low-quality datapoints with anomalously
 low sample counts. This allows the leading edge of the trend of
 widely dimensioned alarms not to be skewed by eagerly-reporting
 outliers.
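
Both features can be pictured with a short sketch. It only mirrors the
behaviour described in the two points above: the function names, the
(value, sample_count) pair layout, and the 0.5 cutoff fraction are
assumptions for illustration, not the ceilometer implementation or API.

```python
from datetime import datetime
from statistics import mean


def in_business_hours(moment):
    """True on Monday to Friday between 09:00 and 17:00 (end exclusive)."""
    return moment.weekday() < 5 and 9 <= moment.hour < 17


def drop_low_count(datapoints, fraction=0.5):
    """Drop datapoints whose sample count is far below the average count.

    'datapoints' is a list of (value, sample_count) pairs; 'fraction'
    is a hypothetical cutoff relative to the mean sample count.
    """
    if not datapoints:
        return []
    cutoff = mean(count for _, count in datapoints) * fraction
    return [(value, count) for value, count in datapoints if count >= cutoff]
```

For example, an evaluation at 13:08 on Tuesday 10 Jun 2014 would fall inside
the 9-to-5 weekday constraint, and a datapoint built from one sample would be
dropped when its neighbours were built from ten.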

Perhaps not in a first iteration, but at some point it may make sense
to expose these more advanced features in the UI.

Cheers,
Eoghan

[1] http://openstack.redhat.com/CeilometerQuickStart



----- Original Message -----


Hi Liz,

Looks great!

Some thoughts on the wireframe doc:

* The description of form:

  "If CPU Utilization exceeds 80%, send alarm."

misses the time-window aspect of the alarm definition.

Whereas the boilerplate default descriptions generated by
ceilometer itself:

  "cpu_util > 70.0 during 3 x 600s"

captures this important info.

* The metric names, e.g. "CPU Utilization", are not an exact
match for the meter names used by ceilometer, e.g. "cpu_util".

* Non-admin users can create alarms in ceilometer:

"This is where admins can come in and
 define and edit any alarms they want
 the environment to use."

(though these alarms will only have visibility onto the stats
 that would be accessible to the user on behalf of whom the
 alarm is being evaluated)

* There's no concept currently of alarm severity.

* "Should users be able to enable/dis-able alarms."

Yes, the API allows for disabled (i.e. non-evaluated) alarms.

* "Should users be able to own/assign alarms?"

Only admin users can create an alarm on behalf of another
user/tenant.

* "Should users be able to acknowledge, close alarms?"

No, we have no concept of ACKing an alarm.

* "Admins can also see a full list of all Alarms that have
 taken place in the past."

In ceilometer terminology, we refer to this as alarm history
or alarm change events.

* "CPU Utilization exceeded 80%."

Again good to capture the duration in that description of the
event.

* "Within the Overview section, there should be a new tab that allows the
 user to click and view all Alarms that have occurred in their
 environment."

Not sure really what "environment" means here. Non-admin tenants only
have visibility to their own alarms, whereas admins have visibility to
all alarms.

* "This list would keep the latest  alarms."

Presumably this would be based on querying the alarm-history API,
as opposed to an assumption that Horizon is consuming the actual
alarm notifications?
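
Coming back to the description format mentioned at the top of this list, a
sketch of how a UI could build a description that keeps the time window, in
the same shape as "cpu_util > 70.0 during 3 x 600s". The describe_alarm
function and its parameter names are assumptions for illustration, not
ceilometer code.

```python
# Comparison operator names mapped to the symbols shown to the user.
OPERATOR_SYMBOLS = {
    "lt": "<", "le": "<=", "eq": "==", "ne": "!=", "ge": ">=", "gt": ">",
}


def describe_alarm(meter_name, comparison, threshold,
                   evaluation_periods, period):
    """Build a description that includes the evaluation window."""
    symbol = OPERATOR_SYMBOLS[comparison]
    return (f"{meter_name} {symbol} {threshold} "
            f"during {evaluation_periods} x {period}s")


description = describe_alarm("cpu_util", "gt", 70.0, 3, 600)
```

Using the raw meter name here also sidesteps the mismatch between display
names such as "CPU Utilization" and meter names such as "cpu_util".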

Cheers,
Eoghan

----- Original Message -----

Hi All,

I've recently put together a set of wireframes[1] around Alarm Management
that would support the following blueprint:
https://blueprints.launchpad.net/horizon/+spec/ceilometer-alarm-management-page

If you have a chance it would be great to hear any feedback that folks have
on this direction moving forward with Alarms.

Best,
Liz

[1]
http://people.redhat.com/~lsurette/OpenStack/Alarm%20Management%20-%202014-05-30.pdf

_______________________________________________
OpenStack-dev mailing list
[email protected]<mailto:[email protected]>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
