Hi all.

We recently started using the special group_by value ['...'], which disables 
grouping, on our Alertmanager 0.20.0 instances.

It is used in our routes, as in this configuration excerpt:

routes:
  - receiver: 'slack_primary'
    group_by: ['...'] # disables grouping
    continue: true
    match_re:
      stack: our_stack
      severity: warning|average|high|disaster

We have alerts that carry a "stack" label and an "environment" label for our 
staging and production clusters. Recently we had a very awkward outage in 
which some clusters went down in *both* environments. Since our current 
message templates expect just one alert per notification, we ended up 
missing the staging alerts in Slack.

Of course I can change the template to iterate over the alerts, but the 
question remains: is this normal behaviour, or should the alerts have been 
sent separately, meaning this is a bug?
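
For reference, the template change I have in mind would look roughly like 
the sketch below. The channel name and the "summary" annotation are 
placeholders rather than our real values, and the fragment leaves out the 
rest of the receiver configuration (API URL and so on):

receivers:
  - name: 'slack_primary'
    slack_configs:
      - channel: '#alerts' # placeholder channel
        # one line per alert in the notification, prefixed with its environment
        text: '{{ range .Alerts }}[{{ .Labels.environment }}] {{ .Labels.alertname }}: {{ .Annotations.summary }}{{ "\n" }}{{ end }}'

That way a notification carrying both the staging and the production alert 
would list each of them instead of showing only one.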

One of the expressions that failed was this one:

envoy_cluster_health_check_healthy{envoy_cluster_name=~"name1|name2|name3"} == 0

We basically had several alerts from the expression above, boiling down to 
these two:

envoy_cluster_health_check_healthy{envoy_cluster_name=<name>, environment="staging"}
envoy_cluster_health_check_healthy{envoy_cluster_name=<name>, environment="production"}

But in Slack only the "production" one was reported, because the two alerts 
were delivered together and the template didn't take that into account.

For now we have split the rule in two, one per environment:

envoy_cluster_health_check_healthy{envoy_cluster_name=~"name1|name2|name3", environment="staging"} == 0

envoy_cluster_health_check_healthy{envoy_cluster_name=~"name1|name2|name3", environment="production"} == 0

Summing up: is this behaviour expected, and should we therefore change the 
templates and/or split the rules by environment?

Thanks in advance,
F.
