Hi all,
I have a data model where some metrics are namespaced by client,
environment and deployment name. I am interested in creating a summary of
each deployment, where that summary is based on the number of alerts that
are present for each deployment.
I can get the deployments in the dev, uat, and prod environments using this
query:
group by(tenant, environment, deployment)(up{environment=~"dev|uat|prod"}) - 1
# returns the following by way of example:
{deployment="default",environment="dev",tenant="tenant1"} 0
{deployment="default",environment="prod",tenant="tenant3"} 0
{deployment="default",environment="prod",tenant="tenant2"} 0
{deployment="default",environment="uat",tenant="tenant1"} 0
So tenant1 has two deployments, in two different environments, whereas the
other two tenants have one each. group by returns a value of 1 for every
group, so I subtract 1 to get 0 for each deployment. I now want to add to
this the number of alerts that apply to each deployment.
To get the alerts, I do this:
ALERTS{severity="warning"}
# returns something like this when there is an alert, the details in the
# alert will vary, but will always have the `tenant`, `environment` and
# `deployment` labels
ALERTS{alertname="HostSystemdServiceCrashed",alertstate="firing",instance="example",job="node",deployment="default",environment="dev",tenant="tenant1",name="example.service",severity="warning",state="failed",type="oneshot"} 1
# however, when there are no alerts, I get "no data" returned
I can't work out how to add the alerts to the deployments whilst retaining
the deployments for which there were no alerts returned:
(group by(tenant, environment, deployment)(up{environment=~"dev|uat|prod"}) - 1)
  + on(tenant, environment, deployment) (ALERTS{severity="warning"})
# returns only data for the deployment for which there is an alert
{deployment="default",environment="dev",tenant="tenant1"} 1
# if there are no alerts, I get no data returned at all
What I want as output is this:
{deployment="default",environment="dev",tenant="tenant1"} 1
{deployment="default",environment="uat",tenant="tenant1"} 0
{deployment="default",environment="prod",tenant="tenant2"} 0
{deployment="default",environment="prod",tenant="tenant3"} 0
How can I achieve this?
*NOTE:*
If I use sum with or, then I get this, depending on the order of the
arguments to or:
(group by(tenant, environment, deployment)(up{environment=~"dev|uat|prod"}) - 1)
  or sum by (tenant, environment, deployment) (ALERTS{severity="warning"})
# returns this, note the value in `tenant1|dev|default`
{deployment="default",environment="dev",tenant="tenant1"} 0
{deployment="default",environment="uat",tenant="tenant1"} 0
{deployment="default",environment="prod",tenant="tenant2"} 0
{deployment="default",environment="prod",tenant="tenant3"} 0
If I reverse the order of the parameters to or, I get what I am after:
{deployment="default",environment="dev",tenant="tenant1"} 1
{deployment="default",environment="uat",tenant="tenant1"} 0
{deployment="default",environment="prod",tenant="tenant2"} 0
{deployment="default",environment="prod",tenant="tenant3"} 0
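Written out, the reversed query that produces this output would presumably
be the following (same selectors as above, with only the operands of or
swapped). As far as I understand it, or keeps every series from its
left-hand side and only adds right-hand series whose label set is missing
on the left, which would explain why the order matters:

```promql
# Alert counts first; deployments without alerts fall back to the 0 baseline.
sum by (tenant, environment, deployment) (ALERTS{severity="warning"})
  or (group by(tenant, environment, deployment)(up{environment=~"dev|uat|prod"}) - 1)
```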
But I'm now stuck when I want to apply a weight to alerts of a different
severity level, e.g. (pseudocode):
summary = 0 + sum(warning alerts) + 2 * sum(critical alerts)
My attempts at this give the same result as before: a single series for
the deployment that has alerts, or no data if there are no alerts.
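One direction that might work, as an untested sketch: build a per-severity
count that already falls back to 0 for every deployment, and only then
combine the counts with arithmetic, so that both sides of + carry the full
set of deployments. The weight of 2 comes from the pseudocode above; the
severity="critical" label value is an assumption, and everything else
reuses the selectors already shown.

```promql
# Warning alerts per deployment, 0 where there are none.
(
  sum by (tenant, environment, deployment) (ALERTS{severity="warning"})
    or (group by(tenant, environment, deployment)(up{environment=~"dev|uat|prod"}) - 1)
)
+
# Critical alerts per deployment (assumed label value), weighted by 2,
# 0 where there are none.
2 * (
  sum by (tenant, environment, deployment) (ALERTS{severity="critical"})
    or (group by(tenant, environment, deployment)(up{environment=~"dev|uat|prod"}) - 1)
)
```

A deployment that has alerts of only one severity and no matching up
series could still drop out, since + only returns label sets that are
present on both sides.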
The question is also posted to Stack Overflow, in case someone would prefer
to answer there:
https://stackoverflow.com/questions/64803483/promql-how-to-add-values-when-there-is-no-data-returned
Many thanks,
John