Hey again,

By "rate of errors", do you mean the ratio of errors to the total number of 
requests? If you just want the rate (the number of errors per second), you 
can replace `increase` with `rate`. This gives you the per-second error 
rate, averaged over the last 5 minutes.
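
As a sketch, this is the `increase` query from your edit with only the function swapped for `rate` (selector and grouping labels are taken unchanged from your message):

```promql
sum by (fqdn, instance, app) (
  rate(windows_iis_worker_request_errors_total{status_code!="401"}[5m])
)
```

The result is in errors per second, so it will normally not be an integer, and a threshold chosen for a 5-minute count has to be divided by 300 (for example, 3 errors in 5 minutes corresponds to 0.01 errors per second).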

How does the label mismatch manifest itself? Is it just the label names, or 
do the values differ as well? Can you post the relevant labels of both 
metrics?
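
If it turns out that the error metric only carries extra labels (for example `status_code`) that the total does not, one possible approach is to aggregate both sides down to the labels they share before dividing. This is only a sketch: `windows_iis_worker_requests_total` is my assumption for the name of the total-requests metric, and `fqdn`, `instance` and `app` are taken from your alert, so adjust both to whatever your exporter actually exposes.

```promql
# Errors per second, excluding 401s, per (fqdn, instance, app)
  sum by (fqdn, instance, app) (
    rate(windows_iis_worker_request_errors_total{status_code!="401"}[5m])
  )
/
# Total requests per second, grouped by the same labels
# (metric name is an assumption)
  sum by (fqdn, instance, app) (
    rate(windows_iis_worker_requests_total[5m])
  )
```

This would yield the fraction of requests that ended in an error over the last 5 minutes, so the alert threshold becomes a ratio (for example `> 0.05` for 5%). If the label names themselves differ between the two metrics, `label_replace` can be used to rename one side before dividing.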

[email protected] wrote on Wednesday, 21 October 2020 at 10:28:10 UTC+2:

> # Calculates HTTP error responses total
>   - record: windows:windows_iis_worker_request_errors_total:irate5m
>     expr: irate(windows_iis_worker_request_errors_total[5m])
>
>   - alert: IIS error requests rate
>     expr: sum without () (rate(windows:windows_iis_worker_request_errors_total:irate5m{status_code!="401"}[5m])) > 3
>     for: 5m
>     labels:
>       severity: critical
>       component: WindowsOS
>     annotations:
>       summary: "High IIS worker error rate"
>       description: "IIS HTTP responses on {{ if $labels.fqdn }}{{ $labels.fqdn }}{{ else }}{{ $labels.instance }}{{ end }} for {{ $labels.app }} have a high rate of errors."
>       dashboard:
>       runbook:
>
> I'm trying to do something like this to alert when people are getting 
> errors while trying to connect to a web app. The issue is that the query 
> 'windows_iis_worker_request_errors_total:irate5m' returns non-integer 
> values.
>
> The idea was to evaluate the number of errors over a rolling 5-minute 
> window.
>
> Of course, in an ideal world I'd alert on the rate of errors by dividing 
> by the total requests metric. However, the two metrics have a label 
> mismatch and I am unsure how to write that query.
>
> Would really appreciate any assistance!
>
> edit:
>
> Someone in the Prometheus developer group provided me with the following 
> query, which does work:
>
> sum by (fqdn, instance, app) 
> (increase(windows_iis_worker_request_errors_total{status_code!="401"}[5m]))
>
> However I was wondering if someone would still know how to get a query 
> working on the rate of errors rather than the increase in count despite the 
> label mismatch between the IIS total requests and IIS error request metrics.
>
