Re: [prometheus-users] Re: How auto-resolved an alarm ?

Julien Pivotto Wed, 22 Jun 2022 05:51:28 -0700

On 22 Jun 05:36, Loïc wrote:
> Thanks, Brian, for your reply.
> 
> In my use case, if I want to include the error log in the generated alarm, I 
> have to add the error message as a label on my metric. The metric created by 
> mtail is: 
> test_dbms_error[$container,$namespace,$pod_name,$domain,$productname,$setname,$message]
> Because the error message is part of the metric, I can't create my sample 
> with value 0 at the start. The content of the error message is 
> taken dynamically from the log, so I can't create the metric sample 
> beforehand. 
> 
> This is why I would like to use an Alertmanager or Prometheus parameter to 
> auto-resolve my rule. Is that not possible?



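This is generally not recommended in Prometheus, but you could do

del test_dbms_error[$container,$namespace,$pod_name,$domain,$productname,$setname,$message] after 5m

in mtail.

Note the "after 5m": mtail then removes that label set once it has gone
5 minutes without an update, so the alert expression stops returning it
and the alert resolves.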
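
To illustrate the effect of the "after" clause on an alert, here is a toy
Python model. It is not mtail's actual implementation, and the class and
method names are made up for this sketch. It assumes that a label set is
dropped once it has been idle for the configured duration, at which point
an alert expression selecting it would return nothing.

```python
from datetime import datetime, timedelta


class ExpiringCounter:
    """Toy counter whose label sets are dropped after a period of inactivity."""

    def __init__(self, ttl):
        self.ttl = ttl          # how long a label set may stay idle
        self.values = {}        # label tuple -> counter value
        self.last_update = {}   # label tuple -> time of last increment

    def inc(self, labels, now):
        """Increment the counter for one label set and record the update time."""
        self.values[labels] = self.values.get(labels, 0) + 1
        self.last_update[labels] = now

    def expire(self, now):
        """Drop every label set that has been idle for at least ttl."""
        idle = [l for l, t in self.last_update.items() if now - t >= self.ttl]
        for labels in idle:
            del self.values[labels]
            del self.last_update[labels]

    def firing(self, now):
        """Label sets still present, i.e. those an alert rule would match."""
        self.expire(now)
        return sorted(self.values)


start = datetime(2022, 6, 22, 12, 0)
errors = ExpiringCounter(ttl=timedelta(minutes=5))
errors.inc(("pod-a", "connection refused"), start)

print(errors.firing(start + timedelta(minutes=1)))  # [('pod-a', 'connection refused')]
print(errors.firing(start + timedelta(minutes=6)))  # []
```

A new matching log line re-creates the label set, so the alert would fire
again. In a real setup the alert resolves a little later than the expiry
itself, because Prometheus only notices the missing series at its next
scrape and rule evaluation.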

> 
> Loïc
> 
> 
> 
> 
> On Wednesday, 22 June 2022 at 12:11:40 UTC+2, Brian Candler wrote:
> 
> > > When my alarm is firing, I would like it to auto-resolve
> >
> > Alerts are generated by a PromQL expression ("expr:").  For as long as 
> > this returns a non-empty instant vector, the alert is firing.  When the 
> > result is empty, the alert stops.
> >
> > For example: I want to get an alert whenever the metric 
> > "megaraid_pd_media_errors" increases by more than 200.  But if it has been 
> > stable for 72 hours, I want the alert to go away.  This is what I do:
> >
> >   - alert: megaraid_pd_media_errors_rate
> >     expr: increase(megaraid_pd_media_errors[72h]) > 200
> >     for: 5m
> >     labels:
> >       severity: warning
> >     annotations:
> >       summary: 'Megaraid Physical Disk media error count increased by {{$value | humanize}} over 72h'
> >
> > Every time the expr is evaluated, it's looking over the most recent 72 
> > hours.  "increase" is like "rate", but its output is scaled up to the time 
> > period in question - i.e. instead of rate per second, it gives rate per 72 
> > hours in this case.
> >
> > > I tried to use the PromQL function rate, but in this case my first 
> > > occurrence is missing. 
> >
> > "rate" (and "increase") calculate the rate between two data points.  If 
> > the timeseries has only one data point, it cannot give a result.  It cannot 
> > assume that the previous data point was zero, because in general that may 
> > not be the case: prometheus could have been started when the counter was 
> > already above zero.
> >
> > You should make your timeseries spring into existence with value 0 at the 
> > start.
> >
> > On Wednesday, 22 June 2022 at 09:27:52 UTC+1 Loïc wrote:
> >
> >> Hi,
> >>
> >> I use the mtail exporter to alert when a pattern matches in the 
> >> Kubernetes logs. When my alarm is firing, I would like it to auto-resolve. I 
> >> looked for how to use the endsAt parameter in my rule, but I didn't find it.
> >>
> >> Also, I tried to use the PromQL function rate, but in this case my first 
> >> occurrence is missing. 
> >>
> >> Do you have any ideas? 
> >>
> >> Thanks 
> >> Loïc
> >>
> >
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/prometheus-users/ee94ec73-8714-46b5-b6cd-1ec1cabcf93en%40googlegroups.com.


-- 
Julien Pivotto
@roidelapluie

