Hi Julien,
If I have no solution at the Prometheus configuration level, I think I will
indeed delete the sample with mtail; that would resolve the alarm. Do you
know if this could cause any problems at the Prometheus level?
Are you using mtail/prometheus with this configuration?
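
For reference, here is roughly what I have in mind in the mtail program. This
is only a sketch: the /ERROR/ pattern and the capture groups stand in for what
my real program already extracts, the 5m expiry is just an example value, and
I am assuming the "del ... after" statement goes in the same action block as
the increment (please correct me if that is wrong):

counter test_dbms_error by container, namespace, pod_name, domain, productname, setname, message

/ERROR/ {
  # count the matching log line under its labels (placeholder pattern)
  test_dbms_error[$container,$namespace,$pod_name,$domain,$productname,$setname,$message]++
  # expire this labelset if it is not updated again within 5m (example value)
  del test_dbms_error[$container,$namespace,$pod_name,$domain,$productname,$setname,$message] after 5m
}
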
Thanks
Loïc
On Wednesday, 22 June 2022 at 14:51:26 UTC+2, Julien Pivotto wrote:
> On 22 Jun 05:36, Loïc wrote:
> > Thanks Brian for your reply.
> >
> > In my use case, if I want to send the error log in the generated alert, I
> > have to add the error message as a label of my metric. The metric created
> > by mtail:
> >
> > test_dbms_error[$container,$namespace,$pod_name,$domain,$productname,$setname,$message]
> >
> > As the error message is present in the metric, I can't create my sample
> > with value 0 at the start. Indeed, the content of the error message is
> > registered dynamically from the log, so I can't create the metric sample
> > beforehand.
> >
> > This is why I would like to use an Alertmanager or Prometheus parameter to
> > auto-resolve my rule. But is that not possible?
>
>
> This is generally not recommended in prometheus, but you could do
>
> del test_dbms_error[$container,$namespace,$pod_name,$domain,$productname,$setname,$message] after 5m
>
> in mtail.
>
> Note the "after 5m"
>
> >
> > Loïc
> >
> >
> >
> >
> > On Wednesday, 22 June 2022 at 12:11:40 UTC+2, Brian Candler wrote:
> >
> > > > When my alarm is firing, i would like auto-resolved it
> > >
> > > Alerts are generated by a PromQL expression ("expr:"). For as long as
> > > this returns a non-empty instance vector, the alert is firing. When the
> > > result is empty, the alert stops.
> > >
> > > For example: I want to get an alert whenever the metric
> > > "megaraid_pd_media_errors" increases by more than 200. But if it has
> > > been stable for 72 hours, I want the alert to go away. This is what I do:
> > >
> > > - alert: megaraid_pd_media_errors_rate
> > >   expr: increase(megaraid_pd_media_errors[72h]) > 200
> > >   for: 5m
> > >   labels:
> > >     severity: warning
> > >   annotations:
> > >     summary: 'Megaraid Physical Disk media error count increased by {{$value | humanize}} over 72h'
> > >
> > > Every time the expr is evaluated, it's looking over the most recent 72
> > > hours. "increase" is like "rate", but its output is scaled up to the
> > > time period in question - i.e. instead of rate per second, it gives rate
> > > per 72 hours in this case.
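> > >
> > > (Roughly speaking: 72 hours is 259200 seconds, so increase(x[72h]) is
> > > approximately rate(x[72h]) * 259200.)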
> > >
> > > > i tried to use the promql function rate but in this case my first
> > > > occurence is missing.
> > >
> > > "rate" (and "increase") calculate the rate between two data points. If
> > > the timeseries has only one data point, it cannot give a result. It
> > > cannot assume that the previous data point was zero, because in general
> > > that may not be the case: prometheus could have been started when the
> > > counter was already above zero.
> > >
> > > You should make your timeseries spring into existence with value 0 at
> > > the start.
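> > >
> > > (This is why your first occurrence is "missing": the very first sample
> > > of a new labelset gives rate()/increase() nothing to subtract from, so
> > > the expression returns no result for it.)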
> > >
> > > On Wednesday, 22 June 2022 at 09:27:52 UTC+1 Loïc wrote:
> > >
> > >> Hi,
> > >>
> > >> I use the mtail exporter to alert when a pattern matches in the
> > >> Kubernetes logs. When my alert is firing, I would like it to
> > >> auto-resolve. I looked for how to use the endsAt parameter in my rule
> > >> but I didn't find it.
> > >>
> > >> Also, I tried to use the PromQL function rate, but in this case my
> > >> first occurrence is missing.
> > >>
> > >> Do you have any ideas?
> > >>
> > >> Thanks
> > >> Loïc
> > >>
> > >
>
> --
> Julien Pivotto
> @roidelapluie
>