Hi Stuart.

On Wed, 25 Nov, 2020, 6:56 pm Stuart Clark, <[email protected]> wrote:
> On 25/11/2020 11:46, [email protected] wrote:
> > The alert formation doesn't seem to be a problem here, because it
> > happens for different alerts randomly. Below is the alert for the
> > exporter being down, for which it has happened thrice today.
> >
> >   - alert: ExporterDown
> >     expr: up == 0
> >     for: 10m
> >     labels:
> >       severity: "CRITICAL"
> >     annotations:
> >       summary: "Exporter down on *{{ $labels.instance }}*"
> >       description: "Not able to fetch application metrics from *{{ $labels.instance }}*"
> >
> > - the ALERTS metric shows what is pending or firing over time
> >
> >> But the problem is that one of my ExporterDown alerts has been active
> >> for the past 10 days, and there is no genuine reason for the alert to
> >> go to a resolved state.
>
> What do you have evaluation_interval set to in Prometheus, and
> resolve_timeout in Alertmanager?

>> My evaluation interval is 1m, whereas my scrape timeout and scrape
>> interval are both 25s. The resolve timeout in Alertmanager is 5m.

> Is the alert definitely being resolved, as in you are getting a resolved
> email/notification, or could it just be an email/notification for a
> long-running alert? You should get another email/notification every now
> and then based on repeat_interval.

>> Yes, I suspected that too in the beginning, but I am logging each and
>> every alert notification and found that I am indeed getting a resolved
>> notification for that alert, followed by another firing notification
>> the very next second.
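For reference, those settings sit roughly like this in the two configs. This is only a sketch of the relevant excerpts; the receiver name and the repeat_interval value are placeholders, since I haven't shared the actual ones here:

    # prometheus.yml (global section)
    global:
      scrape_interval: 25s
      scrape_timeout: 25s
      evaluation_interval: 1m

    # alertmanager.yml (relevant excerpt only)
    global:
      resolve_timeout: 5m
    route:
      receiver: 'team-email'   # placeholder receiver name
      repeat_interval: 4h      # placeholder; actual value not stated here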

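And in case it helps, the ALERTS metric mentioned earlier can be graphed in the Prometheus expression browser over the affected period to see the state transitions for this alert; roughly along these lines (the instance value is a placeholder):

    # Prometheus expression browser, graph view:
    ALERTS{alertname="ExporterDown"}     # pending/firing state of the alert over time
    up{instance="<affected-instance>"}   # brief 1 samples or gaps here would let the alert resolve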
