Is the second instance still running? If you are having cluster communication issues, that could result in what you are seeing: both instances learn of an alert, but then one instance misses some of the renewal messages and so marks it resolved. Then it receives an update and the alert fires again.
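(For reference, a rough sketch of how a two-instance setup is usually wired up; the hostnames am1/am2 and the default ports are placeholders, not taken from this thread. Each Alertmanager lists every instance as a cluster peer, and Prometheus is configured to send alerts to both directly rather than relying on the mesh to forward them:

    # On each Alertmanager instance (am1/am2 are placeholder hostnames):
    alertmanager --config.file=alertmanager.yml \
      --cluster.listen-address=0.0.0.0:9094 \
      --cluster.peer=am1:9094 \
      --cluster.peer=am2:9094

    # In prometheus.yml, send alerts to every instance, not just one:
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['am1:9093', 'am2:9093']

If the peers cannot gossip with each other on the cluster port, each instance keeps its own view of which alerts are active and which notifications have been sent, which can produce exactly this resolve/re-fire pattern as well as the duplicate notifications mentioned later in the thread.)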
If you look in Prometheus (UI or the ALERTS metric), does the alert continue for the whole period, or does it have a gap? (A sample query is sketched after the quoted thread below.)

On 25 November 2020 14:58:50 GMT, "Yagyansh S. Kumar" <[email protected]> wrote:

> On Wed, 25 Nov, 2020, 8:26 pm Stuart Clark, <[email protected]> wrote:
>
>> How many Alertmanager instances are there? Can they talk to each other and
>> is Prometheus configured and able to push alerts to them all?
>
> Single instance as of now. I did set up an Alertmanager mesh of 2
> Alertmanagers, but I am facing a duplicate alert issue in that setup,
> another issue that is pending for me. Hence, currently only a single
> Alertmanager is receiving alerts from my Prometheus instance.
>
>> On 25 November 2020 14:07:41 GMT, "Yagyansh S. Kumar" <[email protected]> wrote:
>>
>>> Hi Stuart.
>>>
>>> On Wed, 25 Nov, 2020, 6:56 pm Stuart Clark, <[email protected]> wrote:
>>>
>>>> On 25/11/2020 11:46, [email protected] wrote:
>>>> > The alert formation doesn't seem to be a problem here, because it
>>>> > happens for different alerts randomly. Below is the alert for the
>>>> > exporter being down, for which it has happened thrice today.
>>>> >
>>>> >     - alert: ExporterDown
>>>> >       expr: up == 0
>>>> >       for: 10m
>>>> >       labels:
>>>> >         severity: "CRITICAL"
>>>> >       annotations:
>>>> >         summary: "Exporter down on *{{ $labels.instance }}*"
>>>> >         description: "Not able to fetch application metrics from *{{ $labels.instance }}*"
>>>> >
>>>> > - the ALERTS metric shows what is pending or firing over time
>>>> >
>>>> > But the problem is that one of my ExporterDown alerts has been active
>>>> > for the past 10 days; there is no genuine reason for the alert to go
>>>> > to a resolved state.
>>>>
>>>> What do you have evaluation_interval set to in Prometheus, and
>>>> resolve_timeout in Alertmanager?
>>>
>>> My evaluation interval is 1m, whereas my scrape timeout and scrape
>>> interval are 25s. Resolve timeout in Alertmanager is 5m.
>>>
>>>> Is the alert definitely being resolved, as in you are getting a resolved
>>>> email/notification, or could it just be an email/notification for a
>>>> long-running alert? You should get another email/notification every now
>>>> and then based on repeat_interval.
>>>
>>> Yes, I suspected that too in the beginning, but I am logging each and
>>> every alert notification and found that I am indeed getting a resolved
>>> notification for that alert and a firing notification again the very
>>> next second.
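(To check for a gap as suggested above, one way, just a sketch using the ExporterDown name from the rule quoted in this thread, is to graph the ALERTS series in the Prometheus UI over the affected window; the instance value below is a placeholder:

    # Pending/firing history for the alert on the affected target
    # ("app-host:9100" is a placeholder, substitute the real instance label):
    ALERTS{alertname="ExporterDown", instance="app-host:9100"}

    # Or count the firing series over time; a dip to zero marks a real gap:
    count(ALERTS{alertname="ExporterDown", alertstate="firing"})

If the series is continuous but notifications still flip between resolved and firing, the problem is more likely on the Alertmanager side, such as resolve_timeout, clustering, or alerts not reaching every instance, than in the rule itself.)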

