On Wed, 25 Nov, 2020, 9:34 pm Stuart Clark, <[email protected]>
wrote:

> Is the second instance still running?
>
> If you are having cluster communication issues, that could result in
> what you are seeing: both instances learn of an alert, but one instance
> misses some of the renewal messages and so resolves it. Then it gets
> updated and the alert fires again.
>
>> Sorry, my bad. I forgot I had enabled the mesh again. I have two
Alertmanager instances running, and Prometheus is sending alerts to both
Alertmanagers.

*Instance 1* -  /usr/local/bin/alertmanager --config.file
/etc/alertmanager/alertmanager.yml --storage.path /mnt/vol2/alertmanager
--data.retention=120h --log.level=debug --web.listen-address=x.x.x.x:9093
--cluster.listen-address=x.x.x.x:9094 --cluster.peer=y.y.y.y:9094

*Instance 2* - /usr/local/bin/alertmanager --config.file
/etc/alertmanager/alertmanager.yml --storage.path /mnt/vol2/alertmanager
--data.retention=120h --log.level=debug --web.listen-address=y.y.y.y:9093
--cluster.listen-address=y.y.y.y:9094 --cluster.peer=x.x.x.x:9094
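
To confirm the two instances are actually gossiping (rather than each
flapping the alert independently), each one's cluster state can be checked
via the Alertmanager v2 API. A sketch, using the placeholder addresses from
above and assuming `jq` is installed:

```shell
# Each instance should report cluster status "ready" and list the
# other instance under "peers".
curl -s http://x.x.x.x:9093/api/v2/status | jq '.cluster'
curl -s http://y.y.y.y:9093/api/v2/status | jq '.cluster'
```

If a peer keeps dropping in and out of that list, it points at the kind of
cluster-communication issue described above.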

Snippet from the Prometheus config where both Alertmanagers are defined:

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'x.x.x.x:9093'
      - 'y.y.y.y:9093'

> If you look in Prometheus (UI or the ALERTS metric), does the alert
> continue for the whole period or does it have a gap?
>
>> In the last day I do see one gap, but the timing of this gap does not
match the resolved notification.
[image: image.png]
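
To line the gap up against notification timestamps, the alert series can
also be inspected directly in the Prometheus UI. A sketch using the
built-in ALERTS and ALERTS_FOR_STATE metrics (the alertname matches the
ExporterDown rule quoted later in this thread):

```promql
# 1 while the alert is pending/firing; gaps show where it stopped:
ALERTS{alertname="ExporterDown"}

# Unix timestamp at which the alert became active; a changed value
# means Prometheus really did restart the alert:
ALERTS_FOR_STATE{alertname="ExporterDown"}
```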



> On 25 November 2020 14:58:50 GMT, "Yagyansh S. Kumar" <
> [email protected]> wrote:
>>
>>
>>
>> On Wed, 25 Nov, 2020, 8:26 pm Stuart Clark, <[email protected]>
>> wrote:
>>
>>> How many Alertmanager instances are there? Can they talk to each other
>>> and is Prometheus configured and able to push alerts to them all?
>>>
>> >> Single instance as of now. I did set up an Alertmanager mesh of two
>> Alertmanagers, but I am facing a duplicate-alert issue in that setup;
>> that issue is still pending for me. Hence, currently only a single
>> Alertmanager is receiving alerts from my Prometheus instance.
>>
>> On 25 November 2020 14:07:41 GMT, "Yagyansh S. Kumar" <
>>> [email protected]> wrote:
>>>>
>>>> Hi Stuart.
>>>>
>>>> On Wed, 25 Nov, 2020, 6:56 pm Stuart Clark, <[email protected]>
>>>> wrote:
>>>>
>>>>> On 25/11/2020 11:46, [email protected] wrote:
>>>>> > The alert formation doesn't seem to be a problem here, because it
>>>>> > happens for different alerts randomly. Below is the alert for
>>>>> Exporter
>>>>> > being down for which it has happened thrice today.
>>>>> >
>>>>> >   - alert: ExporterDown
>>>>> >     expr: up == 0
>>>>> >     for: 10m
>>>>> >     labels:
>>>>> >       severity: "CRITICAL"
>>>>> >     annotations:
>>>>> >       summary: "Exporter down on *{{ $labels.instance }}*"
>>>>> >       description: "Not able to fetch application metrics from *{{
>>>>> > $labels.instance }}*"
>>>>> >
>>>>> > - the ALERTS metric shows what is pending or firing over time
>>>>> > >> But the problem is that one of my ExporterDown alerts has been
>>>>> > active for the past 10 days; there is no genuine reason for it to
>>>>> > go to a resolved state.
>>>>> >
>>>>> What do you have evaluation_interval set to in Prometheus, and
>>>>> resolve_timeout in Alertmanager?
>>>>>
>>>> >> My evaluation interval is 1m whereas my scrape timeout and scrape
>>>> interval are 25s. Resolve timeout in Alertmanager is 5m.
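
For reference, those settings sit in the two config files roughly like
this (a sketch, not the actual files from this thread):

```yaml
# prometheus.yml
global:
  scrape_interval: 25s
  scrape_timeout: 25s
  evaluation_interval: 1m

# alertmanager.yml
global:
  resolve_timeout: 5m
```

Note that resolve_timeout only applies to alerts that arrive without an
EndsAt; alerts sent by Prometheus always include EndsAt, so flapping here
is more likely missed renewal messages than the timeout itself.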
>>>>
>>>>>
>>>>> Is the alert definitely being resolved, as in you are getting a
>>>>> resolved
>>>>> email/notification, or could it just be an email/notification for a
>>>>> long
>>>>> running alert? - you should get another email/notification every now
>>>>> and
>>>>> then based on repeat_interval.
>>>>>
>>>> >> Yes, I suspected that too in the beginning, but I am logging each
>>>> and every alert notification and found that I am indeed getting a
>>>> resolved notification for that alert, followed by a firing
>>>> notification the very next second.
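
One way to capture exact notification timestamps for comparison is a tiny
webhook receiver added as an extra receiver in Alertmanager. A sketch: the
port and the receiver wiring are assumptions, but the payload fields used
are the standard Alertmanager webhook format:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def format_notification(payload):
    """Flatten an Alertmanager webhook payload into one log line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append("%s alertname=%s instance=%s startsAt=%s endsAt=%s" % (
            alert.get("status", "unknown").upper(),
            labels.get("alertname", ""),
            labels.get("instance", ""),
            alert.get("startsAt", ""),
            alert.get("endsAt", ""),
        ))
    return lines

class LogHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        for line in format_notification(json.loads(body)):
            print(line)  # one line per alert in the notification batch
        self.send_response(200)
        self.end_headers()

# To run it, point a webhook_configs receiver at http://localhost:9099/ and:
# HTTPServer(("", 9099), LogHandler).serve_forever()
```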
>>>>
>>>>>
>>>>>
>>>>>
>>> --
>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>>
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CAFGi5vCkYVcmsX%3DdmHuYVDdFsZcCwdS1B3i8-WOK9KL9_7DSGQ%40mail.gmail.com.
