Hi Stuart,

Yes, I can see both cluster peers, and the UI shows that the cluster is
ready.

[image: screenshot showing both cluster peers]

Thanks,
Venkatraman N

On Tue, Jul 5, 2022 at 1:19 PM Stuart Clark <[email protected]>
wrote:

> Two alerts suggest that the two instances aren't talking to each other.
> How have you configured them? Does the UI show the "other" instance?
>
> On 5 July 2022 08:34:45 BST, Venkatraman Natarajan <[email protected]>
> wrote:
>>
>> Thanks Brian. I have used a last_over_time query in our expression instead
>> of turning off send_resolved.
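>>
>> The rule now looks roughly like this (the alert name and the windows here
>> are just placeholders):
>>
>>   - alert: ProbeDown
>>     expr: last_over_time(probe_success[10m]) == 0
>>     for: 5m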
>>
>> Also, we have two Alertmanagers in our environment. Both are up and
>> running, but lately we have been getting two alerts, one from each
>> Alertmanager. Could you please help me sort out this issue as well?
>>
>> Please find the Alertmanager configuration below.
>>
>>   alertmanager0:
>>     image: prom/alertmanager
>>     container_name: alertmanager0
>>     user: rootuser
>>     volumes:
>>       - ../data:/data
>>       - ../config/alertmanager.yml:/etc/alertmanager/alertmanager.yml
>>     command:
>>       - '--config.file=/etc/alertmanager/alertmanager.yml'
>>       - '--storage.path=/data/alert0'
>>       - '--cluster.listen-address=0.0.0.0:6783'
>>       - '--cluster.peer={{ IP Address }}:6783'
>>       - '--cluster.peer={{ IP Address }}:6783'
>>     restart: unless-stopped
>>     logging:
>>       driver: "json-file"
>>       options:
>>         max-size: "10m"
>>         max-file: "2"
>>     ports:
>>       - 9093:9093
>>       - 6783:6783
>>     networks:
>>       - network
>>
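>> For reference, the Prometheus-side alerting block that points both
>> servers at both Alertmanagers looks roughly like this (hostnames are
>> placeholders):
>>
>>   alerting:
>>     alertmanagers:
>>       - static_configs:
>>           - targets:
>>               - 'alertmanager0:9093'
>>               - 'alertmanager1:9093'
>>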
>> Regards,
>> Venkatraman N
>>
>>
>>
>> On Sat, Jun 25, 2022 at 9:05 PM Brian Candler <[email protected]>
>> wrote:
>>
>>> If probe_success becomes non-zero, even for a single evaluation
>>> interval, then the alert will be immediately resolved.  There is no delay
>>> on resolving, like there is for pending->firing ("for: 5m").
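>>>
>>> As a rough sketch (the rule name and duration are just placeholders), a
>>> rule like
>>>
>>>   - alert: ProbeFailing
>>>     expr: probe_success == 0
>>>     for: 5m
>>>
>>> only starts firing after the expression has matched for 5 minutes, but it
>>> is resolved as soon as a single evaluation stops matching.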
>>>
>>> I suggest you enter the alerting expression, e.g. "probe_success == 0",
>>> into the PromQL web interface (query browser), and switch to Graph view,
>>> and zoom in.  If you see any gaps in the graph, then the alert was resolved
>>> at that instant.
>>>
>>> Conversely, use the query
>>> probe_success{instance="xxx"} != 0
>>> to look at a particular timeseries, as identified by the label(s), and
>>> see if there are any dots shown where the label is non-zero.
>>>
>>> To make your alerts more robust you may need to use queries with range
>>> vectors, e.g. min_over_time(foo[5m]) or max_over_time(foo[5m]) or whatever.
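>>>
>>> For example (the 5m window is just illustrative):
>>>
>>>   min_over_time(probe_success[5m]) == 0
>>>
>>> fires if any probe in the last 5 minutes failed, so a single successful
>>> probe does not immediately resolve it, whereas
>>>
>>>   max_over_time(probe_success[5m]) == 0
>>>
>>> only fires once every probe in the window has failed.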
>>>
>>> As a general rule though: you should consider carefully whether you want
>>> to send *any* notification for resolved alerts.  Personally, I have
>>> switched to send_resolved = false.  There are some good explanations here:
>>>
>>> https://www.robustperception.io/running-into-burning-buildings-because-the-fire-alarm-stopped
>>>
>>> https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/
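>>>
>>> In alertmanager.yml this is a per-receiver setting; a minimal sketch with
>>> a placeholder receiver name and webhook URL:
>>>
>>>   receivers:
>>>     - name: 'ops-webhook'
>>>       webhook_configs:
>>>         - url: 'https://example.com/alert-hook'
>>>           send_resolved: false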
>>>
>>> You don't want to build a culture where people ignore alerts because the
>>> alert cleared itself - or is expected to clear itself.
>>>
>>> You want the alert condition to trigger a *process*, which is an
>>> investigation of *why* the alert happened, *what* caused it, whether the
>>> underlying cause has been fixed, and whether the alerting rule itself was
>>> wrong.  When all that has been investigated, manually close the ticket.
>>> The fact that the alert has gone below threshold doesn't mean that this
>>> work no longer needs to be done.
>>>
>>> On Saturday, 25 June 2022 at 13:27:22 UTC+1 [email protected] wrote:
>>>
>>>> Hi Team,
>>>>
>>>> We have two Prometheus servers and two Alertmanagers running as
>>>> containers in separate VMs.
>>>>
>>>> Alerts are getting auto-resolved even though the underlying issue is
>>>> still present as per the threshold.
>>>>
>>>> For example, we have an alert rule with the expression probe_success ==
>>>> 0. It triggers an alert, but after some time the alert gets auto-resolved
>>>> because we have enabled send_resolved = true. However, probe_success is
>>>> still 0, so we don't want the alerts to auto-resolve.
>>>>
>>>> Could you please help us with this?
>>>>
>>>> Thanks,
>>>> Venkatraman N
>>>>
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/CANSgTEabhU93ZKy%3D6xF2dEMryH5VyBy0O6EiZzFOMquBUXD1sQ%40mail.gmail.com.

Reply via email to