Re: [prometheus-users] alertmanager instance failure

Simon Pasquier Fri, 21 Feb 2020 06:50:22 -0800

You need to run the Alertmanager instances on different machines and set up
HA as described in the README.md [1].
This way your setup will tolerate up to N-1 of the N instances going down.
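
As a rough illustration of the HA setup: each instance is started with one
--cluster.peer flag per other instance. The host names below are made up, and
9094 is assumed as the cluster port (the Alertmanager default). A small Python
sketch that prints the flags for each host:

```python
# Hypothetical host names; replace with your own machines.
HOSTS = ["am-1.example.com", "am-2.example.com", "am-3.example.com"]
CLUSTER_PORT = 9094  # assumed: Alertmanager's default cluster port


def cluster_flags(host, hosts, port=CLUSTER_PORT):
    """Return the cluster-related flags for the Alertmanager on `host`."""
    flags = [f"--cluster.listen-address=0.0.0.0:{port}"]
    # One --cluster.peer flag per other instance in the cluster.
    flags += [f"--cluster.peer={peer}:{port}" for peer in hosts if peer != host]
    return flags


if __name__ == "__main__":
    for host in HOSTS:
        print(host, "alertmanager", " ".join(cluster_flags(host, HOSTS)))
```

Each Prometheus should then list all of the Alertmanager instances in its
alerting configuration rather than sending through a load balancer, as the
README [1] explains.
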
If you want to detect a failure in the monitoring pipeline itself, you need
to set up something like a dead man's snitch integration [2].
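
The idea behind a dead man's switch is that an always-firing alert keeps
sending notifications through the whole pipeline to an external service,
and that service raises the alarm when the notifications stop. A minimal
Python sketch of the receiving side, where the class and its names are
hypothetical and not part of any Prometheus or PagerDuty API:

```python
import time


class DeadMansSwitch:
    """Flags the pipeline as broken when heartbeats stop arriving."""

    def __init__(self, timeout_seconds, now=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._now = now  # injectable clock, handy for testing
        self._last_heartbeat = now()

    def heartbeat(self):
        # Call this whenever the always-firing alert's notification arrives.
        self._last_heartbeat = self._now()

    def is_tripped(self):
        # True once no heartbeat has been seen for longer than the timeout.
        return self._now() - self._last_heartbeat > self.timeout_seconds
```

The receiver has to run outside the machines it watches, otherwise a VM
reboot takes the watchdog down together with Alertmanager.
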


[1] https://github.com/prometheus/alertmanager#high-availability
[2] https://www.pagerduty.com/docs/guides/dead-mans-snitch-integration-guide/

On Wed, Feb 19, 2020 at 3:26 AM Dhiman Barman <[email protected]> wrote:
>
> Hi,
>
> We have a setup with multiple Prometheus instances and the same number of
> (Alertmanager + webhook) instances.
> Each Docker container runs both the Alertmanager and webhook processes.
> If the webhook process goes down but the Alertmanager process does not,
> how catastrophic is this event?
> What if both go down? Note that if the VM gets
> rebooted, it might take a long time for the
> instances to come up. How much will clustering help in not dropping alerts?
>
> Thanks,
> Dhiman
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/prometheus-users/CA%2BLhoFwWabxJBHhaaZT3AsAORD_8sWmsdpNtA%3DsTotD8U8FkGg%40mail.gmail.com.

