You need to run Alertmanager instances on different machines and set up HA as described in the README.md [1]. With N instances clustered this way, your setup stays resilient as long as no more than N-1 of them are down at the same time. If you want to detect a failure in the monitoring pipeline itself, you need to set up something like a dead man's snitch integration [2].
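
To make the dead man's snitch idea concrete, here is a minimal sketch of the check that such a service performs on its side. It assumes you route an always-firing alert through the whole pipeline as a heartbeat, and that something outside the pipeline records when the last heartbeat arrived. The function name `pipeline_is_dead` and the `max_silence` parameter are hypothetical and not part of Alertmanager or any snitch service.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def pipeline_is_dead(
    last_heartbeat: Optional[datetime],
    now: datetime,
    max_silence: timedelta,
) -> bool:
    """Return True when no heartbeat has arrived within max_silence.

    Hypothetical helper: the heartbeat is assumed to be an always-firing
    alert that travels Prometheus -> Alertmanager -> webhook.
    """
    if last_heartbeat is None:
        # Never heard from the pipeline at all: treat it as down.
        return True
    return now - last_heartbeat > max_silence


if __name__ == "__main__":
    now = datetime(2020, 2, 19, 12, 0, tzinfo=timezone.utc)
    last_seen = now - timedelta(minutes=10)
    # The heartbeat is 10 minutes old and the allowed silence is 5 minutes,
    # so the check reports the pipeline as dead.
    print(pipeline_is_dead(last_seen, now, timedelta(minutes=5)))
```

The point of the pattern is that the check runs outside the monitored system, so it still fires when Prometheus, Alertmanager and the webhook are all down at once, for example while the VM is rebooting.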

[1] https://github.com/prometheus/alertmanager#high-availability
[2] https://www.pagerduty.com/docs/guides/dead-mans-snitch-integration-guide/

On Wed, Feb 19, 2020 at 3:26 AM Dhiman Barman <[email protected]> wrote:
>
> Hi,
>
> We have a setup which has multiple prometheus instances and same number of
> (alertmanager + webhook) instances.
> We have a docker which has both alertmanager and webhook processes running.
> If alertmanager webhook but not alertmanager process, how catastrophic is
> this event ?
> What if both go down, how catastrophic is the event. Note if VM gets
> rebooted, it might take a long time for the
> instances to come up. How much clustering will help in not dropping alerts ?
>
> Thanks,
> Dhiman
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/CA%2BLhoFwWabxJBHhaaZT3AsAORD_8sWmsdpNtA%3DsTotD8U8FkGg%40mail.gmail.com.

