The Alertmanager documentation states that each Prometheus instance should send its alerts to every AM instance in a cluster: https://github.com/prometheus/alertmanager/blob/main/README.md#high-availability However, from what I can see, there is no explicit mention of distributing nodes over a large geographical region (WAN instead of LAN).
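To make the setup concrete, this is roughly how I read that README section: every Prometheus instance lists all Alertmanager instances directly, with no load balancer in between. This is only a sketch. The hostnames are hypothetical placeholders, and 9093 is the default Alertmanager port.

```yaml
# prometheus.yml (fragment) - hostnames are hypothetical placeholders
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # every AM instance is listed on every Prometheus instance
            - am1.na.example.com:9093
            - am2.na.example.com:9093
            - am1.eu.example.com:9093
            - am2.eu.example.com:9093
```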
Brian Candler also mentions in this post that we shouldn't attempt any gossip or other network communications across regions: https://groups.google.com/g/prometheus-users/c/vyHn-727Vp0

Unfortunately, I can't find any documentation that clearly states that an Alertmanager cluster spread over multiple regions (for example, 2 nodes in a DC in North America and 2 other nodes in a DC in Europe) will not work, or the specific reasons why.

If a relatively high-speed network exists between both regions and it's acceptable to potentially have slightly higher latency, wouldn't it be feasible to have a cluster distributed this way? Considering the eventually consistent nature of gossip, why isn't this type of AM cluster more common?

I understand that the added latency could potentially lead to duplicate alerts being sent to the destination receiver. However, given that receivers such as VictorOps would have the incident triggered with the same ID, my understanding is that they should be essentially unaffected.

The main purpose of this kind of configuration would be to address the following:
- to have a single cluster in which silences are managed
- to ensure global redundancy if one region should become unavailable

I would appreciate any feedback or advice on this topic.

Thank you