On Thursday, 2 December 2021 at 10:35:55 UTC Dj Fox wrote:

> @Brian
> Do you confirm that one of the main reason that Alertmanager cluster needs
> to handle the same set of alerts as the others (and hence be plugged to all
> same prometheuses) is because of the way deduplication works?
>
Kind of, although I think it's more for redundancy. If Prometheus only sent to one alertmanager and that one failed, then you wouldn't get your alert. So it sends to all of them, and then those which are up should have a consistent view quickly.

> - It would be cool to be able to tell each alertmanager: this is my
> "alert-family". The deduplication mechanism would then only occur among the
> members sharing the same "alert-family" value. That way, we are not forced
> to connect 20 Proms to every alertmanager anymore, and all of them can
> still gossip the valuable "silences".
>

That's basically having multiple alertmanager clusters.

> - And/or being able to give to amtool a list of clusters so that it can
> also handle several clusters.
>

Well, it is scriptable, and tools like karma will do this for you.

Instead you could follow MR's suggested approach and build a single global alertmanager cluster, say three nodes in total, and point all the other prometheus servers at those three. I think that's reasonable: if a region becomes so isolated that it can't talk to that central alertmanager cluster, then it's probably so isolated that it couldn't send out an alert via E-mail either.
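On the "it is scriptable" point, here is a rough sketch of what I mean, not a tested tool. It assumes amtool is on your PATH and relies on its --alertmanager.url flag; the cluster URLs and the helper names (CLUSTERS, amtool_commands, run_on_all) are made up for illustration.

```python
import subprocess

# Hypothetical URLs: one reachable node per alertmanager cluster.
CLUSTERS = [
    "http://am-eu.example.com:9093",
    "http://am-us.example.com:9093",
]


def amtool_commands(clusters, amtool_args):
    """Build one amtool argv per cluster, reusing the same arguments."""
    return [
        ["amtool", f"--alertmanager.url={url}", *amtool_args]
        for url in clusters
    ]


def run_on_all(clusters, amtool_args):
    """Run the same amtool command against every cluster in turn."""
    for argv in amtool_commands(clusters, amtool_args):
        # check=False: carry on even if one cluster is unreachable.
        result = subprocess.run(argv, check=False)
        if result.returncode != 0:
            print(f"amtool failed for {argv[1]}")
```

Calling run_on_all(CLUSTERS, ["silence", "query"]) would then list the silences of each cluster one after the other. Silences are still not shared between clusters this way; the script only repeats the same command against each one.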

