@Brian
Can you confirm that one of the main reasons every Alertmanager in a cluster 
needs to handle the same set of alerts as the others (and hence be connected 
to the same Prometheus servers) is the way deduplication works?
It's because of the 5s delay multiplied by the position of the Alertmanager 
in the gossip cluster, right?
- It would be nice to be able to tell each Alertmanager: this is my 
"alert-family". Deduplication would then only happen among the members 
sharing the same "alert-family" value. That way, we would no longer be 
forced to connect 20 Prometheus servers to every Alertmanager, and all of 
them could still gossip the valuable silences.
- And/or to be able to give amtool a list of clusters, so that it can 
handle several clusters at once.
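
To make the question concrete, here is a minimal Python sketch of how I 
picture it. It is only my mental model, not Alertmanager code: the 5s value 
is the one from my question above, `family` is the hypothetical 
"alert-family" setting (no such option exists today), and `Peer`, 
`wait_before_notify` and `simulate` are names I made up. Gossip is assumed to 
be faster than one wait step.

```python
from dataclasses import dataclass, field

PEER_WAIT_SECONDS = 5  # value from the question above; an assumption


def wait_before_notify(position: int, peer_wait: float = PEER_WAIT_SECONDS) -> float:
    """Delay before the peer at `position` (0-based) tries to notify."""
    return position * peer_wait


@dataclass
class Peer:
    name: str
    position: int  # position in the gossip cluster
    family: str    # hypothetical "alert-family", not an existing option
    seen: set = field(default_factory=set)  # notifications known to be sent

    def maybe_notify(self, alert_key: str) -> bool:
        """Notify only if no peer is known to have notified already."""
        if alert_key in self.seen:
            return False
        self.seen.add(alert_key)
        return True


def simulate(peers, alert_key, family):
    """Let the peers of one family act in wait order; return who notified."""
    members = sorted(
        (p for p in peers if p.family == family), key=lambda p: p.position
    )
    senders = []
    for peer in members:
        if peer.maybe_notify(alert_key):
            senders.append(peer.name)
            # Gossip the "already sent" entry inside the family only.
            for other in members:
                other.seen.add(alert_key)
    return senders


peers = [Peer("eu-1", 0, "eu"), Peer("eu-2", 1, "eu"), Peer("us-1", 2, "us")]
print(wait_before_notify(2))                 # delay for the peer at position 2
print(simulate(peers, "HighLatency", "eu"))  # only the first "eu" peer notifies
```

With this model, each family only needs its own Prometheus servers, while 
silences could still be gossiped across the whole cluster.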

@Matthias
Yes, the main benefits I see with your proposal are:
- *Silences* are automatically propagated to all nodes
- The *group_by* becomes a "real" global group_by (instead of having 10 
sub-optimal group_by configurations that can only group by region)
So this is very appealing.
However, I'm worried that this solution exposes me more to network 
partitions. With Alertmanagers that are far away from each other, it is more 
likely that one node becomes totally isolated, and hence stops deduplicating 
and sends more alerts.
With 2 Alertmanagers in the same region, such partitions are still possible 
(misconfiguration, firewall, local issue), but much less likely.
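
Here is a rough Python sketch of the failure mode I have in mind. It is a toy 
model with two assumptions of mine: every peer still receives the alert from 
Prometheus, and an isolated peer keeps notifying instead of staying silent. 
`count_notifications` is a made-up name, and peers are identified by their 
position only.

```python
def count_notifications(partitions):
    """Count notifications sent for one alert.

    `partitions` is a list of groups of peer positions; peers can only
    gossip with members of their own group.
    """
    sent = 0
    for group in partitions:
        notified = False  # "already sent" state shared inside the group only
        for _position in sorted(group):
            if not notified:
                notified = True
                sent += 1
    return sent


print(count_notifications([[0, 1, 2, 3]]))    # healthy cluster
print(count_notifications([[0, 1, 2], [3]]))  # one isolated peer
```

In this model, each isolated group sends its own copy of the notification, 
so the number of duplicates grows with the number of partitions.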
As I understand it, the memberlist library handles network partitions very 
well? But I guess only if a node is partially isolated.
Does anyone have more information to share about the behaviour of its gossip 
protocol during network partitions? There is low-level documentation in the 
source code of the library, but I'm struggling to find higher-level 
documentation for it.
For now, I know that there are awareness/suspicion mechanisms, but I'm not 
sure exactly how they work.
If Alertmanager is (through the memberlist library) highly aware that it is 
not working properly, will it decide to stop sending alerts?
