On 25/11/2020 16:27, Yagyansh S. Kumar wrote:
On Wed, 25 Nov 2020, 9:34 pm Stuart Clark <[email protected]> wrote:
Is the second instance still running?
If you are having cluster communication issues, that could explain
what you are seeing: both instances learn of an alert, but then one
instance misses some of the renewal messages and so resolves it. Then
it gets updated and the alert fires again.
>> Sorry, my bad. I forgot I had enabled the mesh again. I have two
Alertmanager instances running, and Prometheus is sending alerts to
both of them.
*Instance 1*:
/usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml \
    --storage.path /mnt/vol2/alertmanager --data.retention=120h --log.level=debug \
    --web.listen-address=x.x.x.x:9093 \
    --cluster.listen-address=x.x.x.x:9094 --cluster.peer=y.y.y.y:9094

*Instance 2*:
/usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml \
    --storage.path /mnt/vol2/alertmanager --data.retention=120h --log.level=debug \
    --web.listen-address=y.y.y.y:9093 \
    --cluster.listen-address=y.y.y.y:9094 --cluster.peer=x.x.x.x:9094
Snippet from the Prometheus config where both Alertmanagers are defined:
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'x.x.x.x:9093'
      - 'y.y.y.y:9093'
If you look in Prometheus (UI or ALERTS metric) does the alert
continue for the whole period or does it have a gap?
>> In the last day I do see one gap, but the timing of this gap does
not match the resolved notification.
[attachment: image.png]
If the alert did continue throughout, that suggests either a Prometheus
-> Alertmanager communication issue (if enough updates are missed,
Alertmanager would assume the alert has been resolved) or a clustering
issue (as mentioned, you can end up with an instance being out of sync,
again assuming an alert is resolved due to a lack of updates).
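To make the "enough updates are missed" idea concrete, here is a minimal Python sketch of a simplified model, not Alertmanager's actual code. It assumes each update received at time t keeps the alert alive until t + validity, where `validity` is a hypothetical parameter standing in for however long an update keeps the alert active; the function name `resolved_gaps` is also made up for illustration. If the next update arrives after that point, the alert is treated as resolved in between and fires again when the update finally arrives.

```python
from typing import List, Tuple


def resolved_gaps(update_times: List[float], validity: float) -> List[Tuple[float, float]]:
    """Return (resolved_at, refired_at) pairs under a simplified model.

    Assumption: every update at time t extends the alert's end time to
    t + validity. A gap appears only when the next update arrives after
    that end time, i.e. when enough consecutive updates were missed.
    """
    gaps: List[Tuple[float, float]] = []
    times = sorted(update_times)
    for prev, nxt in zip(times, times[1:]):
        ends_at = prev + validity  # alert considered active until here
        if nxt > ends_at:
            # No refresh arrived in time: resolved at ends_at, fires again at nxt.
            gaps.append((ends_at, nxt))
    return gaps
```

For example, `resolved_gaps([0, 60, 120, 300, 360], 120)` returns `[(240, 300)]`: the updates at 180 and 240 never arrive, the alert lapses at 240 and fires again at 300. With a 240-second window the same stream produces no gap, which is why a single missed update does not necessarily cause a resolve.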
Alertmanager does expose various metrics, including ones about the
clustering. Do you see anything within those that matches roughly the
times you saw the blip?
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/606e2b15-cd0e-ece6-482d-a55a2f1debea%40Jahingo.com.