On 25/11/2020 16:27, Yagyansh S. Kumar wrote:


On Wed, 25 Nov, 2020, 9:34 pm Stuart Clark wrote:

    Is the second instance still running?

    If you are having cluster communication issues, that could
    result in what you are seeing: both instances learn of an alert,
    but then one instance misses some of the renewal messages and so
    resolves it. Then it gets updated and the alert fires again.

>> Sorry, my bad. I forgot that I had enabled the mesh again. I have 2 Alertmanager instances running, and Prometheus is sending alerts to both of them.

*Instance 1* - /usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml --storage.path /mnt/vol2/alertmanager --data.retention=120h --log.level=debug --web.listen-address=x.x.x.x:9093 --cluster.listen-address=x.x.x.x:9094 --cluster.peer=y.y.y.y:9094

*Instance 2* - /usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml --storage.path /mnt/vol2/alertmanager --data.retention=120h --log.level=debug --web.listen-address=y.y.y.y:9093 --cluster.listen-address=y.y.y.y:9094 --cluster.peer=x.x.x.x:9094

Snippet from Prometheus config where both the alertmanagers are defined.
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 'x.x.x.x:9093'
      - 'y.y.y.y:9093'

    If you look in Prometheus (UI or ALERTS metric) does the alert
    continue for the whole period or does it have a gap?

>> In the last day I do see one gap, but the timing of this gap does not match the timing of the resolved notification.
[Attachment: image.png]

If the alert did continue throughout, that suggests either a Prometheus -> Alertmanager communication issue (if enough updates are missed, Alertmanager assumes the alert has been resolved) or a clustering issue (as mentioned, an instance can end up out of sync and likewise assume an alert is resolved due to a lack of updates).

Alertmanager does expose various metrics, including ones about the clustering. Do you see anything within those that matches roughly the times you saw the blip?
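
As a rough sketch of how you might pull just the clustering metrics for comparison, something like the following could work. It assumes each instance serves the usual Prometheus text format at /metrics on its web listen address and that the clustering metrics share the alertmanager_cluster_ prefix (for example alertmanager_cluster_members and alertmanager_cluster_failed_peers; check the exact names against your own /metrics output). The function names are made up for illustration, and the parser is deliberately simple: it assumes no timestamps follow the sample values.

```python
import urllib.request


def parse_cluster_metrics(exposition_text, prefix="alertmanager_cluster_"):
    """Return {series: value} for samples whose metric name starts with prefix.

    Assumption: each sample line is "<series> <value>" with no trailing
    timestamp, which is how Alertmanager normally renders /metrics.
    """
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if not line.startswith(prefix):
            continue
        # The value is the text after the last space; the rest is the series.
        series, _, value = line.rpartition(" ")
        try:
            samples[series] = float(value)
        except ValueError:
            # Ignore anything that does not end in a numeric value.
            continue
    return samples


def fetch_cluster_metrics(base_url):
    """Fetch /metrics from one Alertmanager and keep the clustering samples."""
    url = base_url.rstrip("/") + "/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_cluster_metrics(resp.read().decode("utf-8"))


# Example usage (replace the placeholders with your real addresses):
#   for base in ("http://x.x.x.x:9093", "http://y.y.y.y:9093"):
#       print(base, fetch_cluster_metrics(base))
```

Running this against both instances around the time of a blip and comparing the two outputs should show whether the instances disagree about cluster membership. Since Prometheus is presumably already scraping both Alertmanagers, graphing the same metrics there over the affected time range would be the more convenient option.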

--
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/606e2b15-cd0e-ece6-482d-a55a2f1debea%40Jahingo.com.
