Hi Stuart,

Yes, I can see both cluster peers, and the UI shows information such as
"the cluster is ready".

[image: image.png]
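For an ongoing check, a rule along these lines could alert if the gossip
cluster ever splits (a rough sketch, assuming Prometheus scrapes both
Alertmanagers; the group and alert names are placeholders, and
alertmanager_cluster_members is Alertmanager's self-reported gauge of
cluster members):

groups:
  - name: alertmanager-meta
    rules:
      - alert: AlertmanagerClusterDegraded
        # fires if either instance reports fewer than 2 gossip members
        expr: min(alertmanager_cluster_members) < 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Alertmanager gossip cluster has fewer than 2 members"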
Thanks,
Venkatraman N

On Tue, Jul 5, 2022 at 1:19 PM Stuart Clark <[email protected]> wrote:

> Two alerts suggest that the two instances aren't talking to each other.
> How have you configured them? Does the UI show the "other" instance?
>
> On 5 July 2022 08:34:45 BST, Venkatraman Natarajan <[email protected]>
> wrote:
>>
>> Thanks Brian. I have used a last_over_time query in our expression
>> instead of turning off resolved notifications.
>>
>> Also, we have two Alertmanagers in our environment. Both are up and
>> running, but we are now getting two alerts, one from each Alertmanager.
>> Could you please help me sort out this issue as well?
>>
>> Please find the Alertmanager configuration below:
>>
>> alertmanager0:
>>   image: prom/alertmanager
>>   container_name: alertmanager0
>>   user: rootuser
>>   volumes:
>>     - ../data:/data
>>     - ../config/alertmanager.yml:/etc/alertmanager/alertmanager.yml
>>   command:
>>     - '--config.file=/etc/alertmanager/alertmanager.yml'
>>     - '--storage.path=/data/alert0'
>>     - '--cluster.listen-address=0.0.0.0:6783'
>>     - '--cluster.peer={{ IP Address }}:6783'
>>     - '--cluster.peer={{ IP Address }}:6783'
>>   restart: unless-stopped
>>   logging:
>>     driver: "json-file"
>>     options:
>>       max-size: "10m"
>>       max-file: "2"
>>   ports:
>>     - 9093:9093
>>     - 6783:6783
>>   networks:
>>     - network
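>> For clarity, the intent is that each instance gossips with the other VM
>> on port 6783, roughly along these lines (a sketch only - the
>> alertmanager1 service name and the address placeholders are
>> assumptions, not copied from the real setup):
>>
>> # on VM 1 (alertmanager0)
>> command:
>>   - '--cluster.listen-address=0.0.0.0:6783'
>>   - '--cluster.peer=<VM2-address>:6783'
>>
>> # on VM 2 (alertmanager1)
>> command:
>>   - '--cluster.listen-address=0.0.0.0:6783'
>>   - '--cluster.peer=<VM1-address>:6783'
>>
>> # The gossip port generally needs to be reachable between the VMs over
>> # both TCP and UDP, so the port mapping may also need '6783:6783/udp'.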
>> Regards,
>> Venkatraman N
>>
>> On Sat, Jun 25, 2022 at 9:05 PM Brian Candler <[email protected]>
>> wrote:
>>
>>> If probe_success becomes non-zero, even for a single evaluation
>>> interval, then the alert will be immediately resolved. There is no
>>> delay on resolving, like there is for pending->firing ("for: 5m").
>>>
>>> I suggest you enter the alerting expression, e.g. "probe_success == 0",
>>> into the PromQL web interface (query browser), switch to Graph view,
>>> and zoom in. If you see any gaps in the graph, then the alert was
>>> resolved at that instant.
>>>
>>> Conversely, use the query
>>>   probe_success{instance="xxx"} != 0
>>> to look at a particular timeseries, as identified by the label(s), and
>>> see if there are any dots shown where the value is non-zero.
>>>
>>> To make your alerts more robust you may need to use queries with range
>>> vectors, e.g. min_over_time(foo[5m]) or max_over_time(foo[5m]), or
>>> whatever fits your case.
>>>
>>> As a general rule though: you should consider carefully whether you
>>> want to send *any* notification for resolved alerts. Personally, I
>>> have switched to send_resolved = false. There are some good
>>> explanations here:
>>>
>>> https://www.robustperception.io/running-into-burning-buildings-because-the-fire-alarm-stopped
>>> https://docs.google.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/
>>>
>>> You don't want to build a culture where people ignore alerts because
>>> the alert cleared itself - or is expected to clear itself.
>>>
>>> You want the alert condition to trigger a *process*, which is an
>>> investigation of *why* the alert happened, *what* caused it, whether
>>> the underlying cause has been fixed, and whether the alerting rule
>>> itself was wrong. When all that has been investigated, manually close
>>> the ticket. The fact that the alert has gone below threshold doesn't
>>> mean that this work no longer needs to be done.
>>>
>>> On Saturday, 25 June 2022 at 13:27:22 UTC+1 [email protected] wrote:
>>>
>>>> Hi Team,
>>>>
>>>> We have two Prometheus servers and two Alertmanagers running as
>>>> containers on separate VMs.
>>>>
>>>> Alerts are getting auto-resolved even though the underlying issue is
>>>> still beyond the threshold.
>>>>
>>>> For example, we have an alert rule with the expression
>>>> probe_success == 0. It triggers an alert, but after some time the
>>>> alert gets auto-resolved because we have enabled send_resolved = true.
>>>> However, probe_success is still 0, so we don't want the alert to
>>>> auto-resolve.
>>>>
>>>> Could you please help us with this?
>>>>
>>>> Thanks,
>>>> Venkatraman N
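>>>> For reference, the rule in question is roughly of this shape (a
>>>> sketch only - the group name, alert name and "for" duration are
>>>> placeholders, not the exact values we use):
>>>>
>>>> groups:
>>>>   - name: blackbox
>>>>     rules:
>>>>       - alert: EndpointDown
>>>>         expr: probe_success == 0
>>>>         for: 5m
>>>>         labels:
>>>>           severity: critical
>>>>         annotations:
>>>>           summary: "Probe failing for {{ $labels.instance }}"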

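>>>> The resolved notifications come from the receiver configuration,
>>>> which is along these lines (again a sketch - the receiver name,
>>>> notifier type and URL are placeholders):
>>>>
>>>> receivers:
>>>>   - name: 'ops-webhook'
>>>>     webhook_configs:
>>>>       - url: 'http://example.internal/alerts'
>>>>         send_resolved: true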
