Does anyone know how Alertmanager can be configured to retry notifications indefinitely? If the connection to the webhook target were lost for several hours, then with my current setup none of the alerts that occurred during the outage would be sent, and no one would ever know something was amiss.
To add more context: the retries cease after 1 minute, after 12 attempts in total. I was looking through the Alertmanager code, and it seems that in v0.21 (the version we are running) the retries should be endless, with the backoff interval capped at 1 minute per retry (if I'm reading the backoff timer code correctly), so it seems odd that the retries end after one minute.

Here's a sample of the error I see in the Alertmanager logs:

level=error ts=2020-11-27T13:03:54.660Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=3 err="sd_webhook/webhook[0]: notify retry canceled after 12 attempts: Post \"http://192.168.1.10:4444\": dial tcp 192.168.1.10:4444: connect: connection refused"
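
As a rough sanity check on those numbers, here is a minimal Python sketch of an exponential backoff loop that would retry forever on its own but is cut off by an outer deadline. This is not Alertmanager's code. The function name and the parameters (0.5 s initial interval, 1.5x multiplier, 60 s interval cap, no jitter) are assumptions chosen for illustration, and the 60 s deadline stands in for whatever is canceling the retries in my setup. The wording "notify retry canceled" in the log suggests an outer timeout rather than the backoff itself giving up, but I have not confirmed that.

```python
def attempts_before_deadline(deadline_s, initial_s=0.5, multiplier=1.5,
                             max_interval_s=60.0):
    """Count the attempts an uncapped retry loop makes before a deadline.

    Hypothetical model: the first attempt fires at t=0, and each later
    attempt fires after a wait that grows by `multiplier` until it reaches
    `max_interval_s`. The loop has no retry limit of its own; only the
    outer deadline stops it.
    """
    elapsed = 0.0          # simulated time at which the next attempt fires
    interval = initial_s   # wait before the following attempt
    attempts = 0
    while elapsed <= deadline_s:
        attempts += 1
        elapsed += interval
        interval = min(interval * multiplier, max_interval_s)
    return attempts


if __name__ == "__main__":
    # With the assumed parameters, a 60 s deadline allows only a handful of
    # attempts, even though the loop itself would never stop.
    print(attempts_before_deadline(60.0))
```

Under these assumptions a 60-second deadline permits 11 attempts, which is in the same range as the 12 reported in the log (real backoff timers usually add random jitter, so the exact count can differ). That would be consistent with retries that are endless in principle but bounded by a timeout of roughly one minute per notification cycle.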

