Are you sure that's your problem?  Can you show your complete alerting rule 
and its enclosing rule group?

For an alert to start firing, the expression has to keep returning a value 
continuously for a certain amount of time (the "for:" duration) before the 
alert triggers.  But the converse is not true: if the expression stops 
returning a value, even for a single evaluation cycle, the alert is 
immediately resolved.
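For example, a minimal rule group might look like this (the group name, 
threshold, and expression are invented for illustration):

```yaml
groups:
  - name: example
    interval: 1m              # evaluation interval for this group
    rules:
      - alert: disk_utilization
        # The expression must keep returning a value for the whole
        # "for:" period before the alert transitions to firing.
        expr: (1 - node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk utilization has crossed 85%"
```

If the expression stops returning a series for even one evaluation, the 
alert leaves the firing state and is resolved.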

Therefore, try entering your alert expression in the PromQL expression 
browser, and look for any gaps in the resulting series.  Any gap will 
resolve the alert.
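If you'd rather check programmatically, here is a minimal sketch (not part 
of any Prometheus client library; the sample data below is invented) that 
scans (timestamp, value) pairs, such as those returned by the query_range 
HTTP API, for holes larger than one evaluation step:

```python
# Sketch: detect gaps in a series of (timestamp, value) samples.
# Any gap larger than one evaluation step would resolve the alert.

def find_gaps(samples, step):
    """Return (prev_ts, next_ts) pairs where consecutive samples
    are more than `step` seconds apart."""
    gaps = []
    for (t0, _), (t1, _) in zip(samples, samples[1:]):
        if t1 - t0 > step:
            gaps.append((t0, t1))
    return gaps

# Invented data: a 3-minute hole after t=120, with a 60s step.
samples = [(0, 1.0), (60, 1.0), (120, 1.0), (300, 1.0), (360, 1.0)]
print(find_gaps(samples, step=60))  # -> [(120, 300)]
```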

On Sunday, 29 August 2021 at 13:53:47 UTC+1 [email protected] wrote:

> Hi, 
>
> Recently, I've been debugging an issue where the alert resolves even 
> though Prometheus shows it as firing, so the cycle is 
> firing -> resolved -> firing. 
>
> After going through some documents and blogs, I found out that 
> Alertmanager will resolve an alert if Prometheus doesn't resend it 
> within the "*resolve_timeout*".
> However, Prometheus now sends the *endsAt* field to Alertmanager with a 
> very short timeout, after which Alertmanager can mark the alert as 
> resolved. This overrides the *resolve_timeout* setting in Alertmanager 
> and creates the firing->resolved->firing behavior if Prometheus does not 
> resend the alert before that short timeout.
>
> Is that understanding correct?
>
> *Questions as follows:*
> 1) How is the *endsAt* time calculated? Is it derived from *resend_delay*?
> 2) What is the default value of *resend_delay*? How can I check this 
> configuration, and in which file is it defined? 
>
> 3) Are the *msg="Received alert"* logs written when Prometheus sends 
> alerts to Alertmanager? And when do the *msg=flushing* logs get written? 
> (see below)
>
> 4) With evaluation_interval: 1m and scrape_interval: 1m, why do the 
> alerts received at 12:34 and at 12:36 have a time difference of 2m? 
>      When I GET the alerts from Alertmanager, the *endsAt* time is +4 
> minutes after the last received alert. Why is that? *Is my resend_delay 
> 4m? I didn't set the resend_delay value.*
>
> *Below are the logs from alertmanager :*
>
> level=debug ts=2021-08-29T12:34:40.342Z caller=dispatch.go:138 
> component=dispatcher msg="Received alert" 
> alert=disk_utilization[6356c43][active]
> level=debug ts=2021-08-29T12:34:40.342Z caller=dispatch.go:138 
> component=dispatcher msg="Received alert" 
> alert=disk_utilization[1db5352][active]
>
> level=debug ts=2021-08-29T12:34:40.381Z caller=dispatch.go:473 
> component=dispatcher 
> aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}" 
> msg=flushing alerts="[disk_utilization[6356c43][active] 
> disk_utilization[1db5352][active]]"
> level=debug ts=2021-08-29T12:35:10.381Z caller=dispatch.go:473 
> component=dispatcher 
> aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}" 
> msg=flushing alerts="[disk_utilization[6356c43][active] 
> disk_utilization[1db5352][active]]"
> level=debug ts=2021-08-29T12:35:40.382Z caller=dispatch.go:473 
> component=dispatcher 
> aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}" 
> msg=flushing alerts="[disk_utilization[6356c43][active] 
> disk_utilization[1db5352][active]]"
> level=debug ts=2021-08-29T12:36:10.382Z caller=dispatch.go:473 
> component=dispatcher 
> aggrGroup="{}/{name=~\"^(?:test-1)$\"}:{alertname=\"disk_utilization\"}" 
> msg=flushing alerts="[disk_utilization[6356c43][active] 
> disk_utilization[1db5352][active]]"
>
> level=debug ts=2021-08-29T12:36:40.345Z caller=dispatch.go:138 
> component=dispatcher msg="Received alert" 
> alert=disk_utilization[6356c43][active]
> level=debug ts=2021-08-29T12:36:40.345Z caller=dispatch.go:138 
> component=dispatcher msg="Received alert" 
> alert=disk_utilization[1db5352][active]
>
> GET request to Alertmanager:
> curl http://10.233.49.116:9092/api/v1/alerts
> {"status":"success","data":[
>   {"labels":{"alertname":"disk_utilization","device":"xx.xx.xx.xx:/media/test",
>    "fstype":"nfs4","instance":"xx.xx.xx.xx","job":"test-1",
>    "mountpoint":"/media/test","node_name":"test-1","severity":"critical"},
>    "annotations":{"summary":"Disk utilization has crossed x%. Current Disk utilization = 86.823044624783"},
>    "startsAt":"2021-08-29T11:28:40.339802555Z",
>    *"endsAt":"2021-08-29T12:40:40.339802555Z",*
>    "generatorURL":"x",
>    "status":{"state":"active","silencedBy":[],"inhibitedBy":[]},
>    "receivers":["test-1"],"fingerprint":"1db535212ea6dcf6"},
>   {"labels":{"alertname":"disk_utilization","device":"test","fstype":"ext4",
>    "instance":"xx.xx.xx.xx","job":"Node_test-1","mountpoint":"/",
>    "node_name":"test-1","severity":"critical"},
>    "annotations":{"summary":"Disk utilization has crossed x%. Current Disk utilization = 94.59612027578963"},
>    "startsAt":"2021-08-29T11:28:40.339802555Z",
>    *"endsAt":"2021-08-29T12:40:40.339802555Z",*
>    "generatorURL":"x",
>    "status":{"state":"active","silencedBy":[],"inhibitedBy":[]},
>    "receivers":["test-1"],"fingerprint":"6356c43dc3589622"}]}
>
>
>
> thanks,
> Akshay
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
