Chris

Thanks for your reply.  The sysadmins on my team found that the 
issue is a corrupted block on our Thanos cluster.  We are working on 
upgrading the cluster to prevent the alerts from being dropped.

Joe

On Friday, February 14, 2020 at 9:32:43 AM UTC-8, Joe Devilla wrote:
>
> Hi
>
> I am using alertmanager to post alerts on slack.  Here is the 
> configuration of my alert:
>
> expr: <a query that takes 5 seconds>
> for: 60m
>
>
> Here are the settings on my alertmanager:
>
> global:
>   resolve_timeout: 5m
> route:
>   group_by: ['alertname', 'cluster']
>   group_interval: 5m
>   group_wait: 30s
>   receiver: "slack"
>   repeat_interval: 12h
>
>
> To improve performance, I created a recording rule so that the 5-second 
> query now takes 100 ms.
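>
> For illustration, a rule file along these lines pairs a recording rule with 
> the alert.  This is only a sketch: the group name, the recorded series name 
> cluster:slow_query:value, the alert name, the inner expression and the 
> threshold are all hypothetical placeholders, not my actual rules.
>
> ```yaml
> groups:
>   - name: example_rules            # hypothetical group name
>     rules:
>       # Recording rule: precompute the expensive query on every evaluation.
>       - record: cluster:slow_query:value
>         expr: sum by (cluster) (rate(http_requests_total[5m]))   # stand-in for the slow query
>       # Alerting rule: read the cheap precomputed series instead.
>       - alert: SlowQueryCondition    # hypothetical alert name
>         expr: cluster:slow_query:value < 1   # hypothetical threshold
>         for: 60m
> ```
>
> With this layout the alert expression only reads the recorded series, so its 
> evaluation cost no longer depends on the slow query.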
>
>
> I have two issues:
>
>
>
>    1. I was running into an issue where I was getting "toggling" on the slack 
> channel, meaning that the alert would be in an unresolved state, quickly be 
> resolved, then go back into an unresolved state.  In this case, the alert was 
> not actually being resolved.  When viewing prometheus, the alert would show 
> up, but when viewing the alertmanager, the alert would periodically disappear 
> and then reappear.  Why would the alertmanager lose the alert only to have it 
> reappear seconds later?
>    2. When does the alertmanager send messages to slack?  I would assume that 
> it sends messages in the following situations:
>       1. Alert goes into alarm
>       2. Alert goes out of alarm
>       3. num_firing on alert either increases or decreases
>    
>       When I look at my slack channel, despite the alertmanager settings 
> above, I would see messages posted at the following times:
>
>
>    1. 12:02AM
>    2. 12:08AM
>    3. 1:02AM
>    4. 1:08AM
>    5. 1:52AM
>    6. 2:53AM
>    7. 2:58AM
>    8. 3:18AM
>    9. 3:38AM
>    10. 4:23AM
>    11. 6:23AM
>    12. 6:43AM
>    13. 6:48AM
>    14. 6:53AM
>    15. 6:59AM
>    16. 8:39AM
>    17. 8:54AM
>    18. 9:04AM
>    19. 9:19AM
>
> In summary, I have two questions:
>
>
>    1. Why would alertmanager be dropping alerts?
>    2. Why is the alertmanager sending messages to slack at unpredictable 
> times?
>
>
