You can use something like
`avg_over_time(node_processes_state{state='D'}[10m])` to smooth over missed
scrapes. Depending on how sensitive you want the alert to be, you can also use
`max_over_time()` instead.
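
As a rough sketch, the rule from the original message could be adapted as
below. The group name is a placeholder of mine, and the 10m window and the
500 threshold are simply carried over from the original rule, so adjust them
to your scrape interval and tolerance.

```yaml
groups:
  - name: node_process_state   # hypothetical group name, not from the thread
    rules:
      - alert: Node_Process_In_D_State_Count_Critical
        # max_over_time returns the highest sample seen in the last 10m,
        # so the expression still yields a value when some scrapes in
        # that window failed.
        expr: max_over_time(node_processes_state{state='D'}[10m]) > 500
        for: 10m
        # Less sensitive alternative: average over the same window.
        # expr: avg_over_time(node_processes_state{state='D'}[10m]) > 500
```

With `max_over_time`, a single sample above 500 keeps the expression true for
the whole window, so the alert also takes up to 10m longer to resolve once
the count drops. `avg_over_time` reacts more gradually in both directions.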

On Wed, Jun 3, 2020 at 9:49 AM 林浩 <[email protected]> wrote:

>
> We use node_exporter to monitor processes in the D state. When the number
> of D state processes exceeds 500, it triggers a pager alert rule like this:
>
>   - alert: Node_Process_In_D_State_Count_Critical
>     expr: node_processes_state{state='D'} > 500
>     for: 10m
>
> The problem is that when the OS gets into a bad state (too many D state
> processes), the node_exporter agent also seems to be affected, and it
> can NOT report the correct D state process metric to the Prometheus server.
> In the screenshot below, we can see that some data points are missing. This
> causes the alert to flap: when data is missing, the alert gets resolved.
>
> Is there any way to keep the alert from auto-resolving when some data
> points are missing?
>
> [image: Jietu20200603-154527.jpg]
>
> --
> You received this message because you are subscribed to the Google Groups
> "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/prometheus-users/f8560b07-9b00-4dfc-9671-667368ddd530%40googlegroups.com
> <https://groups.google.com/d/msgid/prometheus-users/f8560b07-9b00-4dfc-9671-667368ddd530%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>

