We use node_exporter to monitor processes in D state (uninterruptible 
sleep) on the OS. When the number of D state processes stays above 500, 
it triggers a pager alert rule like this:

  - alert: Node_Process_In_D_State_Count_Critical
    expr: node_processes_state{state='D'} > 500
    for: 10m

The problem is that when the OS gets into a bad state (too many D state 
processes), the node_exporter agent also seems to be affected and cannot 
reliably report the D state process metric to the Prometheus server. 
The screenshot below shows some data points missing. This causes the 
alert to flap: whenever data is missing, the alert gets resolved.

Is there any way to keep the alert from auto-resolving when some data 
points are missing?
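
One possible direction, as an untested sketch: evaluate the condition over 
a range window instead of only the latest sample, so that a few missed 
scrapes do not leave the expression with no result. The group name and the 
10m window below are assumptions, not something from our current setup. 
Note that max_over_time changes the semantics slightly: a single sample 
above 500 keeps the condition true for the whole window, so the alert would 
also resolve later, and a gap longer than the window would still resolve it.

```yaml
groups:
  - name: node_process_state   # hypothetical group name
    rules:
      - alert: Node_Process_In_D_State_Count_Critical
        # Highest D state count seen in the last 10 minutes (assumed window).
        # A range selector still returns the samples that did arrive, so
        # short gaps in scraping do not empty the result.
        expr: max_over_time(node_processes_state{state='D'}[10m]) > 500
        for: 10m
```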

[image: Jietu20200603-154527.jpg]




