We use node_exporter to monitor processes in the D state (uninterruptible
sleep). When the number of D-state processes exceeds 500, it triggers a pager
alert rule like this:
- alert: Node_Process_In_D_State_Count_Critical
  expr: node_processes_state{state='D'} > 500
  for: 10m
The problem is that when the OS gets into a bad state (too many D-state
processes), the node_exporter agent also appears to be affected, and it
cannot reliably report the D-state process metric to the Prometheus server.
In the screenshot below, we can see that some data points are missing. This
causes the alert to flap: when data is missing, the alert gets resolved.
Is there any way to keep the alert from auto-resolving when some data points
are missing?
[image: Jietu20200603-154527.jpg]
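
One possible direction, as an untested sketch: evaluate the rule over a lookback window with max_over_time, so that a short gap in scrapes does not make the expression return nothing. The 15m window is an assumption (it has to be longer than the longest expected gap), and the rule name is a hypothetical variant of the one above.

```yaml
# Sketch only: the 15m lookback window is an assumed value, not from our setup.
- alert: Node_Process_In_D_State_Count_Critical_Smoothed  # hypothetical name
  # max_over_time returns the highest sample seen in the last 15m, so the
  # expression keeps returning a value while at least one sample in the
  # window is still above the threshold, even if recent scrapes failed.
  expr: max_over_time(node_processes_state{state='D'}[15m]) > 500
  for: 10m
```

The trade-off is that the alert would also resolve later: after the count really drops, the expression stays above 500 until the last high sample leaves the window.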