These are my observations after one week of using node_filesystem_device_error on my production NFS mounts:
node_filesystem_device_error works quite well for detecting hung NFS mounts. Whenever an NFS mount is hung, node_filesystem_device_error reliably reports a problem. The converse does not hold, though: node_filesystem_device_error == 1 does not necessarily mean the mount is hung, because the statfs call can fail for other reasons. One that I noticed recently is "Stale file handle". So, all in all: if your NFS mount is hung, node_filesystem_device_error will tell you, but a value of 1 on its own is not proof of a hang.

On Sunday, March 15, 2020 at 3:56:19 AM UTC+5:30, Yagyansh S. Kumar wrote:
>
> Sure. Will absolutely do.
>
> On Sunday, March 15, 2020 at 3:30:59 AM UTC+5:30, Christian Hoffmann wrote:
>>
>> Hi,
>>
>> On 3/14/20 10:35 PM, Yagyansh S. Kumar wrote:
>> > Yes, I did experiment with node_filesystem_device_error earlier based on
>> > Ben's suggestion on my earlier thread, but not extensively. Also, I
>> > didn't know it is Statfs success. With what I have read so far on this
>> > matter, statfs is the best way to find your filesystem is hanging or
>> > not. Hence, I'll definitely give node_filesystem_device_error another
>> > try and see if I can come up with something interesting.
>> Yeah, this should be it:
>>
>> https://github.com/prometheus/node_exporter/blob/master/collector/filesystem_linux.go#L78
>>
>> Please report back with your results -- I'm also highly interested. :)
>>
>> Kind regards,
>> Christian
>>
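
P.S. For anyone who wants to tell a real hang apart from other statfs failures (such as the stale file handle case) directly on the host, here is a rough sketch. It is not how node_exporter does it; the function name probe_mount and the 5 second timeout are my own assumptions. It runs statvfs in a background thread and treats "no answer within the timeout" as hung, while a prompt failure is reported with its errno name (ESTALE for a stale file handle).

```python
import errno
import os
import threading


def probe_mount(path, timeout=5.0):
    """Classify a mount point as 'ok', 'hung', or 'error:<ERRNO>'.

    Hypothetical helper: 'hung' only means statvfs did not return
    within `timeout` seconds, which is an assumption about what
    counts as a hang.
    """
    result = {}

    def worker():
        try:
            os.statvfs(path)  # same family of call as statfs(2)
            result["status"] = "ok"
        except OSError as exc:
            # e.g. ESTALE for "Stale file handle", ENOENT for a missing path
            name = errno.errorcode.get(exc.errno, str(exc.errno))
            result["status"] = "error:" + name

    # Daemon thread, so a call that never returns does not keep
    # the interpreter waiting on this thread at shutdown.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return "hung"
    return result["status"]


if __name__ == "__main__":
    # "/" is used here only because it exists everywhere;
    # substitute your own NFS mount point.
    print(probe_mount("/"))
```

A result of "error:ESTALE" alongside node_filesystem_device_error == 1 would point to a stale handle, while "hung" would point to an unresponsive server. Note that a thread stuck on a truly hung mount cannot be cancelled; the sketch just stops waiting for it.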

