Yes, I did experiment with node_filesystem_device_error earlier, based on Ben's suggestion in my earlier thread, but not extensively. Also, I didn't know that it reflects the statfs success status. From what I have read so far, statfs is the best way to find out whether a filesystem is hanging. So I'll definitely give node_filesystem_device_error another try and see if I can come up with something interesting.
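To make the statfs idea concrete, here is a minimal sketch of a standalone probe, assuming Python 3 on Linux. It is not how node_exporter implements its check; the function name `statfs_status` and the 5-second timeout are made up for illustration. The call runs in a daemon thread because a statfs on a hung NFS mount may block indefinitely, so the caller waits a bounded time and abandons the thread if it does not return.

```python
import os
import threading


def statfs_status(path: str, timeout: float = 5.0) -> str:
    """Return 'ok', 'error', or 'timeout' for a statvfs call on path.

    Hypothetical helper: 'timeout' means the call did not return in time,
    which is what a hung mount would look like from the caller's side.
    """
    result = {}

    def probe() -> None:
        try:
            os.statvfs(path)  # wraps statvfs(3); releases the GIL while blocked
            result["status"] = "ok"
        except OSError:
            result["status"] = "error"

    # Daemon thread: a probe stuck on a hung mount will not keep the
    # process alive at exit.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    return result.get("status", "timeout")


if __name__ == "__main__":
    # "/" is only an example path; replace it with the NFS mount point.
    print(statfs_status("/"))
```

Each timed-out probe leaves one blocked thread behind, so something like this should only be run at a low frequency.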
Thanks a lot for your help. Cheers!

On Sunday, March 15, 2020 at 2:49:01 AM UTC+5:30, Christian Hoffmann wrote:
>
> On 3/14/20 10:01 PM, Yagyansh S. Kumar wrote:
> > Also, since you mentioned hanging network filesystem, is there any
> > way/logic to find out whether my NFS mount is hung on a machine or
> > not? I have busted my ass on getting this result, must have tried more
> > than 50 things but still have nothing in this matter.
> > In our setup we use a lot of NFS and some of the mounts are really
> > critical. All these shared NFS mounts are taken from a 3rd party vendor
> > and due to network lag or IP mismatch or 10 other reasons, the NFS ends
> > up being hung on a machine or two. I need to know whenever this
> > happens. Anything that can be done here?
>
> I think I would aim for using the regular node_filesystem_device_error
> metric nowadays, which is basically the Statfs success status.
>
> In earlier node_exporter times, a hung nfs mount could easily prevent
> node_exporter from working reliably, which is why we still have nfs
> excluded via --collector.filesystem.ignored-fs-types. However, since
> #997 [1] this should have been improved. Therefore, I plan to give this
> a go again.
>
> Other than that, there are nfs client metrics, but I'm not sure if you
> can derive a hung / not hung result from that.
>
> I was about to link to another thread some weeks ago, but I just noticed
> that it was started by you as well [2]. ;)
>
> I think that Ben's suggestion is basically the same. Julien's approach
> regarding separation of collectors into different jobs (in the same
> mail thread) also sounded interesting.
>
> Have you done some experiments with node_filesystem_device_error?
>
> Kind regards,
> Christian
>
>
> [1] https://github.com/prometheus/node_exporter/pull/997
> [2] https://groups.google.com/d/msgid/prometheus-users/CABbyFmqMKQXYNOfdr7BeFA%3Dx%3D5fY%2Bk4EQ8oprL0Wh-8SNqmvoA%40mail.gmail.com?utm_medium=email&utm_source=footer

