Yes, I did experiment with node_filesystem_device_error earlier, based on 
Ben's suggestion in my earlier thread, but not extensively. Also, I didn't 
know that it reflects the statfs success status. From what I have read so 
far, statfs is the best way to tell whether a filesystem is hanging. So 
I'll definitely give node_filesystem_device_error another try and see if I 
can come up with something interesting.
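
To make the statfs idea concrete, here is a minimal sketch of a standalone 
probe. It is not node_exporter's actual code, just the same general idea: 
run statfs in a separate thread and treat a call that does not return in 
time as a hung mount. The function name statfs_ok, the 5 second timeout, 
and the /mnt/nfs-share path are my own placeholders.

```python
import os
import threading


def statfs_ok(path, timeout=5.0):
    """Return True if os.statvfs(path) succeeds within `timeout` seconds.

    Returns False if the call raises OSError or has not returned in time.
    """
    result = {}

    def probe():
        try:
            os.statvfs(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    # Daemon thread: if the call never returns, the interpreter will not
    # wait for this thread at exit, and the caller is not blocked.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout)
    # No result recorded means the call is still blocked, so report failure.
    return result.get("ok", False)


if __name__ == "__main__":
    # "/mnt/nfs-share" is a hypothetical mount point, used for illustration.
    for mount in ("/", "/mnt/nfs-share"):
        state = "ok" if statfs_ok(mount) else "hung or erroring"
        print(mount, state)
```

A probe like this cannot tell a hung mount from a missing path; both 
come back as False, so it only answers "did statfs succeed in time".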

Thanks a lot for your help. Cheers!

On Sunday, March 15, 2020 at 2:49:01 AM UTC+5:30, Christian Hoffmann wrote:
>
> On 3/14/20 10:01 PM, Yagyansh S. Kumar wrote: 
> > Also, since you mentioned hanging network filesystems, is there any 
> > way/logic to find out whether an NFS mount is hung on a machine or 
> > not? I have busted my ass on getting this result, must have tried more 
> > than 50 things, but still have nothing on this matter. 
> > In our setup we use a lot of NFS and some of the mounts are really 
> > critical. All these shared NFS mounts are taken from a 3rd party vendor 
> > and due to network lag or IP mismatch or 10 other reasons, the NFS ends 
> > up being hung on a machine or two. I need to know whenever this 
> > happens. Anything that can be done here? 
>
> I think I would aim for using the regular node_filesystem_device_error 
> metric nowadays, which is basically the statfs success status. 
>
> In earlier node_exporter times, a hung NFS mount could easily prevent 
> node_exporter from working reliably, which is why we still have nfs 
> excluded via --collector.filesystem.ignored-fs-types. However, since 
> #997 [1] this should have been improved. Therefore, I plan to give this 
> a go again. 
>
> Other than that, there are nfs client metrics, but I'm not sure if you 
> can derive a hung / not hung result from that. 
>
> I was about to link to another thread some weeks ago, but I just noticed 
> that it was started by you as well [2]. ;) 
>
> I think that Ben's suggestion is basically the same. Julien's approach 
> regarding separation of collectors into different jobs (in the same 
> mail thread) also sounded interesting. 
>
> Have you done some experiments with node_filesystem_device_error? 
>
> Kind regards, 
> Christian 
>
>
> [1] https://github.com/prometheus/node_exporter/pull/997 
> [2] https://groups.google.com/d/msgid/prometheus-users/CABbyFmqMKQXYNOfdr7BeFA%3Dx%3D5fY%2Bk4EQ8oprL0Wh-8SNqmvoA%40mail.gmail.com?utm_medium=email&utm_source=footer
