@Julien - Could you please explain a bit more about what exactly you are
checking, and how you conclude that the NFS is in fact in a hung state?

On Tuesday, March 3, 2020 at 10:48:02 PM UTC+5:30, Julien Pivotto wrote:
>
> Hi, 
>
> We have a dedicated job that collects disk metrics: 
>
> - job_name: node_disks 
>   params: 
>     collect[]: 
>     - diskstats 
>     - filefd 
>     - filesystem 
>     - mdadm 
>     - mountstats 
>     - nfs 
>     - nfsd 
> - job_name: node 
>   params: 
>     collect[]: 
>     - arp 
>     - bonding 
>     - conntrack 
>     - cpu 
>     - entropy 
>     - hwmon 
>     - infiniband 
>     - loadavg 
>     - meminfo 
>     - netclass 
>     - netdev 
>     - netstat 
>     - ntp 
>     - processes 
>     - sockstat 
>     - stat 
>     - textfile 
>     - time 
>     - timex 
>     - uname 
>     - vmstat 
>     - xfs 
>
> Stale NFS will usually be noticed by: 
>   up{job="node_disks"} == 0 and label_replace(up{job="node"} == 1, "job", "node_disks", "", "") 
> and a second rule: 
>   node_filesystem_avail_bytes offset 8h unless node_filesystem_avail_bytes and on(job, instance) up == 1 
>
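As a reading aid, here is a toy Python model of what the two expressions appear to select. It is only a sketch, not PromQL evaluation: the host names and mount points are made up, and the first rule is simplified to matching on the instance label alone.

```python
# Toy model of the two alert expressions; all label values are made-up examples.

def stale_disks_job(up):
    """Instances where the node_disks scrape fails while the node scrape succeeds.

    `up` maps (job, instance) -> 0 or 1, like the `up` metric.
    """
    down_disks = {inst for (job, inst), v in up.items()
                  if job == "node_disks" and v == 0}
    up_node = {inst for (job, inst), v in up.items()
               if job == "node" and v == 1}
    # label_replace rewrites job="node" to job="node_disks" so that `and`
    # can match both sides; in this model that reduces to matching on instance.
    return down_disks & up_node


def vanished_filesystems(before, now, up):
    """Filesystem series present 8h ago, absent now, on targets that are still up.

    `before` and `now` are sets of (job, instance, mountpoint) tuples.
    """
    gone = before - now  # models `A offset 8h unless A`
    # models `and on(job, instance) up == 1`
    return {s for s in gone if up.get((s[0], s[1])) == 1}


up = {("node", "host1:9100"): 1, ("node_disks", "host1:9100"): 0,
      ("node", "host2:9100"): 1, ("node_disks", "host2:9100"): 1}
before = {("node_disks", "host2:9100", "/"),
          ("node_disks", "host2:9100", "/mnt/nfs")}
now = {("node_disks", "host2:9100", "/")}

print(stale_disks_job(up))                    # {'host1:9100'}
print(vanished_filesystems(before, now, up))  # {('node_disks', 'host2:9100', '/mnt/nfs')}
```

In this model the first rule fires for host1, where the disk job fails while the exporter itself still answers. The second fires for host2, where a filesystem series that existed 8 hours ago has disappeared.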
>
> Those two expressions seem to have worked fine for us in the past. 
>
>
> On 03 Mar 18:11, Ben Kochie wrote: 
> > We added some mitigation for filesystem hangs. The node_exporter will 
> > notice a stuck filesystem and stop attempting to gather metrics from it 
> > until it gets un-stuck. Although, I don't think we have any metrics for 
> > when that happens, only log errors. 
> > 
> > On Tue, Mar 3, 2020 at 6:03 PM Serkan Çoban <[email protected]> wrote: 
> > 
> > > If I remember correctly, node exporter will hang too when an NFS share 
> > > hangs. Maybe you can test it... 
> > > 
> > > On Tue, Mar 3, 2020 at 6:26 PM Yagyansh S. Kumar <[email protected]> wrote: 
> > > > 
> > > > > I also thought about doing the same, but I am keeping that as a last 
> > > > > resort because that would require me to push the script to all my 
> > > > > 2500+ servers. 
> > > > 
> > > > > On Tuesday, March 3, 2020 at 8:46:27 PM UTC+5:30, Murali Krishna Kanagala wrote: 
> > > >> 
> > > > >> I would write a small shell script that tries to write to the NFS 
> > > > >> mount path and writes the status to a file which can be read by the 
> > > > >> textfile collector. Then schedule that shell script with cron. I 
> > > > >> think this is the easiest solution. 
> > > >> 
> > > > >> On Tue, Mar 3, 2020, 9:12 AM Yagyansh S. Kumar <[email protected]> wrote: 
> > > >>> 
> > > > >>> Already enabled the nfs and nfsd collectors. So far I haven't 
> > > > >>> found anything that can accurately tell me about an NFS hang. 
> > > > >>> Correct me if I am wrong, but I don't think a lack of IO activity 
> > > > >>> is a good indicator of an NFS hang, as there may be times when no 
> > > > >>> activity is happening on the NFS, but that does not mean that the 
> > > > >>> NFS is hung (e.g. I have 25 NFS mounts on one of my servers, some 
> > > > >>> of them are used rarely, so we won't find any substantial IO on 
> > > > >>> those mounts, but I need to know whether they are accessible or 
> > > > >>> not). Still, thanks for the suggestion, I will try it out. 
> > > >>> 
> > > >>> 
> > > > >>> On Tuesday, March 3, 2020 at 8:35:03 PM UTC+5:30, Murali Krishna Kanagala wrote: 
> > > >>>> 
> > > > >>>> Try enabling the NFS options in the node exporter config. It will 
> > > > >>>> spit out some metrics about the NFS status. 
> > > > >>>> 
> > > > >>>> Also look at the disk IO metrics from node exporter; if you see no 
> > > > >>>> activity, that indicates the NFS is not doing anything. 
> > > >>>> 
> > > > >>>> On Tue, Mar 3, 2020, 7:10 AM Yagyansh S. Kumar <[email protected]> wrote: 
> > > >>>>> 
> > > > >>>>> I want to check whether the NFS is hung (i.e. whether it is 
> > > > >>>>> accessible from the server or not, and if yes, what response 
> > > > >>>>> time it is getting). I know that the mountstats and nfs 
> > > > >>>>> collectors give us a lot of metrics for NFS, but I haven't found 
> > > > >>>>> any that reliably tells me every time the NFS hangs. 
> > > > >>>>> Thanks in advance. 
> > > >>>>> 
> > > >>>>> -- 
> > > >>>>> You received this message because you are subscribed to the 
> Google 
> > > Groups "Prometheus Users" group. 
> > > >>>>> To unsubscribe from this group and stop receiving emails from 
> it, 
> > > send an email to [email protected]. 
> > > >>>>> To view this discussion on the web visit 
> > > 
> https://groups.google.com/d/msgid/prometheus-users/06929518-d3b5-4c2f-9490-b08cc664d26b%40googlegroups.com
>  
> > > . 
> > > >>> 
> > > >>> -- 
> > > >>> You received this message because you are subscribed to the Google 
> > > Groups "Prometheus Users" group. 
> > > >>> To unsubscribe from this group and stop receiving emails from it, 
> send 
> > > an email to [email protected]. 
> > > >>> To view this discussion on the web visit 
> > > 
> https://groups.google.com/d/msgid/prometheus-users/1dda60cc-0b20-47da-87ff-4f1c76ce076f%40googlegroups.com
>  
> > > . 
> > > > 
> > > > -- 
> > > > You received this message because you are subscribed to the Google 
> > > Groups "Prometheus Users" group. 
> > > > To unsubscribe from this group and stop receiving emails from it, 
> send 
> > > an email to [email protected] <javascript:>. 
> > > > To view this discussion on the web visit 
> > > 
> https://groups.google.com/d/msgid/prometheus-users/832f2823-eab1-4f40-8f91-ddbc00190551%40googlegroups.com
>  
> > > . 
> > > 
> > > -- 
> > > You received this message because you are subscribed to the Google 
> Groups 
> > > "Prometheus Users" group. 
> > > To unsubscribe from this group and stop receiving emails from it, send 
> an 
> > > email to [email protected] <javascript:>. 
> > > To view this discussion on the web visit 
> > > 
> https://groups.google.com/d/msgid/prometheus-users/CAP9WWed%2BtxJVRSJc0mkCOkg6_neGAJRNEMq_hku87LPbYXAhjA%40mail.gmail.com
>  
> > > . 
> > > 
> > 
>
>
> -- 
>  (o-    Julien Pivotto 
>  //\    Open-Source Consultant 
>  V_/_   Inuits - https://www.inuits.eu 
>

-- 
You received this message because you are subscribed to the Google Groups 
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prometheus-users/a3e62e72-0e1e-41c3-b3eb-1b979cd50f08%40googlegroups.com.
