We added some mitigation for filesystem hangs: node_exporter notices a stuck filesystem and stops trying to gather metrics from it until it becomes unstuck. I don't think we have any metrics for when that happens, though, only log errors.
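For the textfile-collector probe suggested in the quoted thread, here is a minimal sketch, written in Python rather than shell. It is an illustration under stated assumptions, not something shipped with node_exporter: the mount paths, the textfile directory, the timeout, and the metric names `nfs_mount_accessible` and `nfs_mount_probe_duration_seconds` are all made up for the example. The only node_exporter behaviour relied on is that the textfile collector reads `*.prom` files from the directory given by `--collector.textfile.directory`. Each mount is probed in a child process with a timeout, so a hung mount cannot block the script itself.

```python
#!/usr/bin/env python3
"""Probe NFS mounts with a timeout and write results for the textfile collector."""
import os
import subprocess
import sys
import tempfile
import time

# Hypothetical values for illustration; adjust to the local setup.
MOUNTS = ["/mnt/nfs1", "/mnt/nfs2"]
TEXTFILE_DIR = "/var/lib/node_exporter/textfile_collector"
TIMEOUT_SECONDS = 5.0


def probe_mount(path, timeout=TIMEOUT_SECONDS, cmd=("stat",)):
    """Run `cmd path` in a child process; return (accessible, seconds)."""
    start = time.monotonic()
    proc = subprocess.Popen(
        [*cmd, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        ok = proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Do not wait again after kill(): a child blocked on a hung
        # mount may not exit until the server comes back.
        proc.kill()
        ok = False
    return (1 if ok else 0), time.monotonic() - start


def escape_label(value):
    """Escape a label value for the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(results):
    """Render {mount path: (accessible, seconds)} as exposition text."""
    lines = [
        "# HELP nfs_mount_accessible 1 if the probe succeeded within the timeout.",
        "# TYPE nfs_mount_accessible gauge",
    ]
    for path, (ok, _) in sorted(results.items()):
        label = escape_label(path)
        lines.append(f'nfs_mount_accessible{{mountpoint="{label}"}} {ok}')
    lines += [
        "# HELP nfs_mount_probe_duration_seconds Wall-clock time of the probe.",
        "# TYPE nfs_mount_probe_duration_seconds gauge",
    ]
    for path, (_, seconds) in sorted(results.items()):
        label = escape_label(path)
        lines.append(
            f'nfs_mount_probe_duration_seconds{{mountpoint="{label}"}} {seconds:.3f}'
        )
    return "\n".join(lines) + "\n"


def write_atomically(text, directory=TEXTFILE_DIR, name="nfs_probe.prom"):
    """Write to a temp file, then rename, so a scrape never sees a partial file."""
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; node_exporter may run as another user
    os.replace(tmp, os.path.join(directory, name))


def main():
    if not os.path.isdir(TEXTFILE_DIR):
        print(f"{TEXTFILE_DIR} not found; nothing written", file=sys.stderr)
        return
    write_atomically(render_metrics({m: probe_mount(m) for m in MOUNTS}))


if __name__ == "__main__":
    main()
```

Run from cron every minute or so, this would allow an alert on `nfs_mount_accessible == 0`. The mounts are probed one after another, so with many hung mounts the worst-case runtime is the number of mounts times the timeout. It still has to be distributed to every server, which is the drawback already raised in the thread.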

On Tue, Mar 3, 2020 at 6:03 PM Serkan Çoban <[email protected]> wrote:
> If I remember correctly, node exporter will hang too when an NFS share
> hangs. Maybe you can test it...
>
> On Tue, Mar 3, 2020 at 6:26 PM Yagyansh S. Kumar <[email protected]> wrote:
> >
> > I also thought about doing the same, but I am keeping that as a last
> > resort because that would require me to push the script to all my
> > 2500+ servers.
> >
> > On Tuesday, March 3, 2020 at 8:46:27 PM UTC+5:30, Murali Krishna Kanagala wrote:
> >>
> >> I would write a small shell script that tries to write to the NFS
> >> mount path and writes the status to a file which can be read by the
> >> textfile collector, and schedule that shell script with cron. I think
> >> this is the easiest solution.
> >>
> >> On Tue, Mar 3, 2020, 9:12 AM Yagyansh S. Kumar <[email protected]> wrote:
> >>>
> >>> I have already enabled the nfs and nfsd collectors. So far I haven't
> >>> found anything that accurately tells me about an NFS hang.
> >>> Correct me if I am wrong, but I don't think disk IO is a good
> >>> indicator of an NFS hang, as there may be times when no activity is
> >>> happening on the NFS, but that does not mean that the NFS is hung
> >>> (e.g., I have 25 NFS mounts on one of my servers; some of them are
> >>> rarely used, so we won't find any substantial IO on those mounts, but
> >>> I still need to know whether they are accessible). Still, thanks for
> >>> the suggestion, I will try it out.
> >>>
> >>> On Tuesday, March 3, 2020 at 8:35:03 PM UTC+5:30, Murali Krishna Kanagala wrote:
> >>>>
> >>>> Try enabling the nfs options in the node exporter config. It will
> >>>> spit out some metrics about the NFS status.
> >>>>
> >>>> Also look at the disk IO metrics from node exporter; if you see no
> >>>> activity, that indicates the NFS is not doing anything.
> >>>>
> >>>> On Tue, Mar 3, 2020, 7:10 AM Yagyansh S. Kumar <[email protected]> wrote:
> >>>>>
> >>>>> I want to check if the NFS is hung (i.e., whether it is accessible
> >>>>> from the server or not, and if so, what response time it is
> >>>>> getting). I know that with the mountstats and nfs collectors we
> >>>>> have a lot of metrics for NFS, but I haven't found any that
> >>>>> reliably tells me every time the NFS hangs.
> >>>>> Thanks in advance.
> >>>>>
> >>>>> --
> >>>>> You received this message because you are subscribed to the Google
> >>>>> Groups "Prometheus Users" group.
> >>>>> To unsubscribe from this group and stop receiving emails from it,
> >>>>> send an email to [email protected].
> >>>>> To view this discussion on the web visit
> >>>>> https://groups.google.com/d/msgid/prometheus-users/06929518-d3b5-4c2f-9490-b08cc664d26b%40googlegroups.com.

