This seems like a lot of work, especially when I have to monitor 2500+ servers. :P
On Tuesday, March 3, 2020 at 10:49:57 PM UTC+5:30, sayf eddine Hammemi wrote:
>
> If the node-exporter will log errors if the nfs share hangs then u can use
> mtail for example to scrape node exporter log files and export nfs errors,
> that would be better than using a hand made script.
>
> On Tue, Mar 3, 2020, 18:12 Ben Kochie <[email protected]> wrote:
>
>> We added some mitigation for filesystem hangs. The node_exporter will
>> notice a stuck filesystem and stop attempting to gather metrics from it
>> until it gets un-stuck. Although, I don't think we have any metrics for
>> when that happens, only log errors.
>>
>> On Tue, Mar 3, 2020 at 6:03 PM Serkan Çoban <[email protected]> wrote:
>>
>>> if I remember correctly node exporter will hang too when an nfs share
>>> hangs. maybe you can test it...
>>>
>>> On Tue, Mar 3, 2020 at 6:26 PM Yagyansh S. Kumar <[email protected]> wrote:
>>> >
>>> > I also thought about doing the same, but I am keeping that as a last
>>> > resort because that would require me to push the script to all my
>>> > 2500+ servers.
>>> >
>>> > On Tuesday, March 3, 2020 at 8:46:27 PM UTC+5:30, Murali Krishna Kanagala wrote:
>>> >>
>>> >> I would write a small shell script that tries to write to the nfs
>>> >> mount path and writes the status to a file which can be read by the
>>> >> text file collector. And schedule that shell script cron. I think
>>> >> this is the easiest solution.
>>> >>
>>> >> On Tue, Mar 3, 2020, 9:12 AM Yagyansh S. Kumar <[email protected]> wrote:
>>> >>>
>>> >>> Already enabled the nfs and nfsd collectors. Till now I haven't
>>> >>> found anything that can accurately give me the information about
>>> >>> NFS hang.
>>> >>> Correct me if I am wrong, but I don't think it is a good indicator
>>> >>> of NFS hang as there may be times where no activity is happening on
>>> >>> the NFS, but that does not mean that NFS is hanged. (eg. I have 25
>>> >>> NFS mounts on one of my servers, some of them are used rarely, so
>>> >>> we won't find any substantial IO on those mounts, but I need to
>>> >>> know whether they are accessible or not). Still, thanks for the
>>> >>> suggestion, will try it out once.
>>> >>>
>>> >>> On Tuesday, March 3, 2020 at 8:35:03 PM UTC+5:30, Murali Krishna Kanagala wrote:
>>> >>>>
>>> >>>> Try enabling the nfs options in the node exporter config. It will
>>> >>>> spit out some metrics about the nfs status.
>>> >>>>
>>> >>>> Also look at the disk IO metrics from node exporter and if you see
>>> >>>> no activity which indicates the nfs is not doing anything.
>>> >>>>
>>> >>>> On Tue, Mar 3, 2020, 7:10 AM Yagyansh S. Kumar <[email protected]> wrote:
>>> >>>>>
>>> >>>>> I want to check if the NFS is hanged(i.e whether it is accessible
>>> >>>>> from the server or not, and if yes then what is the response time
>>> >>>>> it is getting). I know using the mountstats and nfs collector we
>>> >>>>> have a lot of metrics for NFS, but haven't found any that can
>>> >>>>> tell me every time the NFS hangs correctly.
>>> >>>>> Thanks in advance.
>>> >>>>>
>>> >>>>> --
>>> >>>>> You received this message because you are subscribed to the
>>> >>>>> Google Groups "Prometheus Users" group.
>>> >>>>> To unsubscribe from this group and stop receiving emails from it,
>>> >>>>> send an email to [email protected].
>>> >>>>> To view this discussion on the web visit
>>> >>>>> https://groups.google.com/d/msgid/prometheus-users/06929518-d3b5-4c2f-9490-b08cc664d26b%40googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/cb697139-7540-4a52-86f2-3ad04d242c68%40googlegroups.com.
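
For anyone landing on this thread later, here is a rough sketch of the textfile-collector idea discussed above, written in Python rather than shell. It is only an illustration under assumptions, not something posted by anyone in the thread: the metric names (nfs_mount_accessible, nfs_mount_check_duration_seconds), the script name, and the example paths are all made up. It runs stat on each mount point in a child process with a timeout, so that a hung mount blocks the child rather than the script itself, and then writes the result atomically as a .prom file.

```python
#!/usr/bin/env python3
"""Probe mount points with a timeout and write textfile-collector metrics."""
import os
import subprocess
import sys
import tempfile
import time

TIMEOUT_SECONDS = 5.0  # assumed value; tune to what counts as "hung" for you


def check_path(path, timeout):
    """Return (accessible, seconds_taken) for one mount point."""
    start = time.monotonic()
    proc = subprocess.Popen(
        ["stat", "--", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        ok = proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Do not wait again after kill(): a child stuck in uninterruptible
        # sleep may never exit, and waiting would hang this script as well.
        proc.kill()
        ok = False
    return ok, time.monotonic() - start


def escape_label(value):
    """Escape a string for use as a Prometheus label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(results):
    """Render {path: (ok, seconds)} in the Prometheus text format."""
    lines = [
        "# HELP nfs_mount_accessible 1 if stat succeeded within the timeout.",
        "# TYPE nfs_mount_accessible gauge",
    ]
    for path, (ok, _) in results.items():
        lines.append('nfs_mount_accessible{mountpoint="%s"} %d'
                     % (escape_label(path), int(ok)))
    lines += [
        "# HELP nfs_mount_check_duration_seconds Time spent on the stat probe.",
        "# TYPE nfs_mount_check_duration_seconds gauge",
    ]
    for path, (_, seconds) in results.items():
        lines.append('nfs_mount_check_duration_seconds{mountpoint="%s"} %.3f'
                     % (escape_label(path), seconds))
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write via a temp file plus rename so a scrape never sees a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp, path)


# Usage: nfs_check.py OUTPUT.prom MOUNT [MOUNT ...]; does nothing without args.
if __name__ == "__main__" and len(sys.argv) > 2:
    output, mounts = sys.argv[1], sys.argv[2:]
    write_atomically(
        output,
        render_metrics({m: check_path(m, TIMEOUT_SECONDS) for m in mounts}),
    )
```

It could be run from cron, e.g. once a minute, with the output file placed in the directory that node_exporter is started with via --collector.textfile.directory. A timed-out probe shows up as nfs_mount_accessible 0 with a duration close to the timeout. This does not remove the need to distribute a script to every server, which was the concern raised above; it only shows what such a script might look like.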

