@Julien - Could you explain a bit more about what exactly you are checking,
and how you conclude that the NFS mount is in fact hung?
On Tuesday, March 3, 2020 at 10:48:02 PM UTC+5:30, Julien Pivotto wrote:
>
> Hi,
>
> We have a dedicated job that collects disks metrics:
>
> - job_name: node_disks
>   params:
>     collect[]:
>       - diskstats
>       - filefd
>       - filesystem
>       - mdadm
>       - mountstats
>       - nfs
>       - nfsd
> - job_name: node
>   params:
>     collect[]:
>       - arp
>       - bonding
>       - conntrack
>       - cpu
>       - entropy
>       - hwmon
>       - infiniband
>       - loadavg
>       - meminfo
>       - netclass
>       - netdev
>       - netstat
>       - ntp
>       - processes
>       - sockstat
>       - stat
>       - textfile
>       - time
>       - timex
>       - uname
>       - vmstat
>       - xfs
>
> A stale NFS mount will usually be noticed by this rule:
>
>   up{job="node_disks"} == 0
>     and label_replace(up{job="node"} == 1, "job", "node_disks", "", "")
>
> and by a second rule:
>
>   node_filesystem_avail_bytes offset 8h
>     unless node_filesystem_avail_bytes
>     and on(job, instance) up == 1
>
>
> Those two expressions seem to have worked fine for us in the past.
>
>
> On 03 Mar 18:11, Ben Kochie wrote:
> > We added some mitigation for filesystem hangs. The node_exporter will
> > notice a stuck filesystem and stop attempting to gather metrics from it
> > until it gets un-stuck. Although, I don't think we have any metrics for
> > when that happens, only log errors.
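
As a rough illustration of the mitigation Ben describes, the following Python sketch skips a mount while an earlier stat call on it is still blocked, and resumes once that call returns. It is not node_exporter's actual code (which is written in Go); the class name StuckMountTracker, the injectable stat_fn, and the 5-second timeout are assumptions made for the example.

```python
import os
import threading


class StuckMountTracker:
    """Skip mounts whose previous stat call has not returned yet (illustrative)."""

    def __init__(self, stat_fn=os.statvfs, timeout=5.0):
        self._stat_fn = stat_fn    # injectable so the sketch can be exercised
        self._timeout = timeout    # seconds to wait before calling a mount stuck
        self._stuck = set()        # mounts with a stat call still in flight
        self._lock = threading.Lock()

    def stat(self, mount_point):
        """Return the stat result, or None while the mount is considered stuck."""
        with self._lock:
            if mount_point in self._stuck:
                return None        # do not pile more blocked calls onto it
        result = {}
        done = threading.Event()

        def worker():
            try:
                result["value"] = self._stat_fn(mount_point)
            except Exception as exc:
                result["error"] = exc
            finally:
                with self._lock:
                    done.set()
                    # The call returned, so the mount may be probed again.
                    self._stuck.discard(mount_point)

        threading.Thread(target=worker, daemon=True).start()
        if not done.wait(self._timeout):
            with self._lock:
                # Re-check under the lock in case the worker finished just now.
                if not done.is_set():
                    self._stuck.add(mount_point)
                    return None
        if "error" in result:
            raise result["error"]
        return result["value"]
```

A collector built this way keeps answering scrapes for healthy mounts; the stuck one simply yields no samples until its blocked call comes back.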
> >
> > On Tue, Mar 3, 2020 at 6:03 PM Serkan Çoban <[email protected]> wrote:
> >
> > > If I remember correctly, node exporter will hang too when an NFS share
> > > hangs. Maybe you can test it...
> > >
> > > On Tue, Mar 3, 2020 at 6:26 PM Yagyansh S. Kumar <[email protected]> wrote:
> > > >
> > > > I also thought about doing the same, but I am keeping that as a last
> > > > resort because that would require me to push the script to all my
> > > > 2500+ servers.
> > > >
> > > > On Tuesday, March 3, 2020 at 8:46:27 PM UTC+5:30, Murali Krishna Kanagala wrote:
> > > >>
> > > >> I would write a small shell script that tries to write to the NFS
> > > >> mount path and writes the status to a file that the textfile
> > > >> collector can read, then schedule that script with cron. I think
> > > >> this is the easiest solution.
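
A minimal Python sketch of the cron-plus-textfile-collector idea above, under a few assumptions: it probes with a read-only stat in a child process rather than writing to the mount, the metric names nfs_mount_probe_success and nfs_mount_probe_duration_seconds are made up for illustration, and the output file must live in whatever directory node_exporter's textfile collector is configured to read. The duration gauge doubles as a crude response time.

```python
import os
import subprocess
import tempfile
import time


def probe_mount(mount_point, timeout=5.0, probe_cmd=("stat",)):
    """Return (ok, seconds) for one probe of mount_point."""
    start = time.monotonic()
    try:
        # Run the probe in a child process so a hung mount blocks the child,
        # not this script.
        proc = subprocess.Popen(
            [*probe_cmd, mount_point],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        ok = proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child in uninterruptible sleep may linger until
        # the mount recovers, so we deliberately do not wait for it to exit.
        proc.kill()
        ok = False
    except OSError:
        ok = False  # e.g. the probe command itself is missing
    return ok, time.monotonic() - start


def _escape(value):
    """Escape a label value for the Prometheus text format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def write_textfile(results, path):
    """Write {mount: (ok, seconds)} as gauges, replacing path atomically."""
    lines = ["# TYPE nfs_mount_probe_success gauge"]
    for mount, (ok, _) in results.items():
        lines.append(
            f'nfs_mount_probe_success{{mountpoint="{_escape(mount)}"}} {int(ok)}'
        )
    lines.append("# TYPE nfs_mount_probe_duration_seconds gauge")
    for mount, (_, seconds) in results.items():
        lines.append(
            f'nfs_mount_probe_duration_seconds{{mountpoint="{_escape(mount)}"}} {seconds:.3f}'
        )
    # Write a temp file in the same directory and rename it into place, so the
    # collector never reads a half-written file.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    os.chmod(tmp, 0o644)
    os.replace(tmp, path)


def run_probe(mounts, path):
    """Probe every mount in turn and publish the results."""
    write_textfile({mount: probe_mount(mount) for mount in mounts}, path)
```

A cron job would call run_probe with the list of mount points and a .prom file path (both site-specific), and an alert could then fire on nfs_mount_probe_success == 0. Mounts are probed one after another, so with many hung mounts the run time grows by one timeout per mount.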
> > > >>
> > > >> On Tue, Mar 3, 2020, 9:12 AM Yagyansh S. Kumar <[email protected]> wrote:
> > > >>>
> > > >>> I have already enabled the nfs and nfsd collectors. So far I haven't
> > > >>> found anything that can accurately tell me about an NFS hang.
> > > >>> Correct me if I am wrong, but I don't think a lack of disk IO is a
> > > >>> good indicator of an NFS hang, as there may be times when no activity
> > > >>> is happening on the NFS mount without it being hung. (E.g. I have 25
> > > >>> NFS mounts on one of my servers; some of them are rarely used, so we
> > > >>> won't see any substantial IO on those mounts, but I still need to know
> > > >>> whether they are accessible.) Still, thanks for the suggestion, I will
> > > >>> try it out.
> > > >>>
> > > >>>
> > > >>> On Tuesday, March 3, 2020 at 8:35:03 PM UTC+5:30, Murali Krishna Kanagala wrote:
> > > >>>>
> > > >>>> Try enabling the NFS options in the node exporter config. It will
> > > >>>> emit some metrics about the NFS status.
> > > >>>>
> > > >>>> Also look at the disk IO metrics from node exporter; if you see no
> > > >>>> activity, that indicates the NFS mount is not doing anything.
> > > >>>>
> > > >>>> On Tue, Mar 3, 2020, 7:10 AM Yagyansh S. Kumar <[email protected]> wrote:
> > > >>>>>
> > > >>>>> I want to check whether an NFS mount is hung (i.e. whether it is
> > > >>>>> accessible from the server and, if so, what response time it is
> > > >>>>> getting). I know the mountstats and nfs collectors provide a lot of
> > > >>>>> metrics for NFS, but I haven't found any that reliably tells me
> > > >>>>> every time the NFS mount hangs.
> > > >>>>> Thanks in advance.
> > > >>>>>
> > > >>>>> --
> > > >>>>> You received this message because you are subscribed to the Google
> > > >>>>> Groups "Prometheus Users" group.
> > > >>>>> To unsubscribe from this group and stop receiving emails from it,
> > > >>>>> send an email to [email protected].
> > > >>>>> To view this discussion on the web visit
> > > >>>>> https://groups.google.com/d/msgid/prometheus-users/06929518-d3b5-4c2f-9490-b08cc664d26b%40googlegroups.com.
> > > >>>
> > > >
> > >
> > >
> >
>
>
>
> --
> (o- Julien Pivotto
> //\ Open-Source Consultant
> V_/_ Inuits - https://www.inuits.eu
>
--
You received this message because you are subscribed to the Google Groups
"Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/prometheus-users/a3e62e72-0e1e-41c3-b3eb-1b979cd50f08%40googlegroups.com.