Hi, We have a dedicated job that collects disk metrics:
- job_name: node_disks
  params:
    collect[]:
      - diskstats
      - filefd
      - filesystem
      - mdadm
      - mountstats
      - nfs
      - nfsd
- job_name: node
  params:
    collect[]:
      - arp
      - bonding
      - conntrack
      - cpu
      - entropy
      - hwmon
      - infiniband
      - loadavg
      - meminfo
      - netclass
      - netdev
      - netstat
      - ntp
      - processes
      - sockstat
      - stat
      - textfile
      - time
      - timex
      - uname
      - vmstat
      - xfs
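
For context, a minimal sketch of how those two jobs might look as complete scrape_configs entries, both hitting the same node_exporter but with different collect[] filters. The target address and scrape_timeout below are placeholders, not our real values:

scrape_configs:
  - job_name: node_disks
    # Filesystem-related collectors are isolated here so a hung NFS mount
    # only stalls this scrape, not the general node metrics.
    scrape_timeout: 30s                         # placeholder
    metrics_path: /metrics
    params:
      collect[]: [diskstats, filefd, filesystem, mdadm, mountstats, nfs, nfsd]
    static_configs:
      - targets: ['node01.example.com:9100']    # placeholder target
  - job_name: node
    metrics_path: /metrics
    params:
      collect[]: [cpu, meminfo, netdev]         # plus the other collectors listed above
    static_configs:
      - targets: ['node01.example.com:9100']    # placeholder target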
Stale NFS will usually be noticed by:

up{job="node_disks"} == 0
  and
label_replace(up{job="node"} == 1, "job", "node_disks", "", "")

and by a second rule:

node_filesystem_avail_bytes offset 8h
  unless node_filesystem_avail_bytes
  and on(job, instance) up == 1

Those two expressions have worked fine for us in the past.
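
For completeness, here is a sketch of how those expressions could be wrapped into alerting rules; the group name, alert names, and "for" durations are placeholders, not taken from our actual rules:

groups:
  - name: nfs-staleness                   # placeholder name
    rules:
      - alert: NodeDisksScrapeStuck       # placeholder name
        # The disks job is down while the node job on the same instance is
        # still up; label_replace rewrites the job label on the right-hand
        # side so the two up series have matching labels for the 'and'.
        expr: |
          up{job="node_disks"} == 0
            and
          label_replace(up{job="node"} == 1, "job", "node_disks", "", "")
        for: 10m                          # placeholder duration
      - alert: FilesystemMetricsStale     # placeholder name
        # A filesystem series existed 8h ago, is missing now, and the target
        # is still up: the collector has silently stopped reporting it.
        expr: |
          node_filesystem_avail_bytes offset 8h
            unless node_filesystem_avail_bytes
            and on(job, instance) up == 1
        for: 15m                          # placeholder duration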
On 03 Mar 18:11, Ben Kochie wrote:
> We added some mitigation for filesystem hangs. The node_exporter will
> notice a stuck filesystem and stop attempting to gather metrics from it
> until it gets un-stuck. Although, I don't think we have any metrics for
> when that happens, only log errors.
>
> On Tue, Mar 3, 2020 at 6:03 PM Serkan Çoban <[email protected]> wrote:
>
> > If I remember correctly, node_exporter will hang too when an NFS share
> > hangs. Maybe you can test it...
> >
> > On Tue, Mar 3, 2020 at 6:26 PM Yagyansh S. Kumar
> > <[email protected]> wrote:
> > >
> > > I also thought about doing the same, but I am keeping that as a last
> > resort because that would require me to push the script to all my 2500+
> > servers.
> > >
> > > On Tuesday, March 3, 2020 at 8:46:27 PM UTC+5:30, Murali Krishna
> > Kanagala wrote:
> > >>
> > >> I would write a small shell script that tries to write to the NFS
> > mount path and writes the status to a file that can be read by the
> > textfile collector, and schedule that shell script via cron. I think this
> > is the easiest solution.
> > >>
> > >> On Tue, Mar 3, 2020, 9:12 AM Yagyansh S. Kumar <[email protected]>
> > wrote:
> > >>>
> > >>> Already enabled the nfs and nfsd collectors. So far I haven't found
> > anything that can accurately give me information about an NFS hang.
> > >>> Correct me if I am wrong, but I don't think disk IO is a good indicator
> > of an NFS hang, as there may be times when no activity is happening on the
> > NFS, but that does not mean the NFS is hung. (e.g. I have 25 NFS mounts on
> > one of my servers; some of them are used rarely, so we won't find any
> > substantial IO on those mounts, but I still need to know whether they are
> > accessible or not.) Still, thanks for the suggestion, will try it out.
> > >>>
> > >>>
> > >>> On Tuesday, March 3, 2020 at 8:35:03 PM UTC+5:30, Murali Krishna
> > Kanagala wrote:
> > >>>>
> > >>>> Try enabling the nfs options in the node_exporter config. It will
> > spit out some metrics about the NFS status.
> > >>>>
> > >>>> Also look at the disk IO metrics from node_exporter; no activity
> > there indicates the NFS is not doing anything.
> > >>>>
> > >>>> On Tue, Mar 3, 2020, 7:10 AM Yagyansh S. Kumar <[email protected]>
> > wrote:
> > >>>>>
> > >>>>> I want to check if the NFS is hung (i.e. whether it is accessible
> > from the server or not, and if yes, what response time it is getting).
> > I know that using the mountstats and nfs collectors we get a lot of
> > metrics for NFS, but I haven't found any that can reliably tell me when
> > the NFS hangs.
> > >>>>> Thanks in advance.
--
(o- Julien Pivotto
//\ Open-Source Consultant
V_/_ Inuits - https://www.inuits.eu

