Hi Voytek,

I'm not really sure what the best indicators of a failing hard drive are. I've used SMART on a lot of hard drives; I've seen undocumented SMART values, and I've even seen hard drives function fine for a number of years while SMART reports they are "FAILING NOW". I've also seen some drives enter a state where they won't allow further SMART tests (online or offline) to be run or aborted. This has led me to believe that SMART as an indicator needs to be considered on a per-model basis and run carefully within the capabilities of the drive. The whole process has given me more questions than answers.

I try to detect a failure by monitoring large changes in the SMART attributes. I've configured munin to monitor the SMART attributes; it wouldn't be too hard to change the plugin to monitor these values on your NAS (I imagine you can ssh/telnet to it). You will notice some variance in things like temperature and ECC, but unless they start behaving erratically I wouldn't worry.
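
As a rough illustration of watching for large changes, here is a minimal Python sketch. It is not the munin plugin itself. It assumes smartctl (from smartmontools) and Python 3.7+ are available on the NAS, which may not be true of the QNAP firmware, and that attribute rows follow the usual ten-column `smartctl -A` layout. The function names and the max_delta threshold are made up for the example.

```python
import subprocess


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit() check.
        if len(parts) >= 10 and parts[0].isdigit():
            try:
                attrs[parts[1]] = int(parts[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs


def large_changes(previous, current, max_delta):
    """Return {name: (old, new)} where the raw value moved by more than max_delta.

    max_delta is a hypothetical threshold; a sensible value depends on the
    attribute and on the drive model.
    """
    return {
        name: (previous[name], value)
        for name, value in current.items()
        if name in previous and abs(value - previous[name]) > max_delta
    }


def read_device(device):
    """Run `smartctl -A` on a device (assumes smartctl is installed and on PATH)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_smart_attributes(result.stdout)
```

In practice you would save the output of read_device("/dev/sda") from one run (for example from cron), compare it with the next run using large_changes, and send a mail when the result is not empty. A single threshold for every attribute is a simplification; temperature and ECC counters vary normally, so per-attribute limits would probably work better.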

Hope this helps in 'detecting and notifying' potential failures.

Best Regards,
David Balnaves

-----Original Message-----
From: Voytek Eymont
Sent: Thursday, November 11, 2010 8:21 AM
To: [email protected]
Subject: [SLUG] detecting hard drive failure ?

I have a brand new QNAP NAS with 4 SATA HDs as 'Striping Disk Volume: Drive
1 2 3 4', installed a couple of months ago

when 1st installed, using QNAP web i/f, I've run SMART tests, all were
100%, etc

yesterday, it seems HD3 suffered total failure, it says:

---------------------
Summary HD3
Hard disk does not exist.
---------------------
(though, LCD panel says disk 4: "HD4 ejected")

I can ssh to the NAS:

- what sort of tests or whatever can I run before I pull the unit down ?

- what sort of utility can I run to 'detect and notify' should such
a failure occur again ?

# uname -a
Linux NAS01 2.6.33.2 #1 SMP Tue Sep 28 00:54:34 CST 2010 i686 unknown

# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/ram                124.0M    109.7M     14.3M  88% /
tmpfs                    32.0M     92.0k     31.9M   0% /tmp
/dev/sda4               310.0M    160.5M    149.5M  52% /mnt/ext
/dev/md9                509.5M     41.3M    468.2M   8% /mnt/HDA_ROOT
/dev/md0                  3.6T      2.5T      1.1T  69% /share/MD0_DATA



--
Voytek

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html