Hi Voytek,
I'm not really sure what the best indicators of a failing hard drive are.
I've used SMART on a lot of hard drives; I've seen undocumented SMART
values, and even hard drives that function fine for a number of years while
SMART reports they are "FAILING NOW". I've also seen some drives enter a
state where they won't allow further SMART tests (online or offline) to be
run or aborted. This has led me to believe that SMART as an indicator needs
to be considered on a per-model basis and run carefully within the
capabilities of the drive. The whole process has given me more questions
than answers.
I try to detect a failure by monitoring for large changes in the SMART
attributes. I've configured Munin to monitor the SMART attributes; it
wouldn't be too hard to change the plugin to monitor these values on your
NAS (I imagine you can ssh/telnet to it). You will notice some variance in
things like temperature and ECC, but unless they start behaving erratically
I wouldn't worry.
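As a rough illustration of "watch for big jumps" (this is not the Munin
plugin, just a minimal sketch), the Python below parses text in the layout
printed by `smartctl -A` and flags attributes whose raw value moved by more
than a chosen limit between two snapshots. It assumes smartmontools is
available on the NAS, which I haven't checked; the function names and the
limits are made up for the example and would need tuning per drive model.

```python
import re

# One attribute row of `smartctl -A` style output looks like:
#   5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
# Only the leading integer of RAW_VALUE is captured, so a value such as
# "31 (Min/Max 20/45)" is read as 31.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_name: raw_value} for every attribute row found."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(3))
    return attrs


def big_changes(previous, current, max_delta):
    """Return (name, old, new) for attributes that moved more than allowed.

    max_delta maps attribute name -> largest change to tolerate; these
    limits are hypothetical and attributes not listed are ignored.
    """
    alerts = []
    for name, limit in max_delta.items():
        if name in previous and name in current:
            if abs(current[name] - previous[name]) > limit:
                alerts.append((name, previous[name], current[name]))
    return alerts
```

You would feed it the saved output of two runs (for example from cron) and
mail yourself whenever big_changes() returns a non-empty list.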
Hope this helps in 'detecting and notifying' potential failures.
Best Regards,
David Balnaves
-----Original Message-----
From: Voytek Eymont
Sent: Thursday, November 11, 2010 8:21 AM
To: [email protected]
Subject: [SLUG] detecting hard drive failure ?
I have a brand new QNAP NAS with 4 SATA HDs as 'Striping Disk Volume: Drive
1 2 3 4', installed a couple of months ago
when first installed, using the QNAP web i/f, I ran SMART tests, all were
100%, etc
yesterday, it seems HD3 suffered total failure, it says:
---------------------
Summary HD3
Hard disk does not exist.
---------------------
(though, LCD panel says disk 4: "HD4 ejected")
I can ssh to the NAS:
- what sort of tests or whatever can I run before I pull the unit down ?
- what sort of utility can I run to 'detect and notify' should such a
failure occur again ?
# uname -a
Linux NAS01 2.6.33.2 #1 SMP Tue Sep 28 00:54:34 CST 2010 i686 unknown
# df -h
Filesystem                Size      Used Available Use% Mounted on
/dev/ram                124.0M    109.7M     14.3M  88% /
tmpfs                    32.0M     92.0k     31.9M   0% /tmp
/dev/sda4               310.0M    160.5M    149.5M  52% /mnt/ext
/dev/md9                509.5M     41.3M    468.2M   8% /mnt/HDA_ROOT
/dev/md0                  3.6T      2.5T      1.1T  69% /share/MD0_DATA
--
Voytek
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html