On Sat, November 13, 2010 5:52 pm, David Balnaves wrote:
> I'm not really sure what the best indicators of a failing hard drive are.
> I've used SMART on a lot of hard drives; I've seen undocumented SMART
> values, and even hard drives that function fine for a number of years
> while SMART reports they are "FAILING NOW". I've also seen some drives
> enter a state where they won't allow further SMART tests (on/offline) to
> be run or aborted. This has led me to believe that SMART as an indicator
> needs to be considered on a per-model basis and run carefully within the
> capabilities of the drive. The whole process has given me more questions
> than answers.
>
> I try to detect a failure by monitoring large changes in the SMART
> attributes. I've configured munin to monitor the SMART attributes; it
> wouldn't be too hard to change the plugin to monitor these values on your
> NAS (I imagine you can ssh/telnet to it). You will notice some variance
> in things like temperature and ECC, but unless they start behaving
> erratically I wouldn't worry.
>
> Hope this helps in 'detecting and notifying' potential failures.
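
The "watch for large changes" idea can be sketched in a few lines of Python. This is only a rough illustration, not the munin plugin mentioned above: it assumes the attribute table format printed by `smartctl -A`, the function names and the threshold of 10 are made up for the example, and the two sample snapshots are invented data rather than readings from any real drive.

```python
# Sketch: flag large jumps in SMART raw values between two snapshots.
# Assumes text in the table format printed by `smartctl -A /dev/sdX`.
# Function names, threshold and sample data are hypothetical.

def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows of a smartctl -A table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # column 10 is RAW_VALUE.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def large_changes(old, new, threshold=10):
    """List (name, old, new) where the raw value moved by more than threshold."""
    changed = []
    for name, new_value in sorted(new.items()):
        old_value = old.get(name)
        if old_value is not None and abs(new_value - old_value) > threshold:
            changed.append((name, old_value, new_value))
    return changed


# Invented sample rows, for illustration only.
YESTERDAY = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   038   045   000    Old_age   Always       -       38
"""
TODAY = """\
  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always       -       120
194 Temperature_Celsius     0x0022   040   045   000    Old_age   Always       -       40
"""

if __name__ == "__main__":
    before = parse_smart_attributes(YESTERDAY)
    after = parse_smart_attributes(TODAY)
    for name, old_value, new_value in large_changes(before, after):
        print(f"{name}: {old_value} -> {new_value}")
```

A single fixed threshold is a simplification. Given the point above that SMART behaves differently per model, a real check would probably want per-attribute (and per-model) limits.
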
David, thanks.

Yes, I can ssh to it. I'm not very familiar with the RAID utilities (beyond
knowing what the acronym stands for...), but I get:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Sat Jun 19 04:35:02 2010
     Raid Level : raid0
     Array Size : 3900774400 (3720.07 GiB 3994.39 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Sat Jun 19 04:35:02 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
     Chunk Size : 64K
           UUID : 79e23cd2:b3f9618d:58a8936b:5e0d814b
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
# mount
/proc on /proc type proc (rw)
none on /dev/pts type devpts (rw,gid=5,mode=620)
sysfs on /sys type sysfs (rw)
tmpfs on /tmp type tmpfs (rw,size=32M)
none on /proc/bus/usb type usbfs (rw)
/dev/sda4 on /mnt/ext type ext3 (rw)
/dev/md9 on /mnt/HDA_ROOT type ext3 (rw)
/dev/md0 on /share/MD0_DATA type ext4 (rw,usrjquota=aquota.user,jqfmt=vfsv0,user_xattr,data=ordered,nodelalloc)
# ls /share/MD0_DATA
ls: /share/MD0_DATA/Web: Input/output error
ls: /share/MD0_DATA/Network Recycle Bin: Input/output error
ls: /share/MD0_DATA/lost+found: Input/output error
ls: /share/MD0_DATA/Download: Input/output error
ls: /share/MD0_DATA/aquota.user: Input/output error
ls: /share/MD0_DATA/Multimedia: Input/output error
ls: /share/MD0_DATA/Usb: Input/output error
ls: /share/MD0_DATA/Recordings: Input/output error
ls: /share/MD0_DATA/Public: Input/output error
cameras/
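
For the 'detecting and notifying' part, a check on the `mdadm --detail` output could look roughly like the Python sketch below. The function names are made up for the example, and the assumption that only "clean" or "active" count as healthy states is mine. The sample text is a subset of the output pasted above.

```python
# Sketch: pull the "Key : Value" fields out of `mdadm --detail` output and
# report anything that looks unhealthy. Function names are hypothetical.

def parse_mdadm_detail(text):
    """Return the 'Key : Value' fields of mdadm --detail output as a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on " : " so the colons inside timestamps and UUIDs are kept.
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def array_problems(fields):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    state = fields.get("State", "unknown")
    # Assumption: anything other than "clean" or "active" deserves a look.
    if state not in ("clean", "active"):
        problems.append(f"state is {state!r}")
    if int(fields.get("Failed Devices", 0)) != 0:
        problems.append(f"{fields['Failed Devices']} failed device(s)")
    if fields.get("Active Devices") != fields.get("Raid Devices"):
        problems.append("active device count differs from raid device count")
    return problems


# Subset of the output shown above.
SAMPLE = """\
/dev/md0:
     Raid Level : raid0
   Raid Devices : 4
  Total Devices : 4
    Update Time : Sat Jun 19 04:35:02 2010
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
"""

if __name__ == "__main__":
    found = array_problems(parse_mdadm_detail(SAMPLE))
    print("\n".join(found) if found else "no problems reported by mdadm")
```

Note that for the array above this reports nothing: mdadm shows the state as clean with 0 failed devices even though ls returns Input/output errors, so a check like this would not have caught the problem on its own.
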
--
Voytek
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html