Checking individual drive state

2006-11-05 Thread Bradshaw
I've recently built a smallish RAID5 box as a storage area for my home 
network, using mdadm. However, one of the drives will not remain in the 
array for longer than around two days before it is removed. Re-adding it 
to the array does not throw any errors. That, together with the other 
drive connected to the same controller having failed once, leads me to 
believe that the problem is probably the controller, which is an add-in 
SATA card.


I don't know how to scan the one disk for bad sectors; stopping the 
array and running fsck or similar throws errors, so I need help in 
determining whether the disc itself is faulty.


If the controller is to be replaced, how would I go about migrating the 
two discs to the new controller whilst maintaining the array?


Thanks in advance

Tom Bradshaw
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Checking individual drive state

2006-11-05 Thread dean gaudet
On Sun, 5 Nov 2006, Bradshaw wrote:

> I've recently built a smallish RAID5 box as a storage area for my home
> network, using mdadm. However, one of the drives will not remain in the array
> for longer than around two days before it is removed. Re-adding it to the array
> does not throw any errors, leading me to believe that it's probably a problem
> with the controller, which is an add-in SATA card, as well as the other drive
> connected to it failing once.
>
> I don't know how to scan the one disk for bad sectors, stopping the array and
> doing an fsck or similar throws errors, so I need help in determining whether
> the disc itself is faulty.

try swapping the cable first.  after that swap ports with another disk and 
see if the problem follows the port or the disk.
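
while swapping cables and ports it helps to track each disk by serial 
number, since device names can change between boots. a minimal sketch, 
assuming smartmontools is installed; /dev/sdX is a placeholder and the 
helper name disk_serial is made up for illustration:

```shell
# Hypothetical helper: print the serial number line from the drive's
# identity section. Run as root. $1 is the disk (placeholder: /dev/sdX);
# any extra smartctl options, such as -d ata, go after it.
disk_serial() {
    dev=$1
    shift
    smartctl "$@" -i "$dev" | grep -i 'serial number'
}
```

for example, disk_serial /dev/sdX -d ata before and after the swap shows 
whether the drive that drops out is the same physical disk.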

you can see if smartctl -a (from smartmontools) tells you anything 
interesting.  (smartctl -a output can be difficult, sometimes impossible, 
to understand though.  but if you've got errors in the SMART error log 
that's a good place to start.)
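
the interesting parts can also be requested one at a time instead of 
reading the whole -a dump. a minimal sketch, assuming smartmontools is 
installed; /dev/sdX is a placeholder and the helper name smart_summary 
is made up for illustration:

```shell
# Hypothetical helper: print the parts of the SMART report that usually
# matter most. Run as root. $1 is the disk (placeholder: /dev/sdX);
# any extra smartctl options, such as -d ata, go after it.
smart_summary() {
    dev=$1
    shift
    smartctl "$@" -H "$dev"        # overall health self-assessment
    smartctl "$@" -l error "$dev"  # SMART error log
    smartctl "$@" -A "$dev"        # vendor attributes (reallocated sectors etc.)
}
```

usage would be something like smart_summary /dev/sdX -d ata.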


> If the controller is to be replaced, how would I go about migrating the two
> discs to the new controller whilst maintaining the array?

it depends on which method you're using to assemble the array at boot 
time.  in most cases if these aren't your root disks then a swap of two 
disks won't result in any troubles reassembling the array.  other device 
renames may cause problems depending on your distribution though -- but 
generally when two devices swap names within an array you should be fine.

you'll want to do the disk swap with the array offline (either shutdown 
the box or mdadm --stop the array).
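
a rough sketch of the offline swap, assuming the array is /dev/md0 (a 
placeholder), that nothing on it is mounted, and that the array is 
listed in mdadm.conf so a scan can find it again.  the helper names are 
made up for illustration:

```shell
# Hypothetical helper: record the current layout, then take the array
# offline. Run as root. $1 is the array (placeholder: /dev/md0).
array_offline() {
    md=$1
    mdadm --detail "$md" &&   # shows member devices and the array UUID
    mdadm --stop "$md"
}

# Hypothetical helper: once the disks are on the new controller,
# reassemble from the config file and print the resulting state.
array_online() {
    md=$1
    mdadm --assemble --scan &&
    mdadm --detail "$md"
}
```

members are identified by the superblock on each disk rather than by 
device name, which is why a rename within the array is usually harmless.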

-dean


Re: Checking individual drive state

2006-11-05 Thread Mike Hardy


dean gaudet wrote:
> On Sun, 5 Nov 2006, Bradshaw wrote:
>
>> I don't know how to scan the one disk for bad sectors, stopping the array and
>> doing an fsck or similar throws errors, so I need help in determining whether
>> the disc itself is faulty.
>
> try swapping the cable first.  after that swap ports with another disk and 
> see if the problem follows the port or the disk.
>
> you can see if smartctl -a (from smartmontools) tells you anything 
> interesting.  (it can be quite difficult, to impossible, to understand 
> smartctl -a output though.  but if you've got errors in the SMART error 
> log that's a good place to start.)

I don't think SMART output is that hard to understand.

And checking the entire drive for errors is usually as easy as
'smartctl -t long /dev/drive'. If it is SATA as you say, you may need to
put a '-d ata' in there.

Wait for however long it says to wait, then do a 'smartctl -a
/dev/drive' and you should see the self-test log at the bottom. Did the
test complete without error? If it stopped on a read failure, there are
bad sectors, and you should google the string 'BadBlockHowTo' to see if
you can clear them (after failing the drive out of the array).

Note that this won't tell you anything about cables or controllers or
power or anything else that could and may be wrong. It's just for the
drive media and firmware.
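
Put together, the steps above look roughly like this. It is only a
sketch: /dev/sdX, /dev/sdX1 and /dev/md0 are placeholders, and the helper
names are made up for illustration.

```shell
# Hypothetical helper: start the long (full surface) self-test.
# Run as root. $1 is the disk; extra smartctl options such as
# -d ata go after it.
start_long_test() {
    dev=$1
    shift
    smartctl "$@" -t long "$dev"
}

# Hypothetical helper: show only the self-test log, without the rest
# of the -a report.
show_selftest_log() {
    dev=$1
    shift
    smartctl "$@" -l selftest "$dev"
}

# Hypothetical helper: fail a member and remove it from the array
# before attempting any bad-sector repair on it.
# $1 is the array (e.g. /dev/md0), $2 the member (e.g. /dev/sdX1).
drop_member() {
    mdadm "$1" --fail "$2" --remove "$2"
}
```

So: start_long_test /dev/sdX -d ata, wait the time it reports, then
show_selftest_log /dev/sdX -d ata. Only if that shows a read failure
would you go on to drop_member /dev/md0 /dev/sdX1.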

-Mike