On Wed, Feb 22, 2012 at 4:38 PM, Bill Maidment <[email protected]> wrote:

> > In (1) above, are they replying that you can't "--fail", "--remove",
> > and then "--add" the same disk or that you can't "--fail" and
> > "--remove" a disk, replace it, and then can't "--add" it because it's
> > got the same "X"/"XY" in "sdX"/"sdaXY" as the previous, failed disk?
> >
> >
>
> Now that I've had my coffee fix, I have got my sanity back.
> I have used the following sequence of commands to remove and re-add a disk
> to a running RAID1 array:
> mdadm /dev/md3 -f /dev/sdc1
> mdadm /dev/md3 -r /dev/sdc1
> mdadm --zero-superblock /dev/sdc1
> mdadm /dev/md3 -a /dev/sdc1
>
> It works as expected. I just found the original error message a bit
> confusing when it referred to making the disk a "spare". It would seem that
> earlier versions of the kernel did that automatically.
>
>
>
Interesting! I have mixed feelings about RAID, especially for simple RAID1
setups. I'd rather use the second drive as an rsnapshot-based backup drive,
usually mounted read-only. That allows me to recover files that I've
accidentally screwed up or deleted in the recent past, which happens far
more often than drive failures. It also puts different wear and tear on
each drive: there's nothing like having all the drives in a RAID set start
failing at almost the same time, before replacements can be installed. This
has happened to me before and is actually pretty well described in a Google
white paper at
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/disk_failures.pdf
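For the curious, the kind of setup I mean can be sketched with a minimal
rsnapshot.conf fragment along these lines. The snapshot root, retention
counts, and backup points here are illustrative assumptions for my setup,
not a recommendation, and note that rsnapshot requires TAB-separated
fields:

```
# /etc/rsnapshot.conf (fragment) -- fields must be separated by TABs
snapshot_root	/mnt/backup/snapshots/

# keep 7 daily and 4 weekly snapshots (hardlinked, so unchanged
# files cost no extra space)
retain	daily	7
retain	weekly	4

# what to back up, and where under snapshot_root it lands
backup	/home/	localhost/
backup	/etc/	localhost/
```

The backup drive can then stay mounted read-only between runs and be
remounted read-write only while rsnapshot executes, which is what makes
recovering an accidentally deleted file so convenient.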

However, in this case, I'd tend to agree with the idea that a RAID1 pair
should not be automatically re-activated on reboot. If one drive starts
failing, it should be kept offline until it is replaced; trying to outguess
this process at boot time without human intervention seems fraught.
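On the "keep it offline until replaced" point: a degraded array is visible
in /proc/mdstat, so a monitoring script can flag it before anything tries
to re-add the member. Here is a minimal sketch in Python; it assumes the
common "[n/m] [UU]" member-status layout of /proc/mdstat and parses a
captured copy of the text (mdadm --monitor is the real tool for this job):

```python
import re

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status line shows a missing member.

    /proc/mdstat reports e.g. "[2/2] [UU]" for a healthy two-way mirror
    and "[2/1] [U_]" when one member has dropped out.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # Array stanzas start with a line like "md3 : active raid1 ..."
        m = re.match(r'^(md\d+)\s*:', line)
        if m:
            current = m.group(1)
        # The status line carries "[total/active]" member counts
        status = re.search(r'\[(\d+)/(\d+)\]', line)
        if status and current:
            total, active = map(int, status.groups())
            if active < total:
                degraded.append(current)
    return degraded

sample = """\
Personalities : [raid1]
md3 : active raid1 sdc1[1] sdb1[0]
      976630336 blocks [2/1] [U_]
md2 : active raid1 sdb2[0] sdc2[1]
      10485696 blocks [2/2] [UU]
"""
print(degraded_arrays(sample))  # -> ['md3']
```

A cron job running something like this (or, better, mdadm --monitor with
its mail alerting) gives the human a chance to intervene before any
automatic re-activation is attempted.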
