Hi all

I have a simple 2-disk RAID 1 array which has become corrupted by a faulty memory module.

If I repeatedly generate an MD5 hash on the same file, I consistently get 1 of 2 values back, roughly alternating, so I assume that the 2 disks have different versions of the same file and are being read more-or-less alternately. 'raidctl -s' tells me that all is well with the array. The likelihood of corruption appears to be greater with larger files - those over approx 1/2 gig are pretty much all corrupt, while small files are pretty much all ok. All this sounds reasonable under the circumstances.
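For reference, this is roughly the test I was running (the path is just an example, and I'm assuming the stock md5(1) here):

    # hash the same file several times; on a healthy mirror every run
    # should print the same value, but here it alternates between two
    for i in 1 2 3 4 5; do
        md5 /data/somefile.iso
    done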

My idea for recovering as much as possible was to disconnect 1 drive, copy all the data off, switch to the other drive and do the same, then run an analysis on the 2 copies - if a file is the same on both copies then it's probably ok; if they differ, then one or both will be bad.
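The analysis step I have in mind is nothing fancy - just cmp over the two trees, something like this (the copy1/copy2 mount points are only placeholders for wherever the two copies end up):

    # walk one copy and compare each file against the other copy;
    # anything that differs was corrupted on one disk or the other
    cd /mnt/copy1
    find . -type f | while IFS= read -r f; do
        cmp -s "$f" "/mnt/copy2/$f" || echo "DIFFERS: $f"
    done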

So, I did the first copy, but when I swapped to the other disk, RAIDFrame remembered that it had 'failed' and would not configure it into the set (as I feared it wouldn't).

Does anyone know how I can tell RAIDFrame that the first drive is actually ok, or is my reasoning just nonsense anyway?
What would a parity re-write do in this case?
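For what it's worth, what I was tempted to try is below, based on my (possibly wrong) reading of raidctl(8) - raid0 and the config path are just from my setup, and I'm not at all sure that forcing the configuration or rewriting parity is safe here, hence the question:

    # force the set to configure even though one component is marked failed
    raidctl -C /etc/raid0.conf raid0
    # check the component status
    raidctl -s raid0
    # rewrite the 'parity' - which on RAID 1 I assume means copying one
    # half of the mirror over the other, possibly propagating the corruption
    raidctl -P raid0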

Ironically, this computer is in the process of being configured as backup storage, so while I have the originals of most of the data, there is some that I don't, and I haven't yet set up the secondary (off-site) backups. And yes, I did test that the backups were ok, the first ones at least. It appears the module failed some time during the process. I know, I should have been anal and checked every single one, but it was all brand new hardware ...
Actually, that's when failure rates are high.


paulm
