Thanks guys, I did a backup and rebuilt onto the new disks. The 3rd and final disk is now rebuilding, so I guess I should be happy. As for the one error every 13TB, I guess we'll have to wait for btrfs :)
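Where the ~13TB figure comes from: an unrecoverable read error (URE) rate of about 1 per 10^14 bits read works out to 10^14 / 8 = 1.25 x 10^13 bytes, roughly 12.5TB between errors. Rebuilding a 3-disk raid-5 of 500GB drives means reading the two surviving disks in full, about 1TB, which is where the roughly 1-in-13 chance Kostas mentions comes from. A minimal sketch of that arithmetic, assuming the spec-sheet rate of 10^-14 per bit, decimal units (1TB = 10^12 bytes) and independent bit errors (a simplification, since real media errors tend to cluster):

```python
import math

# Assumed spec-sheet rate: 1 unrecoverable read error per 10^14 bits read.
URE_PER_BIT = 1e-14


def bytes_per_expected_error(rate_per_bit: float = URE_PER_BIT) -> float:
    """Average number of bytes read before one URE (decimal units)."""
    return 1 / rate_per_bit / 8


def p_ure_during_read(bytes_read: float, rate_per_bit: float = URE_PER_BIT) -> float:
    """Probability of at least one URE while reading bytes_read bytes,
    treating every bit as an independent trial: 1 - (1 - p)^n."""
    bits = bytes_read * 8
    # log1p/expm1 keep the result accurate when p is tiny
    return -math.expm1(bits * math.log1p(-rate_per_bit))


if __name__ == "__main__":
    disk = 500e9      # one 500GB drive, decimal bytes
    surviving = 2     # a 3-disk raid-5 rebuild reads the other 2 disks in full
    read = surviving * disk
    print(f"one URE every ~{bytes_per_expected_error() / 1e12:.1f} TB")
    p = p_ure_during_read(read)
    print(f"P(at least one URE while reading {read / 1e12:.0f} TB) = {p:.3f} "
          f"(about 1 in {1 / p:.0f})")
```

For a 1TB read this gives a probability of about 0.077, i.e. roughly 1 in 13, matching the figure in the thread.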
More seriously now: I should end up with the raid-5 array of new disks as my primary storage, and the 3 older disks just lying there. I would like to use the old ones as a backup for the new array. My plan is to assemble the old disks into a new raid-5 unit, then take a daily LVM snapshot of the primary unit and 'dump' its file systems onto the backup raid unit. Any better suggestions? Actually, I don't know if I should make the backup unit raid-5 at all, given that raid-5 is slow for writing!

Kostas, about your performance problems: make sure you have the latest firmware. AFAIK it mentions performance improvements specifically for ext3. I was too scared to re-flash my card, though ;)

Regards

On Sat, May 3, 2008 at 2:22 PM, Kostas Georgiou <[EMAIL PROTECTED]> wrote:
> On Thu, May 01, 2008 at 09:35:19PM +0300, Ahmed Kamal wrote:
> >
> > Hello,
> > I'm working on a server with a 3w-9550SX controller, with 3x500G disks in
> > a raid-5 and 1x500G hot spare. One night, a disk fails, and the server
> > crashes! Working on the server, I see that many filesystems were destroyed
> > beyond repair!! This was too bad to hear. Some LVM volumes were repaired,
> > others were restored from backup. The bad disk was removed. I learnt that
> > 3ware controllers aren't really high quality, and they probably corrupt
> > the FSs.
> >
> > Since all disks are the same age, I thought I'd buy new disks to replace
> > the old ones. I bought 4x500G barracuda-ES drives, which should be high
> > quality. Here lies my problem. I need to replace the 3 running disks with
> > 3 new disks, and add an extra one as a hot spare. I am scared to do that,
> > because the standard way is to "fail" a disk and rebuild on a new one,
> > then repeat for the other 2 disks till all 3 are replaced. This puts me in
> > a vulnerable situation: if I "fail" a disk, and another disk naturally
> > fails while it is rebuilding, all data is gone! Is there any other "wise"
> > way to do what I want safely?
> > I contacted 3w support, and they just insist I should fail/rebuild, but
> > since I don't have much faith in their controllers or the old disks ...
> > any smarter way to do this?
>
> I have some 3ware controllers as well, and while I can't say that they
> are the best (performance is horrible in many cases), I never lost any
> data unless I had two dead disks in RAID5.
>
> The most common reason for a rebuild to fail is that one of the remaining
> disks in the raid has a fault (bad blocks). The best way to deal with
> this is to have the 3ware card run a verify task every few days. Also
> have smartd running to monitor the disks so you get a warning.
>
> So before your rebuilds, *backup* your data if you can. Run a verify
> task so the controller/disks have a chance to correct any existing
> errors, check your disks with smartctl, and start with the one with the
> most bad blocks (if any).
>
> Sadly most SATA disks have an unrecoverable read error rate of 1 per
> 10^14 bits or so, which means that statistically you'll get one error
> every ~13TB that you read. So during every rebuild you'll have a 1/13
> chance to lose a block whatever you do :(
>
> Cheers,
> Kostas
>
> _______________________________________________
> rhelv5-list mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/rhelv5-list
>
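Kostas's advice to check the disks with smartctl and start with the one with the most bad blocks could be scripted roughly as follows. This is a sketch under assumptions: it treats SMART attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable) as the "bad block" count, which is our choice rather than anything stated in the thread, and it assumes the drives sit behind a 3ware 9000-series card at /dev/twa0, where smartmontools addresses drive N with `-d 3ware,N`. The helper names are hypothetical; parsing is kept separate from running smartctl so it can be tried on saved output.

```python
import re
import subprocess

# Our choice of "bad block" indicators (an assumption, not from the thread).
BAD_BLOCK_ATTRS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def read_smart_attributes(port: int, device: str = "/dev/twa0") -> str:
    """Hypothetical helper: run `smartctl -A` for drive `port` behind a
    3ware 9000-series controller and return its text output."""
    # No check=True: smartctl uses non-zero exit bits to report disk status.
    result = subprocess.run(["smartctl", "-A", "-d", f"3ware,{port}", device],
                            capture_output=True, text=True)
    return result.stdout


def parse_bad_blocks(smartctl_output: str) -> dict[int, int]:
    """Return {attribute_id: raw_value} for the attributes we care about."""
    counts = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id = int(fields[0])
            if attr_id in BAD_BLOCK_ATTRS:
                match = re.match(r"\d+", fields[9])  # raw value may carry extra text
                if match:
                    counts[attr_id] = int(match.group())
    return counts


def rebuild_order(reports: dict[str, str]) -> list[str]:
    """Given {disk_label: smartctl -A output}, return labels worst-first
    (ties keep their original order because sorted() is stable)."""
    return sorted(reports,
                  key=lambda disk: sum(parse_bad_blocks(reports[disk]).values()),
                  reverse=True)


if __name__ == "__main__":
    reports = {f"port{n}": read_smart_attributes(n) for n in range(3)}
    print("replace in this order:", rebuild_order(reports))
```

The resulting order only suggests which disk to swap first; it does not replace the controller's own verify task or continuous monitoring with smartd.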
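The backup plan described in this thread (daily LVM snapshot of the primary unit, then dump onto a raid unit built from the old disks) might look roughly like this as a script run once a day from cron. Everything named here is hypothetical: the volume group vg_primary, the logical volumes, /backup as the mount point of the backup unit, and the 5G snapshot size. It assumes ext2/ext3 file systems (which is what dump handles) and root privileges, and it only prints the commands unless dry_run is False.

```python
import datetime
import subprocess

# Hypothetical names: adjust to your own volume group, volumes and backup mount.
VOLUME_GROUP = "vg_primary"
LOGICAL_VOLUMES = ["home", "srv"]
BACKUP_DIR = "/backup"   # where the raid unit made from the old disks is mounted
SNAPSHOT_SIZE = "5G"     # copy-on-write space; must absorb writes made during the dump


def backup_commands(lv: str, stamp: str) -> list[list[str]]:
    """Build the create-snapshot, dump, remove-snapshot commands for one LV."""
    origin = f"/dev/{VOLUME_GROUP}/{lv}"
    snap_name = f"{lv}_snap"
    snap = f"/dev/{VOLUME_GROUP}/{snap_name}"
    return [
        ["lvcreate", "--snapshot", "--size", SNAPSHOT_SIZE, "--name", snap_name, origin],
        # Full (level 0) dump read straight from the snapshot block device
        ["dump", "-0", "-f", f"{BACKUP_DIR}/{lv}-{stamp}.dump", snap],
        ["lvremove", "-f", snap],
    ]


def run(cmd: list[str], dry_run: bool) -> None:
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


def backup_volume(lv: str, stamp: str, dry_run: bool = True) -> None:
    create, dump_cmd, remove = backup_commands(lv, stamp)
    run(create, dry_run)
    try:
        run(dump_cmd, dry_run)
    finally:
        # Always drop the snapshot, even if dump failed
        run(remove, dry_run)


if __name__ == "__main__":
    today = datetime.date.today().isoformat()
    for volume in LOGICAL_VOLUMES:
        backup_volume(volume, today, dry_run=True)
```

The snapshot is removed in a finally clause so that a failed dump does not leave a snapshot behind to fill up and block the next night's lvcreate. dump also supports incremental levels 1-9 together with -u and /etc/dumpdates, which may suit a daily schedule better than a full level-0 dump every day; this sketch keeps to level 0 for simplicity.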
