Without checking your math, I believe you may be confusing the risk of *any* 
data corruption with the risk of a total drive failure, but I do agree that the 
calculation should just be for the data on the drive, not the whole array.

My feeling, from the various analyses I've read on the web, is that you're 
reasonably likely to find some corruption on a drive during a rebuild, but 
raid-6 protects you from this nicely.  From memory, the stats were something 
like a 5% chance of an error on a 500GB drive, which would mean something 
like a 10% chance with your 1TB drives, since the probability scales roughly 
with the amount of data read.  That would tie in with your figures if you 
took out the multiplier for the whole array's data: instead of a guaranteed 
failure, you've calculated odds of around 1 in 10.
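
If it helps, here's a quick back-of-the-envelope sketch in Python of where 
numbers like that can come from.  The unrecoverable read error rate of 1 in 
10^14 bits is my assumption (a typical consumer-drive spec sheet figure), not 
something from your post, and it treats errors as independent:

    import math

    URE_RATE = 1e-14   # assumed: one unrecoverable error per 10^14 bits read

    def p_error(capacity_gb):
        """Chance of at least one read error while reading the whole drive."""
        bits = capacity_gb * 1e9 * 8
        return 1 - math.exp(-bits * URE_RATE)   # Poisson approximation

    for gb in (500, 1000):
        print("%4d GB drive: %.1f%% chance of at least one read error"
              % (gb, 100 * p_error(gb)))

That prints roughly 3.9% for 500GB and 7.7% for 1TB, which is in the same 
ballpark as the 5% and 10% figures above.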

So, during any rebuild you have around a 1 in 10 chance of encountering 
*some* corruption, but that's very likely to be just a few bits of data, 
which raid-6 can recover easily, and the rest of the rebuild can carry on as 
normal.  

Of course there's always a risk of a second drive failing, which is why we 
have backups, but I believe that risk is minuscule in comparison, and it's 
also offset by the ability to scrub your data regularly, which helps ensure 
that any problems with drives are caught early.  Early replacement of failing 
drives makes it far less likely that you'll ever have two fail together.
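
To put a rough number on "minuscule", here's a similar sketch.  All three 
inputs are my assumptions for illustration (a 3% annualised failure rate per 
drive, a 24-hour rebuild, 7 surviving drives), not figures from this thread:

    AFR = 0.03            # assumed annualised failure rate per drive
    REBUILD_HOURS = 24.0  # assumed rebuild window
    SURVIVORS = 7         # assumed drives left in the degraded array

    p_one = AFR * REBUILD_HOURS / (365 * 24)  # per-drive chance in the window
    p_any = 1 - (1 - p_one) ** SURVIVORS      # chance that any survivor dies
    print("second-drive failure during rebuild: %.3f%%" % (100 * p_any))

That comes out around 0.06%, a couple of orders of magnitude below the 
roughly 1 in 10 chance of hitting some corruption, so a second whole-drive 
failure really is the smaller worry during a rebuild.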