On Thu, 23 Dec 2010, [email protected] wrote:
On Thu, 23 Dec 2010, Andrew Hume wrote:
we (some folks at work) have built two backblaze boxes
(roughly speaking, a linux box with 45 2TB drives).
I missed what hardware you are using when I made my first reply
a big issue for you to think about with this box is how you want to slice
up the drives.
this hardware has a high fan-out ratio, so you cannot transfer data to all
the drives at anything close to the I/O capacity of the drives.
with 2TB drives, the rebuild time will be quite significant (if you want
to do it in the background, say 5-10% of your I/O bandwidth, you can
easily be talking about a week in rebuild time)
As a result, you need to worry about a second drive dying before you
finish rebuilding the first one.
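a quick back-of-envelope version of that rebuild-time claim (the 100 MB/s
sustained throughput figure is my assumption for a 2010-era 2TB SATA drive,
and real arrays rebuild slower than the raw drive speed):

```python
# rough rebuild time for a 2 TB drive when the rebuild is throttled
# to a fraction of the drive's I/O bandwidth
drive_bytes = 2e12            # 2 TB drive
full_speed_bytes_s = 100e6    # assumed ~100 MB/s sustained throughput

for frac in (0.05, 0.10):
    secs = drive_bytes / (full_speed_bytes_s * frac)
    print(f"at {frac:.0%} of bandwidth: {secs / 86400:.1f} days")
```

at 10% of bandwidth you get a bit over two days, at 5% nearly five; with
array-wide contention on a 45-drive box, a week is easy to hit.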
the big hard drive failure studies from a few years ago (google, etc.) came
up with drive failure rates of ~10% chance of failure per drive per year.
that is ~1% chance of failure per drive per month, so with 45 drives you are
talking about a very significant chance of having a failure every month. if
they are all in one raid set, the chance of a second drive failing before
you finish rebuilding after the prior failure is significant; with raid6 you
would have to lose three drives at once, and that significantly improves
your chances. I calculated the odds a couple of years ago, and I think they
were something on the order of a 2.5% chance of a second failure during a
rebuild of 1TB drives at 10% bandwidth, but only a 0.025% chance of a third
failure.
with raid 10, every block of data is on two drives, but I believe it then
distributes the stripes to balance the load rather than the drives ending
up as exact mirrors of each other. with the large number of blocks on a
2TB drive, I believe the odds of a second drive taking down _some_ blocks
that were the mirrors of the first drive approach certainty (if raid10
doesn't distribute the mirrored stripes and instead has one drive be an
exact duplicate of another, then the odds get better, as you need to lose
both drives in a pair before you can rebuild one)
I've run 45 drive software raid arrays, and I had multiple instances of
double drive failures over a couple years of operation. As a result, with
that size of an array, I won't do anything short of double-redundancy. I
don't know if you can configure raid10 to keep 3 copies of everything or
not.
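on the 3-copy question: linux md's raid10 does take a copy count in its
layout option, so "n3" keeps three near-copies of every block (at the cost
of usable capacity being total/3). a sketch with hypothetical device names,
scaled down to 6 drives for readability:

```shell
# create a raid10 array with three near-copies of each block ("n3" layout);
# /dev/md0 and /dev/sd[b-g] are example names, adjust for your hardware
mdadm --create /dev/md0 --level=10 --layout=n3 --raid-devices=6 /dev/sd[b-g]
```

with n3 the array survives any two drive failures, which puts it in the same
redundancy class as raid6 while keeping mirror-style write behaviour.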
remember that these disk arrays are not high performance systems, they are
high capacity and cheap, but not high performance.
David Lang
i think i know how to deploy such a beast, but wanted to check
my understanding, which is that mdadm is the tool of choice,
and that for performance and reliability, raid10 is the sweet spot
(specifically, not RAID5).
does anyone have anything specific to say about mdadm,
and the raid it produces, either good or bad?
mdadm has by far the longest track record, with the most raid support. there
is also the dm family of drivers and tools which have some additional
features.
which raid mode you want is highly dependent on what your requirements are
and how much space you are willing to sacrifice.
raid6 is significantly more reliable than raid5, raid1, or raid10, but
suffers the same write performance issues that raid5 has. (note that if you
are in something close to a read-only situation, raid6 can be just as fast
as raid10 while still being more reliable; it's what I use for my splunk
datastore, for example)
raid5 and raid6 really suffer in the situation where you have lots of small,
random writes. large sequential writes have much less overhead.
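the reason small random writes hurt is the read-modify-write cycle: parity
raid has to read the old data and parity before it can write the new ones.
a rough I/O-amplification comparison (idealized counts, ignoring caching
and full-stripe writes):

```python
# disk I/Os needed per small random write, per raid level
ios_per_small_write = {
    "raid10": 2,  # write both mirrored copies
    "raid5": 4,   # read old data + old parity, write new data + new parity
    "raid6": 6,   # same, but with two parity blocks to read and rewrite
}

for level, n in ios_per_small_write.items():
    print(f"{level}: {n} disk I/Os per small random write")
```

large sequential writes avoid most of this because the array can compute
parity over a full stripe without reading anything back first.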
David Lang
_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/