Re: RAID10 question

2015-12-31 Thread Duncan
Hugo Mills posted on Thu, 31 Dec 2015 11:51:53 +0000 as excerpted:

> On Thu, Dec 31, 2015 at 09:52:16AM +0000, Xavier Romero wrote:
>> Hello,
>> 
>> I have 2 completely independent set of 12 disks each, let's name them
>> A1, A2, A3... A12 for first set, and B1, B2, B3...B12 for second set.
>> For availability purposes I want disks to be paired that way:
>> A1 <--> B1: RAID1
>> A2 <--> B2: RAID1
>> ...
>> A12 <--> B12: RAID1
>> 
>> And then I want a RAID0 out of all these RAID1.
>> 
>> I know I can achieve that by doing all the RAID1 with MD and then build
>> the RAID0 with BTRFS. But my question is: can I achieve that directly
>> with BTRFS RAID10?
> 
>    No, not at the moment.

Additionally, if you're going to put btrfs on mdraid, you may wish to 
consider reversing the above and doing raid01.  While raid01 is 
ordinarily discouraged in favor of raid10, it has some things going for 
it when the top layer is btrfs that raid10 doesn't.

The btrfs feature in question here is data and metadata checksumming and 
file integrity.  Btrfs normally checksums all data and metadata and 
verifies the checksums at read time.  When there's only one copy, as is 
the case with btrfs single and raid0 modes, all it can do on a checksum 
verification failure is report it and fail the read.  If, however, 
there's a second copy, as there is with btrfs raid1, a checksum failure 
on the first copy automatically fails over to the second.  Assuming that 
copy is good, btrfs uses it instead of failing the read.  Likewise, btrfs 
scrub can systematically check the entire filesystem, detecting errors 
(in single/raid0 mode) or repairing them (in raid1/10 mode, when the 
other copy is good).
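
As a rough illustration (the mountpoint /mnt/pool here is just a 
placeholder), running a scrub on such a filesystem is a one-liner; on 
raid1/10 profiles it rewrites any bad copies from the good mirror as it 
goes:

  # start a scrub in the background, then check on its progress
  btrfs scrub start /mnt/pool
  btrfs scrub status /mnt/pool

  # or run it in the foreground and wait for the final summary
  btrfs scrub start -B /mnt/pool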

Mdraid doesn't have that sort of integrity verification.  All its raid1 
scrub does is check that the copies agree and, if they don't, pick an 
arbitrary copy to overwrite the other with.  But for all it (or you) 
knows, it may be replacing the good copy with the bad one, since it has 
no checksums to tell which copy is actually the good one.
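
For comparison, md's raid1 scrub is driven through sysfs (md0 below is 
just an example device name); "check" only counts mismatches, while 
"repair" resyncs them, but neither can tell which copy was correct:

  # compare the mirror halves and report the number of mismatches
  echo check > /sys/block/md0/md/sync_action
  cat /sys/block/md0/md/mismatch_cnt

  # rewrite mismatched blocks from an arbitrarily chosen copy
  echo repair > /sys/block/md0/md/sync_action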

If that sort of data-integrity verification and repair is of interest to 
you, you obviously want btrfs raid1, not mdraid1.  But btrfs, as the 
filesystem, must be the top layer.  So while raid10 is normally preferred 
over raid01, in this case you may want to do raid01, putting btrfs raid1 
on top of the mdraid0 stripes.

Unfortunately that won't let you do a1 <-> b1, a2 <-> b2, etc.  But it 
will let you mirror the whole A stripe against the whole B stripe, 
a[1-12] <-> b[1-12], if that's good enough for your use-case.
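
A minimal sketch of that layout, with purely hypothetical device names 
(/dev/sd[a-l] for the A set, /dev/sd[m-x] for the B set):

  # stripe each 12-disk set into a single md raid0 device
  mdadm --create /dev/md/setA --level=0 --raid-devices=12 /dev/sd[a-l]
  mdadm --create /dev/md/setB --level=0 --raid-devices=12 /dev/sd[m-x]

  # btrfs raid1, data and metadata, across the two stripes
  mkfs.btrfs -d raid1 -m raid1 /dev/md/setA /dev/md/setB
  mount /dev/md/setA /mnt/pool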

IOW, you have to choose: either btrfs raid1 on top, with data-integrity 
repair, over only two mdraid0s underneath, or btrfs raid0 on top, with 
data-integrity detection but no repair, over a bunch of mdraid1s that 
have no data-integrity checking at all.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: RAID10 question

2015-12-31 Thread Hugo Mills
On Thu, Dec 31, 2015 at 09:52:16AM +0000, Xavier Romero wrote:
> Hello,
> 
> I have 2 completely independent set of 12 disks each, let's name them A1, A2, 
> A3... A12 for first set, and B1, B2, B3...B12 for second set. For 
> availability purposes I want disks to be paired that way:
> A1 <--> B1: RAID1
> A2 <--> B2: RAID1
> ...
> A12 <--> B12: RAID1
> 
> And then I want a RAID0 out of all these RAID1.
> 
> I know I can achieve that by doing all the RAID1 with MD and then build the 
> RAID0 with BTRFS. But my question is: can I achieve that directly with BTRFS 
> RAID10?

   No, not at the moment.

   Hugo.

-- 
Hugo Mills | Comic Sans goes into a bar, and the barman says, "We
hugo@... carfax.org.uk | don't serve your type here."
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: RAID10 question

2015-12-31 Thread Chris Murphy
On Thu, Dec 31, 2015 at 6:31 AM, Duncan <1i5t5.dun...@cox.net> wrote:

> Additionally, if you're going to put btrfs on mdraid, you may wish to
> consider reversing the above and doing raid01.  While raid01 is
> ordinarily discouraged in favor of raid10, it has some things going for
> it when the top layer is btrfs that raid10 doesn't.

Yes, although it's a balancing act deciding how to lay out such a large
volume across so many drives. If you use many drives per raid0, a
failure takes a long time to rebuild. If you use few drives per raid0,
rebuilds are fast, but the exposure/risk from a 2nd failure is higher.
e.g. two extremes:

12x raid0 "bank A" and 12x raid0 "bank B"

If one drive dies, an entire bank is gone and it's a long rebuild, but
if a 2nd drive dies, there's nearly a 50/50 chance (11 of the 23
remaining drives, roughly 48%) that it dies in the same already-dead
bank, in which case nothing further is lost.

2x raid0 "bank A" and 2x raid0 "bank B", and so on through "bank L"

If one drive dies in bank A, then A is gone and the rebuild is short,
but if a 2nd drive dies, it almost certainly won't be the other bank A
drive (only 1 of the 23 remaining drives is), meaning it's in another
bank, and that means the whole array is mortally wounded. Depending on
what's missing and what needs to be accessed, it might work OK for
seconds, minutes, or hours, and then totally implode. There's no way to
predict it in advance.

Anyway, I'd sooner go with 3x raid5, or 6x raid6, and then pool them
with glusterfs. Even with a single node, using replication across the
separate raid5 bricks is more reliable than a 24x raid10, whether
md+xfs or btrfs. That makes it effectively a raid 51. And if half the
storage is put on another node, you get power supply and some network
redundancy too.
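
A very rough sketch of the single-node variant described above, with
all device names, brick paths, and the volume name made up for
illustration (three 8-disk raid5 bricks, 3-way replicated; other brick
counts and replica levels are obviously possible):

  # build each raid5 brick and mount it (repeat for r5b and r5c)
  mdadm --create /dev/md/r5a --level=5 --raid-devices=8 /dev/sd[a-h]
  mkfs.xfs /dev/md/r5a
  mkdir -p /bricks/r5a && mount /dev/md/r5a /bricks/r5a

  # pool the bricks as a replicated gluster volume
  # (force is needed since all replicas live on the same host)
  gluster volume create pool replica 3 \
      node1:/bricks/r5a node1:/bricks/r5b node1:/bricks/r5c force
  gluster volume start pool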

-- 
Chris Murphy