Re: I need to P. are we almost there yet?

2015-01-03 Thread Duncan
Bob Marley posted on Sat, 03 Jan 2015 12:34:41 +0100 as excerpted:

 On 29/12/2014 19:56, sys.syphus wrote:
 specifically (P)arity. very specifically n+2. when will raid5 & raid6
 be at least as safe to run as raid1 currently is? I don't like the idea
 of being 2 bad drives away from total catastrophe.

 (and yes i backup, it just wouldn't be fun to go down that route.)
 
 What about using btrfs on top of MD raid?

The problem with that is data integrity.  mdraid doesn't have it.  btrfs 
does.

If you present a single mdraid device to btrfs and run single mode on it,
and one copy on the mdraid is corrupt, mdraid may well simply present that
bad copy, as it does no integrity checking.  btrfs will catch the bad
checksum and reject the data, but because it sees only a single device, it
has no second copy to fall back on.

If you present multiple devices to btrfs and run btrfs raid1 mode, it'll 
have a second copy to check, but if a bad copy exists on each side and 
that's the copy mdraid hands btrfs, again, btrfs will reject it, having 
no idea there's actually a good copy on the mdraid underneath; the mdraid 
simply didn't happen to pick that copy to present.

And mdraid-5/6 doesn't make things any better, because unless a drive
actually reports a read error, mdraid will simply read and present the
data, ignoring the parity with which it could probably correct the bad
data (at least with raid6).

The only way to get truly verified data with triple-redundancy or 2X
parity or better is when btrfs handles the redundancy itself, as it keeps
checksums and actually verifies them on read.
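
A toy model may make the difference concrete. The sketch below is plain
Python with made-up names and data - it is not any real btrfs or md
interface - but it shows why a checksumming layer that sees only one
device can detect corruption without being able to repair it, while one
that sees both copies can fall back to the good one:

import hashlib
import random

def csum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

GOOD = b"important data block"
BAD = b"imp0rtant data block"   # silently corrupted copy
EXPECTED = csum(GOOD)

class MdMirror:
    """Toy md RAID-1: holds two copies, returns whichever one it picks."""
    def __init__(self, copies):
        self.copies = copies
    def read(self):
        # No checksums here, and no interface for the caller to say
        # "that copy was bad, give me the other one".
        return random.choice(self.copies)

class BtrfsRaid1:
    """Toy btrfs raid1: sees both devices and the stored checksum."""
    def __init__(self, copies, expected):
        self.copies = copies
        self.expected = expected
    def read(self):
        for copy in self.copies:
            if csum(copy) == self.expected:
                return copy   # good copy found (the bad one could be rewritten)
        raise IOError("all copies fail checksum verification")

# btrfs "single" on top of an md mirror with one corrupt copy:
# detection works, recovery is impossible from this layer.
block = MdMirror([GOOD, BAD]).read()
print("md handed up a", "good" if csum(block) == EXPECTED else "corrupt", "copy")

# btrfs raid1 across two visible devices, one of them corrupt:
# the good copy is always reachable.
print(BtrfsRaid1([BAD, GOOD], EXPECTED).read().decode())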

But btrfs raid56 mode should be complete with kernel 3.19 and presumably 
btrfs-progs 3.19 tho I'd give it a kernel or two to mature to be sure.
N-way-mirroring (my particular hotly awaited feature) is next up, but 
given the time raid56 took, I don't think anybody's predicting when it'll 
be actually in-tree and ready for use.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: I need to P. are we almost there yet?

2015-01-03 Thread Bob Marley

On 03/01/2015 14:11, Duncan wrote:

Bob Marley posted on Sat, 03 Jan 2015 12:34:41 +0100 as excerpted:


On 29/12/2014 19:56, sys.syphus wrote:

specifically (P)arity. very specifically n+2. when will raid5 & raid6
be at least as safe to run as raid1 currently is? I don't like the idea
of being 2 bad drives away from total catastrophe.

(and yes i backup, it just wouldn't be fun to go down that route.)

What about using btrfs on top of MD raid?

The problem with that is data integrity.  mdraid doesn't have it.  btrfs
does.

If you present a single mdraid device to btrfs and run single mode on it,
and one copy on the mdraid is corrupt, mdraid may well simply present that
bad copy, as it does no integrity checking.  btrfs will catch the bad
checksum and reject the data, but because it sees only a single device, it
has no second copy to fall back on.


Which is really not bad, considering how rarely anything actually gets
corrupted. It is already an exceedingly rare event, and detection without
correction can be more than enough. Computing has always managed without
even the detection feature.
Most likely even your bank account and mine are held in databases sitting
on filesystems or block devices that do not have corruption detection at
all.
And, last but not least, as of now a btrfs bug is more likely than a hard
disk's silent data corruption.




Re: I need to P. are we almost there yet?

2015-01-03 Thread sys.syphus

 But btrfs raid56 mode should be complete with kernel 3.19 and presumably
 btrfs-progs 3.19 tho I'd give it a kernel or two to mature to be sure.
 N-way-mirroring (my particular hotly awaited feature) is next up, but
 given the time raid56 took, I don't think anybody's predicting when it'll
 be actually in-tree and ready for use.


is that the feature where you say i want x copies of this file and y
copies of this other file? e.g. raid at the file level, with the
ability to adjust redundancy by file?

I wonder if there is any sort of band-aid you can put on top of btrfs
to get some of this redundancy. Things like git-annex exist, but I
don't love its bugs and oddball choice of programming language.

Do you guys use any other open source tools on top of btrfs to help
manage your data? (e.g. git-annex, Camlistore)


Re: I need to P. are we almost there yet?

2015-01-03 Thread Bob Marley

On 29/12/2014 19:56, sys.syphus wrote:

specifically (P)arity. very specifically n+2. when will raid5 & raid6
be at least as safe to run as raid1 currently is? I don't like the
idea of being 2 bad drives away from total catastrophe.

(and yes i backup, it just wouldn't be fun to go down that route.)


What about using btrfs on top of MD raid?



Re: I need to P. are we almost there yet?

2015-01-03 Thread sys.syphus

 Which is really not bad, considering how rarely anything actually gets
 corrupted. It is already an exceedingly rare event, and detection without
 correction can be more than enough. Computing has always managed without
 even the detection feature.
 Most likely even your bank account and mine are held in databases sitting
 on filesystems or block devices that do not have corruption detection at
 all.
 And, last but not least, as of now a btrfs bug is more likely than a hard
 disk's silent data corruption.



I think that's dangerous thinking, and it's what has gotten us here.

The whole point of zfs / btrfs is that at the current size of
storage, what was previously unlikely is now a statistical certainty.
In short, Murphy's law.

We are now using green drives and S3 FUSE and shitty flash media; the
era of trusting the block device is over.


Re: I need to P. are we almost there yet?

2015-01-03 Thread Roman Mamedov
On Sat, 3 Jan 2015 13:11:57 + (UTC)
Duncan 1i5t5.dun...@cox.net wrote:

  What about using btrfs on top of MD raid?
 
 The problem with that is data integrity.  mdraid doesn't have it.  btrfs 
 does.

Most importantly, however, you aren't any worse off with Btrfs on top of
MD than with Btrfs on a single device, or with Ext4/XFS/JFS/etc on top of MD.

Sure, you don't get checksum-based recovery from partial corruption of a
RAID, but you do get the other features of Btrfs, such as robust snapshot
support, the ability to online-resize up and down, compression, and,
actually, checksum verification: even if it won't be able to recover from
a corruption, it will at least warn you of it (and you can recover from
backups), while other FSes will silently pass through the corrupted data.

So until Btrfs multi-device support is feature-complete (and yes, that
includes performance), running Btrfs in single-device mode on top of MD
RAID is arguably the best way to use Btrfs in a RAID setup.

(Personally I am running Btrfs on top of 7x2TB MD RAID6, 3x2TB MD RAID5 and
2x2TB MD RAID1).

-- 
With respect,
Roman


Re: I need to P. are we almost there yet?

2015-01-03 Thread Duncan
sys.syphus posted on Sat, 03 Jan 2015 12:55:27 -0600 as excerpted:

 But btrfs raid56 mode should be complete with kernel 3.19 and
 presumably btrfs-progs 3.19 tho I'd give it a kernel or two to mature
 to be sure. N-way-mirroring (my particular hotly awaited feature) is
 next up, but given the time raid56 took, I don't think anybody's
 predicting when it'll be actually in-tree and ready for use.


 is that the feature where you say i want x copies of this file and y
 copies of this other file? e.g. raid at the file level, with the ability
 to adjust redundancy by file?

Per-file isn't available yet, tho at least per-subvolume is roadmapped, 
and now that we have the properties framework working via xattr for files 
as well, at least in theory, there is AFAIK no reason to limit it to per-
subvolume, as per-file should be about as easy once the code that 
currently limits it to per-filesystem is rewritten.

But actually fully working per-filesystem raid56 is enough for a lot of 
people, and actually working per-filesystem N-way-mirroring is what I'm 
after, since I already set up multiple filesystems in order to keep my
data eggs from all being in the same filesystem basket.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: I need to P. are we almost there yet?

2015-01-03 Thread Duncan
Roman Mamedov posted on Sun, 04 Jan 2015 02:58:35 +0500 as excerpted:

 On Sat, 3 Jan 2015 13:11:57 + (UTC)
 Duncan 1i5t5.dun...@cox.net wrote:
 
  What about using btrfs on top of MD raid?
 
 The problem with that is data integrity.  mdraid doesn't have it. 
 btrfs does.
 
 Most importantly however, you aren't any worse off with Btrfs on top of
 MD, than with Btrfs on a single device, or with Ext4/XFS/JFS/etc on top
 of MD.

Good point! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: I need to P. are we almost there yet?

2015-01-03 Thread Hugo Mills
On Sun, Jan 04, 2015 at 03:22:53AM +, Duncan wrote:
 sys.syphus posted on Sat, 03 Jan 2015 12:55:27 -0600 as excerpted:
 
  But btrfs raid56 mode should be complete with kernel 3.19 and
  presumably btrfs-progs 3.19 tho I'd give it a kernel or two to mature
  to be sure. N-way-mirroring (my particular hotly awaited feature) is
  next up, but given the time raid56 took, I don't think anybody's
  predicting when it'll be actually in-tree and ready for use.
 
 
  is that the feature where you say i want x copies of this file and y
  copies of this other file? e.g. raid at the file level, with the ability
  to adjust redundancy by file?
 
 Per-file isn't available yet, tho at least per-subvolume is roadmapped, 
 and now that we have the properties framework working via xattr for files 
 as well, at least in theory, there is AFAIK no reason to limit it to per-
 subvolume, as per-file should be about as easy once the code that 
 currently limits it to per-filesystem is rewritten.

   "roadmapped" -- fond wish.

   Also, per-file is a bit bloody awkward to get working. Having sat
and thought about it hard for a while, I'm not convinced that it would
actually be worth the implementation effort.

   Certainly, nobody should be thinking about having (say) a different
RAID config for every file -- that way lies madness. I would expect,
at most, small integers (<=3) of different profiles for data in any
given filesystem, with the majority of data being of one particular
profile. Anything trying to get more sophisticated than that is likely
asking for intractable space-allocation problems. Think: requiring
regular full-balance operations.

   The behaviour of the chunk allocator in the presence of merely two
allocation profiles (data/metadata) is awkward enough. Introducing
more of them is something that will require a separate research
programme to understand fully.

   I will probably have an opportunity to discuss the basics of
multiple allocation schemes with someone more qualified than I am on
Tuesday, but I doubt that we'll reach any firm conclusion for many
months at best (if ever). The formal maths involved gets quite nasty,
quite quickly.

   Hugo.

 But actually fully working per-filesystem raid56 is enough for a lot of 
 people, and actually working per-filesystem N-way-mirroring is what I'm 
 after, since I already set up multiple filesystems in order to keep my
 data eggs from all being in the same filesystem basket.
 

-- 
Hugo Mills | If it's December 1941 in Casablanca, what time is it
hugo@... carfax.org.uk | in New York?
http://carfax.org.uk/  |
PGP: 65E74AC0  |   Rick Blaine, Casablanca




Re: I need to P. are we almost there yet?

2015-01-02 Thread Austin S Hemmelgarn

On 2014-12-31 12:27, ashf...@whisperpc.com wrote:

Phillip


I had a similar question a year or two ago (
specifically about raid10  ) so I both experimented and read the code
myself to find out.  I was disappointed to find that it won't do
raid10 on 3 disks since the chunk metadata describes raid10 as a
stripe layered on top of a mirror.

Jose's point was also a good one though; one chunk may decide to
mirror disks A and B, so a failure of A and C it could recover from,
but a different chunk could choose to mirror on disks A and C, so that
chunk would be lost if A and C fail.  It would probably be nice if the
chunk allocator tried to be more deterministic about that.


I see this as a CRITICAL design flaw.  The reason for calling it CRITICAL
is that System Administrators have been trained for 20 years that RAID-10
can usually handle a dual-disk failure, but the BTRFS implementation has
effectively ZERO chance of doing so.
No, some rather simple math will tell you that a 4 disk BTRFS filesystem 
in raid10 mode has exactly a 50% chance of surviving a dual disk 
failure, and that as the number of disks goes up, the chance of survival 
will asymptotically approach 100% (but never reach it).
This is the case for _every_ RAID-10 implementation that I have ever 
seen, including hardware raid controllers; the only real difference is 
in the stripe length (usually 512 bytes * half the number of disks for 
hardware raid, 4k * half the number of disks for software raid, and the 
filesystem block size (default is 16k in current versions) * half the 
number of disks for BTRFS).







Re: I need to P. are we almost there yet?

2015-01-02 Thread Austin S Hemmelgarn

On 2015-01-02 12:45, Brendan Hide wrote:

On 2015/01/02 15:42, Austin S Hemmelgarn wrote:

On 2014-12-31 12:27, ashf...@whisperpc.com wrote:

I see this as a CRITICAL design flaw.  The reason for calling it CRITICAL
is that System Administrators have been trained for 20 years that RAID-10
can usually handle a dual-disk failure, but the BTRFS implementation has
effectively ZERO chance of doing so.

No, some rather simple math

That's the problem. The math isn't as simple as you'd expect:

The example below is probably a pathological case - but here goes. Let's
say in this 4-disk example that chunks are striped as d1,d2,d1,d2 where
d1 is the first bit of data and d2 is the second:
Chunk 1 might be striped across disks A,B,C,D d1,d2,d1,d2
Chunk 2 might be striped across disks B,C,A,D d3,d4,d3,d4
Chunk 3 might be striped across disks D,A,C,B d5,d6,d5,d6
Chunk 4 might be striped across disks A,C,B,D d7,d8,d7,d8
Chunk 5 might be striped across disks A,C,D,B d9,d10,d9,d10

Lose any two disks and each chunk has a 1-in-3 chance of having been hit
(the failed pair has to match one of that chunk's two mirror pairs). With
traditional RAID10 you have a 1-in-3 chance of losing the array entirely,
and a 2-in-3 chance of losing nothing. With btrfs, the more data you have
stored, the closer the chance of losing *some* data in a 2-disk failure
gets to 100%.

In the above example, losing A and B means you lose d3, d6, and d7
(which ends up being 60% of all chunks).
Losing A and C means you lose d1 (20% of all chunks).
Losing A and D means you lose d9 (20% of all chunks).
Losing B and C means you lose d10 (20% of all chunks).
Losing B and D means you lose d2 (20% of all chunks).
Losing C and D means you lose d4, d5, AND d8 (60% of all chunks).

The above skewed example has an average of about a third of all chunks
failed. As you add more data and randomise the allocation, that average
settles around a third - BUT, the chance of losing *some* data is already
clearly shown to be very close to 100%.

OK, I forgot about the randomization effect that the chunk allocation 
and freeing has.  We really should slap a *BIG* warning label on that 
(and ideally find some better way to do it so it's more reliable).


As an aside, I've found that a BTRFS raid1 set on top of 2 LVM/MD RAID0 
sets is actually faster than using a BTRFS raid10 set with the same 
number of disks (how much faster is workload dependent), and provides 
better guarantees than a BTRFS raid10 set.






Re: I need to P. are we almost there yet?

2015-01-02 Thread Brendan Hide

On 2015/01/02 15:42, Austin S Hemmelgarn wrote:

On 2014-12-31 12:27, ashf...@whisperpc.com wrote:
I see this as a CRITICAL design flaw.  The reason for calling it CRITICAL
is that System Administrators have been trained for 20 years that RAID-10
can usually handle a dual-disk failure, but the BTRFS implementation has
effectively ZERO chance of doing so.

No, some rather simple math

That's the problem. The math isn't as simple as you'd expect:

The example below is probably a pathological case - but here goes. Let's 
say in this 4-disk example that chunks are striped as d1,d2,d1,d2 where 
d1 is the first bit of data and d2 is the second:

Chunk 1 might be striped across disks A,B,C,D d1,d2,d1,d2
Chunk 2 might be striped across disks B,C,A,D d3,d4,d3,d4
Chunk 3 might be striped across disks D,A,C,B d5,d6,d5,d6
Chunk 4 might be striped across disks A,C,B,D d7,d8,d7,d8
Chunk 5 might be striped across disks A,C,D,B d9,d10,d9,d10

Lose any two disks and each chunk has a 1-in-3 chance of having been hit
(the failed pair has to match one of that chunk's two mirror pairs). With
traditional RAID10 you have a 1-in-3 chance of losing the array entirely,
and a 2-in-3 chance of losing nothing. With btrfs, the more data you have
stored, the closer the chance of losing *some* data in a 2-disk failure
gets to 100%.

In the above example, losing A and B means you lose d3, d6, and d7
(which ends up being 60% of all chunks).
Losing A and C means you lose d1 (20% of all chunks).
Losing A and D means you lose d9 (20% of all chunks).
Losing B and C means you lose d10 (20% of all chunks).
Losing B and D means you lose d2 (20% of all chunks).
Losing C and D means you lose d4, d5, AND d8 (60% of all chunks).

The above skewed example has an average of about a third of all chunks
failed. As you add more data and randomise the allocation, that average
settles around a third - BUT, the chance of losing *some* data is already
clearly shown to be very close to 100%.
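
For anyone who wants to play with the numbers, below is a rough
simulation (Python) of the same effect. It assumes, purely for
illustration, that every chunk picks its two mirror pairs uniformly at
random from the four disks - the real allocator is not random like that,
but the qualitative result is the same. It enumerates every possible
2-disk failure and reports the average fraction of chunks damaged and how
often at least one chunk is lost.

import itertools
import random

DISKS = ["A", "B", "C", "D"]

def random_chunk():
    """One raid10 chunk on 4 disks: two stripe halves, each mirrored on a
    disjoint pair of disks."""
    order = random.sample(DISKS, 4)
    return [frozenset(order[:2]), frozenset(order[2:])]   # the two mirror pairs

def simulate(n_chunks, trials=500):
    some_loss = 0
    lost_fraction = 0.0
    for _ in range(trials):
        chunks = [random_chunk() for _ in range(n_chunks)]
        for failed in itertools.combinations(DISKS, 2):
            failed = frozenset(failed)
            lost = sum(1 for pairs in chunks if failed in pairs)
            lost_fraction += lost / n_chunks
            some_loss += lost > 0
    n_cases = trials * 6          # 6 possible 2-disk failures per trial
    return lost_fraction / n_cases, some_loss / n_cases

for n in (1, 5, 50, 500):
    avg, p_some = simulate(n)
    print(f"{n:4d} chunks: avg {avg:.0%} of chunks damaged, "
          f"some data lost in {p_some:.0%} of 2-disk failures")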


--
__
Brendan Hide
http://swiftspirit.co.za/
http://www.webafrica.co.za/?AFF1E97



Re: I need to P. are we almost there yet?

2015-01-01 Thread Duncan
Roger Binns posted on Thu, 01 Jan 2015 12:12:31 -0800 as excerpted:

 
 On 12/31/2014 05:26 PM, Chris Samuel wrote:
 I suspect this is a knock-on effect of the fact that (unless this has
 changed recently & IIRC) RAID-1 with btrfs will only mirror data over
 two drives, no matter how many you add to an array.

It hasn't changed yet, but now that raid56 support is basically complete 
(with 3.19, other than bugs of course, it'll be another kernel cycle or 
two before I'd rely on it), that's next up on the raid-features roadmap. 
=:^)

I know, as that's my most hotly anticipated roadmapped btrfs feature yet
to hit, and I've been waiting for it (patiently, only because I didn't
have much choice) for a couple of years now.

 I wish btrfs wouldn't use the old school micro-managing storage
 terminology (or only as aliases) and instead let you set the goals. What
 people really mean is that they want their data to survive the failure
 of N drives - exactly how that is done doesn't matter.  It would also be
 nice to be settable as an xattr on files and directories.

Actually, a more flexible terminology has been discussed, and /might/ 
actually be introduced either along with or prior to the multi-way-
mirroring feature (depending on how long the latter takes to develop, I'd 
guess).  The suggested terminology would basically treat number of data 
strips, mirrors, parity, hot-spares, etc, each on its own separate axis, 
with parity levels ultimately extended well beyond 2 (aka raid6) as well 
-- I think to something like a dozen or 16.

Obviously if it's introduced before N-way-mirroring, N-way-parity, etc, 
it would only support the current feature set for now, and would just be 
a different way of configuring mkfs as well as displaying the current 
layouts in btrfs filesystem df and usage.

Hugo's the guy who has proposed that, and has been doing the preliminary 
patch development.
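
Purely as an illustration of the idea - the field names and the mapping
below are hypothetical, not the actual proposed syntax - the separate
axes might be modelled something like this, with today's profiles falling
out as special cases:

from dataclasses import dataclass

@dataclass
class ReplicationProfile:
    copies: int = 1     # how many complete copies of the data (mirroring axis)
    stripes: int = 1    # how many devices each copy is striped across
    parity: int = 0     # how many parity devices per stripe
    spares: int = 0     # hot spares reserved for rebuilds

# Today's profiles expressed on those axes (illustrative only):
PROFILES = {
    "single": ReplicationProfile(),
    "raid0":  ReplicationProfile(stripes=2),            # 2+ devices, no redundancy
    "raid1":  ReplicationProfile(copies=2),
    "raid10": ReplicationProfile(copies=2, stripes=2),
    "raid5":  ReplicationProfile(stripes=2, parity=1),
    "raid6":  ReplicationProfile(stripes=2, parity=2),
    "3-way":  ReplicationProfile(copies=3),              # what N-way mirroring would add
}

for name, p in PROFILES.items():
    print(f"{name:>7}: {p.copies} copies x {p.stripes} stripes, "
          f"{p.parity} parity, {p.spares} spares")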

Meanwhile, ultimately the ability to configure all this at least by 
subvolume is planned, and once it's actually possible to set it on less 
than a full filesystem basis, setting it by individual xattr has been 
discussed as well.  I think the latter depends on the sorts of issues 
they run into in actual implementation.

Finally, btrfs is already taking the xattr/property route with this sort 
of attribute.  The basic infrastructure for that went in a couple kernel 
cycles ago, and can be seen and worked with using the btrfs property 
command.  So the basic property/xattr infrastructure is already there, 
and the ability to configure redundancy per subvolume is already built
into the original btrfs design and roadmapped, altho it's not yet implemented,
which means it's actually quite likely to eventually be configurable by 
file via xattr/properties as well -- emphasis on /eventually/, as these 
features /do/ tend to take rather longer to actually develop and 
stabilize than originally predicted.  The raid56 code is a good example, 
as it was originally slated for kernel cycle 3.6 or so, IIRC, but it took 
it over two years to cook and we're finally getting it in 3.19!

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: I need to P. are we almost there yet?

2014-12-31 Thread ashford
Phillip

 I had a similar question a year or two ago (
 specifically about raid10  ) so I both experimented and read the code
 myself to find out.  I was disappointed to find that it won't do
 raid10 on 3 disks since the chunk metadata describes raid10 as a
 stripe layered on top of a mirror.

 Jose's point was also a good one though; one chunk may decide to
 mirror disks A and B, so a failure of A and C it could recover from,
 but a different chunk could choose to mirror on disks A and C, so that
 chunk would be lost if A and C fail.  It would probably be nice if the
 chunk allocator tried to be more deterministic about that.

I see this as a CRITICAL design flaw.  The reason for calling it CRITICAL
is that System Administrators have been trained for 20 years that RAID-10
can usually handle a dual-disk failure, but the BTRFS implementation has
effectively ZERO chance of doing so.

According to every description of RAID-10 I've ever seen (including
documentation from MaxStrat), RAID-10 stripes mirrored pairs/sets of
disks.  The device-level description is a critical component of what makes
an array RAID-10, and is the reason for many of the attributes of
RAID-10.  This is NOT what BTRFS has implemented.

While BTRFS may be distributing the chunks according to a RAID-10
methodology, that is NOT what the industry considers to be RAID-10.  While
the current methodology has the data replication of RAID-10, and it may
have the performance of RAID-10, it absolutely DOES NOT have the
robustness or uptime benefits that are expected of RAID-10.

In order to remove this potentially catastrophic confusion, BTRFS should
either call their RAID-10 implementation something else, or they should
adhere to the long-established definition of RAID-10.

Peter Ashford



Re: I need to P. are we almost there yet?

2014-12-31 Thread Chris Samuel
On Wed, 31 Dec 2014 09:27:14 AM ashf...@whisperpc.com wrote:

 I see this as a CRITICAL design flaw.  The reason for calling it CRITICAL
 is that System Administrators have been trained for 20 years that RAID-10
 can usually handle a dual-disk failure, but the BTRFS implementation has
 effectively ZERO chance of doing so.

I suspect this is a knock-on effect of the fact that (unless this has changed
recently & IIRC) RAID-1 with btrfs will only mirror data over two drives, no
matter how many you add to an array.

-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC




Re: I need to P. are we almost there yet?

2014-12-30 Thread Phillip Susi

On 12/29/2014 7:20 PM, ashf...@whisperpc.com wrote:
 Just some background data on traditional RAID, and the chances of
 survival with a 2-drive failure.
 
 In traditional RAID-10, the chances of surviving a 2-drive failure
 is 66% on a 4-drive array, and approaches 100% as the number of
 drives in the array increase.
 
 In traditional RAID-0+1 (used to be common in low-end fake-RAID
 cards), the chances of surviving a 2-drive failure is 33% on a
 4-drive array, and approaches 50% as the number of drives in the
 array increase.

In terms of data layout, there is really no difference between raid-10
( or raid1+0 ) and raid0+1, aside from the designation you assign to
each drive.  With a dumb implementation of 0+1, any single drive
failure offlines the entire stripe, discarding the remaining good
disks in it; that gives the probability you describe, since the only
further failure that does not also take down the mirror is a drive in
the same (already offline) stripe as the original.  This, however,
is only a deficiency of the implementation, not of the data layout, as
all of the data on the first failed drive could be recovered from a
drive in the second stripe, so long as the second drive that failed
was any drive other than the one holding the duplicate data of the first.

This is partly why I agree with linux mdadm that raid10 is *not*
simply raid1+0; the latter is just a naive, degenerate implementation
of the former.

 In traditional RAID-1E, the chances of surviving a 2-drive failure
 is 66% on a 4-drive array, and approaches 100% as the number of
 drives in the array increase.  This is the same as for RAID-10.
 RAID-1E allows an odd number of disks to be actively used in the
 array.

What some vendors have called 1E is simply raid10 in mdadm's default
"near" layout.  I prefer the higher-performance "offset" layout myself.
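
To make the difference visible, here's a small sketch (Python, greatly
simplified: two copies, block numbers only, ignoring real chunk sizes)
that prints where each data block lands in the two layouts. In the near
layout with an even device count the copies sit on fixed adjacent pairs;
offset repeats each stripe on the next row, rotated by one device:

def near2(devices, rows):
    """md raid10 'near' layout, 2 copies: consecutive slots hold the same block."""
    layout, block, slot = [], 0, 0
    for _ in range(rows):
        row = []
        for _ in range(devices):
            row.append(block)
            slot += 1
            if slot % 2 == 0:
                block += 1
        layout.append(row)
    return layout

def offset2(devices, rows):
    """md raid10 'offset' layout, 2 copies: each data stripe is repeated on
    the next row, rotated by one device."""
    layout = []
    for r in range(rows):
        base = (r // 2) * devices
        stripe = [base + i for i in range(devices)]
        if r % 2 == 1:
            stripe = stripe[-1:] + stripe[:-1]   # rotate right by one device
        layout.append(stripe)
    return layout

for name, fn in (("near", near2), ("offset", offset2)):
    print(name)
    for row in fn(4, 4):
        print("  " + "  ".join(f"D{b}" for b in row))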

 I'm wondering which of the above the BTRFS implementation most
 closely resembles.

Unfortunately, btrfs just uses the naive raid1+0, so no 2 or 3 disk
raid10 arrays, and no higher performing offset layout.




Re: I need to P. are we almost there yet?

2014-12-30 Thread ashford
 Phillip Susi wrote:

 I'm wondering which of the above the BTRFS implementation most
 closely resembles.

 Unfortunately, btrfs just uses the naive raid1+0, so no 2 or 3 disk
 raid10 arrays, and no higher performing offset layout.

 Jose Manuel Perez Bethencourt wrote:

 I think you are missing crucial info on the layout on disk that BTRFS
 implements. While a traditional RAID1 has a rigid layout that has
 fixed and easily predictable locations for all data (exactly on two
 specific disks), BTRFS allocs chunks as needed on ANY two disks.
 Please research into this to understand the problem fully, this is the
 key to your question.

There is a HUGE difference here.  In the first case, the data will have a
50% chance of surviving a 2-drive failure.  In the second case, the data
will have an effectively 0% chance of surviving a 2-drive failure.  I
don't believe I need to mention which of the above is more reliable, or
which I would prefer.

I believe that someone who understands the code in depth (and that may
also be one of the people above) should determine exactly how BTRFS
implements RAID-10.

Thank you.

Peter Ashford



Re: I need to P. are we almost there yet?

2014-12-30 Thread Phillip Susi

On 12/30/2014 06:17 PM, ashf...@whisperpc.com wrote:
 I believe that someone who understands the code in depth (and that
 may also be one of the people above) should determine exactly how BTRFS
 implements RAID-10.

I am such a person.  I had a similar question a year or two ago (
specifically about raid10  ) so I both experimented and read the code
myself to find out.  I was disappointed to find that it won't do
raid10 on 3 disks since the chunk metadata describes raid10 as a
stripe layered on top of a mirror.

Jose's point was also a good one though; one chunk may decide to
mirror disks A and B, so a failure of A and C it could recover from,
but a different chunk could choose to mirror on disks A and C, so that
chunk would be lost if A and C fail.  It would probably be nice if the
chunk allocator tried to be more deterministic about that.




Re: I need to P. are we almost there yet?

2014-12-29 Thread sys.syphus
oh, and sorry to bump myself. but is raid10 *ever* more redundant in
btrfs-speak than raid1? I currently use raid1 but i know in mdadm
speak raid10 means you can lose 2 drives assuming they aren't the
wrong ones, is it safe to say with btrfs / raid 10 you can only lose
one no matter what?


Re: I need to P. are we almost there yet?

2014-12-29 Thread Hugo Mills
On Mon, Dec 29, 2014 at 01:00:05PM -0600, sys.syphus wrote:
 oh, and sorry to bump myself. but is raid10 *ever* more redundant in
 btrfs-speak than raid1? I currently use raid1 but i know in mdadm
 speak raid10 means you can lose 2 drives assuming they aren't the
 wrong ones, is it safe to say with btrfs / raid 10 you can only lose
 one no matter what?

   I think that with an even number of identical-sized devices, you
get the same guarantees (well, behaviour) as you would with
traditional RAID-10.

   I may be wrong about that -- do test before relying on it. The FS
probably won't like losing two devices, though, even if the remaining
data is actually enough to reconstruct the FS.

   Hugo.

-- 
Hugo Mills | I can resist everything except temptation
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0  |




Re: I need to P. are we almost there yet?

2014-12-29 Thread sys.syphus
so am I to read that as if btrfs redundancy isn't really functional?
if i yank a member of my raid 1 out in live prod is it going to take
a dump on my data?

On Mon, Dec 29, 2014 at 1:04 PM, Hugo Mills h...@carfax.org.uk wrote:
 On Mon, Dec 29, 2014 at 01:00:05PM -0600, sys.syphus wrote:
 oh, and sorry to bump myself. but is raid10 *ever* more redundant in
 btrfs-speak than raid1? I currently use raid1 but i know in mdadm
 speak raid10 means you can lose 2 drives assuming they aren't the
 wrong ones, is it safe to say with btrfs / raid 10 you can only lose
 one no matter what?

I think that with an even number of identical-sized devices, you
 get the same guarantees (well, behaviour) as you would with
 traditional RAID-10.

I may be wrong about that -- do test before relying on it. The FS
 probably won't like losing two devices, though, even if the remaining
 data is actually enough to reconstruct the FS.

Hugo.

 --
 Hugo Mills | I can resist everything except temptation
 hugo@... carfax.org.uk |
 http://carfax.org.uk/  |
 PGP: 65E74AC0  |


Re: I need to P. are we almost there yet?

2014-12-29 Thread Chris Murphy
By asking the question this way, I don't think you understand how
Btrfs development works. But if you check out the git pull for 3.19
you'll see a bunch of patches that pretty much close the feature
parity (no pun intended) gap for raid56 and raid0,1,10. But it is an
rc, and still needs testing, and even once 3.19 becomes a stable
kernel it's new enough code that there can always be edge cases. And raid1
has been tested in Btrfs for how many years now? So if you want the same
amount of testing time for raid6, it would be however many years that has
been, counted from the time 3.19 is released.

Chris Murphy


Re: I need to P. are we almost there yet?

2014-12-29 Thread Chris Murphy
On Mon, Dec 29, 2014 at 12:00 PM, sys.syphus syssyp...@gmail.com wrote:
 oh, and sorry to bump myself. but is raid10 *ever* more redundant in
 btrfs-speak than raid1? I currently use raid1 but i know in mdadm
 speak raid10 means you can lose 2 drives assuming they aren't the
 wrong ones, is it safe to say with btrfs / raid 10 you can only lose
 one no matter what?

It's only one for sure in any case, even with conventional raid10.
Whether your data has dodged a bullet depends on which 2 you lose.
Obviously you can't ever lose a drive and its mirror, or the array
collapses.

-- 
Chris Murphy


Re: I need to P. are we almost there yet?

2014-12-29 Thread Hugo Mills
On Mon, Dec 29, 2014 at 02:25:14PM -0600, sys.syphus wrote:
 so am I to read that as if btrfs redundancy isn't really functional?
 if i yank a member of my raid 1 out in live prod is it going to take
 a dump on my data?

   Eh? Where did that conclusion come from? I said nothing at all
about RAID-1, only RAID-10.

   So, to clarify:

   In the general case, you can safely lose one device from a btrfs
RAID-10. Also in the general case, losing a second device will break
the filesystem (with very high probability).

   In the case I gave below, with an even number of equal-sized
devices, losing a second device *may* still leave the data recoverable
with sufficient effort, but the FS in general will probably not be
mountable with two missing devices.

   So, btrfs RAID-10 offers the same *guarantees* as traditional
RAID-10. It just fares worse on the probabilities of the failure modes
beyond that guarantee.

   Hugo.

 On Mon, Dec 29, 2014 at 1:04 PM, Hugo Mills h...@carfax.org.uk wrote:
  On Mon, Dec 29, 2014 at 01:00:05PM -0600, sys.syphus wrote:
  oh, and sorry to bump myself. but is raid10 *ever* more redundant in
  btrfs-speak than raid1? I currently use raid1 but i know in mdadm
  speak raid10 means you can lose 2 drives assuming they aren't the
  wrong ones, is it safe to say with btrfs / raid 10 you can only lose
  one no matter what?
 
 I think that with an even number of identical-sized devices, you
  get the same guarantees (well, behaviour) as you would with
  traditional RAID-10.
 
 I may be wrong about that -- do test before relying on it. The FS
  probably won't like losing two devices, though, even if the remaining
  data is actually enough to reconstruct the FS.
 
 Hugo.
 

-- 
Hugo Mills | emacs: Eighty Megabytes And Constantly Swapping.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: 65E74AC0  |




Re: I need to P. are we almost there yet?

2014-12-29 Thread ashford
 On Mon, Dec 29, 2014 at 12:00 PM, sys.syphus syssyp...@gmail.com wrote:
 oh, and sorry to bump myself. but is raid10 *ever* more redundant in
 btrfs-speak than raid1? I currently use raid1 but i know in mdadm
 speak raid10 means you can lose 2 drives assuming they aren't the
 wrong ones, is it safe to say with btrfs / raid 10 you can only lose
 one no matter what?

 It's only one for sure in any case, even with conventional raid10.
 Whether your data has dodged a bullet depends on which 2 you lose.
 Obviously you can't ever lose a drive and its mirror, or the array
 collapses.

Just some background data on traditional RAID, and the chances of survival
with a 2-drive failure.

In traditional RAID-10, the chances of surviving a 2-drive failure is 66%
on a 4-drive array, and approaches 100% as the number of drives in the
array increase.

In traditional RAID-0+1 (used to be common in low-end fake-RAID cards),
the chances of surviving a 2-drive failure is 33% on a 4-drive array, and
approaches 50% as the number of drives in the array increase.

In traditional RAID-1E, the chances of surviving a 2-drive failure is 66%
on a 4-drive array, and approaches 100% as the number of drives in the
array increase.  This is the same as for RAID-10.  RAID-1E allows an odd
number of disks to be actively used in the array. 
https://en.wikipedia.org/wiki/File:RAID_1E.png
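
The RAID-10 and RAID-0+1 figures above follow from simple counting, as in
this quick check (Python; n is the total number of drives, two of which
fail at random, and the RAID-0+1 model is the naive one where the first
failure offlines its whole stripe):

from math import comb

def raid10_survival(n):
    """Survive iff the two failed drives are not the same mirror pair."""
    mirror_pairs = n // 2
    return 1 - mirror_pairs / comb(n, 2)

def raid01_survival(n):
    """Naive RAID-0+1: the first failure kills its whole stripe of n/2 drives;
    survive iff the second failure lands in that same (already dead) stripe."""
    return (n // 2 - 1) / (n - 1)

for n in (4, 8, 16, 100):
    print(f"{n:3d} drives: RAID-10 {raid10_survival(n):.0%}, "
          f"RAID-0+1 {raid01_survival(n):.0%}")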

I'm wondering which of the above the BTRFS implementation most closely
resembles.

 So if you want the same amount of testing time for raid6, it would be
 however many years that has been, counted from the time 3.19 is released.

I don't believe that's correct.  Over those several years, quite a few
tests for corner cases have been developed.  I expect that those tests are
used for regression testing of each release to ensure that old bugs aren't
inadvertently reintroduced.  Furthermore, I expect that a large number of
those corner case tests can be easily modified to test RAID-5 and RAID-6. 
In reality, I expect the stability (i.e. similar to RAID-10 currently) of
RAID-5/6 code in BTRFS will be achieved rather quickly (only a year or
two).

I expect that the difficult part will be to optimize the performance of
BTRFS.  Hopefully those tests (and others, yet to be developed) will be
able to keep it stable while the code is optimized for performance.

Peter Ashford
