Re: [RFC v2 0/2] New RAID library supporting up to six parities

2014-01-06 Thread joystick

On 06/01/2014 10:31, Andrea Mazzoleni wrote:

Hi,

This is a port to the Linux kernel of a RAID engine that I'm currently using
in a hobby project called SnapRAID. This engine supports up to six parity
levels and at the same time maintains compatibility with the existing Linux
RAID6 one.



This is just great, Andrea,
thank you for such an Epiphany present.

Just by looking at the Subjects, it seems patch number 0/1 is missing. 
It might not have gotten through to the lists, or it could be a numbering mistake.


Does your code also support (shortcut) RMW as opposed to RCW, for all 
parities?
RMW is: for a 4k write, read just nparities parity disks + 1 data disk,
recompute the parities, and write the same nparities + 1 disks back.

RCW is: read all the disks before recomputing the parities...
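
To make the I/O difference concrete, here is a small standalone sketch of
my own (not code from your patches; the chunk counts are the usual
textbook ones) comparing the disk I/Os of the two strategies:

    #include <stdio.h>

    /*
     * Rough illustration only, not code from the patches: disk I/Os
     * needed to update ONE data chunk in a stripe with n_data data
     * disks and n_par parity disks.
     */

    /* RMW: read the old data chunk and the old parities, write them back. */
    static unsigned rmw_ios(unsigned n_data, unsigned n_par)
    {
        (void)n_data;                      /* stripe width does not matter here */
        return (1 + n_par) + (1 + n_par);  /* reads + writes */
    }

    /* RCW: read all the other data chunks, then write the new data chunk
     * and the recomputed parities. */
    static unsigned rcw_ios(unsigned n_data, unsigned n_par)
    {
        return (n_data - 1) + (1 + n_par); /* reads + writes */
    }

    int main(void)
    {
        /* Example: 10 data disks, 3 parities. */
        printf("RMW: %u I/Os, RCW: %u I/Os\n", rmw_ios(10, 3), rcw_ios(10, 3));
        return 0;
    }

With those example numbers RMW needs 8 I/Os against 13 for RCW, and the
gap grows with the number of data disks, which is why the shortcut
matters for small writes.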

Part of such RMW code should be in handle_stripe_dirtying, which is not
in your patch 0/2, but it might have been in patch 0/1, which apparently
didn't get through.


See this patch by Kumar adding RMW support to raid6 (raid5 already has it),
which unfortunately has apparently not been merged so far:

http://marc.info/?l=linux-raid&m=136624783417452&w=2

Thank you
J.



Re: [RFC v2 0/2] New RAID library supporting up to six parities

2014-01-06 Thread joystick

On 06/01/2014 14:11, Alex Elsayed wrote:

joystick wrote:


Just by looking at the Subjects, it seems patch number 0/1 is missing.
It might not have gotten through to the lists, or it could be a numbering mistake.

No, the numbering style is ${index}/${total}, where index = 0 is a cover
letter. So there are two patches total, 1/2 and 2/2, with the cover letter
0/2.


Ok sorry, I meant

Just by looking at the Subjects, it seems patch number 1/2 is missing.

and not 0/1

Can you see the 1/2 on the lists? I can't.

J.


Re: Triple parity and beyond

2013-11-21 Thread joystick

On 21/11/2013 02:28, Stan Hoeppner wrote:

On 11/20/2013 10:16 AM, James Plank wrote:

Hi all -- no real comments, except as I mentioned to Ric, my tutorial
in FAST last February presents Reed-Solomon coding with Cauchy
matrices, and then makes special note of the common pitfall of
assuming that you can append a Vandermonde matrix to an identity
matrix.  Please see
http://web.eecs.utk.edu/~plank/plank/papers/2013-02-11-FAST-Tutorial.pdf,
slides 48-52.

Andrea, does the matrix that you included in an earlier mail (the one
that has Linux RAID-6 in the first two rows) have a general form, or
did you develop it in an ad hoc manner so that it would include Linux
RAID-6 in the first two rows?

Hello Jim,

It's always perilous to follow a Ph.D., so I guess I'm feeling suicidal
today. ;)

I'm not attempting to marginalize Andrea's work here, but I can't help
but ponder what the real value of triple parity RAID is, or quad, or
beyond.  Some time ago parity RAID's primary mission ceased to be
surviving single drive failure, or a 2nd failure during rebuild, and
became mitigating UREs during a drive rebuild.  So we're now talking
about dedicating 3 drives of capacity to avoiding disaster due to
platter defects and secondary drive failure.  For small arrays this is
approaching half the array capacity.  So here parity RAID has lost the
battle with RAID10's capacity disadvantage, yet it still suffers the
vastly inferior performance in normal read/write IO, not to mention
rebuild times that are 3-10x longer.

WRT rebuild times, once drives hit 20TB we're looking at 18 hours just
to mirror a drive at full streaming bandwidth, assuming 300MB/s
average--and that is probably being kind to the drive makers.  With 6 or
8 of these drives, I'd guess a typical md/RAID6 rebuild will take at
minimum 72 hours or more, probably over 100, and probably more yet for
3P.  And with larger drive count arrays the rebuild times approach a
week.  Whose users can go a week with degraded performance?  This is
simply unreasonable, at best.  I say it's completely unacceptable.

With these gargantuan drives coming soon, the probability of multiple
UREs during rebuild is pretty high.


No, because if you are correct about the very high CPU overhead during
rebuild (which I don't see as being that dramatic: Andrea claims 500MB/s
for triple parity, and it is probably parallelizable across multiple
cores), then the speed of rebuild decreases proportionally and hence the
stress and heating on the drives are proportionally reduced,
approximating those of normal operation.
And how often have you seen a drive fail within a week of normal
operation?
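
To put the first part of that into a formula (a rough model of my own,
not a measurement): the rebuild proceeds at the slower of the disk
streaming rate and the parity computation rate, and the drives are only
stressed at that effective rate:

    #include <stdio.h>

    /* Rough model of my own, not a measurement: the rebuild proceeds at
     * the slower of the disk streaming rate and the parity computation
     * rate, so a CPU-limited rebuild also stresses the drives less. */
    static double rebuild_rate(double disk_mb_s, double parity_mb_s)
    {
        return disk_mb_s < parity_mb_s ? disk_mb_s : parity_mb_s;
    }

    int main(void)
    {
        /* 300 MB/s streaming disks vs. the quoted 500 MB/s triple-parity
         * computation, and vs. a hypothetical slower multi-parity case. */
        printf("disk-limited rebuild: %.0f MB/s\n", rebuild_rate(300.0, 500.0));
        printf("CPU-limited rebuild:  %.0f MB/s\n", rebuild_rate(300.0, 150.0));
        return 0;
    }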


But in reality, consider that a non-naive implementation of
multiple parity would probably use just the single parity during
reconstruction when only one disk fails, falling back to the additional
parities only for the stripes that are unreadable with single parity
alone. So the speed and duration of reconstruction, and its performance
penalty, would be those of raid5 except in the exceptional case of
multiple failures.
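
A minimal sketch of that idea (illustrative only: read_stripe(),
recover_raid5(), recover_multi() and write_chunk() are hypothetical
stand-ins of mine, not functions from Andrea's code or from md):

    #include <stdio.h>

    /* Hypothetical stand-ins for the real I/O and recovery machinery;
     * here they only simulate an occasional unreadable chunk (URE). */
    static unsigned read_stripe(unsigned s)  { return (s % 4 == 3) ? 1 : 0; }
    static void recover_raid5(unsigned s)    { printf("stripe %u: XOR only\n", s); }
    static void recover_multi(unsigned s, unsigned n)
    {
        printf("stripe %u: using %u parities\n", s, n);
    }
    static void write_chunk(unsigned s)      { (void)s; }

    /* Rebuild ONE failed disk: single parity (raid5 speed) is enough for
     * the common case; the higher parities are touched only for stripes
     * where additional chunks turn out to be unreadable. */
    static void rebuild_disk(unsigned nr_stripes)
    {
        for (unsigned s = 0; s < nr_stripes; s++) {
            unsigned lost = read_stripe(s);   /* extra unreadable chunks */
            if (lost == 0)
                recover_raid5(s);             /* fast path               */
            else
                recover_multi(s, lost + 1);   /* rare multi-failure path */
            write_chunk(s);                   /* write the rebuilt chunk */
        }
    }

    int main(void)
    {
        rebuild_disk(8);
        return 0;
    }

Only the stripes that hit the slow path pay the multi-parity cost, so
the average rebuild behaves like raid5.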




...
What I envision is an array type, something similar to RAID 51, i.e.
striped parity over mirror pairs. 


I don't like your raid 51 approach: it has the write overhead of raid5
combined with the wasted space of raid1.

So it can be used as neither a performance array nor a capacity array.
In the scope of this discussion (we are talking about very large
arrays), the wasted space of your solution, more than 50%, makes it cost
double the price.
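
To put numbers on that (my own back-of-the-envelope arithmetic, not
figures from this thread), compare the usable capacity of 12 equal
drives under the two layouts:

    #include <stdio.h>

    /* Back-of-the-envelope usable capacity, my own arithmetic:
     *  - raid5 over mirror pairs ("raid 51"): n/2 pairs minus 1 for parity
     *  - flat p-parity array:                 n minus p
     */
    int main(void)
    {
        unsigned n = 12;                 /* example number of drives */
        unsigned raid51 = n / 2 - 1;     /* 5 of 12 usable (~42%)    */
        unsigned triple = n - 3;         /* 9 of 12 usable (75%)     */

        printf("raid 51:       %u/%u drives usable (%.0f%%)\n",
               raid51, n, 100.0 * raid51 / n);
        printf("triple parity: %u/%u drives usable (%.0f%%)\n",
               triple, n, 100.0 * triple / n);
        return 0;
    }

So the mirror-based layout wastes more than half of the raw capacity,
against a quarter for triple parity on the same 12 drives.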


A competitor to the multiple-parity scheme might be raid65 or raid66,
but that is a much dirtier approach than multiple parity if you think
about the kind of RMW and overhead that would occur during normal
operation.


