On 28/11/13 08:16, Stan Hoeppner wrote:
Late reply. This one got lost in the flurry of activity...
On 11/22/2013 7:24 AM, David Brown wrote:
On 22/11/13 09:38, Stan Hoeppner wrote:
On 11/21/2013 3:07 AM, David Brown wrote:
For example, with 20 disks at 1 TB each, you can have:
...
Late reply. This one got lost in the flurry of activity...
On 11/22/2013 7:24 AM, David Brown wrote:
On 22/11/13 09:38, Stan Hoeppner wrote:
On 11/21/2013 3:07 AM, David Brown wrote:
For example, with 20 disks at 1 TB each, you can have:
...
Maximum:
RAID 10 = 10 disk redundancy
RAID 15
On Thu, 28 Nov 2013, Stan Hoeppner s...@hardwarefreak.com wrote:
We must follow different definitions of redundancy. I view redundancy
as the number of drives that can fail without taking down the array. In
the case of the above 20 drive RAID15 that maximum is clearly 11
drives-- one of
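One way to make those two counts concrete for the 20-drive RAID 15 (a RAID 5 built from 10 two-drive RAID 1 pairs): best case, 2 + 9 = 11 failures are survivable, i.e. both drives of one pair plus one drive from each of the remaining nine pairs; worst case, 4 failures kill the array (both drives of any two pairs), so only 3 arbitrary failures are guaranteed to be survivable.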
This is great stuff.
Now, how can we get this into btrfs and md?
On Wed, Nov 20, 2013 at 1:23 PM, Andrea Mazzoleni amadva...@gmail.com wrote:
Hi,
First, create a 3 by 6 cauchy matrix, using x_i = 2^-i, and y_i = 0 for i=0,
and y_i = 2^i for other i.
In this case: x = { 1, 142, 71, 173,
On Fri, Nov 22, 2013 at 07:12:39PM +1100, Russell Coker wrote:
On Thu, 21 Nov 2013 18:30:49 Stan Hoeppner wrote:
I suggest that anyone in the future needing fast random write IOPS is
going to move those workloads to SSD, which is steadily increasing in
capacity. And I suggest anyone
Hi Piergiorgio,
Checking further, I found that a Cauchy matrix was also evaluated for
PAR3, but they preferred to use RS with FFT in GF(2^16 + 1).
http://sourceforge.net/mailarchive/forum.php?forum_name=parchive-devel&max_rows=25&style=nested&viewmonth=201006
Note that using a Cauchy matrix for
On 11/23/2013 11:14 PM, John Williams wrote:
On Sat, Nov 23, 2013 at 8:03 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
Parity array rebuilds are read-modify-write operations. The main
difference from normal operation RMWs is that the write is always to the
same disk. As long as the
On 11/23/2013 11:19 PM, Russell Coker wrote:
On Sun, 24 Nov 2013, Stan Hoeppner s...@hardwarefreak.com wrote:
I have always surmised that the culprit is rotational latency, because
we're not able to get a real sector-by-sector streaming read from each
drive. If even only one disk in the array
On Sun, Nov 24, 2013 at 1:44 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
SNIP
Are you suggesting that it would be a common case that people just write data
to an array and never read it or do an array scrub? I hope that it will
become standard practice to have a cron job scrubbing all
On 24-11-13 22:13, Stan Hoeppner wrote:
I freely admit I may have drawn an incorrect conclusion about md
parity rebuild performance based on incomplete data. I simply don't
recall anyone stating here in ~3 years that their parity rebuilds were
speedy, but quite the opposite. I guess it's
On 11/24/2013 5:53 PM, Alex Elsayed wrote:
Stan Hoeppner wrote:
On 11/23/2013 11:14 PM, John Williams wrote:
On Sat, Nov 23, 2013 at 8:03 PM, Stan Hoeppner s...@hardwarefreak.com
wrote:
snip
But I, and a number of other people I have talked to or corresponded
with, have had mdadm RAID 5
On Mon, 25 Nov 2013, Stan Hoeppner s...@hardwarefreak.com wrote:
If that is the problem then the solution would be to just enable
read-ahead. Don't we already have that in both the OS and the disk
hardware? The hard-drive read-ahead buffer should at least cover the
case where a seek
On 22/11/13 23:59, NeilBrown wrote:
On Fri, 22 Nov 2013 10:07:09 -0600 Stan Hoeppner s...@hardwarefreak.com
wrote:
snip
In the event of a double drive failure in one mirror, the RAID 1 code
will need to be modified in such a way as to allow the RAID 5 code to
rebuild the first replacement
On Sat, Nov 23, 2013 at 8:03 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
Parity array rebuilds are read-modify-write operations. The main
difference from normal operation RMWs is that the write is always to the
same disk. As long as the stripe reads and chunk reconstruction outrun
the
On Sun, 24 Nov 2013, Stan Hoeppner s...@hardwarefreak.com wrote:
I have always surmised that the culprit is rotational latency, because
we're not able to get a real sector-by-sector streaming read from each
drive. If even only one disk in the array has to wait for the platter
to come round
On Thu, 21 Nov 2013 20:54:54 Adam Goryachev wrote:
On a pure storage server, the CPU would normally have nothing to do
except a little interrupt handling; it is just shuffling bytes around.
Of course, if you need RAID7.5 then you probably have a dedicated
storage server, so I don't see the
Hi David,
On 11/21/2013 3:07 AM, David Brown wrote:
On 21/11/13 02:28, Stan Hoeppner wrote:
...
WRT rebuild times, once drives hit 20TB we're looking at 18 hours just
to mirror a drive at full streaming bandwidth, assuming 300MB/s
average--and that is probably being kind to the drive makers.
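The ~18 hour figure is straightforward arithmetic from those assumptions: 20 TB / 300 MB/s is about 66,700 seconds, or roughly 18.5 hours for a single full-speed sequential pass over one drive.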
On 11/21/2013 3:07 AM, David Brown wrote:
For example, with 20 disks at 1 TB each, you can have:
All correct, and these are maximum redundancies.
Maximum:
raid5 = 19TB, 1 disk redundancy
raid6 = 18TB, 2 disk redundancy
raid6.3 = 17TB, 3 disk redundancy
raid6.4 = 16TB, 4 disk redundancy
On 11/21/2013 5:38 PM, John Williams wrote:
On Thu, Nov 21, 2013 at 2:57 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
He wrote that article in late 2009. It seems pretty clear he wasn't
looking 10 years forward to 20TB drives, where the minimum mirror
rebuild time will be ~18 hours, and
On Fri, Nov 22, 2013 at 1:35 AM, Stan Hoeppner s...@hardwarefreak.com wrote:
Only one graph goes to 2019, the rest are 2010 or less. That being the
case, his 2019 graph deals with projected reliability of single, double,
and triple parity.
The whole article goes to 2019 (or longer). He shows
On 11/22/2013 2:13 AM, Stan Hoeppner wrote:
Hi David,
On 11/21/2013 3:07 AM, David Brown wrote:
...
I don't see that there needs to be any changes to the existing md code
to make raid15 work - it is merely a raid 5 made from a set of raid1
pairs.
The sole purpose of the parity layer of
On Fri, Nov 22, 2013 at 12:13 AM, Stan Hoeppner s...@hardwarefreak.com wrote:
Hi David,
On 11/21/2013 3:07 AM, David Brown wrote:
SNIP
Shouldn't we be talking about RAID 15 here, rather than RAID 51 ? I
interpret RAID 15 to be like RAID 10 - a raid5 set of raid1 mirrors,
while RAID 51 would
Mark Knecht posted on Fri, 22 Nov 2013 08:50:32 -0800 as excerpted:
On Fri, Nov 22, 2013 at 12:13 AM, Stan Hoeppner s...@hardwarefreak.com
wrote:
Now that you mention it, yes, RAID 15 would fit much better with
convention. Not sure why I thought 51. So it's RAID 15 from here.
SNIP
For
Hi David,
On Fri, Nov 22, 2013 at 01:32:09AM +0100, David Brown wrote:
One typical case is when many errors are
found, belonging to the same disk.
This case clearly shows the disk is to be
replaced or the interface checked...
But, again, the user is the master, not the
machine... :-)
On 11/22/2013 9:01 AM, John Williams wrote:
snip
I see no advantage of RAID 15, and several disadvantages.
Of course not, just as I stated previously.
On 11/22/2013 2:13 AM, Stan Hoeppner wrote:
Parity users who currently shun RAID 10 for this reason will also
shun this RAID 15.
With that
On Fri, 22 Nov 2013 10:07:09 -0600 Stan Hoeppner s...@hardwarefreak.com
wrote:
On 11/22/2013 2:13 AM, Stan Hoeppner wrote:
Hi David,
On 11/21/2013 3:07 AM, David Brown wrote:
...
I don't see that there needs to be any changes to the existing md code
to make raid15 work - it is merely
On Thu, 21 Nov 2013 16:57:48 -0600 Stan Hoeppner s...@hardwarefreak.com
wrote:
On 11/21/2013 1:05 AM, John Williams wrote:
On Wed, Nov 20, 2013 at 10:52 PM, Stan Hoeppner s...@hardwarefreak.com
wrote:
On 11/20/2013 8:46 PM, John Williams wrote:
For myself or any machines I managed for
On 11/22/2013 5:07 PM, NeilBrown wrote:
On Thu, 21 Nov 2013 16:57:48 -0600 Stan Hoeppner s...@hardwarefreak.com
wrote:
On 11/21/2013 1:05 AM, John Williams wrote:
On Wed, Nov 20, 2013 at 10:52 PM, Stan Hoeppner s...@hardwarefreak.com
wrote:
On 11/20/2013 8:46 PM, John Williams wrote:
For
On Fri, 22 Nov 2013 21:46:50 -0600 Stan Hoeppner s...@hardwarefreak.com
wrote:
On 11/22/2013 5:07 PM, NeilBrown wrote:
On Thu, 21 Nov 2013 16:57:48 -0600 Stan Hoeppner s...@hardwarefreak.com
wrote:
On 11/21/2013 1:05 AM, John Williams wrote:
On Wed, Nov 20, 2013 at 10:52 PM, Stan
On Fri, Nov 22, 2013 at 9:04 PM, NeilBrown ne...@suse.de wrote:
I guess with that many drives you could hit PCI bus throughput limits.
A 16-lane PCIe 4.0 could just about give 100MB/s to each of 16 devices. So
you would really need top-end hardware to keep all of 16 drives busy in a
On Fri, 22 Nov 2013 21:34:41 -0800 John Williams jwilliams4...@gmail.com
wrote:
On Fri, Nov 22, 2013 at 9:04 PM, NeilBrown ne...@suse.de wrote:
I guess with that many drives you could hit PCI bus throughput limits.
A 16-lane PCIe 4.0 could just about give 100MB/s to each of 16 devices.
Hi Piergiorgio,
How about par2? How does this work?
I checked the matrix they use, and sometimes it contains singular
square submatrices.
It seems that in GF(2^16) these cases are just less common. Maybe they
just went unnoticed.
Anyway, this seems to be an already known problem for PAR2,
On 21/11/2013 02:28, Stan Hoeppner wrote:
On 11/20/2013 10:16 AM, James Plank wrote:
Hi all -- no real comments, except as I mentioned to Ric, my tutorial
in FAST last February presents Reed-Solomon coding with Cauchy
matrices, and then makes special note of the common pitfall of
assuming that
On 20/11/13 19:09, John Williams wrote:
On Wed, Nov 20, 2013 at 2:31 AM, David Brown david.br...@hesbynett.no wrote:
That's certainly a reasonable way to look at it. We should not limit
the possibilities for high-end systems because of the limitations of
low-end systems that are unlikely to
On 20/11/13 19:34, Andrea Mazzoleni wrote:
Hi David,
The choice of ZFS to use powers of 4 was likely not optimal,
because to multiply by 4, it has to do two multiplications by 2.
I can agree with that. I didn't copy ZFS's choice here
David, it was not my intention to suggest that you
On 21/11/13 02:28, Stan Hoeppner wrote:
On 11/20/2013 10:16 AM, James Plank wrote:
Hi all -- no real comments, except as I mentioned to Ric, my tutorial
in FAST last February presents Reed-Solomon coding with Cauchy
matrices, and then makes special note of the common pitfall of
assuming that
On 21/11/13 10:54, Adam Goryachev wrote:
On 21/11/13 20:07, David Brown wrote:
I can see plenty of reasons why raid15 might be a good idea, and even
raid16 for 5 disk redundancy, compared to multi-parity sets. However,
it costs a lot in disk space. For example, with 20 disks at 1 TB each,
Hi David,
On Thu, Nov 21, 2013 at 09:31:46PM +0100, David Brown wrote:
[...]
If this can all be done to give the user an informed choice, then it
sounds good.
that would be my target.
To _offer_ more options to the (advanced) user.
It _must_ always be under user control.
One issue here is
On 11/21/2013 1:05 AM, John Williams wrote:
On Wed, Nov 20, 2013 at 10:52 PM, Stan Hoeppner s...@hardwarefreak.com
wrote:
On 11/20/2013 8:46 PM, John Williams wrote:
For myself or any machines I managed for work that do not need high
IOPS, I would definitely choose triple- or quad-parity
On Thu, Nov 21, 2013 at 11:13:29AM +0100, David Brown wrote:
[...]
Ah, you are trying to find which disk has incorrect data so that you can
change just that one disk? There are dangers with that...
Hi David,
http://neil.brown.name/blog/20100211050355
I think we already did the exercise,
On 11/21/2013 2:08 AM, joystick wrote:
On 21/11/2013 02:28, Stan Hoeppner wrote:
...
WRT rebuild times, once drives hit 20TB we're looking at 18 hours just
to mirror a drive at full streaming bandwidth, assuming 300MB/s
average--and that is probably being kind to the drive makers. With 6 or
On 21/11/13 21:52, Piergiorgio Sartor wrote:
Hi David,
On Thu, Nov 21, 2013 at 09:31:46PM +0100, David Brown wrote:
[...]
If this can all be done to give the user an informed choice, then it
sounds good.
that would be my target.
To _offer_ more options to the (advanced) user.
It _must_
On 11/21/2013 04:30 PM, Stan Hoeppner wrote:
The rebuild time of a parity array normally has little to do with CPU
overhead.
Unless you have to fall back to table driven code.
Anyway, this looks like a great concept. Now we just need to implement
it ;)
-hpa
On 22/11/13 01:30, Stan Hoeppner wrote:
I don't like it either. It's a compromise. But as RAID1/10 will soon
be unusable due to URE probability during rebuild, I think it's a
relatively good compromise for some users, some workloads.
An alternative is to move to 3-way raid1 mirrors rather
On Wed, Nov 20, 2013 at 07:28:37PM -0600, Stan Hoeppner wrote:
[...]
It's always perilous to follow a Ph.D., so I guess I'm feeling suicidal
today. ;)
I'm not attempting to marginalize Andrea's work here, but I can't help
but ponder what the real value of triple parity RAID is, or quad, or
On 20/11/13 02:23, John Williams wrote:
On Tue, Nov 19, 2013 at 4:54 PM, Chris Murphy li...@colorremedies.com
wrote:
If anything, I'd like to see two implementations of RAID 6 dual
parity. The existing implementation in the md driver and btrfs could
remain the default, but users could opt
Hi all -- no real comments, except as I mentioned to Ric, my tutorial in FAST
last February presents Reed-Solomon coding with Cauchy matrices, and then makes
special note of the common pitfall of assuming that you can append a
Vandermonde matrix to an identity matrix. Please see
On Wed, Nov 20, 2013 at 2:31 AM, David Brown david.br...@hesbynett.no wrote:
That's certainly a reasonable way to look at it. We should not limit
the possibilities for high-end systems because of the limitations of
low-end systems that are unlikely to use 3+ parity anyway. I've also
looked
Hi David,
The choice of ZFS to use powers of 4 was likely not optimal,
because to multiply by 4, it has to do two multiplications by 2.
I can agree with that. I didn't copy ZFS's choice here
David, it was not my intention to suggest that you copied from ZFS.
Sorry to have expressed myself
It is also possible to quickly multiply by 2^-1 which makes for an interesting
R parity.
Andrea Mazzoleni amadva...@gmail.com wrote:
Hi David,
The choice of ZFS to use powers of 4 was likely not optimal,
because to multiply by 4, it has to do two multiplications by 2.
I can agree with that.
Hi John,
Yes. There are still AMD CPUs sold without SSSE3. Most notably Athlon.
Intel, on the other hand, has provided SSSE3 since the Core 2 Duo.
A detailed list is available at: http://en.wikipedia.org/wiki/SSSE3
Ciao,
Andrea
On Wed, Nov 20, 2013 at 7:09 PM, John Williams jwilliams4...@gmail.com wrote:
Hi,
Yep. At present to multiply by 2^-1 I'm using in C:
static inline uint64_t d2_64(uint64_t v)
{
    uint64_t mask = v & 0x0101010101010101U;
    mask = (mask << 8) - mask;
    v = (v >> 1) & 0x7f7f7f7f7f7f7f7fU;
    v ^= mask & 0x8e8e8e8e8e8e8e8eU;
    return v;
}
and for SSE2:
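The SSE2 version is cut off in this preview. As a hypothetical sanity check of the scalar routine (not code from the thread), the snippet below pairs d2_64() with the standard parallel multiply-by-2 over the RAID-6 polynomial 0x11D, written in the same style as the kernel's raid6 integer code, and verifies that the two operations are inverses:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* d2_64() from the post above is assumed to be in scope. */

/* Multiply 8 packed GF(2^8) bytes by 2 (polynomial 0x11D). */
static inline uint64_t x2_64(uint64_t v)
{
    uint64_t mask = v & 0x8080808080808080U;  /* bytes with the high bit set */
    mask = (mask << 1) - (mask >> 7);         /* expand that bit to 0xFF */
    v = (v << 1) & 0xfefefefefefefefeU;       /* shift each byte left by 1 */
    v ^= mask & 0x1d1d1d1d1d1d1d1dU;          /* reduce by x^8+x^4+x^3+x^2+1 */
    return v;
}

int main(void)
{
    uint64_t v = 0x0123456789abcdefU;
    for (int i = 0; i < 100000; i++) {
        assert(d2_64(x2_64(v)) == v);         /* 2^-1 undoes 2 */
        assert(x2_64(d2_64(v)) == v);         /* and vice versa */
        v = v * 6364136223846793005U + 1442695040888963407U;  /* next test value */
    }
    puts("d2_64 is the inverse of multiply-by-2");
    return 0;
}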
On 11/20/2013 10:56 AM, Andrea Mazzoleni wrote:
Hi,
Yep. At present to multiply by 2^-1 I'm using in C:
static inline uint64_t d2_64(uint64_t v)
{
    uint64_t mask = v & 0x0101010101010101U;
    mask = (mask << 8) - mask;
    v = (v >> 1) & 0x7f7f7f7f7f7f7f7fU;
    v ^=
Hi Jim,
I build the matrix in a way that results in coefficients matching
Linux RAID for the first two rows, and at the same time gives
the guarantee that all the square submatrices are not singular,
resulting in a MDS code.
I start forming a Cauchy matrix setting each element to 1/(xi+yj)
where
On 11/20/2013 11:05 AM, Andrea Mazzoleni wrote:
For the first row with j=0, I use xi = 2^-i and y0 = 0, that results in:
How can xi = 2^-i if x is supposed to be constant?
That doesn't mean that your approach isn't valid, of course, but it
might not be a Cauchy matrix and thus needs
Peter, I think I understand it differently. Concrete example in GF(256) for
k=6, m=4:
First, create a 3 by 6 cauchy matrix, using x_i = 2^-i, and y_i = 0 for i=0,
and y_i = 2^i for other i. In this case: x = { 1, 142, 71, 173, 216, 108 }
y = { 0, 2, 4 }. The cauchy matrix is:
1 2 4
Hi Peter,
static inline uint64_t d2_64(uint64_t v)
{
    uint64_t mask = v & 0x0101010101010101U;
    mask = (mask << 8) - mask;
(mask << 7) I assume...
No. It's (mask << 8) - mask. We want to expand the bit at position 0
(in each byte) to the full byte, resulting in 0xFF if the bit is at
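Concretely: if byte k of mask is 0x01, then (mask << 8) places a 1 just above that byte, and subtracting the original 1 borrows back down to 0xFF in byte k (0x0100 - 0x0001 = 0x00FF). For the topmost byte the shifted bit falls off the end of the word, but unsigned wraparound of the subtraction still leaves 0xFF00000000000000, so the trick works for all eight bytes at once.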
On 11/20/2013 01:04 PM, Andrea Mazzoleni wrote:
Hi Peter,
static inline uint64_t d2_64(uint64_t v)
{
    uint64_t mask = v & 0x0101010101010101U;
    mask = (mask << 8) - mask;
(mask << 7) I assume...
No. It's (mask << 8) - mask. We want to expand the bit at position 0
(in each byte)
Hi Peter,
Now, that doesn't sound like something that can get neatly meshed into
the Cauchy matrix scheme, I assume.
You are correct. Multiplication by 2^-1 cannot be used for the Cauchy method.
I used it to implement an alternate triple parity not requiring PSHUFB
that I used as reference for
Hi,
First, create a 3 by 6 cauchy matrix, using x_i = 2^-i, and y_i = 0 for i=0,
and y_i = 2^i for other i.
In this case: x = { 1, 142, 71, 173, 216, 108 } y = { 0, 2, 4 }. The
cauchy matrix is:
1 2 4 8 16 32
244 83 78 183 118 47
167 39 213 59 153 82
Divide row 2 by
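As a hypothetical check of these numbers (not code from the thread), the standalone C sketch below builds the same 3 by 6 Cauchy matrix in GF(2^8), assuming the RAID-6 polynomial 0x11D (which matches the quoted values): element [j][i] = 1/(x_i + y_j) with x_i = 2^-i, y_0 = 0 and y_j = 2^j for j > 0. Running it reproduces the three rows quoted above.

#include <stdint.h>
#include <stdio.h>

static uint8_t gf_exp[512], gf_log[256];

static void gf_init(void)
{
    int x = 1;
    for (int i = 0; i < 255; i++) {
        gf_exp[i] = (uint8_t)x;
        gf_log[x] = (uint8_t)i;
        x <<= 1;                        /* multiply by 2 */
        if (x & 0x100)
            x ^= 0x11d;                 /* reduce by x^8+x^4+x^3+x^2+1 */
    }
    for (int i = 255; i < 512; i++)     /* wrap for easy modular indexing */
        gf_exp[i] = gf_exp[i - 255];
}

static uint8_t gf_inv(uint8_t a)        /* multiplicative inverse, a != 0 */
{
    return gf_exp[255 - gf_log[a]];
}

int main(void)
{
    enum { K = 6, M = 3 };              /* 6 data disks, 3 parity rows */
    uint8_t x[K], y[M];
    gf_init();
    for (int i = 0; i < K; i++)
        x[i] = gf_exp[(255 - i) % 255]; /* x_i = 2^-i */
    y[0] = 0;
    for (int j = 1; j < M; j++)
        y[j] = gf_exp[j];               /* y_j = 2^j */
    for (int j = 0; j < M; j++) {       /* a[j][i] = 1/(x_i + y_j) */
        for (int i = 0; i < K; i++)
            printf("%4d", gf_inv(x[i] ^ y[j]));
        printf("\n");
    }
    return 0;
}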
On 11/20/2013 12:30 PM, James Plank wrote:
Peter, I think I understand it differently. Concrete example in GF(256) for
k=6, m=4:
First, create a 3 by 6 cauchy matrix, using x_i = 2^-i, and y_i = 0 for i=0,
and y_i = 2^i for other i. In this case: x = { 1, 142, 71, 173, 216, 108 }
y
Hi Piergiorgio,
In RAID-6 (as per raid6check) there is an easy way
to verify where an HDD has incorrect data.
I suspect that, for every 2 parity blocks, it should be
possible to find 1 error (and if this is true, then
quad parity is more attractive than triple parity).
Yes. The theory says that with
On Wed, Nov 20, 2013 at 11:44:39AM +0100, David Brown wrote:
[...]
In RAID-6 (as per raid6check) there is an easy way
to verify where an HDD has incorrect data.
I think the way to do that is just to generate the parity blocks from
the data blocks, and compare them to the existing parity
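A toy, one-byte-per-disk illustration of that check (hypothetical, not raid6check's actual code): recompute P and Q from the data; if both differ from the stored parity, the ratio of the two differences is 2^z in GF(2^8), where z is the index of the corrupted data disk.

#include <stdint.h>
#include <stdio.h>

static uint8_t x2(uint8_t v)            /* multiply by 2, polynomial 0x11D */
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0));
}

/* P = d0 ^ d1 ^ ...; Q = 1*d0 ^ 2*d1 ^ 4*d2 ^ ... (Horner evaluation). */
static void pq(const uint8_t *d, int n, uint8_t *p, uint8_t *q)
{
    uint8_t P = 0, Q = 0;
    for (int i = n - 1; i >= 0; i--) {
        P ^= d[i];
        Q = x2(Q) ^ d[i];
    }
    *p = P;
    *q = Q;
}

int main(void)
{
    uint8_t d[5] = { 0x11, 0x22, 0x33, 0x44, 0x55 };
    uint8_t p, q, p2, q2;

    pq(d, 5, &p, &q);                   /* "stored" parity, from good data */
    d[3] ^= 0x5a;                       /* silently corrupt data disk 3 */
    pq(d, 5, &p2, &q2);                 /* parity recomputed from current data */

    uint8_t dp = p ^ p2, dq = q ^ q2;
    if (dp && dq) {                     /* both differ: one data disk is bad */
        uint8_t v = dp;
        for (int z = 0; z < 255; z++, v = x2(v)) {
            if (v == dq) {              /* dq/dp == 2^z names the bad disk */
                printf("corrupted data disk: %d\n", z);
                break;
            }
        }
    }
    return 0;
}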
On Mon, Nov 18, 2013 at 11:08:59PM +0100, Andrea Mazzoleni wrote:
[...]
I've a side question, a bit OT, but maybe you
could help with the answer.
How about par2? How does this work?
They claim a Vandermonde matrix and they seem
to be quite flexible in the number of parities.
They could be in GF(2^16),
For myself or any machines I managed for work that do not need high
IOPS, I would definitely choose triple- or quad-parity over RAID 51 or
similar schemes with arrays of 16 - 32 drives.
No need to go into detail here on a subject Adam Leventhal has already
covered in detail in an article
On 11/20/2013 12:44 PM, Andrea Mazzoleni wrote:
Yes. There are still AMD CPUs sold without SSSE3. Most notably Athlon.
Intel, on the other hand, has provided SSSE3 since the Core 2 Duo.
I hate branding discontinuity, due to the resulting confusion...
Athlon, Athlon64, Athlon64 X2, Athlon X2 (K10), Athlon
On Wed, Nov 20, 2013 at 10:52 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
On 11/20/2013 8:46 PM, John Williams wrote:
For myself or any machines I managed for work that do not need high
IOPS, I would definitely choose triple- or quad-parity over RAID 51 or
similar schemes with arrays of 16
On 19/11/13 00:25, H. Peter Anvin wrote:
On 11/18/2013 02:35 PM, Andrea Mazzoleni wrote:
Hi Peter,
The Cauchy matrix has the mathematical property to always have itself
and all submatrices not singular. So, we are sure that we can always
solve the equations to recover the data disks.
Hi Peter,
Yes, 251 disks for 6 parity.
To build a NxM Cauchy matrix you need to pick N+M distinct values
in the GF(2^8) and we have only 2^8 == 256 available.
This means that for every row we add for an extra parity level, we
have to remove one of the disk columns.
Note that in truth, I use an
Hi David,
Just to say that I know your good past work, and it helped me a lot.
Thanks for that!
Unfortunately the Cauchy matrix is not compatible with a triple parity
implementation using power coefficients. They are different and
incompatible roads.
I partially agree on your considerations,
On Mon, Nov 18, 2013 at 11:08:59PM +0100, Andrea Mazzoleni wrote:
Hi,
I want to report that I recently implemented support for an
arbitrary number of parities, which could also be useful for Linux
RAID and Btrfs, both currently limited to double parity.
In short, to generate the parity I use
On 11/19/2013 12:28 PM, Andrea Mazzoleni wrote:
Hi Peter,
Yes, 251 disks for 6 parity.
To build a NxM Cauchy matrix you need to pick N+M distinct values
in the GF(2^8) and we have only 2^8 == 256 available.
This means that for every row we add for an extra parity level, we
have to remove one
I'm not going to claim any expert status on this discussion (the
theory makes my head spin) but I will say I agree with Andrea as far
as preferring his implementation for triple parity and beyond.
PSHUFB has been around the Intel platform since the Core2 introduced
it as part of SSSE3 back in Q1
On Nov 19, 2013, at 3:51 PM, Drew drew@gmail.com wrote:
I'm not going to claim any expert status on this discussion (the
theory makes my head spin) but I will say I agree with Andrea as far
as preferring his implementation for triple parity and beyond.
PSHUFB has been around the Intel
On Tue, Nov 19, 2013 at 4:54 PM, Chris Murphy li...@colorremedies.com
wrote:
If anything, I'd like to see two implementations of RAID 6 dual
parity. The existing implementation in the md driver and btrfs could
remain the default, but users could opt into Cauchy matrix based dual
parity which
On 11/18/2013 02:08 PM, Andrea Mazzoleni wrote:
Hi,
I want to report that I recently implemented support for an
arbitrary number of parities, which could also be useful for Linux
RAID and Btrfs, both currently limited to double parity.
In short, to generate the parity I use a Cauchy matrix
Hi Peter,
The Cauchy matrix has the mathematical property to always have itself
and all submatrices not singular. So, we are sure that we can always
solve the equations to recover the data disks.
Besides the mathematical proof, I've also inverted all the
377,342,351,231 possible submatrices for
On 11/18/2013 02:35 PM, Andrea Mazzoleni wrote:
Hi Peter,
The Cauchy matrix has the mathematical property to always have itself
and all submatrices not singular. So, we are sure that we can always
solve the equations to recover the data disks.
Besides the mathematical proof, I've also