Re: Triple parity and beyond

2013-11-22 Thread Piergiorgio Sartor
Hi David,

On Fri, Nov 22, 2013 at 01:32:09AM +0100, David Brown wrote:
  One typical case is when many errors are
  found, belonging to the same disk.
  This case clearly shows the disk is to be
  replaced or the interface checked...
  But, again, the user is the master, not the
  machine... :-)
 
 I don't know what sort of interface you have for the user, but I guess
 that means you'll have to collect a number of failures before showing
 them so that the user can see the correlation on disk number.

as usual in Unix, one program will collect
the data into a file, another one will
analyze that file.
Originally, one idea was even to check, at
stripe level, how many errors are present
(and where). From that, some statistics
would be presented to the user.
This would be integrated in the check tool,
of course.
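
For example, a minimal sketch of such an
analyzer, assuming the collector logs one
"stripe disk" pair per mismatch (the log
format and all names here are hypothetical):

import sys
from collections import Counter

def analyze(path):
    per_disk = Counter()
    stripes = set()
    with open(path) as log:
        for line in log:
            stripe, disk = line.split()
            stripes.add(int(stripe))
            per_disk[disk] += 1
    total = sum(per_disk.values())
    print(f"{total} mismatches in {len(stripes)} stripes")
    for disk, count in per_disk.most_common():
        # many errors on one disk suggest replacing it
        # or checking its interface; the user decides
        print(f"  {disk}: {count} ({100.0 * count / total:.1f}%)")

if __name__ == "__main__":
    analyze(sys.argv[1])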

  For most ECC schemes, you know that all your blocks are set
  synchronously - so any block that does not fit in, is an error.  With
  raid, it could also be that a stripe is only partly written - you can
  
  Could it be?
  I would consider this an error.
 
 It could occur as the result of a failure of some sort (kernel crash,
 power failure, temporary disk problem, etc.).  More generally, md raid
 doesn't have to be on local physical disks - maybe one of the disks is
 an iSCSI drive or something else over a network that could have failures
 or delays.  I haven't thought through all cases here - I am just
 throwing them out as possibilities that might cause trouble.

OK, I misunderstood you, I was thinking
of normal operation...
Again, the check can find that issue; it
will report that it cannot tell which
block is wrong, but it will tell where
in the array the problem is.
Possibly, another tool can then check
the FS at that position.

bye, 

-- 

piergiorgio


Re: Triple parity and beyond

2013-11-21 Thread Piergiorgio Sartor
Hi David,

On Thu, Nov 21, 2013 at 09:31:46PM +0100, David Brown wrote:
[...]
 If this can all be done to give the user an informed choice, then it
 sounds good.

that would be my target.
To _offer_ more options to the (advanced) user.
It _must_ always be under user control.

 One issue here is whether the check should be done with the filesystem
 mounted and in use, or only off-line.  If it is off-line then it will
 mean a long down-time while the array is checked - but if it is online,
 then there is the risk of confusing the filesystem and caches by
 changing the data.

Currently, raid6check can work with the FS
mounted.
I got the suggestion from Neil (of course).
It is possible to lock one stripe and check it.
A locked stripe should be, at any given time,
consistent (that is, the parity should always
match the data).
If an error is found, it is reported.
Again, the user can decide whether to fix it
or not, considering all the FS consequences
and so on.
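
The computational half of that per-stripe
test is small; a minimal sketch (the stripe
locking and the actual reads raid6check does
through the kernel are assumed to happen
elsewhere, and all names are illustrative):

def gf_mul2(b):
    # multiply by 2 in GF(2^8), Linux RAID-6 polynomial 0x11d
    b <<= 1
    return (b ^ 0x11d) if b & 0x100 else b

def stripe_consistent(data, p, q):
    # data: list of equally sized data blocks (bytes) of the
    # locked stripe; p and q: its two parity blocks
    for i in range(len(p)):
        px, qx = 0, 0
        for block in reversed(data):  # Horner's scheme for Q
            qx = gf_mul2(qx) ^ block[i]
            px ^= block[i]
        if px != p[i] or qx != q[i]:
            return False  # mismatch: report it, the user decides
    return True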

 Most disk errors /are/ detectable, and are reported by the underlying
 hardware - small surface errors are corrected by the disk's own error
 checking and correcting mechanisms, and larger errors are usually
 detected.  It is (or should be!) very rare that a read error goes
 undetected without there being a major problem with the disk controller.
  And if the error is detected, then the normal raid processing kicks in
 as there is no doubt about which block has problems.

That's clear. That case is an erasure (I think)
and it is perfectly in line with the usual operation.
I'm not trying to replace this mechanism.
 
 If you can be /sure/ about which data block is incorrect, then I agree -
 but you can't be /entirely/ sure.  But I agree that you can make a good
 enough guess to recommend a fix to the user - as long as it is not
 automatic.

One typical case is when many errors are
found, belonging to the same disk.
This case clearly shows the disk is to be
replaced or the interface checked...
But, again, the user is the master, not the
machine... :-)
 
 For most ECC schemes, you know that all your blocks are set
 synchronously - so any block that does not fit in, is an error.  With
 raid, it could also be that a stripe is only partly written - you can

Could it be?
I would consider this an error.
The stripe must always be consistent; there
should be a transactional mechanism to make
sure that, if read back, the data always
matches the parity.
When I write "read back" I mean from wherever
the data is: physical disk or cache.
Otherwise, the check must run exclusively on
the array (no mounted FS, nothing else
running on it).

 have two different valid sets of data mixed to give an inconsistent
 stripe, without any good way of telling what consistent data is the best
 choice.
  
 Perhaps a checking tool can take advantage of a write-intent bitmap (if
 there is one) so that it knows if an inconsistent stripe is partly
 updated or the result of a disk error.

Of course, this is an option, which should be
taken into consideration.

Any improvement idea is welcome!!!

bye,

-- 

piergiorgio


Re: Triple parity and beyond

2013-11-21 Thread Piergiorgio Sartor
On Thu, Nov 21, 2013 at 11:13:29AM +0100, David Brown wrote:
[...]
 Ah, you are trying to find which disk has incorrect data so that you can
 change just that one disk?  There are dangers with that...

Hi David,

 http://neil.brown.name/blog/20100211050355

I think we already did the exercise, here :-)

 If you disagree with this blog post (and I urge you to read it in full

We discussed the topic (with Neil) and, if I
recall correctly, he is against having
_automatic_ error detection and correction
_in_ the kernel.
I fully agree with that: user space is better,
and it should not be automatic, but it should
do things under user control.

The current check operation is pretty poor.
It just reports how many mismatches there
are, it does not even report where in the
array they occurred.
The first step, independent of how many
parities one has, would be to tell the user
where the mismatches occurred, so it would
be possible to check the FS at that position.
Having a multi-parity RAID even allows
checking which disk is wrong.
This would provide the user with more
comprehensive information.

Of course, since we are there, we can
also give the option to fix it.
This would be much like a fsck.

 first), then this is how I would do a smart stripe recovery:
 
 First calculate the parities from the data blocks, and compare these
 with the existing parity blocks.
 
 If they all match, the stripe is consistent.
 
 Normal (detectable) disk errors and unrecoverable read errors get
 flagged by the disk and the IO system, and you /know/ there is a problem
 with that block.  Whether it is a data block or a parity block, you
 re-generate the correct data and store it - that's what your raid is for.

That's not always the case, otherwise
having the mismatch count would be useless.
The issue is that errors appear, for
whatever reason, without being reported
by the underlying hardware.
 
 If you have no detected read errors, and there is one parity
 inconsistency, then /probably/ that block has had an undetected read
 error, or it simply has not been written completely before a crash.
 Either way, just re-write the correct parity.

Why re-write the parity if I can get
the correct data there?
If I can be sure that one data block is
incorrect and I can re-create it properly,
that's the thing to do.
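
For RAID-6 and a single silently corrupted
data block that is indeed possible, following
H. Peter Anvin's "The mathematics of RAID-6":
when both recomputed parities disagree with
the stored ones, the GF(2^8) ratio of the two
deltas is g^z, with z the index of the bad
disk. A sketch:

GF_POLY = 0x11d  # Linux RAID-6 polynomial

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= GF_POLY
        b >>= 1
    return r

GF_LOG, x = {}, 1  # discrete logs for the generator g = 2
for i in range(255):
    GF_LOG[x] = i
    x = gf_mul(x, 2)

def locate_bad_data_disk(p_delta, q_delta):
    # p_delta = P_stored ^ P_recomputed, q_delta likewise,
    # both non-zero; q_delta / p_delta = g^z gives the disk z
    # (a z beyond the last data disk means no single-disk fit)
    return (GF_LOG[q_delta] - GF_LOG[p_delta]) % 255

The correct data byte is then simply the
stored one XORed with p_delta.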
 
 Remember, this is not a general error detection and correction scheme -

It is not, but it could be. For free.

bye,

-- 

piergiorgio


Re: Triple parity and beyond

2013-11-21 Thread Piergiorgio Sartor
On Wed, Nov 20, 2013 at 07:28:37PM -0600, Stan Hoeppner wrote:
[...]
 It's always perilous to follow a Ph.D., so I guess I'm feeling suicidal
 today. ;)
 
 I'm not attempting to marginalize Andrea's work here, but I can't help
 but ponder what the real value of triple parity RAID is, or quad, or
 beyond.  Some time ago parity RAID's primary mission ceased to be

Hi Stan,

my opinion is that you have to think
in terms of storage devices which are
not always available.
Those are not simply directly connected
HDDs, they could be more exotic.
The example I consider is a p2p network
storage, where the nodes are not very
reliable.
I guess there could be more such cases.

bye,

-- 

piergiorgio


Re: Triple parity and beyond

2013-11-20 Thread Piergiorgio Sartor
On Wed, Nov 20, 2013 at 11:44:39AM +0100, David Brown wrote:
[...]
  In RAID-6 (as per raid6check) there is an easy way
  to verify where an HDD has incorrect data.
  
 
 I think the way to do that is just to generate the parity blocks from
 the data blocks, and compare them to the existing parity blocks.

Uhm, the generic RS decoder should try
all the possible combinations of erasures
and so detect the error.
This is unfeasible already with 3 parities,
so there are faster algorithms, I believe:

Peterson–Gorenstein–Zierler algorithm
Berlekamp–Massey algorithm

Nevertheless, I do not know much about
those, so I cannot state whether they apply
to the Cauchy matrix as explained here.
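
A back-of-the-envelope count suggests why it
is unfeasible (a sketch assuming a
hypothetical array of 22 data disks, guessing
up to p-1 positions as erased while keeping
at least one parity as a consistency check):

from math import comb

def candidate_sets(disks, parities):
    # number of erasure-position guesses to try per stripe
    return sum(comb(disks, e) for e in range(1, parities))

for p in (2, 3, 4, 6):
    print(f"{p} parities: {candidate_sets(22 + p, p)} guesses")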

bye,

-- 

piergiorgio


Re: Triple parity and beyond

2013-11-20 Thread Piergiorgio Sartor
On Mon, Nov 18, 2013 at 11:08:59PM +0100, Andrea Mazzoleni wrote:
[...]

I have a side question, a bit OT, but maybe
you could help with the answer.

How about par2? How does it work?
They claim a Vandermonde matrix and they seem
to be quite flexible in the number of parities.
They could be working in GF(2^16), so maybe
this makes a difference...

bye,

-- 

piergiorgio


Re: Triple parity and beyond

2013-11-19 Thread Piergiorgio Sartor
On Mon, Nov 18, 2013 at 11:08:59PM +0100, Andrea Mazzoleni wrote:
 Hi,
 
 I want to report that I recently implemented support for an
 arbitrary number of parities that could be useful also for Linux
 RAID and Btrfs, both currently limited to double parity.
 
 In short, to generate the parity I use a Cauchy matrix specifically
 built to be compatible with the existing Linux parity computation,
 and extensible to an arbitrary number of parities. This works
 without limitations on the number of data disks.
 
 The Cauchy matrix for six parities is:
 
 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01...
 01 02 04 08 10 20 40 80 1d 3a 74 e8 cd 87 13 26 4c 98 2d 5a b4 75...
 01 f5 d2 c4 9a 71 f1 7f fc 87 c1 c6 19 2f 40 55 3d ba 53 04 9c 61...
 01 bb a6 d7 c7 07 ce 82 4a 2f a5 9b b6 60 f1 ad e7 f4 06 d2 df 2e...
 01 97 7f 9c 7c 18 bd a2 58 1a da 74 70 a3 e5 47 29 07 f5 80 23 e9...
 01 2b 3f cf 73 2c d6 ed cb 74 15 78 8a c1 17 c9 89 68 21 ab 76 3b...
 
 You can easily recognize the first row as RAID5 based on a simple
 XOR, and the second row as RAID6 based on multiplications by powers
 of 2. The other rows are for additional parity levels and they
 require multiplications by arbitrary values that can be implemented
 using the PSHUFB instruction.
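
As a quick check, those first two rows can be
regenerated in a few lines (a sketch using
the Linux RAID-6 polynomial 0x11d; the later
rows come from the Cauchy construction in
raid.c):

def gf_mul2(b):
    # multiply by 2 in GF(2^8), polynomial x^8+x^4+x^3+x^2+1 (0x11d)
    b <<= 1
    return (b ^ 0x11d) if b & 0x100 else b

n = 22
row1 = [1] * n       # RAID5 row: plain XOR
row2, g = [], 1
for _ in range(n):   # RAID6 row: successive powers of 2
    row2.append(g)
    g = gf_mul2(g)

print(" ".join(f"{c:02x}" for c in row1))  # 01 01 01 ...
print(" ".join(f"{c:02x}" for c in row2))  # 01 02 04 08 10 20 40 80 1d ...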
 
 The performance of triple parity with PSHUFB is comparable to an
 alternate triple-parity implementation with the third row of
 coefficients set as powers of 2^-1. This alternate implementation is
 likely the fastest possible for CPUs without PSHUFB or a similar
 instruction, but it has the limitation of not supporting anything
 beyond triple parity.
 
 The Cauchy matrix instead works for any number of parities and
 at the same time is compatible with the existing first two parity
 levels. As far as I know, this is a new result, which has never
 appeared on this list or elsewhere.
 
 You can see more details, performance results and fast
 implementations for up to six parity levels at:
 https://sourceforge.net/p/snapraid/code/ci/master/tree/raid.c
 
 This was developed as part of my hobby project SnapRAID,
 downloadable with full source at:
 http://snapraid.sourceforge.net/
 
 Please let me know if you are interested in a potential Linux
 integration. I can surely help on whatever is needed.
 
 For reference, past discussions about triple parity in the
 linux-raid list can be found at:
 
 http://thread.gmane.org/gmane.linux.raid/34195
 http://thread.gmane.org/gmane.linux.raid/37904

Hi Andrea,

great job, this was exactly what I was looking for.

Do you know if there is a fast way not to correct
errors, but to find them?

In RAID-6 (as per raid6check) there is an easy way
to verify where an HDD has incorrect data.

I suspect that, for every 2 parity blocks, it
should be possible to find 1 error (and if
this is true, then quad parity is more
attractive than triple).
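
(That matches the classical Reed-Solomon
bound: p parity blocks give minimum distance
p + 1, so t errors at unknown positions and
s erasures are correctable whenever

  2*t + s <= p

hence 2 parities locate and fix 1 silent
error, 4 parities fix 2, while an erasure at
a known position costs only 1 parity.)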

Furthermore, my second (or first) target would
be something like: http://www.symform.com/blog/tag/raid-96/

Which uses 32 parities (out of 96 disks).

Keep going!!!

bye,

pg

 
 Ciao,
 Andrea

-- 

piergiorgio