Re: raid5: two writing algorithms

2008-02-08 Thread Keld Jørn Simonsen
On Fri, Feb 08, 2008 at 12:51:39PM +1100, Neil Brown wrote:
 On Friday February 8, [EMAIL PROTECTED] wrote:
  On Fri, Feb 08, 2008 at 07:25:31AM +1100, Neil Brown wrote:
   On Thursday February 7, [EMAIL PROTECTED] wrote:
  
So I hereby give the idea for inspiration to kernel hackers.
   
   and I hereby invite you to read the code ;-)
  
  I did some reading.  Is there somewhere a description of it, especially
  the raid code, or are the comments and the code the best documentation?
 
 No.  If a description was written (and various people have tried to
 describe various parts) it would be out of date within a few months :-(

OK, I was under the impression that some of the code did not change
much. Eg. you said that there had not been any work on optimizing
raid10 for performance since the 2.6.12 kernel I was using. And then at
least the raid5 code, the last copyright notice right in the top is 
Copyright (C) 2002, 2003 H. Peter Anvin. That is 5 years ago. 
And your name is not on it. So I did not look that much into that code,
thinking nothing had been done there for ages. Maybe you could add your
name on it, that would only be fair. The same comment goes for other
modules (for which it is relevant).

 Look for READ_MODIFY_WRITE and RECONSTRUCT_WRITE  no.  That
 only applied to raid6 code now..
 Look instead for the 'rcw' and 'rmw' counters, and then at
 'handle_write_operations5'  which does different things based on the
 'rcw' variable.
 
 It used to be a lot clearer before we implemented xor-offload.  The
 xor-offload stuff is good, but it does make the code more complex.

OK, I think it is fairly well documented here, I can at least follow the
logic, and then I think it is a good approach to have the flow
description/strategy included directly in the code. Given there are many
changes to the code, different files for code and description could
easily mix up the alignment of code and documentation badly.

 
 
  
  Do you say that this is already implemented?
 
 Yes.

That is very good!


Do you konw if other implementations of this, eg. commercial controller
code, have this facility? If not, we could list this as an advantage of 
linux raid. Anyway it would be implicit in performance documentation.
I do plan to write up something on performance, soonish. The howto
is hopelessly outdated.

IMHO such code should make the performance of raid5 random writes not
that bad. Better than the reputation that raid5 is hopelessly slow for
database writing. I think raid5 would be less than double as slow as
raid1 for random writing.

  Well, I do have a hack in mind, on the raid10,f2.
  I need to investigate some more, and possibly test out
  what really happens. But maybe the code already does what I want it to.
  You are possibly the one that knows the code best, so maybe you can tell
  me if raid10,f2 always does its reading in the first part of the disks?
 
 Yes, I know the code best.
 
 No, raid10,f2 doesn't always use the first part of the disk.  Getting
 it to do that would be a fairly small change in 'read_balance' in
 md/raid10.c.
 
 I'm not at all convinced that the read balancing code in raid10 (or
 raid1) really does the best thing.  So any improvements - backed up
 with broad testing - would be most welcome.

I think I know where to do my proposed changes, and how it could be done.
So maybe in a not too distant future I will have done my first kernel
hack!

Best regards
keld
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: raid5: two writing algorithms

2008-02-07 Thread Neil Brown
On Thursday February 7, [EMAIL PROTECTED] wrote:
 As I understand it, there are 2 valid algoritms for writing in raid5.
 
 1. calculate the parity data by XOR'ing all data of the relevant data
 chunks.
 
 2. calculate the parity data by kind of XOR-subtracting the old data to
 be changed, and then XOR-adding the new data. (XOR-subtract and XOR-add
 is actually the same).
 
 There are situations where method 1 is the fastest, and situations where
 method 2 is the fastest.
 
 My idea is then that the raid5 code in the kernel can calculate which
 method is the faster. 
 
 method 1 is faster, if all data is already available. I understand that
 this method is employed in the current kernel. This would eg be the case
 with sequential writes.
 
 Method 2 is faster, if no data is available in core. It would require
 2 reads and two writes, which always will be faster than n reads and 1
 write, possibly except for n=2. method 2 is thus faster normally for
 random writes.
 
 I think that method 2 is not used in the kernel today. Mayby I am wrong,
 but I did have a look in the kernel code.

It is very odd that you would think something about the behaviour of
the kernel with actually having looked.

It also seems a little arrogant to have a clever idea and assume that
no one else has thought of it before.

 
 So I hereby give the idea for inspiration to kernel hackers.

and I hereby invite you to read the code ;-)

Code reading is a good first step to being a
 
 Yoyr kernel hacker wannabe
   ^

NeilBrown


 keld
 -
 To unsubscribe from this list: send the line unsubscribe linux-raid in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


raid5: two writing algorithms

2008-02-07 Thread Keld Jørn Simonsen
As I understand it, there are 2 valid algoritms for writing in raid5.

1. calculate the parity data by XOR'ing all data of the relevant data
chunks.

2. calculate the parity data by kind of XOR-subtracting the old data to
be changed, and then XOR-adding the new data. (XOR-subtract and XOR-add
is actually the same).

There are situations where method 1 is the fastest, and situations where
method 2 is the fastest.

My idea is then that the raid5 code in the kernel can calculate which
method is the faster. 

method 1 is faster, if all data is already available. I understand that
this method is employed in the current kernel. This would eg be the case
with sequential writes.

Method 2 is faster, if no data is available in core. It would require
2 reads and two writes, which always will be faster than n reads and 1
write, possibly except for n=2. method 2 is thus faster normally for
random writes.

I think that method 2 is not used in the kernel today. Mayby I am wrong,
but I did have a look in the kernel code.

So I hereby give the idea for inspiration to kernel hackers.

Yoyr kernel hacker wannabe
keld
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: raid5: two writing algorithms

2008-02-07 Thread Keld Jørn Simonsen
On Fri, Feb 08, 2008 at 07:25:31AM +1100, Neil Brown wrote:
 On Thursday February 7, [EMAIL PROTECTED] wrote:
  As I understand it, there are 2 valid algoritms for writing in raid5.
  
  1. calculate the parity data by XOR'ing all data of the relevant data
  chunks.
  
  2. calculate the parity data by kind of XOR-subtracting the old data to
  be changed, and then XOR-adding the new data. (XOR-subtract and XOR-add
  is actually the same).
  
  There are situations where method 1 is the fastest, and situations where
  method 2 is the fastest.
  
  My idea is then that the raid5 code in the kernel can calculate which
  method is the faster. 
  
  method 1 is faster, if all data is already available. I understand that
  this method is employed in the current kernel. This would eg be the case
  with sequential writes.
  
  Method 2 is faster, if no data is available in core. It would require
  2 reads and two writes, which always will be faster than n reads and 1
  write, possibly except for n=2. method 2 is thus faster normally for
  random writes.
  
  I think that method 2 is not used in the kernel today. Mayby I am wrong,
  but I did have a look in the kernel code.
 
 It is very odd that you would think something about the behaviour of
 the kernel with actually having looked.
 
 It also seems a little arrogant to have a clever idea and assume that
 no one else has thought of it before.

Oh well, I have to admit that I do not understand the code fully.
I am not a seasoned kernel hacker, as I also indicated in my ad hoc
signature.

  So I hereby give the idea for inspiration to kernel hackers.
 
 and I hereby invite you to read the code ;-)

I did some reading.  Is there somewhere a description of it, especially
the raid code, or are the comments and the code the best documentation?

Do you say that this is already implemented?

I am sorry if you think I am mailing too much on the list.
But I happen to think it is fun.
And I do try to give something back.

 Code reading is a good first step to being a
  
  Yoyr kernel hacker wannabe
^
 
 NeilBrown

Well, I do have a hack in mind, on the raid10,f2.
I need to investigate some more, and possibly test out
what really happens. But maybe the code already does what I want it to.
You are possibly the one that knows the code best, so maybe you can tell
me if raid10,f2 always does its reading in the first part of the disks?

best regards
keld
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: raid5: two writing algorithms

2008-02-07 Thread Neil Brown
On Friday February 8, [EMAIL PROTECTED] wrote:
 On Fri, Feb 08, 2008 at 07:25:31AM +1100, Neil Brown wrote:
  On Thursday February 7, [EMAIL PROTECTED] wrote:
 
   So I hereby give the idea for inspiration to kernel hackers.
  
  and I hereby invite you to read the code ;-)
 
 I did some reading.  Is there somewhere a description of it, especially
 the raid code, or are the comments and the code the best documentation?

No.  If a description was written (and various people have tried to
describe various parts) it would be out of date within a few months :-(

Look for READ_MODIFY_WRITE and RECONSTRUCT_WRITE  no.  That
only applied to raid6 code now..
Look instead for the 'rcw' and 'rmw' counters, and then at
'handle_write_operations5'  which does different things based on the
'rcw' variable.

It used to be a lot clearer before we implemented xor-offload.  The
xor-offload stuff is good, but it does make the code more complex.


 
 Do you say that this is already implemented?

Yes.

 
 I am sorry if you think I am mailing too much on the list.

You aren't.

 But I happen to think it is fun.

Good.

 And I do try to give something back.

We'll look forward to that.

 
  Code reading is a good first step to being a
   
   Yoyr kernel hacker wannabe
 ^
  
  NeilBrown
 
 Well, I do have a hack in mind, on the raid10,f2.
 I need to investigate some more, and possibly test out
 what really happens. But maybe the code already does what I want it to.
 You are possibly the one that knows the code best, so maybe you can tell
 me if raid10,f2 always does its reading in the first part of the disks?

Yes, I know the code best.

No, raid10,f2 doesn't always use the first part of the disk.  Getting
it to do that would be a fairly small change in 'read_balance' in
md/raid10.c.

I'm not at all convinced that the read balancing code in raid10 (or
raid1) really does the best thing.  So any improvements - backed up
with broad testing - would be most welcome.

NeilBrown
-
To unsubscribe from this list: send the line unsubscribe linux-raid in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html