Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread Michael Tokarev
dean gaudet wrote:
[]
 if this is for a database or fs requiring lots of small writes then 
 raid5/6 are generally a mistake... raid10 is the only way to get 
 performance.  (hw raid5/6 with nvram support can help a bit in this area, 
 but you just can't beat raid10 if you need lots of writes/s.)

A small nitpick.

At least some databases never do small-sized I/O, at least not against
the datafiles.  Oracle, for example, uses a fixed I/O block size,
specified at database (or tablespace) creation time -- by default it's
4Kb or 8Kb, but it may be 16Kb or 32Kb as well.  Now, if you make your
raid array's stripe size match the database block size, *and* ensure
the files are properly aligned on disk, writes will just work, without
needless reads to recalculate parity blocks.

But the problem is that this is nearly impossible to do.

First, even if the db writes in 32Kb blocks, the stripe size must be
32Kb, which only fits raid5 with 3 disks (16Kb chunk size) or with
5 disks (8Kb chunk size -- and this last variant is quite bad, because
an 8Kb chunk is too small).  In other words, only a very limited set of
configurations will be more-or-less good.

And second, most filesystems used for databases don't care about correct
file placement.  For example, ext[23]fs, with its maximum blocksize of 4Kb,
aligns files to 4Kb boundaries, not to the stripe size -- so a 32Kb database
block may straddle two stripes, say the first 4Kb at the end of one stripe
and the remaining 28Kb on the next.  For both parts a full read-modify-write
cycle is then needed to update the parity blocks -- exactly the thing we
tried to avoid by choosing the sizes in the previous step.  Only xfs so far
(of the filesystems I've checked) pays attention to the stripe size and
tries to ensure files are aligned to it.  (Yes, I know about mke2fs's
stride=xxx parameter, but it only affects metadata, not data.)

That's why all the above is a small nitpick -- in theory it IS possible
to use raid5 for a database workload in certain cases, but due to all the
gory details, it's nearly impossible to do right.
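
For concreteness, here is a minimal sketch of what "doing it right" would
look like, assuming a hypothetical 3-disk raid5 array /dev/md0 and a 32Kb
database block size (device names and numbers are illustrative only):

  # 2 data disks x 16Kb chunk = 32Kb stripe, matching the db block size
  mdadm --create /dev/md0 --level=5 --raid-devices=3 --chunk=16 \
      /dev/sda1 /dev/sdb1 /dev/sdc1
  # xfs can be told the geometry so allocations start on stripe boundaries
  # (su = chunk size, sw = number of data disks):
  mkfs.xfs -d su=16k,sw=2 /dev/md0

And even then the datafiles themselves must begin on stripe boundaries for
the full-stripe writes to line up.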

/mjt


Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread Bill Davidsen

Robin Bowes wrote:

Bill Davidsen wrote:
  

There have been several recent threads on the list regarding software
RAID-5 performance. The reference might be updated to reflect the poor
write performance of RAID-5 until/unless significant tuning is done.
Read that as tuning obscure parameters and throwing a lot of memory into
stripe cache. The reasons for hardware RAID should include "performance
of RAID-5 writes is usually much better than software RAID-5 with
default tuning."



Could you point me at a source of documentation describing how to
perform such tuning?
  
No. There has been a lot of discussion of this topic on this list, and a 
trip through the archives of the last 60 days or so will let you pull 
out a number of tuning tips which allow very good performance. My 
concern was writing large blocks of data, 1MB per write, to RAID-5, and 
didn't involve the overhead of small writes at all; those go through 
other code paths and behave differently.


I suppose while it's fresh in my mind I should write a script to rerun 
the whole write test suite and generate some graphs, lists of 
parameters, etc. If you are writing a LOT of data, you may find that 
tuning the dirty_* parameters will result in better system response, 
perhaps at the cost of some small total write throughput, although I 
didn't notice anything significant when I tried them.
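
For what it's worth, the knobs in question live under /proc/sys/vm; as an
illustration only (the values here are placeholders, not recommendations):

  # illustrative values -- tune and benchmark for your own workload
  echo 10 > /proc/sys/vm/dirty_ratio             # max % of memory dirty before writers block
  echo 5  > /proc/sys/vm/dirty_background_ratio  # % dirty before background writeback starts
  # or, persistently, via sysctl:
  #   vm.dirty_ratio = 10
  #   vm.dirty_background_ratio = 5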

Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
SATA card configured as a single RAID6 array (~3TB available space)
  

No hot spare(s)?

--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread Robin Bowes
Bill Davidsen wrote:
 Robin Bowes wrote:
 Bill Davidsen wrote:
  
 There have been several recent threads on the list regarding software
 RAID-5 performance. The reference might be updated to reflect the poor
 write performance of RAID-5 until/unless significant tuning is done.
 Read that as tuning obscure parameters and throwing a lot of memory into
 stripe cache. The reasons for hardware RAID should include "performance
 of RAID-5 writes is usually much better than software RAID-5 with
 default tuning."
 

 Could you point me at a source of documentation describing how to
 perform such tuning?
   
 No. There has been a lot of discussion of this topic on this list, and a
 trip through the archives of the last 60 days or so will let you pull
 out a number of tuning tips which allow very good performance. My
 concern was writing large blocks of data, 1MB per write, to RAID-5, and
 didn't involve the overhead of small writes at all; those go through
 other code paths and behave differently.

Actually Bill, I'm running RAID6 (my mistake for not mentioning it
explicitly before) - I found some material relating to RAID5 but nothing
on RAID6.

Are the concepts similar, or is RAID6 a different beast altogether?

 Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
 SATA card configured as a single RAID6 array (~3TB available space)
   
 No hot spare(s)?

I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
where a drive has failed in a RAID5+1 array and a second has failed
during the rebuild after the hot-spare had kicked in.

R.


Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread Bill Davidsen

Robin Bowes wrote:

Bill Davidsen wrote:
  

Robin Bowes wrote:


Bill Davidsen wrote:
 
  

There have been several recent threads on the list regarding software
RAID-5 performance. The reference might be updated to reflect the poor
write performance of RAID-5 until/unless significant tuning is done.
Read that as tuning obscure parameters and throwing a lot of memory into
stripe cache. The reasons for hardware RAID should include "performance
of RAID-5 writes is usually much better than software RAID-5 with
default tuning."



Could you point me at a source of documentation describing how to
perform such tuning?
  
  

No. There has been a lot of discussion of this topic on this list, and a
trip through the archives of the last 60 days or so will let you pull
out a number of tuning tips which allow very good performance. My
concern was writing large blocks of data, 1MB per write, to RAID-5, and
didn't involve the overhead of small writes at all; those go through
other code paths and behave differently.



Actually Bill, I'm running RAID6 (my mistake for not mentioning it
explicitly before) - I found some material relating to RAID5 but nothing
on RAID6.

Are the concepts similar, or is RAID6 a different beast altogether?
  
You mentioned that before, and I think the concepts covered in the 
RAID-5 discussion apply to RAID-6 as well. I don't have enough unused 
drives to really test anything beyond RAID-5, so I have no particular 
tuning information to share. Testing on system drives introduces too 
much jitter to trust the results.
  

Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
SATA card configured as a single RAID6 array (~3TB available space)
  
  

No hot spare(s)?



I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
where a drive has failed in a RAID5+1 array and a second has failed
during the rebuild after the hot-spare had kicked in.

--

bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread dean gaudet
On Mon, 15 Jan 2007, Robin Bowes wrote:

 I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
 where a drive has failed in a RAID5+1 array and a second has failed
 during the rebuild after the hot-spare had kicked in.

if the failures were read errors without losing the entire disk (the 
typical case) then new kernels are much better -- on a read error md will 
reconstruct the affected sectors from the other disks and attempt to write 
them back.

you can also run monthly checks...

echo check > /sys/block/mdX/md/sync_action

it'll read the entire array (parity included) and correct read errors as 
they're discovered.
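
a minimal cron sketch of that, assuming an array named md0 and a
first-of-the-month schedule (both just examples):

  # /etc/cron.d/md-check (hypothetical file): check md0 at 01:00 on the
  # 1st of every month
  0 1 1 * * root echo check > /sys/block/md0/md/sync_action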

-dean


Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread Gordon Henderson

On Mon, 15 Jan 2007, dean gaudet wrote:


you can also run monthly checks...

echo check > /sys/block/mdX/md/sync_action

it'll read the entire array (parity included) and correct read errors as
they're discovered.


A-Ha ... I've not been keeping up with the list for a bit - what's the 
minimum kernel version for this to work?


Cheers,

Gordon


Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread berk walker


dean gaudet wrote:

On Mon, 15 Jan 2007, Robin Bowes wrote:

  

I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
where a drive has failed in a RAID5+1 array and a second has failed
during the rebuild after the hot-spare had kicked in.



if the failures were read errors without losing the entire disk (the 
typical case) then new kernels are much better -- on read error md will 
reconstruct the sectors from the other disks and attempt to write it back.


you can also run monthly checks...

echo check > /sys/block/mdX/md/sync_action

it'll read the entire array (parity included) and correct read errors as 
they're discovered.


-dean

  


Could I get a pointer as to how I can do this check in my FC5 [BLAG] 
system?  I can find no appropriate "check", nor "md", available to me.  
It would be a good thing if I were able to find potentially weak 
spots, rewrite them to good, and know that it might be time for a new drive.


All of my arrays have drives of approx the same mfg date, so the 
possibility of more than one showing bad at the same time cannot be 
ignored.


thanks
b-



Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread dean gaudet
On Mon, 15 Jan 2007, berk walker wrote:

 dean gaudet wrote:
  echo check > /sys/block/mdX/md/sync_action
  
  it'll read the entire array (parity included) and correct read errors as
  they're discovered.

 
 Could I get a pointer as to how I can do this check in my FC5 [BLAG] system?
 I can find no appropriate "check", nor "md", available to me.  It would be a
 good thing if I were able to find potentially weak spots, rewrite them to
 good, and know that it might be time for a new drive.
 
 All of my arrays have drives of approx the same mfg date, so the possibility
 of more than one showing bad at the same time can not be ignored.

it should just be:

echo check > /sys/block/mdX/md/sync_action

if you don't have a /sys/block/mdX/md/sync_action file then your kernel is 
too old... or you don't have /sys mounted... (or you didn't replace X with 
the raid number :)

iirc there were kernel versions which had the sync_action file but didn't 
yet support the check action (i think possibly even as recent as 2.6.17 
had a small bug initiating one of the sync_actions but i forget which 
one).  if you can upgrade to 2.6.18.x it should work.
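
a quick way to sanity-check a given box (md0 is just an example name):

  uname -r                             # want 2.6.18.x or later
  cat /sys/block/md0/md/sync_action    # should say "idle" when nothing is running
  echo check > /sys/block/md0/md/sync_action
  cat /proc/mdstat                     # the check shows up as a resync-style progress bar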

debian unstable (and i presume etch) will do this for all your arrays 
automatically once a month.

-dean


Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread Mr. James W. Laferriere

Hello Dean ,

On Mon, 15 Jan 2007, dean gaudet wrote:
...snip...

it should just be:

echo check > /sys/block/mdX/md/sync_action

if you don't have a /sys/block/mdX/md/sync_action file then your kernel is
too old... or you don't have /sys mounted... (or you didn't replace X with
the raid number :)

iirc there were kernel versions which had the sync_action file but didn't
yet support the check action (i think possibly even as recent as 2.6.17
had a small bug initiating one of the sync_actions but i forget which
one).  if you can upgrade to 2.6.18.x it should work.

debian unstable (and i presume etch) will do this for all your arrays
automatically once a month.

-dean


        Being able to run a 'check' is a good thing (tm).  But without a method
to get status data back from the check, it seems rather bland.  Is there a
tool/file to poll/... where such status data can be acquired?

Tia ,  JimL

--
+-+
| James   W.   Laferriere | System   Techniques | Give me VMS |
| NetworkEngineer | 663  Beaumont  Blvd |  Give me Linux  |
| [EMAIL PROTECTED] | Pacifica, CA. 94044 |   only  on  AXP |
+-+


Re: raid5 software vs hardware: parity calculations?

2007-01-15 Thread dean gaudet
On Mon, 15 Jan 2007, Mr. James W. Laferriere wrote:

   Hello Dean ,
 
 On Mon, 15 Jan 2007, dean gaudet wrote:
 ...snip...
  it should just be:
  
  echo check > /sys/block/mdX/md/sync_action
  
  if you don't have a /sys/block/mdX/md/sync_action file then your kernel is
  too old... or you don't have /sys mounted... (or you didn't replace X with
  the raid number :)
  
  iirc there were kernel versions which had the sync_action file but didn't
  yet support the check action (i think possibly even as recent as 2.6.17
  had a small bug initiating one of the sync_actions but i forget which
  one).  if you can upgrade to 2.6.18.x it should work.
  
  debian unstable (and i presume etch) will do this for all your arrays
  automatically once a month.
  
  -dean
 
   Being able to run a 'check' is a good thing (tm) .  But without a
 method to acquire statii  data back from the check ,  Seems rather bland .
 Is there a tool/file to poll/... where data  statii can be acquired ?

i'm not 100% certain what you mean, but i generally just monitor dmesg for 
the md read error message (mind you the message pre-2.6.19 or .20 isn't 
very informative but it's obvious enough).

there is also a file mismatch_cnt in the same directory as sync_action ... 
the Documentation/md.txt (in 2.6.18) refers to it incorrectly as 
mismatch_count... but anyhow why don't i just repaste the relevant portion 
of md.txt.

-dean

...

Active md devices for levels that support data redundancy (1,4,5,6)
also have

   sync_action
 a text file that can be used to monitor and control the rebuild
 process.  It contains one word which can be one of:
   resync  - redundancy is being recalculated after unclean
   shutdown or creation
   recover   - a hot spare is being built to replace a
   failed/missing device
   idle  - nothing is happening
   check - A full check of redundancy was requested and is
   happening.  This reads all blocks and checks
   them. A repair may also happen for some raid
   levels.
   repair  - A full check and repair is happening.  This is
   similar to 'resync', but was requested by the
   user, and the write-intent bitmap is NOT used to
   optimise the process.

  This file is writable, and each of the strings that could be
  read are meaningful for writing.

   'idle' will stop an active resync/recovery etc.  There is no
   guarantee that another resync/recovery may not be automatically
   started again, though some event will be needed to trigger
   this.
   'resync' or 'recovery' can be used to restart the
   corresponding operation if it was stopped with 'idle'.
   'check' and 'repair' will start the appropriate process
   providing the current state is 'idle'.

   mismatch_count
  When performing 'check' and 'repair', and possibly when
  performing 'resync', md will count the number of errors that are
  found.  The count in 'mismatch_cnt' is the number of sectors
  that were re-written, or (for 'check') would have been
  re-written.  As most raid levels work in units of pages rather
   than sectors, this may be larger than the number of actual errors
  by a factor of the number of sectors in a page.
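
in practice that boils down to something like this (md0 standing in for
your array):

  cat /sys/block/md0/md/sync_action    # back to "idle" once the check has finished
  cat /sys/block/md0/md/mismatch_cnt   # 0 means parity matched everywhere
  # a non-zero count can be rewritten consistently with:
  #   echo repair > /sys/block/md0/md/sync_action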



Re: raid5 software vs hardware: parity calculations?

2007-01-13 Thread Dan Williams

On 1/12/07, James Ralston [EMAIL PROTECTED] wrote:

On 2007-01-12 at 09:39-08 dean gaudet [EMAIL PROTECTED] wrote:

 On Thu, 11 Jan 2007, James Ralston wrote:

  I'm having a discussion with a coworker concerning the cost of
  md's raid5 implementation versus hardware raid5 implementations.
 
  Specifically, he states:
 
   The performance [of raid5 in hardware] is so much better with
   the write-back caching on the card and the offload of the
   parity, it seems to me that the minor increase in work of having
   to upgrade the firmware if there's a buggy one is a highly
   acceptable trade-off to the increased performance.  The md
   driver still commits you to longer run queues since IO calls to
   disk, parity calculator and the subsequent kflushd operations
   are non-interruptible in the CPU.  A RAID card with write-back
   cache releases the IO operation virtually instantaneously.
 
  It would seem that his comments have merit, as there appears to be
  work underway to move stripe operations outside of the spinlock:
 
  http://lwn.net/Articles/184102/
 
  What I'm curious about is this: for real-world situations, how
  much does this matter?  In other words, how hard do you have to
  push md raid5 before doing dedicated hardware raid5 becomes a real
  win?

 hardware with battery backed write cache is going to beat the
 software at small write traffic latency essentially all the time but
 it's got nothing to do with the parity computation.

I'm not convinced that's true.

No, it's true.  md implements a write-through cache to ensure that
data reaches the disk.


What my coworker is arguing is that md
raid5 code spinlocks while it is performing this sequence of
operations:

1.  executing the write

not performed under the lock

2.  reading the blocks necessary for recalculating the parity

not performed under the lock

3.  recalculating the parity
4.  updating the parity block

My [admittedly cursory] read of the code, coupled with the link above,
leads me to believe that my coworker is correct, which is why I was
trolling for [informed] opinions about how much of a performance
hit the spinlock causes.


The spinlock is not a source of performance loss; the reason for
moving parity calculations outside the lock is to maximize the benefit
of using asynchronous xor+copy engines.

The hardware vs software raid trade-offs are well documented here:
http://linux.yyz.us/why-software-raid.html

Regards,
Dan


Re: raid5 software vs hardware: parity calculations?

2007-01-13 Thread Bill Davidsen

Dan Williams wrote:

On 1/12/07, James Ralston [EMAIL PROTECTED] wrote:

On 2007-01-12 at 09:39-08 dean gaudet [EMAIL PROTECTED] wrote:

 On Thu, 11 Jan 2007, James Ralston wrote:

  I'm having a discussion with a coworker concerning the cost of
  md's raid5 implementation versus hardware raid5 implementations.
 
  Specifically, he states:
 
   The performance [of raid5 in hardware] is so much better with
   the write-back caching on the card and the offload of the
   parity, it seems to me that the minor increase in work of having
   to upgrade the firmware if there's a buggy one is a highly
   acceptable trade-off to the increased performance.  The md
   driver still commits you to longer run queues since IO calls to
   disk, parity calculator and the subsequent kflushd operations
   are non-interruptible in the CPU.  A RAID card with write-back
   cache releases the IO operation virtually instantaneously.
 
  It would seem that his comments have merit, as there appears to be
  work underway to move stripe operations outside of the spinlock:
 
  http://lwn.net/Articles/184102/
 
  What I'm curious about is this: for real-world situations, how
  much does this matter?  In other words, how hard do you have to
  push md raid5 before doing dedicated hardware raid5 becomes a real
  win?

 hardware with battery backed write cache is going to beat the
 software at small write traffic latency essentially all the time but
 it's got nothing to do with the parity computation.

I'm not convinced that's true.

No, it's true.  md implements a write-through cache to ensure that
data reaches the disk.


What my coworker is arguing is that md
raid5 code spinlocks while it is performing this sequence of
operations:

1.  executing the write

not performed under the lock

2.  reading the blocks necessary for recalculating the parity

not performed under the lock

3.  recalculating the parity
4.  updating the parity block

My [admittedly cursory] read of the code, coupled with the link above,
leads me to believe that my coworker is correct, which is why I was
trolling for [informed] opinions about how much of a performance
hit the spinlock causes.


The spinlock is not a source of performance loss; the reason for
moving parity calculations outside the lock is to maximize the benefit
of using asynchronous xor+copy engines.

The hardware vs software raid trade-offs are well documented here:
http://linux.yyz.us/why-software-raid.html 


There have been several recent threads on the list regarding software 
RAID-5 performance. The reference might be updated to reflect the poor 
write performance of RAID-5 until/unless significant tuning is done. 
Read that as tuning obscure parameters and throwing a lot of memory into 
stripe cache. The reasons for hardware RAID should include "performance 
of RAID-5 writes is usually much better than software RAID-5 with 
default tuning."


--
bill davidsen [EMAIL PROTECTED]
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979



Re: raid5 software vs hardware: parity calculations?

2007-01-13 Thread Robin Bowes
Bill Davidsen wrote:

 There have been several recent threads on the list regarding software
 RAID-5 performance. The reference might be updated to reflect the poor
 write performance of RAID-5 until/unless significant tuning is done.
 Read that as tuning obscure parameters and throwing a lot of memory into
 stripe cache. The reasons for hardware RAID should include "performance
 of RAID-5 writes is usually much better than software RAID-5 with
 default tuning."

Could you point me at a source of documentation describing how to
perform such tuning?

Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
SATA card configured as a single RAID6 array (~3TB available space)

Thanks,

R.



Re: raid5 software vs hardware: parity calculations?

2007-01-13 Thread dean gaudet
On Sat, 13 Jan 2007, Robin Bowes wrote:

 Bill Davidsen wrote:
 
  There have been several recent threads on the list regarding software
  RAID-5 performance. The reference might be updated to reflect the poor
  write performance of RAID-5 until/unless significant tuning is done.
  Read that as tuning obscure parameters and throwing a lot of memory into
  stripe cache. The reasons for hardware RAID should include "performance
  of RAID-5 writes is usually much better than software RAID-5 with
  default tuning."
 
 Could you point me at a source of documentation describing how to
 perform such tuning?
 
 Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
 SATA card configured as a single RAID6 array (~3TB available space)

linux sw raid6 small write performance is bad because it reads the entire 
stripe, merges the small write, and writes back the changed disks -- 
unlike raid5, where a small write can get away with a partial stripe read 
(i.e. the smallest raid5 write will read the target disk, read the parity, 
write the target, and write the updated parity)... afaik this optimization 
hasn't been implemented in raid6 yet.

depending on your use model you might want to go with raid5+spare.  
benchmark if you're not sure.

for raid5/6 i always recommend experimenting with moving your fs journal 
to a raid1 device instead (on separate spindles -- such as your root 
disks).
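
for example (a sketch only, with made-up device names: /dev/md1 a small
raid1 on the root disks, /dev/md0 the raid5/6 data array; xfs has an
equivalent logdev= option):

  mke2fs -b 4096 -O journal_dev /dev/md1          # format md1 as an external journal
  mke2fs -b 4096 -j -J device=/dev/md1 /dev/md0   # ext3 on md0 using that journal
  # (the journal device and the filesystem must share the same block size)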

if this is for a database or fs requiring lots of small writes then 
raid5/6 are generally a mistake... raid10 is the only way to get 
performance.  (hw raid5/6 with nvram support can help a bit in this area, 
but you just can't beat raid10 if you need lots of writes/s.)

beyond those config choices you'll want to become friendly with /sys/block 
and all the myriad of subdirectories and options under there.

in particular:

/sys/block/*/queue/scheduler
/sys/block/*/queue/read_ahead_kb
/sys/block/*/queue/nr_requests
/sys/block/mdX/md/stripe_cache_size

for * = any of the component disks or the mdX itself...

some systems have an /etc/sysfs.conf you can place these settings in to 
have them take effect on reboot.  (sysfsutils package on debuntu)
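
for example (values are illustrative only -- benchmark before keeping any
of them; md0 and sdb are stand-ins for your own devices):

  echo 4096     > /sys/block/md0/md/stripe_cache_size   # default is 256; larger values cost RAM
  echo deadline > /sys/block/sdb/queue/scheduler        # repeat for each component disk
  echo 512      > /sys/block/md0/queue/read_ahead_kb
  # the same settings in /etc/sysfs.conf form:
  #   block/md0/md/stripe_cache_size = 4096
  #   block/sdb/queue/scheduler = deadline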

-dean


Re: raid5 software vs hardware: parity calculations?

2007-01-12 Thread dean gaudet
On Thu, 11 Jan 2007, James Ralston wrote:

 I'm having a discussion with a coworker concerning the cost of md's
 raid5 implementation versus hardware raid5 implementations.
 
 Specifically, he states:
 
  The performance [of raid5 in hardware] is so much better with the
  write-back caching on the card and the offload of the parity, it
  seems to me that the minor increase in work of having to upgrade the
  firmware if there's a buggy one is a highly acceptable trade-off to
  the increased performance.  The md driver still commits you to
  longer run queues since IO calls to disk, parity calculator and the
  subsequent kflushd operations are non-interruptible in the CPU.  A
  RAID card with write-back cache releases the IO operation virtually
  instantaneously.
 
 It would seem that his comments have merit, as there appears to be
 work underway to move stripe operations outside of the spinlock:
 
 http://lwn.net/Articles/184102/
 
 What I'm curious about is this: for real-world situations, how much
 does this matter?  In other words, how hard do you have to push md
 raid5 before doing dedicated hardware raid5 becomes a real win?

hardware with battery backed write cache is going to beat the software at 
small write traffic latency essentially all the time but it's got nothing 
to do with the parity computation.

-dean


Re: raid5 software vs hardware: parity calculations?

2007-01-12 Thread James Ralston
On 2007-01-12 at 09:39-08 dean gaudet [EMAIL PROTECTED] wrote:

 On Thu, 11 Jan 2007, James Ralston wrote:
 
  I'm having a discussion with a coworker concerning the cost of
  md's raid5 implementation versus hardware raid5 implementations.
  
  Specifically, he states:
  
   The performance [of raid5 in hardware] is so much better with
   the write-back caching on the card and the offload of the
   parity, it seems to me that the minor increase in work of having
   to upgrade the firmware if there's a buggy one is a highly
   acceptable trade-off to the increased performance.  The md
   driver still commits you to longer run queues since IO calls to
   disk, parity calculator and the subsequent kflushd operations
   are non-interruptible in the CPU.  A RAID card with write-back
   cache releases the IO operation virtually instantaneously.
  
  It would seem that his comments have merit, as there appears to be
  work underway to move stripe operations outside of the spinlock:
  
  http://lwn.net/Articles/184102/
  
  What I'm curious about is this: for real-world situations, how
  much does this matter?  In other words, how hard do you have to
  push md raid5 before doing dedicated hardware raid5 becomes a real
  win?
 
 hardware with battery backed write cache is going to beat the
 software at small write traffic latency essentially all the time but
 it's got nothing to do with the parity computation.

I'm not convinced that's true.  What my coworker is arguing is that md
raid5 code spinlocks while it is performing this sequence of
operations:

1.  executing the write
2.  reading the blocks necessary for recalculating the parity
3.  recalculating the parity
4.  updating the parity block

My [admittedly cursory] read of the code, coupled with the link above,
leads me to believe that my coworker is correct, which is why I was
trolling for [informed] opinions about how much of a performance
hit the spinlock causes.
