Re: [vdr] mdadm software raid5 arrays?

2009-12-03 Thread Stephan Loescher
Simon Baxter linu...@nzbaxters.com writes:

 Anyway, I've bought 3x 1.5 TB SATA disks which I'd like to put into a
 software (mdadm) raid 5 array.

[...]

 But does anyone have any production VDR experience with mdadm - good or bad?

If you like good performance and simple recovery, then do not use
RAID5. Use RAID1 instead.

I use RAID5 only because I am too lazy to buy some new and larger disks
for my VDR at the moment :-)

I have had serious performance problems with parallel recordings and
some Linux background jobs (like system backups).
I solved it by raising the I/O priority of vdr with ionice:

ionice -c2 -n0 vdr -w 120 -v $VIDEODIR -d -t /dev/tty5 -g /tmp ...
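
If vdr is already running, the same priority can also be applied to the
running process; a rough sketch, assuming the process is simply named vdr:

# set best-effort class 2, highest priority 0, on the running vdr process
ionice -c2 -n0 -p $(pidof vdr)
# check which class/priority it currently has
ionice -p $(pidof vdr)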

Stephan.

-- 
loesc...@gmx.de
http://www.loescher-online.de/
Try LEO: http://www.leo.org/



Re: [vdr] mdadm software raid5 arrays?

2009-12-03 Thread Simon Baxter
But does anyone have any production VDR experience with mdadm - good or 
bad?


I've now tested and implemented RAID5 on my system.  The biggest CPU hit is 
still with the OSD or noad processes - below is a bunch of tests I ran and 
the top processes during the test:


1 recording to raid, watching another live
top - 07:52:18 up 1 day, 20:29,  3 users,  load average: 0.66, 0.54, 0.42
Tasks: 168 total,   2 running, 156 sleeping,  10 stopped,   0 zombie
Cpu(s): 20.2%us,  6.7%sy,  0.5%ni, 68.3%id,  1.5%wa,  0.5%hi,  2.4%si,  0.0%st

Mem:   2059352k total,  2038636k used,20716k free,19724k buffers
Swap:  1903692k total,  292k used,  1903400k free,  1412672k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
2372 vdruser   20   0  542m  37m  18m S 37.1  1.9   1:09.50 xine
2330 vdruser   20   0  486m  26m 4616 S 10.6  1.3   0:21.68 vdr
3020 root  20   0  369m  24m  18m R  9.6  1.2 157:51.36 Xorg
3521 vdruser   20   0  923m 306m  25m S  1.0 15.3  50:59.10 mms
3723 root  15  -5 000 S  0.7  0.0  10:12.08 md0_raid5
2455 root  20   0 14880 1188  872 R  0.3  0.1   0:00.32 top
2593 root  15  -5 000 S  0.3  0.0   1:04.10 kdvb-ca-1:0
3168 vdruser   20   0  139m 5468 3984 S  0.3  0.3   6:03.34 fluxbox


2 recordings to raid, watching another live
top - 07:55:09 up 1 day, 20:32,  3 users,  load average: 1.67, 1.00, 0.61
Tasks: 168 total,   1 running, 157 sleeping,  10 stopped,   0 zombie
Cpu(s): 21.8%us,  7.8%sy,  0.3%ni, 66.6%id,  0.7%wa,  0.7%hi,  2.1%si,  0.0%st

Mem:   2059352k total,  2038888k used,20464k free,20732k buffers
Swap:  1903692k total,  292k used,  1903400k free,  1406060k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
2372 vdruser   20   0  542m  37m  18m S 39.4  1.9   2:17.04 xine
2330 vdruser   20   0  511m  32m 4616 S 13.9  1.6   0:45.66 vdr
3020 root  20   0  369m  24m  18m S  9.9  1.2 158:08.39 Xorg
3723 root  15  -5 000 S  1.3  0.0  10:14.20 md0_raid5
3521 vdruser   20   0  923m 306m  25m S  1.0 15.3  51:00.61 mms
2455 root  20   0 14880 1188  872 R  0.7  0.1   0:01.19 top
 271 root  20   0 000 S  0.3  0.0   0:09.16 pdflush


3 recordings to raid, watching another live
top - 07:55:52 up 1 day, 20:32,  3 users,  load average: 1.69, 1.08, 0.65
Tasks: 168 total,   1 running, 157 sleeping,  10 stopped,   0 zombie
Cpu(s): 22.1%us,  7.2%sy,  0.5%ni, 68.1%id,  0.0%wa,  0.0%hi,  2.1%si,  0.0%st

Mem:   2059352k total,  2040896k used,18456k free,20964k buffers
Swap:  1903692k total,  292k used,  1903400k free,  1401876k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
2372 vdruser   20   0  606m  37m  18m S 34.2  1.9   2:32.21 xine
2330 vdruser   20   0  536m  37m 4616 S 14.3  1.9   0:51.80 vdr
3020 root  20   0  369m  24m  18m S  9.6  1.2 158:12.50 Xorg
3723 root  15  -5 000 S  1.7  0.0  10:14.83 md0_raid5
2455 root  20   0 14880 1188  872 R  0.7  0.1   0:01.41 top
3168 vdruser   20   0  139m 5468 3984 S  0.7  0.3   6:04.13 fluxbox
3521 vdruser   20   0  923m 306m  25m S  0.7 15.3  51:00.98 mms


4 recordings to raid, watching another live
top - 07:56:37 up 1 day, 20:33,  3 users,  load average: 1.89, 1.19, 0.71
Tasks: 168 total,   2 running, 156 sleeping,  10 stopped,   0 zombie
Cpu(s): 23.6%us,  8.0%sy,  0.3%ni, 66.3%id,  0.0%wa,  0.3%hi,  1.5%si,  0.0%st

Mem:   2059352k total,  2042836k used,16516k free,21264k buffers
Swap:  1903692k total,  292k used,  1903400k free,  1393688k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
2372 vdruser   20   0  606m  38m  18m S 36.8  1.9   2:48.28 xine
2330 vdruser   20   0  583m  44m 4616 S 15.6  2.2   0:58.77 vdr
3020 root  20   0  369m  24m  18m S  9.0  1.2 158:16.78 Xorg
3723 root  15  -5 000 S  2.3  0.0  10:15.70 md0_raid5
3521 vdruser   20   0  923m 306m  25m S  1.0 15.3  51:01.39 mms
3168 vdruser   20   0  139m 5468 3984 S  0.7  0.3   6:04.30 fluxbox
2455 root  20   0 14880 1188  872 R  0.3  0.1   0:01.64 top
2593 root  15  -5 000 S  0.3  0.0   1:04.21 kdvb-ca-1:0
   1 root  20   0  4080  852  604 S  0.0  0.0   0:00.50 init


4 recordings to raid, watching another live, OSD up
top - 07:57:19 up 1 day, 20:34,  3 users,  load average: 1.87, 1.28, 0.75
Tasks: 168 total,   1 running, 157 sleeping,  10 stopped,   0 zombie
Cpu(s): 32.6%us,  7.6%sy,  0.5%ni, 55.4%id,  1.1%wa,  0.2%hi,  2.6%si,  0.0%st

Mem:   2059352k total,  2033996k used,25356k free,21516k buffers
Swap:  1903692k total,  292k used,  1903400k free,  1383424k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
2372 vdruser   20   0  603m  36m  15m S 57.9  1.8   3:05.55 xine
2330 vdruser   20   0  583m  45m 4616 S 12.9  2.2   1:05.39 vdr
3020 root  20   0  366m  20m  14m S  8.9  1.0 158:20.76 Xorg
3723 root  15  -5 000 S  2.3  0.0  10:16.61 md0_raid5

Re: [vdr] mdadm software raid5 arrays?

2009-11-21 Thread Udo Richter

On 18.11.2009 18:28, H. Langos wrote:

I/O-load can have some nasty effects. E.g. if your heads have to jump
back and forth between an area from where you are reading and an area
to which you are recording.


I remember reading some tests of file system write strategies that 
showed major differences between file systems when writing several file 
streams in parallel. IIRC the old EXT2/3 was way down at the lower end, while 
XFS scored near the upper end.


One major point here is to avoid heavy seeking, by making heavy use of write 
caching and read-ahead caching. Another one is a smart allocation 
strategy so that the files don't get interleaved too much, and so that 
metadata doesn't have to be read/written too often (-> extents).



In a raid1 setup you have two sets of heads that you can work with.
(Or more if you are willing to put in more disks.)


In theory yes, but I would really like to know whether raid systems are 
actually smart enough to split their read operations between the heads 
in an efficient way. For example, while reading two data streams, it's 
probably best to use one head for each stream. Unless one of the 
streams needs a higher bandwidth, in which case it would be wiser to 
use one head exclusively, and let the other jump between the streams. 
And what if there are several small reads in parallel? Which head should 
be interrupted?


In the end you can probably put a lot of fine-tuning into such strategies, 
and there will still be situations where a different strategy would have 
improved performance in some scenarios - and hurt it in others.



Cheers,

Udo



Re: [vdr] mdadm software raid5 arrays?

2009-11-19 Thread Pasi Kärkkäinen
On Tue, Nov 17, 2009 at 03:34:59PM +, Steve wrote:
 Alex Betis wrote:
 I don't record much, so I don't worry about speed.
 
 While there's no denying that RAID5 *at best* has a write speed
 equivalent to about 1.3x a single disk and if you're not careful with
 stride/block settings can be a lot slower, that's no worse for our
 purposes than, erm, having a single disk in the first place. And reading
 is *always* faster...
 
 Example. I'm not bothered about write speed (only having 3 tuners) so I
 didn't get too carried away setting up my 3-active disk 3TB RAID5 array,
 accepting all the default values.
 
 Rough speed test:
 #dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024
 1073741824 bytes (1.1 GB) copied, 13.6778 s, 78.5 MB/s
 

You should use oflag=direct to make it actually write the file to disk..

 #dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024
 1073741824 bytes (1.1 GB) copied, 1.65427 s, 649 MB/s
 

And now most probably the file will come from linux kernel cache. 
Use iflag=direct to read it actually from the disk.

-- Pasi

 Don't know about anyone else's setup, but if I were to record all
 streams from all tuners, there would still be I/O bandwidth left.
 Highest DVB-T channel bandwidth possible appears to be 31.668Mb/s, so
 for my 3 tuners that equates to about 95Mb/s - that's less than 12 MB/s. The
 78MB/s of my RAID5 doesn't seem to be much of an issue then.
 
 Steve
 
 
 


Re: [vdr] mdadm software raid5 arrays?

2009-11-19 Thread Steve

H. Langos wrote:
Depending on the amount of RAM, the cache can screw up your results 
quite badly. For something a little more realistic try: 
  

Good point!


 sync; dd if=/dev/zero of=foo bs=1M count=1024 conv=fsync
  

Interestingly, not much difference:

# sync; dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 conv=fsync
1073741824 bytes (1.1 GB) copied, 14.6112 s, 73.5 MB/s

Steve



Re: [vdr] mdadm software raid5 arrays?

2009-11-19 Thread Steve

Pasi Kärkkäinen wrote:

You should use oflag=direct to make it actually write the file to disk..
  
And now most probably the file will come from linux kernel cache. 
Use iflag=direct to read it actually from the disk.
  


However, in the real world data _is_ going to be cached via the kernel 
cache, at least (we hope) a stride's worth at minimum. We're talking about 
recording video, aren't we, and that's surely almost always written 
sequentially, not with random seeks everywhere?


For completeness, the results are:

#dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 oflag=direct
1073741824 bytes (1.1 GB) copied, 25.2477 s, 42.5 MB/s

# dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024 iflag=direct
1073741824 bytes (1.1 GB) copied, 4.92771 s, 218 MB/s

So, still no issue with recording entire transponders; using 1/4 of the 
available raw bandwidth with no buffering.


Interesting stuff, this :)

Steve



Re: [vdr] mdadm software raid5 arrays?

2009-11-19 Thread H. Langos
On Thu, Nov 19, 2009 at 01:37:46PM +, Steve wrote:
 Pasi Kärkkäinen wrote:
 You should use oflag=direct to make it actually write the file to disk..
   And now most probably the file will come from linux kernel cache.  
 Use iflag=direct to read it actually from the disk.
   

 However, in the real world data _is_ going to be cached via the kernel  
 cache, at least (we hope) a stride's worth minimum. We're talking about  
 recording video aren't we, and that's surely almost always sequentially  
 written, not random seeks everywhere?

True. Video is going to be written and read sequentially. However, the
effect of a cache is always a short-term gain. E.g. write caches 
mask a slow disk by signaling "ready" to the application while in reality the
kernel is still holding the data in RAM. If you continue to write at a speed
faster than the disk can handle, the cache will fill up and at some point
your application's write requests will be slowed down to what the
disk can handle.

If however your application writes to the same block again before the 
cache has been written to disk, then your cache truly has gained you 
performance even in the long run, by avoiding writes of data that has 
already been replaced.


Same thing with read caches. They only help if you are reading the same data
again.

The effect that you _will_ see is that of reading ahead. That helps if 
your application reads one block, and then another and the kernel has 
already looked ahead and fetched more blocks than originally requested
from the disk.

This also has the effect of avoiding too many seeks if you are reading from 
more than one place on the disk at once. But again, the effect on 
read throughput fades away as you read large amounts of data only once.

What it boils down to is this:

  Caches improve latency, not throughput.


What read-ahead and write caches will do in this scenario is help you
mask the effects of seeks on your disk, by reading ahead and by aggregating
write requests and sorting them in a way that reduces seek times. In this
regard writing multiple streams is easier than reading. When writing stuff,
you can let your kernel decide to keep some of the data 10 or 15 seconds 
in RAM before committing it to disk. However if you are _reading_ you will 
be pretty miffed if your video stalls for 15 seconds because the kernel
found something more interesting to read first :-)

 For completeness, the results are:

 #dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 oflag=direct
 1073741824 bytes (1.1 GB) copied, 25.2477 s, 42.5 MB/s

Interesting. The difference between this and the conv=fsync run is that
in the latter the kernel gets to sort all of the write requests more or less
as it wants to. So I guess for recording video, the 73MB/s will be your
bandwidth, while this test here shows the performance that a data-integrity 
focused application, e.g. a database, will get from your RAID.

 # dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024 iflag=direct
 1073741824 bytes (1.1 GB) copied, 4.92771 s, 218 MB/s

 So, still no issue with recording entire transponders; using 1/4 of the  
 available raw bandwidth with no buffering.

Well, using 1/4 bandwidth by one client or shared by multiple clients can 
make all the difference.

How about making some tests with cstream? I only did a quick apt-cache
search, but it seems like cstream could be used to simulate clients with
various bandwidth needs and for measuring the bandwidth that is left.
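
Something along these lines, perhaps (untested; cstream's -t option limits
throughput in bytes per second, and the paths and rates are only examples):

# simulate one ~4 MB/s recording running in the background
cstream -t 4000000 -i /dev/zero -o /srv/test/fake_recording.ts &
# meanwhile, measure how much sequential read bandwidth is left
dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024 iflag=direct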

 Interesting stuff, this :)

Very interesting indeed. Thanks for enriching this discussion with real
data!

cheers
-henrik




Re: [vdr] mdadm software raid5 arrays?

2009-11-18 Thread H. Langos
Hi Alex,

On Tue, Nov 17, 2009 at 03:34:59PM +, Steve wrote:
 Alex Betis wrote:
 I don't record much, so I don't worry about speed.

 While there's no denying that RAID5 *at best* has a write speed
 equivalent to about 1.3x a single disk and if you're not careful with
 stride/block settings can be a lot slower, that's no worse for our
 purposes than, erm, having a single disk in the first place. And reading
 is *always* faster...

Thanks for putting some numbers out there. My estimate was more theory
driven. :-)

 Example. I'm not bothered about write speed (only having 3 tuners) so I
 didn't get too carried away setting up my 3-active disk 3TB RAID5 array,
 accepting all the default values.

 Rough speed test:
 #dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024
 1073741824 bytes (1.1 GB) copied, 13.6778 s, 78.5 MB/s

 #dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024
 1073741824 bytes (1.1 GB) copied, 1.65427 s, 649 MB/s

Depending on the amount of RAM, the cache can screw up your results 
quite badly. For something a little more realistic try: 

 sync; dd if=/dev/zero of=foo bs=1M count=1024 conv=fsync

The first sync writes out fs cache so that you start with a 
clean cache and the conv=fsync makes sure that dd doesn't 
finish until it has written its data back to disk.

After the write you need to make sure that your read cache is not still
full of the data you just wrote. 650 MB/s would mean 223 MB/s per disk. 
That sounds a bit too high.

Try to read something different (and big) from that disk before running
the second test. 
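
Or drop the page cache explicitly before the read test; a minimal sketch
(needs root, and it will of course also throw away useful cached data):

sync
echo 3 > /proc/sys/vm/drop_caches
dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024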

 Don't know about anyone else's setup, but if I were to record all
 streams from all tuners, there would still be I/O bandwidth left.
 Highest DVB-T channel bandwidth possible appears to be 31.668Mb/s, so
 for my 3 tuners that equates to about 95Mb/s - that's less than 12 MB/s. The
 78MB/s of my RAID5 doesn't seem to be much of an issue then.

Well, I guess DVB-S2 has higher bandwidth. (numbers anybody?)
But more importantly: the rough speed tests that you used were run under 
zero I/O load. 
I/O-load can have some nasty effects. E.g. if your heads have to jump 
back and forth between an area from where you are reading and an area 
to which you are recording. In the case of one read stream and several 
write streams you could in theory adjust the filesystem's allocation 
strategy so that available areas near your read region are used for 
writing (though I doubt that anybody ever implemented this strategy in
a mainstream fs), but when you are reading several streams even 
caching, smart I/O schedulers, and NCQ cannot completely mask the
fact that in raid5 you basically have one set of read/write heads.

In a raid1 setup you have two sets of heads that you can work with.
(Or more if you are willing to put in more disks.)


Basically raid5 and raid1+0 scale differently if you add more disks.

If you put more disks into raid5 you gain 
 * more capacity (each additional disk counts fully) and 
 * more linear read performance.

If you put more disks into raid1+0 it depends on where you put the
additional disks to work.
If you grow the _number of mirrors_ you get 
 * more read performance (linear and random)
 * more redundancy
If you grow the _number of stripes_ you get
 * more read and write performance (linear and random)
 * more capacity (but only half of the additional capacity for 2-disk mirror sets)
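
For reference, creating the two layouts with mdadm looks roughly like this
(a sketch only; device and array names are placeholders):

# 3-disk raid5
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
# 4-disk raid1+0 as a single md raid10 array
mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1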

cheers
-henrik




Re: [vdr] mdadm software raid5 arrays?

2009-11-17 Thread Steve

Alex Betis wrote:

I don't record much, so I don't worry about speed.


While there's no denying that RAID5 *at best* has a write speed
equivalent to about 1.3x a single disk and if you're not careful with
stride/block settings can be a lot slower, that's no worse for our
purposes than, erm, having a single disk in the first place. And reading
is *always* faster...

Example. I'm not bothered about write speed (only having 3 tuners) so I
didn't get too carried away setting up my 3-active disk 3TB RAID5 array,
accepting all the default values.

Rough speed test:
#dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 13.6778 s, 78.5 MB/s

#dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 1.65427 s, 649 MB/s

Don't know about anyone else's setup, but if I were to record all
streams from all tuners, there would still be I/O bandwidth left.
Highest DVB-T channel bandwidth possible appears to be 31.668Mb/s, so
for my 3 tuners that equates to about 95Mb/s - that's less than 12 MB/s. The
78MB/s of my RAID5 doesn't seem to be much of an issue then.

Steve





Re: [vdr] mdadm software raid5 arrays?

2009-11-12 Thread Alex Betis
Simon,

Pay attention that /boot can be installed only on a single disk or RAID-1
where every disk can actually work as a stand alone disk.

I personally decided to use RAID-5 on 3 disks with RAID-1 on 3xsmall
partitions for /boot and RAID-5 on the rest.
RAID-5 also allows easier expansion in the future.
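
That kind of expansion is done by adding a disk and growing the array;
roughly like this (device names and the backup file are only placeholders,
the reshape can take many hours, and the filesystem has to be grown afterwards):

mdadm --add /dev/md1 /dev/sdd2
mdadm --grow /dev/md1 --raid-devices=4 --backup-file=/root/md1-grow.bak
# once the reshape has finished, enlarge the filesystem, e.g. for ext3/ext4:
resize2fs /dev/md1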


On Tue, Nov 10, 2009 at 8:48 PM, Simon Baxter linu...@nzbaxters.com wrote:

 Thanks - very useful!

 So what I'll probably do is as follows...
 * My system has 4x SATA ports on the motherboard, to which I'll connect my
 4x 1.5TB drives.
 * Currently 1 drive is in use with ~30G for / /boot and swap and ~1.4TB for
 /media
 * I'll create /dev/md2, using mdadm, in RAID1 across 2 ~1.4TB partitions on
 2 drives
 * move all active recordings (~400G) to /dev/md2
 * split /dev/md2 and create a raid 1+0 (/dev/md1) using 4x partitions of
 ~1.4TB across 4 drives

 At this point I have preserved all my data, and created a raid1+0 for
 recordings and media.

 I should now use the remaining ~100G on each drive for raid protection for
 (root) / and /boot.  I've read lots on the web on this, but what's your
 recommendation?  RAID1 mirror across 2 of the disks for / (/dev/md0) and
 install grub (/boot) on both so either will boot?



  On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:

 What about a simple raid 1 mirror set?


 Ok.. short comparison, using a single disk as baseline.

 using 2 disks
 raid0: (striping)
 ++   double read throughput,
 ++   double write throughput,
 --   half the reliability (read: only use with good backup!)

 raid1: (mirroring)
 ++   double read throughput.
 o    same write throughput
 ++   double the reliability


 using 3 disks:

 raid0: striping
 +++  triple read performance
 +++  triple write performance
 ---  third of reliability

 raid1: mirroring
 +++  triple read performance
 o    same write throughput
 +++  triple reliability

 raid5: (distributed parity)
 +++  triple read performance
 -lower write performance (not due to the second write but due
 to the necessary reads)
 +sustains failure of any one drive in the set

 using 4 disks:

 raid1+0:
  four times the read performance
 ++   double write performance
 ++   double reliability


 please note: these are approximations and depending on your hardware
 they may be off by quite a bit.

 cheers
 -henrik




Re: [vdr] mdadm software raid5 arrays?

2009-11-12 Thread Simon Baxter
Thanks Alex.  I think I've decided to go RAID 1+0 rather than RAID 5 as
I'm worried about the write speed.

I often record 3 or 4 channels at once and do see some slowdown in OSD
responsiveness during this.

What's your experience with RAID5?


- Original Message -
From: Alex Betis
To: VDR Mailing List
Sent: Friday, November 13, 2009 1:03 AM
Subject: Re: [vdr] mdadm software raid5 arrays?


Simon,

Pay attention that /boot can be installed only on a single disk or RAID-1
where every disk can actually work as a stand alone disk.

I personally decided to use RAID-5 on 3 disks with RAID-1 on 3xsmall
partitions for /boot and RAID-5 on the rest.
RAID-5 also allows easier expansion in the future.



On Tue, Nov 10, 2009 at 8:48 PM, Simon Baxter linu...@nzbaxters.com wrote:

Thanks - very useful!

So what I'll probably do is as follows...
* My system has 4x SATA ports on the motherboard, to which I'll connect my
4x 1.5TB drives.
* Currently 1 drive is in use with ~30G for / /boot and swap and ~1.4TB
for /media
* I'll create /dev/md2, using mdadm, in RAID1 across 2 ~1.4TB partitions
on 2 drives
* move all active recordings (~400G) to /dev/md2
* split /dev/md2 and create a raid 1+0 (/dev/md1) using 4x partitions of
~1.4TB across 4 drives

At this point I have preserved all my data, and created a raid1+0 for
recordings and media.

I should now use the remaining ~100G on each drive for raid protection for
(root) / and /boot.  I've read lots on the web on this, but what's your
recommendation?  RAID1 mirror across 2 of the disks for / (/dev/md0) and
install grub (/boot) on both so either will boot?




On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:

What about a simple raid 1 mirror set?



Ok.. short comparison, using a single disk as baseline.

using 2 disks
raid0: (striping)
++   double read throughput,
++   double write throughput,
--   half the reliability (read: only use with good backup!)

raid1: (mirroring)
++   double read throughput.
o    same write throughput
++   double the reliability


using 3 disks:

raid0: striping
+++  triple read performance
+++  triple write performance
---  third of reliability

raid1: mirroring
+++  triple read performance
o    same write throughput
+++  triple reliability

raid5: (distributed parity)
+++  triple read performance
-lower write performance (not due to the second write but due
to the necessary reads)
+sustains failure of any one drive in the set

using 4 disks:

raid1+0:
 four times the read performance
++   double write performance
++   double reliability


please note: these are approximations and depending on your hardware
they may be off by quite a bit.

cheers
-henrik






Re: [vdr] mdadm software raid5 arrays?

2009-11-12 Thread Alex Betis
In general good experience.
I don't record much, so I don't worry about speed.
There are many web pages about raid5 speed optimizations.
The slowdown in raid5 writes mostly happens when only part of a stripe (chunk of
data) has to be written,
so the driver has to read the stripe, modify it, and write it back. The optimizations
are about aligning the file system block size with the raid stripe size.

Since we're talking about movie recordings (huge files), big file
system blocks will not create much waste.
A smaller stripe size will probably reduce read performance a bit, but will
increase write speed, since there will be fewer cases where only part of a
stripe has to be updated.
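
As an illustration of that alignment, something like this for ext4 on a
3-disk raid5 with a 64 KiB chunk (so two data disks per stripe); the device
name and chunk size are just assumptions:

# stride = chunk / block = 64 KiB / 4 KiB = 16
# stripe-width = stride * data disks = 16 * 2 = 32
mkfs.ext4 -b 4096 -E stride=16,stripe-width=32 /dev/md1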

In one sentence, you won't know if it's slow until you try :)
RAID 10 will obviously give better write speed, but I'm not yet convinced
that raid 5 can't handle 4 recordings at the same time.

If we're talking about HD recording, it's about 3 Gigs/hour, meaning less
than a MByte per second.
I don't think there should be a problem writing 3-4 MByte/sec even without any
raid.

By the way, I had a very bad experience with LVM on top of raid in latest
distros, so if you want to save some hairs on your head, don't try it :)


On Fri, Nov 13, 2009 at 1:06 AM, Simon Baxter linu...@nzbaxters.com wrote:

 Thanks Alex.  I think I've decided to go RAID 1+0 rather than RAID 5 as
 I'm worried about the write speed.

 I often record 3 or 4 channels at once and do see some slowdown in OSD
 responsiveness during this.

 What's your experience with RAID5?


 - Original Message -
 From: Alex Betis
 To: VDR Mailing List
 Sent: Friday, November 13, 2009 1:03 AM
 Subject: Re: [vdr] mdadm software raid5 arrays?


 Simon,

 Pay attention that /boot can be installed only on a single disk or RAID-1
 where every disk can actually work as a stand alone disk.

 I personally decided to use RAID-5 on 3 disks with RAID-1 on 3xsmall
 partitions for /boot and RAID-5 on the rest.
 RAID-5 also allows easier expansion in the future.



 On Tue, Nov 10, 2009 at 8:48 PM, Simon Baxter linu...@nzbaxters.com
 wrote:

 Thanks - very useful!

 So what I'll probably do is as follows...
 * My system has 4x SATA ports on the motherboard, to which I'll connect my
 4x 1.5TB drives.
 * Currently 1 drive is in use with ~30G for / /boot and swap and ~1.4TB
 for /media
 * I'll create /dev/md2, using mdadm, in RAID1 across 2 ~1.4TB partitions
 on 2 drives
 * move all active recordings (~400G) to /dev/md2
 * split /dev/md2 and create a raid 1+0 (/dev/md1) using 4x partitions of
 ~1.4TB across 4 drives

 At this point I have preserved all my data, and created a raid1+0 for
 recordings and media.

 I should now use the remaining ~100G on each drive for raid protection for
 (root) / and /boot.  I've read lots on the web on this, but what's your
 recommendation?  RAID1 mirror across 2 of the disks for / (/dev/md0) and
 install grub (/boot) on both so either will boot?




 On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:

 What about a simple raid 1 mirror set?



 Ok.. short comparison, using a single disk as baseline.

 using 2 disks
 raid0: (striping)
 ++   double read throughput,
 ++   double write throughput,
 --   half the reliability (read: only use with good backup!)

 raid1: (mirroring)
 ++   double read throughput.
 o    same write throughput
 ++   double the reliability


 using 3 disks:

 raid0: striping
 +++  triple read performance
 +++  triple write performance
 ---  third of reliability

 raid1: mirroring
 +++  triple read performance
 o    same write throughput
 +++  triple reliability

 raid5: (distributed parity)
 +++  triple read performance
 -lower write performance (not due to the second write but due
to the necessary reads)
 +sustains failure of any one drive in the set

 using 4 disks:

 raid1+0:
  four times the read performance
 ++   double write performance
 ++   double reliability


 please note: these are approximations and depending on your hardware
 they may be off by quite a bit.

 cheers
 -henrik






Re: [vdr] mdadm software raid5 arrays?

2009-11-12 Thread Magnus Hörlin
I too have only good experiences with md raid5 and have used it for years on
my vdr server. It’s not uncommon that I record 7-8 programs simultaneously,
including HD channels, and I have never had any problems with that. On the
other hand, this is just a VDR/NFS server serving two diskless VDR/XBMC
frontends so maybe it would affect OSD performance if I had it on the same
machine.

/Magnus H

 

 

From: vdr-boun...@linuxtv.org [mailto:vdr-boun...@linuxtv.org] On Behalf Of Alex Betis
Sent: Friday, 13 November 2009 08:00
To: VDR Mailing List
Subject: Re: [vdr] mdadm software raid5 arrays?

 

In general good experience.
I don't record much, so I don't worry about speed.
There are many web pages about raid5 speed optimizations. 
The slowdown in raid5 writes mostly happens when only part of a stripe (chunk of
data) has to be written,
so the driver has to read the stripe, modify it, and write it back. The optimizations
are about aligning the file system block size with the raid stripe size.

Since we're talking about movie recordings (huge files), big file
system blocks will not create much waste.
A smaller stripe size will probably reduce read performance a bit, but will
increase write speed, since there will be fewer cases where only part of a
stripe has to be updated.

In one sentence, you won't know if it's slow until you try :)
RAID 10 will obviously give better write speed, but I'm not yet convinced
that raid 5 can't handle 4 recordings at the same time.

If we're talking about HD recording, it's about 3 Gigs/hour, meaning less
than a MByte per second.
I don't think there should be a problem writing 3-4 MByte/sec even without any
raid.

By the way, I had a very bad experience with LVM on top of raid in latest
distros, so if you want to save some hairs on your head, don't try it :)



On Fri, Nov 13, 2009 at 1:06 AM, Simon Baxter linu...@nzbaxters.com wrote:

Thanks Alex.  I think I've decided to go RAID 1+0 rather than RAID 5 as
I'm worried about the write speed.

I often record 3 or 4 channels at once and do see some slowdown in OSD
responsiveness during this.

What's your experience with RAID5?



- Original Message -
From: Alex Betis
To: VDR Mailing List
Sent: Friday, November 13, 2009 1:03 AM
Subject: Re: [vdr] mdadm software raid5 arrays?


Simon,

Pay attention that /boot can be installed only on a single disk or RAID-1
where every disk can actually work as a stand alone disk.

I personally decided to use RAID-5 on 3 disks with RAID-1 on 3xsmall
partitions for /boot and RAID-5 on the rest.
RAID-5 also allows easier expansion in the future.



On Tue, Nov 10, 2009 at 8:48 PM, Simon Baxter linu...@nzbaxters.com wrote:

Thanks - very useful!

So what I'll probably do is as follows...
* My system has 4x SATA ports on the motherboard, to which I'll connect my
4x 1.5TB drives.
* Currently 1 drive is in use with ~30G for / /boot and swap and ~1.4TB
for /media
* I'll create /dev/md2, using mdadm, in RAID1 across 2 ~1.4TB partitions
on 2 drives
* move all active recordings (~400G) to /dev/md2
* split /dev/md2 and create a raid 1+0 (/dev/md1) using 4x partitions of
~1.4TB across 4 drives

At this point I have preserved all my data, and created a raid1+0 for
recordings and media.

I should now use the remaining ~100G on each drive for raid protection for
(root) / and /boot.  I've read lots on the web on this, but what's your
recommendation?  RAID1 mirror across 2 of the disks for / (/dev/md0) and
install grub (/boot) on both so either will boot?




On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:

What about a simple raid 1 mirror set?



Ok.. short comparison, using a single disk as baseline.

using 2 disks
raid0: (striping)
++   double read throughput,
++   double write throughput,
--   half the reliability (read: only use with good backup!)

raid1: (mirroring)
++   double read throughput.
o    same write throughput
++   double the reliability


using 3 disks:

raid0: striping
+++  triple read performance
+++  triple write performance
---  third of reliability

raid1: mirroring
+++  triple read performance
o    same write throughput
+++  triple reliability

raid5: (distributed parity)
+++  triple read performance
-lower write performance (not due to the second write but due
   to the necessary reads)
+sustains failure of any one drive in the set

using 4 disks:

raid1+0:
 four times the read performance
++   double write performance
++   double reliability


please note: these are approximations and depending on your hardware
they may be off by quite a bit.

cheers
-henrik





Re: [vdr] mdadm software raid5 arrays?

2009-11-11 Thread jori.hamalainen

 Ok.. short comparison, using a single disk as baseline.

Good chart, but perhaps you should also mention the capacity. By that I 
mean what happens to the usable capacity.

Assume 1 disk = 1TB for simplicity.

using 2 disks
raid0: (striping)
 ++   double read throughput, 
 ++   double write throughput, 
 --   half the reliability (read: only use with good backup!)
Capacity: 2TB

raid1: (mirroring)
 ++   double read throughput.
 o    same write throughput
 ++   double the reliability
Capacity: 1TB

using 3 disks:
raid0: striping
 +++  triple read performance
 +++  triple write performance
 ---  third of reliability
Capacity: 3TB

raid1: mirroring
 +++  triple read performance
 o    same write throughput
 +++  triple reliability
Capacity: 1TB

raid5: (distributed parity)
 +++  triple read performance
 -lower write performance (not due to the second write but due 
  to the necessary reads)
 +sustains failure of any one drive in the set
Capacity: 2TB

using 4 disks:
raid1+0:
  four times the read performance 
 ++   double write performance
 ++   double reliability
Capacity: 2TB

Raid5:
 +++  four times the read performance
 -lower write performance (not due to the second write but due 
  to the necessary reads)
 +sustains failure of any one drive in the set
Capacity: 3TB




Re: [vdr] mdadm software raid5 arrays?

2009-11-10 Thread Simon Baxter

What about a simple raid 1 mirror set?

- Original Message - 
From: H. Langos henrik-...@prak.org

To: VDR Mailing List vdr@linuxtv.org
Sent: Tuesday, November 10, 2009 6:49 AM
Subject: Re: [vdr] mdadm software raid5 arrays?



Hi Simon,

On Sat, Nov 07, 2009 at 07:38:03AM +1300, Simon Baxter wrote:

Hi

I've been running logical volume management (LVMs) on my production VDR
box for years, but recently had a drive failure.  To be honest, in the
~20 years I've had PCs in the house, this is the first time a drive
failed!

Anyway, I've bought 3x 1.5 TB SATA disks which I'd like to put into a
software (mdadm) raid 5 array.


...


I regularly record 3 and sometimes 4 channels simultaneously, while
watching a recording.  Under regular LVM, this sometimes seemed to cause
some slow downs.


I know I risk a flame war here but I feel obliged to say it:
Avoid raid5 if you can avoid it! It is fun to play with but
if you care for your data buy a fourth drive and do raid1+0
(mirroring and striping) instead.

Raid 5 is very fast on linear read operations because basically
the load will be spread onto all the available drives.
But if you are going to run vdr on that drive array, you are going
to do a lot of write operations, and raid5 is bad if you do a lot
of writes for a very simple reason.

Take a raid5 array with X devices. If you want to write just one
block, you need to read 2 blocks (the old data that you are
going to overwrite and the old parity) and you need to write 2
blocks (one with the actual data and one with the new parity).

In the best case, the disk block that you are going to
overwrite is already in RAM, but the parity block almost never
will be. Only if you keep writing the same block over and over
will you have data and parity blocks cached.
In most cases (and certainly in the case of writing data streams
on disk) you'll need to read two blocks before you can calculate
the new parity and write it back to the disks along with your data.

So in short you do two reads and two writes for every write operation.
There goes your performance...

Now about drive failures... if one of X disks fails, you can still
read blocks on the OK drives with just one read operation but you
need X-1 read operations for every read operation on the failed drive.
Writes on OK drives have the same two reads/two writes as before,
(only if the failed drive contained the parity for this block you
can skip the additional two reads and one write).
If however you need to write to the failed drive, then you need
to read every other X-1 drive in the array to first reconstruct
the missing data and then you can calculate and write the new
parity. (and then you throw away the actual data that you were
going to write because the drive that you could write it to is
gone...)

Example: You have your three 1.5TB drives A B C in an array
and C fails. In this situation you'd want to treat your drives as
carefully as possible, because one more failure and all your data
is gone. Unfortunately, continued operation in the failed state will
put your remaining drives under much more stress than usual.

Reading will cause twice the read operations on your remaining
drives.

block: n   n+1 n+2
OK State : a   b   c
Failstate: a   b   ab

Writing (on a small array) will produce the same load of two reads
and two writes average per write.

block: n n+1n+2
OK:acAC  baBA   cbCB
FAIL:  A baBA   baB


Confusingly enough the read load per drive doesn't change if
you have more than three drives in your array. Reads will still
produce on average double the load in failed state.

Writes on a failed array seem to produce the same load as on
an OK array. But this is only true for very small arrays.
If you add more disks you'll see that the read penalty grows
for writing blocks where the data disk is missing and you need
to read all other drives in order to update the parity.


Reconstruction of your array after adding a new drive will take
a long time, and most complete array failures (i.e. data lost
forever) occur during the rebuilding phase, not during the
fail state. That's simply because you put a lot of stress on
your drives (which probably come from the same batch as the one
that already failed).

Depending on the number and nature of your drives and the
host connection they have, the limiting factor can be read
performance (you need to read X-1 drives completely) or
it can be the write performance, if your disk is slower on
sustained writing than on reading.

Remember that you need to read and write a whole disk's worth
of data, not just the used parts.

Example: Your drives have 1.5TB and we assume that you have
a whopping 100MB/s on read as well as on write (pretty much the
fastest there currently is).

You need to read 3TB as well as write 1.5TB. If your system can
handle the load in parallel you can treat it as just writing one
1.5TB drive: 1,500,000 MB / 100 MB/s = 15,000 s, i.e. 250 minutes or 4 hours
and 10 minutes. I am curious if you can still use the system under such an I/O load.

Re: [vdr] mdadm software raid5 arrays?

2009-11-10 Thread H. Langos
On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:
 What about a simple raid 1 mirror set?


Ok.. short comparison, using a single disk as baseline.

using 2 disks
raid0: (striping)
 ++   double read throughput, 
 ++   double write throughput, 
 --   half the reliability (read: only use with good backup!)

raid1: (mirroring)
 ++   double read throughput.
 o    same write throughput
 ++   double the reliability


using 3 disks:

raid0: striping
 +++  triple read performance
 +++  triple write performance
 ---  third of reliability

raid1: mirroring
 +++  triple read performance
 o    same write throughput
 +++  triple reliability

raid5: (distributed parity)
 +++  triple read performance
 -lower write performance (not due to the second write but due 
  to the necessary reads)
 +sustains failure of any one drive in the set

using 4 disks:

raid1+0:
  four times the read performance 
 ++   double write performance
 ++   double reliability


please note: these are approximations and depending on your hardware
they may be off by quite a bit.

cheers
-henrik




Re: [vdr] mdadm software raid5 arrays?

2009-11-10 Thread Simon Baxter

Thanks - very useful!

So what I'll probably do is as follows...
* My system has 4x SATA ports on the motherboard, to which I'll connect my 
4x 1.5TB drives.
* Currently 1 drive is in use with ~30G for / /boot and swap and ~1.4TB for 
/media
* I'll create /dev/md2, using mdadm, in RAID1 across 2 ~1.4TB partitions on 
2 drives

* move all active recordings (~400G) to /dev/md2
* split /dev/md2 and create a raid 1+0 (/dev/md1) using 4x partitions of 
~1.4TB across 4 drives


At this point I have preserved all my data, and created a raid1+0 for 
recordings and media.


I should now use the remaining ~100G on each drive for raid protection for 
(root) / and /boot.  I've read lots on the web on this, but what's your 
recommendation?  RAID1 mirror across 2 of the disks for / (/dev/md0) and 
install grub (/boot) on both so either will boot?




On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:

What about a simple raid 1 mirror set?



Ok.. short comparison, using a single disk as baseline.

using 2 disks
raid0: (striping)
++   double read throughput,
++   double write throughput,
--   half the reliability (read: only use with good backup!)

raid1: (mirroring)
++   double read throughput.
o    same write throughput
++   double the reliability


using 3 disks:

raid0: striping
+++  triple read performance
+++  triple write performance
---  third of reliability

raid1: mirroring
+++  triple read performance
o    same write throughput
+++  triple reliability

raid5: (distributed parity)
+++  triple read performance
-lower write performance (not due to the second write but due
 to the necessary reads)
+sustains failure of any one drive in the set

using 4 disks:

raid1+0:
 four times the read performance
++   double write performance
++   double reliability


please note: these are approximations and depending on your hardware
they may be off by quite a bit.

cheers
-henrik




Re: [vdr] mdadm software raid5 arrays?

2009-11-10 Thread Jochen Heuer
Hello Simon,

what you can also do is create the two RAID1 md devices with missing
disks, e.g.:

mdadm --create /dev/md2 --level=1 --raid-disks=2 missing /dev/sdb3
mdadm --create /dev/md3 --level=1 --raid-disks=2 missing /dev/sdd3
mdadm --create /dev/md1 --level=0 --raid-disks=2 /dev/md2 /dev/md3

Then you can create a filesystem on /dev/md1 and mount it, move all your
recordings to that filesystem, and later on you can add the other two partitions
to your RAID1 sets:

mdadm --add /dev/md2 /dev/sda3
mdadm --add /dev/md3 /dev/sdc3

This way you don't have to split anything. You can just set up the two RAID1
arrays with only one drive present and one drive marked missing.
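
The intermediate move itself would look something like this (filesystem
choice, mount point and source path are only examples):

mkfs.ext4 /dev/md1
mount /dev/md1 /mnt/newmedia
rsync -a /video/ /mnt/newmedia/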

All of my systems (except the VDR, because it currently has only one disk) have a
mirrored / and grub installed on both disks.
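
Installing grub on both members of the mirror can be done roughly like this
(the device names are placeholders for your two disks):

grub-install /dev/sda
grub-install /dev/sdb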

I don't know if you can read German or if a Google translation of the following
pages is usable but it might help you to get the correct keywords for a Google
search:

http://linuxwiki.de/mdadm
http://www.howtoforge.de/howto/software-raid1-auf-einem-laufenden-system-inkl-grub-konfiguration-debian-etch-einrichten/

Best regards,

   Jogi

On Wed, Nov 11, 2009 at 07:48:20AM +1300, Simon Baxter wrote:
 Thanks - very useful!

 So what I'll probably do is as follows...
 * My system has 4x SATA ports on the motherboard, to which I'll connect my 
 4x 1.5TB drives.
 * Currently 1 drive is in use with ~30G for / /boot and swap and ~1.4TB for 
 /media
 * I'll create /dev/md2, using mdadm, in RAID1 across 2 ~1.4TB partitions on 
 2 drives
 * move all active recordings (~400G) to /dev/md2
 * split /dev/md2 and create a raid 1+0 (/dev/md1) using 4x partitions of 
 ~1.4TB across 4 drives

 At this point I have preserved all my data, and created a raid1+0 for 
 recordings and media.

 I should now use the remaining ~100G on each drive for raid protection for 
 (root) / and /boot.  I've read lots on the web on this, but what's your 
 recommendation?  RAID1 mirror across 2 of the disks for / (/dev/md0) and 
 install grub (/boot) on both so either will boot?


 On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:
 What about a simple raid 1 mirror set?


 Ok.. short comparison, using a single disk as baseline.

 using 2 disks
 raid0: (striping)
 ++   double read throughput,
 ++   double write throughput,
 --   half the reliability (read: only use with good backup!)

 raid1: (mirroring)
 ++   double read throughput.
 o    same write throughput
 ++   double the reliability


 using 3 disks:

 raid0: striping
 +++  triple read performance
 +++  triple write performance
 ---  third of reliability

 raid1: mirroring
 +++  triple read performance
 o    same write throughput
 +++  triple reliability

 raid5: (distributed parity)
 +++  triple read performance
 -lower write performance (not due to the second write but due
  to the necessary reads)
 +sustains failure of any one drive in the set

 using 4 disks:

 raid1+0:
  four times the read performance
 ++   double write performance
 ++   double reliability


 please note: these are approximations and depending on your hardware
 they may be off by quite a bit.

 cheers
 -henrik




Re: [vdr] mdadm software raid5 arrays?

2009-11-09 Thread H. Langos
Hi Simon,

On Sat, Nov 07, 2009 at 07:38:03AM +1300, Simon Baxter wrote:
 Hi

 I've been running logical volume management (LVMs) on my production VDR 
 box for years, but recently had a drive failure.  To be honest, in the 
 ~20 years I've had PCs in the house, this is the first time a drive 
 failed!

 Anyway, I've bought 3x 1.5 TB SATA disks which I'd like to put into a  
 software (mdadm) raid 5 array.

...

 I regularly record 3 and sometimes 4 channels simultaneously, while 
 watching a recording.  Under regular LVM, this sometimes seemed to cause 
 some slow downs.

I know I risk a flame war here but I feel obliged to say it:
Avoid raid5 if you can avoid it! It is fun to play with but
if you care for your data buy a fourth drive and do raid1+0 
(mirroring and striping) instead.

Raid 5 is very fast on linear read operations because basically
the load will be spread onto all the available drives.
But if you are going to run vdr on that drive array, you are going
to do a lot of write operations, and raid5 is bad if you do a lot
of writes for a very simple reason.

Take a raid5 array with X devices. If you want to write just one
block, you need to read 2 blocks (the old data that you are
going to overwrite and the old parity) and you need to write 2
blocks (one with the actual data and one with the new parity).

In the best case, the disk block that you are going to
overwrite is already in RAM, but the parity block almost never
will be. Only if you keep writing the same block over and over
will you have data and parity blocks cached.
In most cases (and certainly in the case of writing data streams
on disk) you'll need to read two blocks before you can calculate
the new parity and write it back to the disks along with your data.

So in short you do two reads and two writes for every write operation.
There goes your performance...
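
A side note: one md tunable that can soften this read-modify-write penalty
for streaming writes is the raid5 stripe cache, which lets md gather full
stripes before writing them out. A rough sketch, with the array name and
value only as examples:

echo 4096 > /sys/block/md0/md/stripe_cache_size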

Now about drive failures... if one of X disks fails, you can still
read blocks on the OK drives with just one read operation but you
need X-1 read operations for every read operation on the failed drive.
Writes on OK drives have the same two reads/two writes as before,
(only if the failed drive contained the parity for this block you
can skip the additional two reads and one write).
If however you need to write to the failed drive, then you need 
to read every other X-1 drive in the array to first reconstruct 
the missing data and then you can calculate and write the new 
parity. (and then you throw away the actual data that you were 
going to write because the drive that you could write it to is 
gone...)

Example: You have your three 1.5TB drives A B C in an array
and C fails. In this situation you'd want to treat your drives as
carefully as possible, because one more failure and all your data
is gone. Unfortunately, continued operation in the failed state will
put your remaining drives under much more stress than usual.

Reading will cause twice the read operations on your remaining 
drives.

block: n   n+1 n+2
OK State : a   b   c  
Failstate: a   b   ab  

Writing (on a small array) will produce the same load of two reads
and two writes average per write.

block: n n+1n+2   
OK:acAC  baBA   cbCB 
FAIL:  A baBA   baB


Confusingly enough the read load per drive doesn't change if 
you have more than three drives in your array. Reads will still 
produce on average double the load in failed state.

Writes on a failed array seem to produce the same load as on 
an OK array. But this is only true for very small arrays. 
If you add more disks you'll see that the read penalty grows
for writing blocks where the data disk is missing and you need 
to read all other drives in order to update the parity.


Reconstruction of your array after adding a new drive will take
a long time, and most complete array failures (i.e. data lost
forever) occur during the rebuilding phase, not during the 
fail state. That's simply because you put a lot of stress on 
your drives (which probably come from the same batch as the one 
that already failed).

Depending on the number and nature of your drives and the
host connection they have, the limiting factor can be read 
performance (you need to read X-1 drives completely) or 
it can be the write performance, if your disk is slower on 
sustained writing than on reading.

Remember that you need to read and write a whole disk's worth
of data, not just the used parts.

Example: Your drives have 1.5TB and we assume that you have
a whopping 100MB/s on read as well as on write (pretty much the 
fastest there currently is).

You need to read 3TB as well as write 1.5TB. If your system can
handle the load in parallel you can treat it as just writing one
1.5TB drive: 1,500,000 MB / 100 MB/s = 15,000 s, i.e. 250 minutes or 4 hours 
and 10 minutes. I am curious if you can still use the system under
such an I/O load. Anybody with experience on this? Anyway, the 
reconstruction rate can be tuned via the proc fs.
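
Roughly like this, for example (values are in KB/s and only illustrative):

# keep the rebuild slow enough that vdr stays responsive
echo 5000 > /proc/sys/dev/raid/speed_limit_min
echo 20000 > /proc/sys/dev/raid/speed_limit_max
# watch the rebuild progress
cat /proc/mdstat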


Now for the raid 1+0 alternative with the same