Re: [vdr] mdadm software raid5 arrays?

2009-11-19 Thread H. Langos
On Thu, Nov 19, 2009 at 01:37:46PM +, Steve wrote:
 Pasi Kärkkäinen wrote:
 You should use oflag=direct to make it actually write the file to disk..
   And now most probably the file will come from linux kernel cache.  
 Use iflag=direct to read it actually from the disk.
   

 However, in the real world data _is_ going to be cached via the kernel  
 cache, at least (we hope) a stride's worth minimum. We're talking about  
 recording video aren't we, and that's surely almost always sequentially  
 written, not random seeks everywhere?

True. Video is going to be written and read sequentially. However, the
gain from a cache is always short-term. Write caches, for example, mask a
slow disk by signaling "ready" to the application while in reality the
kernel is still holding the data in RAM. If you continue to write at a
speed faster than the disk can handle, the cache will fill up and at some
point your application's write requests will be slowed down to what the
disk can handle.

If however your application writes to the same block again before the
cache has been flushed to disk, then your cache truly has gained you
performance even in the long run, by avoiding a write of data that has
already been replaced.


Same thing with read caches. They only help if you are reading the same data
again.

The effect that you _will_ see is that of read-ahead. It helps if your
application reads one block, then another, and the kernel has already
looked ahead and fetched more blocks from the disk than were originally
requested.

Read-ahead also helps to avoid excessive seeking if you are reading from
more than one place on the disk at once. But again: its effect on read
throughput fades away once you read large amounts of data only once.

What it boils down to is this:

  Caches improve latency, not throughput.


What read-ahead and write caches will do in this scenario is help you mask
the effects of seeks on your disk, by reading ahead and by aggregating
write requests and sorting them in a way that reduces seek times. In this
regard writing multiple streams is easier than reading. When writing, you
can let your kernel decide to keep some of the data 10 or 15 seconds in
RAM before committing it to disk. However, if you are _reading_ you will
be pretty miffed if your video stalls for 15 seconds because the kernel
found something more interesting to read first :-)
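
For the curious: the knobs that control how long the kernel may hold dirty
data live in /proc. A minimal sketch, assuming a 2.6-era kernel with the
usual vm sysctls (values are in centiseconds, defaults vary by distro):

  cat /proc/sys/vm/dirty_expire_centisecs     # max age of dirty data in RAM
  cat /proc/sys/vm/dirty_writeback_centisecs  # how often the flusher runs
  # e.g. allow dirty data to sit for up to 15 seconds before writeback
  echo 1500 > /proc/sys/vm/dirty_expire_centisecs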

 For completeness, the results are:

 #dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 oflag=direct
 1073741824 bytes (1.1 GB) copied, 25.2477 s, 42.5 MB/s

Interesting. The difference between this and the conv=fsync run is that in
the latter the kernel gets to sort all of the write requests more or less
as it wants to. So I guess for recording video, the 73MB/s will be your
bandwidth, while this test here shows the performance that a data
integrity focused application, e.g. a database, will get from your RAID.

 # dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024 iflag=direct
 1073741824 bytes (1.1 GB) copied, 4.92771 s, 218 MB/s

 So, still no issue with recording entire transponders; using 1/4 of the  
 available raw bandwidth with no buffering.

Well, using 1/4 of the bandwidth with one client or sharing it between
multiple clients can make all the difference.

How about running some tests with cstream? I only did a quick apt-cache
search, but it looks like cstream could be used to simulate clients with
various bandwidth needs and to measure the bandwidth that is left.
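
A minimal sketch of what I have in mind (the file names and the 2 MB/s rate
are made up, check cstream(1) for the options your version supports):

  # simulate one playback client reading an existing recording at ~2 MB/s
  cstream -i /srv/test/recording.ts -o /dev/null -t 2000000 -v 1 &
  # meanwhile, see how much write bandwidth is left for the "recorder"
  dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 conv=fsync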

 Interesting stuff, this :)

Very interesting indeed. Thanks for enriching this discussion with real
data!

cheers
-henrik




Re: [vdr] mdadm software raid5 arrays?

2009-11-18 Thread H. Langos
Hi Alex,

On Tue, Nov 17, 2009 at 03:34:59PM +, Steve wrote:
 Alex Betis wrote:
 I don't record much, so I don't worry about speed.

 While there's no denying that RAID5 *at best* has a write speed
 equivalent to about 1.3x a single disk and if you're not careful with
 stride/block settings can be a lot slower, that's no worse for our
 purposes that, erm, having a single disk in the first place. And reading
 is *always* faster...

Thanks for putting some numbers out there. My estimate was more theory
driven. :-)

 Example. I'm not bothered about write speed (only having 3 tuners) so I
 didn't get too carried away setting up my 3-active disk 3TB RAID5 array,
 accepting all the default values.

 Rough speed test:
 #dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024
 1073741824 bytes (1.1 GB) copied, 13.6778 s, 78.5 MB/s

 #dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024
 1073741824 bytes (1.1 GB) copied, 1.65427 s, 649 MB/s

Depending on the amount of RAM, the cache can screw up your results 
quite badly. For something a little more realistic try: 

 sync; dd if=/dev/zero of=foo bs=1M count=1024 conv=fsync

The leading sync flushes the fs cache so that you start with a clean
cache, and conv=fsync makes sure that dd doesn't finish until its data
has actually been written to disk.

After the write you need to make sure that your read cache is not still
full of the data you just wrote. 650 MB/s would mean well over 200 MB/s
per disk, and that sounds a bit too high.

Try to read something different (and big) from that disk before running
the second test.
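
Alternatively, assuming a kernel newer than 2.6.16, you can simply drop the
page cache between the write test and the read test (needs root):

  sync                               # flush dirty pages first
  echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes
  dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024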

 Don't know about anyone else's setup, but if I were to record all
 streams from all tuners, there would still be I/O bandwidth left.
 Highest DVB-T channel bandwidth possible appears to be 31.668Mb/s, so
 for my 3 tuners equates to about 95Mb/s - that's less than 12 MB/s. The
 78MB/s of my RAID5 doesn't seem to be much of an issue then.

Well, I guess DVB-S2 has higher bandwidth. (Numbers, anybody?)
But more importantly: the rough speed tests that you used were run under
zero I/O load.
I/O load can have some nasty effects, e.g. if your heads have to jump
back and forth between an area from which you are reading and an area
to which you are recording. In the case of one read stream and several
write streams you could, in theory, adjust the filesystem's allocation
strategy so that free areas near your read region are used for writing
(though I doubt that anybody ever implemented this strategy in a
mainstream fs). But when you are reading several streams, even caching,
smart I/O schedulers and NCQ cannot completely mask the fact that in
raid5 you basically have one set of read/write heads.

In a raid1 setup you have two sets of heads that you can work with.
(Or more if you are willing to put in more disks.)


Basically raid5 and raid1+0 scale differently if you add more disks.

If you put more disks into a raid5 you gain
 * more capacity (each additional disk counts fully) and
 * more linear read performance.

If you put more disks into a raid1+0 it depends on where you put the
additional disks to work (see the sketch below).
If you grow the _number of mirrors_ you get
 * more read performance (linear and random)
 * more redundancy
If you grow the _number of stripes_ you get
 * more read and write performance (linear and random)
 * more capacity (but only half of the additional capacity for 2-disk
   mirror sets)
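
As an illustration, a minimal mdadm sketch for the 4-disk raid1+0 case
(the device names are assumptions, adjust them to your setup):

  # two 2-disk mirrors striped together (md's "near=2" layout)
  mdadm --create /dev/md0 --level=10 --layout=n2 \
        --raid-devices=4 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1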

cheers
-henrik




Re: [vdr] mdadm software raid5 arrays?

2009-11-10 Thread H. Langos
On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:
 What about a simple raid 1 mirror set?


Ok.. short comparison, using a single disk as baseline.

using 2 disks:

raid0: (striping)
 ++   double read throughput
 ++   double write throughput
 --   half the reliability (read: only use with good backups!)

raid1: (mirroring)
 ++   double read throughput
 o    same write throughput
 ++   double the reliability


using 3 disks:

raid0: (striping)
 +++  triple read performance
 +++  triple write performance
 ---  a third of the reliability

raid1: (mirroring)
 +++  triple read performance
 o    same write throughput
 +++  triple reliability

raid5: (distributed parity)
 +++  triple read performance
 -    lower write performance (not due to the second write but due
      to the necessary reads)
 +    sustains the failure of any one drive in the set

using 4 disks:

raid1+0: (mirroring + striping)
 ++++ four times the read performance
 ++   double write performance
 ++   double reliability


please note: these are approximations and depending on your hardware
they may be off by quite a bit.
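
If you want to see where your own array lands, a quick sanity check of raw
sequential read speed (the md device name is an assumption):

  hdparm -t /dev/md0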

cheers
-henrik




Re: [vdr] mdadm software raid5 arrays?

2009-11-09 Thread H. Langos
Hi Simon,

On Sat, Nov 07, 2009 at 07:38:03AM +1300, Simon Baxter wrote:
 Hi

 I've been running logical volume management (LVMs) on my production VDR 
 box for years, but recently had a drive failure.  To be honest, in the 
 ~20 years I've had PCs in the house, this is the first time a drive 
 failed!

 Anyway, I've bought 3x 1.5 TB SATA disks which I'd like to put into a  
 software (mdadm) raid 5 array.

...

 I regularly record 3 and sometimes 4 channels simultaneously, while 
 watching a recording.  Under regular LVM, this sometimes seemed to cause 
 some slow downs.

I know I risk a flame war here but I feel obliged to say it:
Avoid raid5 if you can! It is fun to play with, but if you care
about your data, buy a fourth drive and do raid1+0 (mirroring
and striping) instead.

Raid 5 is very fast on linear read operations because basically
the load will be spread onto all the available drives.
But if you are going to run vdr on that drive array, you are going
to do a lot of write operations, and raid5 is bad if you do a lot
of writes for a very simple reason.

Take a raid5 array with X devices. If you want to write just one
block, you need to read 2 blocks (the old data that you are
going to overwrite and the old parity) and you need to write 2
blocks (one with the actual data and one with the new parity).

In the best case, the disk block that you are going to
overwrite is already in RAM, but the parity block almost never
will be. Only if you keep writing the same block over and over
will you have both the data and the parity block cached.
In most cases (and certainly in the case of writing data streams
to disk) you'll need to read two blocks before you can calculate
the new parity and write it back to the disks along with your data.

So in short you do two reads and two writes for every write operation.
There goes your performance...
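
The parity update itself is just XOR; a toy illustration with made-up
block values (plain bash arithmetic, not how md does it internally):

  old_data=0x3a; new_data=0x5c; old_parity=0x77
  # new parity = old parity XOR old data XOR new data
  new_parity=$(( old_parity ^ old_data ^ new_data ))
  printf 'new parity: 0x%02x\n' "$new_parity"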

Now about drive failures... if one of the X disks fails, you can still
read blocks on the surviving drives with just one read operation, but
you need X-1 read operations for every read on the failed drive.
Writes to surviving drives need the same two reads/two writes as before
(only if the failed drive contained the parity for that block can you
skip the additional two reads and one write).
If however you need to write to the failed drive, then you need to read
every other drive in the array (X-1 reads) to first reconstruct the
missing data, and only then can you calculate and write the new parity.
(And then you throw away the actual data that you were going to write,
because the drive that you could write it to is gone...)

Example: You have your three 1.5TB drives A, B and C in an array
and C fails. In this situation you'd want to treat your drives as
carefully as possible, because one more failure and all your data
is gone. Unfortunately, continued operation in the failed state
puts your remaining drives under much more stress than usual.

Reading will cause twice the number of read operations on your remaining
drives (lowercase letters are reads from that drive, uppercase are
writes to it):

block:      n    n+1  n+2
OK state:   a    b    c
Fail state: a    b    ab

Writing (on a small array) will produce the same average load of two
reads and two writes per write:

block:      n     n+1   n+2
OK state:   acAC  baBA  cbCB
Fail state: A     baBA  baB


Confusingly enough, the read load per drive doesn't change if
you have more than three drives in your array. Reads will still
produce, on average, double the load in the failed state.

Writes on a failed array seem to produce the same load as on
an OK array. But this is only true for very small arrays.
If you add more disks you'll see that the read penalty grows
for writes to blocks whose data disk is missing, because you need
to read all the other drives in order to update the parity.


Reconstruction of your array after adding a new drive will take
a long time, and most complete array failures (i.e. data lost
forever) occur during the rebuilding phase, not during the failed
state. That's simply because the rebuild puts a lot of stress on
your remaining drives (which probably come from the same batch as
the one that already failed).

Depending on the number and nature of your drives and the
host connection they have, the limiting factor can be read
performance (you need to read X-1 drives completely) or it
can be write performance, if your disk is slower on
sustained writes than on reads.

Remember that you need to read and write a whole disk's worth
of data, not just the used parts.

Example: Your drives have 1.5 TB each and we assume a whopping
100 MB/s on reads as well as on writes (pretty much the fastest
there currently is).

You need to read 3 TB as well as write 1.5 TB. If your system can
handle the load in parallel, you can treat it as just writing one
1.5 TB drive: 1,500,000 MB / 100 MB/s / 60 s/min makes 250 minutes,
or 4 hours and 10 minutes. I am curious whether you can still use
the system under such an I/O load. Anybody with experience on this?
Anyway, the reconstruction rate can be tuned via the proc fs.
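
For reference, a minimal sketch of where those knobs live (values are in
KB/s per device; the exact defaults depend on your kernel):

  cat /proc/mdstat                                  # rebuild progress
  cat /proc/sys/dev/raid/speed_limit_min            # guaranteed rebuild rate
  cat /proc/sys/dev/raid/speed_limit_max            # rebuild rate cap
  echo 100000 > /proc/sys/dev/raid/speed_limit_min  # favour the rebuild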


Now for the raid 1+0 alternative with the same 

Re: [vdr] [OT] NVidia ION mini-ITX arriving

2009-05-13 Thread H. Langos
On Wed, May 13, 2009 at 07:02:16PM +0200, Martin Emrich wrote:
 
 Has anybody already spotted a street price for the different variants?
 Especially the one with the external PSU looks interesting...

Try http://www.preisroboter.de/search.php?search=ZOTAC+ION

cheers
-henrik




Re: [vdr] Incorrect wakeup time

2009-04-22 Thread H. Langos
Hi Falk,

On Wed, Apr 22, 2009 at 02:43:07PM +0200, Falk Spitzberg wrote:
 Hello,
 
 while playing with ACPI wakeup, i noticed that VDR always sets a
 wakeuptime of now+30 minutes, when the timeframe from 'now' to 'begin of
 next recording' is less than 30 Minutes.
 
 Is that intended behaviour?

Apparently it is:

INSTALL:
 If a timer is currently recording, or a recording would start within the
 next 30 minutes (default for the Min. event timeout setup parameter), and
 the user insists in shutting down now, the first and second parameter will
 correspond to a time that is Min. event timeout minutes in the future.

I walked into the same trap when first playing with the automatic shutdown.
Maybe the syslog output could be improved to indicate if somebody hits this
limit.

cheers
-henrik







Re: [vdr] Where do you live and what kind of broadcast do you receive?

2009-03-21 Thread H. Langos
Country: Germany (Berlin)
Transmission: DVB-T
Encoding: MPEG-2 SD (27 stations)
Receivers: 
 - MSI digiVox mini II rev.3 (af9015 driver)
 - Fujitsu-Siemens DVB-T Mobile TV Tuner (vp7045)
 (and still looking for a receiver with lower power consumption)

-henrik

