Re: [vdr] mdadm software raid5 arrays?
On Thu, Nov 19, 2009 at 01:37:46PM +, Steve wrote:
> Pasi Kärkkäinen wrote:
> > You should use oflag=direct to make it actually write the file to disk..
> > And now most probably the file will come from linux kernel cache. Use
> > iflag=direct to read it actually from the disk.
>
> However, in the real world data _is_ going to be cached via the kernel
> cache, at least (we hope) a stride's worth minimum. We're talking about
> recording video aren't we, and that's surely almost always sequentially
> written, not random seeks everywhere?

True. Video is going to be written and read sequentially. However, the effect of a cache is always a short-term gain. E.g. write caches mask a slow disk by signaling "ready" to the application while in reality the kernel is still holding the data in RAM. If you continue to write at a speed faster than the disk can handle, the cache will fill up and at some point your application's write requests will be slowed down to what the disk can handle. If however your application writes to the same block again before the cache has been flushed to disk, then your cache truly has gained you performance even in the long run, by avoiding a write of data that has already been replaced.

Same thing with read caches: they only help if you are reading the same data again. The effect that you _will_ see is that of reading ahead. That helps if your application reads one block, then another, and the kernel has already looked ahead and fetched more blocks than originally requested from the disk. This also has the effect of avoiding too many seeks if you are reading from more than one place on the disk at once... but again, the effect on read throughput fades away as you read large amounts of data only once.

What it boils down to is this: caches improve latency, not throughput. What read-ahead and write caches will do in this scenario is mask the effects of seeks on your disk, by reading ahead and by aggregating write requests and sorting them in a way that reduces seek times. In this regard writing multiple streams is easier than reading multiple streams. When writing, you can let your kernel keep some of the data 10 or 15 seconds in RAM before committing it to disk. However, if you are _reading_, you will be pretty miffed if your video stalls for 15 seconds because the kernel found something more interesting to read first :-)

> For completeness, the results are:
>
> # dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 oflag=direct
> 1073741824 bytes (1.1 GB) copied, 25.2477 s, 42.5 MB/s

Interesting. The difference between this and the conv=fsync run is that in the latter the kernel gets to sort all of the write requests more or less as it wants to. So I guess for recording video the 73MB/s will be your bandwidth, while this test here shows the performance that a data-integrity focused application, e.g. a database, will get from your RAID.

> # dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024 iflag=direct
> 1073741824 bytes (1.1 GB) copied, 4.92771 s, 218 MB/s
>
> So, still no issue with recording entire transponders; using 1/4 of the
> available raw bandwidth with no buffering.

Well, using 1/4 of the bandwidth by one client or shared by multiple clients can make all the difference. How about making some tests with cstream? I only did a quick apt-cache search, but it seems like cstream could be used to simulate clients with various bandwidth needs and to measure the bandwidth that is left.

> Interesting stuff, this :)

Very interesting indeed.
Thanks for enriching this discussion with real data!

cheers
-henrik
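As a rough, untested sketch of the cstream idea suggested above (the -i/-o/-t options and all paths and rates here are assumptions, not figures from this thread): one cstream process reads an existing recording at a fixed rate to stand in for a playback client, while dd measures how much sequential write bandwidth is left.

# stand-in for one playback client: read an old recording at ~2.5 MB/s
# (-t is assumed to take a limit in bytes per second)
cstream -i /srv/video/old_recording.ts -o /dev/null -t 2500000 -v 1 &

# meanwhile, measure the remaining sequential write bandwidth
sync; dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 conv=fsync

# stop the simulated client again
kill %1

Starting several such readers at different rates would approximate several vdr clients watching recordings at the same time.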
Re: [vdr] mdadm software raid5 arrays?
Hi Alex,

On Tue, Nov 17, 2009 at 03:34:59PM +, Steve wrote:
> Alex Betis wrote:
> > I don't record much, so I don't worry about speed.
>
> While there's no denying that RAID5 *at best* has a write speed
> equivalent to about 1.3x a single disk, and if you're not careful with
> stride/block settings can be a lot slower, that's no worse for our
> purposes than, erm, having a single disk in the first place. And reading
> is *always* faster...

Thanks for putting some numbers out there. My estimate was more theory driven. :-)

> Example. I'm not bothered about write speed (only having 3 tuners) so I
> didn't get too carried away setting up my 3-active disk 3TB RAID5 array,
> accepting all the default values. Rough speed test:
>
> # dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024
> 1073741824 bytes (1.1 GB) copied, 13.6778 s, 78.5 MB/s
>
> # dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024
> 1073741824 bytes (1.1 GB) copied, 1.65427 s, 649 MB/s

Depending on the amount of RAM, the cache can screw up your results quite badly. For something a little more realistic try:

sync; dd if=/dev/zero of=foo bs=1M count=1024 conv=fsync

The first sync writes out the fs cache so that you start with a clean cache, and the conv=fsync makes sure that dd doesn't finish until it has written its data back to disk.

After the write you need to make sure that your read cache is not still full of the data you just wrote. 650 MB/s would mean 223 MB/s per disk, and that sounds a bit too high. Try to read something different (and big) from that disk before running the second test.

> Don't know about anyone else's setup, but if I were to record all streams
> from all tuners, there would still be I/O bandwidth left. Highest DVB-T
> channel bandwidth possible appears to be 31.668Mb/s, so for my 3 tuners
> that equates to about 95Mb/s - that's less than 12 MB/s. The 78MB/s of my
> RAID5 doesn't seem to be much of an issue then.

Well, I guess DVB-S2 has higher bandwidth. (Numbers, anybody?) But more importantly: the rough speed tests that you used were run under zero I/O load, and I/O load can have some nasty effects, e.g. if your heads have to jump back and forth between an area from which you are reading and an area to which you are recording.

In the case of one read stream and several write streams, in theory you could adjust the filesystem's allocation strategy so that available areas near your read region are used for writing (though I doubt that anybody ever implemented this strategy in a mainstream fs). But when you are reading several streams, even caching, smart I/O schedulers, and NCQ cannot completely mask the fact that in raid5 you basically have one set of read/write heads. In a raid1 setup you have two sets of heads that you can work with (or more, if you are willing to put in more disks).

Basically raid5 and raid1+0 scale differently if you add more disks.

If you put more disks into raid5 you gain
* more capacity (each additional disk counts fully) and
* more linear read performance.

If you put more disks into raid1+0 it depends on where you put the additional disks to work.

If you grow the _number of mirrors_ you get
* more read performance (linear and random)
* more redundancy

If you grow the _number of stripes_ you get
* more read and write performance (linear and random)
* more capacity (but only half of the additional capacity for 2-disk mirror sets)

cheers
-henrik
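Putting the suggestions above together, a complete cache-aware test could look roughly like this (a sketch only; the drop_caches step is not from the mail above, but it is the usual way to empty the page cache on 2.6 kernels so that the read test really has to hit the disks, and it needs root):

# write test: start from a clean cache and wait until the data is on disk
sync
dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 conv=fsync

# make sure the read test cannot be served from the page cache
# (alternatively, read something else big from the same disk first)
sync
echo 3 > /proc/sys/vm/drop_caches

# read test
dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024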
Re: [vdr] mdadm software raid5 arrays?
On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:
> What about a simple raid 1 mirror set?

Ok... a short comparison, using a single disk as the baseline.

Using 2 disks:

raid0 (striping):
 ++ double read throughput
 ++ double write throughput
 -- half the reliability (read: only use with a good backup!)

raid1 (mirroring):
 ++ double read throughput
 o  same write throughput
 ++ double the reliability

Using 3 disks:

raid0 (striping):
 +++ triple read performance
 +++ triple write performance
 --- a third of the reliability

raid1 (mirroring):
 +++ triple read performance
 o   same write throughput
 +++ triple reliability

raid5 (distributed parity):
 +++ triple read performance
 -   lower write performance (not due to the second write but due to the necessary reads)
 +   sustains the failure of any one drive in the set

Using 4 disks:

raid1+0:
 four times the read performance
 ++ double write performance
 ++ double reliability

Please note: these are approximations, and depending on your hardware they may be off by quite a bit.

cheers
-henrik
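For reference, the layouts compared above would be created with mdadm roughly like this (a sketch only; /dev/sdb1 through /dev/sde1 are placeholder partitions, and a filesystem still has to be made on /dev/md0 afterwards):

# 2 disks, raid1 (mirroring)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1

# 3 disks, raid5 (distributed parity)
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1

# 4 disks, raid10 (md's combined raid1+0 mode)
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

cat /proc/mdstat then shows the array state and the progress of the initial sync.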
Re: [vdr] mdadm software raid5 arrays?
Hi Simon,

On Sat, Nov 07, 2009 at 07:38:03AM +1300, Simon Baxter wrote:
> Hi
>
> I've been running logical volume management (LVMs) on my production VDR
> box for years, but recently had a drive failure. To be honest, in the ~20
> years I've had PCs in the house, this is the first time a drive failed!
>
> Anyway, I've bought 3x 1.5 TB SATA disks which I'd like to put into a
> software (mdadm) raid 5 array.
> ...
> I regularly record 3 and sometimes 4 channels simultaneously, while
> watching a recording. Under regular LVM, this sometimes seemed to cause
> some slow downs.

I know I risk a flame war here, but I feel obliged to say it: avoid raid5 if you can avoid it! It is fun to play with, but if you care for your data, buy a fourth drive and do raid1+0 (mirroring and striping) instead.

Raid5 is very fast on linear read operations because the load is basically spread over all the available drives. But if you are going to run vdr on that drive array, you are going to do a lot of write operations, and raid5 is bad if you do a lot of writes, for a very simple reason.

Take a raid5 array with X devices. If you want to write just one block, you need to read two blocks (the old data that you are going to overwrite and the old parity) and you need to write two blocks (one with the actual data and one with the new parity). In the best case, the disk block that you are going to overwrite is already in RAM, but the parity block almost never will be. Only if you keep writing the same block over and over will you have both the data and the parity block cached. In most cases (and certainly in the case of writing data streams to disk) you'll need to read two blocks before you can calculate the new parity and write it back to the disks along with your data. So in short you do two reads and two writes for every write operation. There goes your performance...

Now about drive failures... If one of the X disks fails, you can still read blocks on the OK drives with just one read operation, but you need X-1 read operations for every read operation on the failed drive. Writes on the OK drives take the same two reads/two writes as before (only if the failed drive contained the parity for the block can you skip the additional two reads and one write). If however you need to write to the failed drive, then you need to read every other of the X-1 drives in the array to first reconstruct the missing data, and only then can you calculate and write the new parity. (And then you throw away the actual data that you were going to write, because the drive that you could write it to is gone...)

Example: you have your three 1.5 TB drives A, B, C in an array and C fails. In this situation you'd want to treat your drives as carefully as possible, because one more failure and all your data is gone. Unfortunately, continued operation in the failed condition will put your remaining drives under much more stress than usual.

Reading will cause twice the read operations on your remaining drives:

block:      n    n+1  n+2
OK state:   a    b    c
Fail state: a    b    ab

Writing (on a small array) will produce the same load of two reads and two writes on average per write (reads in lower case, writes in upper case):

block: n     n+1   n+2
OK:    acAC  baBA  cbCB
FAIL:  A     baBA  baB

Confusingly enough, the read load per drive doesn't change if you have more than three drives in your array: reads will still produce on average double the load in the failed state. Writes on a failed array seem to produce the same load as on an OK array, but this is only true for very small arrays.
If you add more disks you'll see that the read penalty grows for writing blocks whose data disk is missing, because you need to read all other drives in order to update the parity.

Reconstruction of your array after adding a new drive will take a long time, and most complete array failures (i.e. data lost forever) occur during the rebuilding phase, not while in the failed state. That's simply because you put a lot of stress on your drives (which probably come from the same batch as the one that already failed). Depending on the number and nature of your drives and the host connection they have, the limiting factor can be the read performance (you need to read X-1 drives completely) or it can be the write performance, if your disk is slower on sustained writing than on reading. Remember that you need to read and write a whole disk's worth of data, not just the used parts.

Example: your drives have 1.5 TB each and we assume a whopping 100 MB/s on read as well as on write (pretty much the fastest there currently is). You need to read 3 TB as well as write 1.5 TB. If your system can handle the load in parallel, you can treat it as just writing one 1.5 TB drive: 1,500,000 MB / 100 MB/s = 15,000 s, which makes 250 minutes, or 4 hours and 10 minutes. I am curious whether you can still use the system under such an I/O load. Anybody with experience on this? Anyway, the reconstruction rate can be tuned via the proc fs.

Now for the raid 1+0 alternative with the same
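As a sketch of the tuning mentioned at the end (the speed_limit files are the standard md tunables under /proc, with values in KB/s per device; the device names are just placeholders, not taken from this thread):

# watch the array state and the rebuild progress
cat /proc/mdstat

# current rebuild/resync speed limits (KB/s per device)
cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max

# e.g. cap the rebuild rate to ~20 MB/s so vdr stays usable during reconstruction
echo 20000 > /proc/sys/dev/raid/speed_limit_max

# after replacing the failed disk, remove the old member and add the new
# partition; the rebuild then starts on its own
mdadm --manage /dev/md0 --remove /dev/sdc1
mdadm --manage /dev/md0 --add /dev/sdc1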
Re: [vdr] [OT] NVidia ION mini-ITX arriving
On Wed, May 13, 2009 at 07:02:16PM +0200, Martin Emrich wrote:
> Has anybody already spotted a street price for the different variants?
> Especially the one with the external PSU looks interesting...

Try http://www.preisroboter.de/search.php?search=ZOTAC+ION

cheers
-henrik
Re: [vdr] Incorrect wakeup time
Hi Falk,

On Wed, Apr 22, 2009 at 02:43:07PM +0200, Falk Spitzberg wrote:
> Hello,
>
> while playing with ACPI wakeup, I noticed that VDR always sets a wakeup
> time of now+30 minutes when the timeframe from 'now' to 'begin of next
> recording' is less than 30 minutes. Is that intended behaviour?

Apparently it is. From INSTALL:

  If a timer is currently recording, or a recording would start within the
  next 30 minutes (default for the "Min. event timeout" setup parameter),
  and the user insists in shutting down now, the first and second parameter
  will correspond to a time that is "Min. event timeout" minutes in the
  future.

I walked into the same trap when first playing with the automatic shutdown. Maybe the syslog output could be improved to indicate when somebody hits this limit.

cheers
-henrik
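For anyone wanting to check or change that limit: assuming the menu item corresponds to the MinEventTimeout key in setup.conf (both the key name and the path are assumptions here; the path depends on the distribution), something along these lines should work. Note that VDR rewrites setup.conf on exit, so only edit it while VDR is stopped.

# show the current value
grep MinEventTimeout /etc/vdr/setup.conf

# lower the limit to 10 minutes (only while vdr is not running)
sed -i 's/^MinEventTimeout *=.*/MinEventTimeout = 10/' /etc/vdr/setup.conf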
Re: [vdr] Where do you live and what kind of broadcast do you receive?
Country: Germany (Berlin)
Transmission: DVB-T
Encoding: MPEG-2 SD (27 stations)
Receivers:
- MSI digiVox mini II rev.3 (af9015 driver)
- Fujitsu-Siemens DVB-T Mobile TV Tuner (vp7045)
(and still looking for a receiver with lower power consumption)

-henrik