Re: New XFS benchmarks using David Chinner's recommendations for XFS-based optimizations.
extract speed with 1024 chunk:
26.92user 5.69system 0:36.51elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+525minor)pagefaults 0swaps
27.18user 5.43system 0:36.39elapsed 89%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+528minor)pagefaults 0swaps
27.04user 5.60system 0:36.27elapsed 90%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+526minor)pagefaults 0swaps
extract speed with 2048 chunk:
26.97user 5.63system 0:36.99elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+525minor)pagefaults 0swaps
26.98user 5.62system 0:36.90elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+527minor)pagefaults 0swaps
27.15user 5.44system 0:37.06elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+526minor)pagefaults 0swaps
extract speed with 4096 chunk:
27.11user 5.54system 0:38.96elapsed 83%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.09user 5.55system 0:38.85elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+527minor)pagefaults 0swaps
27.12user 5.52system 0:38.80elapsed 84%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+528minor)pagefaults 0swaps
extract speed with 8192 chunk:
27.04user 5.57system 0:43.54elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.15user 5.49system 0:43.52elapsed 75%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.11user 5.52system 0:43.66elapsed 74%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+528minor)pagefaults 0swaps
extract speed with 16384 chunk:
27.25user 5.45system 0:52.18elapsed 62%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+526minor)pagefaults 0swaps
27.18user 5.52system 0:52.54elapsed 62%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+527minor)pagefaults 0swaps
27.17user 5.50system 0:51.38elapsed 63%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (6major+525minor)pagefaults 0swaps
Justin. - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Raz
Re: /proc/mdstat docs (was Re: Few questions)
Many thanks, David. It is very useful. On 12/8/07, David Greaves [EMAIL PROTECTED] wrote: Michael Makuch wrote: So my questions are: ... - Is this a.o.k for a raid5 array? So I realised that /proc/mdstat isn't documented too well anywhere... http://linux-raid.osdl.org/index.php/Mdstat Comments welcome... David -- Raz
Re: mounting raid5 with different unit values
Well... this thing actually works just fine with a newer kernel (2.6.18-8-el5, CentOS 5). I managed to mount / mkfs.xfs over raid5 with a pseudo raid5 unit size, and with the appropriate raid5 patches and a user-space access pattern I eliminated the read penalty in 99% of cases. I sincerely hope I won't be getting any crashes with these file system tunings. So... first, Chris and all you XFS guys, many many thanks. Chris, how dangerous are these tunings? Am I to expect weird behaviour of the file system? On 10/8/07, Chris Wedgwood [EMAIL PROTECTED] wrote: On Sun, Oct 07, 2007 at 11:48:14AM -0400, Justin Piszcz wrote: man mount :) Ah of course. But those will be more restrictive than what you can specify when you make the file-system (because mkfs.xfs can align the AGs to suit). -- Raz
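The arithmetic behind aligning XFS to an md raid5 can be sketched as follows. The geometry here (3 disks, 1024 KiB chunk) is an assumption for illustration, not stated in this thread; the conversions themselves follow the usual convention that mkfs.xfs takes su/sw while mount takes sunit/swidth in 512-byte sectors.

```python
# Hypothetical geometry: raid5 over 3 disks with a 1024 KiB chunk.
def xfs_alignment(chunk_kib, ndisks, parity_disks=1):
    data_disks = ndisks - parity_disks
    su_kib = chunk_kib                 # mkfs.xfs -d su= : one md chunk
    sw_units = data_disks              # mkfs.xfs -d sw= : number of data disks
    sunit = chunk_kib * 2              # mount -o sunit= : 512-byte sectors per chunk
    swidth = sunit * data_disks        # mount -o swidth= : sectors per full data stripe
    return su_kib, sw_units, sunit, swidth

print(xfs_alignment(1024, 3))  # (1024, 2, 2048, 4096)
```

This also illustrates Chris's point: mkfs.xfs can use these values to align allocation groups, which a later mount -o sunit/swidth cannot retroactively do.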
Re: Superblocks
Why are you zeroing hdd? Shouldn't you be clearing sdd? On 10/26/07, Greg Cormier [EMAIL PROTECTED] wrote: Can someone help me understand superblocks and MD a little bit? I've got a raid5 array with 3 disks - sdb1, sdc1, sdd1. --examine on these 3 drives shows correct information. However, if I also examine the raw disk devices, sdb and sdd, they also appear to have superblocks with some semi-valid looking information. sdc has no superblock. How can I clear these? If I unmount my raid, stop md0, it won't clear it. [EMAIL PROTECTED] ~]# mdadm --zero-superblock /dev/hdd mdadm: Couldn't open /dev/hdd for write - not zeroing I'd like to rule out these oddities before I start on my next troubleshooting of why my array rebuilds every time I reboot :) Thanks, Greg -- Raz
Re: XFS on x86_64 Linux Question
Justin hello. I have tested 32 to 64 bit porting of Linux raid5, XFS and LVM; it worked, though I cannot say I have tested thoroughly - it was a POC. On 4/28/07, Justin Piszcz [EMAIL PROTECTED] wrote: With correct CC'd address. On Sat, 28 Apr 2007, Justin Piszcz wrote: Hello-- Had a quick question: if I re-provision a host with an Intel Core Duo CPU with x86_64 Linux, I create a software raid array and use the XFS filesystem-- all in 64bit space... If I boot a recovery image such as Knoppix, it will not be able to work on the filesystem, correct? I would need a 64bit live CD? Does the same apply to software raid? Can I mount a software raid created in a 64bit environment in a 32bit environment? Justin. -- Raz
Re: raid5 write performance
On 4/16/07, Raz Ben-Jehuda(caro) [EMAIL PROTECTED] wrote: On 4/13/07, Neil Brown [EMAIL PROTECTED] wrote: On Saturday March 31, [EMAIL PROTECTED] wrote: 4. I am going to work on this with other configurations, such as raid5's with more disks and raid50. I will be happy to hear your opinion on this matter. What puzzles me is why the deadline must be as long as 10 ms; the shorter the deadline, the more reads I am getting. I've finally had a bit of a look at this. The extra reads are being caused by the 3msec unplug timeout. Once you plug a queue it will automatically get unplugged 3 msec later. When this happens, any stripes that are on the pending list (waiting to see if more blocks will be written to them) get processed and some pre-reading happens. If you remove the 3msec timeout (I changed it to 300msec) in block/ll_rw_blk.c, the reads go away. However that isn't a good solution. Your patch effectively ensures that a stripe gets to last at least N msec before being unplugged and pre-reading starts. Why does it need to be 10 msec? Let's see. When you start writing, you will quickly fill up the stripe cache and then have to wait for stripes to be fully written and become free before you can start attaching more write requests. You could have to wait for a full chunk-wide stripe to be written before another chunk of stripes can proceed. The first blocks of the second stripe could stay in the stripe cache for the time it takes to write out a stripe. With a 1024K chunk size and 30Meg/second write speed it will take 1/30 of a second to write out a chunk-wide stripe, or about 33msec. So I'm surprised you get by with a deadline of 'only' 10msec. Maybe there is some over-lapping of chunks that I wasn't taking into account (I did oversimplify the model a bit). So, what is the right heuristic to use to determine when we should start write-processing on an incomplete stripe? Obviously '3msec' is bad.
It seems we don't want to start processing incomplete stripes while there are full stripes being written, but we also don't want to hold up incomplete stripes forever if some other thread is successfully writing complete stripes. So maybe something like this: - We keep a (cyclic) counter of the number of stripes on which we have started write, and the number which have completed. - Every time we add a write request to a stripe, we set the deadline to 3msec in the future, and we record in the stripe the current value of the number that have started write. - We process a stripe requiring preread when both the deadline has expired, and the count of completed writes reaches the recorded count of commenced writes. Does that make sense? Would you like to try it? NeilBrown

Neil hello. I have been doing some thinking. I feel we should take a different path here. In my tests I actually accumulate the user's buffers and when ready I submit them, an elevator-like algorithm. The main problem is the number of IOs the stripe cache can hold, which is too small. My suggestion is to add an elevator of bios before moving them to the stripe cache, trying to postpone the allocation of a new stripe as long as possible. This way we will be able to move as many IOs as possible to the raid logic without congesting it, and still fill stripes when possible. Pseudo code:

make_request():
    ...
    if IO direction is WRITE and IO not in stripe cache:
        add IO to raid elevator
    ...

raid5d():
    ...
    if there is a set of IOs in the raid elevator that makes a full stripe:
        move those IOs to raid handling
    while the oldest IO in the raid elevator is past its deadline (3ms?):
        move it to raid handling

Does it make any sense? thank you -- Raz - To unsubscribe from this list: send the line unsubscribe linux-raid in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
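Neil's proposed start-preread condition (deadline expired AND the completed-write counter has caught up with the count recorded when the request was attached) can be sketched in user space. All names here are illustrative, not kernel code, and the cyclic-counter wraparound the kernel would need is ignored for clarity:

```python
# Sketch of the proposed heuristic: a partial stripe may begin preread only
# when (a) its per-stripe deadline has expired and (b) every full-stripe
# write that had been started when the request was attached has completed.
class PartialStripe:
    def __init__(self, now_ms, started_writes, deadline_ms=3):
        self.deadline = now_ms + deadline_ms     # 3msec in the future
        self.started_at_attach = started_writes  # snapshot of the started counter

def may_preread(stripe, now_ms, completed_writes):
    # (a real implementation would compare cyclic counters wraparound-safely)
    return now_ms >= stripe.deadline and completed_writes >= stripe.started_at_attach

s = PartialStripe(now_ms=0, started_writes=5)
print(may_preread(s, now_ms=2, completed_writes=5))  # False: deadline not expired
print(may_preread(s, now_ms=4, completed_writes=4))  # False: full writes still in flight
print(may_preread(s, now_ms=4, completed_writes=5))  # True
```

The point of condition (b) is that the heuristic adapts to the actual write rate instead of a hand-tuned millisecond value.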
Re: raid5 write performance
On 4/2/07, Dan Williams [EMAIL PROTECTED] wrote: On 3/30/07, Raz Ben-Jehuda(caro) [EMAIL PROTECTED] wrote: Please see below. On 8/28/06, Neil Brown [EMAIL PROTECTED] wrote: On Sunday August 13, [EMAIL PROTECTED] wrote: well ... me again. Following your advice I added a deadline for every WRITE stripe head when it is created. In raid5_activate_delayed I checked whether the deadline has expired, and if not I set the sh to preread-active mode. This small fix (and in a few other places in the code) reduced the amount of reads to zero with dd, but with no improvement to throughput. But with random access to the raid (buffers are aligned by the stripe width and with the size of the stripe width) there is an improvement of at least 20%. Problem is that a user must know what he is doing, else there would be a reduction in performance if the deadline is too long (say 100 ms). So if I understand you correctly, you are delaying write requests to partial stripes slightly (your 'deadline') and this is sometimes giving you a 20% improvement? I'm not surprised that you could get some improvement. 20% is quite surprising. It would be worth following through with this to make that improvement generally available. As you say, picking a time in milliseconds is very error prone. We really need to come up with something more natural. I had hoped that the 'unplug' infrastructure would provide the right thing, but apparently not. Maybe unplug is just being called too often. I'll see if I can duplicate this myself and find out what is really going on. Thanks for the report. NeilBrown Neil hello. I am sorry for this interval, I was assigned abruptly to a different project. 1. I'd taken a look at the raid5 delay patch I had written a while ago. I ported it to 2.6.17 and tested it. It seems to work, and when used correctly it eliminates the read penalty. 2. Benchmarks. Configuration: I am testing a raid5 x 3 disks with 1MB chunk size.
IOs are synchronous and non-buffered (O_DIRECT), 2 MB in size and always aligned to the beginning of a stripe. Kernel is 2.6.17. The stripe_delay was set to 10ms. Attached is the simple_write code. Command: simple_write /dev/md1 2048 0 1000 - simple_write raw-writes (O_DIRECT) sequentially starting from offset zero, 2048 kilobytes, 1000 times.

Benchmark Before patch
sda 1848.00 8384.00 50992.00 8384 50992
sdb 1995.00 12424.00 51008.00 12424 51008
sdc 1698.00 8160.00 51000.00 8160 51000
sdd 0.00 0.00 0.00 0 0
md0 0.00 0.00 0.00 0 0
md1 450.00 0.00 102400.00 0 102400

Benchmark After patch
sda 389.11 0.00 128530.69 0 129816
sdb 381.19 0.00 129354.46 0 130648
sdc 383.17 0.00 128530.69 0 129816
sdd 0.00 0.00 0.00 0 0
md0 0.00 0.00 0.00 0 0
md1 1140.59 0.00 259548.51 0 262144

As one can see, no additional reads were done. One can actually calculate the raid's utilization: (n-1)/n * (single disk throughput with 1M writes). 3. The patch code. The kernel tested above was 2.6.17. The patch is against 2.6.20.2 because I have noticed big code differences between 17 and 20.x. This patch was not tested on 2.6.20.2 but it is essentially the same. I have not tested (yet) degraded mode or any other non-common paths. This is along the same lines of what I am working on, new cache policies for raid5/6, so I want to give it a try as well. Unfortunately gmail has mangled your patch. Can you resend as an attachment? patch: malformed patch at line 10: (((conf)-stripe_hashtbl[((sect) STRIPE_SHIFT) HASH_MASK])) Thanks, Dan Dan hello. Attached are the patches. Also, I have added another test unit: random_writev. It is not much code but it does the work. It tests writing a vector; it shows the same results as writing using a single buffer. What are the new cache policies? Please note! I haven't indented the patch nor followed the instructions in the SubmittingPatches document. If Neil approves this patch or parts of it, I will do so. # Benchmark 3: Testing 8 disks raid5.
Tyan Numa dual (amd) CPU machine, with 8 sata maxtor disks; the controller is a Promise in JBOD mode. raid conf: md1 : active raid5 sda2[0] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb2[1] 3404964864 blocks level 5, 1024k chunk, algorithm 2 [8/8] [UUUUUUUU] In order to achieve zero reads I had to tune the deadline to 20ms.
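Raz's utilization estimate ((n-1)/n of the aggregate raw bandwidth) can be checked in a few lines. The 64 MB/s per-disk figure below is an assumption read off the ~128530 blk/s per-disk iostat numbers in the 3-disk benchmark, not a measured constant:

```python
# On a full-stripe raid5 write, every disk streams at its raw rate but only
# (n-1)/n of the aggregate is user data (the rest is parity).
def raid5_data_rate_mb_s(ndisks, per_disk_mb_s):
    raw = ndisks * per_disk_mb_s
    return raw * (ndisks - 1) / ndisks

print(raid5_data_rate_mb_s(3, 64.0))  # 128.0, close to the ~130 MB/s md1 shows above
print(raid5_data_rate_mb_s(8, 64.0))  # 448.0 for the 8-disk array, bus permitting
```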
Re: raid5 write performance
On 3/31/07, Bill Davidsen [EMAIL PROTECTED] wrote: Raz Ben-Jehuda(caro) wrote: Please see below. On 8/28/06, Neil Brown [EMAIL PROTECTED] wrote: On Sunday August 13, [EMAIL PROTECTED] wrote: well ... me again. Following your advice I added a deadline for every WRITE stripe head when it is created. In raid5_activate_delayed I checked whether the deadline has expired, and if not I set the sh to preread-active mode. This small fix (and in a few other places in the code) reduced the amount of reads to zero with dd, but with no improvement to throughput. But with random access to the raid (buffers are aligned by the stripe width and with the size of the stripe width) there is an improvement of at least 20%. Problem is that a user must know what he is doing, else there would be a reduction in performance if the deadline is too long (say 100 ms). So if I understand you correctly, you are delaying write requests to partial stripes slightly (your 'deadline') and this is sometimes giving you a 20% improvement? I'm not surprised that you could get some improvement. 20% is quite surprising. It would be worth following through with this to make that improvement generally available. As you say, picking a time in milliseconds is very error prone. We really need to come up with something more natural. I had hoped that the 'unplug' infrastructure would provide the right thing, but apparently not. Maybe unplug is just being called too often. I'll see if I can duplicate this myself and find out what is really going on. Thanks for the report. NeilBrown Neil hello. I am sorry for this interval, I was assigned abruptly to a different project. 1. I'd taken a look at the raid5 delay patch I had written a while ago. I ported it to 2.6.17 and tested it. It seems to work, and when used correctly it eliminates the read penalty. 2. Benchmarks. Configuration: I am testing a raid5 x 3 disks with 1MB chunk size.
IOs are synchronous and non-buffered (O_DIRECT), 2 MB in size and always aligned to the beginning of a stripe. Kernel is 2.6.17. The stripe_delay was set to 10ms. Attached is the simple_write code. Command: simple_write /dev/md1 2048 0 1000 - simple_write raw-writes (O_DIRECT) sequentially starting from offset zero, 2048 kilobytes, 1000 times.

Benchmark Before patch
sda 1848.00 8384.00 50992.00 8384 50992
sdb 1995.00 12424.00 51008.00 12424 51008
sdc 1698.00 8160.00 51000.00 8160 51000
sdd 0.00 0.00 0.00 0 0
md0 0.00 0.00 0.00 0 0
md1 450.00 0.00 102400.00 0 102400

Benchmark After patch
sda 389.11 0.00 128530.69 0 129816
sdb 381.19 0.00 129354.46 0 130648
sdc 383.17 0.00 128530.69 0 129816
sdd 0.00 0.00 0.00 0 0
md0 0.00 0.00 0.00 0 0
md1 1140.59 0.00 259548.51 0 262144

As one can see, no additional reads were done. One can actually calculate the raid's utilization: (n-1)/n * (single disk throughput with 1M writes). 3. The patch code. The kernel tested above was 2.6.17. The patch is against 2.6.20.2 because I have noticed big code differences between 17 and 20.x. This patch was not tested on 2.6.20.2 but it is essentially the same. I have not tested (yet) degraded mode or any other non-common paths. My weekend is pretty taken, but I hope to try putting this patch against 2.6.21-rc6-git1 (or whatever is current Monday), to see not only how it works against the test program, but also under some actual load. By eye, my data should be safe, but I think I'll test on a well-backed machine anyway ;-) Bill, this test program WRITES data to a raw device; it will destroy everything you have on the RAID. If you want to use a file system test unit, as mentioned I have one for the XFS file system. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 -- Raz
Re: raid5 write performance
@@
 /* Free stripes pool */
 atomic_t		active_stripes;
+	atomic_t		deadline_ms;
 struct list_head	inactive_list;
 wait_queue_head_t	wait_for_stripe;
 wait_queue_head_t	wait_for_overlap;

3. I have also tested it over the XFS file system (I'd written a special copy method for XFS for this purpose, called r5cp). I am getting much better numbers with this patch. sdd is the source file system and sd[abc] contain the raid. XFS is mounted over /dev/md1.

stripe_deadline=0ms (disabled)
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 0.00 0.00 0.00 0 0
md0 0.00 0.00 0.00 0 0
sda 90.10 7033.66 37409.90 7104 37784
sdb 94.06 7168.32 37417.82 7240 37792
sdc 89.11 7215.84 37417.82 7288 37792
sdd 75.25 77053.47 0.00 77824 0
md1 319.80 0.00 77053.47 0 77824

stripe_deadline=10ms (enabled)
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
hda 0.00 0.00 0.00 0 0
md0 0.00 0.00 0.00 0 0
sda 113.00 0.00 67648.00 0 67648
sdb 113.00 0.00 67648.00 0 67648
sdc 113.00 0.00 67648.00 0 67648
sdd 128.00 131072.00 0.00 131072 0
md1 561.00 0.00 135168.00 0 135168

XFS did not crash nor suffer from any other inconsistencies so far. Yet I have only begun. 4. I am going to work on this with other configurations, such as raid5's with more disks and raid50. I will be happy to hear your opinion on this matter. What puzzles me is why the deadline must be as long as 10 ms; the shorter the deadline, the more reads I am getting.
Many thanks, Raz

/* simple_write - raw sequential O_DIRECT writer. WARNING: destroys data on the target device. */
#define _GNU_SOURCE   /* for O_DIRECT and pwrite64 */
#include <iostream>
#include <stdio.h>
#include <string.h>
#include <stddef.h>
#include <sys/time.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <libaio.h>
#include <time.h>
#include <errno.h>
using namespace std;

int main(int argc, char *argv[])
{
    if (argc < 5) {
        cout << "usage: <device name> <size to write in kb> <offset in kb> <loops>" << endl;
        return 0;
    }
    char *dev_name = argv[1];
    int fd = open(dev_name, O_LARGEFILE | O_DIRECT | O_WRONLY, 777);
    if (fd < 0) {
        perror("open");
        return -1;
    }
    long long write_sz_bytes = ((long long)atoi(argv[2])) << 10;
    long long offset_sz_bytes = ((long long)atoi(argv[3])) << 10;
    int loops = atoi(argv[4]);
    char *buffer = (char *)valloc(write_sz_bytes);  /* page-aligned, as O_DIRECT requires */
    if (!buffer) {
        perror("alloc");
        return -1;
    }
    memset(buffer, 0x00, write_sz_bytes);
    while ((--loops) > 0) {
        int ret = pwrite64(fd, buffer, write_sz_bytes, offset_sz_bytes);
        if (ret < 0) {
            perror("failed to write");
            printf("write_sz_bytes=%lld offset_sz_bytes=%lld\n", write_sz_bytes, offset_sz_bytes);
            return -1;
        }
        offset_sz_bytes += write_sz_bytes;
        printf("writing %lld bytes at offset %lld\n", write_sz_bytes, offset_sz_bytes);
    }
    return 0;
}
Re: slow 'check'
I suggest you test all drives concurrently with dd: start dd on sda, then sdb, slowly one after the other, and see whether the throughput degrades. Use iostat. Furthermore, dd is not the measure for random access. On 2/10/07, Bill Davidsen [EMAIL PROTECTED] wrote: Justin Piszcz wrote: On Sat, 10 Feb 2007, Eyal Lebedinsky wrote: Justin Piszcz wrote: On Sat, 10 Feb 2007, Eyal Lebedinsky wrote: I have a six-disk RAID5 over sata. First two disks are on the mobo and last four are on a Promise SATA-II-150-TX4. The sixth disk was added recently and I decided to run a 'check' periodically, and started one manually to see how long it should take. Vanilla 2.6.20. A 'dd' test shows: # dd if=/dev/md0 of=/dev/null bs=1024k count=10240 10240+0 records in 10240+0 records out 10737418240 bytes transferred in 84.449870 seconds (127145468 bytes/sec) This is good for this setup. A check shows: $ cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] 1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU] [] check = 0.8% (2518144/312568576) finish=2298.3min speed=2246K/sec unused devices: <none> which is an order of magnitude slower (the speed is per-disk, call it 13MB/s for the six). There is no activity on the RAID. Is this expected? I assume that the simple dd does the same amount of work (don't we check parity on read?). I have these tweaked at bootup: echo 4096 > /sys/block/md0/md/stripe_cache_size ; blockdev --setra 32768 /dev/md0. Changing the above parameters seems to not have a significant effect. The check logs the following: md: data-check of RAID array md0 md: minimum _guaranteed_ speed: 1000 KB/sec/disk. md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for data-check. md: using 128k window, over a total of 312568576 blocks. Does it need a larger window (whatever a window is)? If so, can it be set dynamically?
TIA -- Eyal Lebedinsky ([EMAIL PROTECTED]) http://samba.org/eyal/ attach .zip as .dat As you add disks onto the PCI bus it will get slower. For 6 disks you should get faster than 2MB/s however.. You can try increasing the min speed of the raid rebuild. Interesting - this does help. I wonder why it used much more i/o by default before. It still uses only ~16% CPU. # echo 2 > /sys/block/md0/md/sync_speed_min # echo check > /sys/block/md0/md/sync_action ... wait about 10s for the process to settle... # cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] 1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU] [] check = 0.1% (364928/312568576) finish=256.6min speed=20273K/sec # echo idle > /sys/block/md0/md/sync_action Raising it further only manages about 21MB/s (the _max is set to 200MB/s) as expected; this is what the TX4 delivers with four disks. I need a better controller (or is the linux driver slow?). Justin. You are maxing out the PCI bus; remember each bit/parity/verify operation has to go to each disk. If you get an entirely PCI-e system you will see rates of 50-100-150-200MB/s easily. I used to have 10 x 400GB drives on a PCI bus; after 2 or 3 drives you max out the PCI bus. This is why you need PCI-e, where each slot has its own lane of bandwidth. 21MB/s is about right for 5-6 disks; when you go to 10 it drops to about 5-8MB/s on a PCI system. Wait, let's say that we have three drives and 1m chunk size. So we read 1M here, 1M there, and 1M somewhere else, and get 2M data and 1M parity which we check. With five we would read 4M data and 1M parity, but have 4M checked. The end case is that for each stripe we read N*chunk bytes and verify (N-1)*chunk. In fact the data is (N-1)/N of the stripe, and the percentage gets higher (not lower) as you add drives. I see no reason why more drives would be slower; a higher percentage of the bytes read are data.
That doesn't mean that you can't run out of Bus bandwidth, but number of drives is not obviously the issue. -- bill davidsen [EMAIL PROTECTED] CTO TMR Associates, Inc Doing interesting things with small computers since 1979 -- Raz
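Bill's per-stripe arithmetic can be put in a few lines; the 1 MB chunk size is the hypothetical value from his example, not the array's actual 256k chunk:

```python
# During a 'check', each stripe reads N*chunk bytes of which (N-1)*chunk is
# data, so the data fraction (N-1)/N rises as drives are added.
def check_io_per_stripe(ndisks, chunk_mb=1):
    read_mb = ndisks * chunk_mb            # total bytes read per stripe
    data_mb = (ndisks - 1) * chunk_mb      # bytes of that which are data
    return read_mb, data_mb, data_mb / read_mb

print(check_io_per_stripe(3))  # 3 MB read, 2 MB of it data
print(check_io_per_stripe(5))  # 5 MB read, 4 MB of it data (fraction 0.8)
```

This supports his conclusion: more drives raise the useful-data fraction per byte read, so slow checks point at bus bandwidth, not drive count per se.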
Re: slow 'check'
On 2/10/07, Eyal Lebedinsky [EMAIL PROTECTED] wrote: I have a six-disk RAID5 over sata. First two disks are on the mobo and last four are on a Promise SATA-II-150-TX4. The sixth disk was added recently and I decided to run a 'check' periodically, and started one manually to see how long it should take. Vanilla 2.6.20. A 'dd' test shows: # dd if=/dev/md0 of=/dev/null bs=1024k count=10240 10240+0 records in 10240+0 records out 10737418240 bytes transferred in 84.449870 seconds (127145468 bytes/sec) Try dd with a bs of 4x(5x256k) = 5 M. This is good for this setup. A check shows: $ cat /proc/mdstat Personalities : [raid6] [raid5] [raid4] md0 : active raid5 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] 1562842880 blocks level 5, 256k chunk, algorithm 2 [6/6] [UUUUUU] [] check = 0.8% (2518144/312568576) finish=2298.3min speed=2246K/sec unused devices: <none> which is an order of magnitude slower (the speed is per-disk, call it 13MB/s for the six). There is no activity on the RAID. Is this expected? I assume that the simple dd does the same amount of work (don't we check parity on read?). I have these tweaked at bootup: echo 4096 > /sys/block/md0/md/stripe_cache_size ; blockdev --setra 32768 /dev/md0. Changing the above parameters seems to not have a significant effect. Stripe cache size is less effective than in previous versions of raid5, since in some cases it is being bypassed. Why do you check random access to the raid and not sequential access? The check logs the following: md: data-check of RAID array md0 md: minimum _guaranteed_ speed: 1000 KB/sec/disk. md: using maximum available idle IO bandwidth (but not more than 20 KB/sec) for data-check. md: using 128k window, over a total of 312568576 blocks. Does it need a larger window (whatever a window is)? If so, can it be set dynamically?
TIA -- Eyal Lebedinsky ([EMAIL PROTECTED]) http://samba.org/eyal/ attach .zip as .dat -- Raz
Re: bad performance on RAID 5
In order to understand what is going on in your system you should: 1. Determine the access pattern to the volume, meaning: sequential? random access? sync io? async io? mostly read? mostly write? Are you using small buffers? big buffers? 2. Test the controller capability, meaning: see if dd'ing each disk in the system separately reduces the total throughput. On 1/18/07, Sevrin Robstad [EMAIL PROTECTED] wrote: I've tried to increase the cache size - I can't measure any difference. Raz Ben-Jehuda(caro) wrote: did u increase the stripe cache size ? On 1/18/07, Justin Piszcz [EMAIL PROTECTED] wrote: Sevrin Robstad wrote: I'm suffering from bad performance on my RAID5. An echo check > /sys/block/md0/md/sync_action gives a speed of only about 5000K/sec, and HIGH load average: # uptime 20:03:55 up 8 days, 19:55, 1 user, load average: 11.70, 4.04, 1.52 kernel is 2.6.18.1.2257.fc5, mdadm is v2.5.5. The system consists of an Athlon XP 1.2GHz and two Sil3114 4-port S-ATA PCI cards with a total of 6 250GB S-ATA drives connected.
[EMAIL PROTECTED] ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Tue Dec 5 00:33:01 2006
     Raid Level : raid5
     Array Size : 1218931200 (1162.46 GiB 1248.19 GB)
    Device Size : 243786240 (232.49 GiB 249.64 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Wed Jan 17 23:14:39 2007
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0
         Layout : left-symmetric
     Chunk Size : 256K
           UUID : 27dce477:6f45d11b:77377d08:732fa0e6
         Events : 0.58

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
       4       8       65        4      active sync   /dev/sde1
       5       8       81        5      active sync   /dev/sdf1
[EMAIL PROTECTED] ~]# Sevrin If they are on the PCI bus, that is about right, you probably should be getting 10-15MB/s, but it is about right. If you had each drive on its own PCI-e controller, then you would get much faster speeds. -- Raz
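Raz's controller-capability test (dd'ing the member disks concurrently and comparing against single-disk throughput) might look like the sketch below. The device names match Sevrin's six-disk array; the bs/count values are assumptions, and iostat would be run in a second terminal:

```
# Hypothetical sketch: stream-read every member disk in parallel, then
# compare the per-disk rates (watch "iostat -k 1" elsewhere) with the rate
# each disk achieves when read alone. A large drop means the PCI bus or
# controller, not the drives, is the bottleneck.
for d in sda sdb sdc sdd sde sdf; do
    dd if=/dev/$d of=/dev/null bs=1M count=1024 &
done
wait
```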
Re: Odd (slow) RAID performance
On 12/12/06, Bill Davidsen [EMAIL PROTECTED] wrote: Neil Brown wrote: On Friday December 8, [EMAIL PROTECTED] wrote: I have measured very slow write throughput for raid5 as well, though 2.6.18 does seem to have the same problem. I'll double check and do a git bisect and see what I can come up with. Correction... it isn't 2.6.18 that fixes the problem. It is compiling without LOCKDEP or PROVE_LOCKING. I remove those and suddenly a 3 drive raid5 is faster than a single drive rather than much slower. Bill: Do you have LOCKDEP or PROVE_LOCKING enabled in your .config ?? YES and NO respectively. I did try increasing the stripe_cache_size and got better but not anywhere near max performance; perhaps the PROVE_LOCKING is still at fault, although performance of RAID-0 is as expected, so I'm dubious. In any case, by pushing the size from 256 to 1024, 4096, and finally 10240 I was able to raise the speed to 82MB/s, which is right at the edge of what I need. I want to read the doc on stripe_cache_size before going huge; if that's in K, 10MB is a LOT of cache when 256 works perfectly in RAID-0. I noted that the performance really was bad using 2k writes before increasing the stripe_cache; I will repeat that after doing some other real work things. Any additional input appreciated. I would expect the speed to be (Ndisk - 1)*SingleDiskSpeed without a huge buffer, so the fact that it isn't makes me suspect there's unintended serialization or buffering, even when not needed (and NOT wanted). Thanks for the feedback, I'm updating the files as I type.
http://www.tmr.com/~davidsen/RAID_speed http://www.tmr.com/~davidsen/FC6-config
--
bill davidsen [EMAIL PROTECTED]
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979

Bill, hello. I have been working on raid5 write throughput. The whole idea is the access pattern: one should size buffers with respect to the size of a stripe; this way you will be able to eliminate the undesired reads. By accessing it correctly I have managed to reach a write throughput that scales with respect to the number of disks in the raid.
--
Raz
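A minimal sketch of the buffer-sizing rule Raz describes (the geometry below is an example, not taken from his setup): a raid5 write that covers a full stripe lets md compute parity from the new data alone, so none of the read-modify-write pre-reads are needed.

```shell
# Example geometry (assumed): 4-disk raid5, 256 KiB chunk.
NDISKS=4
CHUNK_KB=256
# One full stripe carries (NDISKS - 1) data chunks; the remaining chunk is parity.
STRIPE_KB=$(( (NDISKS - 1) * CHUNK_KB ))
echo "full-stripe write size: ${STRIPE_KB} KiB"
# Writes issued in stripe-aligned, full-stripe units avoid the pre-reads,
# e.g. (illustrative device name):
#   dd if=/dev/zero of=/dev/md0 bs=${STRIPE_KB}k oflag=direct
```

The same arithmetic underlies Raz's later mails about aligning random IO buffers to the stripe width.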
Re: Linux: Why software RAID?
Furthermore, hw controllers are much less feature-rich than sw raid: many different stripe sizes, stripe cache tuning.

On 25 Aug 2006 23:50:34 -0400, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Hardware RAID can be (!= is) more tolerant of serious drive failures where a single drive locks up the bus. A high-end hardware RAID card may be designed with independent controllers so a single drive failure cannot take other spindles down with it. The same can be accomplished with sw RAID of course if the builder is careful to use multiple PCI cards, etc. Sw RAID over your motherboard's onboard controllers leaves you vulnerable.

Which is exactly why I *like* SW RAID - I can, and do, have the mirrors span controllers so a whole controller can fail without taking down the system. With HW RAID cards, if your controller dies, you're SOL.
--
Raz
Re: raid5 write performance
well ... me again. Following your advice I added a deadline for every WRITE stripe head when it is created. In raid5_activate_delayed I checked whether the deadline has expired, and if not I set the sh to preread-active mode. This small fix (and in a few other places in the code) reduced the amount of reads to zero with dd, but with no improvement to throughput. But with random access to the raid (buffers are aligned by the stripe width and with the size of the stripe width) there is an improvement of at least 20%. The problem is that a user must know what he is doing, else there would be a reduction in performance if the deadline is too long (say 100 ms).

raz

On 7/3/06, Neil Brown [EMAIL PROTECTED] wrote:

On Sunday July 2, [EMAIL PROTECTED] wrote: Neil hello. I have been looking at the raid5 code trying to understand why write performance is so poor.

raid5 write performance is expected to be poor, as you often need to pre-read data or parity before the write can be issued.

If I am not mistaken here, it seems that you issue a write in the size of one page and no more, no matter what buffer size I am using.

I doubt the small write size would contribute more than a couple of percent to the speed issue. Scheduling (when to write, when to pre-read, when to wait a moment) is probably much more important.

1. Is this page directed only to the parity disk ?

No. All drives are written in one-page units. Each request is divided into one-page chunks, these one-page chunks are gathered - where possible - into strips, and the strips are handled as units (where a strip is like a stripe, only 1 page wide rather than one chunk wide - if that makes sense).

2. How can i increase the write throughput ?

Look at scheduling patterns - what order are the blocks getting written, do we pre-read when we don't need to, things like that. The current code tries to do the right thing, and it certainly has been worse in the past, but I wouldn't be surprised if it could still be improved.
NeilBrown
--
Raz
raid5.h documentation
Neil hello. You say in raid5.h:

 ... * Whenever the delayed queue is empty and the device is not plugged, we
 * move any strips from delayed to handle and clear the DELAYED flag and set PREREAD_ACTIVE. ...

I do not understand how one can move strips from delayed if delayed is empty. thank you
--
Raz
raid5 write performance
Neil hello. I have been looking at the raid5 code trying to understand why write performance is so poor. If I am not mistaken here, it seems that you issue a write in the size of one page and no more, no matter what buffer size I am using.
1. Is this page directed only to the parity disk ?
2. How can i increase the write throughput ?
Thank you
--
Raz
read perfomance patchset
Neil hello. If i am not mistaken, here, in the first instance of if (bi), you return without setting retry_read_aligned to NULL:

+static struct bio *remove_bio_from_retry(raid5_conf_t *conf)
+{
+	struct bio *bi;
+
+	bi = conf->retry_read_aligned;
+	if (bi) {
-->		return bi;
-->		conf->retry_read_aligned = NULL;
+	}
+	bi = conf->retry_read_aligned_list;
+	if (bi) {
+		conf->retry_read_aligned = bi->bi_next;
+		bi->bi_next = NULL;
+		bi->bi_phys_segments = 1; /* biased count of active stripes */
+		bi->bi_hw_segments = 0; /* count of processed stripes */
+	}
+
+	return bi;
+}
--
Raz
raid 5 read performance
Neil hello. Sorry for the delay - too many things to do. I have implemented all that is said in: http://www.spinics.net/lists/raid/msg11838.html

As always I have some questions:
1. mergeable_bvec I did not understand at first, i must admit. now i do not see how it differs from the one of raid0, so i actually copied it and renamed it.
2. statistics. i have added md statistics since the code returns from make_request before it reaches the statistics there.
3. i have added the new retry list, called toread_aligned, to raid5_conf_t. hope this is correct.
4. your instructions are to add a failed bio to the sh, but they do not say to handle it directly. i have tried it and something is missing here: raid5d handles stripes only if conf->handle_list is not empty. i added a handle_stripe and a release_stripe of my own; this way i managed to get the "R5: read error corrected!!" message from the completion routine. (i have tested by failing a ram disk.)
5. I am going to test the non-common path heavily before submitting you the patch (on real disks, with several file systems and several chunk sizes). It is quite a big patch, so I need to know which kernel you want me to use - i am using poor 2.6.15.
I thank you
--
Raz
improving raid 5 performance
Neil hello.
1. i have applied the common path according to http://www.spinics.net/lists/raid/msg11838.html as much as i can. it looks ok in terms of throughput. before i continue to the non-common path (step 3), i do not understand raid0_mergeable_bvec entirely. as i understand it, the code checks alignment. i made a version for this purpose which looks like this:

static int raid5_mergeable_bvec(request_queue_t *q, struct bio *bio,
				struct bio_vec *biovec)
{
	mddev_t *mddev = q->queuedata;
	sector_t sector = bio->bi_sector + get_start_sect(bio->bi_bdev);
	int max;
	unsigned int chunk_sectors = mddev->chunk_size >> 9;
	unsigned int bio_sectors = bio->bi_size >> 9;

	max = (chunk_sectors - ((sector & (chunk_sectors-1)) + bio_sectors)) << 9;
	if (max < 0) {
		printk("handle_aligned_read not aligned %d %d %d %lld\n",
		       max, chunk_sectors, bio_sectors, sector);
		return -1; /* Is bigger than one chunk size */
	}
	/* printk("handle_aligned_read aligned %d %d %d %lld\n",
	       max, chunk_sectors, bio_sectors, sector); */
	return max;
}

Questions:
1.1 why did you drop the max=0 case ?
1.2 what do these lines mean ? do i need them ?

	if (max <= biovec->bv_len && bio_sectors == 0)
		return biovec->bv_len;
	else
		return max;
}

thank you
Raz
raid 5 read performance
Neil hello. I am measuring the read performance of two raid5s with 7 sata disks each, chunk size 1MB. when i set the stripe_cache_size to 4096 i get 240 MB/s. IO'ing from the two raids together ended with 270 MB/s. i have added code in make_request which bypasses the raid5 logic in the case of a read. it looks like this:

static int make_request(request_queue_t *q, struct bio *bi)
{
	...
	if (conf->raid5_bypass_read && bio_data_dir(bi) == READ) {
		new_sector = raid5_compute_sector(bi->bi_sector, raid_disks,
						  data_disks, &dd_idx, &pd_idx, conf);
		bi->bi_sector = new_sector;
		bi->bi_bdev = conf->disks[dd_idx].rdev->bdev;
		/*
		 * do some statistics
		 */
		disk_stat_inc(mddev->gendisk, ios[rw]);
		disk_stat_add(mddev->gendisk, sectors[rw], bio_sectors(bi));
		/*
		 * make the upper level do the work for me
		 */
		return 1;
	}
	...
}

it increased the performance to 440 MB/s.
Question: What is the cost of not walking through the raid5 code in the case of a READ ? if i add error handling code, will that suffice ?
thank you
--
Raz
Re: Cheap Clustered FS
maybe lustre

On 4/13/06, Erik Mouw [EMAIL PROTECTED] wrote:

On Mon, Apr 10, 2006 at 05:24:34PM -0400, Jon Miller wrote: I have two machines which have redundant paths to the same shared scsi disk. I've had no problem creating the multipath'ed device md0 to handle my redundant pathing. But now I'd like to use a simple FS, such as ext3, mounted rw on the first machine and ro on the second machine. The idea is that the second machine, mounting the FS ro, would be able to read any new data being written in the FS. Everything has been rather easy to set up, but anything being created on the FS is not seen on the other machine with the FS mounted ro. That is, I can create a file on the first machine and I never see that file from the second machine until I remount the FS. At this point, I am actually trying to avoid GFS, OCFS, veritas clustered FS options as well as NFS. If there was a simple hack, that I'm missing, to enable the updates to the FS to be seen in realtime, then I'd actually prefer that method. Any help would be appreciated.

I'm afraid the only way out is indeed GFS or OCFS. Those filesystems are specifically designed to be mounted by several hosts and (should) have caching and locking issues covered.

Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
--
Raz
Re: Cheap Clustered FS
Eric, if the file system is not cluster-aware it will never work. The reason is very simple: a file system is a driver. One driver is a state machine running on machine 1 and the other is running on machine 2; there is no synchronization between the two. If you are a programmer, try implementing a userspace bitmap reader: if machine 1 is the writer, whenever a file is created or modified, send the bitmap to the machine 2 reader. I can give you an xfs bitmap reader if you want.

On 4/13/06, Jon Miller [EMAIL PROTECTED] wrote: Yeah, the Lustre FS looks very promising... I've even considered the CODA filesystem, but since I'll be implementing this solution where management wants support they pay for, it will most likely be GFS as my servers are RHAS 3.0 machines. Thanks for the help, though. BTW, while I was trying to get my _simple_ ext3 solution working, I tried using mount options such as 'sync' and 'dirsync', but as you already know they didn't help. Just for my own benefit, is the reason none of these options would work that all FS IO runs through the VFS, and that is where the caching occurs? In particular, I want to say that the buffer_head kernel buffer is the specific slab that is used for the caching? Thanks, Jon

On 4/13/06, Raz Ben-Jehuda(caro) [EMAIL PROTECTED] wrote: maybe lustre

On 4/13/06, Erik Mouw [EMAIL PROTECTED] wrote: On Mon, Apr 10, 2006 at 05:24:34PM -0400, Jon Miller wrote: I have two machines which have redundant paths to the same shared scsi disk. I've had no problem creating the multipath'ed device md0 to handle my redundant pathing. But now I'd like to use a simple FS, such as ext3, mounted rw on the first machine and ro on the second machine. The idea is that the second machine, mounting the FS ro, would be able to read any new data being written in the FS. Everything has been rather easy to set up, but anything being created on the FS is not seen on the other machine with the FS mounted ro.
That is, I can create a file on the first machine and I never see that file from the second machine until I remount the FS. At this point, I am actually trying to avoid GFS, OCFS, veritas clustered FS options as well as NFS. If there was a simple hack, that I'm missing, to enable the updates to the FS to be seen in realtime, then I'd actually prefer that method. Any help would be appreciated.

I'm afraid the only way out is indeed GFS or OCFS. Those filesystems are specifically designed to be mounted by several hosts and (should) have caching and locking issues covered.

Erik
--
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
--
Raz
How is an IO size determined ?
Neil/Jens Hello. Hope this is not too much bother for you.

Question: how does the pseudo device (/dev/md) change the IO sizes going down into the disks ?

Explanation: I am using software raid5, chunk size is 1024K, 4 disks. I have made a hook in make_request in order to bypass the raid5 IO methodology. I need to control the amount of IOs going down into the disks and their sizes. the hook looks like this:

static int make_request(request_queue_t *q, struct bio *bi)
{
	...
	if (bypass_raid5 && bio_data_dir(bi) == READ) {
		new_sector = raid5_compute_sector(bi->bi_sector, raid_disks,
						  data_disks, &dd_idx, &pd_idx, conf);
		bi->bi_sector = new_sector;
		bi->bi_bdev = conf->disks[dd_idx].rdev->bdev;
		return 1;
	}
	...
}

I have compared the IO sizes and numbers in the deadline elevator. it seems that a single direct IO read of 1MB to a disk is divided into two 1/2 MB request_t's (though max_hw_sectors=2048), and when I go through the raid i am getting three request_t's: 992 sectors followed by 64 sectors followed by 992 sectors. I have also recorded the IOs going into make_request in this scenario; they are composed of 8 124K requests and an additional 32K request.

the test: My test is simple. I am reading the device in direct io mode; no file system is involved.

could you explain this ? why am I not getting two 1/2 MB requests ? Could it be the slab cache ? (biovec256)

Thank you
--
Raz
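As a sanity check on the numbers in this mail (my own arithmetic, not from the thread): the three requests seen below md and the bios seen by the hook both sum to exactly the same 2048 sectors as the original 1 MiB read, so the split is lossless - only the request boundaries move.

```shell
# 512-byte sectors assumed throughout.
BYTES=$(( 1024 * 1024 ))
echo "1 MiB read        = $(( BYTES / 512 )) sectors"
echo "through md        = $(( 992 + 64 + 992 )) sectors"
echo "bios seen by hook = $(( (8 * 124 + 32) * 1024 / 512 )) sectors"
```

This is consistent with md re-segmenting the request (the stripe cache works in one-page, 8-sector units) rather than dropping or duplicating any data.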
Re: question : raid bio sector size
I was referring to bios reaching make_request in raid5.c. I will be more precise. I am dd'ing:

dd if=/dev/md1 of=/dev/zero bs=1M count=1 skip=10

I have added the following printk in make_request:

printk("%d:", bio->bi_size);

I am getting one-sector sizes: 512:512:512:512:512. I suppose they get gathered in the elevator, but still, why so small ? thank you
raz.

On 3/27/06, Neil Brown [EMAIL PROTECTED] wrote:

On Monday March 27, [EMAIL PROTECTED] wrote: i have been playing with raid5 and i noticed that the arriving bios' sizes are 1 sector. why is that and where is it set ?

bios arriving from where? bios from the filesystem to the raid5 device will be whatever size the fs wants to make them. bios from the raid5 device to the component devices will always be 1 page (typically 8 sectors). This is the size used by the stripe cache which is used to synchronise everything.

NeilBrown
--
Raz
Re: question : raid bio sector size
man .. very very good. blockdev --getsz says 512.

On 3/29/06, Neil Brown [EMAIL PROTECTED] wrote:

On Wednesday March 29, [EMAIL PROTECTED] wrote: I was referring to bios reaching make_request in raid5.c. I will be more precise. I am dd'ing dd if=/dev/md1 of=/dev/zero bs=1M count=1 skip=10. I have added the following printk in make_request: printk("%d:", bio->bi_size); I am getting one-sector sizes: 512:512:512:512:512. I suppose they get gathered in the elevator, but still why so small ?

Odd.. When I try that I get 4096 repeatedly. Which kernel are you using? What does blockdev --getbsz /dev/md1 say? Do you have a filesystem mounted on /dev/md1? If so, what sort of filesystem.

NeilBrown
--
Raz
question : raid bio sector size
i have been playing with raid5 and i noticed that the arriving bios' sizes are 1 sector. why is that and where is it set ? thank you
--
Raz
Re: raid5 performance question
Neil. what is the stripe_cache exactly ? First, here are some numbers:

Setting it to 1024 gives me 85 MB/s.
Setting it to 4096 gives me 105 MB/s.
Setting it to 8192 gives me 115 MB/s.

md.txt does not say much about it, just that it is the number of entries. here are some tests i have made:

test 1: when i set the stripe_cache to zero and run dd if=/dev/md1 of=/dev/zero bs=1M count=10 skip=63, i am getting 120 MB/s. when i set the stripe cache to 4096 and issue the same command, i am getting 120 MB/s as well.

test 2: I will describe what this tester does. It opens N descriptors over a device. It issues N IOs to the target and waits for the completion of each IO. When an IO is completed the tester has two choices:
1. calculate a new seek position over the target.
2. move sequentially to the next position, meaning, if one reads a 1MB buffer, the next position is current+1M.
I am using direct IO and asynchronous IO. option 1 simulates non-contiguous files; option 2 simulates contiguous files. the above numbers were made with option 2. if i am using option 1 i am getting 95 MB/s with stripe_cache_size=4096. A single disk in this manner (option 1) gives ~28 MB/s. A single disk in scenario 2 gives ~30 MB/s. I understand that the IO distribution is something to talk about, but i am submitting 250 IOs, so i suppose i am heavy on the raid.

Questions:
1. how can the stripe cache size give me a boost when i have totally random access to the disk ?
2. Does direct IO pass this cache ?
3. How can a dd of 1 MB over a 1MB chunk size achieve these high throughputs of 4 disks even if it does not get the stripe cache benefits ?

thank you
raz.

On 3/7/06, Neil Brown [EMAIL PROTECTED] wrote:

On Monday March 6, [EMAIL PROTECTED] wrote: Neil Hello. I have a performance question. I am using raid5 stripe size 1024K over 4 disks.

I assume you mean a chunksize of 1024K rather than a stripe size. With a 4 disk array, the stripe size will be 3 times the chunksize, and so could not possibly be 1024K.

I am benchmarking it with an asynchronous tester. This tester submits 100 IOs of size 1024 K -- as the stripe size. It reads raw io from the device, no file system is involved. I am making the following comparison: 1. Reading 4 disks at the same time using 1 MB buffers in a random manner. 2. Reading 1 raid5 device using 1MB buffers in a random manner.

If your chunk size is 1MB, then you will need larger sequential reads to get good throughput. You can also try increasing the size of the stripe cache in /sys/block/mdX/md/stripe_cache_size The units are in pages (normally 4K) per device. The default is 256 which fits only one stripe with a 1 Meg chunk size. Try 1024 ?

NeilBrown

I am getting terrible results in scenario 2. if scenario 1 gives 120 MB/s from 4 disks, the raid5 device gives 35 MB/s. it is like i am reading a single disk, but by looking at iostat i can see that all disks are active but with low throughput. Any idea ? Thank you.
--
Raz
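Neil's units turn into simple arithmetic (a worked example I added, using the 1 MiB chunk from this thread): stripe_cache_size counts 4 KiB pages per device, so the default of 256 pages holds exactly one 1 MiB chunk per device, i.e. a single stripe.

```shell
PAGE_KB=4
CHUNK_KB=1024
DEFAULT_PAGES=256
echo "chunks cached per device at the default: $(( DEFAULT_PAGES * PAGE_KB / CHUNK_KB ))"
# Memory footprint if raised to 4096 pages on a 4-device array (example numbers):
PAGES=4096
NDEV=4
echo "cache memory at ${PAGES}: $(( PAGES * PAGE_KB * NDEV / 1024 )) MiB"
# Raising it at runtime (array name assumed):
#   echo 4096 > /sys/block/md0/md/stripe_cache_size
```

So the units are pages per device, not kilobytes, and a large setting costs real memory across all members.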
raid5 performance question
Neil Hello. I have a performance question. I am using raid5, stripe size 1024K, over 4 disks. I am benchmarking it with an asynchronous tester. This tester submits 100 IOs of size 1024 K -- as the stripe size. It reads raw io from the device, no file system is involved. I am making the following comparison:
1. Reading 4 disks at the same time using 1 MB buffers in a random manner.
2. Reading 1 raid5 device using 1MB buffers in a random manner.
I am getting terrible results in scenario 2. if scenario 1 gives 120 MB/s from 4 disks, the raid5 device gives 35 MB/s. it is like i am reading a single disk, but by looking at iostat i can see that all disks are active but with low throughput. Any idea ? Thank you.
--
Raz
Re: raid5 performance question
it reads raw. no filesystem whatsoever.

On 3/6/06, Gordon Henderson [EMAIL PROTECTED] wrote:

On Mon, 6 Mar 2006, Raz Ben-Jehuda(caro) wrote: Neil Hello. I have a performance question. I am using raid5, stripe size 1024K, over 4 disks. I am benchmarking it with an asynchronous tester. This tester submits 100 IOs of size 1024 K -- as the stripe size. It reads raw io from the device, no file system is involved. I am making the following comparison: 1. Reading 4 disks at the same time using 1 MB buffers in a random manner. 2. Reading 1 raid5 device using 1MB buffers in a random manner. I am getting terrible results in scenario 2. if scenario 1 gives 120 MB/s from 4 disks, the raid5 device gives 35 MB/s. it is like i am reading a single disk, but by looking at iostat i can see that all disks are active but with low throughput. Any idea ?

Is this reading the block device direct, or via a filesystem? If the latter, what filesystem? If ext2/3 have you tried mkfs with a stride option? See: http://www.tldp.org/HOWTO/Software-RAID-HOWTO-5.html#ss5.11

Gordon
--
Raz
Re: NCQ general question
Is NCQ supported when setting the controller to JBOD instead of using HW raid?

On 3/5/06, Eric D. Mudama [EMAIL PROTECTED] wrote:

On 3/4/06, Steve Byan [EMAIL PROTECTED] wrote: On Mar 4, 2006, at 2:10 PM, Jeff Garzik wrote: Measurements on NCQ in the field show a distinct performance improvement... 30% has been measured on Linux. Nothing to sneeze at. Wow! 30% is amazing. I'd be interested in knowing how the costs break down; are these measurements published anywhere?

Full-stroke random reads with small operations (4k or less) typically show 75-85% performance improvement, from the ability of a 7200rpm drive to carve 4ms out of their response time, as well as a huge chunk of seek distance. Random writes, since as you said they're already reordered with cache enabled, don't typically show any sort of increase in desktop applications. NCQ FUA writes or NCQ writes with cache disabled should show the same ballpark performance improvement as random reads in saturated workloads. Again however, this is for the full-stroke random case. Local-area workloads need to be analyzed more thoroughly, and may differ in performance gain by manufacturer.

--eric
--
Raz
Re: NCQ general question
Thank you Mr Garzik. Is there a list of all drivers and the features they provide ?
Raz.

On 3/2/06, Jeff Garzik [EMAIL PROTECTED] wrote:

Jens Axboe wrote: (don't top post) On Thu, Mar 02 2006, Raz Ben-Jehuda(caro) wrote: i can see the NCQ really bothers people. i am using a promise card, sata TX4 150. does any of you have a patch for the driver so it would support NCQ ?

I don't know of any documentation for the promise cards (or whether they support NCQ). Does the binary promise driver support NCQ? Jeff likely knows a lot more.

The sata2 tx4 150 supports NCQ, and I have docs. sata tx4 150 does not support NCQ.

Jeff
--
Raz
Re: raid5 read performance
1. it is not good to use so many disks in one raid. it means that in degraded mode 10 disks would be needed to reconstruct one slice of data.
2. i did not understand what the raid's purpose is.
3. 10 MB/s is very slow. what sort of disks do you have ?
4. what is the raid stripe size ?

On 1/4/06, JaniD++ [EMAIL PROTECTED] wrote:

----- Original Message ----- From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED] To: JaniD++ [EMAIL PROTECTED] Cc: Linux RAID Mailing List linux-raid@vger.kernel.org Sent: Wednesday, January 04, 2006 2:49 PM Subject: Re: raid5 read performance

1. do you want the code ?

Yes - if it is not too difficult to set up. I use 4 big raid5 arrays (4 disk nodes), and the performance is not too good. My standalone disk can do ~50MB/s, but 11 disks in one raid array do only ~150Mbit/s (with linear read using dd). At this time i think this is my system's pci-bus bottleneck. But on normal use, with random seeks, i am happy if one disk-node can do 10MB/s ! :-( That's why i am guessing this...

2. I managed to gain linear performance with raid5. it seems that both raid 5 and raid 0 are caching read-ahead buffers. raid 5 cached a small amount of read-ahead while raid0 did not.

Aham. But... I don't understand... You wrote that the RAID5 is slower than RAID0. Is the read-ahead buffering/caching bad for performance?

Cheers, Janos

On 1/4/06, JaniD++ [EMAIL PROTECTED] wrote:

----- Original Message ----- From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED] To: Mark Hahn [EMAIL PROTECTED] Cc: Linux RAID Mailing List linux-raid@vger.kernel.org Sent: Wednesday, January 04, 2006 9:14 AM Subject: Re: raid5 read performance

I guess i was not clear enough. i am using raid5 over 3 maxtor disks. the chunk size is 1MB. i measured the io coming from one disk alone when I READ from it with 1MB buffers, and i know that it is ~32MB/s. I created raid0 over two disks and my throughput grew to 64 MB/s. Doing the same thing with raid5 ended in 32 MB/s. I am using async io since i do not want to wait for several disks when i send an IO. By sending a buffer which is stripe-aligned i am supposed to have a one-to-one relation between a disk and an io. iostat shows that all of the three disks work but not fully.

Hello, How do you set sync/async io? Please, let me know! :-) Thanks, Janos
--
Raz
Re: raid5 read performance
I guess i was not clear enough. i am using raid5 over 3 maxtor disks. the chunk size is 1MB. i measured the io coming from one disk alone when I READ from it with 1MB buffers, and i know that it is ~32MB/s. I created raid0 over two disks and my throughput grew to 64 MB/s. Doing the same thing with raid5 ended in 32 MB/s. I am using async io since i do not want to wait for several disks when i send an IO. By sending a buffer which is stripe-aligned i am supposed to have a one-to-one relation between a disk and an io. iostat shows that all of the three disks work but not fully.

On 1/3/06, Mark Hahn [EMAIL PROTECTED] wrote:

I am checking raid5 performance.

reads or writes?

I am using asynchronous ios with buffer size as the stripe size.

why do you think async matters?

In this case i am using a stripe size of 1M with 2+1 disks.

do you mean that md says you have 512k chunks?

Unlike raid0, raid5 drops the performance by 50%.

that's slightly unclear: -50% relative to what? a raw single disk? is this reads or writes? strictly bandwidth, and if so, do you have multiple outstanding reads?

Is it because it does parity checkings ?

non-degraded R5 doesn't do parity checks on reads, afaik.
--
Raz
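The one-buffer-one-disk idea above can be sketched with dd (the device name is an assumption for illustration; the chunk size is the 1 MB from the mail): a direct read whose size equals the chunk and whose offset is a chunk multiple lands entirely on a single member disk.

```shell
CHUNK_KB=1024   # chunk size from the mail
N=5             # which chunk to read (arbitrary example)
echo "chunk ${N} starts $(( N * CHUNK_KB )) KiB into the array"
# A chunk-sized, chunk-aligned O_DIRECT read is serviced by one member disk:
#   dd if=/dev/md0 of=/dev/null bs=${CHUNK_KB}k skip=${N} count=1 iflag=direct
```

With many such reads outstanding (the async io in the mail), each one keeps exactly one spindle busy, which is why misaligned buffers that straddle a chunk boundary cost throughput.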
Re: raid5 read performance
1. do you want the code ?
2. I managed to gain linear performance with raid5. it seems that both raid 5 and raid 0 are caching read-ahead buffers. raid 5 cached a small amount of read-ahead while raid0 did not.

On 1/4/06, JaniD++ [EMAIL PROTECTED] wrote:

----- Original Message ----- From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED] To: Mark Hahn [EMAIL PROTECTED] Cc: Linux RAID Mailing List linux-raid@vger.kernel.org Sent: Wednesday, January 04, 2006 9:14 AM Subject: Re: raid5 read performance

I guess i was not clear enough. i am using raid5 over 3 maxtor disks. the chunk size is 1MB. i measured the io coming from one disk alone when I READ from it with 1MB buffers, and i know that it is ~32MB/s. I created raid0 over two disks and my throughput grew to 64 MB/s. Doing the same thing with raid5 ended in 32 MB/s. I am using async io since i do not want to wait for several disks when i send an IO. By sending a buffer which is stripe-aligned i am supposed to have a one-to-one relation between a disk and an io. iostat shows that all of the three disks work but not fully.

Hello, How do you set sync/async io? Please, let me know! :-) Thanks, Janos
--
Raz
raid5 read performance
I am checking raid5 performance. I am using asynchronous ios with buffer size as the stripe size. In this case i am using a stripe size of 1M with 2+1 disks. Unlike raid0, raid5 drops the performance by 50%. Why ? Is it because it does parity checkings ? thank you
--
Raz
Re: raid 0 read performance
what does wrt stand for?

On 12/29/05, Mark Overmeer [EMAIL PROTECTED] wrote:
> * Raz Ben-Jehuda(caro) ([EMAIL PROTECTED]) [051229 10:10]:
> > I have tested the overhead of linux raid0. I used two Maxtor Atlas SCSI disks (147 GB) and combined them into a single raid0 volume. The raid is striped in 256K stripes.
> Are you sure you tested linux overhead? Maybe you have just tested raid0 properties.
> > I filled the raid0 up to the maximum with files over an xfs file system. I checked the performance of reading 60 files like that: while need to read, for every file read 0.5M from the file. I got 50 MB/s. Armed with this knowledge I went and did the same test over one disk and I got 32 MB/s. Question: why this performance drop?
> Good performance for such small reads wrt the block-size... Ok, simple calculation: the average seek time for one disk is half a rotation; for two disks it is 1 - (1-0.5)*(1-0.5) = 0.75 rotation. Without taking into account the time to do the actual reads: 0.5/0.75 * (2x32MB/s) = 43MB/s (a 33% performance drop). The only effect you see is that the probability that both disks are in the optimal position to read from decreases. Solution: take files very large wrt the stripe-size to get double performance. Or take files smaller than the stripe-size. Of course, there can be other reasons which reduce the performance as well. However, I achieve 200MB/s over 4 striped disks, each capable of 50MB/s, for huge files and 64K stripes... Linux doesn't seem to be the bottleneck in my setup.
> -- Regards, MarkOv
> Mark Overmeer MSc, MARKOV Solutions, http://Mark.Overmeer.net http://solutions.overmeer.net

-- Raz
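Mark's rotational-latency estimate above can be checked numerically. The 32MB/s per-disk figure and the 0.5 vs. 0.75 rotation ratios come from the thread; the rotation fractions are scaled by 100 below only to stay in shell integer arithmetic.

```shell
#!/bin/sh
# Worked version of the estimate quoted above:
# expected raid0 throughput =
#   (one-disk seek fraction / two-disk seek fraction) * combined bandwidth
SINGLE_DISK_MBS=32    # measured single-disk rate from the thread
ONE_DISK_ROT=50       # avg seek for one disk = 0.5 rotation (x100)
TWO_DISK_ROT=75       # 1 - (1-0.5)^2 = 0.75 rotation (x100)

# 0.5/0.75 * (2 * 32 MB/s) ~= 43 MB/s
EXPECTED=$(( ONE_DISK_ROT * 2 * SINGLE_DISK_MBS / TWO_DISK_ROT ))
echo "${EXPECTED} MB/s"    # prints "42 MB/s" (integer rounding of ~42.7)
```

This matches the ~43MB/s figure in the post, against the 50MB/s actually measured, so the rotational-position effect accounts for most but not all of the observed gap.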
Re: RAID0 performance question
look at the cpu consumption.

On 11/26/05, JaniD++ [EMAIL PROTECTED] wrote:
> Hello list,
> I have been searching for the bottleneck of my system, and found something which I can't clearly understand. I use NBD with 4 disk nodes. (raidtab is at the bottom of the mail)
> cat /dev/nb# > /dev/null makes ~350 Mbit/s on each node.
> cat /dev/nb0 + nb1 + nb2 + nb3 in parallel makes ~780-800 Mbit/s - I think this is my network bottleneck.
> But cat /dev/md31 > /dev/null (RAID0, the sum of the 4 nodes) only makes ~450-490 Mbit/s, and I don't know why. Somebody have an idea? :-)
> (nb31, 30, 29, 28 are only possible mirrors)
> Thanks, Janos
>
> raiddev /dev/md1
>     raid-level 1
>     nr-raid-disks 2
>     chunk-size 32
>     persistent-superblock 1
>     device /dev/nb0
>     raid-disk 0
>     device /dev/nb31
>     raid-disk 1
>     failed-disk /dev/nb31
> raiddev /dev/md2
>     raid-level 1
>     nr-raid-disks 2
>     chunk-size 32
>     persistent-superblock 1
>     device /dev/nb1
>     raid-disk 0
>     device /dev/nb30
>     raid-disk 1
>     failed-disk /dev/nb30
> raiddev /dev/md3
>     raid-level 1
>     nr-raid-disks 2
>     chunk-size 32
>     persistent-superblock 1
>     device /dev/nb2
>     raid-disk 0
>     device /dev/nb29
>     raid-disk 1
>     failed-disk /dev/nb29
> raiddev /dev/md4
>     raid-level 1
>     nr-raid-disks 2
>     chunk-size 32
>     persistent-superblock 1
>     device /dev/nb3
>     raid-disk 0
>     device /dev/nb28
>     raid-disk 1
>     failed-disk /dev/nb28
> raiddev /dev/md31
>     raid-level 0
>     nr-raid-disks 4
>     chunk-size 32
>     persistent-superblock 1
>     device /dev/md1
>     raid-disk 0
>     device /dev/md2
>     raid-disk 1
>     device /dev/md3
>     raid-disk 2
>     device /dev/md4
>     raid-disk 3

-- Raz
Re: comparing FreeBSD to linux
fetching a random 1 MB block from a disk is approximately 20MB/s for a SATA disk (randomly, all over the disk, and through a file system). God help us if the CPU were that slow.

On 11/21/05, Guy [EMAIL PROTECTED] wrote:
> From: Raz Ben-Jehuda(caro), Sent: Sunday, November 20, 2005 6:50 AM, Subject: comparing FreeBSD to linux
> > I have evaluated which is better in terms of cpu load when dealing with raid: FreeBSD's vinum or linux raid. When I issued a huge amount of read IOs to linux raid I got 93% cpu load (7% idle), while I had 10% cpu load (90% idle) on freebsd.
> Maybe a silly question... But is it 9.3 times faster under Linux? :) That would explain the 9.3 times increase in CPU load. It is important that you compare the CPU load at the same disk rate, or at least factor in the disk rate. Guy
> > I need to switch to linux from freebsd. I am using the 2.6.6 kernel on linux. Is this problem a known issue in linux? Is it fixed?

-- Raz
Re: comparing FreeBSD to linux
What sort of a test is it? What filesystem? I am reading 50 files concurrently. Are you reading one file or several files?

On 11/21/05, Guy [EMAIL PROTECTED] wrote:
> > fetching a random 1 MB block from a disk is approximately 20MB/s for a sata disk (randomly, all over the disk and over a file system). god help us if the cpu were that slow.
> If your CPU load is 90% at 20MB/sec then you have a problem. I get about 56MB per sec on my filesystem, which is on a RAID5. The CPU load is less than 50%. I have a P3-500MHz 2-CPU system. VERY OLD! I have Linux 2.4.31. I don't know if the newer 2.6 is better or not. Guy
> [earlier messages in the thread trimmed]
-- Raz
Re: comparing FreeBSD to linux
Well, I have tested the disk with a new tester I have written. It seems that the ata driver causes the high cpu, and not raid.

On 11/21/05, Raz Ben-Jehuda(caro) [EMAIL PROTECTED] wrote:
> What sort of a test is it? What filesystem? I am reading 50 files concurrently. Are you reading one file or several files?
> [earlier messages in the thread trimmed]
-- Raz
Re: comparing FreeBSD to linux
lspci:
00:00.0 Host bridge: Intel Corp.: Unknown device 2588 (rev 05)
00:01.0 PCI bridge: Intel Corp.: Unknown device 2589 (rev 05)
00:02.0 VGA compatible controller: Intel Corp.: Unknown device 258a (rev 05)
00:1c.0 PCI bridge: Intel Corp.: Unknown device 2660 (rev 03)
00:1c.1 PCI bridge: Intel Corp.: Unknown device 2662 (rev 03)
00:1e.0 PCI bridge: Intel Corp. 82801BA/CA/DB PCI Bridge (rev d3)
00:1f.0 ISA bridge: Intel Corp.: Unknown device 2640 (rev 03)
00:1f.1 IDE interface: Intel Corp.: Unknown device 266f (rev 03)
00:1f.2 IDE interface: Intel Corp.: Unknown device 2652 (rev 03)
00:1f.3 SMBus: Intel Corp.: Unknown device 266a (rev 03)
01:00.0 PCI bridge: Intel Corp.: Unknown device 032c (rev 09)
01:00.1 PIC: Intel Corp.: Unknown device 0326 (rev 09)
03:00.0 Ethernet controller: Broadcom Corporation: Unknown device 1659 (rev 11)
04:00.0 Ethernet controller: Broadcom Corporation: Unknown device 1659 (rev 11)

.config:
...
# SCSI low-level drivers
CONFIG_BLK_DEV_3W_XXXX_RAID=y
CONFIG_SCSI_3W_9XXX=y
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
CONFIG_SCSI_SATA=y
# CONFIG_SCSI_SATA_AHCI is not set
# CONFIG_SCSI_SATA_SVW is not set
CONFIG_SCSI_ATA_PIIX=y
# CONFIG_SCSI_SATA_NV is not set
CONFIG_SCSI_SATA_PROMISE=y
# CONFIG_SCSI_SATA_QSTOR is not set
# CONFIG_SCSI_SATA_SX4 is not set
CONFIG_SCSI_SATA_SIL=y
# CONFIG_SCSI_SATA_SIS is not set
# CONFIG_SCSI_SATA_ULI is not set
# CONFIG_SCSI_SATA_VIA is not set
# CONFIG_SCSI_SATA_VITESSE is not set
...

I am using a Supermicro board; the SATA controller is onboard. I know that when I use a controller such as 3ware, which hides the disks and exports SCSI disks, I have much less cpu load. The cpu is a Xeon 3.0 GHz. This is the storage information from dmesg:
ata1: SATA max UDMA/133 cmd 0xE900 ctl 0xEA02 bmdma 0xED00 irq 10
ata2: SATA max UDMA/133 cmd 0xEB00 ctl 0xEC02 bmdma 0xED08 irq 10
ata1: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4673 85:7c69 86:3e01 87:4663 88:207f
ata1: dev 0 ATA, max UDMA/133, 490234752 sectors: lba48
ata1: dev 0 configured for UDMA/133
scsi0 : ata_piix
ata2: dev 0 cfg 49:2f00 82:7c6b 83:7f09 84:4673 85:7c69 86:3e01 87:4663 88:207f
ata2: dev 0 ATA, max UDMA/133, 490234752 sectors: lba48
ata2: dev 0 configured for UDMA/133
scsi1 : ata_piix
Vendor: ATA  Model: Maxtor 7L250S0  Rev: BANC
Type: Direct-Access  ANSI SCSI revision: 05
Vendor: ATA  Model: Maxtor 7L250S0  Rev: BANC
Type: Direct-Access  ANSI SCSI revision: 05
SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sda: drive cache: write back
/dev/scsi/host0/bus0/target0/lun0: p1 p2 p3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sdb: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 490234752 512-byte hdwr sectors (251000 MB)
SCSI device sdb: drive cache: write back
/dev/scsi/host1/bus0/target0/lun0: p1 p2 p3
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
Attached scsi generic sg1 at scsi1, channel 0, id 0, lun 0, type 0
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on isa0060/serio0
i2c /dev entries driver
md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4

On 11/21/05, Jeff Garzik [EMAIL PROTECTED] wrote:
> On Mon, Nov 21, 2005 at 10:15:11AM -0800, Raz Ben-Jehuda(caro) wrote:
> > Well, i have tested the disk with a new tester i have written. it seems that the ata driver causes the high cpu and not raid.
> Which drivers are you using?
> lspci and kernel .config?
> Jeff

-- Raz
Re: RAID5 question.
read the blockdev man page

On Thu, 2005-08-04 at 16:06 +0200, [EMAIL PROTECTED] wrote:
> Hi list, Neil! I have a little question because I've got some performance problem... Does RAID5 do any type of readahead with the default config? Is it possible to disable the readahead? (if it is.) Or does just the io_sched do that on the disks? How can I disable all readahead on top of the RAID5 array? Can RAID5 do multiple small (4kb) reads of disks for multithreaded read requests? (chunk size is 32k)
> Thanks for helping! Janos

-- Raz
Long live the penguin
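For reference, the readahead knob on an md device that the blockdev man page describes can be read and set like this. /dev/md0 is an example device, and --getra/--setra work in 512-byte sectors, so the helper at the end converts to kilobytes:

```shell
#!/bin/sh
# Sketch: inspect and change the readahead of an md array with blockdev.
# /dev/md0 is an example device; --getra/--setra count 512-byte sectors.
DEV=${1:-/dev/md0}

if [ -b "$DEV" ]; then
    blockdev --getra "$DEV"      # current readahead, in sectors
    blockdev --setra 0 "$DEV"    # disable readahead on the array
    blockdev --setra 256 "$DEV"  # set a 128KB readahead back
fi

# sectors -> kilobytes, for interpreting --getra output:
ra_kb() { echo $(( $1 * 512 / 1024 )); }
ra_kb 256    # prints 128
```

Setting readahead needs root, and as noted later in this thread the same knob exists per raw disk (/sys/block/*/queue/read_ahead_kb, already in KB there), so the array and its members can be tuned independently.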
Re: RAID5 question.
take a look at /proc/sys/dev/raid/speed_limit_max. It is in kilobytes, if I recall correctly.

On Thu, 2005-08-04 at 17:45 +0200, [EMAIL PROTECTED] wrote:
> Thanks a lot to you and Raz! The raw devices' readahead I have already set with hdparm and /sys/block/*/queue/read_ahead_kb, but the md device is only changeable with blockdev. :-) I don't use lvm, because it is too slow for me. But unfortunately the problem is still here for me. :( I continue searching for the bottleneck... Thanks! Janos
>
> ----- Original Message ----- From: David Greaves, Sent: Thursday, August 04, 2005 4:56 PM, Subject: Re: RAID5 question.
> > And notice you can apply different readahead to: the raw devices (/dev/sda), the md device (/dev/mdX), any lvm device (/dev/lvm_name/lvm_device). David
> > [earlier messages in the thread trimmed]
-- Raz
Long live the penguin
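As a footnote on the speed_limit sysctls mentioned above: they throttle resync/rebuild traffic, not normal reads, and the values are in KB/s. A minimal sketch (the paths are the standard md sysctls; the 50000 figure is just an example cap):

```shell
#!/bin/sh
# Sketch: read and set the md resync throttle. Values are KB/s;
# these limits apply to resync/rebuild traffic, not to normal IO.
MIN=/proc/sys/dev/raid/speed_limit_min
MAX=/proc/sys/dev/raid/speed_limit_max

if [ -r "$MAX" ]; then
    cat "$MIN" "$MAX"        # current limits, in KB/s
    # echo 50000 > "$MAX"    # example: cap resync at ~50 MB/s (needs root)
fi

# KB/s -> MB/s, for reading the limits at a glance:
kbs_to_mbs() { echo $(( $1 / 1024 )); }
kbs_to_mbs 204800    # prints 200
```

So this knob answers Janos's later question ("is it only for resync?") in the affirmative; it will not explain a read-path bottleneck.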
Re: RAID5 question.
how is the disks' performance? is it OK?

On Thu, 2005-08-04 at 18:13 +0200, [EMAIL PROTECTED] wrote:
> Yes, it is there, I know it. But it is only for resync, or not? :-)
> [earlier messages in the thread trimmed]
-- Raz
Long live the penguin
Re: 3ware raid question
I know that some raid management information is saved on the disks. But I am using 250 GB disks at minimum; dd'ing that amount would take too long. Does anyone know the exact position to dd?

On Tue, 2005-08-02 at 13:27 -0700, Jason Leach wrote:
> Raz: The 3ware (at least my 9500S-8) keeps the info about the disk and how it fits into the RAID array on the disk; I think the DCB is used for this. You can (and I have) unplug the disks, then connect them to different ports, and the array will still work fine. When you are adding a disk from a different array it probably has the DCB from that array. This is conflicting with your new array. I would do as Dan suggests, and clear the disk. Jason.
>
> On 8/2/05, Dan Stromberg [EMAIL PROTECTED] wrote:
> > On Tue, 2005-08-02 at 13:52 +0300, Raz Ben-Jehuda(caro) wrote:
> > > i have encountered a weird feature of 3ware raid. When i try to put into an existing raid a disk which belonged to a different 3ware raid, it fails. Any idea anyone?
> > Two thoughts: 1) Maybe test the disk in another machine and see if it's still good. 2) Maybe wipe the disk clean in another machine with something like: dd if=/dev/zero of=/dev/sdb ...but be very careful to get the right disk (IE, you may not need /dev/sdb, you might need /dev/hdd or a number of other possibilities).

-- Raz
Long live the penguin
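Since the exact DCB location is not given in this thread, a common shortcut is to zero only the start and the end of the drive, where RAID metadata usually lives, instead of dd'ing the whole 250 GB. This is a sketch under that assumption (the DCB position is a guess; /dev/sdb and the 16 MB wipe size are examples):

```shell
#!/bin/sh
# Sketch: wipe only the regions where RAID metadata is usually kept,
# instead of zeroing an entire 250 GB drive. /dev/sdb is an example --
# triple-check the device name before running either dd below.
WIPE_MB=16    # zero this much at each end of the disk

# First 16 MB (partition table, most RAID superblocks):
#   dd if=/dev/zero of=/dev/sdb bs=1M count=16
#
# Last 16 MB (md 0.90 superblocks live near the end; the 3ware DCB
# plausibly does too -- this is an assumption, not documented here):
#   dd if=/dev/zero of=/dev/sdb bs=1M count=16 seek=$(tail_offset SIZE_MB)

# Offset (in MB) where the tail wipe starts, given the disk size in MB.
# The size comes from: blockdev --getsize64 /dev/sdb, divided by 1048576.
tail_offset() { echo $(( $1 - WIPE_MB )); }

tail_offset 238418    # a 250 GB disk is ~238418 MB; prints 238402
```

If this guess misses the DCB, the definitive route is the controller's own tools (or the unlock step mentioned elsewhere in this thread) rather than dd.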
Re: 3ware raid question
unlock the drive?

On Wed, 2005-08-03 at 10:26 -0400, Mike Dresser wrote:
> On Tue, 2 Aug 2005, Raz Ben-Jehuda(caro) wrote:
> > i have encountered a weird feature of 3ware raid. When i try to put into an existing raid a disk which belonged to a different 3ware raid, it fails. Any idea anyone?
> It's a feature on the 9500S's. You have to unlock the drive to be able to use it in another machine. Protects you from accidentally wiping out a raid volume disk. Mike

-- Raz
Long live the penguin
3ware raid question
i have encountered a weird feature of 3ware raid. When I try to put into an existing raid a disk which belonged to a different 3ware raid, it fails. Any idea anyone?

-- Raz
Long Live the Penguin