Hello Jiri,
The high load may be caused by I/O wait (check with sar -u). In any case, 30 MB/s seems slow for an FC array of any kind.
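If sar isn't installed, it comes with the sysstat package on RHEL. A quick way to watch for I/O wait while the problem is happening (just a sketch; the interval and count are arbitrary):

  sar -u 2 10      # watch the %iowait column
  iostat -x 2 10   # extended per-device stats; high await/%util on one
                   # LUN while the box feels slow points at that device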
I don't know the DS4300 at all, but if you're using a SAN or an FC loop to connect to the array, here are a few things you might want to look at:

- What kind of disks are in your DS4300: 10k or 15k rpm FC disks? Did you check how heavily used the disks were during the transfers? (There should be software shipped with the array for that, perhaps even an embedded web server.)
- Did you monitor the array's Fibre Channel adapter activity? (Unless you're the sole user of the array and no other server can hit the same physical disks, in which case you're most likely not overloading it.)
- Do you have multiple paths from your server to your switch and/or to your array? Even if the array is only active/passive and 2 Gb/s, multiple paths give you redundancy and, with the right configuration, better performance.
- What kind of data is the FS holding (many little files, hundreds of thousands of files, ...)? Tuning the FS or switching to a different FS type can help.
- If none of the above turns up a bottleneck, striping might help (that's what we use here on active/active DMX arrays), but take care not to end up on the same physical disks at the array's block level; see the sketch after this list.
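For the multipath and striping points, roughly what I have in mind (only a sketch: the LV name, size and stripe size are made up, and you would need enough free space in the VG, or a second set of LUNs, to build the striped volume and migrate onto it):

  # how many paths does each LUN really have (device-mapper-multipath)
  multipath -ll

  # a new LV striped across all 7 LUNs with a 256 KiB stripe size
  lvcreate -i 7 -I 256 -L 2.2T -n newhome-striped array-vg
  mkfs.ext3 /dev/array-vg/newhome-striped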
My 2c,
Vincent

On Wed, 24 Mar 2010, Jiri Novosad wrote:

Hello,

we have a problem with our disk array. It might even be in HW, I'm not sure. The array holds home directories of our users + mail.

HW configuration:
- an HP DL585 server with four 6-core Opterons and 128 GiB RAM
- array: IBM DS4300 with 7 LUNs, each a RAID5 with 4 disks (250 GB)
- Fibre Channel: QLogic Corp. ISP2432-based 4Gb Fibre Channel to PCI Express HBA (the array only supports 2Gb)
- NCQ queue depth is 32

SW configuration:
- RHEL5.3
- the home partition is a linear LVM volume:

# lvdisplay -m /dev/array-vg/newhome
  --- Logical volume ---
  LV Name                /dev/array-vg/newhome
  VG Name                array-vg
  LV UUID                9XxWH5-5yv4-t661-K24d-Hdzg-G0aW-zUxRul
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                2.18 TB
  Current LE             571393
  Segments               9
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:7

  --- Segments ---
  Logical extent 0 to 66998:
    Type                linear
    Physical volume     /dev/sda
    Physical extents    111470 to 178468

  Logical extent 66999 to 133997:
    Type                linear
    Physical volume     /dev/sdb
    Physical extents    111470 to 178468

  Logical extent 133998 to 200996:
    Type                linear
    Physical volume     /dev/sdc
    Physical extents    111470 to 178468

  Logical extent 200997 to 267995:
    Type                linear
    Physical volume     /dev/sdd
    Physical extents    111470 to 178468

  Logical extent 267996 to 334994:
    Type                linear
    Physical volume     /dev/sde
    Physical extents    111470 to 178468

  Logical extent 334995 to 401993:
    Type                linear
    Physical volume     /dev/sdf
    Physical extents    111470 to 178468

  Logical extent 401994 to 468992:
    Type                linear
    Physical volume     /dev/sdg
    Physical extents    111470 to 178468

  Logical extent 468993 to 527946:
    Type                linear
    Physical volume     /dev/sdg
    Physical extents    15945 to 74898

  Logical extent 527947 to 571392:
    Type                linear
    Physical volume     /dev/sdc
    Physical extents    15945 to 59390

All LUNs use the deadline scheduler.

Now the problem: whenever there is a 'large' write (on the order of hundreds of megabytes), the system load rises considerably. Inspection using iostat shows that, from something like this:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             373.00         8.00      7792.00          8       7792
sdb              11.00         8.00        80.00          8         80
sdc              13.00         8.00        96.00          8         96
sdd               9.00         8.00        80.00          8         80
sde              23.00         8.00       296.00          8        296
sdf               9.00         8.00        80.00          8         80
sdg               5.00         8.00        32.00          8         32

after a

$ dd if=/dev/zero of=file bs=$((2**20)) count=128

it goes to this:

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sde              31.00         8.00     28944.00          8      28944
sdf               1.00         8.00         0.00          8          0
sdg               1.00         8.00         0.00          8          0

and when I generate some reads it goes from

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda             171.00      3200.00      3448.00       3200       3448
sdb              24.00      3336.00        56.00       3336         56
sdc              17.00      3280.00        16.00       3280         16
sdd              15.00      3208.00        24.00       3208         24
sde              18.00      3200.00        56.00       3200         56
sdf              18.00      3192.00        40.00       3192         40
sdg              23.00      3184.00       144.00       3184        144

to

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               5.00       392.00        88.00        392         88
sdb               2.00       352.00         0.00        352          0
sdc               2.00       264.00         0.00        264          0
sdd               2.00       264.00         0.00        264          0
sde             277.00       560.00     38744.00        560      38744
sdf               2.00       264.00         0.00        264          0
sdg               1.00       296.00         0.00        296          0

It looks like the single write somehow cancels out all other requests. Switching to a striped LVM volume would probably help, but the data migration would be really painful for us.

Does anyone have an idea where the problem might be? Any pointers would be appreciated.

Regards,
Jiri Novosad
_______________________________________________ rhelv5-list mailing list [email protected] https://www.redhat.com/mailman/listinfo/rhelv5-list
