I just mounted and configured my brand new LSI 3008-8i. This server had a single SAS expander connected to two backplanes (8 disks on the first backplane and none on the second). After some testing I found the SAS expander was a bottleneck, so I removed it and connected the first backplane directly to the controller.
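As an aside, for anyone checking a similar change: the SAS transport class in sysfs shows whether an expander is still in the I/O path and what link rate each phy negotiated. A minimal sketch, assuming the mpt3sas driver and the usual sysfs layout:

    # expanders, if any, show up here (should be empty with direct attach)
    ls /sys/class/sas_expander/

    # negotiated link rate for each phy on the controller
    grep . /sys/class/sas_phy/*/negotiated_linkrate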
The following results are from 4k 100% random reads (32QD) run in parallel on each single SSD:

Raw SSDs [ 4k, 100% random reads, 32 Queue Depth]
ServeRAID M5110e (with SAS expander) [numjob=1]
  read : io=5111.2MB, bw=87227KB/s, iops=21806, runt= 60002msec
  read : io=4800.6MB, bw=81927KB/s, iops=20481, runt= 60002msec
  read : io=4997.6MB, bw=85288KB/s, iops=21322, runt= 60002msec
  read : io=4796.2MB, bw=81853KB/s, iops=20463, runt= 60001msec
  read : io=5062.6MB, bw=86400KB/s, iops=21599, runt= 60001msec
  read : io=4989.6MB, bw=85154KB/s, iops=21288, runt= 60001msec
Total read iops: 126,959 ( ~ 21,160 iops/disk)

Raw SSDs [ 4k, 100% random reads, 32 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander) [numjob=1]
  read : io=15032MB, bw=256544KB/s, iops=64136, runt= 60001msec
  read : io=16679MB, bw=284656KB/s, iops=71163, runt= 60001msec
  read : io=15046MB, bw=256779KB/s, iops=64194, runt= 60001msec
  read : io=16667MB, bw=284444KB/s, iops=71111, runt= 60001msec
  read : io=16692MB, bw=284867KB/s, iops=71216, runt= 60001msec
  read : io=15149MB, bw=258534KB/s, iops=64633, runt= 60002msec
Total read iops: 406,453 ( ~ 67,742 iops/disk)

That's roughly 3.2x the iops.

I chose 4k 32QD because it should deliver the maximum iops and clearly show whether the I/O is properly configured. I didn't bother testing the embedded M5110e without the SAS expander because it would surely be slower anyway.

> You might need to increase the number of jobs here. The primary reason for
> this parameter is to improve scaling when you're single thread CPU bound.
> With numjob=1 FIO will use only a single thread and there's only so much a
> single CPU core can do.

With the expander still in place the HBA was only slightly faster, and only slightly faster again after removing the expander; but then I increased numjob from 1 to 16 (I also tried 12, 18, 20, 24 and 32, but 16 gave the highest iops) and the benchmarks returned the expected results.

I wonder how this relates to Postgres… probably effective_io_concurrency, as suggested by Merlin Moncure, is the counterpart of numjob in fio?

> I was a little unclear on the disk cache part. It's a setting, generally in
> the RAID controller / HBA. It's also a filesystem level option in Linux
> (hdparm) and Windows (somewhere in device manager?). The reason to disable
> the disk cache is that it's NOT protected against power loss on the MX300.
> So by disabling it you can ensure 100% write consistency at the cost of
> write performance. (using fully power protected drives will let you keep
> disk cache enabled)

I always enabled the write cache during my tests. I tried disabling it, but performance was too poor. Those SSDs are consumer drives and don't have any capacitors :(

> > Why 64k and QD=4? I thought of 8k and larger QD. Will test as soon as
> > possible and report here the results :)
>
> It's more representative of what you'll see at the application level. (If
> you've got a running system, you can just use IOstat to see what your
> average QD is: iostat -x 10, and it's the column avgqu-sz. Change from 10
> seconds to whatever interval works best for your environment.)

I tried your suggestion (64k, 70/30 random r/w, QD=4) on RAID0 and RAID10 (mdadm) with the new controller, and the results are quite good considering that the underlying SSDs are consumer drives with stock firmware (overprovisioned at 25%). RAID10 is about 22% slower than RAID0 in both reads and writes, at least over a 1-minute run.
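For reference, the fio invocation for this 64k 70/30 test was along these lines (a sketch, not my exact command line; the device name and runtime are assumptions, and note that it writes directly to the raw md device, destroying its contents):

    fio --name=raid-70-30 --filename=/dev/md127 --ioengine=libaio --direct=1 \
        --rw=randrw --rwmixread=70 --bs=64k --iodepth=4 --numjobs=16 \
        --runtime=60 --time_based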
The totals and averages were calculated from the whole fio log output using the per-job iops. These are the results:

############################################################################
mdadm RAID0 [ 64k, 70% random reads, 30% random writes, 04 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander) [numjob=16]
############################################################################

Run status group 0 (all jobs):
   READ: io=75943MB, aggrb=1265.7MB/s, minb=80445KB/s, maxb=81576KB/s, mint=60001msec, maxt=60004msec
  WRITE: io=32585MB, aggrb=556072KB/s, minb=34220KB/s, maxb=35098KB/s, mint=60001msec, maxt=60004msec

Disk stats (read/write):
  md127: ios=1213256/520566, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=202541/86892, aggrmerge=0/0, aggrticks=490418/137398, aggrin_queue=628566, aggrutil=99.20%
    sdf: ios=202557/86818, merge=0/0, ticks=450384/131512, in_queue=582528, util=98.58%
    sdb: ios=202626/87184, merge=0/0, ticks=573448/177336, in_queue=751784, util=99.20%
    sdg: ios=202391/86810, merge=0/0, ticks=463644/137084, in_queue=601272, util=98.46%
    sde: ios=202462/86551, merge=0/0, ticks=470028/121424, in_queue=592500, util=98.79%
    sda: ios=202287/86697, merge=0/0, ticks=473312/121192, in_queue=595044, util=98.95%
    sdh: ios=202928/87293, merge=0/0, ticks=511696/135840, in_queue=648272, util=99.14%

Total read iops: 20,242 ( ~ 3,374 iops/disk)
Total write iops: 8,679 ( ~ 1,447 iops/disk)

############################################################################
mdadm RAID10 [ 64k, 70% random reads, 30% random writes, 04 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander) [numjob=16]
############################################################################

Run status group 0 (all jobs):
   READ: io=58624MB, aggrb=976.11MB/s, minb=62125KB/s, maxb=62814KB/s, mint=60001msec, maxt=60005msec
  WRITE: io=25190MB, aggrb=429874KB/s, minb=26446KB/s, maxb=27075KB/s, mint=60001msec, maxt=60005msec

Disk stats (read/write):
  md127: ios=936349/402381, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=156357/134348, aggrmerge=0/0, aggrticks=433286/262226, aggrin_queue=696052, aggrutil=99.41%
    sdf: ios=150239/134315, merge=0/0, ticks=298268/168472, in_queue=466852, util=95.31%
    sdb: ios=153088/133664, merge=0/0, ticks=329160/188060, in_queue=517432, util=96.81%
    sdg: ios=157361/135065, merge=0/0, ticks=658208/459168, in_queue=1118588, util=99.16%
    sde: ios=161361/134315, merge=0/0, ticks=476388/278628, in_queue=756056, util=97.61%
    sda: ios=160431/133664, merge=0/0, ticks=548620/329708, in_queue=878708, util=99.41%
    sdh: ios=155667/135065, merge=0/0, ticks=289072/149324, in_queue=438680, util=96.71%

Total read iops: 15,625 ( ~ 2,604 iops/disk)
Total write iops: 6,709 ( ~ 1,118 iops/disk)

> > Do you have some HBA card to suggest? What do you think of LSI SAS3008? I
> > think it's the same as the 3108 without RAID On Chip feature. Probably I
> > will buy a Lenovo HBA card with that chip. It seems blazing fast (1mln
> > IOPS) compared to the actual embedded RAID controller (LSI 2008).
>
> I've been able to consistently get the same performance out of any of the LSI
> based cards. The 3008 and 3108 both work great, regardless of vendor. Just
> test or read up on the different configuration parameters (read ahead, write
> back vs write through, disk cache)

Do you have any suggestions for fine-tuning this controller? I'm referring to parameters like nr_requests, queue_depth, etc.
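Just to be explicit about which knobs I mean, these are the sysfs paths I've been looking at (sdb is one of the member disks; the value in the echo is only an example, not a recommendation):

    # block-layer request queue depth for one member disk
    cat /sys/block/sdb/queue/nr_requests
    echo 1024 > /sys/block/sdb/queue/nr_requests

    # per-device queue depth as seen by the SCSI layer
    cat /sys/block/sdb/device/queue_depth

    # I/O scheduler currently in use
    cat /sys/block/sdb/queue/scheduler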
Also, is there any way to optimize the various mdadm parameters available under /sys/block/mdX/? I disabled the internal write-intent bitmap and write performance improved.
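For reference, disabling the internal bitmap can be done like this (a sketch, not necessarily my exact command line; md127 is the device name on my system):

    # drop the md internal write-intent bitmap (it can be re-added with --bitmap=internal)
    mdadm --grow --bitmap=none /dev/md127

    # md-specific tunables exposed for the array
    ls /sys/block/md127/md/

Thank you
 Pietro Pugni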