I just installed and configured my brand new LSI 3008-8i. This server had one 
SAS expander connected to 2 backplanes (8 disks on the first backplane and no 
disks on the second). After some testing I found the SAS expander was a 
bottleneck, so I removed it and connected the first backplane directly to the 
controller.

The following results are from 4k 100% random reads (32QD) run in parallel on 
each SSD individually:

Raw SSDs [ 4k, 100% random reads, 32 Queue Depth]
ServeRaid m5110e (with SAS expander) [numjob=1]
  read : io=5111.2MB, bw=87227KB/s, iops=21806, runt= 60002msec
  read : io=4800.6MB, bw=81927KB/s, iops=20481, runt= 60002msec
  read : io=4997.6MB, bw=85288KB/s, iops=21322, runt= 60002msec
  read : io=4796.2MB, bw=81853KB/s, iops=20463, runt= 60001msec
  read : io=5062.6MB, bw=86400KB/s, iops=21599, runt= 60001msec
  read : io=4989.6MB, bw=85154KB/s, iops=21288, runt= 60001msec
Total read iops: 126,959 ( ~ 21,160 iops/disk)


Raw SSDs [ 4k, 100% random reads, 32 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander) 
[numjob=1]
  read : io=15032MB, bw=256544KB/s, iops=64136, runt= 60001msec
  read : io=16679MB, bw=284656KB/s, iops=71163, runt= 60001msec
  read : io=15046MB, bw=256779KB/s, iops=64194, runt= 60001msec
  read : io=16667MB, bw=284444KB/s, iops=71111, runt= 60001msec
  read : io=16692MB, bw=284867KB/s, iops=71216, runt= 60001msec
  read : io=15149MB, bw=258534KB/s, iops=64633, runt= 60002msec
Total read iops: 406,453 ( ~ 67,742 iops/disk)


That's about 3.2x the aggregate read IOPS of the previous setup (a ~220% 
improvement).
I chose 4k at 32QD because it should deliver the maximum iops and clearly show 
whether the I/O path is configured properly.
I didn't bother testing the embedded m5110e without the SAS expander because it 
would certainly be slower anyway.
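
For reference, the raw-device runs above used something along these lines (a 
minimal sketch; the actual job files, device names and options may have 
differed slightly):

  # one fio process per SSD, run in parallel against the raw block devices
  fio --name=randread-sdb --filename=/dev/sdb \
      --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
      --ioengine=libaio --direct=1 --runtime=60 --time_based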


> You might need to increase the number of jobs here. The primary reason for 
> this parameter is to improve scaling when you’re single thread CPU bound. 
> With numjob=1 FIO will use only a single thread and there’s only so much a 
> single CPU core can do.

At numjob=1 the HBA was only slightly faster than before with the expander 
still in place, and only slightly faster again after removing the expander. 
Then I increased numjob from 1 to 16 (I also tried 12, 18, 20, 24 and 32, but 
16 gave the highest iops) and the benchmarks returned the expected results. I 
wonder how this maps to Postgres.. probably effective_io_concurrency, as 
suggested by Merlin Moncure, is the counterpart of fio's numjob?
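
In other words, the scaling only showed up with more submitting threads. A 
minimal sketch of the two sides (the Postgres value is just an example, and 
the mapping is loose, since effective_io_concurrency mainly drives prefetching 
for bitmap heap scans rather than being a general worker count):

  # fio side: scale the number of submitting threads per device
  fio --name=randread --filename=/dev/sdb --rw=randread --bs=4k \
      --iodepth=32 --numjobs=16 --ioengine=libaio --direct=1 \
      --runtime=60 --time_based

  # Postgres side (rough analogue only; value is an example, not a recommendation)
  psql -c "ALTER SYSTEM SET effective_io_concurrency = 256;"
  psql -c "SELECT pg_reload_conf();"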


> I was a little unclear on the disk cache part. It’s a setting, generally in 
> the RAID controller / HBA. It’s also a filesystem level option in Linux 
> (hdparm) and Windows (somewhere in device manager?). The reason to disable 
> the disk cache is that it’s NOT protected against power loss on 
> the MX300. So by disabling it you can ensure 100% write consistency at the 
> cost of write performance. (using fully power protected drives will let you 
> keep disk cache enabled)

I always enabled the write cache during my tests. I tried disabling it but 
performance was too poor. Those SSDs are consumer drives and don't have any 
capacitors :(
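
For anyone following along, this is the kind of check/toggle being discussed 
(a sketch for SATA drives behind the HBA; SAS drives would need sdparm 
instead):

  # query the volatile write cache state
  hdparm -W /dev/sdb
  # disable it (safer without power-loss protection, but much slower)
  hdparm -W0 /dev/sdb
  # re-enable it
  hdparm -W1 /dev/sdb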


> > Why 64k and QD=4? I thought of 8k and larger QD. Will test as soon as 
> > possible and report here the results :)
>  
> It’s more representative of what you’ll see at the application level. (If 
> you’ve got a running system, you can just use iostat to see what your average 
> QD is: iostat -x 10, and it’s the avgqu-sz column. Change from 10 seconds 
> to whatever interval works best for your environment.)

I tried your suggestion (64k, 70/30 random r/w, 4QD) on RAID0 and RAID10 
(mdadm) with the new controller, and the results are quite good considering 
that the underlying SSDs are consumer drives with stock firmware 
(overprovisioned at 25%).
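
The job was along these lines (a rough sketch; the device name and exact 
options are assumptions):

  # 64k blocks, 70% random reads / 30% random writes, QD 4, 16 jobs
  fio --name=mixed --filename=/dev/md127 --rw=randrw --rwmixread=70 \
      --bs=64k --iodepth=4 --numjobs=16 --ioengine=libaio --direct=1 \
      --runtime=60 --time_based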

RAID10 is about 22% slower than RAID0 in both reads and writes, at least on a 
1-minute run. The totals and averages were calculated from the whole fio log 
output using the per-job iops.
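
Concretely, something like this over the saved logs (a sketch; the exact field 
layout depends on the fio version, and -P needs GNU grep):

  # sum read iops across all per-job "read :" lines of a saved fio log
  grep ' read :' fio-raid0.log \
    | grep -oP 'iops=\K[0-9]+' \
    | awk '{s+=$1} END {print s}'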

These are the results:


############################################################################
mdadm RAID0 [ 64k, 70% random reads, 30% random writes, 04 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander) 
[numjob=16]
############################################################################
Run status group 0 (all jobs):
   READ: io=75943MB, aggrb=1265.7MB/s, minb=80445KB/s, maxb=81576KB/s, 
mint=60001msec, maxt=60004msec
  WRITE: io=32585MB, aggrb=556072KB/s, minb=34220KB/s, maxb=35098KB/s, 
mint=60001msec, maxt=60004msec

Disk stats (read/write):
    md127: ios=1213256/520566, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, 
aggrios=202541/86892, aggrmerge=0/0, aggrticks=490418/137398, 
aggrin_queue=628566, aggrutil=99.20%
  sdf: ios=202557/86818, merge=0/0, ticks=450384/131512, in_queue=582528, 
util=98.58%
  sdb: ios=202626/87184, merge=0/0, ticks=573448/177336, in_queue=751784, 
util=99.20%
  sdg: ios=202391/86810, merge=0/0, ticks=463644/137084, in_queue=601272, 
util=98.46%
  sde: ios=202462/86551, merge=0/0, ticks=470028/121424, in_queue=592500, 
util=98.79%
  sda: ios=202287/86697, merge=0/0, ticks=473312/121192, in_queue=595044, 
util=98.95%
  sdh: ios=202928/87293, merge=0/0, ticks=511696/135840, in_queue=648272, 
util=99.14%

Total read iops: 20,242 ( ~ 3,374 iops/disk)
Total write iops: 8,679 ( ~ 1,447 iops/disk)



############################################################################
mdadm RAID10 [ 64k, 70% random reads, 30% random writes, 04 Queue Depth]
Lenovo N2215 (LSI 3008-8i flashed with LSI IT firmware, without SAS expander) 
[numjob=16]
############################################################################
Run status group 0 (all jobs):
   READ: io=58624MB, aggrb=976.11MB/s, minb=62125KB/s, maxb=62814KB/s, 
mint=60001msec, maxt=60005msec
  WRITE: io=25190MB, aggrb=429874KB/s, minb=26446KB/s, maxb=27075KB/s, 
mint=60001msec, maxt=60005msec

Disk stats (read/write):
    md127: ios=936349/402381, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, 
aggrios=156357/134348, aggrmerge=0/0, aggrticks=433286/262226, 
aggrin_queue=696052, aggrutil=99.41%
  sdf: ios=150239/134315, merge=0/0, ticks=298268/168472, in_queue=466852, 
util=95.31%
  sdb: ios=153088/133664, merge=0/0, ticks=329160/188060, in_queue=517432, 
util=96.81%
  sdg: ios=157361/135065, merge=0/0, ticks=658208/459168, in_queue=1118588, 
util=99.16%
  sde: ios=161361/134315, merge=0/0, ticks=476388/278628, in_queue=756056, 
util=97.61%
  sda: ios=160431/133664, merge=0/0, ticks=548620/329708, in_queue=878708, 
util=99.41%
  sdh: ios=155667/135065, merge=0/0, ticks=289072/149324, in_queue=438680, 
util=96.71%

Total read iops: 15,625 ( ~ 2,604 iops/disk)
Total write iops: 6,709 ( ~ 1,118 iops/disk)
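
For completeness, the two arrays were built with mdadm more or less like this 
(a sketch; device names, chunk size and layout defaults are assumptions, not 
the exact commands used):

  # 6-disk RAID0 (striped)
  mdadm --create /dev/md127 --level=0 --raid-devices=6 /dev/sd[abefgh]
  # 6-disk RAID10 (default near-2 layout)
  mdadm --create /dev/md127 --level=10 --raid-devices=6 /dev/sd[abefgh]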



> > Do you have an HBA card to suggest? What do you think of the LSI SAS3008? I 
> > think it’s the same as the 3108 without the RAID-on-Chip feature. I will 
> > probably buy a Lenovo HBA card with that chip. It seems blazing fast (1M 
> > IOPS) compared to the current embedded RAID controller (LSI 2008).
>  
> I’ve been able to consistently get the same performance out of any of the LSI 
> based cards. The 3008 and 3108 both work great, regardless of vendor. Just 
> test or read up on the different configuration parameters (read ahead, write 
> back vs write through, disk cache)

Do you have any suggestions for fine-tuning this controller? I'm referring to 
parameters like nr_requests, queue_depth, etc.
Also, is there any way to optimize the various mdadm parameters available under 
/sys/block/mdX/? I disabled the internal write-intent bitmap and write 
performance improved.
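
These are the kinds of knobs I've been experimenting with so far (a rough 
sketch; the values are examples only, not recommendations):

  # per-device block layer settings (example values)
  echo 256  > /sys/block/sda/queue/nr_requests
  echo noop > /sys/block/sda/queue/scheduler    # "none" on blk-mq kernels
  cat /sys/block/sda/device/queue_depth         # per-LUN queue depth reported by the HBA

  # mdadm array knobs
  mdadm --grow --bitmap=none /dev/md127         # drop the internal write-intent bitmap
  cat /sys/block/md127/md/stripe_cache_size     # only exists for RAID4/5/6 arrays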



Thank you
 Pietro Pugni


