> Disclaimer: I’ve done extensive testing (FIO and Postgres) with a few 
> different RAID controllers and HW RAID vs mdadm. We (Micron) are Crucial, but 
> I don’t personally work with the consumer drives.
>  
> Verify whether you have your disk write cache enabled or disabled. If it’s 
> disabled, that will have a large impact on write performance. 

What an honor :)
My SSDs are Crucial MX300s (consumer drives) but, as previously stated, the 
same model reaches ~90k IOPS in every benchmark I found on the web, while mine 
top out at ~40k IOPS. Since the 6 devices were bought from 4 different 
sellers, it’s impossible that they are all defective.

> Is this the *exact* string you used? `fio --filename=/dev/sdx --direct=1 
> --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio 
> --bs=4k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60 
> --group_reporting --name=4ktest`
>  
> With FIO, you need to multiply iodepth by numjobs to get the final queue 
> depth it’s pushing (in this case, 256). Make sure you’re looking at the 
> correct data.

I used --numjobs=1 because I needed the time series values for bandwidth, 
latency and IOPS. The command string was the same, except for varying the IO 
depth and keeping numjobs=1.
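
For reference, fio can also emit those time series directly through its 
logging options; a sketch along the lines of the command above (the 
--write_*_log flags and --log_avg_msec are the additions here, and "4ktest" is 
just an arbitrary log prefix):

  fio --filename=/dev/sdx --direct=1 --rw=randrw --refill_buffers \
      --norandommap --randrepeat=0 --ioengine=libaio --bs=4k \
      --rwmixread=100 --iodepth=16 --numjobs=1 --runtime=60 \
      --write_bw_log=4ktest --write_lat_log=4ktest \
      --write_iops_log=4ktest --log_avg_msec=1000 \
      --group_reporting --name=4ktest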


> Few other things:
> - Mdadm will give better performance than HW RAID for specific benchmarks.
> - Performance is NOT linear with drive count for synthetic benchmarks.
> - It is often nearly linear for application performance.

mdadm RAID10 scaled linearly while mdadm RAID0 scaled much less.


> - HW RAID can give better performance if your drives do not have a 
> capacitor-backed cache (like the MX300) AND the controller has a 
> battery-backed cache. *Consumer drives can often get better performance from 
> HW RAID*. (Otherwise mdadm has been faster in all of my testing.)

My RAID controller doesn’t have a BBU.


> - Mdadm RAID10 has a bug where reads are not properly distributed between 
> the mirror pairs. (It uses the head position calculated from the last IO to 
> determine which drive in a mirror pair should get the next read. This 
> results in really weird behavior where most read IO goes to half of your 
> drives instead of being evenly split, as should be the case for SSDs.) You 
> can see this by running iostat while you’ve got a load running: you’ll see 
> an uneven distribution of IOs. FYI, the RAID1 implementation has an 
> exception where it does NOT use head position for SSDs. I have yet to test 
> this, but you should be able to get better performance by manually striping 
> a RAID0 across multiple RAID1s instead of using the default RAID10 
> implementation.

Very interesting. I will double-check this after buying and mounting the new 
HBA. I’ve heard of someone doing what you are suggesting but have never tried 
it myself.
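
If I do try it, I imagine the layout would be built roughly like this (a 
sketch with hypothetical device names; three RAID1 pairs striped into a RAID0, 
matching my 6 drives):

  # three mirror pairs
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
  mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sde /dev/sdf

  # stripe across the mirrors
  mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3

  # watch per-device read distribution under load, as you suggest
  iostat -x 1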


> - Don’t focus on 4k random read. Do something more similar to a PG workload 
> (64k 70/30 R/W @ QD=4 is *reasonably* close to what I see for heavy OLTP).

Why 64k and QD=4? I was thinking of 8k and a larger QD. I will test as soon as 
possible and report the results here :)
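
For reference, that profile would translate into something like the following 
fio invocation (a sketch adapted from the command above; the job name is 
arbitrary):

  fio --filename=/dev/sdx --direct=1 --rw=randrw --refill_buffers \
      --norandommap --randrepeat=0 --ioengine=libaio --bs=64k \
      --rwmixread=70 --iodepth=4 --numjobs=1 --runtime=60 \
      --group_reporting --name=pgtest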


> I’ve tested multiple controllers based on the LSI 3108 and found that default 
> settings from one vendor to another provide drastically different performance 
> profiles. Vendor A had much better benchmark performance (2x IOPS of B) while 
> vendor B gave better application performance (20% better OLTP performance in 
> Postgres). (I got equivalent performance from A & B when using the same 
> settings). 

Do you have any HBA card to suggest? What do you think of the LSI SAS3008? I 
think it’s the same as the 3108 without the RAID-on-Chip feature. I will 
probably buy a Lenovo HBA card with that chip. It seems blazingly fast (1M 
IOPS) compared to my current embedded RAID controller (LSI 2008).
I don’t know if I can connect a 12Gb/s HBA directly to my existing 6Gb/s 
expander/backplane. I will surely have the right cables, but I don’t know if 
it will work without changing the expander/backplane.


Thank you very much for your time
 Pietro Pugni


