> Disclaimer: I’ve done extensive testing (FIO and Postgres) with a few
> different RAID controllers and HW RAID vs mdadm. We (Micron) are Crucial but
> I don’t personally work with the consumer drives.
>
> Verify whether you have your disk write cache enabled or disabled. If it’s
> disabled, that will have a large impact on write performance.
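For reference, on Linux the drive write cache state can be checked (and toggled) with hdparm. This is only a minimal sketch for SATA drives, with placeholder device names:

```
# Report the current write-cache setting of each member disk (placeholder names)
for d in /dev/sd{a,b,c,d,e,f}; do
    hdparm -W "$d"   # prints "write-caching = 1 (on)" or "0 (off)"
done
```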
What an honor :) My SSDs are Crucial MX300 (consumer drives) but, as previously stated, they reach ~90k IOPS in all the benchmarks I found on the web, while mine top out at ~40k IOPS. Since the six drives were bought from four different sellers, it’s very unlikely that they are all defective.

> Is this the *exact* string you used? `fio --filename=/dev/sdx --direct=1
> --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio
> --bs=4k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60
> --group_reporting --name=4ktest`
>
> With FIO, you need to multiply iodepth by numjobs to get the final queue
> depth it’s pushing (in this case, 256). Make sure you’re looking at the
> correct data.

I used --numjobs=1 because I needed the time series values for bandwidth, latency and IOPS. The command string was the same, except for varying iodepth and numjobs=1.

> Few other things:
> - Mdadm will give better performance than HW RAID for specific benchmarks.
> - Performance is NOT linear with drive count for synthetic benchmarks.
> - It is often nearly linear for application performance.

mdadm RAID10 scaled linearly, while mdadm RAID0 scaled much less.

> - HW RAID can give better performance if your drives do not have a
> capacitor backed cache (like the MX300) AND the controller has a battery
> backed cache. *Consumer drives can often get better performance from HW
> RAID*. (otherwise MDADM has been faster in all of my testing)

My RAID controller doesn’t have a BBU.

> - Mdadm RAID10 has a bug where reads are not properly distributed
> between the mirror pairs. (It uses head position calculated from the last IO
> to determine which drive in a mirror pair should get the next read. It
> results in really weird behavior of most read IO going to half of your drives
> instead of being evenly split as should be the case for SSDs). You can see
> this by running iostat while you’ve got a load running and you’ll see uneven
> distribution of IOs. FYI, the RAID1 implementation has an exception where it
> does NOT use head position for SSDs. I have yet to test this but you should
> be able to get better performance by manually striping a RAID0 across
> multiple RAID1s instead of using the default RAID10 implementation.

Very interesting. I will double check this after buying and mounting the new HBA. I have heard of someone doing what you are suggesting but have never tried it myself (I sketched the mdadm commands at the end of this message).

> - Don’t focus on 4k Random Read. Do something more similar to a PG
> workload (64k 70/30 R/W @ QD=4 is *reasonably* close to what I see for heavy
> OLTP).

Why 64k and QD=4? I was thinking of 8k and a larger queue depth. I will test as soon as possible and report the results here (the fio line I plan to use is also at the end of this message) :)

> I’ve tested multiple controllers based on the LSI 3108 and found that default
> settings from one vendor to another provide drastically different performance
> profiles. Vendor A had much better benchmark performance (2x the IOPS of B)
> while vendor B gave better application performance (20% better OLTP
> performance in Postgres). (I got equivalent performance from A & B when using
> the same settings).

Do you have any HBA card to suggest? What do you think of the LSI SAS3008? I think it’s the same as the 3108 without the RAID-on-Chip feature. I will probably buy a Lenovo HBA card with that chip. It seems blazing fast (1M IOPS) compared to the current embedded RAID controller (LSI 2008). I don’t know whether I can connect a 12Gb/s HBA directly to my existing 6Gb/s expander/backplane; I will surely have the right cables, but I don’t know if it will work without changing the expander/backplane.
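As mentioned above, this is roughly how I understand the manual RAID1+0 layout you describe; a minimal sketch only, with placeholder device names and default chunk/metadata options:

```
# Build three RAID1 pairs from the six SSDs (device names are placeholders)
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sde /dev/sdf

# Stripe a RAID0 across the three mirrors instead of using the md raid10 personality
mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3
```

I will then run iostat -x during the fio runs to check whether reads are evenly distributed across the mirror pairs, as you suggested.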
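And this is roughly the fio line I have in mind for the 64k 70/30 test, adapted from the 4k command above (QD=4, single job; the log options are just one way to capture the per-interval time series, and the job name is only a label):

```
fio --filename=/dev/sdx --direct=1 --rw=randrw --rwmixread=70 \
    --bs=64k --iodepth=4 --numjobs=1 --ioengine=libaio \
    --refill_buffers --norandommap --randrepeat=0 \
    --runtime=60 --group_reporting --name=64k-70-30 \
    --write_bw_log=64k-70-30 --write_iops_log=64k-70-30 \
    --write_lat_log=64k-70-30 --log_avg_msec=1000
```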
Thank you very much for your time.

Pietro Pugni