~50,000 IOPS 4k random read. 200MB/sec, 30% CPU utilization on Nexenta, ~90% utilization on guest OS. I’m guessing guest OS is bottlenecking. Going to try physical hardware next week.

~25,000 IOPS 4k random write. 100MB/sec, ~70% CPU utilization on Nexenta, ~45% CPU utilization on guest OS. Feels like Nexenta CPU is bottleneck. Load average of 2.5.

A quick test with a 128k recordsize and 128k IO looked to deliver about 400MB/sec; I can’t remember the CPU utilization on either side. Will retest and report those numbers.

It feels like something is adding more overhead here than I would expect on the 4k recordsizes/IO workloads. Any thoughts where I should start on this? I’d really like to see closer to 10Gbit performance here, but it seems like the hardware isn’t able to cope with it?

All systems have a bottleneck. You are highly unlikely to get close to 10Gbit performance with 4k random synchronous write. 25K IOPS seems pretty good to me.
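
To put rough numbers on that, here is a back-of-envelope sketch in Python. It assumes the raw 10Gbit line rate (1.25e9 bytes/sec, ignoring all protocol overhead), that "4k" means 4096 bytes, and that MB means 10^6 bytes; the function names are made up for illustration.

```python
# Back-of-envelope: how many IOPS does it take to fill a 10Gbit link?
# Assumptions: raw line rate only, no Ethernet/IP/iSCSI/NFS overhead.

LINK_BITS_PER_SEC = 10e9  # nominal 10Gbit/sec


def iops_to_fill_link(io_bytes, link_bits_per_sec=LINK_BITS_PER_SEC):
    """IOPS needed to saturate the link at a given IO size in bytes."""
    return link_bits_per_sec / 8 / io_bytes


def throughput_mb_per_sec(iops, io_bytes):
    """Throughput in MB/sec (10^6 bytes) for a given IOPS and IO size."""
    return iops * io_bytes / 1e6


if __name__ == "__main__":
    for label, size in (("4k", 4096), ("128k", 131072)):
        print(label, "IO needs about", round(iops_to_fill_link(size)), "IOPS")
    print("50,000 x 4k =", throughput_mb_per_sec(50000, 4096), "MB/sec")
    print("25,000 x 4k =", throughput_mb_per_sec(25000, 4096), "MB/sec")
```

Under those assumptions, filling the link at 4k takes roughly 305,000 IOPS, against roughly 9,500 at 128k. The reported 50,000 and 25,000 IOPS match the quoted 200MB/sec and 100MB/sec, so the 4k write result is around 8% of line rate.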

The 2.4GHz clock rate of the 4-core Xeon CPU you are using is not terribly high. Performance would likely be better with a higher-clocked, more modern design with more cores.

Verify that the zfs checksum algorithm you are using is a low-cost one and that you have not enabled compression or deduplication.
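
One possible way to check those three properties is sketched below in Python. It wraps `zfs get -H -o property,value checksum,compression,dedup <dataset>` and flags anything that is not a cheap setting. The helper names are hypothetical, and treating `on`, `off`, `fletcher2` and `fletcher4` as the "low-cost" checksums is my assumption, so adjust the set for your release.

```python
import subprocess

# Assumption: these checksum values count as "low-cost"; anything else
# (for example sha256) is flagged for a closer look.
CHEAP_CHECKSUMS = {"on", "off", "fletcher2", "fletcher4"}


def parse_zfs_get(text):
    """Parse tab-separated 'property<TAB>value' lines from `zfs get -H`."""
    props = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) == 2:
            props[fields[0]] = fields[1]
    return props


def costly_settings(props):
    """Return the subset of properties that may add CPU overhead."""
    flagged = {}
    if "checksum" in props and props["checksum"] not in CHEAP_CHECKSUMS:
        flagged["checksum"] = props["checksum"]
    for name in ("compression", "dedup"):
        if name in props and props[name] != "off":
            flagged[name] = props[name]
    return flagged


def read_properties(dataset):
    """Run `zfs get` for one dataset (needs the zfs CLI on the host)."""
    result = subprocess.run(
        ["zfs", "get", "-H", "-o", "property,value",
         "checksum,compression,dedup", dataset],
        capture_output=True, text=True, check=True,
    )
    return parse_zfs_get(result.stdout)
```

On the storage host, `costly_settings(read_properties("tank/vol"))` would return an empty dict when nothing expensive is enabled; `tank/vol` is a placeholder dataset name.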

You did not tell us how your zfs pool is organized so it is impossible to comment more.

