Hi,

Our current AIO path does a great job of offloading work from the VM, and combined with IOThreads it provides good performance in most scenarios. But it also comes at a cost: a longer execution path, and the need for the scheduler to intervene at various points.
There's one particular workload that suffers from this cost: when just 1 or 2 cores on the Guest are issuing synchronous requests. This happens to be a pretty common workload for some DBs and, more generally, on small VMs.

I did a quick'n'dirty implementation on top of virtio-blk to get some numbers. These come from a VM with 4 CPUs running on an idle server, with a secondary virtio-blk disk backed by a null_blk device with a simulated latency of 30us.

 - Average latency (us)

 ----------------------------------------
 |        | AIO+iothread | SIO+iothread |
 | 1 job  |      70      |      55      |
 | 2 jobs |      83      |      82      |
 | 4 jobs |      90      |     159      |
 ----------------------------------------

In this case the intuition matches the reality: synchronous IO wins when there's just 1 job issuing the requests, and loses hard when there are 4.

While my first thought was to implement this as a tunable, it turns out we already have a hint about the nature of the workload in the number of requests in the VQ. So I updated the code to use SIO if there's just 1 request in the VQ and AIO otherwise (a rough sketch of the heuristic is at the end of this mail), with these results:

 -----------------------------------------------------------
 |        | AIO+iothread | SIO+iothread | AIO+SIO+iothread |
 | 1 job  |      70      |      55      |        55        |
 | 2 jobs |      83      |      82      |        78        |
 | 4 jobs |      90      |     159      |        90        |
 -----------------------------------------------------------

This data makes me think this is something worth pursuing, but I'd like to hear your opinion on it.

Thanks,
Sergio.
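
For reference, here is a minimal sketch of the dispatch heuristic, written against plain POSIX + libaio rather than the actual QEMU internals; the function and parameter names (submit_read, vq_pending) are made up for illustration, and error handling and request completion are elided:

/* Build with: gcc -c sio_sketch.c (link users with -laio). */
#include <libaio.h>
#include <sys/uio.h>

/*
 * Submit one read request, choosing the path based on how many
 * requests the guest currently has pending in the virtqueue
 * (vq_pending is a hypothetical stand-in for that count).
 */
static int submit_read(io_context_t ctx, int fd, struct iocb *cb,
                       struct iovec *iov, int iovcnt,
                       long long offset, unsigned vq_pending)
{
    if (vq_pending == 1) {
        /*
         * SIO: a single request in flight. Doing the IO synchronously
         * right here skips the AIO machinery and the scheduler
         * wakeups, which is where the 1-job latency win comes from.
         */
        return preadv(fd, iov, iovcnt, offset) < 0 ? -1 : 0;
    }

    /*
     * AIO: multiple requests queued. Keep them in flight in parallel
     * through the usual asynchronous path; completions are reaped
     * later with io_getevents().
     */
    io_prep_preadv(cb, fd, iov, iovcnt, offset);
    struct iocb *list[1] = { cb };
    return io_submit(ctx, 1, list) == 1 ? 0 : -1;
}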