On Mon, Jul 25, 2022 at 11:55:26PM +0300, Andrey Zhadchenko wrote:
> Although QEMU virtio-blk is quite fast, there is still some room for
> improvement. Disk latency can be reduced if we handle virtio-blk requests
> in the host kernel, so we avoid a lot of syscalls and context switches.
>
> The biggest disadvantage of this vhost-blk flavor is the raw format.
> Luckily Kirill Thai proposed a device mapper driver for the QCOW2 format to
> attach files as block devices:
> https://www.spinics.net/lists/kernel/msg4292965.html
>
> Also, by using kernel modules we can bypass the iothread limitation and
> finally scale block requests with CPUs for high-performance devices. This is
> planned to be implemented in the next version.
>
> Linux kernel module part:
> https://lore.kernel.org/kvm/20220725202753.298725-1-andrey.zhadche...@virtuozzo.com/
>
> Test setup and results:
> fio --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=128
> QEMU drive options: cache=none
> filesystem: xfs

Please post the full QEMU command-line so it's clear exactly what this
is benchmarking.

A preallocated raw image file is a good baseline with:

  --object iothread,id=iothread0 \
  --blockdev file,filename=test.img,cache.direct=on,aio=native,node-name=drive0 \
  --device virtio-blk-pci,drive=drive0,iothread=iothread0

(I've put a sketch of a full command-line along these lines at the end of
this mail.)

(BTW QEMU's default vq size is 256 descriptors and the number of vqs is
the number of vCPUs.)

>
> SSD:
>                | randread, IOPS | randwrite, IOPS |
> Host           | 95.8k          | 85.3k           |
> QEMU virtio    | 57.5k          | 79.4k           |
> QEMU vhost-blk | 95.6k          | 84.3k           |
>
> RAMDISK (vq == vcpu):

With fio numjobs=vcpu here? (An example of what I mean is sketched at the
end of this mail.)

>                  | randread, IOPS | randwrite, IOPS |
> virtio, 1vcpu    | 123k           | 129k            |
> virtio, 2vcpu    | 253k (??)      | 250k (??)       |

QEMU's aio=threads (default) gets around the single IOThread; it beats
aio=native for this reason in some cases. Were you using aio=native or
aio=threads?

> virtio, 4vcpu    | 158k           | 154k            |
> vhost-blk, 1vcpu | 110k           | 113k            |
> vhost-blk, 2vcpu | 247k           | 252k            |
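For concreteness, here is roughly the shape of QEMU command-line I have in
mind for the raw-file baseline. This is only a sketch: the image path,
machine type, memory size, and vCPU count are placeholders I picked, and
num-queues=4 / queue-size=256 just spell out the defaults for a 4-vCPU guest
(root disk, netdev, and the rest of your configuration are omitted):

  qemu-system-x86_64 \
      -enable-kvm -M q35 -cpu host -smp 4 -m 4G \
      --object iothread,id=iothread0 \
      --blockdev file,filename=test.img,cache.direct=on,aio=native,node-name=drive0 \
      --device virtio-blk-pci,drive=drive0,iothread=iothread0,num-queues=4,queue-size=256

Posting something like this for both the virtio and vhost-blk runs would
make it easy to see exactly what is being compared and to reproduce the
numbers.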
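And regarding the numjobs question above, this is the kind of fio invocation
I mean. Again just a sketch: the job name, guest device path (/dev/vdb),
numjobs value, and runtime are assumptions on my part; the remaining options
are taken from the command in your cover letter:

  fio --name=vhost-blk-test --filename=/dev/vdb --direct=1 --rw=randread \
      --bs=4k --ioengine=libaio --iodepth=128 --numjobs=4 --group_reporting \
      --time_based --runtime=60

With numjobs equal to the vCPU count and group_reporting set, the reported
IOPS are aggregated across jobs, which is what you want when showing how the
per-vCPU virtqueues scale.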