On Mon, Jul 06, 2020 at 02:56:45PM +0100, Stefan Hajnoczi wrote:
> v4:
>  * Sorry for the long delay. I considered replacing this series with a
>    simpler approach. Real hardware ships with a fixed number of queues
>    (e.g. 128). The equivalent can be done in QEMU too. That way we don't
>    need to magically size num_queues. In the end I decided against this
>    approach because the Linux virtio_blk.ko and virtio_scsi.ko guest
>    drivers unconditionally initialized all available queues until recently
>    (it was written with num_queues=num_vcpus in mind). It doesn't make
>    sense for a 1 CPU guest to bring up 128 virtqueues (waste of resources
>    and possibly weird performance effects with blk-mq).
>  * Honor maximum number of MSI-X vectors and virtqueues [Daniel Berrange]
>  * Update commit descriptions to mention maximum MSI-X vector and
>    virtqueue caps [Raphael]
> v3:
>  * Introduce virtio_pci_optimal_num_queues() helper to enforce
>    VIRTIO_QUEUE_MAX in one place
>  * Use VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
>  * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
> v3:
>  * Add new performance results that demonstrate the scalability
>  * Mention that this is PCI-specific [Cornelia]
> v2:
>  * Let the virtio-DEVICE-pci device select num-queues because the optimal
>    multi-queue configuration may differ between virtio-pci, virtio-mmio,
>    and virtio-ccw [Cornelia]
> 
> Enabling multi-queue on virtio-pci storage devices improves performance on
> SMP guests because the completion interrupt is handled on the vCPU that
> submitted the I/O request. This avoids IPIs inside the guest.
> 
> Note that performance is unchanged in these cases:
> 1. Uniprocessor guests. They don't have IPIs.
> 2. Application threads might be scheduled on the sole vCPU that handles
>    completion interrupts purely by chance. (This is one reason why
>    benchmark results can vary noticeably between runs.)
> 3. Users may bind the application to the vCPU that handles completion
>    interrupts.
> 
> Set the number of queues to the number of vCPUs by default on virtio-blk
> and virtio-scsi PCI devices. Older machine types continue to default to
> 1 queue for live migration compatibility.
> 
> Random read performance:
>       IOPS
> q=1    78k
> q=32  104k  +33%
> 
> Boot time:
>       Duration
> q=1   51s
> q=32  1m41s  +98%
> 
> Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks
> 
> Previously measured results on a 4 vCPU guest were also positive but
> showed a smaller 1-4% performance improvement. They are no longer valid
> because significant event loop optimizations have been merged.
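The virtio_pci_optimal_num_queues() helper sounds like the right place to
centralize this. For anyone skimming, here is a rough standalone sketch of
the sizing logic as I read it from the changelog above (the constants, the
helper's signature, and the exact clamping order are my assumptions, not
necessarily what the patches actually do):

#include <stdio.h>

#define VIRTIO_QUEUE_MAX     1024    /* assumed per-device virtqueue cap */
#define PCI_MSIX_FLAGS_QSIZE 0x7ff   /* MSI-X table size field (vectors - 1) */

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * fixed_queues: virtqueues that exist regardless of vCPU count, e.g.
 * virtio-scsi's control and event queues.
 */
static unsigned optimal_num_queues(unsigned num_vcpus, unsigned fixed_queues)
{
    /* Ideal case: one request queue per vCPU so the completion interrupt
     * is handled on the vCPU that submitted the request. */
    unsigned num_queues = num_vcpus;

    /* Stay within the MSI-X vector budget, leaving room for the config
     * change interrupt and the fixed virtqueues. */
    num_queues = MIN(num_queues, PCI_MSIX_FLAGS_QSIZE - fixed_queues);

    /* Never exceed the device's total virtqueue limit. */
    return MIN(num_queues, VIRTIO_QUEUE_MAX - fixed_queues);
}

int main(void)
{
    /* Example: 32-vCPU guest, virtio-scsi device with 2 fixed virtqueues */
    printf("request queues: %u\n", optimal_num_queues(32, 2));
    return 0;
}

With numbers like the ones above (32 vCPUs, 2 fixed virtqueues) this simply
yields 32 request queues; the MSI-X and VIRTIO_QUEUE_MAX clamps only matter
for very large vCPU counts.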
I'm guessing this should be deferred to the next release as it (narrowly)
missed the freeze window. Does this make sense to you?

> Stefan Hajnoczi (5):
>   virtio-pci: add virtio_pci_optimal_num_queues() helper
>   virtio-scsi: introduce a constant for fixed virtqueues
>   virtio-scsi: default num_queues to -smp N
>   virtio-blk: default num_queues to -smp N
>   vhost-user-blk: default num_queues to -smp N
> 
>  hw/virtio/virtio-pci.h             |  9 +++++++++
>  include/hw/virtio/vhost-user-blk.h |  2 ++
>  include/hw/virtio/virtio-blk.h     |  2 ++
>  include/hw/virtio/virtio-scsi.h    |  5 +++++
>  hw/block/vhost-user-blk.c          |  6 +++++-
>  hw/block/virtio-blk.c              |  6 +++++-
>  hw/core/machine.c                  |  5 +++++
>  hw/scsi/vhost-scsi.c               |  3 ++-
>  hw/scsi/vhost-user-scsi.c          |  5 +++--
>  hw/scsi/virtio-scsi.c              | 13 ++++++++----
>  hw/virtio/vhost-scsi-pci.c         |  9 +++++++--
>  hw/virtio/vhost-user-blk-pci.c     |  4 ++++
>  hw/virtio/vhost-user-scsi-pci.c    |  9 +++++++--
>  hw/virtio/virtio-blk-pci.c         |  7 ++++++-
>  hw/virtio/virtio-pci.c             | 32 ++++++++++++++++++++++++++++++
>  hw/virtio/virtio-scsi-pci.c        |  9 +++++++--
>  16 files changed, 110 insertions(+), 16 deletions(-)
> 
> -- 
> 2.26.2
> 