On Wed, Jul 08, 2020 at 06:59:41AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 06, 2020 at 02:56:45PM +0100, Stefan Hajnoczi wrote:
> > v4:
> >  * Sorry for the long delay. I considered replacing this series with a
> >    simpler approach. Real hardware ships with a fixed number of queues
> >    (e.g. 128). The equivalent can be done in QEMU too. That way we don't
> >    need to magically size num_queues. In the end I decided against this
> >    approach because the Linux virtio_blk.ko and virtio_scsi.ko guest
> >    drivers unconditionally initialized all available queues until
> >    recently (it was written with num_queues=num_vcpus in mind). It
> >    doesn't make sense for a 1 CPU guest to bring up 128 virtqueues
> >    (waste of resources and possibly weird performance effects with
> >    blk-mq).
> >  * Honor maximum number of MSI-X vectors and virtqueues [Daniel Berrange]
> >  * Update commit descriptions to mention maximum MSI-X vector and
> >    virtqueue caps [Raphael]
> > v3:
> >  * Introduce virtio_pci_optimal_num_queues() helper to enforce
> >    VIRTIO_QUEUE_MAX in one place
> >  * Use VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
> >  * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
> > v3:
> >  * Add new performance results that demonstrate the scalability
> >  * Mention that this is PCI-specific [Cornelia]
> > v2:
> >  * Let the virtio-DEVICE-pci device select num-queues because the
> >    optimal multi-queue configuration may differ between virtio-pci,
> >    virtio-mmio, and virtio-ccw [Cornelia]
> >
> > Enabling multi-queue on virtio-pci storage devices improves performance
> > on SMP guests because the completion interrupt is handled on the vCPU
> > that submitted the I/O request. This avoids IPIs inside the guest.
> >
> > Note that performance is unchanged in these cases:
> > 1. Uniprocessor guests. They don't have IPIs.
> > 2. Application threads might be scheduled on the sole vCPU that handles
> >    completion interrupts purely by chance. (This is one reason why
> >    benchmark results can vary noticeably between runs.)
> > 3. Users may bind the application to the vCPU that handles completion
> >    interrupts.
> >
> > Set the number of queues to the number of vCPUs by default on
> > virtio-blk and virtio-scsi PCI devices. Older machine types continue to
> > default to 1 queue for live migration compatibility.
> >
> > Random read performance:
> >        IOPS
> > q=1    78k
> > q=32   104k   +33%
> >
> > Boot time:
> >        Duration
> > q=1    51s
> > q=32   1m41s  +98%
> >
> > Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks
> >
> > Previously measured results on a 4 vCPU guest were also positive but
> > showed a smaller 1-4% performance improvement. They are no longer valid
> > because significant event loop optimizations have been merged.
>
> I'm guessing this should be deferred to the next release as it (narrowly)
> missed the freeze window. Does this make sense to you?
Yes, that is fine. Thanks!

Stefan
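
For anyone following the thread, the sizing logic the quoted cover letter
describes amounts to: start from the guest vCPU count, then clamp by the
MSI-X vectors the device can allocate and by VIRTIO_QUEUE_MAX, after
subtracting any fixed virtqueues (e.g. the virtio-scsi control/event queues
behind VIRTIO_SCSI_VQ_NUM_FIXED). Below is a minimal standalone sketch of
that clamping, not the actual QEMU helper: the function name, the 1024 value
used for VIRTIO_QUEUE_MAX, the 64-vector example, and the reservation of one
MSI-X vector for the config change interrupt are illustrative assumptions.

  /* Standalone sketch (not QEMU code) of the queue-count clamping described
   * above. The constants here are illustrative assumptions.
   */
  #include <stdio.h>

  #define VIRTIO_QUEUE_MAX 1024   /* assumed per-device virtqueue limit */

  static unsigned min_u(unsigned a, unsigned b)
  {
      return a < b ? a : b;
  }

  /*
   * num_vcpus:    guest vCPU count (ideal 1:1 vCPU:queue mapping)
   * fixed_queues: virtqueues that exist regardless of num-queues, e.g. the
   *               virtio-scsi control and event queues
   * msix_vectors: MSI-X vectors available to the device; one is assumed to
   *               be reserved for the config change interrupt
   */
  static unsigned sketch_optimal_num_queues(unsigned num_vcpus,
                                            unsigned fixed_queues,
                                            unsigned msix_vectors)
  {
      unsigned n = num_vcpus;

      /* Each request queue and fixed queue wants its own MSI-X vector */
      n = min_u(n, msix_vectors - 1 - fixed_queues);

      /* The transport caps the total number of virtqueues per device */
      return min_u(n, VIRTIO_QUEUE_MAX - fixed_queues);
  }

  int main(void)
  {
      /* 32 vCPUs, virtio-blk (no fixed queues), 64 MSI-X vectors */
      printf("virtio-blk:  %u request queues\n",
             sketch_optimal_num_queues(32, 0, 64));

      /* 32 vCPUs, virtio-scsi with 2 fixed queues (ctrl + event) */
      printf("virtio-scsi: %u request queues\n",
             sketch_optimal_num_queues(32, 2, 64));
      return 0;
  }

This is the MSI-X/virtqueue cap the v4 changelog credits to Daniel Berrange:
without it, a large vCPU count could request more vectors or virtqueues than
the device can expose. Older machine types keep the 1-queue default via the
hw/core/machine.c compat properties mentioned in the v3 changelog, preserving
live migration compatibility.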
