Re: [PATCH v5 0/5] virtio-pci: enable blk and scsi multi-queue by default
On Wed, Jul 08, 2020 at 06:59:41AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 06, 2020 at 02:56:45PM +0100, Stefan Hajnoczi wrote:
> > v4:
> >  * Sorry for the long delay. I considered replacing this series with a
> >    simpler approach. Real hardware ships with a fixed number of queues
> >    (e.g. 128). The equivalent can be done in QEMU too. That way we don't
> >    need to magically size num_queues. In the end I decided against this
> >    approach because the Linux virtio_blk.ko and virtio_scsi.ko guest
> >    drivers unconditionally initialized all available queues until
> >    recently (they were written with num_queues=num_vcpus in mind). It
> >    doesn't make sense for a 1 CPU guest to bring up 128 virtqueues (a
> >    waste of resources and possibly weird performance effects with
> >    blk-mq).
> >  * Honor the maximum number of MSI-X vectors and virtqueues [Daniel
> >    Berrange]
> >  * Update commit descriptions to mention the maximum MSI-X vector and
> >    virtqueue caps [Raphael]
> > v3:
> >  * Introduce a virtio_pci_optimal_num_queues() helper to enforce
> >    VIRTIO_QUEUE_MAX in one place
> >  * Use the VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
> >  * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
> > v3:
> >  * Add new performance results that demonstrate the scalability
> >  * Mention that this is PCI-specific [Cornelia]
> > v2:
> >  * Let the virtio-DEVICE-pci device select num-queues because the
> >    optimal multi-queue configuration may differ between virtio-pci,
> >    virtio-mmio, and virtio-ccw [Cornelia]
> >
> > Enabling multi-queue on virtio-pci storage devices improves performance
> > on SMP guests because the completion interrupt is handled on the vCPU
> > that submitted the I/O request. This avoids IPIs inside the guest.
> >
> > Note that performance is unchanged in these cases:
> > 1. Uniprocessor guests. They don't have IPIs.
> > 2. Application threads might be scheduled on the sole vCPU that handles
> >    completion interrupts purely by chance. (This is one reason why
> >    benchmark results can vary noticeably between runs.)
> > 3. Users may bind the application to the vCPU that handles completion
> >    interrupts.
> >
> > Set the number of queues to the number of vCPUs by default on
> > virtio-blk and virtio-scsi PCI devices. Older machine types continue to
> > default to 1 queue for live migration compatibility.
> >
> > Random read performance:
> >        IOPS
> > q=1    78k
> > q=32   104k   +33%
> >
> > Boot time:
> >        Duration
> > q=1    51s
> > q=32   1m41s  +98%
> >
> > Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks
> >
> > Previously measured results on a 4 vCPU guest were also positive but
> > showed a smaller 1-4% performance improvement. They are no longer valid
> > because significant event loop optimizations have been merged.
>
> I'm guessing this should be deferred to the next release as it (narrowly)
> missed the freeze window. Does this make sense to you?

Yes, that is fine. Thanks!

Stefan
Re: [PATCH v5 0/5] virtio-pci: enable blk and scsi multi-queue by default
On Mon, Jul 06, 2020 at 02:56:45PM +0100, Stefan Hajnoczi wrote:
> v4:
>  * Sorry for the long delay. I considered replacing this series with a
>    simpler approach. Real hardware ships with a fixed number of queues
>    (e.g. 128). The equivalent can be done in QEMU too. That way we don't
>    need to magically size num_queues. In the end I decided against this
>    approach because the Linux virtio_blk.ko and virtio_scsi.ko guest
>    drivers unconditionally initialized all available queues until
>    recently (they were written with num_queues=num_vcpus in mind). It
>    doesn't make sense for a 1 CPU guest to bring up 128 virtqueues (a
>    waste of resources and possibly weird performance effects with
>    blk-mq).
>  * Honor the maximum number of MSI-X vectors and virtqueues [Daniel
>    Berrange]
>  * Update commit descriptions to mention the maximum MSI-X vector and
>    virtqueue caps [Raphael]
> v3:
>  * Introduce a virtio_pci_optimal_num_queues() helper to enforce
>    VIRTIO_QUEUE_MAX in one place
>  * Use the VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
>  * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
> v3:
>  * Add new performance results that demonstrate the scalability
>  * Mention that this is PCI-specific [Cornelia]
> v2:
>  * Let the virtio-DEVICE-pci device select num-queues because the optimal
>    multi-queue configuration may differ between virtio-pci, virtio-mmio,
>    and virtio-ccw [Cornelia]
>
> Enabling multi-queue on virtio-pci storage devices improves performance
> on SMP guests because the completion interrupt is handled on the vCPU
> that submitted the I/O request. This avoids IPIs inside the guest.
>
> Note that performance is unchanged in these cases:
> 1. Uniprocessor guests. They don't have IPIs.
> 2. Application threads might be scheduled on the sole vCPU that handles
>    completion interrupts purely by chance. (This is one reason why
>    benchmark results can vary noticeably between runs.)
> 3. Users may bind the application to the vCPU that handles completion
>    interrupts.
>
> Set the number of queues to the number of vCPUs by default on virtio-blk
> and virtio-scsi PCI devices. Older machine types continue to default to
> 1 queue for live migration compatibility.
>
> Random read performance:
>        IOPS
> q=1    78k
> q=32   104k   +33%
>
> Boot time:
>        Duration
> q=1    51s
> q=32   1m41s  +98%
>
> Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks
>
> Previously measured results on a 4 vCPU guest were also positive but
> showed a smaller 1-4% performance improvement. They are no longer valid
> because significant event loop optimizations have been merged.

I'm guessing this should be deferred to the next release as it (narrowly)
missed the freeze window. Does this make sense to you?

> Stefan Hajnoczi (5):
>   virtio-pci: add virtio_pci_optimal_num_queues() helper
>   virtio-scsi: introduce a constant for fixed virtqueues
>   virtio-scsi: default num_queues to -smp N
>   virtio-blk: default num_queues to -smp N
>   vhost-user-blk: default num_queues to -smp N
>
>  hw/virtio/virtio-pci.h             |  9 +
>  include/hw/virtio/vhost-user-blk.h |  2 ++
>  include/hw/virtio/virtio-blk.h     |  2 ++
>  include/hw/virtio/virtio-scsi.h    |  5 +
>  hw/block/vhost-user-blk.c          |  6 +-
>  hw/block/virtio-blk.c              |  6 +-
>  hw/core/machine.c                  |  5 +
>  hw/scsi/vhost-scsi.c               |  3 ++-
>  hw/scsi/vhost-user-scsi.c          |  5 +++--
>  hw/scsi/virtio-scsi.c              | 13
>  hw/virtio/vhost-scsi-pci.c         |  9 +++--
>  hw/virtio/vhost-user-blk-pci.c     |  4
>  hw/virtio/vhost-user-scsi-pci.c    |  9 +++--
>  hw/virtio/virtio-blk-pci.c         |  7 ++-
>  hw/virtio/virtio-pci.c             | 32 ++
>  hw/virtio/virtio-scsi-pci.c        |  9 +++--
>  16 files changed, 110 insertions(+), 16 deletions(-)
>
> --
> 2.26.2