Re: [PATCH v5 0/5] virtio-pci: enable blk and scsi multi-queue by default

2020-07-08 Thread Stefan Hajnoczi
On Wed, Jul 08, 2020 at 06:59:41AM -0400, Michael S. Tsirkin wrote:
> On Mon, Jul 06, 2020 at 02:56:45PM +0100, Stefan Hajnoczi wrote:
> > v4:
> >  * Sorry for the long delay. I considered replacing this series with a
> >    simpler approach. Real hardware ships with a fixed number of queues
> >    (e.g. 128). The equivalent can be done in QEMU too. That way we
> >    don't need to magically size num_queues. In the end I decided
> >    against this approach because the Linux virtio_blk.ko and
> >    virtio_scsi.ko guest drivers unconditionally initialized all
> >    available queues until recently (it was written with
> >    num_queues=num_vcpus in mind). It doesn't make sense for a 1 CPU
> >    guest to bring up 128 virtqueues (waste of resources and possibly
> >    weird performance effects with blk-mq).
> >  * Honor maximum number of MSI-X vectors and virtqueues [Daniel Berrange]
> >  * Update commit descriptions to mention maximum MSI-X vector and
> >    virtqueue caps [Raphael]
> > v3:
> >  * Introduce virtio_pci_optimal_num_queues() helper to enforce
> >    VIRTIO_QUEUE_MAX in one place
> >  * Use VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
> >  * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
> > v3:
> >  * Add new performance results that demonstrate the scalability
> >  * Mention that this is PCI-specific [Cornelia]
> > v2:
> >  * Let the virtio-DEVICE-pci device select num-queues because the optimal
> >    multi-queue configuration may differ between virtio-pci, virtio-mmio,
> >    and virtio-ccw [Cornelia]
> > 
> > Enabling multi-queue on virtio-pci storage devices improves performance
> > on SMP guests because the completion interrupt is handled on the vCPU
> > that submitted the I/O request.  This avoids IPIs inside the guest.
> > 
> > Note that performance is unchanged in these cases:
> > 1. Uniprocessor guests.  They don't have IPIs.
> > 2. Application threads might be scheduled on the sole vCPU that handles
> >    completion interrupts purely by chance.  (This is one reason why
> >    benchmark results can vary noticeably between runs.)
> > 3. Users may bind the application to the vCPU that handles completion
> >    interrupts.
> > 
> > Set the number of queues to the number of vCPUs by default on virtio-blk and
> > virtio-scsi PCI devices.  Older machine types continue to default to 1 queue
> > for live migration compatibility.
> > 
> > Random read performance:
> >        IOPS
> > q=1     78k
> > q=32   104k  +33%
> > 
> > Boot time:
> >        Duration
> > q=1       51s
> > q=32    1m41s  +98%
> > 
> > Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks
> > 
> > Previously measured results on a 4 vCPU guest were also positive but
> > showed a smaller 1-4% performance improvement.  They are no longer
> > valid because significant event loop optimizations have been merged.
> 
> I'm guessing this should be deferred to the next release as
> it (narrowly) missed the freeze window. Does this make sense to you?

Yes, that is fine. Thanks!

Stefan



Re: [PATCH v5 0/5] virtio-pci: enable blk and scsi multi-queue by default

2020-07-08 Thread Michael S. Tsirkin
On Mon, Jul 06, 2020 at 02:56:45PM +0100, Stefan Hajnoczi wrote:
> v4:
>  * Sorry for the long delay. I considered replacing this series with a
>    simpler approach. Real hardware ships with a fixed number of queues
>    (e.g. 128). The equivalent can be done in QEMU too. That way we
>    don't need to magically size num_queues. In the end I decided
>    against this approach because the Linux virtio_blk.ko and
>    virtio_scsi.ko guest drivers unconditionally initialized all
>    available queues until recently (it was written with
>    num_queues=num_vcpus in mind). It doesn't make sense for a 1 CPU
>    guest to bring up 128 virtqueues (waste of resources and possibly
>    weird performance effects with blk-mq).
>  * Honor maximum number of MSI-X vectors and virtqueues [Daniel Berrange]
>  * Update commit descriptions to mention maximum MSI-X vector and
>    virtqueue caps [Raphael]
> v3:
>  * Introduce virtio_pci_optimal_num_queues() helper to enforce
>    VIRTIO_QUEUE_MAX in one place
>  * Use VIRTIO_SCSI_VQ_NUM_FIXED constant in all cases [Cornelia]
>  * Update hw/core/machine.c compat properties for QEMU 5.0 [Michael]
> v3:
>  * Add new performance results that demonstrate the scalability
>  * Mention that this is PCI-specific [Cornelia]
> v2:
>  * Let the virtio-DEVICE-pci device select num-queues because the optimal
>    multi-queue configuration may differ between virtio-pci, virtio-mmio,
>    and virtio-ccw [Cornelia]
> 
> Enabling multi-queue on virtio-pci storage devices improves performance on SMP
> guests because the completion interrupt is handled on the vCPU that submitted
> the I/O request.  This avoids IPIs inside the guest.
> 
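For concreteness, num-queues can already be set explicitly per device
today; this series only changes the default used when the property is
left unset. A minimal sketch of an explicit invocation (the disk image
path and IDs are illustrative, not taken from the patches):

  qemu-system-x86_64 -smp 4 \
      -drive if=none,id=drive0,file=disk.img,format=raw \
      -device virtio-blk-pci,drive=drive0,num-queues=4

With this series applied and a new machine type, omitting num-queues on
a 4 vCPU guest should yield the same queue count.
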
> Note that performance is unchanged in these cases:
> 1. Uniprocessor guests.  They don't have IPIs.
> 2. Application threads might be scheduled on the sole vCPU that handles
>    completion interrupts purely by chance.  (This is one reason why
>    benchmark results can vary noticeably between runs.)
> 3. Users may bind the application to the vCPU that handles completion
>    interrupts.
> 
> Set the number of queues to the number of vCPUs by default on virtio-blk and
> virtio-scsi PCI devices.  Older machine types continue to default to 1 queue
> for live migration compatibility.
> 
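The live migration point relies on the usual compat-properties
machinery. A rough sketch of the kind of entries the series adds to
hw/core/machine.c, with the array name and exact property names assumed
rather than quoted from the patches (the array tracks whichever release
the series finally lands in):

  /* sketch only: older machine types stay at 1 queue by default */
  GlobalProperty hw_compat_5_0[] = {
      /* ... existing entries ... */
      { "virtio-blk-device", "num-queues", "1" },
      { "virtio-scsi-device", "num_queues", "1" },
      { "vhost-user-blk", "num-queues", "1" },
  };

Guests started with an older machine type therefore keep a single queue
and can migrate to and from older QEMU without a device mismatch.
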
> Random read performance:
>        IOPS
> q=1     78k
> q=32   104k  +33%
> 
> Boot time:
>        Duration
> q=1       51s
> q=32    1m41s  +98%
> 
> Guest configuration: 32 vCPUs, 101 virtio-blk-pci disks
> 
> Previously measured results on a 4 vCPU guest were also positive but showed a
> smaller 1-4% performance improvement.  They are no longer valid because
> significant event loop optimizations have been merged.

I'm guessing this should be deferred to the next release as
it (narrowly) missed the freeze window. Does this make sense to you?

> Stefan Hajnoczi (5):
>   virtio-pci: add virtio_pci_optimal_num_queues() helper
>   virtio-scsi: introduce a constant for fixed virtqueues
>   virtio-scsi: default num_queues to -smp N
>   virtio-blk: default num_queues to -smp N
>   vhost-user-blk: default num_queues to -smp N
> 
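For readers skimming the series, a simplified sketch of what the helper
in patch 1 computes; this is not the literal implementation, and per the
changelog above the real code also honors the maximum number of MSI-X
vectors:

  /* sketch: one request queue per vCPU, bounded by VIRTIO_QUEUE_MAX */
  unsigned virtio_pci_optimal_num_queues(unsigned fixed_queues)
  {
      /*
       * fixed_queues counts virtqueues that exist regardless of the
       * vCPU count, e.g. virtio-scsi's event and control queues
       * (VIRTIO_SCSI_VQ_NUM_FIXED in patch 2).
       */
      return MIN(current_machine->smp.cpus,
                 VIRTIO_QUEUE_MAX - fixed_queues);
  }

The per-device -pci realize code then applies this value only when the
user has not set num-queues explicitly.
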
>  hw/virtio/virtio-pci.h              |  9 +
>  include/hw/virtio/vhost-user-blk.h  |  2 ++
>  include/hw/virtio/virtio-blk.h      |  2 ++
>  include/hw/virtio/virtio-scsi.h     |  5 +
>  hw/block/vhost-user-blk.c           |  6 +-
>  hw/block/virtio-blk.c               |  6 +-
>  hw/core/machine.c                   |  5 +
>  hw/scsi/vhost-scsi.c                |  3 ++-
>  hw/scsi/vhost-user-scsi.c           |  5 +++--
>  hw/scsi/virtio-scsi.c               | 13
>  hw/virtio/vhost-scsi-pci.c          |  9 +++--
>  hw/virtio/vhost-user-blk-pci.c      |  4
>  hw/virtio/vhost-user-scsi-pci.c     |  9 +++--
>  hw/virtio/virtio-blk-pci.c          |  7 ++-
>  hw/virtio/virtio-pci.c              | 32 ++
>  hw/virtio/virtio-scsi-pci.c         |  9 +++--
>  16 files changed, 110 insertions(+), 16 deletions(-)
> 
> -- 
> 2.26.2
>