Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm

2012-03-09 Thread Stefan Hajnoczi
On Thu, Mar 8, 2012 at 11:56 PM, Ross Becker ross.bec...@gmail.com wrote:
 I just joined in order to chime in here-

 I'm seeing the exact same thing as Reeted;  I've got a machine with a
 storage subsystem capable of 400k IOPs, and when I punch the storage up to
 VMs, each VM seems to top out at around 15-20k IOPs.   I've managed to get
 to 115k IOPs by creating 8 VMs, doing appropriate CPU pinning to spread
 them amongst physical cores, and running IO in them simultaneously, but
 I'm unable to get a single VM past 20k IOPs.

 I'm using qemu-kvm 0.12.1.2, as distributed in RHEL 6.2.

 The hardware is a Dell R910 chassis, with 4 Intel E7 processors.  I am
 poking LVM logical volume block devices directly up to VMs as disks,
 format raw, virtio driver, write caching none, IO mode native.  Each VM
 has 4 vCPUs.

 I'm also using fio to do my testing.

 The interesting thing is that throughput is actually pretty fantastic; I'm
 able to push 6.3 GB/sec using 256k blocks, but the IOPs @ 4k block size
 are poor.

There is a stalled effort to improve the virtio-blk guest driver IOPS
performance.  You might be interested in testing these patches
(virtio-blk: Change I/O path from request to BIO):

https://lkml.org/lkml/2011/12/20/419

No one has explored and benchmarked them deeply enough to make it clear
that these patches are the way forward.

What the patches do is change the guest driver to reduce lock
contention and skip the guest I/O scheduler in favor of a more
lightweight code path in the guest kernel.  This should be a good fit
for your 400k IOPS, 4-vCPU setup.
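For what it's worth, until those patches land you can at least see (and
minimise) what the stock request-based path goes through in the guest.
A rough sketch, assuming the virtio disk shows up as /dev/vda:

  # show which I/O scheduler the virtio disk is currently using
  cat /sys/block/vda/queue/scheduler
  # the noop elevator keeps request-path scheduler overhead to a minimum
  echo noop > /sys/block/vda/queue/scheduler

That is not a substitute for the BIO-path patches, just a way to take
elevator overhead out of the comparison.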

Stefan


Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm

2012-03-08 Thread Ross Becker
I just joined in order to chime in here-

I'm seeing the exact same thing as Reeted;  I've got a machine with a
storage subsystem capable of 400k IOPs, and when I punch the storage up to
VMs, each VM seems to top out at around 15-20k IOPs.   I've managed to get
to 115k IOPs by creating 8 VMs, doing appropriate CPU pinning to spread
them amongst physical cores, and running IO in them simultaneously, but
I'm unable to get a single VM past 20k IOPs.
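
For reference, that kind of pinning can be done with libvirt roughly
like this (just a sketch; the VM names and core numbers here are
illustrative, not my exact layout):

  # pin each guest's 4 vCPUs onto its own physical cores
  virsh vcpupin vm1 0 0
  virsh vcpupin vm1 1 1
  virsh vcpupin vm1 2 2
  virsh vcpupin vm1 3 3
  # ...and likewise vm2 onto cores 4-7, vm3 onto cores 8-11, and so on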

I'm using qemu-kvm 0.12.1.2, as distributed in RHEL 6.2.

The hardware is a Dell R910 chassis, with 4 Intel E7 processors.  I am
poking LVM logical volume block devices directly up to VMs as disks,
format raw, virtio driver, write caching none, IO mode native.  Each VM
has 4 vCPUs.
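
In qemu-kvm -drive terms that amounts to roughly the following (a
sketch; the LV path is made up):

  -drive file=/dev/vg0/vm1-data,if=virtio,format=raw,cache=none,aio=native

or, in libvirt XML, a virtio disk with
<driver name='qemu' type='raw' cache='none' io='native'/>.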

I'm also using fio to do my testing.

The interesting thing is that throughput is actually pretty fantastic; I'm
able to push 6.3 GB/sec using 256k blocks, but the IOPs @ 4k block size
are poor.
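
The 4k and 256k numbers come from fio runs along these lines (a sketch,
not my exact job; the guest device path and the queue/job parameters
are illustrative):

  fio --name=randread-4k --filename=/dev/vdb --rw=randread --bs=4k \
      --direct=1 --ioengine=libaio --iodepth=32 --numjobs=4 \
      --runtime=60 --group_reporting

The throughput test is the same command with --rw=read --bs=256k.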

I am happy to provide any config details, or try any tests suggested.


--Ross




Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm

2012-03-07 Thread Stefan Hajnoczi
On Tue, Mar 6, 2012 at 10:07 PM, Reeted ree...@shiftmail.org wrote:
 On 03/06/12 13:59, Stefan Hajnoczi wrote:

 On Mon, Mar 5, 2012 at 4:44 PM, Martin Mailand mar...@tuxadero.com wrote:

 On 05.03.2012 17:35, Stefan Hajnoczi wrote:

 1. Test on i7 laptop with CPU governor ondemand.

  v0.14.1
  bw=63492KB/s iops=15873
  bw=63221KB/s iops=15805

  v1.0
  bw=36696KB/s iops=9173
  bw=37404KB/s iops=9350

  master
  bw=36396KB/s iops=9099
  bw=34182KB/s iops=8545

  Change the CPU governor to performance
  master
  bw=81756KB/s iops=20393
  bw=81453KB/s iops=20257

 Interesting finding.  Did you show the 0.14.1 results with
 performance governor?



 Hi Stefan,
 all results are with ondemand except the one where I changed it to
 performance.

 Do you want a v0.14.1 test with the governor on performance?

 Yes, that would be interesting because it lets us put the performance
 gain with master+performance into perspective and see how much of a
 change we get.



 I too would be interested in seeing 0.14.1 tested with the performance
 governor, so it can be compared to master with the performance
 governor and we can make sure this is not a regression.

 BTW, I'll take the opportunity to say that 15.8 or 20.3 k IOPS are very
 low figures compared to what I'd instinctively expect from a
 paravirtualized block driver.
 There are now PCIe SSD cards that do 240 k IOPS (e.g. the OCZ RevoDrive
 3 X2 at its maximum IOPS rating), which is 12-15 times higher, for
 something that has to go through a real driver and a real PCI Express
 bus, and can't use zero-copy techniques.
 The IOPS we can give to a VM is currently less than half that of a
 single SATA SSD drive (60 k IOPS or so, these days).
 That's why I consider this topic of virtio-blk performance very
 important.
 I hope there can be improvements in this area...

It depends on the benchmark configuration.  virtio-blk is capable of
doing hundreds of thousands of IOPS; I've seen results.  My guess is
that you can do 100,000 read IOPS with virtio-blk on a good machine and
stock qemu-kvm.
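
I don't have that setup at hand, but the knobs that usually matter for
pushing IOPS are parallelism and queue depth inside the guest, together
with cache=none and aio=native on the host side.  Purely as an
illustration (device name and numbers are guesses, not a known-good
recipe):

  fio --name=randread-4k --filename=/dev/vdb --rw=randread --bs=4k \
      --direct=1 --ioengine=libaio --iodepth=64 --numjobs=8 \
      --group_reporting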

Stefan


Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm

2012-03-07 Thread Reeted

On 03/07/12 09:04, Stefan Hajnoczi wrote:

On Tue, Mar 6, 2012 at 10:07 PM, Reeted ree...@shiftmail.org wrote:

On 03/06/12 13:59, Stefan Hajnoczi wrote:

BTW, I'll take the opportunity to say that 15.8 or 20.3 k IOPS are very
low figures compared to what I'd instinctively expect from a
paravirtualized block driver.
There are now PCIe SSD cards that do 240 k IOPS (e.g. the OCZ RevoDrive
3 X2 at its maximum IOPS rating), which is 12-15 times higher, for
something that has to go through a real driver and a real PCI Express
bus, and can't use zero-copy techniques.
The IOPS we can give to a VM is currently less than half that of a
single SATA SSD drive (60 k IOPS or so, these days).
That's why I consider this topic of virtio-blk performance very
important.
I hope there can be improvements in this area...

It depends on the benchmark configuration.  virtio-blk is capable of
doing hundreds of thousands of IOPS; I've seen results.  My guess is
that you can do 100,000 read IOPS with virtio-blk on a good machine and
stock qemu-kvm.


It's very difficult to configure, then.
I also did benchmarks in the past, and I can confirm Martin's and
Dongsu's findings of about 15 k IOPS with:
qemu-kvm 0.14.1, Intel Westmere CPU, virtio-blk (kernel 2.6.38 on the
guest, 3.0 on the host), fio, 4k random *reads* from the host page
cache (the backing LVM device was fully in cache on the host), the
writeback cache setting, and caches dropped on the guest prior to the
benchmark (with insufficient guest memory to cache a significant
portion of the device).
If you can teach us how to reach 100 k IOPS, I think everyone would be 
grateful :-)
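
For completeness, the host-cache-warm / guest-cache-cold state was
prepared roughly like this before each run (a sketch; the LV path is
made up):

  # on the host: read the backing LV once so it sits in the page cache
  dd if=/dev/vg0/vm-disk of=/dev/null bs=1M
  # in the guest: flush and drop cached data before starting fio
  sync
  echo 3 > /proc/sys/vm/drop_caches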



Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm

2012-03-07 Thread Stefan Hajnoczi
On Wed, Mar 7, 2012 at 2:21 PM, Reeted ree...@shiftmail.org wrote:
 On 03/07/12 09:04, Stefan Hajnoczi wrote:

 On Tue, Mar 6, 2012 at 10:07 PM, Reeted ree...@shiftmail.org wrote:

 On 03/06/12 13:59, Stefan Hajnoczi wrote:

 BTW, I'll take the opportunity to say that 15.8 or 20.3 k IOPS are very
 low figures compared to what I'd instinctively expect from a
 paravirtualized block driver.
 There are now PCIe SSD cards that do 240 k IOPS (e.g. the OCZ RevoDrive
 3 X2 at its maximum IOPS rating), which is 12-15 times higher, for
 something that has to go through a real driver and a real PCI Express
 bus, and can't use zero-copy techniques.
 The IOPS we can give to a VM is currently less than half that of a
 single SATA SSD drive (60 k IOPS or so, these days).
 That's why I consider this topic of virtio-blk performance very
 important.
 I hope there can be improvements in this area...

 It depends on the benchmark configuration.  virtio-blk is capable of
 doing hundreds of thousands of IOPS; I've seen results.  My guess is
 that you can do 100,000 read IOPS with virtio-blk on a good machine
 and stock qemu-kvm.


 It's very difficult to configure, then.
 I also did benchmarks in the past, and I can confirm Martin's and
 Dongsu's findings of about 15 k IOPS with:
 qemu-kvm 0.14.1, Intel Westmere CPU, virtio-blk (kernel 2.6.38 on the
 guest, 3.0 on the host), fio, 4k random *reads* from the host page
 cache (the backing LVM device was fully in cache on the host), the
 writeback cache setting, and caches dropped on the guest prior to the
 benchmark (with insufficient guest memory to cache a significant
 portion of the device).
 If you can teach us how to reach 100 k IOPS, I think everyone would be
 grateful :-)

Sorry for being vague; I don't have the details.  I have CCed Khoa,
who might have time to describe a 100,000 IOPS virtio-blk
configuration.

Stefan


Re: [Qemu-devel] virtio-blk performance regression and qemu-kvm

2012-03-06 Thread Reeted

On 03/06/12 13:59, Stefan Hajnoczi wrote:

On Mon, Mar 5, 2012 at 4:44 PM, Martin Mailand mar...@tuxadero.com wrote:

On 05.03.2012 17:35, Stefan Hajnoczi wrote:


1. Test on i7 laptop with CPU governor ondemand.

  v0.14.1
  bw=63492KB/s iops=15873
  bw=63221KB/s iops=15805

  v1.0
  bw=36696KB/s iops=9173
  bw=37404KB/s iops=9350

  master
  bw=36396KB/s iops=9099
  bw=34182KB/s iops=8545

  Change the CPU governor to performance
  master
  bw=81756KB/s iops=20393
  bw=81453KB/s iops=20257

Interesting finding.  Did you show the 0.14.1 results with
performance governor?



Hi Stefan,
all results are with ondemand except the one where I changed it to
performance.

Do you want a v0.14.1 test with the governor on performance?

Yes, that would be interesting because it lets us put the performance
gain with master+performance into perspective and see how much of a
change we get.



I too would be interested in seeing 0.14.1 tested with the performance
governor, so it can be compared to master with the performance
governor and we can make sure this is not a regression.
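
(For whoever reruns it, switching every core to the performance
governor on the host is just something like the following; adjust if
your distro prefers cpupower or cpufrequtils:)

  for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
      echo performance > $g
  done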


BTW, I'll take the opportunity to say that 15.8 or 20.3 k IOPS are very
low figures compared to what I'd instinctively expect from a
paravirtualized block driver.
There are now PCIe SSD cards that do 240 k IOPS (e.g. the OCZ RevoDrive
3 X2 at its maximum IOPS rating), which is 12-15 times higher, for
something that has to go through a real driver and a real PCI Express
bus, and can't use zero-copy techniques.
The IOPS we can give to a VM is currently less than half that of a
single SATA SSD drive (60 k IOPS or so, these days).
That's why I consider this topic of virtio-blk performance very
important. I hope there can be improvements in this area...


Thanks for your time
R.