Re: KVM performance Java server/MySQL...

2013-02-13 Thread Erik Brakkee
quote who=David Cruz
 Other optimizations people are testing out there.

 - use nohz=off on the kernel boot line in menu.lst
 - Disable cgroups completely, using cgclear and turning off the cgred and
 cgconfig daemons.

 And from a personal point of view, we've always tried to run MySQL on
 a different server from JBoss.
 99% of the time it is far better for performance and tuning.
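
For anyone else who wants to try this on CentOS/RHEL 6, those suggestions
amount to roughly the following (a sketch; the cgroup services come from the
libcgroup package and the grub path may differ on your system):

# add nohz=off to the kernel line (menu.lst/grub.conf)
vi /boot/grub/grub.conf        # kernel /vmlinuz-... ro root=... nohz=off

# tear down all cgroups and stop the cgroup daemons
cgclear
service cgred stop
service cgconfig stop
chkconfig cgred off
chkconfig cgconfig off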

I am still having problems. Running MySQL and JBoss on different VMs is
significantly slower, but that appears to be a pure application issue
relating to the network overhead of a huge number of queries. That is of
course not an issue for this mailing list.

However, I saw a performance degradation when running the same test with
MySQL and JBoss colocated and increasing the CPU count from 4 to 8. I then
tried various tricks to improve performance. Pinning the processes did not
have a significant effect. What did appear to have an effect was removing
memory ballooning.
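
In case it is useful to others: one way to take ballooning out of the picture
is to disable the balloon device in the domain XML (a sketch; adjust the guest
name):

virsh edit master-data05-v50
# replace the memballoon device with:
#   <memballoon model='none'/>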

But now I get inconsistent results, varying from 21 seconds to 27 seconds.
The 21 figure is acceptable but the 27 figure is not. The tests are being
done on new hardware (see previous posts), with the virtual machine being
basically the only thing running.

Summary of configuration:
- host and guest: centos 6.3, transparent hugepages
- vm: cpu mode is host-passthrough, pinning to the cores of one processor,
  removed tablet, sound, and USB devices
- host: hyperthreading switched off in BIOS
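
A quick way to double-check the pinning at runtime (not part of the
measurements, just a sanity check; pidof assumes a single qemu-kvm process):

virsh vcpuinfo master-data05-v50    # shows the CPU affinity of each vCPU
taskset -cp $(pidof qemu-kvm)       # affinity of the qemu-kvm process itself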

Do you have any idea what this could be? I expect it is somehow NUMA
related, but how would I troubleshoot this? Ideally I would like to make
sure the entire VM runs on one CPU and allocates memory from that CPU and
never moves (or both CPU and memory move together). I saw some
presentations on the internet about NUMA work being done for linux. Do you
have any suggestions?
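
In the meantime, this is roughly what I am looking at on the NUMA side (a
sketch, not a recipe; node 0 is simply the node holding the pinned cores, and
the numatune element needs a reasonably recent libvirt):

numactl --hardware               # list nodes with their CPUs and free memory
numastat -p $(pidof qemu-kvm)    # per-node memory usage of the qemu-kvm process,
                                 # if the installed numastat supports per-process output
# To keep guest memory on node 0, one option is virsh edit master-data05-v50 and:
#   <numatune>
#     <memory mode='strict' nodeset='0'/>
#   </numatune>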

My domain.xml is given below:

<domain type='kvm' id='37'>
  <name>master-data05-v50</name>
  <uuid>79ddd84d-937e-357b-8e57-c7f487dc3464</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='6'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='10'/>
    <vcpupin vcpu='6' cpuset='12'/>
    <vcpupin vcpu='7' cpuset='14'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
    <boot dev='cdrom'/>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/raid5/v50disk1'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/vg_system/v50disk2'/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x08' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/raid5/v50disk3'/>
      <target dev='vdc' bus='virtio'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/mydata/images/configdisks/v50/configdisk.img'/>
      <target dev='vdz' bus='virtio'/>
      <alias name='virtio-disk25'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:00:01:50'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
 

Re: KVM performance Java server/MySQL...

2013-02-13 Thread Erik Brakkee
quote who=Erik Brakkee
 quote who=David Cruz
 Other optimizations people are testing out there.

 - use nohz=off on the kernel boot line in menu.lst
 - Disable cgroups completely, using cgclear and turning off the cgred and
 cgconfig daemons.

I also tried this option, but it did not have a significant effect; if
anything, it degraded performance a bit, although it is difficult to tell
because of the variation in the measured times.

On a positive note, I have managed to get a significant performance
improvement by checking the BIOS. When we got the server, it was
configured to use the Performance Per Watt profile. All tests up to now were
run with CPU scaling enabled (and also recognized by the Linux kernel).
What I did to get better performance was to enable the 'Performance'
profile instead and let the hardware do the CPU scaling (this is what I
understand the option to mean).
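
For reference, this is the kind of check I mean on the host (standard cpufreq
sysfs paths; they may be absent when the BIOS hides P-state control from the
OS):

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver    # e.g. acpi-cpufreq
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor  # e.g. ondemand or performance
grep MHz /proc/cpuinfo                                     # frequencies as the kernel sees them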

Now the measured times on the physical host have dropped from a consistent
19 seconds to a consistent 10 seconds. On the virtual host, the variable
times from 21-27 seconds have now turned into
consistent times of approximately 13.5 seconds. This is with the nohz=off
kernel option. With the nohz=off option removed, I get consistent times
that are mostly a bit under 13 seconds (12.8 upwards). These results are
for a VM with 4 virtual CPUs that are also pinned to cores of the first
CPU.

From the looks of it, the kernel we are using does not fully understand
the CPU it is dealing with. Also, I am doubtful that we ever saw the
actual CPU frequencies when doing 'cat /proc/cpuinfo'.

Another positive: increasing the vCPU count from 4 to 8, without pinning,
does not degrade performance. In fact, it even seems to improve it a bit,
leaving a virtualization overhead of approximately 20%.

All in all, the performance results are acceptable now looking at the
absolute figures. In fact, I am quite happy with this because it means we
can continue to take this setup into production.  However, there still
appears to be a 20% penalty for virtualization in our specific case.

Do you have any ideas on how to get to the bottom of this? Being able to
squeeze the last 20% out of it would make us even happier (my day is
already made with this BIOS setting).

Any ideas?



 However, I saw a performance degradation when running the same test with
 MySQL and JBoss colocated and increasing the CPU count from 4 to 8. I then
 tried various tricks to improve performance. Pinning the processes did not
 have a significant effect. What did appear to have an effect was removing
 memory ballooning.

 But now I get inconsistent results, varying from 21 seconds to 27 seconds.
 The 21 figure is acceptable but the 27 figure is not. The tests are being
 done on new hardware (see previous posts), with the virtual machine being
 basically the only thing running.

 Summary of configuration:
 - host and guest: centos 6.3, transparent hugepages
 - vm: cpu mode is host-passthrough, pinning to the cores of one processor,
   removed tablet, sound, and USB devices
 - host: hyperthreading switched off in BIOS

 Do you have any idea what this could be? I expect it is somehow NUMA
 related, but how would I troubleshoot this? Ideally I would like to make
 sure the entire VM runs on one CPU and allocates memory from that CPU and
 never moves (or both CPU and memory move together). I saw some
 presentations on the internet about NUMA work being done for linux. Do you
 have any suggestions?

 My domain.xml is given below:

 <domain type='kvm' id='37'>
   <name>master-data05-v50</name>
   <uuid>79ddd84d-937e-357b-8e57-c7f487dc3464</uuid>
   <memory unit='KiB'>8388608</memory>
   <currentMemory unit='KiB'>8388608</currentMemory>
   <vcpu placement='static'>8</vcpu>
   <cputune>
     <vcpupin vcpu='0' cpuset='0'/>
     <vcpupin vcpu='1' cpuset='2'/>
     <vcpupin vcpu='2' cpuset='4'/>
     <vcpupin vcpu='3' cpuset='6'/>
     <vcpupin vcpu='4' cpuset='8'/>
     <vcpupin vcpu='5' cpuset='10'/>
     <vcpupin vcpu='6' cpuset='12'/>
     <vcpupin vcpu='7' cpuset='14'/>
   </cputune>
   <os>
     <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
     <boot dev='cdrom'/>
     <boot dev='hd'/>
   </os>
   <features>
     <acpi/>
     <apic/>
   </features>
   <cpu mode='host-passthrough'>
   </cpu>
   <clock offset='utc'/>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>
   <on_crash>restart</on_crash>
   <devices>
     <emulator>/usr/libexec/qemu-kvm</emulator>
     <disk type='file' device='cdrom'>
       <driver name='qemu' type='raw'/>
       <target dev='hdc' bus='ide'/>
       <readonly/>
       <alias name='ide0-1-0'/>
       <address type='drive' controller='0' bus='1' target='0' unit='0'/>
     </disk>
     <disk type='block' device='disk'>
       <driver name='qemu' type='raw' cache='none' io='native'/>
       <source dev='/dev/raid5/v50disk1'/>
       <target dev='vda' bus='virtio'/>
       <alias name='virtio-disk0'/>
       <address type='pci' domain='0x' bus='0x00' slot='0x05' function='0x0'/>
     </disk>
     <disk type='block' device='disk'>

Re: KVM performance Java server/MySQL...

2013-02-08 Thread Erik Brakkee
quote who=Gleb Natapov
 On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote:
 Hi,


 We have been benchmarking a java server application (java 6 update 29)
 that requires a mysql database. The scenario is quite simple. We open a
 web page which displays a lot of search results. To get the content of
 the
 page one big query is done with many smaller queries to retrieve the
 data.
 The test from the java side is single threaded.

 We have used the following deployment scenarios:
 1. JBoss in VM, MySql in separate VM
 2. JBoss in VM, MySQL native
 3. JBoss native, MySQL in vm.
 4. JBoss native and MySQL native on the same physical machine
 5. JBoss and MySQL virtualized on the same VM.

 What we see is that the performance (time to execute) is practically the
 same for all scenarios (approx. 30 seconds), except for scenario 4 that
 takes approx. 21 seconds. This difference is quite large and contrasts with
 many other tests on the internet and other benchmarks we did previously.

 We have tried pinning the VMs, turning hyperthreading off, varying the
 CPU
 model (including host-passthrough), but this did not have any
 significant
 impact.

 The hardware on which we are running is a dual socket E5-2650 machine
 with
 64 GB memory. The server is a Dell poweredge R720 server with SAS disks,
 RAID controller with battery backup (writeback cache). Transparent huge
 pages is turned on.

 We are at a loss to explain the differences in the test. In particular,
 we
 would have expected the least performance when both were running
 virtualized and we would have expected a better performance when JBoss
 and
 MySQL were running virtualized in the same VM as compared to JBoss and
 MySQL both running in different virtual machines. It looks like we are
 dealing with multiple issues here and not just one.

 Right now we have a 30% penalty for running virtualized which is too
 much
 for us; 10% would be all right. What would you suggest we do to
 troubleshoot this further?


 What are your kernel/qemu versions and the command line you are using to
 start a VM?

Centos 6.3, 2.6.32-279.22.1.el6.x86_64
 rpm -qf /usr/libexec/qemu-kvm
qemu-kvm-0.12.1.2-2.295.el6_3.10.x86_64

The guest is also running CentOS 6.3 with the same settings. Settings that
can influence Java performance (such as transparent huge pages) are turned
on for both the host and the guest (see the remark on hugepages below).
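
To verify that transparent hugepages are really active on both sides I check
something like this (on RHEL/CentOS 6 the sysfs path carries a redhat_
prefix):

cat /sys/kernel/mm/redhat_transparent_hugepage/enabled   # active setting is shown in brackets
grep AnonHugePages /proc/meminfo                          # memory actually backed by THP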

The command-line is as follows:

/usr/libexec/qemu-kvm -S -M rhel6.3.0 -enable-kvm -m 8192 -mem-prealloc
-mem-path /hugepages/libvirt/qemu -smp 4,sockets=4,cores=1,threads=1 -name
master-data05-v50 -uuid 79ddd84d-937e-357b-8e57-c7f487dc3464 -nodefconfig
-nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/master-data05-v50.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
-drive
file=/dev/raid5/v50disk1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-drive
file=/dev/vg_raid1/v50disk2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
-drive
file=/dev/raid5/v50disk3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk2,id=virtio-disk2
-drive
file=/var/data/images/configdisks/v50/configdisk.img,if=none,id=drive-virtio-disk25,format=raw,cache=none
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk25,id=virtio-disk25
-netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:01:50,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6



This virtual machine has three virtio disks and one file-based disk. The
last disk is about 100 MB in size, is used only during startup (it contains
configuration data for initializing the VM), and is only read and never
written. It has one CD-ROM which is not used. It also uses old-style
hugepages. Apparently this did not have any significant effect on
performance over transparent hugepages (as would be expected). We
configured these old-style hugepages just to rule out any issue with
transparent hugepages.
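
For completeness, the old-style hugepages behind the -mem-path above boil
down to something like this (a sketch; the page count is an example sized for
the 8 GB guest):

echo 4352 > /proc/sys/vm/nr_hugepages                 # reserve 2 MiB pages (8 GB plus some slack)
mkdir -p /hugepages/libvirt/qemu
mount -t hugetlbfs hugetlbfs /hugepages/libvirt/qemu
# and in the guest XML (virsh edit master-data05-v50):
#   <memoryBacking>
#     <hugepages/>
#   </memoryBacking>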

Initially we got a 30% performance penalty with 16 processors, but with the
current setting of 4 processors we see a reduced performance penalty
of 15-20%. On the physical host, we are not running the numad daemon
at the moment. We also tried disabling hyperthreading in the host's BIOS,
but the measurements do not change significantly.

The 

Re: KVM performance Java server/MySQL...

2013-02-08 Thread Erik Brakkee
quote who=Erik Brakkee
 The I/O scheduler on the host and on the guest is CFQ. We also tried the
 deadline scheduler on the host but this did not make any measurable
 difference. We did not try no-op on the host.

I mean of course that we did not try no-op on the guest (not on the host).
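
For the record, checking and switching the elevator is simply (the device
name is an example):

cat /sys/block/vda/queue/scheduler           # the active scheduler is shown in brackets
echo noop > /sys/block/vda/queue/scheduler   # switch to no-op at runtime
# or persistently via the kernel command line: elevator=noop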



Re: KVM performance Java server/MySQL...

2013-02-08 Thread Erik Brakkee
quote who=Erik Brakkee
 quote who=Gleb Natapov
 On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote:
 Hi,


 We have been benchmarking a java server application (java 6 update 29)
 that requires a mysql database. The scenario is quite simple. We open a
 web page which displays a lot of search results. To get the content of
 the
 page one big query is done with many smaller queries to retrieve the
 data.
 The test from the java side is single threaded.

 We have used the following deployment scenarios:
 1. JBoss in VM, MySql in separate VM
 2. JBoss in VM, MySQL native
 3. JBoss native, MySQL in vm.
 4. JBoss native and MySQL native on the same physical machine
 5. JBoss and MySQL virtualized on the same VM.

 What we see is that the performance (time to execute) is practically
 the
 same for all scenarios (approx. 30 seconds), except for scenario 4 that
 takes approx. 21 seconds. This difference is quite large and contrasts with
 many other tests on the internet and other benchmarks we did previously.

 We have tried pinning the VMs, turning hyperthreading off, varying the
 CPU
 model (including host-passthrough), but this did not have any
 significant
 impact.

 The hardware on which we are running is a dual socket E5-2650 machine
 with
 64 GB memory. The server is a Dell poweredge R720 server with SAS
 disks,
 RAID controller with battery backup (writeback cache). Transparent huge
 pages is turned on.

 We are at a loss to explain the differences in the test. In particular,
 we
 would have expected the least performance when both were running
 virtualized and we would have expected a better performance when JBoss
 and
 MySQL were running virtualized in the same VM as compared to JBoss and
 MySQL both running in different virtual machines. It looks like we are
 dealing with multiple issues here and not just one.

 Right now we have a 30% penalty for running virtualized which is too
 much
 for us; 10% would be all right. What would you suggest we do to
 troubleshoot this further?


 What are your kernel/qemu versions and the command line you are using to
 start a VM?

 Centos 6.3, 2.6.32-279.22.1.el6.x86_64
 rpm -qf /usr/libexec/qemu-kvm
 qemu-kvm-0.12.1.2-2.295.el6_3.10.x86_64

 The guest is also running centos 6.3 with the same settings. Settings that
 can influence Java performance (such as transparent huge pages) are turned
 on on both the host and guest (see the remark on hugepages below).

 The command-line is as follows:

 /usr/libexec/qemu-kvm -S -M rhel6.3.0 -enable-kvm -m 8192 -mem-prealloc
 -mem-path /hugepages/libvirt/qemu -smp 4,sockets=4,cores=1,threads=1 -name
 master-data05-v50 -uuid 79ddd84d-937e-357b-8e57-c7f487dc3464 -nodefconfig
 -nodefaults -chardev
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/master-data05-v50.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
 -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
 if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
 -drive
 file=/dev/raid5/v50disk1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
 -drive
 file=/dev/vg_raid1/v50disk2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
 -drive
 file=/dev/raid5/v50disk3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk2,id=virtio-disk2
 -drive
 file=/var/data/images/configdisks/v50/configdisk.img,if=none,id=drive-virtio-disk25,format=raw,cache=none
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk25,id=virtio-disk25
 -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:01:50,bus=pci.0,addr=0x3
 -chardev pty,id=charserial0 -device
 isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6



 This virtual machine has three virtio disks, and one file based disk. The
 last disk is about 100MB in size and is used only during startup (contains
 configuration data for initializing the VM) and is only read and never
 written. It has one CDrom which is not used. It also uses old-style
 hugepages. Apparently this did not have any significant effect on
 performance over transparent hugepages (as would be expected). We
 configured these old style hugepages just to rule out any issue with
 transparent hugepages.

 Initially we got 30% performance penalty with 16 processors, but in the
 current setting of using 4 processors we see a reduced performance penalty
 of 15-20%. Also on the physical host, we are not running the numad daemon
 at the moment. Also, we tried disabling hyperthreading 

Re: KVM performance Java server/MySQL...

2013-02-08 Thread David Cruz
Other optimizations people are testing out there.

- use nohz=off on the kernel boot line in menu.lst
- Disable cgroups completely, using cgclear and turning off the cgred and
cgconfig daemons.

And from a personal point of view, we've always tried to run MySQL on
a different server from JBoss.
99% of the time it is far better for performance and tuning.

David

2013/2/8 Erik Brakkee e...@brakkee.org:
 quote who=Erik Brakkee
 quote who=Gleb Natapov
 On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote:
 Hi,


 We have been benchmarking a java server application (java 6 update 29)
 that requires a mysql database. The scenario is quite simple. We open a
 web page which displays a lot of search results. To get the content of
 the
 page one big query is done with many smaller queries to retrieve the
 data.
 The test from the java side is single threaded.

 We have used the following deployment scenarios:
 1. JBoss in VM, MySql in separate VM
 2. JBoss in VM, MySQL native
 3. JBoss native, MySQL in vm.
 4. JBoss native and MySQL native on the same physical machine
 5. JBoss and MySQL virtualized on the same VM.

 What we see is that the performance (time to execute) is practically
 the
 same for all scenarios (approx. 30 seconds), except for scenario 4 that
 takes approx. 21 seconds. This difference is quite large and contrasts with
 many other tests on the internet and other benchmarks we did previously.

 We have tried pinning the VMs, turning hyperthreading off, varying the
 CPU
 model (including host-passthrough), but this did not have any
 significant
 impact.

 The hardware on which we are running is a dual socket E5-2650 machine
 with
 64 GB memory. The server is a Dell poweredge R720 server with SAS
 disks,
 RAID controller with battery backup (writeback cache). Transparent huge
 pages is turned on.

 We are at a loss to explain the differences in the test. In particular,
 we
 would have expected the least performance when both were running
 virtualized and we would have expected a better performance when JBoss
 and
 MySQL were running virtualized in the same VM as compared to JBoss and
 MySQL both running in different virtual machines. It looks like we are
 dealing with multiple issues here and not just one.

 Right now we have a 30% penalty for running virtualized which is too
 much
 for us; 10% would be all right. What would you suggest we do to
 troubleshoot this further?


 What are your kernel/qemu versions and the command line you are using to
 start a VM?

 Centos 6.3, 2.6.32-279.22.1.el6.x86_64
 rpm -qf /usr/libexec/qemu-kvm
 qemu-kvm-0.12.1.2-2.295.el6_3.10.x86_64

 The guest is also running centos 6.3 with the same settings. Settings that
 can influence Java performance (such as transparent huge pages) are turned
 on on both the host and guest (see the remark on hugepages below).

 The command-line is as follows:

 /usr/libexec/qemu-kvm -S -M rhel6.3.0 -enable-kvm -m 8192 -mem-prealloc
 -mem-path /hugepages/libvirt/qemu -smp 4,sockets=4,cores=1,threads=1 -name
 master-data05-v50 -uuid 79ddd84d-937e-357b-8e57-c7f487dc3464 -nodefconfig
 -nodefaults -chardev
 socket,id=charmonitor,path=/var/lib/libvirt/qemu/master-data05-v50.monitor,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
 -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive
 if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device
 ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
 -drive
 file=/dev/raid5/v50disk1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
 -drive
 file=/dev/vg_raid1/v50disk2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1
 -drive
 file=/dev/raid5/v50disk3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk2,id=virtio-disk2
 -drive
 file=/var/data/images/configdisks/v50/configdisk.img,if=none,id=drive-virtio-disk25,format=raw,cache=none
 -device
 virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk25,id=virtio-disk25
 -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device
 virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:01:50,bus=pci.0,addr=0x3
 -chardev pty,id=charserial0 -device
 isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6



 This virtual machine has three virtio disks, and one file based disk. The
 last disk is about 100MB in size and is used only during startup (contains
 configuration data for initializing the VM) and is only read and never
 written. It has one CDrom which is not used. It also uses old-style
 hugepages. Apparently this did not have any significant effect on
 performance over transparent hugepages 

Re: KVM performance Java server/MySQL...

2013-02-07 Thread Gleb Natapov
On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote:
 Hi,
 
 
 We have been benchmarking a java server application (java 6 update 29)
 that requires a mysql database. The scenario is quite simple. We open a
 web page which displays a lot of search results. To get the content of the
 page one big query is done with many smaller queries to retrieve the data.
 The test from the java side is single threaded.
 
 We have used the following deployment scenarios:
 1. JBoss in VM, MySql in separate VM
 2. JBoss in VM, MySQL native
 3. JBoss native, MySQL in vm.
 4. JBoss native and MySQL native on the same physical machine
 5. JBoss and MySQL virtualized on the same VM.
 
 What we see is that the performance (time to execute) is practically the
 same for all scenarios (approx. 30 seconds), except for scenario 4 that
 takes approx. 21 seconds. This difference is quite large and contrasts with
 many other tests on the internet and other benchmarks we did previously.
 
 We have tried pinning the VMs, turning hyperthreading off, varying the CPU
 model (including host-passthrough), but this did not have any significant
 impact.
 
 The hardware on which we are running is a dual socket E5-2650 machine with
 64 GB memory. The server is a Dell poweredge R720 server with SAS disks,
 RAID controller with battery backup (writeback cache). Transparent huge
 pages is turned on.
 
 We are at a loss to explain the differences in the test. In particular, we
 would have expected the least performance when both were running
 virtualized and we would have expected a better performance when JBoss and
 MySQL were running virtualized in the same VM as compared to JBoss and
 MySQL both running in different virtual machines. It looks like we are
 dealing with multiple issues here and not just one.
 
 Right now we have a 30% penalty for running virtualized which is too much
 for us; 10% would be all right. What would you suggest we do to
 troubleshoot this further?


What are your kernel/qemu versions and the command line you are using to
start a VM?
 
--
Gleb.


Re: KVM performance vs. Xen

2009-04-30 Thread Avi Kivity

Andrew Theurer wrote:

I wanted to share some performance data for KVM and Xen.  I thought it
would be interesting to share some performance results especially
compared to Xen, using a more complex situation like heterogeneous
server consolidation.

The Workload:
The workload is one that simulates a consolidation of servers on to a
single host.  There are 3 server types: web, imap, and app (j2ee).  In
addition, there are other helper servers which are also consolidated:
a db server, which helps out with the app server, and an nfs server,
which helps out with the web server (a portion of the docroot is nfs
mounted).  There is also one other server that is simply idle.  All 6
servers make up one set.  The first 3 server types are sent requests,
which in turn may send requests to the db and nfs helper servers.  The
request rate is throttled to produce a fixed amount of work.  In order
to increase utilization on the host, more sets of these servers are
used.  The clients which send requests also have a response time
requirement which is monitored.  The following results have passed the
response time requirements.



What's the typical I/O load (disk and network bandwidth) while the tests 
are running?



The host hardware:
A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4 x
1 GB Ethernet


CPU time measurements with SMT can vary wildly if the system is not 
fully loaded.  If the scheduler happens to schedule two threads on a 
single core, both of these threads will generate less work compared to 
if they were scheduled on different cores.




Test Results:
The throughput is equal in these tests, as the clients throttle the work
(this is assuming you don't run out of a resource on the host).  What's
telling is the CPU used to do the same amount of work:

Xen:  52.85%
KVM:  66.93%

So, KVM requires 66.93/52.85 = 26.6% more CPU to do the same amount of
work. Here's the breakdown:

total   user   nice   system   irq    softirq   guest
66.90   7.20   0.00   12.94    0.35   3.39      43.02

Comparing guest time to all other busy time, that's a 23.88/43.02 = 55%
overhead for virtualization.  I certainly don't expect it to be 0, but
55% seems a bit high.  So, what's the reason for this overhead?  At the
bottom is oprofile output of top functions for KVM.  Some observations:

1) I'm seeing about 2.3% in scheduler functions [that I recognize].
Does that seems a bit excessive?


Yes, it is.  If there is a lot of I/O, this might be due to the thread 
pool used for I/O.



2) cpu_physical_memory_rw due to not using preadv/pwritev?


I think both virtio-net and virtio-blk use memcpy().


3) vmx_[save|load]_host_state: I take it this is from guest switches?


These are called when you context-switch from a guest, and, much more 
frequently, when you enter qemu.



We have 180,000 context switches a second.  Is this more than expected?



Way more.  Across 16 logical cpus, this is 10,000 cs/sec/cpu.


I wonder if schedstats can show why we context switch (need to let
someone else run, yielded, waiting on io, etc).



Yes, there is a scheduler tracer, though I have no idea how to operate it.
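
A crude way to see where the switches come from, without the tracer (a
sketch; the pidof assumes a single qemu-kvm process):

grep ctxt /proc/$(pidof qemu-kvm)/status   # voluntary vs. nonvoluntary switches
pidstat -w -p $(pidof qemu-kvm) 1          # per-second switch rates (sysstat)
vmstat 1                                   # system-wide 'cs' column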

Do you have kvm_stat logs?

--
error compiling committee.c: too many arguments to function



Re: KVM performance vs. Xen

2009-04-30 Thread Andrew Theurer

Avi Kivity wrote:

Andrew Theurer wrote:

I wanted to share some performance data for KVM and Xen.  I thought it
would be interesting to share some performance results especially
compared to Xen, using a more complex situation like heterogeneous
server consolidation.

The Workload:
The workload is one that simulates a consolidation of servers on to a
single host.  There are 3 server types: web, imap, and app (j2ee).  In
addition, there are other helper servers which are also consolidated:
a db server, which helps out with the app server, and an nfs server,
which helps out with the web server (a portion of the docroot is nfs
mounted).  There is also one other server that is simply idle.  All 6
servers make up one set.  The first 3 server types are sent requests,
which in turn may send requests to the db and nfs helper servers.  The
request rate is throttled to produce a fixed amount of work.  In order
to increase utilization on the host, more sets of these servers are
used.  The clients which send requests also have a response time
requirement which is monitored.  The following results have passed the
response time requirements.



What's the typical I/O load (disk and network bandwidth) while the 
tests are running?

This is the average throughput:
network: Tx: 79 MB/sec  Rx: 5 MB/sec
disk:    read: 17 MB/sec  write: 40 MB/sec



The host hardware:
A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4 x
1 GB Ethernet


CPU time measurements with SMT can vary wildly if the system is not 
fully loaded.  If the scheduler happens to schedule two threads on a 
single core, both of these threads will generate less work compared to 
if they were scheduled on different cores.
Understood.  Even if at low loads, the scheduler does the right thing 
and spreads out to all the cores first, once it goes beyond 50% util, 
the CPU util can climb at a much higher rate (compared to a linear 
increase in work) because it then starts scheduling 2 threads per core, 
and each thread can do less work.  I have always wanted something which 
could more accurately show the utilization of a processor core, but I 
guess we have to use what we have today.  I will run again with SMT off 
to see what we get.




Test Results:
The throughput is equal in these tests, as the clients throttle the work
(this is assuming you don't run out of a resource on the host).  What's
telling is the CPU used to do the same amount of work:

Xen:  52.85%
KVM:  66.93%

So, KVM requires 66.93/52.85 = 26.6% more CPU to do the same amount of
work. Here's the breakdown:

total   user   nice   system   irq    softirq   guest
66.90   7.20   0.00   12.94    0.35   3.39      43.02

Comparing guest time to all other busy time, that's a 23.88/43.02 = 55%
overhead for virtualization.  I certainly don't expect it to be 0, but
55% seems a bit high.  So, what's the reason for this overhead?  At the
bottom is oprofile output of top functions for KVM.  Some observations:

1) I'm seeing about 2.3% in scheduler functions [that I recognize].
Does that seems a bit excessive?


Yes, it is.  If there is a lot of I/O, this might be due to the thread 
pool used for I/O.
I have a older patch which makes a small change to posix_aio_thread.c by 
trying to keep the thread pool size a bit lower than it is today.  I 
will dust that off and see if it helps.



2) cpu_physical_memory_rw due to not using preadv/pwritev?


I think both virtio-net and virtio-blk use memcpy().


3) vmx_[save|load]_host_state: I take it this is from guest switches?


These are called when you context-switch from a guest, and, much more 
frequently, when you enter qemu.



We have 180,000 context switches a second.  Is this more than expected?



Way more.  Across 16 logical cpus, this is 10,000 cs/sec/cpu.


I wonder if schedstats can show why we context switch (need to let
someone else run, yielded, waiting on io, etc).



Yes, there is a scheduler tracer, though I have no idea how to operate 
it.


Do you have kvm_stat logs?
Sorry, I don't, but I'll run that next time.  BTW, I did not notice a 
batch/log mode the last time I ran kvm_stat.  Or maybe it was not 
obvious to me.  Is there an ideal way to run kvm_stat without a 
curses-like output?


-Andrew




Re: KVM performance vs. Xen

2009-04-30 Thread Avi Kivity

Andrew Theurer wrote:

Avi Kivity wrote:




What's the typical I/O load (disk and network bandwidth) while the 
tests are running?

This is the average throughput:
network: Tx: 79 MB/sec  Rx: 5 MB/sec


MB as in Byte or Mb as in bit?


disk:read: 17 MB/sec  write: 40 MB/sec


This could definitely cause the extra load, especially if it's many 
small requests (compared to a few large ones).



The host hardware:
A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 
4 x

1 GB Ethernet


CPU time measurements with SMT can vary wildly if the system is not 
fully loaded.  If the scheduler happens to schedule two threads on a 
single core, both of these threads will generate less work compared 
to if they were scheduled on different cores.
Understood.  Even if at low loads, the scheduler does the right thing 
and spreads out to all the cores first, once it goes beyond 50% util, 
the CPU util can climb at a much higher rate (compared to a linear 
increase in work) because it then starts scheduling 2 threads per 
core, and each thread can do less work.  I have always wanted 
something which could more accurately show the utilization of a 
processor core, but I guess we have to use what we have today.  I will 
run again with SMT off to see what we get.


On the other hand, without SMT you will get to overcommit much faster, 
so you'll have scheduling artifacts.  Unfortunately there's no good 
answer here (except to improve the SMT scheduler).


Yes, it is.  If there is a lot of I/O, this might be due to the 
thread pool used for I/O.
I have a older patch which makes a small change to posix_aio_thread.c 
by trying to keep the thread pool size a bit lower than it is today.  
I will dust that off and see if it helps.


Really, I think linux-aio support can help here.



Yes, there is a scheduler tracer, though I have no idea how to 
operate it.


Do you have kvm_stat logs?
Sorry, I don't, but I'll run that next time.  BTW, I did not notice a 
batch/log mode the last time I ran kvm_stat.  Or maybe it was not 
obvious to me.  Is there an ideal way to run kvm_stat without a 
curses-like output?


You're probably using an ancient version:

$ kvm_stat --help
Usage: kvm_stat [options]

Options:
 -h, --help            show this help message and exit
 -1, --once, --batch   run in batch mode for one second
 -l, --log             run in logging mode (like vmstat)
 -f FIELDS, --fields=FIELDS
                       fields to display (regex)
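
So for a non-curses run, something along these lines should do (examples
based on the options above):

kvm_stat -1                    # one batch sample and exit
kvm_stat -l > kvm_stat.log     # vmstat-style logging to a file
kvm_stat -l -f 'exits|halt'    # log only counters matching the regex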


--
error compiling committee.c: too many arguments to function



Re: KVM performance vs. Xen

2009-04-30 Thread Andrew Theurer

Avi Kivity wrote:

Andrew Theurer wrote:

Avi Kivity wrote:




What's the typical I/O load (disk and network bandwidth) while the 
tests are running?

This is the average throughput:
network: Tx: 79 MB/sec  Rx: 5 MB/sec


MB as in Byte or Mb as in bit?
Byte.  There are 4 x 1 Gb adapters, each handling about 20 MB/sec or 160 
Mbit/sec.



disk:read: 17 MB/sec  write: 40 MB/sec


This could definitely cause the extra load, especially if it's many 
small requests (compared to a few large ones).
I don't have the request sizes at my fingertips, but we have to use a 
lot of disks to support this I/O, so I think it's safe to assume there 
are a lot more requests than a simple large sequential read/write.



The host hardware:
A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of 
disks, 4 x

1 GB Ethernet


CPU time measurements with SMT can vary wildly if the system is not 
fully loaded.  If the scheduler happens to schedule two threads on a 
single core, both of these threads will generate less work compared 
to if they were scheduled on different cores.
Understood.  Even if at low loads, the scheduler does the right thing 
and spreads out to all the cores first, once it goes beyond 50% util, 
the CPU util can climb at a much higher rate (compared to a linear 
increase in work) because it then starts scheduling 2 threads per 
core, and each thread can do less work.  I have always wanted 
something which could more accurately show the utilization of a 
processor core, but I guess we have to use what we have today.  I 
will run again with SMT off to see what we get.


On the other hand, without SMT you will get to overcommit much faster, 
so you'll have scheduling artifacts.  Unfortunately there's no good 
answer here (except to improve the SMT scheduler).


Yes, it is.  If there is a lot of I/O, this might be due to the 
thread pool used for I/O.
I have a older patch which makes a small change to posix_aio_thread.c 
by trying to keep the thread pool size a bit lower than it is today.  
I will dust that off and see if it helps.


Really, I think linux-aio support can help here.
Yes, I think that would work for real block devices, but would that help 
for files?  I am using real block devices right now, but it would be 
nice to also see a benefit for files in a file-system.  Or maybe I am 
mis-understanding this, and linux-aio can be used on files?


-Andrew





Yes, there is a scheduler tracer, though I have no idea how to 
operate it.


Do you have kvm_stat logs?
Sorry, I don't, but I'll run that next time.  BTW, I did not notice a 
batch/log mode the last time I ran kvm_stat.  Or maybe it was not 
obvious to me.  Is there an ideal way to run kvm_stat without a 
curses-like output?


You're probably using an ancient version:

$ kvm_stat --help
Usage: kvm_stat [options]

Options:
 -h, --help            show this help message and exit
 -1, --once, --batch   run in batch mode for one second
 -l, --log             run in logging mode (like vmstat)
 -f FIELDS, --fields=FIELDS
                       fields to display (regex)







Re: KVM performance vs. Xen

2009-04-30 Thread Anthony Liguori

Avi Kivity wrote:


1) I'm seeing about 2.3% in scheduler functions [that I recognize].
Does that seems a bit excessive?


Yes, it is.  If there is a lot of I/O, this might be due to the thread 
pool used for I/O.


This is why I wrote the linux-aio patch.  It only reduced CPU 
consumption by about 2% although I'm not sure if that's absolute or 
relative.  Andrew?



2) cpu_physical_memory_rw due to not using preadv/pwritev?


I think both virtio-net and virtio-blk use memcpy().


With latest linux-2.6, and a development snapshot of glibc, virtio-blk 
will not use memcpy() anymore but virtio-net still does on the receive 
path (but not transmit).


Regards,

Anthony Liguori


Re: KVM performance vs. Xen

2009-04-30 Thread Anthony Liguori

Andrew Theurer wrote:


Really, I think linux-aio support can help here.
Yes, I think that would work for real block devices, but would that 
help for files?  I am using real block devices right now, but it would 
be nice to also see a benefit for files in a file-system.  Or maybe I 
am mis-understanding this, and linux-aio can be used on files?


For cache=off, with some file systems, yes.  But not for 
cache=writethrough/writeback.


Regards,

Anthony Liguori


Re: KVM performance vs. Xen

2009-04-30 Thread Avi Kivity

Andrew Theurer wrote:





disk:read: 17 MB/sec  write: 40 MB/sec


This could definitely cause the extra load, especially if it's many 
small requests (compared to a few large ones).
I don't have the request sizes at my fingertips, but we have to use a 
lot of disks to support this I/O, so I think it's safe to assume there 
are a lot more requests than a simple large sequential read/write.


Yes.  Well the high context switch rate is the scheduler's way of 
telling us to use linux-aio.  If lots of disks == 100, with a 3ms 
seek time, that's already 60,000 cs/sec.



Really, I think linux-aio support can help here.
Yes, I think that would work for real block devices, but would that 
help for files?  I am using real block devices right now, but it would 
be nice to also see a benefit for files in a file-system.  Or maybe I 
am mis-understanding this, and linux-aio can be used on files?


It could work with files with cache=none (though not qcow2 as now written).
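
For reference, on the qemu command line that corresponds to drive options
along these lines (the image path is just an example):

-drive file=/images/guest.img,if=virtio,format=raw,cache=none,aio=native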

--
error compiling committee.c: too many arguments to function



Re: KVM performance vs. Xen

2009-04-30 Thread Avi Kivity

Anthony Liguori wrote:



2) cpu_physical_memory_rw due to not using preadv/pwritev?


I think both virtio-net and virtio-blk use memcpy().


With latest linux-2.6, and a development snapshot of glibc, virtio-blk 
will not use memcpy() anymore but virtio-net still does on the receive 
path (but not transmit).


There's still the kernel/user copy, so we have two copies on rx, one on tx.

--
error compiling committee.c: too many arguments to function



Re: KVM performance vs. Xen

2009-04-30 Thread Avi Kivity

Anthony Liguori wrote:

Avi Kivity wrote:


1) I'm seeing about 2.3% in scheduler functions [that I recognize].
Does that seems a bit excessive?


Yes, it is.  If there is a lot of I/O, this might be due to the 
thread pool used for I/O.


This is why I wrote the linux-aio patch.  It only reduced CPU 
consumption by about 2% although I'm not sure if that's absolute or 
relative.  Andrew?


Was that before or after the entire path was made copyless?

--
error compiling committee.c: too many arguments to function



Re: KVM performance vs. Xen

2009-04-30 Thread Andrew Theurer

Avi Kivity wrote:

Anthony Liguori wrote:

Avi Kivity wrote:


1) I'm seeing about 2.3% in scheduler functions [that I recognize].
Does that seems a bit excessive?


Yes, it is.  If there is a lot of I/O, this might be due to the 
thread pool used for I/O.


This is why I wrote the linux-aio patch.  It only reduced CPU 
consumption by about 2% although I'm not sure if that's absolute or 
relative.  Andrew?
If  I recall correctly, it was 2.4% and relative.  But with 2.3% in 
scheduler functions, that's what I expected.


Was that before or after the entire path was made copyless?
If this is referring to the preadv/writev support, no, I have not tested 
with that.


-Andrew




Re: KVM performance vs. Xen

2009-04-30 Thread Anthony Liguori

Avi Kivity wrote:

Anthony Liguori wrote:



2) cpu_physical_memory_rw due to not using preadv/pwritev?


I think both virtio-net and virtio-blk use memcpy().


With latest linux-2.6, and a development snapshot of glibc, 
virtio-blk will not use memcpy() anymore but virtio-net still does on 
the receive path (but not transmit).


There's still the kernel/user copy, so we have two copies on rx, one 
on tx.


That won't show up as cpu_physical_memory_rw.  stl_phys/ldl_phys are 
suspect though as they degrade to cpu_physical_memory_rw.


Regards,

Anthony Liguori



Re: KVM performance vs. Xen

2009-04-30 Thread Anthony Liguori

Avi Kivity wrote:

Anthony Liguori wrote:

Avi Kivity wrote:


1) I'm seeing about 2.3% in scheduler functions [that I recognize].
Does that seems a bit excessive?


Yes, it is.  If there is a lot of I/O, this might be due to the 
thread pool used for I/O.


This is why I wrote the linux-aio patch.  It only reduced CPU 
consumption by about 2% although I'm not sure if that's absolute or 
relative.  Andrew?


Was that before or after the entire path was made copyless?


Before so it's worth updating and trying again.

Regards,

Anthony Liguori


Re: KVM performance vs. Xen

2009-04-30 Thread Anthony Liguori

Andrew Theurer wrote:

Avi Kivity wrote:

Anthony Liguori wrote:

Avi Kivity wrote:


1) I'm seeing about 2.3% in scheduler functions [that I recognize].
Does that seems a bit excessive?


Yes, it is.  If there is a lot of I/O, this might be due to the 
thread pool used for I/O.


This is why I wrote the linux-aio patch.  It only reduced CPU 
consumption by about 2% although I'm not sure if that's absolute or 
relative.  Andrew?
If  I recall correctly, it was 2.4% and relative.  But with 2.3% in 
scheduler functions, that's what I expected.


Was that before or after the entire path was made copyless?
If this is referring to the preadv/writev support, no, I have not 
tested with that.


Previously, the block API only exposed non-vector interfaces and bounced 
vectored operations to a linear buffer.  That's been eliminated now 
though so we need to update the linux-aio patch to implement a vectored 
backend interface.


However, it is an apples to apples comparison in terms of copying since 
the same is true with the thread pool.  My take away was that the thread 
pool overhead isn't the major source of issues.


Regards,

Anthony Liguori


Re: KVM performance vs. Xen

2009-04-30 Thread Avi Kivity

Anthony Liguori wrote:


Previously, the block API only exposed non-vector interfaces and 
bounced vectored operations to a linear buffer.  That's been 
eliminated now though so we need to update the linux-aio patch to 
implement a vectored backend interface.


However, it is an apples to apples comparison in terms of copying 
since the same is true with the thread pool.  My take away was that 
the thread pool overhead isn't the major source of issues.


If the overhead is dominated by copying, then you won't see the 
difference.  Once the copying is eliminated, the comparison may yield 
different results.  We should certainly see a difference in context 
switches.


One cause of context switches won't be eliminated - the non-saturating 
workload causes us to switch to the idle thread, which incurs a 
heavyweight exit.  This doesn't matter since we're idle anyway, but when 
we switch back, we incur a heavyweight entry.


--
error compiling committee.c: too many arguments to function



Re: KVM performance vs. Xen

2009-04-30 Thread Andrew Theurer
Here are the SMT-off results.  This workload is designed to not 
over-saturate the CPU, so you have to pick a number of server sets to 
ensure that.  With SMT on, 4 sets was enough for KVM, but 5 was too much 
(we start seeing response time errors).  For SMT off, I tried to size the 
load as high as we can go without running into these errors.  For KVM, 
that's 3 sets (18 guests) and for Xen, that's 4 (24 guests).  The throughput 
has a fairly linear relationship to the number of server sets used, but 
has a bit of wiggle room (mostly affected by response times getting 
longer and longer, but not exceeding the requirement set forth).  
Anyway, the relative throughput for these is 1.0 for KVM and 1.34 
for Xen.  The CPU utilization is 78.71% for KVM and 87.83% for Xen.


If we normalize to CPU utilization, Xen is doing 20% more throughput.
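
(For reference, that normalization works out to (1.34 / 87.83) / (1.00 / 78.71)
≈ 1.20, i.e. about 20% more work per unit of CPU.)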

Avi Kivity wrote:

Anthony Liguori wrote:


Previously, the block API only exposed non-vector interfaces and 
bounced vectored operations to a linear buffer.  That's been 
eliminated now though so we need to update the linux-aio patch to 
implement a vectored backend interface.


However, it is an apples to apples comparison in terms of copying 
since the same is true with the thread pool.  My take away was that 
the thread pool overhead isn't the major source of issues.


If the overhead is dominated by copying, then you won't see the 
difference.  Once the copying is eliminated, the comparison may yield 
different results.  We should certainly see a difference in context 
switches.
I would like to test this the proper way.  What do I need to do to 
ensure these copies are eliminated?  I am on a 2.6.27 kernel, am I 
missing anything there?  Anthony, would you be willing to provide a 
patch to support the changes in the block API?


One cause of context switches won't be eliminated - the non-saturating 
workload causes us to switch to the idle thread, which incurs a 
heavyweight exit.  This doesn't matter since we're idle anyway, but 
when we switch back, we incur a heavyweight entry.
I have not looked at the schedstat or ftrace yet, but will soon.  Maybe 
it will tell us a little more about the context switches.


Here's a sample of the kvm_stat:

efer_relo  exits  fpu_reloa  halt_exit  halt_wake  host_stat  hypercall  
insn_emul  insn_emul invlpg   io_exits  irq_exits  irq_injec  irq_windo  
kvm_reque  largepage  mmio_exit  mmu_cache  mmu_flood  mmu_pde_z  mmu_pte_u  
mmu_pte_w  mmu_recyc  mmu_shado  mmu_unsyn  mmu_unsyn  nmi_injec  nmi_windo   
pf_fixed   pf_guest  remote_tl  request_n  signal_ex  tlb_flush
0 233866  53994  20353  16209 119812  0 
 48879  0  0  75666  44917  34772   3984
  0187  0 10  0  0  0  
0  0  0  0  0  0  0202  
0  0  0  0  17698
0 244556  67321  15570  12364 116226  0 
 49865  0  0  69357  56131  32860   4449
  0  -1895  0 19  0  0  0  
0 21 21  0  0  0  0   1117  
0  0  0  0  21586
0 230788  71382  10619   7920 109151  0 
 44354  0  0  62561  60074  28322   4841
  0103  0 13  0  0  0  
0  0  0  0  0  0  0122  
0  0  0  0  22702
0 275259  82605  14326  11148 127293  0 
 53738  0  0  73438  70707  34724   5373
  0859  0 15  0  0  0  
0 21 21  0  0  0  0874  
0  0  0  0  26723
0 250576  58760  20368  16476 128296  0 
 50936  0  0  80439  51219  36329   4621
  0  -1170  0  8  0  0  0  
0 22 22  0  0  0  0   1333  
0  0  0  0  18508
0 244746  59650  19480  15657 122721  0 
 49882  0  0  76011  50453  35352   4523
  0201  0 11  0  0  0  
0 21 21  0  0  0  0212  
0  0  0  0  19163
0 251724  71715  14049  10920 117255  0 
 49924  0  0  70173  58040  32328   5058

RE: KVM performance vs. Xen

2009-04-29 Thread Nakajima, Jun
On 4/29/2009 7:41:50 AM, Andrew Theurer wrote:
 I wanted to share some performance data for KVM and Xen.  I thought it
 would be interesting to share some performance results especially
 compared to Xen, using a more complex situation like heterogeneous
 server consolidation.

 The Workload:
 The workload is one that simulates a consolidation of servers on to a
 single host.  There are 3 server types: web, imap, and app (j2ee).  In
 addition, there are other helper servers which are also
 consolidated: a db server, which helps out with the app server, and an
 nfs server, which helps out with the web server (a portion of the docroot is 
 nfs mounted).
 There is also one other server that is simply idle.  All 6 servers
 make up one set.  The first 3 server types are sent requests, which in
 turn may send requests to the db and nfs helper servers.  The request
 rate is throttled to produce a fixed amount of work.  In order to
 increase utilization on the host, more sets of these servers are used.
 The clients which send requests also have a response time requirement
 which is monitored.  The following results have passed the response
 time requirements.

 The host hardware:
 A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4
 x
 1 GB Ethernet

 The host software:
 Both Xen and KVM use the same host Linux OS, SLES11.  KVM uses the
 2.6.27.19-5-default kernel and Xen uses the 2.6.27.19-5-xen kernel.  I
 have tried 2.6.29 for KVM, but results are actually worse.  KVM
 modules are rebuilt with kvm-85.  Qemu is also from kvm-85.  Xen
 version is 3.3.1_18546_12-3.1.

 The guest software:
 All guests are RedHat 5.3.  The same disk images are used but
 different kernels. Xen uses the RedHat Xen kernel and KVM uses 2.6.29
 with all paravirt build options enabled.  Both use PV I/O drivers.  Software 
 used:
 Apache, PHP, Java, Glassfish, Postgresql, and Dovecot.


Just for clarification. So are you using PV (Xen) Linux on Xen, not HVM? Is 
that 32-bit or 64-bit?

Jun Nakajima | Intel Open Source Technology Center


Re: KVM performance vs. Xen

2009-04-29 Thread Andrew Theurer

Nakajima, Jun wrote:

On 4/29/2009 7:41:50 AM, Andrew Theurer wrote:
  

I wanted to share some performance data for KVM and Xen.  I thought it
would be interesting to compare the two using a more complex situation
like heterogeneous server consolidation.

The Workload:
The workload is one that simulates a consolidation of servers on to a
single host.  There are 3 server types: web, imap, and app (j2ee).  In
addition, there are other helper servers which are also
consolidated: a db server, which helps out with the app server, and an
nfs server, which helps out with the web server (a portion of the docroot is 
nfs mounted).
There is also one other server that is simply idle.  All 6 servers
make up one set.  The first 3 server types are sent requests, which in
turn may send requests to the db and nfs helper servers.  The request
rate is throttled to produce a fixed amount of work.  In order to
increase utilization on the host, more sets of these servers are used.
The clients which send requests also have a response time requirement
which is monitored.  The following results have passed the response
time requirements.

The host hardware:
A 2 socket, 8 core Nehalem with SMT and EPT enabled, lots of disks,
4 x 1 Gb Ethernet

The host software:
Both Xen and KVM use the same host Linux OS, SLES11.  KVM uses the
2.6.27.19-5-default kernel and Xen uses the 2.6.27.19-5-xen kernel.  I
have tried 2.6.29 for KVM, but results are actually worse.  KVM
modules are rebuilt with kvm-85.  Qemu is also from kvm-85.  Xen
version is 3.3.1_18546_12-3.1.

The guest software:
All guests are RedHat 5.3.  The same disk images are used but
different kernels. Xen uses the RedHat Xen kernel and KVM uses 2.6.29
with all paravirt build options enabled.  Both use PV I/O drivers.  Software 
used:
Apache, PHP, Java, Glassfish, Postgresql, and Dovecot.




Just for clarification. So are you using PV (Xen) Linux on Xen, not HVM? Is 
that 32-bit or 64-bit?
  

PV, 64-bit.

-Andrew



Re: KVM performance

2009-04-06 Thread Hauke Hoffmann
On Friday 03 April 2009 13:32:50 you wrote:
 Hello,

 As I want to switch from XEN to KVM, I've made some performance tests
 to see if KVM is as performant as XEN. But tests with a VMU that receives
 a streamed video, adds a small logo to the video and streams it to a
 client have shown that XEN performs much better than KVM.
 In XEN the vlc (videolan client used to receive, process and send the
 video) process within the vmu has a CPU load of 33.8%, whereas in KVM
 the vlc process has a CPU load of 99.9%.
 I'm not sure why; does anybody know some settings to improve
 the KVM performance?

 Thank you.
 Regards, Stefanie.


 Used hardware and settings:
 In the tests I've used the same host hardware for XEN and KVM:
 - Dual Core AMD 2.2 GHz, 8 GB RAM
 - Tested OSes for KVM host: Fedora 10, 2.6.27.5-117.fc10.x86_64 with kvm
   version 10.fc10 version 74
   (also tested in January: compiled kernel with kvm-83)

 - KVM guest settings: OS: Fedora 9 2.6.25-14.fc9.x86_64 (i386 also tested)
   RAM: 256 MB (same for XEN vmu)
   CPU: 1 core with 2.2 GHz (same for XEN vmu)
   tested nic models: rtl8139, e1000, virtio

 Tested Scenario: VMU receives a streamed video, adds a logo (watermark)
 to the video stream and then streams it to a client

 Results:

 XEN:
 Host cpu load (virt-manager): 23%
 VMU cpu load (virt-manager): 18%
 VLC process within VMU (top): 33.8%

 KVM:
 no virt-manager cpu load as I started the vmu with the kvm command
 Host cpu load: 52%
 qemu-kvm process (top): 77-100%
 VLC process within vmu (top): 80-99.9%

 KVM command to start vmu
 /usr/bin/qemu-kvm -boot c -hda /images/vmu01.raw -m 256 -net
 nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio -net
 tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown
 -vnc 127.0.0.1:1 -k de --daemonize

Hi Stefanie,

Does vlc perform operations on disk (e.g. caching, logging, ...)?

If it does, you can also use virtio for the disk.
Just change
-hda /images/vmu01.raw
to
-drive file=/images/vmu01.raw,if=virtio,boot=on
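
With that change, the full start command above would look roughly like this (a
sketch only; it assumes the Fedora 9 guest kernel has the virtio_blk driver
available, e.g. in its initrd):

  /usr/bin/qemu-kvm -boot c -m 256 \
    -drive file=/images/vmu01.raw,if=virtio,boot=on \
    -net nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio \
    -net tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown \
    -vnc 127.0.0.1:1 -k de --daemonize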

Regards
Hauke







 

 Alcatel-Lucent Deutschland AG
 Bell Labs Germany
 Service Infrastructure, ZFZ-SI
 Stefanie Braun
 Phone:   +49.711.821-34865
 Fax: +49.711.821-32453

 Postal address:
 Alcatel-Lucent Deutschland AG
 Lorenzstrasse 10
 D-70435 STUTTGART

 Mail: stefanie.br...@alcatel-lucent.de



 Alcatel-Lucent Deutschland AG
 Registered office: Stuttgart - Stuttgart Local Court HRB 4026
 Chairman of the Supervisory Board: Michael Oppenhoff
 Board of Management: Alf Henryk Wulf (Chairman), Dr. Rainer Fechner

 



-- 

hauke hoffmann service and electronic systems

Moristeig 60, D-23556 Lübeck

Phone: +49 (0) 451 8896462
Fax: +49 (0) 451 8896461
Mobile: +49 (0) 170 7580491
E-Mail: off...@hauke-hoffmann.net
PGP public key: www.hauke-hoffmann.net/static/pgp/kontakt.asc


Re: KVM performance

2008-11-20 Thread Avi Kivity

Randy Broman wrote:
After I submitted the initial question, I downloaded the latest kernel 
2.6.27.6, and compiled
with the following options, some of which are new since my previous 
kernel 2.6.24-21.


CONFIG_PARAVIRT_GUEST=y
CONFIG_XEN_SAVE_RESTORE=y
CONFIG_VMI=y
CONFIG_KVM_CLOCK=y
CONFIG_KVM_GUEST=y
# CONFIG_LGUEST_GUEST is not set
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_CLOCK=y

Using my existing kvm-62 and the following invocation:

$ aoss kvm -m 1024 -cdrom /dev/cdrom -boot c -net 
nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap -soundhw all 
-localtime /home/rbroman/windows.img


and I've checked ~/kvm-79/kvm --help for the new syntax, but I can't 
figure out how to invoke the
remaining options. One of the missing options seems to be the tap
network, and the kvm-79 WinXP guest now has no networking.


These options have not changed.  Do you get any error messages?

Oh, and don't use the kvm python script; I'll remove it from the
repository.




I also tried the -vga vmware option below, as well as -vga=vmware
and various other permutations, and I can't get that to work either.



What error message do you get?


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: KVM performance

2008-11-17 Thread Brian Jackson
Don't use kvm in the tarball. It's not what you want. That's just a wrapper 
that calls qemu/kvm (possibly even the system one) after it mangles some 
command line options. Use qemu/x86_64-softmmu/qemu-system-x86_64 from the 
tarball if you aren't going to install it. Then you just use the same command 
line params as when you run the kvm that your distro installed.
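
For example (a sketch, assuming the kvm-79 tarball was unpacked and built under
~/kvm-79, and reusing the options that already worked with kvm-62):

  aoss ~/kvm-79/qemu/x86_64-softmmu/qemu-system-x86_64 -m 1024 \
    -cdrom /dev/cdrom -boot c \
    -net nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap \
    -soundhw all -localtime /home/rbroman/windows.img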



On Sunday 16 November 2008 4:08:02 pm Randy Broman wrote:
 After I submitted the initial question, I downloaded the latest kernel
 2.6.27.6, and compiled
 with the following options, some of which are new since my previous
 kernel 2.6.24-21.

 CONFIG_PARAVIRT_GUEST=y
 CONFIG_XEN_SAVE_RESTORE=y
 CONFIG_VMI=y
 CONFIG_KVM_CLOCK=y
 CONFIG_KVM_GUEST=y
 # CONFIG_LGUEST_GUEST is not set
 CONFIG_PARAVIRT=y
 CONFIG_PARAVIRT_CLOCK=y

 Using my existing kvm-62 and the following invocation:

 $ aoss kvm -m 1024 -cdrom /dev/cdrom -boot c -net
 nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap -soundhw all
 -localtime /home/rbroman/windows.img

 CPU usage went down and performance was much better (no skips), for my
 video/audio feeds.

 I then downloaded, compiled, installed kvm-79, and invoked using the
 following options

 $ aoss ~/kvm-79/kvm -m 1024 --cdrom /dev/cdrom --mac=00:d0:13:b0:2d:32
 --nictype=rtl8139 --smp=2 /home/rbroman/windows.img

 Note I'm using the new kvm in the compile directory, and I've confirmed
 that the kvm and kvm-intel
 modules from the kvm-79 compile are what's loaded. Some of the options
 from the kvm-62 invocation
 are missing - because they give errors - I understand that the command
 syntax/options have changed,
 and I've checked ~/kvm-79/kvm --help for the new syntax, but I can't
 figure out how to invoke the
 remaining options. One of the missing options seems to be the tap
 network, and the kvm-79 WinXP
 guest now has no networking.

 I also tried the -vga vmware option below, as well as -vga=vmware and
 various other permutations,
 and I can't get that to work either.

 Can someone help me resolve the above? Are there any README's, HowTo's
 or other documentation
 on compiling, installing and using kvm-79?

 Thanks, Randy

 Avi Kivity wrote:
  Randy Broman wrote:
  -I've tried both the default Cirrus adapter and the -std-vga
  option. Which is better?
 
  Cirrus is generally better, but supports fewer resolutions.
 
  I saw reference to another VMware-based adapter, but I can't figure
  out how to implement
  it - would that be better?
 
  -vga vmware (with the new syntax needed by kvm-79); it should be
  better, but is less well tested.  I'm not at all sure the Windows
  driver will like it.
 
  -I notice we're up to kvm-79 vs my kvm-62. Should I move to the newer
  version?
 
  Yes.
 
  Do I
  have to custom-compile my kernel to do so
 
  No.




Re: KVM performance

2008-11-16 Thread Randy Broman
After I submitted the initial question, I downloaded the latest kernel 
2.6.27.6, and compiled
with the following options, some of which are new since my previous 
kernel 2.6.24-21.


CONFIG_PARAVIRT_GUEST=y
CONFIG_XEN_SAVE_RESTORE=y
CONFIG_VMI=y
CONFIG_KVM_CLOCK=y
CONFIG_KVM_GUEST=y
# CONFIG_LGUEST_GUEST is not set
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_CLOCK=y

Using my existing kvm-62 and the following invocation:

$ aoss kvm -m 1024 -cdrom /dev/cdrom -boot c -net 
nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap -soundhw all 
-localtime /home/rbroman/windows.img


CPU usage went down and performance was much better (no skips), for my 
video/audio feeds.


I then downloaded, compiled, installed kvm-79, and invoked using the 
following options


$ aoss ~/kvm-79/kvm -m 1024 --cdrom /dev/cdrom --mac=00:d0:13:b0:2d:32 
--nictype=rtl8139 --smp=2 /home/rbroman/windows.img


Note I'm using the new kvm in the compile directory, and I've confirmed 
that the kvm and kvm-intel
modules from the kvm-79 compile are what's loaded. Some of the options 
from the kvm-62 invocation
are missing - because they give errors - I understand that the command 
syntax/options have changed,
and I've checked ~/kvm-79/kvm --help for the new syntax, but I can't 
figure out how to invoke the
remaining options. One of the missing options seems to be the tap
network, and the kvm-79 WinXP guest now has no networking.

I also tried the -vga vmware option below, as well as -vga=vmware and
various other permutations, and I can't get that to work either.

Can someone help me resolve the above? Are there any README's, HowTo's
or other documentation on compiling, installing and using kvm-79?

Thanks, Randy

Avi Kivity wrote:

Randy Broman wrote:


-I've tried both the default Cirrus adapter and the -std-vga 
option. Which is better?


Cirrus is generally better, but supports fewer resolutions.

I saw reference to another VMware-based adapter, but I can't figure
out how to implement it - would that be better?



-vga vmware (with the new syntax needed by kvm-79); it should be 
better, but is less well tested.  I'm not at all sure the Windows
driver will like it.
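
As a concrete example (a sketch only, assuming the kvm-79 build from the
tarball and a Windows guest driver for the VMware adapter), the option is
simply appended to the usual command line:

  aoss ~/kvm-79/qemu/x86_64-softmmu/qemu-system-x86_64 -m 1024 -boot c \
    -net nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap \
    -vga vmware /home/rbroman/windows.img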


-I notice we're up to kvm-79 vs my kvm-62. Should I move to the newer 
version? 


Yes.


Do I
have to custom-compile my kernel to do so


No.





Re: KVM performance

2008-11-14 Thread David S. Ahern
See if boosting the priority of the VM (see man chrt), and locking it to
a core (see man taskset) helps. You'll want to do that for the vcpu
thread(s) (in the qemu monitor, run the 'info cpus' command).
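
A rough sketch of what that can look like (the thread ID below is hypothetical;
substitute the ones 'info cpus' reports for your vcpu threads):

  # suppose 'info cpus' in the qemu monitor reports thread_id=4321
  chrt -f -p 50 4321     # give the vcpu thread SCHED_FIFO priority 50
  taskset -pc 1 4321     # and pin it to core 1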

david


Randy Broman wrote:
 I am using Intel Core2 Duo E6600, Kubuntu 8.04 with kernel
 2.6.24-21-generic,
 kvm (as in QEMU PC emulator version 0.9.1 (kvm-62)) and a WinXP SP3
 guest,
 with bridged networking. My start command is:
 
 sudo kvm -m 1024 -cdrom /dev/cdrom -boot c -net
 nic,macaddr=00:d0:13:b0:2d:32,
 model=rtl8139 -net tap -soundhw all -localtime /home/rbroman/windows.img
 
 All this is stable and generally works well, except that internet-based
 video and
 audio performance is poor (choppy, skips) in comparison with performance
 under
 WinXP running native on the same machine (it's a dual-boot). I would
 appreciate
 recommendations to improve video and audio performance, and have the
 following
 specific questions:
 
 -I've tried both the default Cirrus adapter and the -std-vga option.
 Which is better?
 I saw reference to another VMware-based adapter, but I can't figure out
 how to implement
 it - would that be better?
 
 -I notice we're up to kvm-79 vs my kvm-62. Should I move to the newer
 version? Do I
 have to custom-compile my kernel to do so, and if so what kernel version
 and what
 specific kernel options should I use?
 
 -Are there other tuning steps I could take?
 
 Please copy me directly as I'm not on this list. Thank you
 
 
 
 
 