Re: KVM performance Java server/MySQL...
quote who=David Cruz
Other optimizations people are testing out there:
- use nohz=off on the kernel line in menu.lst
- disable cgroups completely, using cgclear and turning off the cgred and cgconfig daemons.
And from a personal point of view, we've always tried to run MySQL on a different server from JBoss. 99% of the time that is far better for performance and tuning.

I am still having problems. Running MySQL and JBoss on different VMs is significantly slower, but that appears to be a pure application issue related to the network overhead of a huge number of queries. That is of course not an issue for this mailing list.

However, I saw a performance degradation when running the same test with MySQL and JBoss colocated and increasing the CPU count from 4 to 8. I then tried various tricks to improve performance. Pinning the processes did not have a significant effect. What did appear to have an effect was removing memory ballooning. But now I get inconsistent results, varying from 21 seconds to 27 seconds. The 21-second figure is acceptable, the 27-second figure is not. The tests are being done on new hardware (see previous posts), with the virtual machine basically the only thing running.

Summary of the configuration:
- host and guest: CentOS 6.3, transparent hugepages
- vm: cpu mode is host-passthrough, vcpus pinned to the cores of one processor, tablet, sound, and USB devices removed
- host: hyperthreading switched off in the BIOS

Do you have any idea what this could be? I expect it is somehow NUMA related, but how would I troubleshoot this? Ideally I would like to make sure the entire VM runs on one CPU and allocates memory from that CPU and never moves (or CPU and memory move together). I saw some presentations on the internet about NUMA work being done for Linux. Do you have any suggestions?
My domain.xml is given below:

<domain type='kvm' id='37'>
  <name>master-data05-v50</name>
  <uuid>79ddd84d-937e-357b-8e57-c7f487dc3464</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='6'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='10'/>
    <vcpupin vcpu='6' cpuset='12'/>
    <vcpupin vcpu='7' cpuset='14'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
    <boot dev='cdrom'/>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'/>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/raid5/v50disk1'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/vg_system/v50disk2'/>
      <target dev='vdb' bus='virtio'/>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x08' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/raid5/v50disk3'/>
      <target dev='vdc' bus='virtio'/>
      <alias name='virtio-disk2'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x09' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/mydata/images/configdisks/v50/configdisk.img'/>
      <target dev='vdz' bus='virtio'/>
      <alias name='virtio-disk25'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:00:01:50'/>
      <source bridge='br0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/1'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/1'>
      <source path='/dev/pts/1'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
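One way to check whether the guest's memory actually stays on the node that the vcpus are pinned to is to look at the qemu-kvm process from the host. The following is only a sketch: the pgrep pattern, the assumption that cores 0,2,...,14 all sit on NUMA node 0, and the hugepage count are illustrative rather than taken from this setup, and numastat -p needs a reasonably recent numactl package. libvirt can also bind guest memory explicitly with the <numatune> element (or virsh numatune, where available).

# Sketch: inspect host NUMA topology and where the guest's memory ended up.
numactl --hardware                          # which CPUs and how much memory belong to each node
QEMU_PID=$(pgrep -f master-data05-v50)      # hypothetical way to find the guest's qemu-kvm PID
numastat -p "$QEMU_PID"                     # per-node memory usage of the process (newer numactl)
grep -c 'N1=' /proc/"$QEMU_PID"/numa_maps   # rough count of mappings with pages on node 1
# Reserve hugepages on node 0 only, before starting the guest (8 GB = 4096 x 2 MB pages):
echo 4096 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages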
Re: KVM performance Java server/MySQL...
quote who=Erik Brakkee
quote who=David Cruz
Other optimizations people are testing out there:
- use nohz=off on the kernel line in menu.lst
- disable cgroups completely, using cgclear and turning off the cgred and cgconfig daemons.

I also tried this option, but it did not have a significant effect and even degraded performance a bit, although that is difficult to tell because of the variation in the measured times.

On a positive note, I have managed to get a significant performance improvement by checking the BIOS. When we got the server, it was configured to use the "Performance Per Watt" profile, so all tests so far were run with CPU scaling enabled (and also recognized by the Linux kernel). What I did to get the increased performance was to enable the "Performance" profile instead and let the hardware do the CPU scaling (that is what I understand the option to mean). The measured times have now dropped from a consistent 19 seconds on the physical host to a consistent 10 seconds on the physical host. On the virtual host, the variable times of 21-27 seconds have turned into consistent times of approximately 13.5 seconds. That is with the nohz=off kernel option. With the nohz=off option removed, I get consistent times that are mostly a bit under 13 seconds (12.8 and up). These results are for a VM with 4 virtual CPUs that are also pinned to cores of the first CPU.

From the looks of it, the kernel we are using does not fully understand the CPU it is dealing with. I am also doubtful that we ever saw the actual CPU frequencies when doing 'cat /proc/cpuinfo'. Another positive result: increasing the VCPU count from 4 to 8 and not pinning them does not degrade performance; in fact it even improves it a bit, leaving a virtualization overhead of approximately 20%.

All in all, the performance results are acceptable now, looking at the absolute figures. In fact, I am quite happy with this because it means we can continue to take this setup into production. However, there still appears to be a 20% penalty for virtualization in our specific case. Do you have any ideas on how to get to the bottom of this? Being able to squeeze out the last 20% would make us even happier (my day is already made with this BIOS setting). Any ideas?

However, I saw a performance degradation when running the same test with MySQL and JBoss colocated and increasing the CPU count from 4 to 8. I then tried various tricks to improve performance. Pinning the processes did not have a significant effect. What did appear to have an effect was removing memory ballooning. But now I get inconsistent results, varying from 21 seconds to 27 seconds. The 21-second figure is acceptable, the 27-second figure is not. The tests are being done on new hardware (see previous posts), with the virtual machine basically the only thing running.

Summary of the configuration:
- host and guest: CentOS 6.3, transparent hugepages
- vm: cpu mode is host-passthrough, vcpus pinned to the cores of one processor, tablet, sound, and USB devices removed
- host: hyperthreading switched off in the BIOS

Do you have any idea what this could be? I expect it is somehow NUMA related, but how would I troubleshoot this? Ideally I would like to make sure the entire VM runs on one CPU and allocates memory from that CPU and never moves (or CPU and memory move together). I saw some presentations on the internet about NUMA work being done for Linux. Do you have any suggestions?
My domain.xml is given below:

<domain type='kvm' id='37'>
  <name>master-data05-v50</name>
  <uuid>79ddd84d-937e-357b-8e57-c7f487dc3464</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='6'/>
    <vcpupin vcpu='4' cpuset='8'/>
    <vcpupin vcpu='5' cpuset='10'/>
    <vcpupin vcpu='6' cpuset='12'/>
    <vcpupin vcpu='7' cpuset='14'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='rhel6.3.0'>hvm</type>
    <boot dev='cdrom'/>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-passthrough'/>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/raid5/v50disk1'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
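The BIOS profile change can also be cross-checked from the host OS. A small sketch, assuming the cpufreq sysfs interface is present; with a purely hardware-controlled "Performance" profile the scaling driver may simply not be loaded, which would also explain never seeing real frequencies in /proc/cpuinfo:

# Sketch: see who controls CPU frequency scaling and pin it to maximum.
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver     # absent if the OS does no scaling
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# force the performance governor on all CPUs when scaling is OS-controlled:
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done
# or, with the cpupowerutils package on RHEL/CentOS 6:
cpupower frequency-set -g performance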
Re: KVM performance Java server/MySQL...
quote who=Gleb Natapov
On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote:
Hi,
We have been benchmarking a Java server application (Java 6 update 29) that requires a MySQL database. The scenario is quite simple: we open a web page which displays a lot of search results. To get the content of the page, one big query is done together with many smaller queries to retrieve the data. The test on the Java side is single threaded.

We have used the following deployment scenarios:
1. JBoss in a VM, MySQL in a separate VM
2. JBoss in a VM, MySQL native
3. JBoss native, MySQL in a VM
4. JBoss native and MySQL native on the same physical machine
5. JBoss and MySQL in the same VM

What we see is that the performance (time to execute) is practically the same for all scenarios (approx. 30 seconds), except for scenario 4, which takes approx. 21 seconds. This difference is quite large and contrasts with many other tests on the internet and other benchmarks we did previously. We have tried pinning the VMs, turning hyperthreading off, and varying the CPU model (including host-passthrough), but this did not have any significant impact.

The hardware we are running on is a dual-socket E5-2650 machine with 64 GB memory. The server is a Dell PowerEdge R720 with SAS disks and a RAID controller with battery-backed writeback cache. Transparent hugepages are turned on. We are at a loss to explain the differences in the test. In particular, we would have expected the worst performance when both were running virtualized, and we would have expected better performance with JBoss and MySQL running in the same VM compared to JBoss and MySQL running in different virtual machines. It looks like we are dealing with multiple issues here and not just one. Right now we have a 30% penalty for running virtualized, which is too much for us; 10% would be all right. What would you suggest to do to troubleshoot this further?

What are your kernel/qemu versions and the command line you are using to start a VM?

CentOS 6.3, 2.6.32-279.22.1.el6.x86_64
rpm -qf /usr/libexec/qemu-kvm
qemu-kvm-0.12.1.2-2.295.el6_3.10.x86_64
The guest is also running CentOS 6.3 with the same settings. Settings that can influence Java performance (such as transparent hugepages) are turned on on both the host and the guest (see the remark on hugepages below).
The command line is as follows:

/usr/libexec/qemu-kvm -S -M rhel6.3.0 -enable-kvm -m 8192 -mem-prealloc -mem-path /hugepages/libvirt/qemu -smp 4,sockets=4,cores=1,threads=1 -name master-data05-v50 -uuid 79ddd84d-937e-357b-8e57-c7f487dc3464 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/master-data05-v50.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -drive file=/dev/raid5/v50disk1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/dev/vg_raid1/v50disk2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/raid5/v50disk3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/var/data/images/configdisks/v50/configdisk.img,if=none,id=drive-virtio-disk25,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk25,id=virtio-disk25 -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:01:50,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

This virtual machine has three virtio disks and one file-based disk. The last disk is about 100 MB in size, is used only during startup (it contains configuration data for initializing the VM), and is only read, never written. There is one CD-ROM, which is not used.

The VM also uses old-style hugepages. Apparently this did not have any significant effect on performance over transparent hugepages (as would be expected); we configured these old-style hugepages just to rule out any issue with transparent hugepages. Initially we got a 30% performance penalty with 16 processors, but in the current setting of 4 processors we see a reduced performance penalty of 15-20%. Also, on the physical host we are not running the numad daemon at the moment. We also tried disabling hyperthreading in the host's BIOS, but the measurements do not change significantly. The
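For completeness, this is roughly how the hugepage setup described above can be verified on the host; the numbers are what an 8 GB guest with 2 MB pages would need, and the transparent-hugepage sysfs path differs between mainline and RHEL/CentOS 6 kernels:

# Sketch: check transparent and static hugepages on the host.
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled 2>/dev/null \
    || cat /sys/kernel/mm/transparent_hugepage/enabled
grep -E 'AnonHugePages|HugePages_(Total|Free|Rsvd)' /proc/meminfo
mount | grep hugetlbfs      # -mem-path /hugepages/libvirt/qemu must be on a hugetlbfs mount
# an 8192 MB guest needs 4096 x 2 MB pages; with -mem-prealloc, HugePages_Free
# should drop by about that much once the guest is running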
Re: KVM performance Java server/MySQL...
quote who=Erik Brakkee
The I/O scheduler on the host and on the guest is CFQ. We also tried the deadline scheduler on the host, but this did not make any measurable difference. We did not try noop on the host.

I mean of course that we did not try noop on the guest (not on the host).
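For reference, the elevator can be inspected and switched per block device at runtime; a sketch with example device names (sda on the host, vda for a virtio disk in the guest):

# Sketch: show and change the I/O scheduler for one device.
cat /sys/block/sda/queue/scheduler            # e.g. "noop anticipatory deadline [cfq]"
echo deadline > /sys/block/sda/queue/scheduler
# inside the guest, virtio disks appear as vda, vdb, ...; noop is a common choice there
# because the host already schedules the real device:
echo noop > /sys/block/vda/queue/scheduler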
Re: KVM performance Java server/MySQL...
quote who=Erik Brakkee quote who=Gleb Natapov On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote: Hi, We have been benchmarking a java server application (java 6 update 29) that requires a mysql database. The scenario is quite simple. We open a web page which displays a lot of search results. To get the content of the page one big query is done with many smaller queries to retrieve the data. The test from the java side is single threaded. We have used the following deployment scenarios: 1. JBoss in VM, MySql in separate VM 2. JBoss in VM, MySQL native 3. JBoss native, MySQL in vm. 4. JBoss native and MySQL native on the same physical machine 5. JBoss and MySQL virtualized on the same VM. What we see is that the performance (time to execute) is practically the same for all scenarios (approx. 30 seconds), except for scenario 4 that takes approx. 21 seconds. This difference is quite large and contrasts many other test on the internet and other benchmarks we did previously. We have tried pinning the VMs, turning hyperthreading off, varying the CPU model (including host-passthrough), but this did not have any significant impact. The hardware on which we are running is a dual socket E5-2650 machine with 64 GB memory. The server is a Dell poweredge R720 server with SAS disks, RAID controller with battery backup (writeback cache). Transparent huge pages is turned on. We are at a loss to explain the differences in the test. In particular, we would have expected the least performance when both were running virtualized and we would have expected a better performance when JBoss and MySQL were running virtualized in the same VM as compared to JBoss and MySQL both running in different virtual machines. It looks like we are dealing with multiple issues here and not just one. Right now we have a 30% penalty for running virtualized which is too much for us; 10% would be allright. What would you suggest to do to troubleshoot this further? What is you kernel/qemu versions and command line you are using to start a VM? Centos 6.3, 2.6.32-279.22.1.el6.x86_64 rpm -qf /usr/libexec/qemu-kvm qemu-kvm-0.12.1.2-2.295.el6_3.10.x86_64 The guest is also running centos 6.3 with the same settings. Settings that can influence Java performance (such as transparent huge pages) are turned on on both the host and guest (see the remark on hugepages below). 
The command-line is as follows: /usr/libexec/qemu-kvm -S -M rhel6.3.0 -enable-kvm -m 8192 -mem-prealloc -mem-path /hugepages/libvirt/qemu -smp 4,sockets=4,cores=1,threads=1 -name master-data05-v50 -uuid 79ddd84d-937e-357b-8e57-c7f487dc3464 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/master-data05-v50.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -drive file=/dev/raid5/v50disk1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/dev/vg_raid1/v50disk2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/raid5/v50disk3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/var/data/images/configdisks/v50/configdisk.img,if=none,id=drive-virtio-disk25,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk25,id=virtio-disk25 -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:01:50,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 This virtual machine has three virtio disks, and one file based disk. The last disk is about 100MB in size and is used only during startup (contains configurationd data for initializing the vm) and is only read and never written. It has one CDrom which is not used. It also uses old-style hugepages. Apparently this did not have any significant effect on performance over transparent hugepages (as would be expected). We configured these old style hugepages just to rule out any issue with transparent hugepages. Initially we got 30% performance penalty with 16 processors, but in the current setting of using 4 processors we see a reduced performance penalty of 15-20%. Also on the physical host, we are not running the numad daemon at the moment. Also, we tried disabling hyperthreading
Re: KVM performance Java server/MySQL...
Other optimizations people are testing out there:
- use nohz=off on the kernel line in menu.lst
- disable cgroups completely, using cgclear and turning off the cgred and cgconfig daemons.
And from a personal point of view, we've always tried to run MySQL on a different server from JBoss. 99% of the time that is far better for performance and tuning.

David

2013/2/8 Erik Brakkee e...@brakkee.org:
quote who=Erik Brakkee
quote who=Gleb Natapov
On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote:
Hi,
We have been benchmarking a Java server application (Java 6 update 29) that requires a MySQL database. The scenario is quite simple: we open a web page which displays a lot of search results. To get the content of the page, one big query is done together with many smaller queries to retrieve the data. The test on the Java side is single threaded.

We have used the following deployment scenarios:
1. JBoss in a VM, MySQL in a separate VM
2. JBoss in a VM, MySQL native
3. JBoss native, MySQL in a VM
4. JBoss native and MySQL native on the same physical machine
5. JBoss and MySQL in the same VM

What we see is that the performance (time to execute) is practically the same for all scenarios (approx. 30 seconds), except for scenario 4, which takes approx. 21 seconds. This difference is quite large and contrasts with many other tests on the internet and other benchmarks we did previously. We have tried pinning the VMs, turning hyperthreading off, and varying the CPU model (including host-passthrough), but this did not have any significant impact.

The hardware we are running on is a dual-socket E5-2650 machine with 64 GB memory. The server is a Dell PowerEdge R720 with SAS disks and a RAID controller with battery-backed writeback cache. Transparent hugepages are turned on. We are at a loss to explain the differences in the test. In particular, we would have expected the worst performance when both were running virtualized, and we would have expected better performance with JBoss and MySQL running in the same VM compared to JBoss and MySQL running in different virtual machines. It looks like we are dealing with multiple issues here and not just one. Right now we have a 30% penalty for running virtualized, which is too much for us; 10% would be all right. What would you suggest to do to troubleshoot this further?

What are your kernel/qemu versions and the command line you are using to start a VM?

CentOS 6.3, 2.6.32-279.22.1.el6.x86_64
rpm -qf /usr/libexec/qemu-kvm
qemu-kvm-0.12.1.2-2.295.el6_3.10.x86_64
The guest is also running CentOS 6.3 with the same settings. Settings that can influence Java performance (such as transparent hugepages) are turned on on both the host and the guest (see the remark on hugepages below).
The command-line is as follows: /usr/libexec/qemu-kvm -S -M rhel6.3.0 -enable-kvm -m 8192 -mem-prealloc -mem-path /hugepages/libvirt/qemu -smp 4,sockets=4,cores=1,threads=1 -name master-data05-v50 -uuid 79ddd84d-937e-357b-8e57-c7f487dc3464 -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/master-data05-v50.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -drive file=/dev/raid5/v50disk1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/dev/vg_raid1/v50disk2,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/raid5/v50disk3,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/var/data/images/configdisks/v50/configdisk.img,if=none,id=drive-virtio-disk25,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk25,id=virtio-disk25 -netdev tap,fd=21,id=hostnet0,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:01:50,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 This virtual machine has three virtio disks, and one file based disk. The last disk is about 100MB in size and is used only during startup (contains configurationd data for initializing the vm) and is only read and never written. It has one CDrom which is not used. It also uses old-style hugepages. Apparently this did not have any significant effect on performance over transparent hugepages
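For anyone wanting to try David's two suggestions above, they translate to roughly the following on a CentOS/RHEL 6 host; this is only a sketch, and the exact GRUB file and service names depend on the installation:

# 1) add nohz=off to the kernel line in /boot/grub/grub.conf (a.k.a. menu.lst), e.g.:
#    kernel /vmlinuz-2.6.32-279.22.1.el6.x86_64 ro root=... nohz=off
# 2) stop the cgroup daemons and tear down all mounted cgroup hierarchies:
service cgred stop
service cgconfig stop
chkconfig cgred off
chkconfig cgconfig off
cgclear        # from libcgroup; removes all existing cgroups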
Re: KVM performance Java server/MySQL...
On Thu, Feb 07, 2013 at 04:41:31PM +0100, Erik Brakkee wrote: Hi, We have been benchmarking a java server application (java 6 update 29) that requires a mysql database. The scenario is quite simple. We open a web page which displays a lot of search results. To get the content of the page one big query is done with many smaller queries to retrieve the data. The test from the java side is single threaded. We have used the following deployment scenarios: 1. JBoss in VM, MySql in separate VM 2. JBoss in VM, MySQL native 3. JBoss native, MySQL in vm. 4. JBoss native and MySQL native on the same physical machine 5. JBoss and MySQL virtualized on the same VM. What we see is that the performance (time to execute) is practically the same for all scenarios (approx. 30 seconds), except for scenario 4 that takes approx. 21 seconds. This difference is quite large and contrasts many other test on the internet and other benchmarks we did previously. We have tried pinning the VMs, turning hyperthreading off, varying the CPU model (including host-passthrough), but this did not have any significant impact. The hardware on which we are running is a dual socket E5-2650 machine with 64 GB memory. The server is a Dell poweredge R720 server with SAS disks, RAID controller with battery backup (writeback cache). Transparent huge pages is turned on. We are at a loss to explain the differences in the test. In particular, we would have expected the least performance when both were running virtualized and we would have expected a better performance when JBoss and MySQL were running virtualized in the same VM as compared to JBoss and MySQL both running in different virtual machines. It looks like we are dealing with multiple issues here and not just one. Right now we have a 30% penalty for running virtualized which is too much for us; 10% would be allright. What would you suggest to do to troubleshoot this further? What is you kernel/qemu versions and command line you are using to start a VM? -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Andrew Theurer wrote:
I wanted to share some performance data for KVM and Xen. I thought it would be interesting to share some performance results, especially compared to Xen, using a more complex situation like heterogeneous server consolidation.

The Workload: The workload is one that simulates a consolidation of servers onto a single host. There are 3 server types: web, imap, and app (j2ee). In addition, there are other helper servers which are also consolidated: a db server, which helps out with the app server, and an nfs server, which helps out with the web server (a portion of the docroot is nfs mounted). There is also one other server that is simply idle. All 6 servers make up one set. The first 3 server types are sent requests, which in turn may send requests to the db and nfs helper servers. The request rate is throttled to produce a fixed amount of work. In order to increase utilization on the host, more sets of these servers are used. The clients which send requests also have a response time requirement which is monitored. The following results have passed the response time requirements.

What's the typical I/O load (disk and network bandwidth) while the tests are running?

The host hardware: A 2 socket, 8 core Nehalem with SMT and EPT enabled, lots of disks, 4 x 1 Gb Ethernet.

CPU time measurements with SMT can vary wildly if the system is not fully loaded. If the scheduler happens to schedule two threads on a single core, both of these threads will generate less work compared to if they were scheduled on different cores.

Test Results: The throughput is equal in these tests, as the clients throttle the work (this is assuming you don't run out of a resource on the host). What's telling is the CPU used to do the same amount of work:
Xen: 52.85%
KVM: 66.93%
So, KVM requires 66.93/52.85 = 26.6% more CPU to do the same amount of work. Here's the breakdown:
total   user   nice   system   irq    softirq   guest
66.90   7.20   0.00   12.94    0.35   3.39      43.02
Comparing guest time to all other busy time, that's a 23.88/43.02 = 55% overhead for virtualization. I certainly don't expect it to be 0, but 55% seems a bit high. So, what's the reason for this overhead? At the bottom is oprofile output of the top functions for KVM. Some observations:

1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seem a bit excessive?

Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.

2) cpu_physical_memory_rw due to not using preadv/pwritev?

I think both virtio-net and virtio-blk use memcpy().

3) vmx_[save|load]_host_state: I take it this is from guest switches?

These are called when you context-switch from a guest, and, much more frequently, when you enter qemu.

We have 180,000 context switches a second. Is this more than expected?

Way more. Across 16 logical cpus, this is 10,000 cs/sec/cpu. I wonder if schedstats can show why we context switch (need to let someone else run, yielded, waiting on io, etc).

Yes, there is a scheduler tracer, though I have no idea how to operate it. Do you have kvm_stat logs?

-- error compiling committee.c: too many arguments to function
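Before reaching for the scheduler tracer, the 180,000 context switches/sec can be broken down per thread with standard tools; a sketch (the pgrep pattern is just an assumption, and pidstat -w -t needs a reasonably recent sysstat):

# Sketch: attribute context switches to individual qemu/vcpu/I/O threads.
pidstat -w -t -p "$(pgrep -d, -f qemu)" 5    # voluntary vs. involuntary switches per thread
grep ctxt_switches /proc/"$(pgrep -f qemu | head -n1)"/status   # same counters for one task
vmstat 5                                     # system-wide "cs" column for comparison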
Re: KVM performance vs. Xen
Avi Kivity wrote:
Andrew Theurer wrote:
I wanted to share some performance data for KVM and Xen. I thought it would be interesting to share some performance results, especially compared to Xen, using a more complex situation like heterogeneous server consolidation.

The Workload: The workload is one that simulates a consolidation of servers onto a single host. There are 3 server types: web, imap, and app (j2ee). In addition, there are other helper servers which are also consolidated: a db server, which helps out with the app server, and an nfs server, which helps out with the web server (a portion of the docroot is nfs mounted). There is also one other server that is simply idle. All 6 servers make up one set. The first 3 server types are sent requests, which in turn may send requests to the db and nfs helper servers. The request rate is throttled to produce a fixed amount of work. In order to increase utilization on the host, more sets of these servers are used. The clients which send requests also have a response time requirement which is monitored. The following results have passed the response time requirements.

What's the typical I/O load (disk and network bandwidth) while the tests are running?

This is the average throughput:
network: Tx: 79 MB/sec, Rx: 5 MB/sec
disk: read: 17 MB/sec, write: 40 MB/sec

The host hardware: A 2 socket, 8 core Nehalem with SMT and EPT enabled, lots of disks, 4 x 1 Gb Ethernet.

CPU time measurements with SMT can vary wildly if the system is not fully loaded. If the scheduler happens to schedule two threads on a single core, both of these threads will generate less work compared to if they were scheduled on different cores.

Understood. Even if at low loads the scheduler does the right thing and spreads out to all the cores first, once it goes beyond 50% util, the CPU util can climb at a much higher rate (compared to a linear increase in work) because it then starts scheduling 2 threads per core, and each thread can do less work. I have always wanted something which could more accurately show the utilization of a processor core, but I guess we have to use what we have today. I will run again with SMT off to see what we get.

Test Results: The throughput is equal in these tests, as the clients throttle the work (this is assuming you don't run out of a resource on the host). What's telling is the CPU used to do the same amount of work:
Xen: 52.85%
KVM: 66.93%
So, KVM requires 66.93/52.85 = 26.6% more CPU to do the same amount of work. Here's the breakdown:
total   user   nice   system   irq    softirq   guest
66.90   7.20   0.00   12.94    0.35   3.39      43.02
Comparing guest time to all other busy time, that's a 23.88/43.02 = 55% overhead for virtualization. I certainly don't expect it to be 0, but 55% seems a bit high. So, what's the reason for this overhead? At the bottom is oprofile output of the top functions for KVM. Some observations:

1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seem a bit excessive?

Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.

I have an older patch which makes a small change to posix_aio_thread.c by trying to keep the thread pool size a bit lower than it is today. I will dust that off and see if it helps.

2) cpu_physical_memory_rw due to not using preadv/pwritev?

I think both virtio-net and virtio-blk use memcpy().

3) vmx_[save|load]_host_state: I take it this is from guest switches?

These are called when you context-switch from a guest, and, much more frequently, when you enter qemu.
We have 180,000 context switches a second. Is this more than expected?

Way more. Across 16 logical cpus, this is 10,000 cs/sec/cpu. I wonder if schedstats can show why we context switch (need to let someone else run, yielded, waiting on io, etc).

Yes, there is a scheduler tracer, though I have no idea how to operate it. Do you have kvm_stat logs?

Sorry, I don't, but I'll run that next time. BTW, I did not notice a batch/log mode the last time I ran kvm_stat, or maybe it was not obvious to me. Is there an ideal way to run kvm_stat without a curses-like output?

-Andrew
Re: KVM performance vs. Xen
Andrew Theurer wrote:
Avi Kivity wrote:
What's the typical I/O load (disk and network bandwidth) while the tests are running?

This is the average throughput:
network: Tx: 79 MB/sec, Rx: 5 MB/sec

MB as in Byte or Mb as in bit?

disk: read: 17 MB/sec, write: 40 MB/sec

This could definitely cause the extra load, especially if it's many small requests (compared to a few large ones).

The host hardware: A 2 socket, 8 core Nehalem with SMT and EPT enabled, lots of disks, 4 x 1 Gb Ethernet.

CPU time measurements with SMT can vary wildly if the system is not fully loaded. If the scheduler happens to schedule two threads on a single core, both of these threads will generate less work compared to if they were scheduled on different cores.

Understood. Even if at low loads the scheduler does the right thing and spreads out to all the cores first, once it goes beyond 50% util, the CPU util can climb at a much higher rate (compared to a linear increase in work) because it then starts scheduling 2 threads per core, and each thread can do less work. I have always wanted something which could more accurately show the utilization of a processor core, but I guess we have to use what we have today. I will run again with SMT off to see what we get.

On the other hand, without SMT you will get to overcommit much faster, so you'll have scheduling artifacts. Unfortunately there's no good answer here (except to improve the SMT scheduler).

Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O.

I have an older patch which makes a small change to posix_aio_thread.c by trying to keep the thread pool size a bit lower than it is today. I will dust that off and see if it helps.

Really, I think linux-aio support can help here.

Yes, there is a scheduler tracer, though I have no idea how to operate it. Do you have kvm_stat logs?

Sorry, I don't, but I'll run that next time. BTW, I did not notice a batch/log mode the last time I ran kvm_stat, or maybe it was not obvious to me. Is there an ideal way to run kvm_stat without a curses-like output?

You're probably using an ancient version:

$ kvm_stat --help
Usage: kvm_stat [options]

Options:
  -h, --help            show this help message and exit
  -1, --once, --batch   run in batch mode for one second
  -l, --log             run in logging mode (like vmstat)
  -f FIELDS, --fields=FIELDS
                        fields to display (regex)

-- error compiling committee.c: too many arguments to function
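Given that help text, a non-curses capture for sharing on the list could look like this (file name arbitrary, and the exact options depend on the kvm_stat version):

kvm_stat --once                    # one-second snapshot in batch mode
kvm_stat -l > kvm_stat.log         # vmstat-style logging mode, one sample line per interval
kvm_stat -l -f 'exits|halt'        # log only counters matching a regex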
Re: KVM performance vs. Xen
Avi Kivity wrote: Andrew Theurer wrote: Avi Kivity wrote: What's the typical I/O load (disk and network bandwidth) while the tests are running? This is average thrgoughput: network:Tx: 79 MB/sec Rx: 5 MB/sec MB as in Byte or Mb as in bit? Byte. There are 4 x 1 Gb adapters, each handling about 20 MB/sec or 160 Mbit/sec. disk:read: 17 MB/sec write: 40 MB/sec This could definitely cause the extra load, especially if it's many small requests (compared to a few large ones). I don't have the request sizes at my fingertips, but we have to use a lot of disks to support this I/O, so I think it's safe to assume there are a lot more requests than a simple large sequential read/write. The host hardware: A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4 x 1 GB Ethenret CPU time measurements with SMT can vary wildly if the system is not fully loaded. If the scheduler happens to schedule two threads on a single core, both of these threads will generate less work compared to if they were scheduled on different cores. Understood. Even if at low loads, the scheduler does the right thing and spreads out to all the cores first, once it goes beyond 50% util, the CPU util can climb at a much higher rate (compared to a linear increase in work) because it then starts scheduling 2 threads per core, and each thread can do less work. I have always wanted something which could more accurately show the utilization of a processor core, but I guess we have to use what we have today. I will run again with SMT off to see what we get. On the other hand, without SMT you will get to overcommit much faster, so you'll have scheduling artifacts. Unfortunately there's no good answer here (except to improve the SMT scheduler). Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O. I have a older patch which makes a small change to posix_aio_thread.c by trying to keep the thread pool size a bit lower than it is today. I will dust that off and see if it helps. Really, I think linux-aio support can help here. Yes, I think that would work for real block devices, but would that help for files? I am using real block devices right now, but it would be nice to also see a benefit for files in a file-system. Or maybe I am mis-understanding this, and linux-aio can be used on files? -Andrew Yes, there is a scheduler tracer, though I have no idea how to operate it. Do you have kvm_stat logs? Sorry, I don't, but I'll run that next time. BTW, I did not notice a batch/log mode the last time I ram kvm_stat. Or maybe it was not obvious to me. Is there an ideal way to run kvm_stat without a curses like output? You're probably using an ancient version: $ kvm_stat --help Usage: kvm_stat [options] Options: -h, --helpshow this help message and exit -1, --once, --batch run in batch mode for one second -l, --log run in logging mode (like vmstat) -f FIELDS, --fields=FIELDS fields to display (regex) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Avi Kivity wrote: 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seems a bit excessive? Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O. This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew? 2) cpu_physical_memory_rw due to not using preadv/pwritev? I think both virtio-net and virtio-blk use memcpy(). With latest linux-2.6, and a development snapshot of glibc, virtio-blk will not use memcpy() anymore but virtio-net still does on the receive path (but not transmit). Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Andrew Theurer wrote: Really, I think linux-aio support can help here. Yes, I think that would work for real block devices, but would that help for files? I am using real block devices right now, but it would be nice to also see a benefit for files in a file-system. Or maybe I am mis-understanding this, and linux-aio can be used on files? For cache=off, with some file systems, yes. But not for cache=writethrough/writeback. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Andrew Theurer wrote:
disk: read: 17 MB/sec, write: 40 MB/sec

This could definitely cause the extra load, especially if it's many small requests (compared to a few large ones).

I don't have the request sizes at my fingertips, but we have to use a lot of disks to support this I/O, so I think it's safe to assume there are a lot more requests than a simple large sequential read/write.

Yes. Well, the high context switch rate is the scheduler's way of telling us to use linux-aio. If "lots of disks" == 100, with a 3 ms seek time, that's already 60,000 cs/sec.

Really, I think linux-aio support can help here.

Yes, I think that would work for real block devices, but would that help for files? I am using real block devices right now, but it would be nice to also see a benefit for files in a file system. Or maybe I am misunderstanding this, and linux-aio can be used on files?

It could work with files with cache=none (though not qcow2 as now written).

-- error compiling committee.c: too many arguments to function
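To make the cache-mode point concrete, this is roughly what a file-backed virtio drive looks like when it can use native AIO; the binary name and path are placeholders, and aio=native is only effective together with cache=none (O_DIRECT), otherwise requests go through the posix-aio thread pool discussed above:

# Sketch: a file-backed virtio drive that can use Linux native AIO.
qemu-kvm -m 1024 \
  -drive file=/var/lib/libvirt/images/guest.img,if=virtio,format=raw,cache=none,aio=native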
Re: KVM performance vs. Xen
Anthony Liguori wrote: 2) cpu_physical_memory_rw due to not using preadv/pwritev? I think both virtio-net and virtio-blk use memcpy(). With latest linux-2.6, and a development snapshot of glibc, virtio-blk will not use memcpy() anymore but virtio-net still does on the receive path (but not transmit). There's still the kernel/user copy, so we have two copies on rx, one on tx. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Anthony Liguori wrote: Avi Kivity wrote: 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seems a bit excessive? Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O. This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew? Was that before or after the entire path was made copyless? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Avi Kivity wrote: Anthony Liguori wrote: Avi Kivity wrote: 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seems a bit excessive? Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O. This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew? If I recall correctly, it was 2.4% and relative. But with 2.3% in scheduler functions, that's what I expected. Was that before or after the entire path was made copyless? If this is referring to the preadv/writev support, no, I have not tested with that. -Andrew -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Avi Kivity wrote: Anthony Liguori wrote: 2) cpu_physical_memory_rw due to not using preadv/pwritev? I think both virtio-net and virtio-blk use memcpy(). With latest linux-2.6, and a development snapshot of glibc, virtio-blk will not use memcpy() anymore but virtio-net still does on the receive path (but not transmit). There's still the kernel/user copy, so we have two copies on rx, one on tx. That won't show up as cpu_physical_memory_rw. stl_phys/ldl_phys are suspect though as they degrade to cpu_physical_memory_rw. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Avi Kivity wrote: Anthony Liguori wrote: Avi Kivity wrote: 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seems a bit excessive? Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O. This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew? Was that before or after the entire path was made copyless? Before so it's worth updating and trying again. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Andrew Theurer wrote: Avi Kivity wrote: Anthony Liguori wrote: Avi Kivity wrote: 1) I'm seeing about 2.3% in scheduler functions [that I recognize]. Does that seems a bit excessive? Yes, it is. If there is a lot of I/O, this might be due to the thread pool used for I/O. This is why I wrote the linux-aio patch. It only reduced CPU consumption by about 2% although I'm not sure if that's absolute or relative. Andrew? If I recall correctly, it was 2.4% and relative. But with 2.3% in scheduler functions, that's what I expected. Was that before or after the entire path was made copyless? If this is referring to the preadv/writev support, no, I have not tested with that. Previously, the block API only exposed non-vector interfaces and bounced vectored operations to a linear buffer. That's been eliminated now though so we need to update the linux-aio patch to implement a vectored backend interface. However, it is an apples to apples comparison in terms of copying since the same is true with the thread pool. My take away was that the thread pool overhead isn't the major source of issues. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Anthony Liguori wrote: Previously, the block API only exposed non-vector interfaces and bounced vectored operations to a linear buffer. That's been eliminated now though so we need to update the linux-aio patch to implement a vectored backend interface. However, it is an apples to apples comparison in terms of copying since the same is true with the thread pool. My take away was that the thread pool overhead isn't the major source of issues. If the overhead is dominated by copying, then you won't see the difference. Once the copying is eliminated, the comparison may yield different results. We should certainly see a difference in context switches. One cause of context switches won't be eliminated - the non-saturating workload causes us to switch to the idle thread, which incurs a heavyweight exit. This doesn't matter since we're idle anyway, but when we switch back, we incur a heavyweight entry. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Here are the SMT off results. This workload is designed to not over-saturate the CPU, so you have to pick a number of server sets to ensure that. With SMT on, 4 sets was enough for KVM, but 5 was too much (we start seeing response time errors). For SMT off, I tried to size the load as high as we can go without running into these errors: for KVM that's 3 sets (18 guests) and for Xen that's 4 sets (24 guests). The throughput has a fairly linear relationship to the number of server sets used, but has a bit of wiggle room (mostly affected by response times getting longer and longer, but not exceeding the requirement set forth). Anyway, the relative throughput for these runs is 1.0 for KVM and 1.34 for Xen, with CPU utilization at 78.71% for KVM and 87.83% for Xen. If we normalize to CPU utilization, Xen is doing about 20% more throughput ((1.34 / 87.83) / (1.0 / 78.71) = 1.20).

Avi Kivity wrote:
Anthony Liguori wrote:
Previously, the block API only exposed non-vector interfaces and bounced vectored operations to a linear buffer. That's been eliminated now though so we need to update the linux-aio patch to implement a vectored backend interface. However, it is an apples to apples comparison in terms of copying since the same is true with the thread pool. My take away was that the thread pool overhead isn't the major source of issues.

If the overhead is dominated by copying, then you won't see the difference. Once the copying is eliminated, the comparison may yield different results. We should certainly see a difference in context switches.

I would like to test this the proper way. What do I need to do to ensure these copies are eliminated? I am on a 2.6.27 kernel, am I missing anything there? Anthony, would you be willing to provide a patch to support the changes in the block API?

One cause of context switches won't be eliminated - the non-saturating workload causes us to switch to the idle thread, which incurs a heavyweight exit. This doesn't matter since we're idle anyway, but when we switch back, we incur a heavyweight entry.

I have not looked at the schedstat or ftrace yet, but will soon. Maybe it will tell us a little more about the context switches. Here's a sample of the kvm_stat output:

efer_relo exits fpu_reloa halt_exit halt_wake host_stat hypercall insn_emul insn_emul invlpg io_exits irq_exits irq_injec irq_windo kvm_reque largepage mmio_exit mmu_cache mmu_flood mmu_pde_z mmu_pte_u mmu_pte_w mmu_recyc mmu_shado mmu_unsyn mmu_unsyn nmi_injec nmi_windo pf_fixed pf_guest remote_tl request_n signal_ex tlb_flush
0 233866 53994 20353 16209 119812 0 48879 0 0 75666 44917 34772 3984 0187 0 10 0 0 0 0 0 0 0 0 0 0202 0 0 0 0 17698
0 244556 67321 15570 12364 116226 0 49865 0 0 69357 56131 32860 4449 0 -1895 0 19 0 0 0 0 21 21 0 0 0 0 1117 0 0 0 0 21586
0 230788 71382 10619 7920 109151 0 44354 0 0 62561 60074 28322 4841 0103 0 13 0 0 0 0 0 0 0 0 0 0122 0 0 0 0 22702
0 275259 82605 14326 11148 127293 0 53738 0 0 73438 70707 34724 5373 0859 0 15 0 0 0 0 21 21 0 0 0 0874 0 0 0 0 26723
0 250576 58760 20368 16476 128296 0 50936 0 0 80439 51219 36329 4621 0 -1170 0 8 0 0 0 0 22 22 0 0 0 0 1333 0 0 0 0 18508
0 244746 59650 19480 15657 122721 0 49882 0 0 76011 50453 35352 4523 0201 0 11 0 0 0 0 21 21 0 0 0 0212 0 0 0 0 19163
0 251724 71715 14049 10920 117255 0 49924 0 0 70173 58040 32328 5058
RE: KVM performance vs. Xen
On 4/29/2009 7:41:50 AM, Andrew Theurer wrote: I wanted to share some performance data for KVM and Xen. I thought it would be interesting to share some performance results especially compared to Xen, using a more complex situation like heterogeneous server consolidation. The Workload: The workload is one that simulates a consolidation of servers on to a single host. There are 3 server types: web, imap, and app (j2ee). In addition, there are other helper servers which are also consolidated: a db server, which helps out with the app server, and an nfs server, which helps out with the web server (a portion of the docroot is nfs mounted). There is also one other server that is simply idle. All 6 servers make up one set. The first 3 server types are sent requests, which in turn may send requests to the db and nfs helper servers. The request rate is throttled to produce a fixed amount of work. In order to increase utilization on the host, more sets of these servers are used. The clients which send requests also have a response time requirement which is monitored. The following results have passed the response time requirements. The host hardware: A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4 x 1 GB Ethenret The host software: Both Xen and KVM use the same host Linux OS, SLES11. KVM uses the 2.6.27.19-5-default kernel and Xen uses the 2.6.27.19-5-xen kernel. I have tried 2.6.29 for KVM, but results are actually worse. KVM modules are rebuilt with kvm-85. Qemu is also from kvm-85. Xen version is 3.3.1_18546_12-3.1. The guest software: All guests are RedHat 5.3. The same disk images are used but different kernels. Xen uses the RedHat Xen kernel and KVM uses 2.6.29 with all paravirt build options enabled. Both use PV I/O drivers. Software used: Apache, PHP, Java, Glassfish, Postgresql, and Dovecot. Just for clarification. So are you using PV (Xen) Linux on Xen, not HVM? Is that 32-bit or 64-bit? . Jun Nakajima | Intel Open Source Technology Center -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance vs. Xen
Nakajima, Jun wrote: On 4/29/2009 7:41:50 AM, Andrew Theurer wrote: I wanted to share some performance data for KVM and Xen. I thought it would be interesting to share some performance results especially compared to Xen, using a more complex situation like heterogeneous server consolidation. The Workload: The workload is one that simulates a consolidation of servers on to a single host. There are 3 server types: web, imap, and app (j2ee). In addition, there are other helper servers which are also consolidated: a db server, which helps out with the app server, and an nfs server, which helps out with the web server (a portion of the docroot is nfs mounted). There is also one other server that is simply idle. All 6 servers make up one set. The first 3 server types are sent requests, which in turn may send requests to the db and nfs helper servers. The request rate is throttled to produce a fixed amount of work. In order to increase utilization on the host, more sets of these servers are used. The clients which send requests also have a response time requirement which is monitored. The following results have passed the response time requirements. The host hardware: A 2 socket, 8 core Nehalem with SMT, and EPT enabled, lots of disks, 4 x 1 GB Ethenret The host software: Both Xen and KVM use the same host Linux OS, SLES11. KVM uses the 2.6.27.19-5-default kernel and Xen uses the 2.6.27.19-5-xen kernel. I have tried 2.6.29 for KVM, but results are actually worse. KVM modules are rebuilt with kvm-85. Qemu is also from kvm-85. Xen version is 3.3.1_18546_12-3.1. The guest software: All guests are RedHat 5.3. The same disk images are used but different kernels. Xen uses the RedHat Xen kernel and KVM uses 2.6.29 with all paravirt build options enabled. Both use PV I/O drivers. Software used: Apache, PHP, Java, Glassfish, Postgresql, and Dovecot. Just for clarification. So are you using PV (Xen) Linux on Xen, not HVM? Is that 32-bit or 64-bit? PV, 64-bit. -Andrew -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM performance
On Friday 03 April 2009 13:32:50 you wrote:

Hello, as I want to switch from Xen to KVM I have made some performance tests to see if KVM is as performant as Xen. But tests with a VMU that receives a streamed video, adds a small logo to the video and streams it to a client have shown that Xen performs much better than KVM. In Xen the vlc process (VideoLAN client used to receive, process and send the video) within the VMU has a CPU load of 33.8%, whereas in KVM the vlc process has a CPU load of 99.9%. I am not sure why; does anybody know some settings to improve the KVM performance? Thank you. Regards, Stefanie.

Used hardware and settings: In the tests I used the same host hardware for Xen and KVM:
- Dual Core AMD 2.2 GHz, 8 GB RAM
- Tested OS for the KVM host: Fedora 10, 2.6.27.5-117.fc10.x86_64 with kvm version 74 (10.fc10); also tested in January: compiled kernel with kvm-83
- KVM guest settings: OS: Fedora 9, 2.6.25-14.fc9.x86_64 (i386 also tested); RAM: 256 MB (same for the Xen VMU); CPU: 1 core at 2.2 GHz (same for the Xen VMU); tested nic models: rtl8139, e1000, virtio

Tested scenario: the VMU receives a streamed video, adds a logo (watermark) to the video stream and then streams it to a client.

Results:
Xen: host CPU load (virt-manager): 23%; VMU CPU load (virt-manager): 18%; vlc process within the VMU (top): 33.8%
KVM: no virt-manager CPU load as I started the VMU with the kvm command; host CPU load: 52%; qemu-kvm process (top): 77-100%; vlc process within the VMU (top): 80-99.9%

KVM command to start the VMU:
/usr/bin/qemu-kvm -boot c -hda /images/vmu01.raw -m 256 -net nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio -net tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown -vnc 127.0.0.1:1 -k de --daemonize

Hi Stefanie, does vlc perform operations on disk (e.g. caching, logging, ...)? If it does, you can use virtio for the disk as well. Just change -hda /images/vmu01.raw to -drive file=/images/vmu01.raw,if=virtio,boot=on

Regards,
Hauke
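For completeness, Hauke's suggestion applied to the start command above would look roughly like this; only the disk argument changes, and it assumes the Fedora 9 guest kernel has the virtio_blk driver loaded (the stock 2.6.25 Fedora kernel should):

/usr/bin/qemu-kvm -boot c \
  -drive file=/images/vmu01.raw,if=virtio,boot=on \
  -m 256 \
  -net nic,vlan=0,macaddr=aa:bb:cc:dd:ee:10,model=virtio \
  -net tap,ifname=tap0,vlan=0,script=/etc/kvm/qemu-ifup,downscript=/etc/kvm/qemu-ifdown \
  -vnc 127.0.0.1:1 -k de --daemonize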
Re: KVM performance
Randy Broman wrote:

After I submitted the initial question, I downloaded the latest kernel, 2.6.27.6, and compiled with the following options, some of which are new since my previous kernel, 2.6.24-21.

CONFIG_PARAVIRT_GUEST=y
CONFIG_XEN_SAVE_RESTORE=y
CONFIG_VMI=y
CONFIG_KVM_CLOCK=y
CONFIG_KVM_GUEST=y
# CONFIG_LGUEST_GUEST is not set
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_CLOCK=y

Using my existing kvm-62 and the following invocation:

$ aoss kvm -m 1024 -cdrom /dev/cdrom -boot c -net nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap -soundhw all -localtime /home/rbroman/windows.img

and I've checked ~/kvm-79/kvm --help for the new syntax, but I can't figure out how to invoke the remaining options. One of the missing options seems to be the tap network, and the kvm-79 WinXP guest now has no networking.

These options have not changed. Do you get any error messages? Oh, and don't use the kvm python script; I'll remove it from the repository.

I also tried the -vga vmware option below, as well as -vga=vmware and various other permutations, and I can't get that to work either.

What error message do you get?

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: KVM performance
Don't use kvm in the tarball. It's not what you want. That's just a wrapper that calls qemu/kvm (possibly even the system one) after it mangles some command line options. Use qemu/x86_64-softmmu/qemu-system-x86_64 from the tarball if you aren't going to install it. Then you just use the same command line params as when you run the kvm that your distro installed.

On Sunday 16 November 2008 4:08:02 pm Randy Broman wrote:

After I submitted the initial question, I downloaded the latest kernel, 2.6.27.6, and compiled with the following options, some of which are new since my previous kernel, 2.6.24-21.

CONFIG_PARAVIRT_GUEST=y
CONFIG_XEN_SAVE_RESTORE=y
CONFIG_VMI=y
CONFIG_KVM_CLOCK=y
CONFIG_KVM_GUEST=y
# CONFIG_LGUEST_GUEST is not set
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_CLOCK=y

Using my existing kvm-62 and the following invocation:

$ aoss kvm -m 1024 -cdrom /dev/cdrom -boot c -net nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap -soundhw all -localtime /home/rbroman/windows.img

CPU usage went down and performance was much better (no skips) for my video/audio feeds. I then downloaded, compiled, and installed kvm-79, and invoked it using the following options:

$ aoss ~/kvm-79/kvm -m 1024 --cdrom /dev/cdrom --mac=00:d0:13:b0:2d:32 --nictype=rtl8139 --smp=2 /home/rbroman/windows.img

Note I'm using the new kvm in the compile directory, and I've confirmed that the kvm and kvm-intel modules from the kvm-79 compile are what's loaded. Some of the options from the kvm-62 invocation are missing - because they give errors - I understand that the command syntax/options have changed, and I've checked ~/kvm-79/kvm --help for the new syntax, but I can't figure out how to invoke the remaining options. One of the missing options seems to be the tap network, and the kvm-79 WinXP guest now has no networking. I also tried the -vga vmware option below, as well as -vga=vmware and various other permutations, and I can't get that to work either.

Can someone help me resolve the above? Are there any READMEs, HowTos or other documentation on compiling, installing and using kvm-79? Thanks, Randy

Avi Kivity wrote:

Randy Broman wrote:

I've tried both the default Cirrus adapter and the -std-vga option. Which is better?

Cirrus is generally better, but supports fewer resolutions.

I saw reference to another VMware-based adapter, but I can't figure out how to implement it - would that be better?

-vga vmware (with the new syntax needed by kvm-79); it should be better, but is less well tested. I'm not at all sure the Windows driver will like it.

I notice we're up to kvm-79 vs my kvm-62. Should I move to the newer version?

Yes.

Do I have to custom-compile my kernel to do so?

No.
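To make that concrete, here is roughly what the original kvm-62 command line would look like when run straight out of the kvm-79 build tree, per the advice above (the tarball path is illustrative; -vga vmware is optional and still needs a matching driver in the Windows guest, and -net tap still needs root or a working /etc/qemu-ifup script):

$ cd ~/kvm-79
$ aoss ./qemu/x86_64-softmmu/qemu-system-x86_64 -m 1024 -cdrom /dev/cdrom -boot c \
    -net nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap \
    -soundhw all -localtime -vga vmware /home/rbroman/windows.img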
Re: KVM performance
After I submitted the initial question, I downloaded the latest kernel, 2.6.27.6, and compiled with the following options, some of which are new since my previous kernel, 2.6.24-21.

CONFIG_PARAVIRT_GUEST=y
CONFIG_XEN_SAVE_RESTORE=y
CONFIG_VMI=y
CONFIG_KVM_CLOCK=y
CONFIG_KVM_GUEST=y
# CONFIG_LGUEST_GUEST is not set
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_CLOCK=y

Using my existing kvm-62 and the following invocation:

$ aoss kvm -m 1024 -cdrom /dev/cdrom -boot c -net nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap -soundhw all -localtime /home/rbroman/windows.img

CPU usage went down and performance was much better (no skips) for my video/audio feeds. I then downloaded, compiled, and installed kvm-79, and invoked it using the following options:

$ aoss ~/kvm-79/kvm -m 1024 --cdrom /dev/cdrom --mac=00:d0:13:b0:2d:32 --nictype=rtl8139 --smp=2 /home/rbroman/windows.img

Note I'm using the new kvm in the compile directory, and I've confirmed that the kvm and kvm-intel modules from the kvm-79 compile are what's loaded. Some of the options from the kvm-62 invocation are missing - because they give errors - I understand that the command syntax/options have changed, and I've checked ~/kvm-79/kvm --help for the new syntax, but I can't figure out how to invoke the remaining options. One of the missing options seems to be the tap network, and the kvm-79 WinXP guest now has no networking. I also tried the -vga vmware option below, as well as -vga=vmware and various other permutations, and I can't get that to work either.

Can someone help me resolve the above? Are there any READMEs, HowTos or other documentation on compiling, installing and using kvm-79? Thanks, Randy

Avi Kivity wrote:

Randy Broman wrote:

I've tried both the default Cirrus adapter and the -std-vga option. Which is better?

Cirrus is generally better, but supports fewer resolutions.

I saw reference to another VMware-based adapter, but I can't figure out how to implement it - would that be better?

-vga vmware (with the new syntax needed by kvm-79); it should be better, but is less well tested. I'm not at all sure the Windows driver will like it.

I notice we're up to kvm-79 vs my kvm-62. Should I move to the newer version?

Yes.

Do I have to custom-compile my kernel to do so?

No.
Re: KVM performance
See if boosting the priority of the VM (see man chrt) and locking it to a core (see man taskset) helps. You'll want to do that for the vcpu thread(s) (in the qemu monitor, run the 'info cpus' command).

david

Randy Broman wrote:

I am using an Intel Core2 Duo E6600, Kubuntu 8.04 with kernel 2.6.24-21-generic, kvm (as in QEMU PC emulator version 0.9.1 (kvm-62)) and a WinXP SP3 guest, with bridged networking. My start command is:

sudo kvm -m 1024 -cdrom /dev/cdrom -boot c -net nic,macaddr=00:d0:13:b0:2d:32,model=rtl8139 -net tap -soundhw all -localtime /home/rbroman/windows.img

All this is stable and generally works well, except that internet-based video and audio performance is poor (choppy, skips) in comparison with performance under WinXP running natively on the same machine (it's a dual-boot). I would appreciate recommendations to improve video and audio performance, and have the following specific questions:

- I've tried both the default Cirrus adapter and the -std-vga option. Which is better? I saw reference to another VMware-based adapter, but I can't figure out how to implement it - would that be better?
- I notice we're up to kvm-79 vs my kvm-62. Should I move to the newer version? Do I have to custom-compile my kernel to do so, and if so what kernel version and what specific kernel options should I use?
- Are there other tuning steps I could take?

Please copy me directly as I'm not on this list. Thank you.
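To make david's chrt/taskset suggestion above concrete, a hypothetical sequence would look like this (the thread ID 4321 and the core number are made-up values; if your qemu build's 'info cpus' output does not show a thread_id, the thread IDs can also be found under /proc/<qemu-pid>/task/):

# in the qemu monitor (Ctrl-Alt-2 on the SDL display):
#   (qemu) info cpus
#   * CPU #0: ... thread_id=4321
# then, on the host:
sudo taskset -pc 1 4321     # pin the vcpu thread to CPU core 1
sudo chrt -f -p 50 4321     # give it SCHED_FIFO priority 50 (use real-time priority with care)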