Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/06/2010 05:16 PM, Anthony Liguori wrote:
> On 01/06/2010 08:48 AM, Dor Laor wrote:
>> On 01/06/2010 04:32 PM, Avi Kivity wrote:
>>> On 01/06/2010 04:22 PM, Michael S. Tsirkin wrote:
>>>>> We can probably default -enable-kvm to -cpu host, as long as we explain very carefully that if users wish to preserve cpu features across upgrades, they can't depend on the default.
>>>>
>>>> Hardware upgrades or software upgrades?
>>>
>>> Yes.
>>
>> I just want to remind all that the main motivation for using -cpu realModelThatWasOnceShiped is to provide correct cpu emulation for the guest. Using a random qemu|kvm64+flag1-flag2 might really cause trouble for the guest OS or guest apps.
>>
>> On top of -cpu nehalem we can always add fancy features like x2apic, etc.
>
> I think it boils down to, how are people going to use this.
>
> For individuals, code names like Nehalem are too obscure. From my own personal experience, even power users often have no clue whether their processor is a Nehalem or not.
>
> For management tools, Nehalem is a somewhat imprecise target because it covers a wide range of potential processors. In general, I think what we really need to do is simplify the process of going from, here's the output of /proc/cpuinfo for a 100 nodes, what do I need to pass to qemu so that migration always works for these systems.
>
> I don't think -cpu nehalem really helps with that problem. -cpu none helps a bit, but I hope we can find something nicer.

We can debate about the exact name/model to represent the Nehalem family; I don't have an issue with that, and actually Intel and AMD should define it.

There are two main motivations behind the above approach:

1. Sound guest cpu definition. Using a predefined model should automatically set all the relevant vendor/stepping/cpuid flags/cache sizes/etc. We just can't let every management application deal with it. It breaks guest OS/apps. For instance, MSI support in Windows guests relies on the stepping.

2. Simplifying end user and mgmt tools. qemu/kvm have the best knowledge about these low levels. If we push it up in the stack, eventually it reaches the user. The end user, not a 'qemu-devel user', who is actually far better than the average user. This means that such users will have to know what popcount is and whether or not to limit migration on one host by adding sse4.2 or not.

This is exactly what vmware are doing:
- Intel CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
- AMD CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992

Why should we reinvent the wheel (qemu64..)? Let's learn from their experience.

This is the test description of the original patch by John:

# Intel
# -----
# Management layers remove pentium3 by default.
# It primarily remains here for testing of 32-bit migration.
#
# [0:Pentium 3 Intel :vmx :pentium3;]
#
# Core 2, 65nm
# possible option sets: (+nx,+cx16), (+nx,+cx16,+ssse3)
# 1:Merom :vmx,sse2 :qemu64,-nx,+sse2;
#
# Core2 45nm
# 2:Penryn :vmx,sse2,nx,cx16,ssse3,sse4_1 :qemu64,+sse2,+cx16,+ssse3,+sse4_1;
#
# Core i7 45/32nm
# 3:Nehalem :vmx,sse2,nx,cx16,ssse3,sse4_1,sse4_2,popcnt :qemu64,+sse2,+cx16,+ssse3,+sse4_1,+sse4_2,+popcnt;

# AMD
# ---
# Management layers remove pentium3 by default.
# It primarily remains here for testing of 32-bit migration.
#
# [0:Pentium 3 AMD :svm :pentium3;]
#
# Opteron 90nm stepping E1/E4/E6
# possible option sets: (-nx) for 130nm
# 1:Opteron G1 :svm,sse2,nx :qemu64,+sse2;
#
# Opteron 90nm stepping F2/F3
# 2:Opteron G2 :svm,sse2,nx,cx16,rdtscp :qemu64,+sse2,+cx16,+rdtscp;
#
# Opteron 65/45nm
# 3:Opteron G3 :svm,sse2,nx,cx16,sse4a,misalignsse,popcnt,abm :qemu64,+sse2,+cx16,+sse4a,+misalignsse,+popcnt,+abm;

> Regards,
> Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add the ability to use *.xml as unattended_install file for kvm.
Signed-off-by: sshang ssh...@redhat.com
---
 client/tests/kvm/scripts/unattended.py |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/client/tests/kvm/scripts/unattended.py b/client/tests/kvm/scripts/unattended.py
index 562d317..ee20b60 100755
--- a/client/tests/kvm/scripts/unattended.py
+++ b/client/tests/kvm/scripts/unattended.py
@@ -91,6 +91,8 @@ class UnattendedInstall(object):
             shutil.copyfile(setup_file_path, setup_file_dest)
         elif self.unattended_file.endswith('.ks'):
             dest_fname = 'ks.cfg'
+        elif self.unattended_file.endswith('.xml'):
+            dest_fname = "autounattend.xml"
 
         dest = os.path.join(self.floppy_mount, dest_fname)
         shutil.copyfile(self.unattended_file, dest)
--
1.5.3.6
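The branch added by this patch maps an answer-file suffix to the name the installer expects on the floppy image. A minimal sketch of the same mapping as a lookup table follows; the names DEST_NAMES and floppy_dest_name are hypothetical and not part of unattended.py, and only the two suffix/name pairs visible in the diff are included.

```python
# Hypothetical table: only the pairs that appear in the diff above.
DEST_NAMES = (
    ('.ks', 'ks.cfg'),              # kickstart file
    ('.xml', 'autounattend.xml'),   # Windows answer file added by the patch
)


def floppy_dest_name(unattended_file):
    """Return the name the answer file should get on the floppy image."""
    for suffix, dest_fname in DEST_NAMES:
        # Same case-sensitive endswith() test the patch uses.
        if unattended_file.endswith(suffix):
            return dest_fname
    raise ValueError("unsupported unattended file: %s" % unattended_file)
```

With a table like this, supporting another installer format would be a one-line change rather than another elif branch.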
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 10:03 AM, Dor Laor wrote:
> We can debate about the exact name/model to represent the Nehalem family; I don't have an issue with that, and actually Intel and AMD should define it.

AMD and Intel already defined their names (in cat /proc/cpuinfo). They don't define families; the whole idea is to segment the market.

> There are two main motivations behind the above approach:
>
> 1. Sound guest cpu definition. Using a predefined model should automatically set all the relevant vendor/stepping/cpuid flags/cache sizes/etc. We just can't let every management application deal with it. It breaks guest OS/apps. For instance, MSI support in Windows guests relies on the stepping.
>
> 2. Simplifying end user and mgmt tools. qemu/kvm have the best knowledge about these low levels. If we push it up in the stack, eventually it reaches the user. The end user, not a 'qemu-devel user', who is actually far better than the average user. This means that such users will have to know what popcount is and whether or not to limit migration on one host by adding sse4.2 or not.
>
> This is exactly what vmware are doing:
> - Intel CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
> - AMD CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992

They don't have to deal with different qemu and kvm versions.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On Thu, Jan 07, 2010 at 10:03:28AM +0200, Dor Laor wrote:
> On 01/06/2010 05:16 PM, Anthony Liguori wrote:
>> On 01/06/2010 08:48 AM, Dor Laor wrote:
>>> On 01/06/2010 04:32 PM, Avi Kivity wrote:
>>>> On 01/06/2010 04:22 PM, Michael S. Tsirkin wrote:
>>>>>> We can probably default -enable-kvm to -cpu host, as long as we explain very carefully that if users wish to preserve cpu features across upgrades, they can't depend on the default.
>>>>>
>>>>> Hardware upgrades or software upgrades?
>>>>
>>>> Yes.
>>>
>>> I just want to remind all that the main motivation for using -cpu realModelThatWasOnceShiped is to provide correct cpu emulation for the guest. Using a random qemu|kvm64+flag1-flag2 might really cause trouble for the guest OS or guest apps.
>>>
>>> On top of -cpu nehalem we can always add fancy features like x2apic, etc.
>>
>> I think it boils down to, how are people going to use this.
>>
>> For individuals, code names like Nehalem are too obscure. From my own personal experience, even power users often have no clue whether their processor is a Nehalem or not.
>>
>> For management tools, Nehalem is a somewhat imprecise target because it covers a wide range of potential processors. In general, I think what we really need to do is simplify the process of going from, here's the output of /proc/cpuinfo for a 100 nodes, what do I need to pass to qemu so that migration always works for these systems.
>>
>> I don't think -cpu nehalem really helps with that problem. -cpu none helps a bit, but I hope we can find something nicer.
>
> We can debate about the exact name/model to represent the Nehalem family; I don't have an issue with that, and actually Intel and AMD should define it.
>
> There are two main motivations behind the above approach:
>
> 1. Sound guest cpu definition. Using a predefined model should automatically set all the relevant vendor/stepping/cpuid flags/cache sizes/etc. We just can't let every management application deal with it. It breaks guest OS/apps. For instance, MSI support in Windows guests relies on the stepping.
>
> 2. Simplifying end user and mgmt tools. qemu/kvm have the best knowledge about these low levels. If we push it up in the stack, eventually it reaches the user. The end user, not a 'qemu-devel user', who is actually far better than the average user. This means that such users will have to know what popcount is and whether or not to limit migration on one host by adding sse4.2 or not.
>
> This is exactly what vmware are doing:
> - Intel CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
> - AMD CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992
>
> Why should we reinvent the wheel (qemu64..)? Let's learn from their experience.

NB, be careful to distinguish the different levels of VMware's mgmt stack. In terms of guest configuration, the VMware ESX APIs require the management app to specify the raw CPUID masks. With VirtualCenter VMotion they defined this handful of common Intel/AMD CPU sets, and will automatically classify hosts into one of these sets and use that to specify a default CPUID mask, in the case that the guest does not have an explicit one in its config. This gives them good default, out-of-the-box behaviour, while also allowing mgmt apps 100% control over each guest's CPUID should they want it.

Regards,
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1
On 01/06/2010 05:45 PM, Mark Cave-Ayland wrote:
> Avi Kivity wrote:
>
>> It probably did make some kind of difference. Please try a clean install.
>
> After several hours of testing, I've finally found out what the problem is. I tried a clean WinXP guest install and that worked, so it was obviously a driver issue. After disabling various drivers in the WinXP guest, I didn't get anywhere so I decided to take a break and test Marcelo's VNC patch. With this applied, I could actually see all of the information in the BSOD which showed the error was in intelppm.sys.
>
> A quick search took me to this page here: http://blogs.msdn.com/virtual_pc_guy/archive/2005/10/24/484461.aspx which explains the issue in more detail. I first tried disabling the intelppm driver and rebooting, but that didn't make a difference; however disabling the Processor driver worked and my guest VM booted in Normal Mode :)
>
> I think the issue is probably similar to that explained in the article above; with a new processor reported to the guest, the internal processor driver tries to upload some kind of microcode to the new device which fails and causes the guest to fall over. Can we teach KVM to silently discard these kinds of updates?

Can you try loading kvm.ko with the ignore_msrs module parameter set?

--
error compiling committee.c: too many arguments to function
Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1
On 01/06/2010 07:08 PM, Mark Cave-Ayland wrote:
> Mark Cave-Ayland wrote:
>
>> A quick search took me to this page here: http://blogs.msdn.com/virtual_pc_guy/archive/2005/10/24/484461.aspx which explains the issue in more detail. I first tried disabling the intelppm driver and rebooting, but that didn't make a difference; however disabling the Processor driver worked and my guest VM booted in Normal Mode :)
>
> I've just re-created the KVM image fresh from the VDI image once again and can confirm that disabling just the Processor driver is enough to allow the guest WinXP VM to function in qemu-kvm-0.12.1.2.
>
> Perhaps the default for -cpu host should not be changed in a micro release as there is a risk of breaking existing VMs?

That was actually a fix for a regression relative to 0.11.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 10:18 AM, Avi Kivity wrote:
> On 01/07/2010 10:03 AM, Dor Laor wrote:
>> We can debate about the exact name/model to represent the Nehalem family; I don't have an issue with that, and actually Intel and AMD should define it.
>
> AMD and Intel already defined their names (in cat /proc/cpuinfo). They don't define families; the whole idea is to segment the market.

The idea here is to minimize the number of models. We should have the following range for Intel, for example:

pentium3 - merom - penryn - Nehalem - host - kvm/qemu64

So we're supplying a wide range of cpus: p3 for maximum flexibility and migration, nehalem for performance and migration, host for maximum performance, and qemu/kvm64 for custom-made models.

>> There are two main motivations behind the above approach:
>>
>> 1. Sound guest cpu definition. Using a predefined model should automatically set all the relevant vendor/stepping/cpuid flags/cache sizes/etc. We just can't let every management application deal with it. It breaks guest OS/apps. For instance, MSI support in Windows guests relies on the stepping.
>>
>> 2. Simplifying end user and mgmt tools. qemu/kvm have the best knowledge about these low levels. If we push it up in the stack, eventually it reaches the user. The end user, not a 'qemu-devel user', who is actually far better than the average user. This means that such users will have to know what popcount is and whether or not to limit migration on one host by adding sse4.2 or not.
>>
>> This is exactly what vmware are doing:
>> - Intel CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
>> - AMD CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992
>
> They don't have to deal with different qemu and kvm versions.

Both our customers - the end users. It's not their problem. IMO what's missing today is a safe and sound cpu emulation that is simple and friendly to represent. qemu64,+popcount is not simple for the end user. There is no reason to throw it on higher-level mgmt.
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 11:24 AM, Avi Kivity wrote:
> On 01/07/2010 11:11 AM, Dor Laor wrote:
>> On 01/07/2010 10:18 AM, Avi Kivity wrote:
>>> On 01/07/2010 10:03 AM, Dor Laor wrote:
>>>> We can debate about the exact name/model to represent the Nehalem family; I don't have an issue with that, and actually Intel and AMD should define it.
>>>
>>> AMD and Intel already defined their names (in cat /proc/cpuinfo). They don't define families; the whole idea is to segment the market.
>>
>> The idea here is to minimize the number of models. We should have the following range for Intel, for example:
>>
>> pentium3 - merom - penryn - Nehalem - host - kvm/qemu64
>>
>> So we're supplying a wide range of cpus: p3 for maximum flexibility and migration, nehalem for performance and migration, host for maximum performance, and qemu/kvm64 for custom-made models.
>
> There's no such thing as Nehalem.

Intel were ok with it. Again, you can name it corei7 or xeon34234234234, I don't care, the principle remains the same.

>>>> This is exactly what vmware are doing:
>>>> - Intel CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1991
>>>> - AMD CPUs: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1992
>>>
>>> They don't have to deal with different qemu and kvm versions.
>>
>> Both our customers - the end users. It's not their problem. IMO what's missing today is a safe and sound cpu emulation that is simple and friendly to represent. qemu64,+popcount is not simple for the end user. There is no reason to throw it on higher-level mgmt.
>
> There's no simple solution except to restrict features to what was available on the first processors.

What's not simple about the above 4 options? What's a better alternative (one that ensures users understand it and use it, and that guest MSI and even the skype application are happy with it)?
Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1
Avi Kivity wrote:

>> I think the issue is probably similar to that explained in the article above; with a new processor reported to the guest, the internal processor driver tries to upload some kind of microcode to the new device which fails and causes the guest to fall over. Can we teach KVM to silently discard these kinds of updates?
>
> Can you try loading kvm.ko with the ignore_msrs module parameter set?

Hi Avi,

I've just done a quick test re-enabling processor.sys on my WinXP guest and then did the following:

virsh stop winxp
rmmod kvm_intel
rmmod kvm
modprobe kvm ignore_msrs=1
modprobe kvm_intel
virsh start winxp

Unfortunately it still crashes with the same DRIVER_UNLOADED_WITHOUT_CANCELING_PENDING_OPERATIONS BSOD :(

HTH,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs
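When retesting after a module reload like the one above, it can help to confirm that the parameter actually took effect before starting the guest. The following is a minimal sketch, assuming the kvm module exposes its boolean parameter under /sys/module/kvm/parameters/ignore_msrs (where boolean module parameters read back as Y or N); the function name ignore_msrs_enabled is hypothetical.

```python
def ignore_msrs_enabled(param_path="/sys/module/kvm/parameters/ignore_msrs"):
    """Return True/False for the parameter value, or None if it cannot be read."""
    try:
        with open(param_path) as param_file:
            # Boolean module parameters are reported as "Y" or "N".
            return param_file.read().strip() == "Y"
    except IOError:
        # Module not loaded, or this kernel has no such parameter.
        return None
```

A result of None would suggest that kvm.ko is not loaded or predates the parameter, which is worth ruling out before concluding that ignore_msrs does not help.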
Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1
Avi Kivity wrote:

>> I've just re-created the KVM image fresh from the VDI image once again and can confirm that disabling just the Processor driver is enough to allow the guest WinXP VM to function in qemu-kvm-0.12.1.2.
>>
>> Perhaps the default for -cpu host should not be changed in a micro release as there is a risk of breaking existing VMs?
>
> That was actually a fix for a regression relative to 0.11.

Really? Damn :( Any pointers towards the relevant bug in the bug tracker?

ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs
Re: [PATCH 0/2] eventfd: new EFD_STATE flag
On Wed, Jan 06, 2010 at 11:25:40PM -0800, Davide Libenzi wrote:
> On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:
>
>> OK. What I think we need is a way to remove ourselves from the eventfd wait queue and clear the counter atomically.
>>
>> We currently do
>> remove_wait_queue(irqfd->wqh, &irqfd->wait);
>> where wqh saves the eventfd wait queue head.
>
> You do a remove_wait_queue() from inside a callback wakeup on the same wait queue head?

No, not from callback, in ioctl context.

>> If we do this before proposed eventfd_read_ctx, we can lose events. If we do this after, we can get spurious events. An unlocked read is one way to fix this.
>
> You posted one line of code and a two lines analysis of the issue. Can you be a little bit more verbose and show me more code, so that I can actually see what is going on?
>
> - Davide

Sure, I was trying to be as brief as possible, here's a detailed summary.

Description of the system (MSI emulation in KVM):

KVM supports an ioctl to assign/deassign an eventfd file to an interrupt message in the guest OS. When this eventfd is signalled, the interrupt message is sent. This assignment is done from the qemu system emulator.

The eventfd is signalled from device emulation in another thread in userspace or from the kernel, which talks with the guest OS through another eventfd and shared memory (the possibility of out-of-process emulation was discussed but never got implemented yet).

Note: it's okay to delay messages from a correctness point of view, but generally this is a latency-sensitive path. If multiple identical messages are requested, it's okay to send a single last message, but missing a message altogether causes deadlocks. Sending a message when none were requested might in theory cause crashes; in practice doing this causes performance degradation.

Another KVM feature is interrupt masking: the guest OS requests that we stop sending some interrupt message, possibly modifies the mapping, and re-enables this message. This needs to be done without involving the device, which might keep requesting events: while masked, the message is marked pending, and the guest might test the pending status.

We can implement masking in the system emulator in userspace, by using the assign/deassign ioctls: when a message is masked, we simply deassign all eventfds, and when it is unmasked, we assign them back.

Here's some code to illustrate how this all works: assign/deassign code in the kernel looks like the following.

This is called to unmask the interrupt:

static int
kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
{
	struct _irqfd *irqfd, *tmp;
	struct file *file = NULL;
	struct eventfd_ctx *eventfd = NULL;
	int ret;
	unsigned int events;

	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);

	...

	file = eventfd_fget(fd);
	if (IS_ERR(file)) {
		ret = PTR_ERR(file);
		goto fail;
	}

	eventfd = eventfd_ctx_fileget(file);
	if (IS_ERR(eventfd)) {
		ret = PTR_ERR(eventfd);
		goto fail;
	}

	irqfd->eventfd = eventfd;

	/*
	 * Install our own custom wake-up handling so we are notified via
	 * a callback whenever someone signals the underlying eventfd
	 */
	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);

	spin_lock_irq(&kvm->irqfds.lock);

	events = file->f_op->poll(file, &irqfd->pt);

	list_add_tail(&irqfd->list, &kvm->irqfds.items);
	spin_unlock_irq(&kvm->irqfds.lock);

A.
	/*
	 * Check if there was an event already pending on the eventfd
	 * before we registered, and trigger it as if we didn't miss it.
	 */
	if (events & POLLIN)
		schedule_work(&irqfd->inject);

	/*
	 * do not drop the file until the irqfd is fully initialized, otherwise
	 * we might race against the POLLHUP
	 */
	fput(file);

	return 0;

fail:
	...
}

This is called to mask the interrupt:

/*
 * shutdown any irqfd's that match fd+gsi
 */
static int
kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
{
	struct _irqfd *irqfd, *tmp;
	struct eventfd_ctx *eventfd;

	eventfd = eventfd_ctx_fdget(fd);
	if (IS_ERR(eventfd))
		return PTR_ERR(eventfd);

	spin_lock_irq(&kvm->irqfds.lock);

	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
		if (irqfd->eventfd == eventfd && irqfd->gsi == gsi)
			irqfd_deactivate(irqfd);
	}

	spin_unlock_irq(&kvm->irqfds.lock);
	eventfd_ctx_put(eventfd);

	/*
	 * Block until we know all outstanding shutdown jobs have completed
	 * so that we guarantee there will not be any more interrupts on this
	 * gsi once this deassign function returns.
	 */
	flush_workqueue(irqfd_cleanup_wq);

	return 0;
}

And deactivation deep down does this (from
Very bad Speed with Virtio-net
Hello everybody,

this is my first post on a mailing list, so I hope everything works fine. My host is an AMD X2 4850e with a 64-bit Gentoo (unstable). I have tested qemu-kvm 0.11, 0.12.x and the git version from 6 Jan. I created my own bridges, so I don't need the option from libvirt. I bridged a 1 Gb LAN card for my VMs. When I use the virtio net driver, I get about 200-300 Mbit from my desktop to one of my VMs. If I use the e1000 driver instead of virtio, I get about 500-600 Mbit.

I tested this with the following kernels:
Host: 2.6.31.6, 2.6.32.1, 2.6.32.2
Guests: 2.6.26, 2.6.30, 2.6.32 (debian), 2.6.32 (gentoo)

Here is a default result, virtio vs. e1000:

iperf -c 192.168.0.3 -w 512k -l 512k
Client connecting to 192.168.0.3, TCP port 5001
TCP window size: 256 KByte (WARNING: requested 512 KByte)
[ 3] local 192.168.0.2 port 52968 connected with 192.168.0.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 438 MBytes 267 Mbits/sec

iperf -c 192.168.0.3 -w 512k -l 512k
Client connecting to 192.168.0.3, TCP port 5001
TCP window size: 256 KByte (WARNING: requested 512 KByte)
[ 3] local 192.168.0.2 port 52995 connected with 192.168.0.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 602 MBytes 505 Mbits/sec

Any ideas what this could be? I attach a dmesg output of my host.

Thx.
Ben

Linux version 2.6.32-gentoo-r1 (r...@tux) (gcc version 4.4.2 (Gentoo 4.4.2 p1.0) ) #2 SMP Wed Jan 6 12:04:57 CET 2010
Command line: root=/dev/sda2
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: - 0009f400 (usable)
 BIOS-e820: 0009f400 - 000a (reserved)
 BIOS-e820: 000e6000 - 0010 (reserved)
 BIOS-e820: 0010 - cfeb (usable)
 BIOS-e820: cfeb - cfebe000 (ACPI data)
 BIOS-e820: cfebe000 - cfee (ACPI NVS)
 BIOS-e820: cfee - cfeee000 (reserved)
 BIOS-e820: cfef - cff0 (reserved)
 BIOS-e820: ff70 - 0001 (reserved)
 BIOS-e820: 0001 - 0001a000 (usable)
DMI present.
AMI BIOS detected: BIOS may corrupt low RAM, working around it. e820 update range: - 0001 (usable) == (reserved) last_pfn = 0x1a max_arch_pfn = 0x4 MTRR default type: uncachable MTRR fixed ranges enabled: 0-9 write-back A-E uncachable F-F write-protect MTRR variable ranges enabled: 0 base 00 mask FF8000 write-back 1 base 008000 mask FFC000 write-back 2 base 00C000 mask FFF000 write-back 3 disabled 4 disabled 5 disabled 6 disabled 7 disabled TOM2: 0001b000 aka 6912M e820 update range: d000 - 0001 (usable) == (reserved) last_pfn = 0xcfeb0 max_arch_pfn = 0x4 initial memory mapped : 0 - 2000 init_memory_mapping: -cfeb 00 - 00cfe0 page 2M 00cfe0 - 00cfeb page 4k kernel direct mapping tables up to cfeb @ 1-16000 init_memory_mapping: 0001-0001a000 01 - 01a000 page 2M kernel direct mapping tables up to 1a000 @ 14000-1c000 ACPI: RSDP 000f9e40 00014 (v00 ACPIAM) ACPI: RSDT cfeb 0003C (v01 110608 RSDT1133 20081106 MSFT 0097) ACPI: FACP cfeb0200 00084 (v02 110608 FACP1133 20081106 MSFT 0097) ACPI: DSDT cfeb0440 04D44 (v01 1 1000 INTL 20051117) ACPI: FACS cfebe000 00040 ACPI: APIC cfeb0390 0006C (v01 110608 APIC1133 20081106 MSFT 0097) ACPI: MCFG cfeb0400 0003C (v01 110608 OEMMCFG 20081106 MSFT 0097) ACPI: OEMB cfebe040 00071 (v01 110608 OEMB1133 20081106 MSFT 0097) ACPI: HPET cfeb5190 00038 (v01 110608 OEMHPET 20081106 MSFT 0097) ACPI: SSDT cfeb51d0 0028A (v01 A M I POWERNOW 0001 AMD 0001) ACPI: Local APIC address 0xfee0 (7 early reservations) == bootmem [00 - 01a000] #0 [00 - 001000] BIOS data page == [00 - 001000] #1 [006000 - 008000] TRAMPOLINE == [006000 - 008000] #2 [000100 - 0001a7ca84]TEXT DATA BSS == [000100 - 0001a7ca84] #3 [09f400 - 10]BIOS reserved == [09f400 - 10] #4 [0001a7d000 - 0001a7d0f1] BRK == [0001a7d000 - 0001a7d0f1] #5 [01 - 014000] PGTABLE == [01 - 014000] #6 [014000 - 017000] PGTABLE == [014000 - 017000] found SMP MP-table at [880ff780] ff780
Re: Very bad Speed with Virtio-net
I have similar results to yours, using CentOS 5.4 x86_64.

I do not think it is possible to gain more than this right now... or rather, I wish it were possible.

If you can get better results, please let me know.

Rick

Benjamin Schweikert wrote:
> Hello everybody,
>
> this is my first post on a mailing list, so I hope everything works fine. My host is an AMD X2 4850e with a 64-bit Gentoo (unstable). I have tested qemu-kvm 0.11, 0.12.x and the git version from 6 Jan. I created my own bridges, so I don't need the option from libvirt. I bridged a 1 Gb LAN card for my VMs. When I use the virtio net driver, I get about 200-300 Mbit from my desktop to one of my VMs. If I use the e1000 driver instead of virtio, I get about 500-600 Mbit.
>
> I tested this with the following kernels:
> Host: 2.6.31.6, 2.6.32.1, 2.6.32.2
> Guests: 2.6.26, 2.6.30, 2.6.32 (debian), 2.6.32 (gentoo)
>
> Here is a default result, virtio vs. e1000:
>
> iperf -c 192.168.0.3 -w 512k -l 512k
> Client connecting to 192.168.0.3, TCP port 5001
> TCP window size: 256 KByte (WARNING: requested 512 KByte)
> [ 3] local 192.168.0.2 port 52968 connected with 192.168.0.3 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 438 MBytes 267 Mbits/sec
>
> iperf -c 192.168.0.3 -w 512k -l 512k
> Client connecting to 192.168.0.3, TCP port 5001
> TCP window size: 256 KByte (WARNING: requested 512 KByte)
> [ 3] local 192.168.0.2 port 52995 connected with 192.168.0.3 port 5001
> [ ID] Interval Transfer Bandwidth
> [ 3] 0.0-10.0 sec 602 MBytes 505 Mbits/sec
>
> Any ideas what this could be? I attach a dmesg output of my host.
>
> Thx.
> Ben
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 03:40 AM, Dor Laor wrote:
>> There's no simple solution except to restrict features to what was available on the first processors.
>
> What's not simple about the above 4 options? What's a better alternative (that insures users understand it and use it and guest msi and even skype application is happy about it)?

Even if you have -cpu Nehalem, different versions of the KVM kernel module may additionally filter cpuid flags. So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary to say:

(2.6.33) qemu -cpu Nehalem,-syscall
(2.6.18) qemu -cpu Nehalem

in order to be compatible.

Regards,

Anthony Liguori
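The point about per-kernel filtering can be sketched as a small calculation: given the flags a named model asks for and the flags each node can actually expose after its kernel's filtering, work out which flags have to be subtracted so every node presents the same CPU. This is only an illustration; the function cpu_argument and the flag sets in the usage lines are hypothetical, and the only syntax taken from the thread is the "-cpu model,-flag" form.

```python
def cpu_argument(model, model_flags, node_supported):
    """Build a "-cpu" value that every node in node_supported can honour.

    model_flags:    set of flags the named model would enable
    node_supported: iterable of sets, one per node, of flags that node's
                    qemu/kvm/kernel combination is able to expose
    """
    common = set(model_flags)
    for supported in node_supported:
        common &= set(supported)
    # Anything the model wants but some node cannot provide is masked out.
    removed = sorted(set(model_flags) - common)
    return ",".join([model] + ["-" + flag for flag in removed])


# Illustrative only: one node's kernel filters out "syscall".
newer = {"sse2", "nx", "syscall"}
older = {"sse2", "nx"}
print(cpu_argument("Nehalem", {"sse2", "nx", "syscall"}, [newer, older]))
```

Note that this needs knowledge of every node in the pool at once, which is information a single qemu instance does not have on its own.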
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 01:39 PM, Anthony Liguori wrote:
> On 01/07/2010 03:40 AM, Dor Laor wrote:
>>> There's no simple solution except to restrict features to what was available on the first processors.
>>
>> What's not simple about the above 4 options? What's a better alternative (that insures users understand it and use it and guest msi and even skype application is happy about it)?
>
> Even if you have -cpu Nehalem, different versions of the KVM kernel module may additionally filter cpuid flags. So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary to say:
>
> (2.6.33) qemu -cpu Nehalem,-syscall
> (2.6.18) qemu -cpu Nehalem

Or let qemu do it automatically for you.

> In order to be compatible.
>
> Regards,
>
> Anthony Liguori
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 11:40 AM, Dor Laor wrote:
>> There's no such thing as Nehalem.
>
> Intel were ok with it. Again, you can name is corei7 or xeon34234234234, I don't care, the principle remains the same.

There are several processors belonging to the Nehalem family and each have different features.

> What's not simple about the above 4 options?

If a qemu/kvm/processor combo doesn't support a feature (say, nx) we have to remove it from the migration pool even if the Nehalem processor class says it's included. Or else not admit that combination into the migration pool in the first place.

> What's a better alternative (that insures users understand it and use it and guest msi and even skype application is happy about it)?

Have management scan new nodes and classify them.

--
error compiling committee.c: too many arguments to function
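A rough sketch of what "scan new nodes and classify them" could look like in a management tool follows. It reads the flags line from each node's /proc/cpuinfo text, intersects the sets across the pool, and picks the richest model whose required flags are all present. The required-flag sets are copied from the Intel part of John's description quoted earlier in the thread; the function names and the sample cpuinfo strings are hypothetical, and a real tool would also have to account for what each node's qemu and kvm versions filter out.

```python
# Required host flags per model, from the Intel table quoted in this thread,
# ordered from least to most capable.
INTEL_MODELS = [
    ("Merom", {"vmx", "sse2"}),
    ("Penryn", {"vmx", "sse2", "nx", "cx16", "ssse3", "sse4_1"}),
    ("Nehalem", {"vmx", "sse2", "nx", "cx16", "ssse3", "sse4_1",
                 "sse4_2", "popcnt"}),
]


def parse_flags(cpuinfo_text):
    """Return the flag set from the first "flags" line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def classify(flags):
    """Return the most capable model fully covered by flags, or None."""
    best = None
    for name, required in INTEL_MODELS:
        if required <= flags:
            best = name
    return best


def pool_model(cpuinfo_texts):
    """Classify a whole migration pool by the flags common to every node."""
    common = None
    for text in cpuinfo_texts:
        flags = parse_flags(text)
        common = flags if common is None else common & flags
    return classify(common or set())


# Hypothetical two-node pool: the weaker node decides the pool's model.
node_a = "flags\t\t: fpu vmx sse2 nx cx16 ssse3 sse4_1 sse4_2 popcnt\n"
node_b = "flags\t\t: fpu vmx sse2 nx cx16 ssse3 sse4_1\n"
print(pool_model([node_a, node_b]))
```

Classifying on the intersection rather than per node is what keeps migration safe in both directions between any two members of the pool.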
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 01:44 PM, Dor Laor wrote:
>> So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary to say:
>>
>> (2.6.33) qemu -cpu Nehalem,-syscall
>> (2.6.18) qemu -cpu Nehalem
>
> Or let qemu do it automatically for you.

qemu on 2.6.33 doesn't know that you're running qemu on 2.6.18 on another node.

--
error compiling committee.c: too many arguments to function
[PATCH 0/3] Lazy fpu for svm/npt
This patchset (on top of the previous cr0 patchset) brings lazy fpu to npt. For the cases where guest and host cr0 match (the majority) it will disable intercepts for cr0.ts once the guest fpu is loaded, so the guest can do its own lazy fpu without trapping.

Avi Kivity (3):
  KVM: SVM: Fix SVM_CR0_SELECTIVE_MASK
  KVM: SVM: Initialize fpu_active in init_vmcb()
  KVM: SVM: Lazy fpu for npt

 arch/x86/include/asm/svm.h |    2 +-
 arch/x86/kvm/svm.c         |   73 +--
 2 files changed, 37 insertions(+), 38 deletions(-)
[PATCH 1/3] KVM: SVM: Fix SVM_CR0_SELECTIVE_MASK
Instead of selecting TS and MP as the comments say, the macro included TS and PE. Luckily the macro is unused now, but fix in order to save a few hours of debugging from anyone who attempts to use it.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/include/asm/svm.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 1fecb7e..38638cd 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -313,7 +313,7 @@ struct __attribute__ ((__packed__)) vmcb {
 
 #define SVM_EXIT_ERR		-1
 
-#define SVM_CR0_SELECTIVE_MASK (1 << 3 | 1) /* TS and MP */
+#define SVM_CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)
 
 #define SVM_VMLOAD ".byte 0x0f, 0x01, 0xda"
 #define SVM_VMRUN  ".byte 0x0f, 0x01, 0xd8"
--
1.6.5.3
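To see why the old literal was wrong, it helps to spell out the architectural CR0 bit positions: PE is bit 0, MP is bit 1 and TS is bit 3. The snippet below is a stand-alone user-space check, not kernel code; the OLD_/NEW_ macro names and the helper functions are made up for this illustration.

```c
#include <stdint.h>

/* CR0 bit positions as defined by the x86 architecture. */
#define X86_CR0_PE (1UL << 0)  /* Protection Enable */
#define X86_CR0_MP (1UL << 1)  /* Monitor Coprocessor */
#define X86_CR0_TS (1UL << 3)  /* Task Switched */

/* Hypothetical names: the literal before the patch and the value after it. */
#define OLD_SELECTIVE_MASK (1UL << 3 | 1UL)
#define NEW_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)

/* The old literal sets bits 3 and 0, i.e. TS and PE. */
unsigned long old_mask(void)
{
    return OLD_SELECTIVE_MASK;
}

/* The fixed macro sets bits 3 and 1, i.e. TS and MP, as the comment said. */
unsigned long new_mask(void)
{
    return NEW_SELECTIVE_MASK;
}

/* Nonzero only if the old literal had really selected TS and MP. */
int old_mask_matches_comment(void)
{
    return OLD_SELECTIVE_MASK == (X86_CR0_TS | X86_CR0_MP);
}
```

The old value is 0x9 and the new one is 0xa, so anyone using the unpatched macro would have been watching PE instead of MP.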
[PATCH 2/3] KVM: SVM: Initialize fpu_active in init_vmcb()
init_vmcb() sets up the intercepts as if the fpu is active, so initialize it there. This avoids an INIT from setting up intercepts inconsistent with fpu_active.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/svm.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2a3890f..f4418e2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -540,6 +540,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 	struct vmcb_control_area *control = &svm->vmcb->control;
 	struct vmcb_save_area *save = &svm->vmcb->save;
 
+	svm->vcpu.fpu_active = 1;
+
 	control->intercept_cr_read = 	INTERCEPT_CR0_MASK |
 					INTERCEPT_CR3_MASK |
 					INTERCEPT_CR4_MASK;
@@ -730,7 +732,6 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 	init_vmcb(svm);
 
 	fx_init(&svm->vcpu);
-	svm->vcpu.fpu_active = 1;
 	svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
 	if (kvm_vcpu_is_bsp(&svm->vcpu))
 		svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
--
1.6.5.3
[PATCH 3/3] KVM: SVM: Lazy fpu for npt
If two conditions apply:

 - no bits outside TS and EM differ between the host and guest cr0
 - the fpu is active

then we can activate the selective cr0 write intercept and drop the unconditional cr0 read and write intercept, and allow the guest to run with the host fpu state. This reduces the heavyweight context switch when npt is enabled.

Signed-off-by: Avi Kivity <a...@redhat.com>
---
 arch/x86/kvm/svm.c |   70 +--
 1 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f4418e2..7f3d890 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -571,6 +571,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 	control->intercept = 	(1ULL << INTERCEPT_INTR) |
 				(1ULL << INTERCEPT_NMI) |
 				(1ULL << INTERCEPT_SMI) |
+				(1ULL << INTERCEPT_SELECTIVE_CR0) |
 				(1ULL << INTERCEPT_CPUID) |
 				(1ULL << INTERCEPT_INVD) |
 				(1ULL << INTERCEPT_HLT) |
@@ -643,10 +644,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 		control->intercept &= ~((1ULL << INTERCEPT_TASK_SWITCH) |
 					(1ULL << INTERCEPT_INVLPG));
 		control->intercept_exceptions &= ~(1 << PF_VECTOR);
-		control->intercept_cr_read &= ~(INTERCEPT_CR0_MASK|
-						INTERCEPT_CR3_MASK);
-		control->intercept_cr_write &= ~(INTERCEPT_CR0_MASK|
-						 INTERCEPT_CR3_MASK);
+		control->intercept_cr_read &= ~INTERCEPT_CR3_MASK;
+		control->intercept_cr_write &= ~INTERCEPT_CR3_MASK;
 		save->g_pat = 0x0007040600070406ULL;
 		save->cr3 = 0;
 		save->cr4 = 0;
@@ -965,6 +964,27 @@ static void svm_decache_cr4_guest_bits(struct kvm_vcpu *vcpu)
 {
 }
 
+static void update_cr0_intercept(struct vcpu_svm *svm)
+{
+	ulong gcr0 = svm->vcpu.arch.cr0;
+	u64 *hcr0 = &svm->vmcb->save.cr0;
+
+	if (!svm->vcpu.fpu_active)
+		*hcr0 |= SVM_CR0_SELECTIVE_MASK;
+	else
+		*hcr0 = (*hcr0 & ~SVM_CR0_SELECTIVE_MASK)
+			| (gcr0 & SVM_CR0_SELECTIVE_MASK);
+
+
+	if (gcr0 == *hcr0 && svm->vcpu.fpu_active) {
+		svm->vmcb->control.intercept_cr_read &= ~INTERCEPT_CR0_MASK;
+		svm->vmcb->control.intercept_cr_write &= ~INTERCEPT_CR0_MASK;
+	} else {
+		svm->vmcb->control.intercept_cr_read |= INTERCEPT_CR0_MASK;
+		svm->vmcb->control.intercept_cr_write |= INTERCEPT_CR0_MASK;
+	}
+}
+
 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -982,12 +1002,11 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 		}
 	}
 #endif
-	if (npt_enabled)
-		goto set;
-
 	vcpu->arch.cr0 = cr0;
-	cr0 |= X86_CR0_PG | X86_CR0_WP;
-set:
+
+	if (!npt_enabled)
+		cr0 |= X86_CR0_PG | X86_CR0_WP;
+
 	/*
 	 * re-enable caching here because the QEMU bios
 	 * does not do it - this results in some delay at
@@ -995,6 +1014,7 @@ set:
 	 */
 	cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
 	svm->vmcb->save.cr0 = cr0;
+	update_cr0_intercept(svm);
 }
 
 static void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
@@ -1240,11 +1260,8 @@ static int ud_interception(struct vcpu_svm *svm)
 static int nm_interception(struct vcpu_svm *svm)
 {
 	svm->vmcb->control.intercept_exceptions &= ~(1 << NM_VECTOR);
-	if (!kvm_read_cr0_bits(&svm->vcpu, X86_CR0_TS))
-		svm->vmcb->save.cr0 &= ~X86_CR0_TS;
-	else
-		svm->vmcb->save.cr0 |= X86_CR0_TS;
 	svm->vcpu.fpu_active = 1;
+	update_cr0_intercept(svm);
 
 	return 1;
 }
@@ -2297,7 +2314,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR3]			= emulate_on_interception,
 	[SVM_EXIT_READ_CR4]			= emulate_on_interception,
 	[SVM_EXIT_READ_CR8]			= emulate_on_interception,
-	/* for now: */
+	[SVM_EXIT_CR0_SEL_WRITE]		= emulate_on_interception,
 	[SVM_EXIT_WRITE_CR0]			= emulate_on_interception,
 	[SVM_EXIT_WRITE_CR3]			= emulate_on_interception,
 	[SVM_EXIT_WRITE_CR4]			= emulate_on_interception,
@@ -2383,21 +2400,10 @@ static int handle_exit(struct kvm_vcpu *vcpu)
 	svm_complete_interrupts(svm);
 
-	if (npt_enabled) {
-		int mmu_reload = 0;
-		if ((kvm_read_cr0_bits(vcpu, X86_CR0_PG) ^ svm->vmcb->save.cr0)
-		    & X86_CR0_PG) {
-			svm_set_cr0(vcpu, svm->vmcb->save.cr0);
-			mmu_reload = 1;
-
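The decision made by update_cr0_intercept() in the patch above can be tried out in user space with a small model. This is a sketch under stated assumptions, not the kernel code: struct cr0_model and model_update_cr0_intercept() are hypothetical names, the single intercept_cr0 flag stands in for both the read and the write intercept bits, and the mask uses the TS|MP value from patch 1/3.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_MP (1ULL << 1)
#define X86_CR0_TS (1ULL << 3)
#define CR0_SELECTIVE_MASK (X86_CR0_TS | X86_CR0_MP)

/* Hypothetical stand-in for the vcpu and vmcb fields the patch touches. */
struct cr0_model {
    uint64_t guest_cr0;   /* cr0 as the guest believes it to be */
    uint64_t hw_cr0;      /* cr0 value loaded in the vmcb save area */
    bool fpu_active;      /* guest fpu state is currently loaded */
    bool intercept_cr0;   /* unconditional cr0 read/write intercept */
};

void model_update_cr0_intercept(struct cr0_model *m)
{
    if (!m->fpu_active)
        /* Force TS/MP so the first fpu use traps with #NM. */
        m->hw_cr0 |= CR0_SELECTIVE_MASK;
    else
        /* Let the guest's own TS/MP values through. */
        m->hw_cr0 = (m->hw_cr0 & ~CR0_SELECTIVE_MASK)
                  | (m->guest_cr0 & CR0_SELECTIVE_MASK);

    /*
     * The unconditional intercept can only be dropped when the hardware
     * cr0 is exactly what the guest expects and the fpu is loaded.
     */
    m->intercept_cr0 = !(m->guest_cr0 == m->hw_cr0 && m->fpu_active);
}
```

Running the model with fpu_active set and identical guest and hardware values clears the intercept; clearing fpu_active, or making any bit outside the selective mask differ, turns it back on.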
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 02:00 PM, Avi Kivity wrote:
> On 01/07/2010 01:44 PM, Dor Laor wrote:
>>> So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary to say:
>>>
>>> (2.6.33) qemu -cpu Nehalem,-syscall
>>> (2.6.18) qemu -cpu Nehalem
>>
>> Or let qemu do it automatically for you.
>
> qemu on 2.6.33 doesn't know that you're running qemu on 2.6.18 on another node.

We can live with it: either have qemu infer the kernel version from another existing feature, or query uname. Alternatively, the matching libvirt package can be the one adding or removing it in the right distribution.
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 06:20 AM, Dor Laor wrote:
> On 01/07/2010 02:00 PM, Avi Kivity wrote:
>> On 01/07/2010 01:44 PM, Dor Laor wrote:
>>>> So if you had a 2.6.18 kernel and a 2.6.33 kernel, it may be necessary to say:
>>>>
>>>> (2.6.33) qemu -cpu Nehalem,-syscall
>>>> (2.6.18) qemu -cpu Nehalem
>>>
>>> Or let qemu do it automatically for you.
>>
>> qemu on 2.6.33 doesn't know that you're running qemu on 2.6.18 on another node.
>
> We can live with it, either have qemu realize the kernel version out of another existing feature or query uname. Alternatively, the matching libvirt package can be the one adding or removing it in the right distribution.

There's another option. Make cpuid information part of live migration protocol, and then support something like -cpu Xeon-3550. We would remember the exact cpuid mask we present to the guest and then we could validate that we can obtain the same mask on the destination.

Regards,
Anthony Liguori
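The validation step proposed above amounts to a subset test: every feature bit that was presented to the guest on the source must also be obtainable on the destination. A minimal sketch follows, assuming the mask travels in the migration stream as a few 32-bit feature words; struct cpuid_mask, CPUID_WORDS and cpuid_mask_compatible() are invented for this example and are not part of qemu's migration format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the feature words shown to the guest. */
#define CPUID_WORDS 4

struct cpuid_mask {
    uint32_t word[CPUID_WORDS];
};

/*
 * Returns true when the destination can reproduce the guest's mask,
 * i.e. no bit set in 'guest' is missing from 'dest'. Extra features
 * on the destination are harmless because they would simply stay
 * masked off.
 */
bool cpuid_mask_compatible(const struct cpuid_mask *guest,
                           const struct cpuid_mask *dest)
{
    for (int i = 0; i < CPUID_WORDS; i++) {
        if (guest->word[i] & ~dest->word[i])
            return false;
    }
    return true;
}
```

On the destination, incoming migration would be rejected when this check fails, instead of letting the guest resume with features silently removed.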
Re: WinXP virtual crashes on 0.12.1.2 but not 0.12.1.1
On 01/07/2010 11:57 AM, Mark Cave-Ayland wrote:
> Avi Kivity wrote:
>>> I've just re-created the KVM image fresh from the VDI image once again and can confirm that disabling just the Processor driver is enough to allow the guest WinXP VM to function in qemu-kvm-0.12.1.2. Perhaps the default for -cpu host should not be changed in a micro release as there is a risk of breaking existing VMs?
>>
>> That was actually a fix for a regression relative to 0.11.
>
> Really? Damn :( Any pointers towards the relevant bug in the bug tracker?

No, it was reported on list.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] make help output be a little more self-consistent
Bruce, Can you please send two patches, one for qemu upstream (qemu-de...@nongnu.org), and another for qemu-kvm (relative to qemu-kvm specific options). Thanks. On Wed, Jan 06, 2010 at 12:31:20PM -0700, Bruce Rogers wrote: Signed-off-by: Bruce Rogers brog...@novell.com --- qemu-options.hx | 58 -- 1 files changed, 30 insertions(+), 28 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 812d067..fdd5884 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -42,7 +42,7 @@ DEF(smp, HAS_ARG, QEMU_OPTION_smp, -smp n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n set the number of CPUs to 'n' [default=1]\n maxcpus= maximum number of total cpus, including\n - offline CPUs for hotplug etc.\n +offline CPUs for hotplug, etc\n cores= number of CPU cores on one socket\n threads= number of threads on one CPU core\n sockets= number of discrete sockets in the system\n) @@ -406,8 +406,9 @@ ETEXI DEF(device, HAS_ARG, QEMU_OPTION_device, -device driver[,options] add device\n) DEF(name, HAS_ARG, QEMU_OPTION_name, --name string1[,process=string2]set the name of the guest\n -string1 sets the window title and string2 the process name (on Linux)\n) +-name string1[,process=string2]\n +set the name of the guest\n +string1 sets the window title and string2 the process name (on Linux)\n) STEXI @item -name @var{name} Sets the @var{name} of the guest. @@ -484,7 +485,7 @@ ETEXI -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH resend] Fix the explanation of write_emulated
On Wed, Jan 06, 2010 at 05:55:23PM +0900, Takuya Yoshikawa wrote:
> The explanation of write_emulated is confused with that of read_emulated. This patch fixes it.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.tak...@oss.ntt.co.jp>

Applied, thanks.
Re: [PATCH] configure: Correct KVM options in help output
On Wed, Jan 06, 2010 at 10:23:54AM +0100, Pierre Riteau wrote:
> Signed-off-by: Pierre Riteau <pierre.rit...@irisa.fr>
> ---
>  configure |    8
>  1 files changed, 4 insertions(+), 4 deletions(-)

Applied, thanks.
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 02:33 PM, Anthony Liguori wrote:
> There's another option. Make cpuid information part of live migration protocol, and then support something like -cpu Xeon-3550. We would remember the exact cpuid mask we present to the guest and then we could validate that we can obtain the same mask on the destination.

Currently, our policy is to only migrate dynamic (from the guest's point of view) state, and specify static state on the command line [1].

I think your suggestion makes a lot of sense, but I'd like to expand it to move all guest state, whether dynamic or static. So '-m 1G' would be migrated as well (but not -mem-path). Similarly, in -drive file=...,if=ide,index=1, everything but file=... would be migrated.

This has an advantage wrt hotplug: since qemu is responsible for migrating all guest visible information, the migrator is no longer responsible for replaying hotplug events in the exact sequence they happened.

In short, I think we should apply your suggestion as broadly as possible.

[1] cpuid state is actually dynamic; repeated cpuid instruction execution with the same operands can return different results. kvm supports querying and setting this state.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On Thu, Jan 07, 2010 at 02:40:34PM +0200, Avi Kivity wrote:
> On 01/07/2010 02:33 PM, Anthony Liguori wrote:
>> There's another option. Make cpuid information part of live migration protocol, and then support something like -cpu Xeon-3550. We would remember the exact cpuid mask we present to the guest and then we could validate that we can obtain the same mask on the destination.
>
> Currently, our policy is to only migrate dynamic (from the guest's point of view) state, and specify static state on the command line [1].
>
> I think your suggestion makes a lot of sense, but I'd like to expand it to move all guest state, whether dynamic or static. So '-m 1G' would be migrated as well (but not -mem-path). Similarly, in -drive file=...,if=ide,index=1, everything but file=... would be migrated.
>
> This has an advantage wrt hotplug: since qemu is responsible for migrating all guest visible information, the migrator is no longer responsible for replaying hotplug events in the exact sequence they happened.

With the introduction of the new -device support, there's no need to replay hotplug events in order any more. Instead just use static PCI addresses when starting the guest, and the same addresses after migration. You could argue that QEMU should preserve the addressing automatically during migration, but apps need to do it manually already to keep addresses stable across power-offs, so doing it manually across migration too is no extra burden.

Regards,
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 02:47 PM, Daniel P. Berrange wrote:
> With the introduction of the new -device support, there's no need to replay hotplug events in order any more. Instead just use static PCI addresses when starting the guest, and the same addresses after migration. You could argue that QEMU should preserve the addressing automatically during migration, but apps need to do it manually already to keep addresses stable across power-offs, so doing it manually across migration too is no extra burden.

That's true - shutdown and startup are an equivalent problem to live migration from that point of view.

--
error compiling committee.c: too many arguments to function
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 06:40 AM, Avi Kivity wrote:
> On 01/07/2010 02:33 PM, Anthony Liguori wrote:
>> There's another option. Make cpuid information part of live migration protocol, and then support something like -cpu Xeon-3550. We would remember the exact cpuid mask we present to the guest and then we could validate that we can obtain the same mask on the destination.
>
> Currently, our policy is to only migrate dynamic (from the guest's point of view) state, and specify static state on the command line [1].
>
> I think your suggestion makes a lot of sense, but I'd like to expand it to move all guest state, whether dynamic or static. So '-m 1G' would be migrated as well (but not -mem-path). Similarly, in -drive file=...,if=ide,index=1, everything but file=... would be migrated.

Yes, I agree with this and it should be in the form of an fdt. This means we need full qdev conversion.

But I think cpuid is somewhere in the middle with respect to static vs. dynamic. For instance, -cpu host is very dynamic in that you get very different results on different systems. Likewise, because of kvm filtering, even -cpu qemu64 can be dynamic.

So if we didn't have filtering and -cpu host, I'd agree that it's totally static but I think in the current state, it's dynamic.

> This has an advantage wrt hotplug: since qemu is responsible for migrating all guest visible information, the migrator is no longer responsible for replaying hotplug events in the exact sequence they happened.

Yup, 100% in agreement as a long term goal.

> In short, I think we should apply your suggestion as broadly as possible.
>
> [1] cpuid state is actually dynamic; repeated cpuid instruction execution with the same operands can return different results. kvm supports querying and setting this state.

Yes, and we save some cpuid state in cpu. We just don't save all of it.

Regards,
Anthony Liguori
Re: [Qemu-devel] cpuid problem in upstream qemu with kvm
On 01/07/2010 03:14 PM, Anthony Liguori wrote:
> On 01/07/2010 06:40 AM, Avi Kivity wrote:
>> On 01/07/2010 02:33 PM, Anthony Liguori wrote:
>>> There's another option. Make cpuid information part of live migration protocol, and then support something like -cpu Xeon-3550. We would remember the exact cpuid mask we present to the guest and then we could validate that we can obtain the same mask on the destination.

That takes care of controlling how qemu is run on the destination, but it does not help with the initial spawning of the original guest: we still need to know whether ,-syscall is needed or not. Anyway, I'm in favor of it too.

>> Currently, our policy is to only migrate dynamic (from the guest's point of view) state, and specify static state on the command line [1].
>>
>> I think your suggestion makes a lot of sense, but I'd like to expand it to move all guest state, whether dynamic or static. So '-m 1G' would be migrated as well (but not -mem-path). Similarly, in -drive file=...,if=ide,index=1, everything but file=... would be migrated.
>
> Yes, I agree with this and it should be in the form of an fdt. This means we need full qdev conversion.
>
> But I think cpuid is somewhere in the middle with respect to static vs. dynamic. For instance, -cpu host is very dynamic in that you get very different results on different systems. Likewise, because of kvm filtering, even -cpu qemu64 can be dynamic.
>
> So if we didn't have filtering and -cpu host, I'd agree that it's totally static but I think in the current state, it's dynamic.
>
>> This has an advantage wrt hotplug: since qemu is responsible for migrating all guest visible information, the migrator is no longer responsible for replaying hotplug events in the exact sequence they happened.
>
> Yup, 100% in agreement as a long term goal.
>
>> In short, I think we should apply your suggestion as broadly as possible.
>>
>> [1] cpuid state is actually dynamic; repeated cpuid instruction execution with the same operands can return different results. kvm supports querying and setting this state.
>
> Yes, and we save some cpuid state in cpu. We just don't save all of it.
>
> Regards,
> Anthony Liguori
may be offtopic question
Hello. I'm new to kvm and I'm trying to run Exherbo Linux under it. Kvm runs on 2.6.32 on Gentoo Linux. The virtual machine runs slowly, and sometimes I see this error in syslog:

[ 1929.705897] BUG: MAX_LOCK_DEPTH too low!
[ 1929.705902] turning off the locking correctness validator.
[ 1929.705906] Pid: 6523, comm: vm1 Not tainted 2.6.32-gentoo-r1vase #1
[ 1929.705909] Call Trace:
[ 1929.705918] [8109127e] __lock_acquire+0x8d/0x3f0
[ 1929.705924] [81092856] lock_acquire+0xd7/0xfc
[ 1929.705930] [810f46bf] ? mm_take_all_locks+0x92/0x105
[ 1929.705936] [816365fa] _spin_lock_nest_lock+0x40/0x75
[ 1929.705940] [810f46bf] ? mm_take_all_locks+0x92/0x105
[ 1929.705945] [810f46bf] mm_take_all_locks+0x92/0x105
[ 1929.705949] [810feda9] ? do_mmu_notifier_register+0x80/0x149
[ 1929.705954] [810fedb1] do_mmu_notifier_register+0x88/0x149
[ 1929.705958] [810fee8d] mmu_notifier_register+0xe/0x10
[ 1929.705964] [8100c44c] kvm_dev_ioctl+0x138/0x2f7
[ 1929.705969] [81143854] compat_sys_ioctl+0x1b5/0x40d
[ 1929.705975] [8163585f] ? lockdep_sys_exit_thunk+0x35/0x67
[ 1929.705980] [8126a678] ? __up_read+0x1c/0x8c
[ 1929.705987] [81056f7f] sysenter_dispatch+0x7/0x2e

qemu is run as:

qemu-system-x86_64 -boot c -m 512 -sdl -net vde,name=tap0,vlan=0 -net nic,vlan=0,macaddr=52:54:00:00:EE:03 -localtime -name vm1,process=vm1 -vga std -balloon virtio -enable-kvm -runas vase -enable-nesting -cdrom /media/kvm/install-x86-minimal-20091103.iso -hda ~vase/exherbo-kvm-amd64-20091013.img

--
Vasiliy G Tolstov v.tols...@selfip.ru
Selfip.Ru
[PATCH] kvm: don't treat NULL parent_pte as multimapped in mmu_parent_walk()
If a kvm_mmu_page is not multimapped but parent_pte is NULL don't treat it as multimapped and dereference it.

Signed-off-by: Roel Kluin <roel.kl...@gmail.com>
---
This wasn't tested and maybe I misunderstood so please review.

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4c3e5b2..eb17287 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1031,10 +1031,12 @@ static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	struct kvm_mmu_page *parent_sp;
 	int i;
 
-	if (!sp->multimapped && sp->parent_pte) {
-		parent_sp = page_header(__pa(sp->parent_pte));
-		fn(vcpu, parent_sp);
-		mmu_parent_walk(vcpu, parent_sp, fn);
+	if (!sp->multimapped) {
+		if (sp->parent_pte) {
+			parent_sp = page_header(__pa(sp->parent_pte));
+			fn(vcpu, parent_sp);
+			mmu_parent_walk(vcpu, parent_sp, fn);
+		}
 		return;
 	}
 	hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
Re: may be offtopic question
On 01/07/2010 05:00 PM, Vasiliy G Tolstov wrote:
> Hello. I'm new to kvm and I'm trying to run Exherbo Linux under it. Kvm runs on 2.6.32 on Gentoo Linux. The virtual machine runs slowly, and sometimes I see this error in syslog:
>
> [ 1929.705897] BUG: MAX_LOCK_DEPTH too low!
> [ 1929.705902] turning off the locking correctness validator.

You're running a debugging configuration. Turn off CONFIG_LOCKDEP and other debugging options if you want reasonable performance.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] kvm: don't treat NULL parent_pte as multimapped in mmu_parent_walk()
On 01/07/2010 05:56 PM, Roel Kluin wrote:
> If a kvm_mmu_page is not multimapped but parent_pte is NULL don't treat it as multimapped and dereference it.
>
> Signed-off-by: Roel Kluin <roel.kl...@gmail.com>
> ---
> This wasn't tested and maybe I misunderstood so please review.
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 4c3e5b2..eb17287 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1031,10 +1031,12 @@ static void mmu_parent_walk(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
>  	struct kvm_mmu_page *parent_sp;
>  	int i;
>
> -	if (!sp->multimapped && sp->parent_pte) {
> -		parent_sp = page_header(__pa(sp->parent_pte));
> -		fn(vcpu, parent_sp);
> -		mmu_parent_walk(vcpu, parent_sp, fn);
> +	if (!sp->multimapped) {
> +		if (sp->parent_pte) {
> +			parent_sp = page_header(__pa(sp->parent_pte));
> +			fn(vcpu, parent_sp);
> +			mmu_parent_walk(vcpu, parent_sp, fn);
> +		}
>  		return;
>  	}
>  	hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)

If sp->parent_pte is NULL then the list walk terminates immediately.

--
error compiling committee.c: too many arguments to function
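One way to read the reply above: if parent_pte and parent_ptes occupy the same storage (as a union in struct kvm_mmu_page, which is how I understand the sources of that time, but treat it as an assumption here), then a NULL parent_pte is also an empty list head, so falling through to the list walk visits nothing. The cut-down model below uses hypothetical type and function names and relies on both pointer types sharing the all-zero NULL representation, which holds on the platforms Linux supports.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the kernel's hlist types. */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

/* Hypothetical, cut-down model of struct kvm_mmu_page. */
struct mmu_page_model {
    int multimapped;
    union {
        uint64_t *parent_pte;            /* used when !multimapped */
        struct hlist_head parent_ptes;   /* used when multimapped */
    };
};

/* Counts the entries a walk over parent_ptes would visit. */
size_t count_list_parents(const struct mmu_page_model *sp)
{
    size_t n = 0;

    for (const struct hlist_node *p = sp->parent_ptes.first; p; p = p->next)
        n++;
    return n;
}
```

For a page with multimapped == 0 and parent_pte == NULL, the walk sees an empty list, which matches the observation that nothing is dereferenced in that case.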
Re: [PATCH 3/3] KVM: SVM: Lazy fpu for npt
On Thu, Jan 07, 2010 at 02:15:44PM +0200, Avi Kivity wrote:
> If two conditions apply:
>
>  - no bits outside TS and EM differ between the host and guest cr0
>  - the fpu is active
>
> then we can activate the selective cr0 write intercept and drop the unconditional cr0 read and write intercept, and allow the guest to run with the host fpu state. This reduces the heavyweight context switch when npt is enabled.

> -	if (npt_enabled) {
> -		int mmu_reload = 0;
> -		if ((kvm_read_cr0_bits(vcpu, X86_CR0_PG) ^ svm->vmcb->save.cr0)
> -		    & X86_CR0_PG) {
> -			svm_set_cr0(vcpu, svm->vmcb->save.cr0);
> -			mmu_reload = 1;
> -		}
> +	if (!(svm->vmcb->control.intercept_cr_write & INTERCEPT_CR0_MASK))
>  		vcpu->arch.cr0 = svm->vmcb->save.cr0;
> +	if (npt_enabled)
>  		vcpu->arch.cr3 = svm->vmcb->save.cr3;
> -		if (mmu_reload) {
> -			kvm_mmu_reset_context(vcpu);
> -			kvm_mmu_load(vcpu);
> -		}
> -	}
> -

Hmm, I think removing this hack is a separate issue. Should it be a separate patch which enables cr0 intercept for npt and removes these lines? It makes this change clearer in the logs.

Joerg
Re: network shutdown under heavy load
Hi guys, it happened again (on this server I had not applied the fix you sent), but I had left it that way so that if it happened I could test with tcpdump. It seems that the guest can receive packets but can't send: when I ran tcpdump I saw traffic coming in, but none going out. Hope this helps. I also need to know whether the patch you sent me will be in newer versions; if not, I'd like to know, since I can't update.

On 12/21/09 11:39 a.m., rek2 wrote:
> You say this version.. is there a newer version with this patch already applied to it? Thanks
>
> On 12/17/09 20:27 p.m., Herbert Xu wrote:
>> On Thu, Dec 17, 2009 at 01:15:46PM -0500, rek2 wrote:
>>> I've been told that today the network went down again and one of the guys here had to log in using the console and restart it for that particular guest. On the guest:
>>>
>>> uname -a
>>> Linux 2.6.27.25-170.2.72.fc10.x86_64 #1 SMP Sun Jun 21 18:39:34 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> Next time it goes down I will try to run a sniffer and try both sides.
>>
>> OK I'm fairly sure this version has a buggy virtio-net. Does this patch (if it applies :) help?
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 9eec5a5..74b3854 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -521,8 +521,10 @@ static void xmit_tasklet(unsigned long data)
>>  		vi->svq->vq_ops->kick(vi->svq);
>>  		vi->last_xmit_skb = NULL;
>>  	}
>> -	if (vi->free_in_tasklet)
>> +	if (vi->free_in_tasklet) {
>>  		free_old_xmit_skbs(vi);
>> +		netif_wake_queue(vi->dev);
>> +	}
>>  	netif_tx_unlock_bh(vi->dev);
>>  }
>>
>> Cheers,
[PATCH 2/2] make help output be a little more self-consistent
This is the part which applies to qemu-kvm. Signed-off-by: Bruce Rogers brog...@novell.com --- qemu-options.hx | 19 ++- 1 files changed, 10 insertions(+), 9 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index 788d849..fdd5884 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -1938,7 +1938,7 @@ DEF(readconfig, HAS_ARG, QEMU_OPTION_readconfig, -readconfig file\n) DEF(writeconfig, HAS_ARG, QEMU_OPTION_writeconfig, -writeconfig file\n -read/write config file) +read/write config file\n) DEF(no-kvm, 0, QEMU_OPTION_no_kvm, -no-kvm disable KVM hardware virtualization\n) @@ -1947,26 +1947,27 @@ DEF(no-kvm-irqchip, 0, QEMU_OPTION_no_kvm_irqchip, DEF(no-kvm-pit, 0, QEMU_OPTION_no_kvm_pit, -no-kvm-pit disable KVM kernel mode PIT\n) DEF(no-kvm-pit-reinjection, 0, QEMU_OPTION_no_kvm_pit_reinjection, --no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt reinjection\n) +-no-kvm-pit-reinjection\n +disable KVM kernel mode PIT interrupt reinjection\n) #if defined(TARGET_I386) || defined(TARGET_X86_64) || defined(TARGET_IA64) || defined(__linux__) DEF(pcidevice, HAS_ARG, QEMU_OPTION_pcidevice, -pcidevice host=bus:dev.func[,dma=none][,name=string]\n -expose a PCI device to the guest OS.\n +expose a PCI device to the guest OS\n dma=none: don't perform any dma translations (default is to use an iommu)\n -'string' is used in log output.\n) +'string' is used in log output\n) #endif DEF(enable-nesting, 0, QEMU_OPTION_enable_nesting, -enable-nesting enable support for running a VM inside the VM (AMD only)\n) DEF(nvram, HAS_ARG, QEMU_OPTION_nvram, --nvram FILE provide ia64 nvram contents\n) +-nvram FILE provide ia64 nvram contents\n) DEF(tdf, 0, QEMU_OPTION_tdf, --tdf enable guest time drift compensation\n) +-tdfenable guest time drift compensation\n) DEF(kvm-shadow-memory, HAS_ARG, QEMU_OPTION_kvm_shadow_memory, -kvm-shadow-memory MEGABYTES\n - allocate MEGABYTES for kvm mmu shadowing\n) +allocate MEGABYTES for kvm mmu shadowing\n) DEF(mem-path, HAS_ARG, 
QEMU_OPTION_mempath, --mem-path FILE provide backing storage for guest RAM\n) +-mem-path FILE provide backing storage for guest RAM\n) #ifdef MAP_POPULATE DEF(mem-prealloc, 0, QEMU_OPTION_mem_prealloc, --mem-preallocpreallocate guest memory (use with -mempath)\n) +-mem-prealloc preallocate guest memory (use with -mempath)\n) #endif -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] [RESEND] make help output be a little more self-consistent
This is the part which applies to the base qemu. btw: it was sent to qemu-de...@nongnu.org yesterday.) Signed-off-by: Bruce Rogers --- qemu-options.hx | 39 --- 1 files changed, 20 insertions(+), 19 deletions(-) diff --git a/qemu-options.hx b/qemu-options.hx index ecd50eb..20b696d 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -42,7 +42,7 @@ DEF(smp, HAS_ARG, QEMU_OPTION_smp, -smp n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n set the number of CPUs to 'n' [default=1]\n maxcpus= maximum number of total cpus, including\n - offline CPUs for hotplug etc.\n +offline CPUs for hotplug, etc\n cores= number of CPU cores on one socket\n threads= number of threads on one CPU core\n sockets= number of discrete sockets in the system\n) @@ -405,8 +405,9 @@ ETEXI DEF(device, HAS_ARG, QEMU_OPTION_device, -device driver[,options] add device\n) DEF(name, HAS_ARG, QEMU_OPTION_name, --name string1[,process=string2]set the name of the guest\n -string1 sets the window title and string2 the process name (on Linux)\n) +-name string1[,process=string2]\n +set the name of the guest\n +string1 sets the window title and string2 the process name (on Linux)\n) STEXI @item -name @var{name} Sets the @var{name} of the guest. 
@@ -483,7 +484,7 @@ ETEXI #ifdef CONFIG_SDL DEF(ctrl-grab, 0, QEMU_OPTION_ctrl_grab, --ctrl-grab use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n) +-ctrl-grab use Right-Ctrl to grab mouse (instead of Ctrl-Alt)\n) #endif STEXI @item -ctrl-grab @@ -756,12 +757,12 @@ ETEXI #ifdef TARGET_I386 DEF(smbios, HAS_ARG, QEMU_OPTION_smbios, -smbios file=binary\n -Load SMBIOS entry from binary file\n +load SMBIOS entry from binary file\n -smbios type=0[,vendor=str][,version=str][,date=str][,release=%%d.%%d]\n -Specify SMBIOS type 0 fields\n +specify SMBIOS type 0 fields\n -smbios type=1[,manufacturer=str][,product=str][,version=str][,serial=str]\n [,uuid=uuid][,sku=str][,family=str]\n -Specify SMBIOS type 1 fields\n) +specify SMBIOS type 1 fields\n) #endif STEXI @item -smbios fi...@var{binary} @@ -816,13 +817,13 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, -net tap[,vlan=n][,name=str][,fd=h][,ifname=name][,script=file][,downscript=dfile][,sndbuf=nbytes][,vnet_hdr=on|off]\n connect the host TAP network interface to VLAN 'n' and use the\n network scripts 'file' (default=%s)\n -and 'dfile' (default=%s);\n -use '[down]script=no' to disable script execution;\n +and 'dfile' (default=%s)\n +use '[down]script=no' to disable script execution\n use 'fd=h' to connect to an already opened TAP interface\n -use 'sndbuf=nbytes' to limit the size of the send buffer; the\n -default of 'sndbuf=1048576' can be disabled using 'sndbuf=0'\n -use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag; use\n -vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n +use 'sndbuf=nbytes' to limit the size of the send buffer (the\n +default of 'sndbuf=1048576' can be disabled using 'sndbuf=0')\n +use vnet_hdr=off to avoid enabling the IFF_VNET_HDR tap flag\n +use vnet_hdr=on to make the lack of IFF_VNET_HDR support an error condition\n #endif -net socket[,vlan=n][,name=str][,fd=h][,listen=[host]:port][,connect=host:port]\n connect the vlan 'n' to another VLAN using a socket connection\n 
@@ -837,7 +838,7 @@ DEF(net, HAS_ARG, QEMU_OPTION_net, #endif -net dump[,vlan=n][,file=f][,len=n]\n dump traffic on vlan 'n' to file 'f' (max n bytes per packet)\n --net none use it alone to have zero network devices; if no -net option\n +-net none use it alone to have zero network devices. If no -net option\n is provided, the default is '-net nic -net user'\n) DEF(netdev, HAS_ARG, QEMU_OPTION_netdev, -netdev [ @@ -1589,7 +1590,7 @@ The default device is @code{vc} in graphical mode and @code{stdio} in non graphical mode. ETEXI DEF(qmp, HAS_ARG, QEMU_OPTION_qmp, \ --qmp devlike -monitor but opens in 'control' mode.\n) +-qmp devlike -monitor but opens in 'control' mode\n) DEF(mon, HAS_ARG, QEMU_OPTION_mon, \ -mon chardev=[name][,mode=readline|control][,default]\n) @@ -1607,7 +1608,7 @@ from a script. ETEXI DEF(singlestep,
Re: [PATCH 0/2] eventfd: new EFD_STATE flag
On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:

Sure, I was trying to be as brief as possible, here's a detailed summary.

Description of the system (MSI emulation in KVM):

KVM supports an ioctl to assign/deassign an eventfd file to interrupt message in guest OS. When this eventfd is signalled, interrupt message is sent. This assignment is done from qemu system emulator.

eventfd is signalled from device emulation in another thread in userspace or from kernel, which talks with guest OS through another eventfd and shared memory (possibility of out of process was discussed but never got implemented yet).

Note: it's okay to delay messages from correctness point of view, but generally this is latency-sensitive path. If multiple identical messages are requested, it's okay to send a single last message, but missing a message altogether causes deadlocks. Sending a message when none were requested might in theory cause crashes; in practice doing this causes performance degradation.

Another KVM feature is interrupt masking: guest OS requests that we stop sending some interrupt message, possibly modifies the mapping, and re-enables this message. This needs to be done without involving the device that might keep requesting events: while masked, message is marked pending, and guest might test the pending status.

We can implement masking in system emulator in userspace, by using assign/deassign ioctls: when message is masked, we simply deassign all eventfds, and when it is unmasked, we assign them back.

Here's some code to illustrate how this all works: assign/deassign code in kernel looks like the following:

this is called to unmask interrupt

static int
kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
{
	struct _irqfd *irqfd, *tmp;
	struct file *file = NULL;
	struct eventfd_ctx *eventfd = NULL;
	int ret;
	unsigned int events;

	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);

	...

	file = eventfd_fget(fd);
	if (IS_ERR(file)) {
		ret = PTR_ERR(file);
		goto fail;
	}

	eventfd = eventfd_ctx_fileget(file);
	if (IS_ERR(eventfd)) {
		ret = PTR_ERR(eventfd);
		goto fail;
	}

	irqfd->eventfd = eventfd;

	/*
	 * Install our own custom wake-up handling so we are notified via
	 * a callback whenever someone signals the underlying eventfd
	 */
	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);

	spin_lock_irq(&kvm->irqfds.lock);

	events = file->f_op->poll(file, &irqfd->pt);

	list_add_tail(&irqfd->list, &kvm->irqfds.items);
	spin_unlock_irq(&kvm->irqfds.lock);

A.
	/*
	 * Check if there was an event already pending on the eventfd
	 * before we registered, and trigger it as if we didn't miss it.
	 */
	if (events & POLLIN)
		schedule_work(&irqfd->inject);

	/*
	 * do not drop the file until the irqfd is fully initialized, otherwise
	 * we might race against the POLLHUP
	 */
	fput(file);

	return 0;

fail:
	...
}

This is called to mask interrupt

/*
 * shutdown any irqfd's that match fd+gsi
 */
static int
kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
{
	struct _irqfd *irqfd, *tmp;
	struct eventfd_ctx *eventfd;

	eventfd = eventfd_ctx_fdget(fd);
	if (IS_ERR(eventfd))
		return PTR_ERR(eventfd);

	spin_lock_irq(&kvm->irqfds.lock);

	list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
		if (irqfd->eventfd == eventfd && irqfd->gsi == gsi)
			irqfd_deactivate(irqfd);
	}

	spin_unlock_irq(&kvm->irqfds.lock);
	eventfd_ctx_put(eventfd);

	/*
	 * Block until we know all outstanding shutdown jobs have completed
	 * so that we guarantee there will not be any more interrupts on this
	 * gsi once this deassign function returns.
	 */
	flush_workqueue(irqfd_cleanup_wq);

	return 0;
}

And deactivation deep down does this (from irqfd_cleanup_wq workqueue, so this is not under the spinlock):

	/*
	 * Synchronize with the wait-queue and unhook ourselves to
	 * prevent further events.
	 */
B.
	remove_wait_queue(irqfd->wqh, &irqfd->wait);

	/*
	 * It is now safe to release the object's resources
	 */
	eventfd_ctx_put(irqfd->eventfd);
	kfree(irqfd);

The problems (really the same bug) in KVM that I am trying to fix:

1. Because of A above, if event was requested while message was masked, we will not miss a message. However, because we never clear the counter, we currently get a spurious message each time we unmask. We should clear the counter either each time we deliver message,
Re: [PATCH 0/2] eventfd: new EFD_STATE flag
On Thu, 7 Jan 2010, Davide Libenzi wrote:

When you unmask, shouldn't the fd be in a clear state anyway?

Forget that. If it's used for IRQs you likely need to have a current state when you unmask.

- Davide
Re: [PATCH 0/2] eventfd: new EFD_STATE flag
On Thu, 7 Jan 2010, Michael S. Tsirkin wrote:

Sure, I was trying to be as brief as possible, here's a detailed summary.

Description of the system (MSI emulation in KVM):

KVM supports an ioctl to assign/deassign an eventfd file to interrupt message in guest OS. When this eventfd is signalled, interrupt message is sent. This assignment is done from qemu system emulator.

eventfd is signalled from device emulation in another thread in userspace or from kernel, which talks with guest OS through another eventfd and shared memory (possibility of out of process was discussed but never got implemented yet).

Note: it's okay to delay messages from correctness point of view, but generally this is latency-sensitive path. If multiple identical messages are requested, it's okay to send a single last message, but missing a message altogether causes deadlocks. Sending a message when none were requested might in theory cause crashes; in practice doing this causes performance degradation.

Another KVM feature is interrupt masking: guest OS requests that we stop sending some interrupt message, possibly modifies the mapping, and re-enables this message. This needs to be done without involving the device that might keep requesting events: while masked, message is marked pending, and guest might test the pending status.

We can implement masking in system emulator in userspace, by using assign/deassign ioctls: when message is masked, we simply deassign all eventfds, and when it is unmasked, we assign them back.

Here's some code to illustrate how this all works: assign/deassign code in kernel looks like the following:

this is called to unmask interrupt

static int
kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
{
	struct _irqfd *irqfd, *tmp;
	struct file *file = NULL;
	struct eventfd_ctx *eventfd = NULL;
	int ret;
	unsigned int events;

	irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);

	...

	file = eventfd_fget(fd);
	if (IS_ERR(file)) {
		ret = PTR_ERR(file);
		goto fail;
	}

	eventfd = eventfd_ctx_fileget(file);
	if (IS_ERR(eventfd)) {
		ret = PTR_ERR(eventfd);
		goto fail;
	}

	irqfd->eventfd = eventfd;

	/*
	 * Install our own custom wake-up handling so we are notified via
	 * a callback whenever someone signals the underlying eventfd
	 */
	init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
	init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);

	spin_lock_irq(&kvm->irqfds.lock);

	events = file->f_op->poll(file, &irqfd->pt);

	list_add_tail(&irqfd->list, &kvm->irqfds.items);
	spin_unlock_irq(&kvm->irqfds.lock);

A.
	/*
	 * Check if there was an event already pending on the eventfd
	 * before we registered, and trigger it as if we didn't miss it.
	 */
	if (events & POLLIN)
		schedule_work(&irqfd->inject);

	/*
	 * do not drop the file until the irqfd is fully initialized, otherwise
	 * we might race against the POLLHUP
	 */
	fput(file);

	return 0;

fail:
	...
}

What if you do (under proper irqfd locking) something like:

	eventfd_ctx_read(ctx, 1, &cnt);
	if (irqfd->cnt != cnt) {
		irqfd->cnt = cnt;
		schedule_work(&irqfd->inject);
	}

And deactivation deep down does this (from irqfd_cleanup_wq workqueue, so this is not under the spinlock):

	/*
	 * Synchronize with the wait-queue and unhook ourselves to
	 * prevent further events.
	 */
B.
	remove_wait_queue(irqfd->wqh, &irqfd->wait);

	/*
	 * It is now safe to release the object's resources
	 */
	eventfd_ctx_put(irqfd->eventfd);
	kfree(irqfd);

And:

	eventfd_ctx_read(ctx, 1, &irqfd->cnt);
	remove_wait_queue(irqfd->wqh, &irqfd->wait);

- Davide
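The counter behaviour this exchange turns on can be observed from userspace. The following is only a minimal sketch, assuming Linux with eventfd(2) and a descriptor created with EFD_NONBLOCK; the helper names efd_pending, efd_signal and efd_drain are made up for illustration and are not part of KVM or of the proposed patch. It shows that poll keeps reporting POLLIN for as long as the counter is non-zero, which is why step A fires on every re-assign unless something reads the counter first.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the eventfd counter is non-zero, else 0. */
static int efd_pending(int fd)
{
	struct pollfd p = { .fd = fd, .events = POLLIN };

	/* Zero timeout: just sample the current state, never block. */
	return poll(&p, 1, 0) == 1 && (p.revents & POLLIN);
}

/* Hypothetical helper: signal once, i.e. add 1 to the counter. */
static int efd_signal(int fd)
{
	uint64_t one = 1;

	return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: read the counter, which also resets it to zero.
 * Returns the value read, or 0 if nothing was pending (the read fails
 * with EAGAIN on a non-blocking descriptor in that case).
 */
static uint64_t efd_drain(int fd)
{
	uint64_t cnt = 0;

	if (read(fd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
		return 0;
	return cnt;
}
```

Polling any number of times leaves the counter alone; only the read clears it. The eventfd_ctx_read() calls suggested above play the role of efd_drain on the kernel side.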
Re: pci-stub error and MSI-X for KVM guest
* Fischer, Anna (anna.fisc...@hp.com) wrote:

So, when setting a breakpoint for the exit() call I'm getting a bit closer to figuring where it kills my guest.

Thanks, this helps clarify what is happening.

Breakpoint 1, exit (status=1) at exit.c:99
99	{
Current language: auto
The current source language is auto; currently c.
(gdb) bt
#0 exit (status=1) at exit.c:99
#1 0x00470c6e in assigned_dev_pci_read_config (d=0x259c6f0, address=64, len=4)

assigned_dev_pci_read_config(..., 64, 4)
                                  ^^

This is a libvirt issue. When you use virt-manager it has libvirtd fork/exec qemu-kvm. libvirtd will drop privileges and run qemu-kvm as user qemu (or perhaps root if you've edited qemu.conf). Regardless of the user, it clears capabilities. Reading PCI config space beyond just the header requires CAP_SYS_ADMIN. The above is reading the first 4 bytes of device dependent config space, and the kernel is returning 0 because qemu doesn't have CAP_SYS_ADMIN.

Basically, this means that device assignment w/ libvirt will break MSI/MSI-X because qemu will never be able to see that the host device has those PCI capabilities. This, in turn, renders VF device assignment useless (since a VF is required to support MSI and/or MSI-X).

Granting CAP_SYS_ADMIN for each qemu instance that does device assignment would render the privilege reduction useless (CAP_SYS_ADMIN is the kitchen sink catchall of the Linux capability system).

Hmmph...
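To see why losing access to the bytes past the 64-byte header hides MSI and MSI-X, it may help to look at how a capability list is walked. This is a rough sketch over an in-memory copy of config space, not qemu's or the kernel's code: pci_find_cap is a hypothetical name, and the register offsets and capability IDs are the standard PCI ones, redefined locally so the fragment stands alone.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS		0x06	/* status register, low byte */
#define PCI_STATUS_CAP_LIST	0x10	/* device has a capability list */
#define PCI_CAPABILITY_LIST	0x34	/* pointer to the first capability */
#define PCI_CAP_ID_MSI		0x05
#define PCI_CAP_ID_MSIX		0x11

/*
 * Hypothetical helper: walk the capability list in a config space image
 * of `len` readable bytes. Returns the offset of the capability with ID
 * `cap_id`, or 0 if it is absent or lies beyond what could be read.
 */
static uint8_t pci_find_cap(const uint8_t *cfg, size_t len, uint8_t cap_id)
{
	uint8_t pos;
	int ttl = 48;	/* guard against malformed, looping lists */

	if (len < 64 || !(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;

	/* Capabilities live in the device dependent region, at 0x40 and up. */
	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && ttl--) {
		if ((size_t)pos + 1 >= len)
			return 0;	/* not readable: capability is invisible */
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xfc;
	}
	return 0;
}
```

The first capability sits at offset 64 (0x40) at the earliest, which is exactly the address in the backtrace. If that read comes back short or as zeroes, the walk ends immediately and neither PCI_CAP_ID_MSI nor PCI_CAP_ID_MSIX is ever found.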
[PATCH 9/9] KVM: PPC: Pass program interrupt flags to the guest
When we need to reinject a program interrupt into the guest, we also need to reinject the corresponding flags into the guest. Signed-off-by: Alexander Graf ag...@suse.de Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org --- arch/powerpc/kvm/book3s.c |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 66b5924..02861fd 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -633,6 +633,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, case BOOK3S_INTERRUPT_PROGRAM: { enum emulation_result er; + ulong flags; + + flags = (vcpu-arch.shadow_msr 0x1full); if (vcpu-arch.msr MSR_PR) { #ifdef EXIT_DEBUG @@ -640,7 +643,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, #endif if ((vcpu-arch.last_inst 0xff0007ff) != (INS_DCBZ 0xfff7)) { - kvmppc_book3s_queue_irqprio(vcpu, exit_nr); + kvmppc_core_queue_program(vcpu, flags); r = RESUME_GUEST; break; } @@ -655,7 +658,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, case EMULATE_FAIL: printk(KERN_CRIT %s: emulation at %lx failed (%08x)\n, __func__, vcpu-arch.pc, vcpu-arch.last_inst); - kvmppc_book3s_queue_irqprio(vcpu, exit_nr); + kvmppc_core_queue_program(vcpu, flags); r = RESUME_GUEST; break; default: -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/9] KVM: PPC: Add helpers for CR, XER
We now have helpers for the GPRs, so let's also add some for CR and XER. Having them in the PACA simplifies code a lot, as we don't need to care about where to store CC or not to overflow any integers. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_ppc.h | 40 arch/powerpc/kvm/44x_tlb.c |6 +++- arch/powerpc/kvm/book3s.c |8 +++--- arch/powerpc/kvm/booke.c |8 +++--- 4 files changed, 52 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index ba01b9c..d60b2f0 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -108,6 +108,26 @@ static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num) return vcpu-arch.gpr[num]; } +static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val) +{ + vcpu-arch.cr = val; +} + +static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.cr; +} + +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +{ + vcpu-arch.xer = val; +} + +static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.xer; +} + #else static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val) @@ -120,6 +140,26 @@ static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num) return vcpu-arch.gpr[num]; } +static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val) +{ + vcpu-arch.cr = val; +} + +static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.cr; +} + +static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val) +{ + vcpu-arch.xer = val; +} + +static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.xer; +} + #endif #endif /* __POWERPC_KVM_PPC_H__ */ diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c index 8b37736..2570fcc 100644 --- a/arch/powerpc/kvm/44x_tlb.c +++ b/arch/powerpc/kvm/44x_tlb.c @@ -506,10 +506,12 @@ int kvmppc_44x_emul_tlbsx(struct kvm_vcpu *vcpu, u8 rt, u8 ra, u8 rb, 
u8 rc) gtlb_index = kvmppc_44x_tlb_index(vcpu, ea, pid, as); if (rc) { + u32 cr = kvmppc_get_cr(vcpu); + if (gtlb_index 0) - vcpu-arch.cr = ~0x2000; + kvmppc_set_cr(vcpu, cr ~0x2000); else - vcpu-arch.cr |= 0x2000; + kvmppc_set_cr(vcpu, cr | 0x2000); } kvmppc_set_gpr(vcpu, rt, gtlb_index); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 574b24f..09ba8db 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -717,10 +717,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) int i; regs-pc = vcpu-arch.pc; - regs-cr = vcpu-arch.cr; + regs-cr = kvmppc_get_cr(vcpu); regs-ctr = vcpu-arch.ctr; regs-lr = vcpu-arch.lr; - regs-xer = vcpu-arch.xer; + regs-xer = kvmppc_get_xer(vcpu); regs-msr = vcpu-arch.msr; regs-srr0 = vcpu-arch.srr0; regs-srr1 = vcpu-arch.srr1; @@ -744,10 +744,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) int i; vcpu-arch.pc = regs-pc; - vcpu-arch.cr = regs-cr; + kvmppc_set_cr(vcpu, regs-cr); vcpu-arch.ctr = regs-ctr; vcpu-arch.lr = regs-lr; - vcpu-arch.xer = regs-xer; + kvmppc_set_xer(vcpu, regs-xer); kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.srr0 = regs-srr0; vcpu-arch.srr1 = regs-srr1; diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 49af80e..338baf9 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -449,10 +449,10 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) int i; regs-pc = vcpu-arch.pc; - regs-cr = vcpu-arch.cr; + regs-cr = kvmppc_get_cr(vcpu); regs-ctr = vcpu-arch.ctr; regs-lr = vcpu-arch.lr; - regs-xer = vcpu-arch.xer; + regs-xer = kvmppc_get_xer(vcpu); regs-msr = vcpu-arch.msr; regs-srr0 = vcpu-arch.srr0; regs-srr1 = vcpu-arch.srr1; @@ -476,10 +476,10 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) int i; vcpu-arch.pc = regs-pc; - vcpu-arch.cr = regs-cr; + kvmppc_set_cr(vcpu, regs-cr); vcpu-arch.ctr = regs-ctr; vcpu-arch.lr 
= regs-lr; - vcpu-arch.xer = regs-xer; + kvmppc_set_xer(vcpu, regs-xer); kvmppc_set_msr(vcpu, regs-msr); vcpu-arch.srr0 = regs-srr0; vcpu-arch.srr1 = regs-srr1; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo
[PATCH 0/9] KVM: PPC: Reduce races, fix code
We've been a bit lax with how we use fields in the PACA so far. Most of the time we just overwrote random fields that another interrupt handler would have used as well. That is racy.

We also jumped over to real mode from IR=1 using RFI. Unfortunately, we need 3 operations to do that transition, which need to be fully atomic, as any interrupt coming in between those instructions can possibly break us. That is racy too.

So let's get rid of all the racy code and clean up some pieces along the way.

Alexander Graf (9):
  KVM: PPC: Use accessor functions for GPR access
  KVM: PPC: Add helpers for CR, XER
  KVM: PPC: Use PACA backed shadow vcpu
  KVM: PPC: Implement 'skip instruction' mode
  KVM: PPC: Get rid of unnecessary RFI
  KVM: PPC: Call SLB patching code in interrupt safe manner
  KVM: PPC: Emulate trap SRR1 flags properly
  KVM: PPC: Fix HID5 setting code
  KVM: PPC: Pass program interrupt flags to the guest

 arch/powerpc/include/asm/kvm_asm.h | 6 +
 arch/powerpc/include/asm/kvm_book3s.h | 4 +
 arch/powerpc/include/asm/kvm_book3s_64_asm.h | 18 ++
 arch/powerpc/include/asm/kvm_host.h | 6 +-
 arch/powerpc/include/asm/kvm_ppc.h | 76 -
 arch/powerpc/include/asm/paca.h | 5 +
 arch/powerpc/include/asm/reg.h | 4 +
 arch/powerpc/kernel/asm-offsets.c | 34 -
 arch/powerpc/kvm/44x_emulate.c | 25 ++--
 arch/powerpc/kvm/44x_tlb.c | 20 ++-
 arch/powerpc/kvm/book3s.c | 35 +++--
 arch/powerpc/kvm/book3s_64_emulate.c | 77 +
 arch/powerpc/kvm/book3s_64_exports.c | 1 +
 arch/powerpc/kvm/book3s_64_interrupts.S | 242 +-
 arch/powerpc/kvm/book3s_64_rmhandlers.S | 85 +++--
 arch/powerpc/kvm/book3s_64_slb.S | 158 +++---
 arch/powerpc/kvm/booke.c | 27 ++--
 arch/powerpc/kvm/booke_emulate.c | 107 ++--
 arch/powerpc/kvm/e500_emulate.c | 95 ++-
 arch/powerpc/kvm/e500_tlb.c | 4 +-
 arch/powerpc/kvm/emulate.c | 112 +++--
 arch/powerpc/kvm/powerpc.c | 21 ++-
 22 files changed, 672 insertions(+), 490 deletions(-)
[PATCH 4/9] KVM: PPC: Implement 'skip instruction' mode
To fetch the last instruction we were interrupted on, we enable DR in early exit code, where we are still in a very transitional phase between guest and host state. Most of the time this seemed to work, but another CPU can easily flush our TLB and HTAB which makes us go in the Linux page fault handler which totally breaks because we still use the guest's SLB entries. To work around that, let's introduce a second KVM guest mode that defines that whenever we get a trap, we don't call the Linux handler or go into the KVM exit code, but just jump over the faulting instruction. That way a potentially bad lwz doesn't trigger any faults and we can later on interpret the invalid instruction we fetched as fetch didn't work. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_asm.h |6 arch/powerpc/kvm/book3s_64_rmhandlers.S | 39 ++- arch/powerpc/kvm/book3s_64_slb.S| 16 arch/powerpc/kvm/emulate.c |4 +++ 4 files changed, 59 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index af2abe7..aadf2dd 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -97,4 +97,10 @@ #define RESUME_HOST RESUME_FLAG_HOST #define RESUME_HOST_NV (RESUME_FLAG_HOST|RESUME_FLAG_NV) +#define KVM_GUEST_MODE_NONE0 +#define KVM_GUEST_MODE_GUEST 1 +#define KVM_GUEST_MODE_SKIP2 + +#define KVM_INST_FETCH_FAILED -1 + #endif /* __POWERPC_KVM_ASM_H__ */ diff --git a/arch/powerpc/kvm/book3s_64_rmhandlers.S b/arch/powerpc/kvm/book3s_64_rmhandlers.S index cd9f0b6..9ad1c26 100644 --- a/arch/powerpc/kvm/book3s_64_rmhandlers.S +++ b/arch/powerpc/kvm/book3s_64_rmhandlers.S @@ -49,7 +49,7 @@ kvmppc_trampoline_\intno: mfcrr12 stw r12, PACA_KVM_SCRATCH1(r13) lbz r12, PACA_KVM_IN_GUEST(r13) - cmpwi r12, 0 + cmpwi r12, KVM_GUEST_MODE_NONE bne ..kvmppc_handler_hasmagic_\intno /* No KVM guest? Then jump back to the Linux handler! 
*/ lwz r12, PACA_KVM_SCRATCH1(r13) @@ -60,6 +60,11 @@ kvmppc_trampoline_\intno: /* Now we know we're handling a KVM guest */ ..kvmppc_handler_hasmagic_\intno: + + /* Should we just skip the faulting instruction? */ + cmpwi r12, KVM_GUEST_MODE_SKIP + beq kvmppc_handler_skip_ins + /* Let's store which interrupt we're handling */ li r12, \intno @@ -86,6 +91,38 @@ INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_ALTIVEC INTERRUPT_TRAMPOLINE BOOK3S_INTERRUPT_VSX /* + * Bring us back to the faulting code, but skip the + * faulting instruction. + * + * This is a generic exit path from the interrupt + * trampolines above. + * + * Input Registers: + * + * R12 = free + * R13 = PACA + * PACA.KVM.SCRATCH0 = guest R12 + * PACA.KVM.SCRATCH1 = guest CR + * SPRG_SCRATCH0 = guest R13 + * + */ +kvmppc_handler_skip_ins: + + /* Patch the IP to the next instruction */ + mfsrr0 r12 + addir12, r12, 4 + mtsrr0 r12 + + /* Clean up all state */ + lwz r12, PACA_KVM_SCRATCH1(r13) + mtcrr12 + ld r12, PACA_KVM_SCRATCH0(r13) + mfspr r13, SPRN_SPRG_SCRATCH0 + + /* And get back into the code */ + RFI + +/* * This trampoline brings us back to a real mode handler * * Input Registers: diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S index 7188c11..d07b886 100644 --- a/arch/powerpc/kvm/book3s_64_slb.S +++ b/arch/powerpc/kvm/book3s_64_slb.S @@ -212,10 +212,6 @@ kvmppc_handler_trampoline_exit: mfdar r5 mfdsisr r6 - /* Unset guest state */ - li r9, 0 - stb r9, PACA_KVM_IN_GUEST(r13) - /* * In order for us to easily get the last instruction, * we got the #vmexit at, we exploit the fact that the @@ -233,18 +229,28 @@ kvmppc_handler_trampoline_exit: ld_last_inst: /* Save off the guest instruction we're at */ + + /* Set guest mode to 'jump over instruction' so if lwz faults +* we'll just continue at the next IP. 
*/ + li r9, KVM_GUEST_MODE_SKIP + stb r9, PACA_KVM_IN_GUEST(r13) + /*1) enable paging for data */ mfmsr r9 ori r11, r9, MSR_DR /* Enable paging for data */ mtmsr r11 /*2) fetch the instruction */ - /* XXX implement PACA_KVM_IN_GUEST=2 path to safely jump over this */ + li r0, KVM_INST_FETCH_FAILED /* In case lwz faults */ lwz r0, 0(r3) /*3) disable paging again */ mtmsr r9 no_ld_last_inst: + /* Unset guest mode */ + li r9, KVM_GUEST_MODE_NONE + stb r9, PACA_KVM_IN_GUEST(r13) + /* Restore bolted entries from the shadow and fix it along the way */
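The control flow this patch adds is easier to follow as a small state machine. What follows is a toy C model of it, written under several assumptions: struct toy_cpu, toy_trampoline and toy_fetch_last_inst are invented names, the `fault` argument stands in for the lwz taking a trap, and the failure marker is modelled as a 32-bit all-ones value since r0 holds a 32-bit instruction word here. The real logic lives in the assembly above.

```c
#include <stdint.h>

#define KVM_GUEST_MODE_NONE	0
#define KVM_GUEST_MODE_GUEST	1
#define KVM_GUEST_MODE_SKIP	2
#define KVM_INST_FETCH_FAILED	((uint32_t)-1)

enum trap_action { CALL_LINUX_HANDLER, ENTER_KVM_EXIT, SKIP_INSTRUCTION };

/* Toy stand-in for the per-CPU state the trampoline inspects. */
struct toy_cpu {
	uint8_t  in_guest;	/* models PACA_KVM_IN_GUEST */
	uint64_t srr0;		/* address of the interrupted instruction */
};

/* Models the dispatch at the top of the interrupt trampoline. */
static enum trap_action toy_trampoline(struct toy_cpu *cpu)
{
	switch (cpu->in_guest) {
	case KVM_GUEST_MODE_NONE:
		return CALL_LINUX_HANDLER;	/* no guest: Linux handles it */
	case KVM_GUEST_MODE_SKIP:
		cpu->srr0 += 4;			/* resume after the faulting insn */
		return SKIP_INSTRUCTION;
	default:
		return ENTER_KVM_EXIT;		/* normal guest exit path */
	}
}

/*
 * Models the ld_last_inst sequence: preload the failure marker, try the
 * load while in skip mode, then drop back to "no guest" mode.
 */
static uint32_t toy_fetch_last_inst(struct toy_cpu *cpu, uint32_t mem_word,
				    int fault)
{
	uint32_t r0 = KVM_INST_FETCH_FAILED;	/* li r0, KVM_INST_FETCH_FAILED */

	cpu->in_guest = KVM_GUEST_MODE_SKIP;
	if (fault)
		toy_trampoline(cpu);		/* lwz trapped and is stepped over */
	else
		r0 = mem_word;			/* lwz r0, 0(r3) completed */
	cpu->in_guest = KVM_GUEST_MODE_NONE;
	return r0;
}
```

Because r0 is preloaded before the load is attempted, a skipped lwz leaves the marker in place, and later code can tell a failed fetch from a real instruction word.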
[PATCH 5/9] KVM: PPC: Get rid of unnecessary RFI
Using an RFI in IR=1 is dangerous. We need to set two SRRs and then do an RFI without getting interrupted at all, because every interrupt could potentially overwrite the SRR values.

Fortunately, we don't need to RFI in at least this particular case of the code, so we can just replace it with an mtmsr and b.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_64_interrupts.S | 22 +++---
 1 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index 66e3b11..3c0ba55 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -221,15 +221,8 @@ no_dcbz32_off:
 	mflr	r5
 	std	r5, VCPU_LR(r7)
 
-	/* XXX convert to safe function call */
-
 	/* Restore host msr -> SRR1 */
 	ld	r6, VCPU_HOST_MSR(r7)
-	mtsrr1	r6
-
-	/* Restore host IP -> SRR0 */
-	ld	r5, VCPU_HOST_RETIP(r7)
-	mtsrr0	r5
 
 	/*
 	 * For some interrupts, we need to call the real Linux
@@ -246,8 +239,9 @@ no_dcbz32_off:
 	cmpwi	r12, BOOK3S_INTERRUPT_DECREMENTER
 	beq	call_linux_handler
 
-	/* Back to Interruptable Mode! (goto kvm_return_point) */
-	RFI
+	/* Back to EE=1 */
+	mtmsr	r6
+	b	kvm_return_point
 
 call_linux_handler:
 
@@ -260,10 +254,16 @@ call_linux_handler:
 	 * interrupt handler!
 	 *
 	 * R3 still contains the exit code,
-	 * R6 VCPU_HOST_RETIP and
-	 * R7 VCPU_HOST_MSR
+	 * R5 VCPU_HOST_RETIP and
+	 * R6 VCPU_HOST_MSR
 	 */
 
+	/* Restore host IP -> SRR0 */
+	ld	r5, VCPU_HOST_RETIP(r7)
+
+	/* XXX Better move to a safe function?
+	 *     What if we get an HTAB flush in between mtsrr0 and mtsrr1? */
+
 	mtlr	r12
 
 	ld	r4, VCPU_TRAMPOLINE_LOWMEM(r7)
-- 
1.6.0.2
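The hazard described here can be shown with a toy model. This is only an illustration under simplifying assumptions: struct rfi_cpu and the toy_* functions are invented, the interrupt handler is reduced to "save the interrupted context in SRR0/SRR1 and come back", and no real MSR bits are modelled.

```c
#include <stdint.h>

/* Toy CPU: only the registers that matter for the race. */
struct rfi_cpu {
	uint64_t pc, msr;
	uint64_t srr0, srr1;
};

/* Taking an interrupt saves the interrupted context in SRR0/SRR1. */
static void toy_take_interrupt(struct rfi_cpu *c)
{
	c->srr0 = c->pc;
	c->srr1 = c->msr;
	/* ... the handler runs and returns to the interrupted pc ... */
}

/* rfi: continue at SRR0 with the MSR taken from SRR1. */
static void toy_rfi(struct rfi_cpu *c)
{
	c->pc = c->srr0;
	c->msr = c->srr1;
}

/*
 * Old exit path: mtsrr1, mtsrr0, RFI. With `interrupted` set, an
 * interrupt lands after the SRRs are loaded but before the RFI.
 */
static uint64_t toy_exit_via_rfi(struct rfi_cpu *c, uint64_t host_ip,
				 uint64_t host_msr, int interrupted)
{
	c->srr1 = host_msr;
	c->srr0 = host_ip;
	if (interrupted)
		toy_take_interrupt(c);	/* clobbers both SRRs */
	toy_rfi(c);
	return c->pc;
}

/* New exit path: mtmsr followed by b, which never touches the SRRs. */
static uint64_t toy_exit_via_branch(struct rfi_cpu *c, uint64_t host_ip,
				    uint64_t host_msr)
{
	c->msr = host_msr;
	c->pc = host_ip;
	return c->pc;
}
```

In the interrupted case the RFI resumes at whatever address the interrupt saved instead of at the host return point. The mtmsr plus branch sequence has no such window because it keeps no state in the SRRs.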
[PATCH 8/9] KVM: PPC: Fix HID5 setting code
The code to unset HID5.dcbz32 is broken. This patch makes it do the right rotate magic.

Signed-off-by: Alexander Graf ag...@suse.de
Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 arch/powerpc/kvm/book3s_64_interrupts.S | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index 33aef53..2ff0b21 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -177,8 +177,9 @@ kvmppc_handler_highmem:
 	rldicl.	r5, r5, 0, 63		/* CR = ((r5 & 1) == 0) */
 	beq	no_dcbz32_off
 
+	li	r4, 0
 	mfspr	r5,SPRN_HID5
-	rldimi	r5,r5,6,56
+	rldimi	r5,r4,6,56
 	mtspr	SPRN_HID5,r5
 
 no_dcbz32_off:
-- 
1.6.0.2
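The "rotate magic" can be checked with a small C model of rldimi. This is a sketch based on the instruction's architectural definition (rotate the source left, then insert it into the target under MASK(mb, 63 - sh)); the function names are made up here, and the sample register value 0x83 used below is arbitrary rather than a real HID5 setting.

```c
#include <stdint.h>

/* Rotate a 64-bit value left by n bits (0 <= n <= 63). */
static uint64_t rotl64(uint64_t x, unsigned n)
{
	return (x << n) | (x >> ((64 - n) & 63));
}

/*
 * MASK(mb, me) in IBM bit numbering, where bit 0 is the most significant
 * bit: ones from bit mb through bit me, wrapping around if mb > me.
 */
static uint64_t ppc_mask(unsigned mb, unsigned me)
{
	uint64_t from_mb = ~0ULL >> mb;		/* IBM bits mb..63 */
	uint64_t upto_me = ~0ULL << (63 - me);	/* IBM bits 0..me */

	return mb <= me ? (from_mb & upto_me) : (from_mb | upto_me);
}

/* rldimi ra,rs,sh,mb: insert ROTL64(rs, sh) into ra under MASK(mb, 63 - sh). */
static uint64_t rldimi(uint64_t ra, uint64_t rs, unsigned sh, unsigned mb)
{
	uint64_t m = ppc_mask(mb, 63 - sh);

	return (rotl64(rs, sh) & m) | (ra & ~m);
}
```

With sh=6 and mb=56 the mask covers IBM bits 56 and 57, i.e. 0xc0. Using r5 as both source and target, as the old code did, copies r5's own two low bits into that field, so rldimi(0x83, 0x83, 6, 56) gives 0xc3. Sourcing from a zeroed r4 clears the field regardless of what else is set: rldimi(0x83, 0, 6, 56) gives 0x03.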
[PATCH 1/9] KVM: PPC: Use accessor functions for GPR access
All code in PPC KVM currently accesses gprs in the vcpu struct directly. While there's nothing wrong with that wrt the current way gprs are stored and loaded, it doesn't suffice for the PACA acceleration that will follow in this patchset. So let's just create little wrapper inline functions that we call whenever a GPR needs to be read from or written to. The compiled code shouldn't really change at all for now. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/include/asm/kvm_ppc.h | 26 arch/powerpc/kvm/44x_emulate.c | 25 arch/powerpc/kvm/44x_tlb.c | 14 ++-- arch/powerpc/kvm/book3s.c|8 +- arch/powerpc/kvm/book3s_64_emulate.c | 77 + arch/powerpc/kvm/booke.c | 16 +++--- arch/powerpc/kvm/booke_emulate.c | 107 +- arch/powerpc/kvm/e500_emulate.c | 95 -- arch/powerpc/kvm/e500_tlb.c |4 +- arch/powerpc/kvm/emulate.c | 106 ++--- arch/powerpc/kvm/powerpc.c | 21 --- 11 files changed, 274 insertions(+), 225 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index abfd0c4..ba01b9c 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -96,4 +96,30 @@ extern void kvmppc_booke_exit(void); extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu); +#ifdef CONFIG_PPC_BOOK3S + +static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val) +{ + vcpu-arch.gpr[num] = val; +} + +static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num) +{ + return vcpu-arch.gpr[num]; +} + +#else + +static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val) +{ + vcpu-arch.gpr[num] = val; +} + +static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num) +{ + return vcpu-arch.gpr[num]; +} + +#endif + #endif /* __POWERPC_KVM_PPC_H__ */ diff --git a/arch/powerpc/kvm/44x_emulate.c b/arch/powerpc/kvm/44x_emulate.c index 61af58f..0ff0d40 100644 --- a/arch/powerpc/kvm/44x_emulate.c +++ b/arch/powerpc/kvm/44x_emulate.c @@ -65,13 +65,14 @@ int 
kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, */ switch (dcrn) { case DCRN_CPR0_CONFIG_ADDR: - vcpu-arch.gpr[rt] = vcpu-arch.cpr0_cfgaddr; + kvmppc_set_gpr(vcpu, rt, vcpu-arch.cpr0_cfgaddr); break; case DCRN_CPR0_CONFIG_DATA: local_irq_disable(); mtdcr(DCRN_CPR0_CONFIG_ADDR, vcpu-arch.cpr0_cfgaddr); - vcpu-arch.gpr[rt] = mfdcr(DCRN_CPR0_CONFIG_DATA); + kvmppc_set_gpr(vcpu, rt, + mfdcr(DCRN_CPR0_CONFIG_DATA)); local_irq_enable(); break; default: @@ -93,11 +94,11 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, /* emulate some access in kernel */ switch (dcrn) { case DCRN_CPR0_CONFIG_ADDR: - vcpu-arch.cpr0_cfgaddr = vcpu-arch.gpr[rs]; + vcpu-arch.cpr0_cfgaddr = kvmppc_get_gpr(vcpu, rs); break; default: run-dcr.dcrn = dcrn; - run-dcr.data = vcpu-arch.gpr[rs]; + run-dcr.data = kvmppc_get_gpr(vcpu, rs); run-dcr.is_write = 1; vcpu-arch.dcr_needed = 1; kvmppc_account_exit(vcpu, DCR_EXITS); @@ -146,13 +147,13 @@ int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs) switch (sprn) { case SPRN_PID: - kvmppc_set_pid(vcpu, vcpu-arch.gpr[rs]); break; + kvmppc_set_pid(vcpu, kvmppc_get_gpr(vcpu, rs)); break; case SPRN_MMUCR: - vcpu-arch.mmucr = vcpu-arch.gpr[rs]; break; + vcpu-arch.mmucr = kvmppc_get_gpr(vcpu, rs); break; case SPRN_CCR0: - vcpu-arch.ccr0 = vcpu-arch.gpr[rs]; break; + vcpu-arch.ccr0 = kvmppc_get_gpr(vcpu, rs); break; case SPRN_CCR1: - vcpu-arch.ccr1 = vcpu-arch.gpr[rs]; break; + vcpu-arch.ccr1 = kvmppc_get_gpr(vcpu, rs); break; default: emulated = kvmppc_booke_emulate_mtspr(vcpu, sprn, rs); } @@ -167,13 +168,13 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
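The point of routing every access through kvmppc_get_gpr()/kvmppc_set_gpr() is that the storage can later move without touching any caller. A hypothetical sketch of that idea follows. It is not the layout any patch in this series uses: struct toy_vcpu, the shadow_gpr array and the split at register 14 are all invented for illustration.

```c
#define TOY_NUM_SHADOWED 14	/* hypothetical split point, illustration only */

struct toy_vcpu {
	unsigned long gpr[32];				/* original in-struct storage */
	unsigned long shadow_gpr[TOY_NUM_SHADOWED];	/* hypothetical relocated storage */
};

/* Callers never learn where a register lives; only the accessors do. */
static inline void toy_set_gpr(struct toy_vcpu *vcpu, int num, unsigned long val)
{
	if (num < TOY_NUM_SHADOWED)
		vcpu->shadow_gpr[num] = val;
	else
		vcpu->gpr[num] = val;
}

static inline unsigned long toy_get_gpr(struct toy_vcpu *vcpu, int num)
{
	if (num < TOY_NUM_SHADOWED)
		return vcpu->shadow_gpr[num];
	return vcpu->gpr[num];
}
```

Code written against the accessors, such as the converted call sites in the diff above, would keep working unchanged if the helpers were redirected like this. Code that indexed the gpr array directly would silently read stale values.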
[PATCH 3/9] KVM: PPC: Use PACA backed shadow vcpu
We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.

After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.

That way we can drastically improve the readability of the code, make it
less racy and less complex.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h        |    2 +
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |   19 +++
 arch/powerpc/include/asm/kvm_host.h          |    5 +-
 arch/powerpc/include/asm/kvm_ppc.h           |   20 ++-
 arch/powerpc/include/asm/paca.h              |    5 +
 arch/powerpc/kernel/asm-offsets.c            |   35 -
 arch/powerpc/kvm/book3s.c                    |    4 +
 arch/powerpc/kvm/book3s_64_interrupts.S      |  216 +-
 arch/powerpc/kvm/book3s_64_rmhandlers.S      |   32 +---
 arch/powerpc/kvm/book3s_64_slb.S             |  150 +++---
 10 files changed, 250 insertions(+), 238 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index 74b7369..f192017 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -23,6 +23,7 @@
 #include <linux/types.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_ppc.h>
+#include <asm/kvm_book3s_64_asm.h>

 struct kvmppc_slb {
 	u64 esid;
@@ -69,6 +70,7 @@ struct kvmppc_sid_map {

 struct kvmppc_vcpu_book3s {
 	struct kvm_vcpu vcpu;
+	struct kvmppc_book3s_shadow_vcpu shadow_vcpu;
 	struct kvmppc_sid_map sid_map[SID_MAP_NUM];
 	struct kvmppc_slb slb[64];
 	struct {
diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
index 2e06ee8..fca9404 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -20,6 +20,8 @@
 #ifndef __ASM_KVM_BOOK3S_ASM_H__
 #define __ASM_KVM_BOOK3S_ASM_H__

+#ifdef __ASSEMBLY__
+
 #ifdef CONFIG_KVM_BOOK3S_64_HANDLER

 #include <asm/kvm_asm.h>
@@ -55,4 +57,21 @@ kvmppc_resume_\intno:

 #endif /* CONFIG_KVM_BOOK3S_64_HANDLER */

+#else  /*__ASSEMBLY__ */
+
+struct kvmppc_book3s_shadow_vcpu {
+	ulong gpr[14];
+	u32 cr;
+	u32 xer;
+	ulong host_r1;
+	ulong host_r2;
+	ulong handler;
+	ulong scratch0;
+	ulong scratch1;
+	ulong vmhandler;
+	ulong rmhandler;
+};
+
+#endif /*__ASSEMBLY__ */
+
 #endif /* __ASM_KVM_BOOK3S_ASM_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1201f62..d615fa8 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -175,10 +175,13 @@ struct kvm_vcpu_arch {
 	ulong gpr[32];

 	ulong pc;
-	u32 cr;
 	ulong ctr;
 	ulong lr;
+
+#ifdef CONFIG_BOOKE
 	ulong xer;
+	u32 cr;
+#endif

 	ulong msr;
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index d60b2f0..89c5d79 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -98,34 +98,42 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);

 #ifdef CONFIG_PPC_BOOK3S

+/* We assume we're always acting on the current vcpu */
+
 static inline void kvmppc_set_gpr(struct kvm_vcpu *vcpu, int num, ulong val)
 {
-	vcpu->arch.gpr[num] = val;
+	if ( num < 14 )
+		get_paca()->shadow_vcpu.gpr[num] = val;
+	else
+		vcpu->arch.gpr[num] = val;
 }

 static inline ulong kvmppc_get_gpr(struct kvm_vcpu *vcpu, int num)
 {
-	return vcpu->arch.gpr[num];
+	if ( num < 14 )
+		return get_paca()->shadow_vcpu.gpr[num];
+	else
+		return vcpu->arch.gpr[num];
 }

 static inline void kvmppc_set_cr(struct kvm_vcpu *vcpu, u32 val)
 {
-	vcpu->arch.cr = val;
+	get_paca()->shadow_vcpu.cr = val;
 }

 static inline u32 kvmppc_get_cr(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.cr;
+	return get_paca()->shadow_vcpu.cr;
 }

 static inline void kvmppc_set_xer(struct kvm_vcpu *vcpu, u32 val)
 {
-	vcpu->arch.xer = val;
+	get_paca()->shadow_vcpu.xer = val;
 }

 static inline u32 kvmppc_get_xer(struct kvm_vcpu *vcpu)
 {
-	return vcpu->arch.xer;
+	return get_paca()->shadow_vcpu.xer;
 }

 #else
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 5e9b4ef..d8a6931 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -19,6 +19,9 @@
 #include <asm/mmu.h>
 #include <asm/page.h>
 #include <asm/exception-64e.h>
+#ifdef CONFIG_KVM_BOOK3S_64_HANDLER
+#include <asm/kvm_book3s_64_asm.h>
+#endif

 register struct paca_struct *local_paca asm("r13");

@@ -135,6 +138,8 @@ struct paca_struct {
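For readers following along outside a kernel tree, the split storage that the new accessors introduce can be sketched in plain C. This is only an illustration under stated assumptions: toy_vcpu, toy_shadow, toy_paca_shadow and the toy_* helpers are made-up stand-ins (the real code goes through get_paca()->shadow_vcpu), and the boundary is taken to be "num < 14" to match the gpr[14] array in the shadow struct.

```c
#include <assert.h>

typedef unsigned long ulong;

/* Hypothetical stand-ins, reduced to the fields this sketch needs. */
struct toy_shadow { ulong gpr[14]; };
struct toy_vcpu   { ulong gpr[32]; };

/* Stand-in for the per-CPU PACA slot; a plain global in this sketch. */
static struct toy_shadow toy_paca_shadow;

#define TOY_SHADOW_GPRS 14	/* assumed boundary, matches gpr[14] above */

/* Low registers go to the shadow area, the rest stay in the vcpu. */
static inline void toy_set_gpr(struct toy_vcpu *vcpu, int num, ulong val)
{
	if (num < TOY_SHADOW_GPRS)
		toy_paca_shadow.gpr[num] = val;
	else
		vcpu->gpr[num] = val;
}

static inline ulong toy_get_gpr(struct toy_vcpu *vcpu, int num)
{
	if (num < TOY_SHADOW_GPRS)
		return toy_paca_shadow.gpr[num];
	return vcpu->gpr[num];
}
```

Callers never see where a register lives, which is why patch 1/9 converted every direct gpr[] access to the accessors first. Note the sketch shares the caveat stated in the patch: the shadow copy is only meaningful for the vcpu currently loaded on this CPU.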
[PATCH 6/9] KVM: PPC: Call SLB patching code in interrupt safe manner
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.

To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:

  A small helper in linear mapped memory that does mtmsr with IR=0 and
  then RFIs into the actual handler.

Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h        |    1 +
 arch/powerpc/include/asm/kvm_book3s_64_asm.h |    1 -
 arch/powerpc/include/asm/kvm_host.h          |    1 +
 arch/powerpc/kernel/asm-offsets.c            |    3 +--
 arch/powerpc/kvm/book3s.c                    |    1 +
 arch/powerpc/kvm/book3s_64_exports.c         |    1 +
 arch/powerpc/kvm/book3s_64_interrupts.S      |   25 +++--
 arch/powerpc/kvm/book3s_64_rmhandlers.S      |   18 ++
 arch/powerpc/kvm/book3s_64_slb.S             |    4
 9 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index f192017..c91be0f 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -121,6 +121,7 @@ extern void kvmppc_set_bat(struct kvm_vcpu *vcpu, struct kvmppc_bat *bat,

 extern u32 kvmppc_trampoline_lowmem;
 extern u32 kvmppc_trampoline_enter;
+extern void kvmppc_rmcall(ulong srr0, ulong srr1);

 static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/powerpc/include/asm/kvm_book3s_64_asm.h b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
index fca9404..183461b 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64_asm.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64_asm.h
@@ -69,7 +69,6 @@ struct kvmppc_book3s_shadow_vcpu {
 	ulong scratch0;
 	ulong scratch1;
 	ulong vmhandler;
-	ulong rmhandler;
 };

 #endif /*__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index d615fa8..f7215e6 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -167,6 +167,7 @@ struct kvm_vcpu_arch {
 	ulong trampoline_lowmem;
 	ulong trampoline_enter;
 	ulong highmem_handler;
+	ulong rmcall;
 	ulong host_paca_phys;
 	struct kvmppc_mmu mmu;
 #endif
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 03b4fcd..be90ced 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -214,8 +214,6 @@ int main(void)
 	DEFINE(PACA_KVM_HOST_R2, offsetof(struct paca_struct, shadow_vcpu.host_r2));
 	DEFINE(PACA_KVM_VMHANDLER, offsetof(struct paca_struct,
 					    shadow_vcpu.vmhandler));
-	DEFINE(PACA_KVM_RMHANDLER, offsetof(struct paca_struct,
-					    shadow_vcpu.rmhandler));
 	DEFINE(PACA_KVM_SCRATCH0, offsetof(struct paca_struct,
 					   shadow_vcpu.scratch0));
 	DEFINE(PACA_KVM_SCRATCH1, offsetof(struct paca_struct,
@@ -437,6 +435,7 @@ int main(void)
 	DEFINE(VCPU_TRAMPOLINE_LOWMEM, offsetof(struct kvm_vcpu, arch.trampoline_lowmem));
 	DEFINE(VCPU_TRAMPOLINE_ENTER, offsetof(struct kvm_vcpu, arch.trampoline_enter));
 	DEFINE(VCPU_HIGHMEM_HANDLER, offsetof(struct kvm_vcpu, arch.highmem_handler));
+	DEFINE(VCPU_RMCALL, offsetof(struct kvm_vcpu, arch.rmcall));
 	DEFINE(VCPU_HFLAGS, offsetof(struct kvm_vcpu, arch.hflags));
 	DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
 #else
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 3e06eae..1317392 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -919,6 +919,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
 	vcpu->arch.trampoline_lowmem = kvmppc_trampoline_lowmem;
 	vcpu->arch.trampoline_enter = kvmppc_trampoline_enter;
 	vcpu->arch.highmem_handler = (ulong)kvmppc_handler_highmem;
+	vcpu->arch.rmcall = *(ulong*)kvmppc_rmcall;

 	vcpu->arch.shadow_msr = MSR_USER64;

diff --git a/arch/powerpc/kvm/book3s_64_exports.c b/arch/powerpc/kvm/book3s_64_exports.c
index 5b2db38..99b0712 100644
--- a/arch/powerpc/kvm/book3s_64_exports.c
+++ b/arch/powerpc/kvm/book3s_64_exports.c
@@ -22,3 +22,4 @@

 EXPORT_SYMBOL_GPL(kvmppc_trampoline_enter);
 EXPORT_SYMBOL_GPL(kvmppc_trampoline_lowmem);
+EXPORT_SYMBOL_GPL(kvmppc_rmcall);
diff --git a/arch/powerpc/kvm/book3s_64_interrupts.S b/arch/powerpc/kvm/book3s_64_interrupts.S
index 3c0ba55..33aef53 100644
--- a/arch/powerpc/kvm/book3s_64_interrupts.S
+++ b/arch/powerpc/kvm/book3s_64_interrupts.S
@@ -95,17 +95,14 @@ kvm_start_entry:
 	ld	r3,
[PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
Book3S needs some flags in SRR1 to get to know details about an interrupt.

One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.

This patch implements above behavior, making WARN_ON behave like WARN_ON.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_book3s.h |    1 +
 arch/powerpc/include/asm/kvm_ppc.h    |    2 +-
 arch/powerpc/include/asm/reg.h        |    4
 arch/powerpc/kvm/book3s.c             |    7 +--
 arch/powerpc/kvm/booke.c              |    3 ++-
 arch/powerpc/kvm/emulate.c            |    2 +-
 6 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index c91be0f..79ab8fa 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -91,6 +91,7 @@ struct kvmppc_vcpu_book3s {
 	u64 vsid_next;
 	u64 vsid_max;
 	int context_id;
+	ulong prog_flags; /* flags to inject when giving a 700 trap */
 };

 #define CONTEXT_HOST		0
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 89c5d79..09816da 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -80,7 +80,7 @@ extern void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu);

 extern void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu);
 extern int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu);
-extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags);
 extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_dequeue_dec(struct kvm_vcpu *vcpu);
 extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index bc8dd53..5572e86 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -426,6 +426,10 @@
 #define   SRR1_WAKEMT		0x0028 /* mtctrl */
 #define   SRR1_WAKEDEC		0x0018 /* Decrementer interrupt */
 #define   SRR1_WAKETHERM	0x0010 /* Thermal management interrupt */
+#define   SRR1_PROGFPE		0x0010 /* Floating Point Enabled */
+#define   SRR1_PROGPRIV		0x0004 /* Privileged instruction */
+#define   SRR1_PROGTRAP		0x0002 /* Trap */
+#define   SRR1_PROGADDR		0x0001 /* SRR0 contains subsequent addr */

 #define SPRN_HSRR0	0x13A	/* Save/Restore Register 0 */
 #define SPRN_HSRR1	0x13B	/* Save/Restore Register 1 */
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 1317392..66b5924 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -168,8 +168,9 @@ void kvmppc_book3s_queue_irqprio(struct kvm_vcpu *vcpu, unsigned int vec)
 }


-void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
 {
+	to_book3s(vcpu)->prog_flags = flags;
 	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_PROGRAM);
 }

@@ -198,6 +199,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
 {
 	int deliver = 1;
 	int vec = 0;
+	ulong flags = 0ULL;

 	switch (priority) {
 	case BOOK3S_IRQPRIO_DECREMENTER:
@@ -231,6 +233,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
 		break;
 	case BOOK3S_IRQPRIO_PROGRAM:
 		vec = BOOK3S_INTERRUPT_PROGRAM;
+		flags = to_book3s(vcpu)->prog_flags;
 		break;
 	case BOOK3S_IRQPRIO_VSX:
 		vec = BOOK3S_INTERRUPT_VSX;
@@ -261,7 +264,7 @@ int kvmppc_book3s_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority)
 #endif

 	if (deliver)
-		kvmppc_inject_interrupt(vcpu, vec, 0ULL);
+		kvmppc_inject_interrupt(vcpu, vec, flags);

 	return deliver;
 }
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 338baf9..e283e44 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -82,8 +82,9 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
 	set_bit(priority, &vcpu->arch.pending_exceptions);
 }

-void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
 {
+	/* BookE does flags in ESR, so ignore those we get here */
 	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
 }

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 04e317c..8b0ba0b 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -154,7 +154,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 #else
 		vcpu->arch.esr |= ESR_PTR;
 #endif
-
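The queue-then-deliver flow of the Book3S side may be easier to see in isolation. The following is a rough sketch, not the kernel code: toy_vcpu, toy_queue_program, toy_deliver and TOY_PROGTRAP are invented for illustration, the constant's value is arbitrary, and OR-ing the flags into the guest-visible SRR1 at delivery is an assumption about what the injection step does with them.

```c
#include <assert.h>

typedef unsigned long ulong;

/* Arbitrary illustrative bit; the real SRR1_PROGTRAP is defined in asm/reg.h. */
#define TOY_PROGTRAP 0x2UL

/* Hypothetical, minimal vcpu state for this sketch. */
struct toy_vcpu {
	ulong prog_flags;	/* flags remembered when the interrupt is queued */
	int   prog_pending;	/* nonzero while a program interrupt is queued */
	ulong guest_msr;	/* guest MSR at the time of delivery */
	ulong guest_srr1;	/* what the guest will read from SRR1 */
};

/* Queue time: remember why the program interrupt is being raised. */
static void toy_queue_program(struct toy_vcpu *vcpu, ulong flags)
{
	vcpu->prog_flags = flags;
	vcpu->prog_pending = 1;
}

/* Delivery time: hand the stored flags to the guest along with its MSR. */
static int toy_deliver(struct toy_vcpu *vcpu)
{
	if (!vcpu->prog_pending)
		return 0;
	vcpu->guest_srr1 = vcpu->guest_msr | vcpu->prog_flags;	/* assumed merge */
	vcpu->prog_pending = 0;
	return 1;
}
```

Storing the flags at queue time matters because delivery can happen later, from a different code path, where the reason for the interrupt is no longer known.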
[PATCH 9/9] KVM: PPC: Pass program interrupt flags to the guest
When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.

Signed-off-by: Alexander Graf ag...@suse.de
Reported-by: Benjamin Herrenschmidt b...@kernel.crashing.org
---
 arch/powerpc/kvm/book3s.c |    7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 66b5924..02861fd 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -633,6 +633,9 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	case BOOK3S_INTERRUPT_PROGRAM:
 	{
 		enum emulation_result er;
+		ulong flags;
+
+		flags = (vcpu->arch.shadow_msr & 0x1full);

 		if (vcpu->arch.msr & MSR_PR) {
 #ifdef EXIT_DEBUG
@@ -640,7 +643,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 #endif
 			if ((vcpu->arch.last_inst & 0xff0007ff) !=
 			    (INS_DCBZ & 0xfff7)) {
-				kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+				kvmppc_core_queue_program(vcpu, flags);
 				r = RESUME_GUEST;
 				break;
 			}
@@ -655,7 +658,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		case EMULATE_FAIL:
 			printk(KERN_CRIT "%s: emulation at %lx failed (%08x)\n",
 			       __func__, vcpu->arch.pc, vcpu->arch.last_inst);
-			kvmppc_book3s_queue_irqprio(vcpu, exit_nr);
+			kvmppc_core_queue_program(vcpu, flags);
 			r = RESUME_GUEST;
 			break;
 		default:
--
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/9] KVM: PPC: Implement 'skip instruction' mode
To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.

Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB which makes us go in the Linux page fault handler which totally
breaks because we still use the guest's SLB entries.

To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.

That way a potentially bad lwz doesn't trigger any faults and we can later
on interpret the invalid instruction we fetched as fetch didn't work.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_asm.h      |    6
 arch/powerpc/kvm/book3s_64_rmhandlers.S |   39 ++-
 arch/powerpc/kvm/book3s_64_slb.S        |   16
 arch/powerpc/kvm/emulate.c              |    4 +++
 4 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index af2abe7..aadf2dd 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -97,4 +97,10 @@
 #define RESUME_HOST             RESUME_FLAG_HOST
 #define RESUME_HOST_NV          (RESUME_FLAG_HOST|RESUME_FLAG_NV)

+#define KVM_GUEST_MODE_NONE	0
+#define KVM_GUEST_MODE_GUEST	1
+#define KVM_GUEST_MODE_SKIP	2
+
+#define KVM_INST_FETCH_FAILED	-1
+
 #endif /* __POWERPC_KVM_ASM_H__ */
diff --git a/arch/powerpc/kvm/book3s_64_rmhandlers.S b/arch/powerpc/kvm/book3s_64_rmhandlers.S
index cd9f0b6..9ad1c26 100644
--- a/arch/powerpc/kvm/book3s_64_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_64_rmhandlers.S
@@ -49,7 +49,7 @@ kvmppc_trampoline_\intno:
 	mfcr	r12
 	stw	r12, PACA_KVM_SCRATCH1(r13)
 	lbz	r12, PACA_KVM_IN_GUEST(r13)
-	cmpwi	r12, 0
+	cmpwi	r12, KVM_GUEST_MODE_NONE
 	bne	..kvmppc_handler_hasmagic_\intno
 	/* No KVM guest? Then jump back to the Linux handler! */
 	lwz	r12, PACA_KVM_SCRATCH1(r13)
@@ -60,6 +60,11 @@ kvmppc_trampoline_\intno:

 	/* Now we know we're handling a KVM guest */
 ..kvmppc_handler_hasmagic_\intno:
+
+	/* Should we just skip the faulting instruction? */
+	cmpwi	r12, KVM_GUEST_MODE_SKIP
+	beq	kvmppc_handler_skip_ins
+
 	/* Let's store which interrupt we're handling */
 	li	r12, \intno

@@ -86,6 +91,38 @@ INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_ALTIVEC
 INTERRUPT_TRAMPOLINE	BOOK3S_INTERRUPT_VSX

 /*
+ * Bring us back to the faulting code, but skip the
+ * faulting instruction.
+ *
+ * This is a generic exit path from the interrupt
+ * trampolines above.
+ *
+ * Input Registers:
+ *
+ * R12               = free
+ * R13               = PACA
+ * PACA.KVM.SCRATCH0 = guest R12
+ * PACA.KVM.SCRATCH1 = guest CR
+ * SPRG_SCRATCH0     = guest R13
+ *
+ */
+kvmppc_handler_skip_ins:
+
+	/* Patch the IP to the next instruction */
+	mfsrr0	r12
+	addi	r12, r12, 4
+	mtsrr0	r12
+
+	/* Clean up all state */
+	lwz	r12, PACA_KVM_SCRATCH1(r13)
+	mtcr	r12
+	ld	r12, PACA_KVM_SCRATCH0(r13)
+	mfspr	r13, SPRN_SPRG_SCRATCH0
+
+	/* And get back into the code */
+	RFI
+
+/*
  * This trampoline brings us back to a real mode handler
  *
  * Input Registers:
diff --git a/arch/powerpc/kvm/book3s_64_slb.S b/arch/powerpc/kvm/book3s_64_slb.S
index 7188c11..d07b886 100644
--- a/arch/powerpc/kvm/book3s_64_slb.S
+++ b/arch/powerpc/kvm/book3s_64_slb.S
@@ -212,10 +212,6 @@ kvmppc_handler_trampoline_exit:
 	mfdar	r5
 	mfdsisr	r6

-	/* Unset guest state */
-	li	r9, 0
-	stb	r9, PACA_KVM_IN_GUEST(r13)
-
 	/*
 	 * In order for us to easily get the last instruction,
 	 * we got the #vmexit at, we exploit the fact that the
@@ -233,18 +229,28 @@ kvmppc_handler_trampoline_exit:

 ld_last_inst:
 	/* Save off the guest instruction we're at */
+
+	/* Set guest mode to 'jump over instruction' so if lwz faults
+	 * we'll just continue at the next IP. */
+	li	r9, KVM_GUEST_MODE_SKIP
+	stb	r9, PACA_KVM_IN_GUEST(r13)
+
 	/*    1) enable paging for data */
 	mfmsr	r9
 	ori	r11, r9, MSR_DR			/* Enable paging for data */
 	mtmsr	r11
 	/*    2) fetch the instruction */
-	/* XXX implement PACA_KVM_IN_GUEST=2 path to safely jump over this */
+	li	r0, KVM_INST_FETCH_FAILED	/* In case lwz faults */
 	lwz	r0, 0(r3)
 	/*    3) disable paging again */
 	mtmsr	r9

 no_ld_last_inst:

+	/* Unset guest mode */
+	li	r9, KVM_GUEST_MODE_NONE
+	stb	r9, PACA_KVM_IN_GUEST(r13)
+
 	/* Restore bolted entries from the shadow and fix it along the way */
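To make the control flow of the new mode explicit, here is a small C model of what the trampoline and the instruction fetch do. It is a sketch only: the real logic is the assembly above, and toy_trampoline, toy_fetch_last_inst and the load_faults parameter are hypothetical names used to simulate a faulting lwz. The three mode values and the -1 sentinel are taken from the kvm_asm.h hunk.

```c
#include <assert.h>

/* Mode values and sentinel as defined in the kvm_asm.h hunk of this patch. */
enum { MODE_NONE = 0, MODE_GUEST = 1, MODE_SKIP = 2 };
#define FETCH_FAILED (-1)

/* Hypothetical labels for where the trampoline sends an interrupt. */
enum toy_action { TO_LINUX, TO_KVM_EXIT, SKIP_INSN };

/* Decide what to do with an interrupt depending on the per-CPU mode byte. */
static enum toy_action toy_trampoline(int mode, unsigned long *srr0)
{
	if (mode == MODE_NONE)
		return TO_LINUX;	/* not in a guest: normal Linux handler */
	if (mode == MODE_SKIP) {
		*srr0 += 4;		/* step over the faulting instruction */
		return SKIP_INSN;
	}
	return TO_KVM_EXIT;		/* regular guest exit path */
}

/*
 * Model of the fetch: the result register is preloaded with the sentinel,
 * so if the load is skipped because it faulted, the sentinel survives.
 */
static int toy_fetch_last_inst(const int *addr, int load_faults)
{
	int inst = FETCH_FAILED;

	if (!load_faults)
		inst = *addr;
	return inst;
}
```

The design point is that a fault during the fetch no longer needs a handler at all: the skip path returns past the load, and the caller recognises the sentinel later.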