RE: Installation of Windows 8 hangs with KVM
-Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Stefan Pietsch Sent: Monday, January 07, 2013 2:25 AM To: Gleb Natapov Cc: kvm@vger.kernel.org Subject: Re: Installation of Windows 8 hangs with KVM * Gleb Natapov g...@redhat.com [2013-01-06 11:11]: On Fri, Jan 04, 2013 at 10:58:33PM +0100, Stefan Pietsch wrote: Hi all, when I run KVM with this command the Windows 8 installation stops with error code 0x005D: kvm -m 1024 -hda win8.img -cdrom windows_8_x86.iso After adding the option -cpu host the installation proceeds to a black screen and hangs. With Virtualbox the installation succeeds. The host CPU is an Intel Core Duo L2400. Do you have any suggestions? What is your kernel/qemu version? I'm using Debian unstable. qemu-kvm 1.1.2+dfsg-3 Linux version 3.2.0-4-686-pae (debian-ker...@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2 you met issue only for 32bit Win8 (not 64 bit Win8), right? I think it's the same issue as the below bug I reported. https://bugs.launchpad.net/qemu/+bug/1007269 You can try with '-cpu coreduo' or '-cpu core2duo' in qemu-kvm command line. This should be a known issue which is caused by missing 'SEP' CPU flag. See another bug in Redhat bugzilla. https://bugzilla.redhat.com/show_bug.cgi?id=821741 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
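For readers skimming this thread, the suggested workaround translates to command lines roughly like the following (the image and ISO names are the ones from Stefan's report; this is a sketch of the suggestion, not a tested recipe):

```
# Fails with stop code 0x5D: the default CPU model lacks the SEP flag
# that the 32-bit Windows 8 installer checks for.
kvm -m 1024 -hda win8.img -cdrom windows_8_x86.iso

# Workarounds: pick a CPU model that includes SEP, or add the flag
# explicitly to the model in use.
kvm -m 1024 -cpu core2duo -hda win8.img -cdrom windows_8_x86.iso
kvm -m 1024 -cpu coreduo,+sep -hda win8.img -cdrom windows_8_x86.iso
```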
Re: Installation of Windows 8 hangs with KVM
On Mon, Jan 07, 2013 at 08:38:59AM +0000, Ren, Yongjie wrote:
> [...]
> you met this issue only with 32-bit Win8 (not 64-bit Win8), right?
> I think it's the same issue as the bug below, which I reported.
> https://bugs.launchpad.net/qemu/+bug/1007269
> You can try with '-cpu coreduo' or '-cpu core2duo' in the qemu-kvm
> command line. This should be a known issue which is caused by the
> missing 'SEP' CPU flag. See another bug in the Red Hat bugzilla.
> https://bugzilla.redhat.com/show_bug.cgi?id=821741
That was a RHEL kernel bug. Doubt the Debian one has it.

-- 
	Gleb.
Re: what's the difference between qemu --enable-kvm, accel=kvm, and plain qemu (when the kvm kmod is loaded)
On Sun, Jan 6, 2013 at 12:27 PM, lei yang yanglei.f...@gmail.com wrote:
> What's the difference between the combos below?

The difference is historical; it's just how the command-line options evolved over time.

> 1) qemu --enable-kvm

The old way. Still useful because it's slightly easier to type than --machine accel=kvm.

> 2) qemu accel=kvm

The modern way.

> 3) qemu without either parameter, when the kvm kmod has been loaded

There is a difference in behavior between QEMU and qemu-kvm here: QEMU uses TCG and not KVM by default, regardless of whether the kvm.ko module has been loaded or not. qemu-kvm uses KVM by default, if available. The qemu-kvm fork has been retired, so it's best not to rely on this behavior. Future distro packages will be built from QEMU, and unless a code change is made, the default accelerator is TCG.

Stefan
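Spelled out, the three invocations discussed above look like this (disk.img is a placeholder; the last line shows the explicit fallback syntax, a sketch rather than a recommendation):

```
# 1) the old spelling
qemu-system-x86_64 --enable-kvm -m 1024 disk.img

# 2) the modern spelling (equivalent)
qemu-system-x86_64 -machine accel=kvm -m 1024 disk.img

# 3) no accelerator option: upstream QEMU silently falls back to TCG,
#    even if /dev/kvm exists and kvm.ko is loaded
qemu-system-x86_64 -m 1024 disk.img

# explicit "try KVM, else TCG" fallback
qemu-system-x86_64 -machine accel=kvm:tcg -m 1024 disk.img
```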
Re: what's the difference between qemu --enable-kvm, accel=kvm, and plain qemu (when the kvm kmod is loaded)
On Mon, Jan 7, 2013 at 4:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
> [...]
> There is a difference in behavior between QEMU and qemu-kvm here: QEMU
> uses TCG and not KVM by default, regardless of whether the kvm.ko
> module has been loaded or not. qemu-kvm uses KVM by default, if
> available.

Thanks for the explanation.

So if we want to use KVM we need to explicitly add --enable-kvm or accel=kvm, regardless of whether kvm.ko is loaded or not.

How can we check whether we are using TCG or KVM? Can we check this in the guest OS, or with the monitor? Can you show me the exact command?

Lei

> Stefan
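One way to answer the question above, assuming the guest was started with a monitor available: the HMP `info kvm` command (or its QMP counterpart `query-kvm`) reports whether KVM is actually in use. A sketch of both:

```
(qemu) info kvm
kvm support: enabled

# or over a QMP socket:
{ "execute": "query-kvm" }
{ "return": { "enabled": true, "present": true } }
```

Here "present" means the binary was built with KVM support and /dev/kvm is usable, while "enabled" means this particular VM is running under KVM rather than TCG.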
RE: Installation of Windows 8 hangs with KVM
-----Original Message----- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Gleb Natapov Sent: Monday, January 07, 2013 4:54 PM To: Ren, Yongjie Cc: Stefan Pietsch; kvm@vger.kernel.org Subject: Re: Installation of Windows 8 hangs with KVM

On Mon, Jan 07, 2013 at 08:38:59AM +0000, Ren, Yongjie wrote:
> > [...]
> > You can try with '-cpu coreduo' or '-cpu core2duo' in the qemu-kvm
> > command line. This should be a known issue which is caused by the
> > missing 'SEP' CPU flag. See another bug in the Red Hat bugzilla.
> > https://bugzilla.redhat.com/show_bug.cgi?id=821741
> That was a RHEL kernel bug. Doubt the Debian one has it.

I don't think so. It should be a qemu bug (also described in that RHEL bugzilla). In my SandyBridge platform, 32bit Win8 guest can boot up with '-cpu SandyBridge,+sep' in qemu-kvm CLI. But it can't boot up with '-cpu SandyBridge'.

> -- 
> 	Gleb.
Re: [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high
On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> Reading the spec, it is clear that most modes normally leave the IRQ
> output line high, and only pulse it low to generate a leading edge.
> Especially the most commonly used mode 2. The KVM i8254 model does not
> try to emulate the duration of the pulse at all, so just swap the
> high/low settings to leave it high most of the time.
> 
> This fix is a prerequisite to improving the i8259 model to handle the
> trailing edge of an interrupt request as indicated in its spec: if it
> gets a trailing edge of an IRQ line before it starts to service the
> interrupt, the request should be canceled.
> 
> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> or search the net for 23124406.pdf.
> 
> Risks: There is a risk that migrating a running guest between versions
> with and without this patch will lose or gain a single timer interrupt
> during the migration process. The only case where
Can you elaborate on how exactly this can happen? I do not see it.
> this is likely to be serious is probably losing a single-shot (mode 4)
> interrupt, but if my understanding of how things work is correct, then
> that should only be possible if a whole slew of conditions are all met:
> 
>   1. The guest is configured to run in a tickless mode (like modern
>      Linux).
>   2. The guest is for some reason still using the i8254 rather than
>      something more modern like an HPET. (The combination of 1 and 2
>      should be rare.)
This is not so rare. For performance reasons it is better to not have an HPET at all. In fact, -no-hpet is how I would advise anyone to run qemu.
>   3. The migration is going from a fixed version back to the old
>      version. (Not sure how common this is, but it should be rarer than
>      migrating from old to new.)
>   4. There are not going to be any timely events/interrupts (keyboard,
>      network, process sleeps, etc) that cause the guest to reset the
>      PIT mode 4 one-shot counter soon enough.
> This combination should be rare enough that more complicated solutions
> are not worth the effort.
> 
> Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
> ---
>  arch/x86/kvm/i8254.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index c1d30b2..cd4ec60 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
>  	}
>  	spin_unlock(&ps->inject_lock);
>  	if (inject) {
> -		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> +		/* Clear previous interrupt, then create a rising
> +		 * edge to request another interrupt, and leave it at
> +		 * level=1 until time to inject another one.
> +		 */
>  		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> +		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
>  
>  		/*
>  		 * Provides NMI watchdog support via Virtual Wire mode.
> -- 
> 1.7.10.2.484.gcd07cc5

-- 
	Gleb.
Re: Installation of Windows 8 hangs with KVM
On Mon, Jan 07, 2013 at 09:13:37AM +0000, Ren, Yongjie wrote:
> > > [...]
> > > You can try with '-cpu coreduo' or '-cpu core2duo' in the qemu-kvm
> > > command line. This should be a known issue which is caused by the
> > > missing 'SEP' CPU flag. See another bug in the Red Hat bugzilla.
> > > https://bugzilla.redhat.com/show_bug.cgi?id=821741
> > That was a RHEL kernel bug. Doubt the Debian one has it.
> I don't think so. It should be a qemu bug (also described in that RHEL
> bugzilla).
https://bugzilla.redhat.com/show_bug.cgi?id=821463 is the kernel one.
> In my SandyBridge platform, 32bit Win8 guest can boot up with '-cpu
> SandyBridge,+sep' in qemu-kvm CLI. But it can't boot up with
> '-cpu SandyBridge'.
Which qemu version? Master has sep in the SandyBridge definition. In any case, -cpu host should have sep enabled.

-- 
	Gleb.
Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled
On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote:
> On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote:
> > This is a cleanup that tries to solve two small issues:
> > 
> > - We don't need a separate kvm_pv_eoi_features variable just to keep
> >   a constant calculated at compile time, and this style would require
> >   adding a separate variable (that's declared twice because of the
> >   CONFIG_KVM ifdef) for each feature that's going to be
> >   enabled/disabled by machine-type compat code.
> > - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
> >   even when KVM is disabled at runtime. This small inconsistency in
> >   the cpuid_kvm_features field isn't a problem today because
> >   cpuid_kvm_features is ignored by the TCG code, but it may cause
> >   unexpected problems later when refactoring the CPUID handling code.
> > 
> > This patch eliminates the kvm_pv_eoi_features variable and simply uses
> > CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat
> > function, so it enables kvm_pv_eoi only if KVM is enabled. I believe
> > this makes the behavior of enable_kvm_pv_eoi() clearer and easier to
> > understand.
> > 
> > Signed-off-by: Eduardo Habkost ehabk...@redhat.com
> > ---
> > Cc: kvm@vger.kernel.org
> > Cc: Michael S.
> > Tsirkin m...@redhat.com
> > Cc: Gleb Natapov g...@redhat.com
> > Cc: Marcelo Tosatti mtosa...@redhat.com
> > 
> > Changes v2:
> >  - Coding style fix
> > ---
> >  target-i386/cpu.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index 82685dc..e6435da 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
> >          (1 << KVM_FEATURE_ASYNC_PF) |
> >          (1 << KVM_FEATURE_STEAL_TIME) |
> >          (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
> > -static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
> >  #else
> >  static uint32_t kvm_default_features = 0;
> > -static const uint32_t kvm_pv_eoi_features = 0;
> >  #endif
> >  
> >  void enable_kvm_pv_eoi(void)
> >  {
> > -    kvm_default_features |= kvm_pv_eoi_features;
> > +#ifdef CONFIG_KVM
> You do not need ifdef here.

We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is set. I could also write it as:

if (kvm_enabled()) {
#ifdef CONFIG_KVM
    kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
#endif
}

But I find it less readable.

> > +    if (kvm_enabled()) {
> > +        kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
> > +    }
> > +#endif
> >  }
> >  
> >  void host_cpuid(uint32_t function, uint32_t count,
> > -- 
> > 1.7.11.7
> 
> -- 
> 	Gleb.

-- 
Eduardo
Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled
On Mon, Jan 07, 2013 at 09:42:36AM -0200, Eduardo Habkost wrote:
> [...]
> > >  void enable_kvm_pv_eoi(void)
> > >  {
> > > -    kvm_default_features |= kvm_pv_eoi_features;
> > > +#ifdef CONFIG_KVM
> > You do not need ifdef here.
> We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM
> is set. I could also write it as:
> 
> if (kvm_enabled()) {
> #ifdef CONFIG_KVM
>     kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
> #endif
> }
> 
> But I find it less readable.
Why not define KVM_FEATURE_PV_EOI unconditionally?

-- 
	Gleb.
Re: [PATCH qom-cpu 02/11] target-i386: Disable kvm_mmu_op by default on pc-1.4
On Sun, Jan 06, 2013 at 03:38:28PM +0200, Gleb Natapov wrote:
> On Fri, Jan 04, 2013 at 08:01:03PM -0200, Eduardo Habkost wrote:
> > The kvm_mmu_op feature has been removed from the kernel since v3.3
> > (released in March 2012), it was marked for removal since January
> > 2011, and it's slower than shadow or hardware-assisted paging (see
> > kernel commit fb92045843). It doesn't make sense to keep it enabled
> > by default.
> Actually it was effectively removed Oct 1 2009 by a68a6a7282373. After
> 3 and a half years of not having it I think we can safely drop it
> without trying to preserve it in older machine types.

Agreed. Especially considering that the check/enforce code for KVM flags is currently broken, so people using pc-1.0, pc-1.1 or pc-1.2 are probably _not_ getting the kvm_mmu feature exposed to the guest anyway. Also, keeping it enabled by default would cause unnecessary hassle when libvirt starts using the enforce option.

> > Signed-off-by: Eduardo Habkost ehabk...@redhat.com
> > ---
> > Cc: kvm@vger.kernel.org
> > Cc: Michael S. Tsirkin m...@redhat.com
> > Cc: Gleb Natapov g...@redhat.com
> > Cc: Marcelo Tosatti mtosa...@redhat.com
> > Cc: libvir-l...@redhat.com
> > Cc: Jiri Denemark jdene...@redhat.com

I was planning to reverse the logic of the compat init functions and make pc_init_pci_1_3() enable kvm_mmu_op and then call pc_init_pci_1_4() instead. But that would require changing pc_init_pci_no_kvmclock() and pc_init_isa() as well. So to keep the changes simple, I am keeping the pattern used when pc_init_pci_1_3() was introduced, making pc_init_pci_1_4() disable kvm_mmu_op and then call pc_init_pci_1_3().
> > Changes v2:
> >  - Coding style fix
> >  - Removed redundant comments above machine init functions
> > ---
> >  hw/pc_piix.c      | 9 ++++++++-
> >  target-i386/cpu.c | 9 +++++++++
> >  target-i386/cpu.h | 1 +
> >  3 files changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/pc_piix.c b/hw/pc_piix.c
> > index 99747a7..a32af6a 100644
> > --- a/hw/pc_piix.c
> > +++ b/hw/pc_piix.c
> > @@ -217,6 +217,7 @@ static void pc_init1(MemoryRegion *system_memory,
> >      }
> >  }
> >  
> > +/* machine init function for pc-0.14 - pc-1.2 */
> >  static void pc_init_pci(QEMUMachineInitArgs *args)
> >  {
> >      ram_addr_t ram_size = args->ram_size;
> > @@ -238,6 +239,12 @@ static void pc_init_pci_1_3(QEMUMachineInitArgs *args)
> >      pc_init_pci(args);
> >  }
> >  
> > +static void pc_init_pci_1_4(QEMUMachineInitArgs *args)
> > +{
> > +    disable_kvm_mmu_op();
> > +    pc_init_pci_1_3(args);
> > +}
> > +
> >  static void pc_init_pci_no_kvmclock(QEMUMachineInitArgs *args)
> >  {
> >      ram_addr_t ram_size = args->ram_size;
> > @@ -285,7 +292,7 @@ static QEMUMachine pc_machine_v1_4 = {
> >      .name = "pc-1.4",
> >      .alias = "pc",
> >      .desc = "Standard PC",
> > -    .init = pc_init_pci_1_3,
> > +    .init = pc_init_pci_1_4,
> >      .max_cpus = 255,
> >      .is_default = 1,
> >  };
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index e6435da..c83a566 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -158,6 +158,15 @@ void enable_kvm_pv_eoi(void)
> >  #endif
> >  }
> >  
> > +void disable_kvm_mmu_op(void)
> > +{
> > +#ifdef CONFIG_KVM
> No need for ifdef here too.

Same case as the previous patch: KVM_FEATURE_MMU_OP is available only if CONFIG_KVM is set.

-- 
Eduardo
Re: [PATCH qom-cpu 05/11] target-i386: check/enforce: Fix CPUID leaf numbers on error messages
On Sun, Jan 06, 2013 at 04:12:54PM +0200, Gleb Natapov wrote:
> On Fri, Jan 04, 2013 at 08:01:06PM -0200, Eduardo Habkost wrote:
> > The -cpu check/enforce warnings are printing incorrect information
> > about the missing flags. There are no feature flags on CPUID leaves 0
> > and 0x80000000, but there were references to 0 and 0x80000000 in the
> > table at kvm_check_features_against_host().
> > 
> > This changes the model_features_t struct to contain the register
> > number as well, so the error messages print the correct CPUID
> > leaf+register information, instead of wrong CPUID leaf numbers.
> > 
> > This also changes the format of the error messages, so they follow
> > the CPUID.leaf.register.name [bit offset] convention used in Intel
> > documentation.
> > 
> > Example output:
> > 
> > $ qemu-system-x86_64 -machine pc-1.0,accel=kvm -cpu Opteron_G4,+ia64,enforce
> > warning: host doesn't support requested feature: CPUID.01H:EDX.ia64 [bit 30]
> > warning: host doesn't support requested feature: CPUID.01H:ECX.xsave [bit 26]
> > warning: host doesn't support requested feature: CPUID.01H:ECX.avx [bit 28]
> > warning: host doesn't support requested feature: CPUID.80000001H:ECX.abm [bit 5]
> > warning: host doesn't support requested feature: CPUID.80000001H:ECX.sse4a [bit 6]
> > warning: host doesn't support requested feature: CPUID.80000001H:ECX.misalignsse [bit 7]
> > warning: host doesn't support requested feature: CPUID.80000001H:ECX.3dnowprefetch [bit 8]
> > warning: host doesn't support requested feature: CPUID.80000001H:ECX.xop [bit 11]
> > warning: host doesn't support requested feature: CPUID.80000001H:ECX.fma4 [bit 16]
> > Unable to find x86 CPU definition
> > $
> > 
> > Signed-off-by: Eduardo Habkost ehabk...@redhat.com
> Reviewed-by: Gleb Natapov g...@redhat.com
> 
> But see the question below.
> > ---
> > Cc: Gleb Natapov g...@redhat.com
> > Cc: Marcelo Tosatti mtosa...@redhat.com
> > Cc: kvm@vger.kernel.org
> > 
> > Changes v2:
> >  - Coding style fixes
> >  - Add assert() for invalid register numbers on unavailable_host_feature()
> > ---
> >  target-i386/cpu.c | 42 ++++++++++++++++++++++++++++++++++---------
> >  target-i386/cpu.h |  3 +++
> >  2 files changed, 36 insertions(+), 9 deletions(-)
> > 
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index e916ae0..c3e5db8 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -124,6 +124,25 @@ static const char *cpuid_7_0_ebx_feature_name[] = {
> >      NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
> >  };
> >  
> > +const char *get_register_name_32(unsigned int reg)
> > +{
> > +    static const char *reg_names[CPU_NB_REGS32] = {
> > +        [R_EAX] = "EAX",
> > +        [R_ECX] = "ECX",
> > +        [R_EDX] = "EDX",
> > +        [R_EBX] = "EBX",
> > +        [R_ESP] = "ESP",
> > +        [R_EBP] = "EBP",
> > +        [R_ESI] = "ESI",
> > +        [R_EDI] = "EDI",
> > +    };
> > +
> > +    if (reg >= CPU_NB_REGS32) {
> > +        return NULL;
> > +    }
> > +    return reg_names[reg];
> > +}
> > +
> >  /* collects per-function cpuid data */
> >  typedef struct model_features_t {
> > @@ -132,7 +151,8 @@ typedef struct model_features_t {
> >      uint32_t check_feat;
> >      const char **flag_names;
> >      uint32_t cpuid;
> > -} model_features_t;
> > +    int reg;
> > +} model_features_t;
> >  
> >  int check_cpuid = 0;
> >  int enforce_cpuid = 0;
> > @@ -923,10 +943,13 @@ static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
> >  
> >      for (i = 0; i < 32; ++i)
> >          if (1 << i & mask) {
> > -            fprintf(stderr, "warning: host cpuid %04x_%04x lacks requested"
> > -                " flag '%s' [0x%08x]\n",
> > -                f->cpuid >> 16, f->cpuid & 0xffff,
> > -                f->flag_names[i] ? f->flag_names[i] : "[reserved]", mask);
> > +            const char *reg = get_register_name_32(f->reg);
> > +            assert(reg);
> > +            fprintf(stderr, "warning: host doesn't support requested feature: "
> > +                "CPUID.%02XH:%s%s%s [bit %d]\n",
> > +                f->cpuid, reg,
> > +                f->flag_names[i] ? "." : "",
> > +                f->flag_names[i] ? f->flag_names[i] : "", i);
> >              break;
> >          }
> >      return 0;
> > @@ -945,13 +968,14 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
> >      int rv, i;
> >      struct model_features_t ft[] = {
> >          {&guest_def->features, &host_def.features,
> > -            ~0, feature_name, 0x00000000},
> > +            ~0, feature_name, 0x00000001, R_EDX},
> >          {&guest_def->ext_features, &host_def.ext_features,
> > -            ~CPUID_EXT_HYPERVISOR, ext_feature_name, 0x00000001},
> > +            ~CPUID_EXT_HYPERVISOR, ext_feature_name, 0x00000001, R_ECX},
> >          {&guest_def->ext2_features, &host_def.ext2_features,
> > -            ~PPRO_FEATURES, ext2_feature_name, 0x80000000},
> > +            ~PPRO_FEATURES, ext2_feature_name, 0x80000001, R_EDX},
> >          {&guest_def->ext3_features, &host_def.ext3_features,
> > -            ~CPUID_EXT3_SVM, ext3_feature_name, 0x80000001}};
> > +            ~CPUID_EXT3_SVM,
Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set
On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote:
> On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote:
> > This will be necessary once kvm_check_features_against_host() starts
> > using KVM-specific definitions (so it won't compile anymore if
> > CONFIG_KVM is not set).
> > 
> > Signed-off-by: Eduardo Habkost ehabk...@redhat.com
> > ---
> >  target-i386/cpu.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index 1c3c7e1..876b0f6 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
> >  #endif /* CONFIG_KVM */
> >  }
> >  
> > +#ifdef CONFIG_KVM
> >  static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
> >  {
> >      int i;
> > @@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
> >      }
> >      return rv;
> >  }
> > +#endif
> >  
> >  static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque,
> >                                           const char *name, Error **errp)
> > @@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features)
> >      x86_cpu_def->kvm_features &= ~minus_kvm_features;
> >      x86_cpu_def->svm_features &= ~minus_svm_features;
> >      x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features;
> > +#ifdef CONFIG_KVM
> >      if (check_cpuid && kvm_enabled()) {
> >          if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid)
> >              goto error;
> >      }
> > +#endif
> Provide kvm_check_features_against_host() stub if !CONFIG_KVM and drop
> the ifdef here.

I will do. Igor will probably have to change his "target-i386: move kvm_check_features_against_host() check to realize time" patch to use the same approach, too.

-- 
Eduardo
Re: [PATCH KVM v2 2/4] KVM: additional i8254 output fixes
On Wed, Dec 26, 2012 at 10:39:54PM -0700, Matthew Ogilvie wrote:
> Make pit_get_out() consistent with the spec.
> 
> Currently pit_get_out() doesn't affect IRQ0, but it can be read by the
> guest in other ways. This makes it consistent with proposed changes in
> qemu's i8254 model as well.
> 
> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> or search the net for 23124406.pdf.
> 
> Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net
> ---
>  arch/x86/kvm/i8254.c | 44 ++++++++++++++++++++++++++++++++----------
>  1 file changed, 34 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> index cd4ec60..fd38938 100644
> --- a/arch/x86/kvm/i8254.c
> +++ b/arch/x86/kvm/i8254.c
> @@ -144,6 +144,10 @@ static int pit_get_count(struct kvm *kvm, int channel)
>  
>  	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
>  
> +	/* FIXME: Add some way to represent a paused timer and return
> +	 * the paused-at counter value, to better model gate pausing,
> +	 * wait until next CLK pulse to load counter logic, etc.
> +	 */
>  	t = kpit_elapsed(kvm, c, channel);
>  	d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
>  
> @@ -155,8 +159,7 @@ static int pit_get_count(struct kvm *kvm, int channel)
>  		counter = (c->count - d) & 0xffff;
>  		break;
>  	case 3:
> -		/* XXX: may be incorrect for odd counts */
> -		counter = c->count - (mod_64((2 * d), c->count));
> +		counter = (c->count - (mod_64((2 * d), c->count))) & 0xfffe;
>  		break;
>  	default:
>  		counter = c->count - mod_64(d, c->count);
> @@ -180,20 +183,18 @@ static int pit_get_out(struct kvm *kvm, int channel)
>  
>  	switch (c->mode) {
>  	default:
>  	case 0:
> -		out = (d >= c->count);
> -		break;
>  	case 1:
> -		out = (d < c->count);
> +		out = (d >= c->count);
>  		break;
>  	case 2:
> -		out = ((mod_64(d, c->count) == 0) && (d != 0));
> +		out = (mod_64(d, c->count) != (c->count - 1) || c->gate == 0);
>  		break;
>  	case 3:
> -		out = (mod_64(d, c->count) < ((c->count + 1) >> 1));
> +		out = (mod_64(d, c->count) < ((c->count + 1) >> 1) || c->gate == 0);
>  		break;
>  	case 4:
>  	case 5:
> -		out = (d == c->count);
> +		out = (d != c->count);
>  		break;
>  	}
>  
> @@ -367,7 +368,7 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
>  
>  	/*
>  	 * The largest possible initial count is 0; this is equivalent
> -	 * to 2^16 for binary counting and 10^4 for BCD counting.
> +	 * to pow(2,16) for binary counting and pow(10,4) for BCD counting.
>  	 */
>  	if (val == 0)
>  		val = 0x10000;
> @@ -376,6 +377,26 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
>  
>  	if (channel != 0) {
>  		ps->channels[channel].count_load_time = ktime_get();
> +
> +		/* In gate-triggered one-shot modes,
> +		 * indirectly model some pit_get_out()
> +		 * cases by setting the load time way
> +		 * back until gate-triggered.
> +		 * (Generally only affects reading status
> +		 * from channel 2 speaker,
> +		 * due to hard-wired gates on other
> +		 * channels.)
> +		 *
> +		 * FIXME: This might be redesigned if a paused
> +		 * timer state is added for pit_get_count().
> +		 */
> +		if (ps->channels[channel].mode == 1 ||
> +		    ps->channels[channel].mode == 5) {
> +			u64 delta = muldiv64(val+2, NSEC_PER_SEC, KVM_PIT_FREQ);
> +			ps->channels[channel].count_load_time =
> +				ktime_sub(ps->channels[channel].count_load_time,
> +					  ns_to_ktime(delta));
I do not understand what you are trying to do here. You assume that the trigger will happen 2 clocks after the counter is loaded?
> +		}
>  		return;
>  	}
>  
> @@ -383,7 +404,6 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
>  	 * mode 1 is one shot, mode 2 is period, otherwise del timer */
>  	switch (ps->channels[0].mode) {
>  	case 0:
> -	case 1:
>  		/* FIXME: enhance mode 4 precision */
>  	case 4:
>  		create_pit_timer(kvm, val, 0);
> @@ -393,6 +413,10 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
>  		create_pit_timer(kvm, val, 1);
>  		break;
>  	default:
> +		/* Modes 1 and 5 are triggered by gate leading edge,
> +		 * but channel 0's gate is hard-wired high and has
> +		 * no edges (on normal real hardware).
> +		 */
>  		destroy_pit_timer(kvm->arch.vpit);
>  	}
>  }
> -- 
> 1.7.10.2.484.gcd07cc5

-- 
	Gleb.
Re: [PATCH qom-cpu 11/11] target-i386: check/enforce: Check all feature words
On Sun, Jan 06, 2013 at 04:35:51PM +0200, Gleb Natapov wrote:
On Fri, Jan 04, 2013 at 08:01:12PM -0200, Eduardo Habkost wrote:

This adds the following feature words to the list of flags to be checked
by kvm_check_features_against_host():
- cpuid_7_0_ebx_features
- ext4_features
- kvm_features
- svm_features

This will ensure the enforce flag works as it should: it won't allow QEMU
to be started unless every flag that was requested by the user or defined
in the CPU model is supported by the host.

This patch may cause existing configurations where enforce wasn't
preventing QEMU from being started to abort QEMU. But that's exactly the
point of this patch: if a flag was not supported by the host and QEMU
wasn't aborting, it was a bug in the enforce code.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com
Cc: kvm@vger.kernel.org
Cc: libvir-l...@redhat.com
Cc: Jiri Denemark jdene...@redhat.com

CCing libvirt people, as this is directly related to the planned usage of
the enforce flag by libvirt.

The libvirt team probably has a problem on their hands: libvirt should use
enforce to make sure all requested flags are making their way into the
guest (so the resulting CPU is always the same, on any host), but users
may have existing working configurations where a flag is not supported by
the host and the user really doesn't care about it. Those configurations
will necessarily break when libvirt starts using enforce.

One example where it may cause trouble for common setups: pc-1.3 wants the
kvm_pv_eoi flag enabled by default (so enforce will make sure it is
enabled), but the user may have an existing VM running on a host without
pv_eoi support. That setup is unsafe today because live-migration between
different host kernel versions may enable/disable pv_eoi silently (that's
why we need the enforce flag to be used by libvirt), but the user probably
would like to be able to live-migrate that VM anyway (and have libvirt
just do the right thing).

One possible solution for libvirt is to use enforce only on newer
machine-types, so existing machines with older machine-types will keep the
unsafe host-dependent-ABI behavior, but at least would keep live-migration
working in case the user is careful.

I really don't know what the libvirt team prefers, but that's the
situation today. The longer we take to make enforce strict as it should be
and make libvirt finally use it, the more users will have VMs with
migration-unsafe unpredictable guest ABIs.

Changes v2:
- Coding style fix
---
 target-i386/cpu.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 876b0f6..52727ad 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -955,8 +955,9 @@ static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
     return 0;
 }

-/* best effort attempt to inform user requested cpu flags aren't making
- * their way to the guest.
+/* Check if all requested cpu flags are making their way to the guest
+ *
+ * Returns 0 if all flags are supported by the host, non-zero otherwise.
  *
  * This function may be called only if KVM is enabled.
  */
@@ -973,7 +974,15 @@ static int kvm_check_features_against_host(x86_def_t *guest_def)
         {&guest_def->ext2_features, &host_def.ext2_features,
             ext2_feature_name, 0x80000001, R_EDX},
         {&guest_def->ext3_features, &host_def.ext3_features,
-            ext3_feature_name, 0x80000001, R_ECX}
+            ext3_feature_name, 0x80000001, R_ECX},
+        {&guest_def->ext4_features, &host_def.ext4_features,
+            NULL, 0xC0000001, R_EDX},

Since there is no name array for ext4_features they cannot be added or
removed on the command line, hence no need to check them, no?

In theory, yes. But it won't hurt to check it, and it will be useful to
unify the list of feature words in a single place, so we can be sure the
checking/filtering/setting code at kvm_check_features_against_host()/
kvm_filter_features_for_host()/kvm_cpu_fill_host() will all
check/filter/set exactly the same feature words.

+        {&guest_def->cpuid_7_0_ebx_features, &host_def.cpuid_7_0_ebx_features,
+            cpuid_7_0_ebx_feature_name, 7, R_EBX},
+        {&guest_def->svm_features, &host_def.svm_features,
+            svm_feature_name, 0x8000000A, R_EDX},
+        {&guest_def->kvm_features, &host_def.kvm_features,
+            kvm_feature_name, KVM_CPUID_FEATURES, R_EAX},
     };

     assert(kvm_enabled());
--
1.7.11.7

--
	Eduardo
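The guest-versus-host comparison that kvm_check_features_against_host() performs can be illustrated with a hypothetical miniature (not QEMU's actual code): each feature word is a guest bitmask, a host bitmask, and an optional name, and any requested bit the host lacks fails the check, as enforce mode requires.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical miniature of a feature-word table entry: the struct name
 * and fields are illustrative, not QEMU's x86_def_t machinery. */
struct feature_word {
    uint32_t guest;        /* bits requested for the guest */
    uint32_t host;         /* bits the host supports */
    const char *name;      /* may be NULL, like the 0xC0000001 leaf */
};

/* Compare each guest feature word with the host's; record the missing
 * bits and return non-zero if any requested bit is unavailable
 * (where enforce mode would abort QEMU). */
static int check_features(const struct feature_word *fw, int n,
                          uint32_t *missing)
{
    int rc = 0;
    for (int i = 0; i < n; i++) {
        missing[i] = fw[i].guest & ~fw[i].host;
        if (missing[i]) {
            fprintf(stderr, "unavailable host feature(s) in %s: 0x%x\n",
                    fw[i].name ? fw[i].name : "<unnamed leaf>", missing[i]);
            rc = 1;
        }
    }
    return rc;
}
```

This also shows why adding every feature word to the table is cheap: an entry whose guest and host masks already agree contributes nothing, while a forgotten word silently weakens the check.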
Re: [PATCH qom-cpu 11/11] target-i386: check/enforce: Check all feature words
On Mon, Jan 07, 2013 at 10:06:21AM -0200, Eduardo Habkost wrote:
On Sun, Jan 06, 2013 at 04:35:51PM +0200, Gleb Natapov wrote:
[...]

Since there is no name array for ext4_features they cannot be added or
removed on the command line, hence no need to check them, no?

In theory, yes. But it won't hurt to check it, and it will be useful to
unify the list of feature words in a single place, so we can be sure the
checking/filtering/setting code at kvm_check_features_against_host()/
kvm_filter_features_for_host()/kvm_cpu_fill_host() will all
check/filter/set exactly the same feature words.

May be add a name array for the leaf? :)

--
	Gleb.
Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled
On Mon, Jan 07, 2013 at 01:42:53PM +0200, Gleb Natapov wrote:
On Mon, Jan 07, 2013 at 09:42:36AM -0200, Eduardo Habkost wrote:
On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote:
On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote:

This is a cleanup that tries to solve two small issues:

- We don't need a separate kvm_pv_eoi_features variable just to keep a
  constant calculated at compile-time, and this style would require
  adding a separate variable (declared twice because of the CONFIG_KVM
  ifdef) for each feature that's going to be enabled/disabled by
  machine-type compat code.
- The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features
  even when KVM is disabled at runtime. This small inconsistency in the
  cpuid_kvm_features field isn't a problem today because
  cpuid_kvm_features is ignored by the TCG code, but it may cause
  unexpected problems later when refactoring the CPUID handling code.

This patch eliminates the kvm_pv_eoi_features variable and simply uses
CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat
function, so it enables kvm_pv_eoi only if KVM is enabled. I believe this
makes the behavior of enable_kvm_pv_eoi() clearer and easier to
understand.

Signed-off-by: Eduardo Habkost ehabk...@redhat.com
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin m...@redhat.com
Cc: Gleb Natapov g...@redhat.com
Cc: Marcelo Tosatti mtosa...@redhat.com

Changes v2:
- Coding style fix
---
 target-i386/cpu.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 82685dc..e6435da 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
         (1 << KVM_FEATURE_ASYNC_PF) |
         (1 << KVM_FEATURE_STEAL_TIME) |
         (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT);
-static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI);
 #else
 static uint32_t kvm_default_features = 0;
-static const uint32_t kvm_pv_eoi_features = 0;
 #endif

 void enable_kvm_pv_eoi(void)
 {
-    kvm_default_features |= kvm_pv_eoi_features;
+#ifdef CONFIG_KVM

You do not need ifdef here.

We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is
set. I could also write it as:

    if (kvm_enabled()) {
#ifdef CONFIG_KVM
        kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
#endif
    }

But I find it less readable.

Why not define KVM_FEATURE_PV_EOI unconditionally?

It comes from the KVM kernel headers, which are included only if
CONFIG_KVM is set, and probably won't even compile on non-Linux systems.
I have a deja vu feeling. I believe we had this exact problem before,
maybe with some other #defines that come from the Linux KVM headers and
won't be available on non-Linux systems.

+    if (kvm_enabled()) {
+        kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI);
+    }
+#endif
 }

 void host_cpuid(uint32_t function, uint32_t count,
--
1.7.11.7

--
	Eduardo
Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled
On Mon, Jan 07, 2013 at 10:09:24AM -0200, Eduardo Habkost wrote:
On Mon, Jan 07, 2013 at 01:42:53PM +0200, Gleb Natapov wrote:
[...]

Why not define KVM_FEATURE_PV_EOI unconditionally?

It comes from the KVM kernel headers, which are included only if
CONFIG_KVM is set, and probably won't even compile on non-Linux systems.
I have a deja vu feeling. I believe we had this exact problem before,
maybe with some other #defines that come from the Linux KVM headers and
won't be available on non-Linux systems.

It is better to hide all KVM-related differences somewhere in the headers
where no one sees them instead of sprinkling them all over the code. We
can put those defines in include/sysemu/kvm.h in the !CONFIG_KVM part. Or
have one #ifdef CONFIG_KVM at the beginning of the file, define
enable_kvm_pv_eoi() there, and provide an empty stub otherwise.

--
	Gleb.
Re: [PATCH qom-cpu 11/11] target-i386: check/enforce: Check all feature words
On Mon, Jan 07, 2013 at 02:06:38PM +0200, Gleb Natapov wrote:
On Mon, Jan 07, 2013 at 10:06:21AM -0200, Eduardo Habkost wrote:
[...]

But it won't hurt to check it, and it will be useful to unify the list of
feature words in a single place, so we can be sure the
checking/filtering/setting code at kvm_check_features_against_host()/
kvm_filter_features_for_host()/kvm_cpu_fill_host() will all
check/filter/set exactly the same feature words.

May be add a name array for the leaf? :)

If anybody finds reliable documentation about the 0xC0000001 CPUID bits, I
would happily do it. :-) While we don't have the docs and feature names, I
still believe that having the complete list of feature words in the
kvm_check_features_against_host() code will save us trouble later,
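If the bit names were ever nailed down, a name array for the leaf could follow the style of QEMU's other feature_name[] tables. The sketch below is hypothetical: the strings mirror Linux's /proc/cpuinfo flags for the 0xC0000001 (VIA PadLock) leaf, which is an assumption on my part, not the authoritative documentation the thread is asking for.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name array for the 0xC0000001 leaf, indexed by bit
 * position. Names are borrowed from Linux's x86 cpufeature strings
 * (rng/ace/phe/pmm and their *_en enable bits) and are illustrative. */
static const char *ext4_feature_name[32] = {
    [2]  = "rng",  [3]  = "rng_en",
    [6]  = "ace",  [7]  = "ace_en",
    [8]  = "ace2", [9]  = "ace2_en",
    [10] = "phe",  [11] = "phe_en",
    [12] = "pmm",  [13] = "pmm_en",
};

/* Resolve a bit position to a printable name, with a fallback for
 * bits that have no known name (the current NULL-array situation). */
static const char *feature_bit_name(const char *const *names, int bit)
{
    if (bit < 0 || bit >= 32 || !names || !names[bit])
        return "<unknown>";
    return names[bit];
}
```

With such a table in place, the enforce-mode warning could print a flag name instead of a raw bitmask even for this leaf.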
Re: [PATCH qom-cpu 11/11] target-i386: check/enforce: Check all feature words
On Mon, Jan 07, 2013 at 10:19:15AM -0200, Eduardo Habkost wrote:
[...]

If anybody finds reliable documentation about the 0xC0000001 CPUID bits, I would
Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled
On Mon, Jan 07, 2013 at 02:15:59PM +0200, Gleb Natapov wrote:
On Mon, Jan 07, 2013 at 10:09:24AM -0200, Eduardo Habkost wrote:
[...]

It is better to hide all KVM-related differences somewhere in the headers
where no one sees them instead of sprinkling them all over the code. We
can put those defines in include/sysemu/kvm.h in the !CONFIG_KVM part. Or
have one #ifdef CONFIG_KVM at the beginning of the file, define
enable_kvm_pv_eoi() there, and provide an empty stub otherwise.

If we had an empty enable_kvm_pv_eoi() stub, we would need an #ifdef
around the real implementation. I mean, I don't think this:

#ifdef CONFIG_KVM
int enable_kvm_pv_eoi() {
    [...]
}
#endif

is any better than this:

int enable_kvm_pv_eoi() {
#ifdef CONFIG_KVM
    [...]
#endif
}

So this is probably a good reason to duplicate the KVM_FEATURE_* #defines
in the QEMU code, instead?

--
	Eduardo
Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled
On Mon, Jan 07, 2013 at 10:30:40AM -0200, Eduardo Habkost wrote:
On Mon, Jan 07, 2013 at 02:15:59PM +0200, Gleb Natapov wrote:
[...]

If we had an empty enable_kvm_pv_eoi() stub, we would need an #ifdef
around the real implementation. I mean, I don't think this:

#ifdef CONFIG_KVM
int enable_kvm_pv_eoi() {
    [...]
}
#endif

You already have #ifdef CONFIG_KVM just above enable_kvm_pv_eoi(). Put
everything KVM-related there instead of adding #ifdef CONFIG_KVM all over
the file.

is any better than this:

int enable_kvm_pv_eoi() {
#ifdef CONFIG_KVM
    [...]
#endif
}

So this is probably a good reason to duplicate the KVM_FEATURE_* #defines
in the QEMU code, instead?

Not even duplicate, they can be fake, just to keep the compiler happy.

--
	Gleb.
Re: [Qemu-devel] [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high
On Mon, Jan 07, 2013 at 11:39:18AM +0200, Gleb Natapov wrote: On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote: Reading the spec, it is clear that most modes normally leave the IRQ output line high, and only pulse it low to generate a leading edge. Especially the most commonly used mode 2. The KVM i8254 model does not try to emulate the duration of the pulse at all, so just swap the high/low settings to leave it high most of the time. This fix is a prerequisite to improving the i8259 model to handle the trailing edge of an interrupt request as indicated in its spec: If it gets a trailing edge of an IRQ line before it starts to service the interrupt, the request should be canceled. See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz or search the net for 23124406.pdf. Risks: There is a risk that migrating a running guest between versions with and without this patch will lose or gain a single timer interrupt during the migration process. The only case where Can you elaborate on how exactly this can happen? Do not see it. this is likely to be serious is probably losing a single-shot (mode 4) interrupt, but if my understanding of how things work is good, then that should only be possible if a whole slew of conditions are all met: 1. The guest is configured to run in a tickless mode (like modern Linux). 2. The guest is for some reason still using the i8254 rather than something more modern like an HPET. (The combination of 1 and 2 should be rare.) This is not so rare. For performance reasons it is better to not have HPET at all. In fact -no-hpet is how I would advise anyone to run qemu. It looks like Linux prefers to use the APIC timer anyway. 3. The migration is going from a fixed version back to the old version. (Not sure how common this is, but it should be rarer than migrating from old to new.) 4. 
There are not going to be any timely events/interrupts (keyboard, network, process sleeps, etc) that cause the guest to reset the PIT mode 4 one-shot counter soon enough. This combination should be rare enough that more complicated solutions are not worth the effort. Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net --- arch/x86/kvm/i8254.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index c1d30b2..cd4ec60 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work) } spin_unlock(&ps->inject_lock); if (inject) { - kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); + /* Clear previous interrupt, then create a rising +* edge to request another interrupt, and leave it at +* level=1 until time to inject another one. +*/ kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0); + kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); /* * Provides NMI watchdog support via Virtual Wire mode. -- 1.7.10.2.484.gcd07cc5 -- Gleb.
RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
Gleb Natapov wrote on 2013-01-07: On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote: From: Yang Zhang yang.z.zh...@intel.com With virtual interrupt delivery, KVM no longer needs to inject vAPIC interrupts manually; this is fully taken care of by the hardware. This needs some special awareness in the existing interrupt injection path: - for a pending interrupt, instead of direct injection, we may need to update architecture specific indicators before resuming to guest. - A pending interrupt, which is masked by ISR, should be also considered in the above update action, since hardware will decide when to inject it at the right time. Current has_interrupt and get_interrupt only return a valid vector from the injection p.o.v. Signed-off-by: Kevin Tian kevin.t...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch/ia64/kvm/lapic.h |6 ++ arch/x86/include/asm/kvm_host.h |8 ++ arch/x86/include/asm/vmx.h | 11 +++ arch/x86/kvm/irq.c | 56 +++- arch/x86/kvm/lapic.c| 87 +++--- arch/x86/kvm/lapic.h| 29 +- arch/x86/kvm/svm.c | 36 arch/x86/kvm/vmx.c | 190 ++- arch/x86/kvm/x86.c | 11 ++- include/linux/kvm_host.h|2 + virt/kvm/ioapic.c | 41 + virt/kvm/ioapic.h |1 + virt/kvm/irq_comm.c | 20 13 files changed, 451 insertions(+), 47 deletions(-) diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h index c5f92a9..cb59eb4 100644 --- a/arch/ia64/kvm/lapic.h +++ b/arch/ia64/kvm/lapic.h @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); #define kvm_apic_present(x) (true) #define kvm_lapic_enabled(x) (true) +static inline void kvm_update_eoi_exitmap(struct kvm *kvm, +struct kvm_lapic_irq *irq) +{ +/* IA64 has no apicv support, do nothing here */ +} + #endif diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c431b33..135603f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,13 @@ struct kvm_x86_ops { void (*enable_nmi_window)(struct kvm_vcpu *vcpu); void 
(*enable_irq_window)(struct kvm_vcpu *vcpu); void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); +int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); +void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr); +void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq); +void (*update_exitmap_start)(struct kvm_vcpu *vcpu); +void (*update_exitmap_end)(struct kvm_vcpu *vcpu); +void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); The amount of callbacks to update the exit bitmap starts to become insane. As per your suggestion below, if using a global lock, then three callbacks are enough +void (*restore_rvi)(struct kvm_vcpu *vcpu); rvi? Call it set_svi() and make it do just that - set svi. Typo. diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 0664c13..e1baf37 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -133,6 +133,12 @@ static inline int apic_enabled(struct kvm_lapic *apic) return kvm_apic_sw_enabled(apic) && kvm_apic_hw_enabled(apic); } +bool kvm_apic_present(struct kvm_vcpu *vcpu) +{ + return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); +} +EXPORT_SYMBOL_GPL(kvm_apic_present); + Why is this change? Drop it. I cannot remember why. But it seems this change is needless now. 
#define LVT_MASK\ (APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK) @@ -150,23 +156,6 @@ static inline int kvm_apic_id(struct kvm_lapic *apic) return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff; } -static inline u16 apic_cluster_id(struct kvm_apic_map *map, u32 ldr) -{ -u16 cid; -ldr >>= 32 - map->ldr_bits; -cid = (ldr >> map->cid_shift) & map->cid_mask; - -BUG_ON(cid >= ARRAY_SIZE(map->logical_map)); - -return cid; -} - -static inline u16 apic_logical_id(struct kvm_apic_map *map, u32 ldr) -{ -ldr >>= (32 - map->ldr_bits); -return ldr & map->lid_mask; -} - static void recalculate_apic_map(struct kvm *kvm) { struct kvm_apic_map *new, *old = NULL; @@ -236,12 +225,14 @@ static inline void kvm_apic_set_id(struct kvm_lapic *apic, u8 id) { apic_set_reg(apic, APIC_ID, id << 24); recalculate_apic_map(apic->vcpu->kvm); + ioapic_update_eoi_exitmap(apic->vcpu->kvm); } static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id) { apic_set_reg(apic, APIC_LDR, id); recalculate_apic_map(apic->vcpu->kvm); + ioapic_update_eoi_exitmap(apic->vcpu->kvm); } static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type) @@ -345,6 +336,9 @@ static inline int apic_find_highest_irr(struct
Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set
On Mon, 7 Jan 2013 10:00:09 -0200 Eduardo Habkost ehabk...@redhat.com wrote: On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote: On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote: This will be necessary once kvm_check_features_against_host() starts using KVM-specific definitions (so it won't compile anymore if CONFIG_KVM is not set). Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- target-i386/cpu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 1c3c7e1..876b0f6 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def) #endif /* CONFIG_KVM */ } +#ifdef CONFIG_KVM static int unavailable_host_feature(struct model_features_t *f, uint32_t mask) { int i; @@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def) } return rv; } +#endif static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque, const char *name, Error **errp) @@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features) x86_cpu_def->kvm_features &= ~minus_kvm_features; x86_cpu_def->svm_features &= ~minus_svm_features; x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features; +#ifdef CONFIG_KVM if (check_cpuid && kvm_enabled()) { if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid) goto error; } +#endif Provide a kvm_check_features_against_host() stub if !CONFIG_KVM and drop the ifdef here. I will do. Igor probably will have to change his target-i386: move kvm_check_features_against_host() check to realize time patch to use the same approach, too. Gleb, Why do a stub here? As a result we will be adding more ifdef-s just in other places. Currently kvm_cpu_fill_host(), unavailable_host_feature() and kvm_check_features_against_host() are bundled together in cpu.c so we could instead ifdef the whole block. 
Like here: http://www.mail-archive.com/qemu-devel@nongnu.org/msg146536.html For me the code looks more readable with an ifdef here; if we have a stub, a reader would have to look at the kvm_check_features_against_host() body to see if it does anything. -- Eduardo -- Regards, Igor
Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set
On Mon, Jan 07, 2013 at 02:15:14PM +0100, Igor Mammedov wrote: On Mon, 7 Jan 2013 10:00:09 -0200 Eduardo Habkost ehabk...@redhat.com wrote: On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote: On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote: This will be necessary once kvm_check_features_against_host() starts using KVM-specific definitions (so it won't compile anymore if CONFIG_KVM is not set). Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- target-i386/cpu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 1c3c7e1..876b0f6 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def) #endif /* CONFIG_KVM */ } +#ifdef CONFIG_KVM static int unavailable_host_feature(struct model_features_t *f, uint32_t mask) { int i; @@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def) } return rv; } +#endif static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque, const char *name, Error **errp) @@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features) x86_cpu_def->kvm_features &= ~minus_kvm_features; x86_cpu_def->svm_features &= ~minus_svm_features; x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features; +#ifdef CONFIG_KVM if (check_cpuid && kvm_enabled()) { if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid) goto error; } +#endif Provide a kvm_check_features_against_host() stub if !CONFIG_KVM and drop the ifdef here. I will do. Igor probably will have to change his target-i386: move kvm_check_features_against_host() check to realize time patch to use the same approach, too. Gleb, Why do a stub here? As a result we will be adding more ifdef-s just in other places. Currently kvm_cpu_fill_host(), unavailable_host_feature() and kvm_check_features_against_host() are bundled together in cpu.c so we could instead ifdef the whole block. 
Like here: http://www.mail-archive.com/qemu-devel@nongnu.org/msg146536.html For me code looks more readable with ifdef here, if we have stub, a reader would have to look at kvm_check_features_against_host() body to see if it does anything. If CONFIG_KVM is not set, kvm_enabled() is always zero, so the function would never be called, so I find the ifdef-less code more readable and obvious. What I don't know is if we should do this: #ifdef CONFIG_KVM static int kvm_check_features_against_host(...) { /* real implementation here */ } static int kvm_do_something_else(...) { /* real implementation here */ } /* Other kvm_* functions here */ #else static int kvm_check_features_against_host(...) { } static int kvm_do_something_else(...) { } /* Other kvm_* stubs here */ #endif /* CONFIG_KVM */ Or this: static int kvm_check_features_against_host(...) { #ifdef CONFIG_KVM /* real implementation here */ #endif /* CONFIG_KVM */ } static int kvm_do_something_else(...) { #ifdef CONFIG_KVM /* real implementation here */ #endif /* CONFIG_KVM */ } I believe the latter is better, but based on Gleb's comments about enable_kvm_pv_eoi(), he seems to prefer the former. -- Eduardo
Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set
On Mon, Jan 07, 2013 at 02:15:14PM +0100, Igor Mammedov wrote: On Mon, 7 Jan 2013 10:00:09 -0200 Eduardo Habkost ehabk...@redhat.com wrote: On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote: On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote: This will be necessary once kvm_check_features_against_host() starts using KVM-specific definitions (so it won't compile anymore if CONFIG_KVM is not set). Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- target-i386/cpu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 1c3c7e1..876b0f6 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def) #endif /* CONFIG_KVM */ } +#ifdef CONFIG_KVM static int unavailable_host_feature(struct model_features_t *f, uint32_t mask) { int i; @@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def) } return rv; } +#endif static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque, const char *name, Error **errp) @@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features) x86_cpu_def->kvm_features &= ~minus_kvm_features; x86_cpu_def->svm_features &= ~minus_svm_features; x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features; +#ifdef CONFIG_KVM if (check_cpuid && kvm_enabled()) { if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid) goto error; } +#endif Provide a kvm_check_features_against_host() stub if !CONFIG_KVM and drop the ifdef here. I will do. Igor probably will have to change his target-i386: move kvm_check_features_against_host() check to realize time patch to use the same approach, too. Gleb, Why do a stub here? As a result we will be adding more ifdef-s just in other places. Currently kvm_cpu_fill_host(), unavailable_host_feature() and Why will we be adding more ifdef-s in other places? 
kvm_check_features_against_host() are bundled together in cpu.c so we could instead ifdef the whole block. Like here: http://www.mail-archive.com/qemu-devel@nongnu.org/msg146536.html That's fine, but you can avoid things like: if (kvm_enabled() && name && strcmp(name, "host") == 0) { +#ifdef CONFIG_KVM kvm_cpu_fill_host(x86_cpu_def); +#endif in your patch by providing a stub for kvm_cpu_fill_host() for the !CONFIG_KVM case. This is common practice really. Avoid ifdefs in the code. For me the code looks more readable with an ifdef here; if we have a stub, a reader would have to look at the kvm_check_features_against_host() body to see if it does anything. If the reader cares about kvm he has to anyway. If he does not, there is the friendly kvm_enabled() (which is a stub in case of !CONFIG_KVM, BTW) to tell him that he does not care. No need for an additional ifdef there. -- Gleb.
Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote: From: Yang Zhang yang.z.zh...@intel.com With virtual interrupt delivery, KVM no longer needs to inject vAPIC interrupts manually; this is fully taken care of by the hardware. This needs some special awareness in the existing interrupt injection path: - for a pending interrupt, instead of direct injection, we may need to update architecture specific indicators before resuming to guest. - A pending interrupt, which is masked by ISR, should be also considered in the above update action, since hardware will decide when to inject it at the right time. Current has_interrupt and get_interrupt only return a valid vector from the injection p.o.v. Signed-off-by: Kevin Tian kevin.t...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch/ia64/kvm/lapic.h |6 ++ arch/x86/include/asm/kvm_host.h |8 ++ arch/x86/include/asm/vmx.h | 11 +++ arch/x86/kvm/irq.c | 56 +++- arch/x86/kvm/lapic.c| 87 +++--- arch/x86/kvm/lapic.h| 29 +- arch/x86/kvm/svm.c | 36 arch/x86/kvm/vmx.c | 190 ++- arch/x86/kvm/x86.c | 11 ++- include/linux/kvm_host.h|2 + virt/kvm/ioapic.c | 41 + virt/kvm/ioapic.h |1 + virt/kvm/irq_comm.c | 20 13 files changed, 451 insertions(+), 47 deletions(-) diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h index c5f92a9..cb59eb4 100644 --- a/arch/ia64/kvm/lapic.h +++ b/arch/ia64/kvm/lapic.h @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); #define kvm_apic_present(x) (true) #define kvm_lapic_enabled(x) (true) +static inline void kvm_update_eoi_exitmap(struct kvm *kvm, + struct kvm_lapic_irq *irq) +{ + /* IA64 has no apicv support, do nothing here */ +} + #endif diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c431b33..135603f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,13 @@ struct kvm_x86_ops { void (*enable_nmi_window)(struct kvm_vcpu *vcpu); void (*enable_irq_window)(struct kvm_vcpu *vcpu); void 
(*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr); + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq); + void (*update_exitmap_start)(struct kvm_vcpu *vcpu); + void (*update_exitmap_end)(struct kvm_vcpu *vcpu); + void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); + void (*restore_rvi)(struct kvm_vcpu *vcpu); int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); @@ -991,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva); int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v); int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); int kvm_cpu_get_interrupt(struct kvm_vcpu *v); diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 44c3f7e..d1ab331 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -62,6 +62,7 @@ #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS 44 +#define EXIT_REASON_EOI_INDUCED 45 #define EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG 49 #define EXIT_REASON_WBINVD 54 @@ -143,6 +144,7 @@ #define SECONDARY_EXEC_WBINVD_EXITING 0x00000040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x00000080 #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x00000100 +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 #define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 @@ -180,6 +182,7 @@ enum vmcs_field { GUEST_GS_SELECTOR = 0x080a, GUEST_LDTR_SELECTOR = 0x080c, GUEST_TR_SELECTOR = 0x080e, + GUEST_INTR_STATUS = 0x0810, HOST_ES_SELECTOR = 
0x0c00, HOST_CS_SELECTOR = 0x0c02, HOST_SS_SELECTOR = 0x0c04, @@ -207,6 +210,14 @@ enum vmcs_field { APIC_ACCESS_ADDR_HIGH = 0x2015, EPT_POINTER =
Re: [PATCH qom-cpu 01/11] target-i386: Don't set any KVM flag by default if KVM is disabled
On Mon, Jan 07, 2013 at 02:33:25PM +0200, Gleb Natapov wrote: On Mon, Jan 07, 2013 at 10:30:40AM -0200, Eduardo Habkost wrote: On Mon, Jan 07, 2013 at 02:15:59PM +0200, Gleb Natapov wrote: On Mon, Jan 07, 2013 at 10:09:24AM -0200, Eduardo Habkost wrote: On Mon, Jan 07, 2013 at 01:42:53PM +0200, Gleb Natapov wrote: On Mon, Jan 07, 2013 at 09:42:36AM -0200, Eduardo Habkost wrote: On Sun, Jan 06, 2013 at 01:32:34PM +0200, Gleb Natapov wrote: On Fri, Jan 04, 2013 at 08:01:02PM -0200, Eduardo Habkost wrote: This is a cleanup that tries to solve two small issues: - We don't need a separate kvm_pv_eoi_features variable just to keep a constant calculated at compile-time, and this style would require adding a separate variable (that's declared twice because of the CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled by machine-type compat code. - The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features even when KVM is disabled at runtime. This small inconsistency in the cpuid_kvm_features field isn't a problem today because cpuid_kvm_features is ignored by the TCG code, but it may cause unexpected problems later when refactoring the CPUID handling code. This patch eliminates the kvm_pv_eoi_features variable and simply uses CONFIG_KVM and kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it enables kvm_pv_eoi only if KVM is enabled. I believe this makes the behavior of enable_kvm_pv_eoi() clearer and easier to understand. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: kvm@vger.kernel.org Cc: Michael S. 
Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Changes v2: - Coding style fix --- target-i386/cpu.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 82685dc..e6435da 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -145,15 +145,17 @@ static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) | (1 << KVM_FEATURE_ASYNC_PF) | (1 << KVM_FEATURE_STEAL_TIME) | (1 << KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); -static const uint32_t kvm_pv_eoi_features = (0x1 << KVM_FEATURE_PV_EOI); #else static uint32_t kvm_default_features = 0; -static const uint32_t kvm_pv_eoi_features = 0; #endif void enable_kvm_pv_eoi(void) { -kvm_default_features |= kvm_pv_eoi_features; +#ifdef CONFIG_KVM You do not need ifdef here. We need it because KVM_FEATURE_PV_EOI is available only if CONFIG_KVM is set. I could also write it as: if (kvm_enabled()) { #ifdef CONFIG_KVM kvm_default_features |= (1UL << KVM_FEATURE_PV_EOI); #endif } But I find it less readable. Why not define KVM_FEATURE_PV_EOI unconditionally? It comes from the KVM kernel headers, that are included only if CONFIG_KVM is set, and probably won't even compile on non-Linux systems. I have a deja vu feeling. I believe we had this exact problem before, maybe about some other #defines that come from the Linux KVM headers and won't be available on non-Linux systems. It is better to hide all KVM related differences somewhere in the headers where no one sees them instead of sprinkling them all over the code. We can put those defines in include/sysemu/kvm.h in the !CONFIG_KVM part. Or have one ifdef CONFIG_KVM at the beginning of the file and define enable_kvm_pv_eoi() there and provide an empty stub otherwise. If we had an empty enable_kvm_pv_eoi() stub, we would need an #ifdef around the real implementation. I mean, I don't think this: #ifdef CONFIG_KVM int enable_kvm_pv_eoi() { [...] 
} #endif You already have #ifdef CONFIG_KVM just above enable_kvm_pv_eoi(). Put everything KVM related there instead of adding #ifdef CONFIG_KVM all over the file. But it also creates the need to write a separate stub function somewhere else, while we could have a ready-to-use stub function automatically by simply #ifdefing the whole function body. But anyway: this won't matter if we choose the duplicate/fake #defines approach mentioned below. is any better than this: int enable_kvm_pv_eoi() { #ifdef CONFIG_KVM [...] #endif } So this is probably a good reason to duplicate the
Re: [Qemu-devel] [PATCH qom-cpu 10/11] target-i386: Call kvm_check_features_against_host() only if CONFIG_KVM is set
On Mon, 7 Jan 2013 15:30:26 +0200 Gleb Natapov g...@redhat.com wrote: On Mon, Jan 07, 2013 at 02:15:14PM +0100, Igor Mammedov wrote: On Mon, 7 Jan 2013 10:00:09 -0200 Eduardo Habkost ehabk...@redhat.com wrote: On Sun, Jan 06, 2013 at 04:27:19PM +0200, Gleb Natapov wrote: On Fri, Jan 04, 2013 at 08:01:11PM -0200, Eduardo Habkost wrote: This will be necessary once kvm_check_features_against_host() starts using KVM-specific definitions (so it won't compile anymore if CONFIG_KVM is not set). Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- target-i386/cpu.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 1c3c7e1..876b0f6 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -936,6 +936,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def) #endif /* CONFIG_KVM */ } +#ifdef CONFIG_KVM static int unavailable_host_feature(struct model_features_t *f, uint32_t mask) { int i; @@ -987,6 +988,7 @@ static int kvm_check_features_against_host(x86_def_t *guest_def) } return rv; } +#endif static void x86_cpuid_version_get_family(Object *obj, Visitor *v, void *opaque, const char *name, Error **errp) @@ -1410,10 +1412,12 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features) x86_cpu_def->kvm_features &= ~minus_kvm_features; x86_cpu_def->svm_features &= ~minus_svm_features; x86_cpu_def->cpuid_7_0_ebx_features &= ~minus_7_0_ebx_features; +#ifdef CONFIG_KVM if (check_cpuid && kvm_enabled()) { if (kvm_check_features_against_host(x86_cpu_def) && enforce_cpuid) goto error; } +#endif Provide a kvm_check_features_against_host() stub if !CONFIG_KVM and drop the ifdef here. I will do. Igor probably will have to change his target-i386: move kvm_check_features_against_host() check to realize time patch to use the same approach, too. Gleb, Why do a stub here? As a result we will be adding more ifdef-s just in other places. Currently kvm_cpu_fill_host(), unavailable_host_feature() and Why will we be adding more ifdef-s in other places? 
unavailable_host_feature() is being ifdef-ed above kvm_check_features_against_host() are bundled together in cpu.c so we could instead ifdef the whole block. Like here: http://www.mail-archive.com/qemu-devel@nongnu.org/msg146536.html That's fine, but you can avoid things like: if (kvm_enabled() && name && strcmp(name, "host") == 0) { +#ifdef CONFIG_KVM kvm_cpu_fill_host(x86_cpu_def); +#endif in your patch by providing a stub for kvm_cpu_fill_host() for the !CONFIG_KVM case. This is common practice really. Avoid ifdefs in the code. This ifdef could be eliminated later when cpus are converted into sub-classes. Then we would put the host subclass close to kvm_cpu_fill_host inside of the same ifdef. That would leave the ifdef around kvm_check_features_against_host() in cpu_x86_parse_featurestr(). For me the code looks more readable with an ifdef here; if we have a stub, a reader would have to look at the kvm_check_features_against_host() body to see if it does anything. If the reader cares about kvm he has to anyway. If he does not, there is the friendly kvm_enabled() (which is a stub in case of !CONFIG_KVM, BTW) to tell him that he does not care. No need for an additional ifdef there. Both ways would work, but if stubs are the preferred style then there is no point arguing. -- Gleb. -- Regards, Igor
Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers
On 07/01/2013 01:02, Rusty Russell wrote: Paolo Bonzini pbonz...@redhat.com writes: On 02/01/2013 06:03, Rusty Russell wrote: Paolo Bonzini pbonz...@redhat.com writes: The virtqueue_add_buf function has two limitations: 1) it requires the caller to provide all the buffers in a single call; 2) it does not support chained scatterlists: the buffers must be provided as an array of struct scatterlist; Chained scatterlists are a horrible interface, but that doesn't mean we shouldn't support them if there's a need. I think I once even had a patch which passed two chained sgs, rather than a combo sg and two length numbers. It's very old, but I've pasted it below. Duplicating the implementation by having another interface is pretty nasty; I think I'd prefer the chained scatterlists, if that's optimal for you. Unfortunately, that cannot work because not all architectures support chained scatterlists. WHAT? I can't figure out what an arch needs to do to support this? It needs to use the iterator functions in its DMA driver. All archs we care about support them, though, so I think we can ignore this issue for now. Kind of... In principle all QEMU-supported arches can use virtio, and the speedup can be quite useful. And there is no Kconfig symbol for SG chains that I can use to disable virtio-scsi on unsupported arches. :/ Paolo (Also, as you mention, chained scatterlists are horrible. They'd happen to work for virtio-scsi, but not for virtio-blk, where the response status is part of the footer, not the header). We lost that debate 5 years ago, so we hack around it as needed. We can add helpers to append if we need.
[PATCH 2/2] KVM: s390: Gracefully handle busy conditions on ccw_device_start
From: Christian Borntraeger borntrae...@de.ibm.com In rare cases a virtio command might try to issue a ccw before a former ccw was answered with a tsch. This will cause CC=2 (busy). Let's just retry in that case. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/s390/kvm/virtio_ccw.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c index 70419a7..2edd94a 100644 --- a/drivers/s390/kvm/virtio_ccw.c +++ b/drivers/s390/kvm/virtio_ccw.c @@ -132,11 +132,14 @@ static int ccw_io_helper(struct virtio_ccw_device *vcdev, unsigned long flags; int flag = intparm & VIRTIO_CCW_INTPARM_MASK; - spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags); - ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0); - if (!ret) - vcdev->curr_io |= flag; - spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags); + do { + spin_lock_irqsave(get_ccwdev_lock(vcdev->cdev), flags); + ret = ccw_device_start(vcdev->cdev, ccw, intparm, 0, 0); + if (!ret) + vcdev->curr_io |= flag; + spin_unlock_irqrestore(get_ccwdev_lock(vcdev->cdev), flags); + cpu_relax(); + } while (ret == -EBUSY); wait_event(vcdev->wait_q, doing_io(vcdev, flag) == 0); return ret ? ret : vcdev->err; } -- 1.7.12.4
[PATCH 1/2] KVM: s390: Dynamic allocation of virtio-ccw I/O data.
Dynamically allocate any data structures like ccw used when doing channel I/O. Otherwise, we'd need to add extra serialization for the different callbacks using the same data structures. Reported-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com --- drivers/s390/kvm/virtio_ccw.c | 280 ++ 1 file changed, 174 insertions(+), 106 deletions(-) diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c index 1a5aff3..70419a7 100644 --- a/drivers/s390/kvm/virtio_ccw.c +++ b/drivers/s390/kvm/virtio_ccw.c @@ -46,11 +46,9 @@ struct vq_config_block { struct virtio_ccw_device { struct virtio_device vdev; - __u8 status; + __u8 *status; __u8 config[VIRTIO_CCW_CONFIG_SIZE]; struct ccw_device *cdev; - struct ccw1 *ccw; - __u32 area; __u32 curr_io; int err; wait_queue_head_t wait_q; @@ -127,14 +125,15 @@ static int doing_io(struct virtio_ccw_device *vcdev, __u32 flag) return ret; } -static int ccw_io_helper(struct virtio_ccw_device *vcdev, __u32 intparm) +static int ccw_io_helper(struct virtio_ccw_device *vcdev, +struct ccw1 *ccw, __u32 intparm) { int ret; unsigned long flags; int flag = intparm VIRTIO_CCW_INTPARM_MASK; spin_lock_irqsave(get_ccwdev_lock(vcdev-cdev), flags); - ret = ccw_device_start(vcdev-cdev, vcdev-ccw, intparm, 0, 0); + ret = ccw_device_start(vcdev-cdev, ccw, intparm, 0, 0); if (!ret) vcdev-curr_io |= flag; spin_unlock_irqrestore(get_ccwdev_lock(vcdev-cdev), flags); @@ -167,18 +166,19 @@ static void virtio_ccw_kvm_notify(struct virtqueue *vq) do_kvm_notify(schid, virtqueue_get_queue_index(vq)); } -static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev, int index) +static int virtio_ccw_read_vq_conf(struct virtio_ccw_device *vcdev, + struct ccw1 *ccw, int index) { vcdev-config_block-index = index; - vcdev-ccw-cmd_code = CCW_CMD_READ_VQ_CONF; - vcdev-ccw-flags = 0; - vcdev-ccw-count = sizeof(struct vq_config_block); - vcdev-ccw-cda = (__u32)(unsigned long)(vcdev-config_block); - 
ccw_io_helper(vcdev, VIRTIO_CCW_DOING_READ_VQ_CONF); + ccw-cmd_code = CCW_CMD_READ_VQ_CONF; + ccw-flags = 0; + ccw-count = sizeof(struct vq_config_block); + ccw-cda = (__u32)(unsigned long)(vcdev-config_block); + ccw_io_helper(vcdev, ccw, VIRTIO_CCW_DOING_READ_VQ_CONF); return vcdev-config_block-num; } -static void virtio_ccw_del_vq(struct virtqueue *vq) +static void virtio_ccw_del_vq(struct virtqueue *vq, struct ccw1 *ccw) { struct virtio_ccw_device *vcdev = to_vc_device(vq-vdev); struct virtio_ccw_vq_info *info = vq-priv; @@ -197,11 +197,12 @@ static void virtio_ccw_del_vq(struct virtqueue *vq) info-info_block-align = 0; info-info_block-index = index; info-info_block-num = 0; - vcdev-ccw-cmd_code = CCW_CMD_SET_VQ; - vcdev-ccw-flags = 0; - vcdev-ccw-count = sizeof(*info-info_block); - vcdev-ccw-cda = (__u32)(unsigned long)(info-info_block); - ret = ccw_io_helper(vcdev, VIRTIO_CCW_DOING_SET_VQ | index); + ccw-cmd_code = CCW_CMD_SET_VQ; + ccw-flags = 0; + ccw-count = sizeof(*info-info_block); + ccw-cda = (__u32)(unsigned long)(info-info_block); + ret = ccw_io_helper(vcdev, ccw, + VIRTIO_CCW_DOING_SET_VQ | index); /* * -ENODEV isn't considered an error: The device is gone anyway. * This may happen on device detach. 
@@ -220,14 +221,23 @@ static void virtio_ccw_del_vq(struct virtqueue *vq) static void virtio_ccw_del_vqs(struct virtio_device *vdev) { struct virtqueue *vq, *n; + struct ccw1 *ccw; + + ccw = kzalloc(sizeof(*ccw), GFP_DMA | GFP_KERNEL); + if (!ccw) + return; + list_for_each_entry_safe(vq, n, vdev-vqs, list) - virtio_ccw_del_vq(vq); + virtio_ccw_del_vq(vq, ccw); + + kfree(ccw); } static struct virtqueue *virtio_ccw_setup_vq(struct virtio_device *vdev, int i, vq_callback_t *callback, -const char *name) +const char *name, +struct ccw1 *ccw) { struct virtio_ccw_device *vcdev = to_vc_device(vdev); int err; @@ -250,7 +260,7 @@ static struct virtqueue *virtio_ccw_setup_vq(struct virtio_device *vdev, err = -ENOMEM; goto out_err; } - info-num = virtio_ccw_read_vq_conf(vcdev, i); + info-num = virtio_ccw_read_vq_conf(vcdev, ccw, i); size = PAGE_ALIGN(vring_size(info-num, KVM_VIRTIO_CCW_RING_ALIGN));
[PATCH 0/2] KVM: s390: Bugfixes for virtio-ccw.
Hi,

Christian discovered some problems with regard to serialization in the
virtio-ccw guest driver. Per-device data structures might contain data
obtained by channel programs issued later on, leading to confusing
behaviour. We cannot rely on the common I/O layer serialization here.

Rather than adding extra serialization, we decided to keep it simple
with per-request allocated data structures and retries on busy. These
patches have been run in our internal testing without problems for a
bit now.

Please apply to kvm-next.

Christian Borntraeger (1):
  KVM: s390: Gracefully handle busy conditions on ccw_device_start

Cornelia Huck (1):
  KVM: s390: Dynamic allocation of virtio-ccw I/O data.

 drivers/s390/kvm/virtio_ccw.c | 291 ++
 1 file changed, 181 insertions(+), 110 deletions(-)

-- 
1.7.12.4
Re: [PATCH V3 2/2] vhost: handle polling errors
On Mon, Jan 07, 2013 at 12:38:17PM +0800, Jason Wang wrote: On 01/06/2013 09:22 PM, Michael S. Tsirkin wrote: On Sun, Jan 06, 2013 at 03:18:38PM +0800, Jason Wang wrote: Polling errors were ignored by vhost/vhost_net, this may lead to crash when trying to remove vhost from waitqueue when after the polling is failed. Solve this problem by: - checking the poll-wqh before trying to remove from waitqueue - report an error when poll() returns a POLLERR in vhost_start_poll() - report an error when vhost_start_poll() fails in vhost_vring_ioctl()/vhost_net_set_backend() which is used to notify the failure to userspace. - report an error in the data path in vhost_net when meet polling errors. After those changes, we can safely drop the tx polling state in vhost_net since it was replaced by the checking of poll-wqh. Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/vhost/net.c | 74 drivers/vhost/vhost.c | 31 +++- drivers/vhost/vhost.h |2 +- 3 files changed, 49 insertions(+), 58 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index d10ad6f..125c1e5 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -64,20 +64,10 @@ enum { VHOST_NET_VQ_MAX = 2, }; -enum vhost_net_poll_state { - VHOST_NET_POLL_DISABLED = 0, - VHOST_NET_POLL_STARTED = 1, - VHOST_NET_POLL_STOPPED = 2, -}; - struct vhost_net { struct vhost_dev dev; struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX]; struct vhost_poll poll[VHOST_NET_VQ_MAX]; - /* Tells us whether we are polling a socket for TX. - * We only do this when socket buffer fills up. - * Protected by tx vq lock. */ - enum vhost_net_poll_state tx_poll_state; /* Number of TX recently submitted. * Protected by tx vq lock. 
*/ unsigned tx_packets; @@ -155,24 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to, } } -/* Caller must have TX VQ lock */ -static void tx_poll_stop(struct vhost_net *net) -{ - if (likely(net-tx_poll_state != VHOST_NET_POLL_STARTED)) - return; - vhost_poll_stop(net-poll + VHOST_NET_VQ_TX); - net-tx_poll_state = VHOST_NET_POLL_STOPPED; -} - -/* Caller must have TX VQ lock */ -static void tx_poll_start(struct vhost_net *net, struct socket *sock) -{ - if (unlikely(net-tx_poll_state != VHOST_NET_POLL_STOPPED)) - return; - vhost_poll_start(net-poll + VHOST_NET_VQ_TX, sock-file); - net-tx_poll_state = VHOST_NET_POLL_STARTED; -} - /* In case of DMA done not in order in lower device driver for some reason. * upend_idx is used to track end of used idx, done_idx is used to track head * of used idx. Once lower device DMA done contiguously, we will signal KVM @@ -227,6 +199,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success) static void handle_tx(struct vhost_net *net) { struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + struct vhost_poll *poll = net-poll + VHOST_NET_VQ_TX; unsigned out, in, s; int head; struct msghdr msg = { @@ -252,7 +225,8 @@ static void handle_tx(struct vhost_net *net) wmem = atomic_read(sock-sk-sk_wmem_alloc); if (wmem = sock-sk-sk_sndbuf) { mutex_lock(vq-mutex); - tx_poll_start(net, sock); + if (vhost_poll_start(poll, sock-file)) + vq_err(vq, Fail to start TX polling\n); s/Fail/Failed/ A question though: how can this happen? Could you clarify please? Maybe we can find a way to prevent this error? Two conditions I think this can happen: 1) a buggy userspace disable a queue through TUNSETQUEUE 2) the net device were gone For 1, looks like we can delay the disabling until the refcnt goes to zero. For 2 may needs more changes. I'd expect keeping a socket reference would prevent both issues. Doesn't it? Not sure it's worth to do this work, maybe a warning is enough just like other failure. 
With other failures, you normally can correct the error then kick to
have it restart. This is something that would not work here.

> 		mutex_unlock(&vq->mutex);
> 		return;
> 	}
> @@ -261,7 +235,7 @@ static void handle_tx(struct vhost_net *net)
> 	vhost_disable_notify(&net->dev, vq);
> 
> 	if (wmem < sock->sk->sk_sndbuf / 2)
> -		tx_poll_stop(net);
> +		vhost_poll_stop(poll);
> 	hdr_size = vq->vhost_hlen;
> 	zcopy = vq->ubufs;
> @@ -283,8 +257,10 @@ static void handle_tx(struct vhost_net *net)
> 		wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> 		if (wmem >= sock->sk->sk_sndbuf * 3 / 4) {
> -			tx_poll_start(net, sock);
> -			set_bit(SOCK_ASYNC_NOSPACE, &sock->flags);
> +			if (vhost_poll_start(poll, sock->file))
> +
Re: [PATCH V3 2/2] vhost: handle polling errors
On 01/07/2013 10:55 PM, Michael S. Tsirkin wrote: On Mon, Jan 07, 2013 at 12:38:17PM +0800, Jason Wang wrote: On 01/06/2013 09:22 PM, Michael S. Tsirkin wrote: On Sun, Jan 06, 2013 at 03:18:38PM +0800, Jason Wang wrote: Polling errors were ignored by vhost/vhost_net, this may lead to crash when trying to remove vhost from waitqueue when after the polling is failed. Solve this problem by: - checking the poll-wqh before trying to remove from waitqueue - report an error when poll() returns a POLLERR in vhost_start_poll() - report an error when vhost_start_poll() fails in vhost_vring_ioctl()/vhost_net_set_backend() which is used to notify the failure to userspace. - report an error in the data path in vhost_net when meet polling errors. After those changes, we can safely drop the tx polling state in vhost_net since it was replaced by the checking of poll-wqh. Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/vhost/net.c | 74 drivers/vhost/vhost.c | 31 +++- drivers/vhost/vhost.h |2 +- 3 files changed, 49 insertions(+), 58 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index d10ad6f..125c1e5 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -64,20 +64,10 @@ enum { VHOST_NET_VQ_MAX = 2, }; -enum vhost_net_poll_state { - VHOST_NET_POLL_DISABLED = 0, - VHOST_NET_POLL_STARTED = 1, - VHOST_NET_POLL_STOPPED = 2, -}; - struct vhost_net { struct vhost_dev dev; struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX]; struct vhost_poll poll[VHOST_NET_VQ_MAX]; - /* Tells us whether we are polling a socket for TX. - * We only do this when socket buffer fills up. - * Protected by tx vq lock. */ - enum vhost_net_poll_state tx_poll_state; /* Number of TX recently submitted. * Protected by tx vq lock. 
*/ unsigned tx_packets; @@ -155,24 +145,6 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to, } } -/* Caller must have TX VQ lock */ -static void tx_poll_stop(struct vhost_net *net) -{ - if (likely(net-tx_poll_state != VHOST_NET_POLL_STARTED)) - return; - vhost_poll_stop(net-poll + VHOST_NET_VQ_TX); - net-tx_poll_state = VHOST_NET_POLL_STOPPED; -} - -/* Caller must have TX VQ lock */ -static void tx_poll_start(struct vhost_net *net, struct socket *sock) -{ - if (unlikely(net-tx_poll_state != VHOST_NET_POLL_STOPPED)) - return; - vhost_poll_start(net-poll + VHOST_NET_VQ_TX, sock-file); - net-tx_poll_state = VHOST_NET_POLL_STARTED; -} - /* In case of DMA done not in order in lower device driver for some reason. * upend_idx is used to track end of used idx, done_idx is used to track head * of used idx. Once lower device DMA done contiguously, we will signal KVM @@ -227,6 +199,7 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success) static void handle_tx(struct vhost_net *net) { struct vhost_virtqueue *vq = net-dev.vqs[VHOST_NET_VQ_TX]; + struct vhost_poll *poll = net-poll + VHOST_NET_VQ_TX; unsigned out, in, s; int head; struct msghdr msg = { @@ -252,7 +225,8 @@ static void handle_tx(struct vhost_net *net) wmem = atomic_read(sock-sk-sk_wmem_alloc); if (wmem = sock-sk-sk_sndbuf) { mutex_lock(vq-mutex); - tx_poll_start(net, sock); + if (vhost_poll_start(poll, sock-file)) + vq_err(vq, Fail to start TX polling\n); s/Fail/Failed/ A question though: how can this happen? Could you clarify please? Maybe we can find a way to prevent this error? Two conditions I think this can happen: 1) a buggy userspace disable a queue through TUNSETQUEUE 2) the net device were gone For 1, looks like we can delay the disabling until the refcnt goes to zero. For 2 may needs more changes. I'd expect keeping a socket reference would prevent both issues. Doesn't it? 
Doesn't work for 2, I think: the socket doesn't hold a refcnt on the
device, so the device can go away at any time. Although we can change
this, it's the behaviour before multiqueue support.

>>> Not sure it's worth to do this work, maybe a warning is enough just
>>> like other failure.
>>
>> With other failures, you normally can correct the error then kick to
>> have it restart. This is something that would not work here.

If userspace is written correctly (e.g. passing an fd with correct
state) it can also be corrected.

> 		mutex_unlock(&vq->mutex);
> 		return;
> 	}
> @@ -261,7 +235,7 @@ static void handle_tx(struct vhost_net *net)
> 	vhost_disable_notify(&net->dev, vq);
> 
> 	if (wmem < sock->sk->sk_sndbuf / 2)
> -		tx_poll_stop(net);
> +		vhost_poll_stop(poll);
> 	hdr_size = vq->vhost_hlen;
> 	zcopy = vq->ubufs;
> @@ -283,8 +257,10 @@ static void handle_tx(struct vhost_net *net)
> 		wmem = atomic_read(&sock->sk->sk_wmem_alloc);
> 		if (wmem >=
Re: [RESEND PATCH] pci-assign: Enable MSIX on device to match guest
On Sun, Jan 06, 2013 at 09:30:31PM -0700, Alex Williamson wrote: When a guest enables MSIX on a device we evaluate the MSIX vector table, typically find no unmasked vectors and don't switch the device to MSIX mode. This generally works fine and the device will be switched once the guest enables and therefore unmasks a vector. Unfortunately some drivers enable MSIX, then use interfaces to send commands between VF PF or PF firmware that act based on the host state of the device. These therefore may break when MSIX is managed lazily. This change re-enables the previous test used to enable MSIX (see qemu-kvm a6b402c9), which basically guesses whether a vector will be used based on the data field of the vector table. Cc: qemu-sta...@nongnu.org Signed-off-by: Alex Williamson alex.william...@redhat.com Acked-by: Michael S. Tsirkin m...@redhat.com --- Michael has now ack'd this patch as the correct initial first step, so I'm resending with that included. I'm actually not sure what the expected upstream path is for this file now that it's part of qemu. There's no entry for hw/kvm/* in MAINTAINERS nor anything specifically for this file. Is kvm still upstream for this, through the uq branch or is it qemu for anything not specifically part of a kvm interface? Anthony, Gleb, Marcelo, Michael, feel free to add this to your tree, any path is fine by me. Thanks, Alex I can merge this if there are no other takers. hw/kvm/pci-assign.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c index 8ee9428..896cfe8 100644 --- a/hw/kvm/pci-assign.c +++ b/hw/kvm/pci-assign.c @@ -1031,6 +1031,19 @@ static bool assigned_dev_msix_masked(MSIXTableEntry *entry) return (entry-ctrl cpu_to_le32(0x1)) != 0; } +/* + * When MSI-X is first enabled the vector table typically has all the + * vectors masked, so we can't use that as the obvious test to figure out + * how many vectors to initially enable. 
Instead we look at the data field + * because this is what worked for pci-assign for a long time. This makes + * sure the physical MSI-X state tracks the guest's view, which is important + * for some VF/PF and PF/fw communication channels. + */ +static bool assigned_dev_msix_skipped(MSIXTableEntry *entry) +{ +return !entry-data; +} + static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) { AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); @@ -1041,7 +1054,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) /* Get the usable entry number for allocating */ for (i = 0; i adev-msix_max; i++, entry++) { -if (assigned_dev_msix_masked(entry)) { +if (assigned_dev_msix_skipped(entry)) { continue; } entries_nr++; @@ -1070,7 +1083,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) for (i = 0; i adev-msix_max; i++, entry++) { adev-msi_virq[i] = -1; -if (assigned_dev_msix_masked(entry)) { +if (assigned_dev_msix_skipped(entry)) { continue; } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Rotate graphical output of vm
Dear KVM-list,

I would like to rotate the graphical output of a VM. I tried it with a
Fedora 17 and an OpenSuse 12.2 guest with the vga options -qxl and
-vmvga, running:

  xrandr -o left

but I got the error message:

  X Error of failed request: BadMatch (invalid parameter attributes)
    Major opcode of failed request: 129 (RANDR)
    Minor opcode of failed request: 2 (RRsetScreenConfig)
    Serial number of failed request: 12
    Current serial number in output stream: 12

My attempt to change the xorg.conf in SLES11SP1 did not change the
rotation either, because the X server log says: Option Rotate is not used

The only vga option which worked partly is cirrus, but with cirrus the
graphical output is not displayed correctly in my vnc-viewer.

I tried it with a SLES11SP2 host (kernel 3.0.13-0.27) and a Fedora 17
host (kernel 3.3.4). Any ideas?

Best regards and thanks in advance,
Dennis
KVM call agenda for 2013-01-08
Hi,

Please send in any agenda topics you are interested in.

Later, Juan.
FreeBSD-amd64 fails to start with SMP on qemu-kvm
Hello, When i try to run FreeBSD-amd64 on more than 1 vcpu in quemu-kvm (Fedora Core 17) eg. to run FreeBSD-9.0-RELEASE-amd64 with: qemu-kvm -m 1024m -cpu host -smp 2 -cdrom /storage/iso/FreeBSD-9.0-RELEASE-amd64-dvd1.iso it freezes KVM with: KVM internal error. Suberror: 1 emulation failure RAX=80b0d4c0 RBX=0009f000 RCX=c080 RDX= RSI=d238 RDI= RBP= RSP= R8 = R9 = R10= R11= R12= R13= R14= R15= RIP=0009f076 RFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES = f300 DPL=3 DS16 [-WA] CS =0008 00209900 DPL=0 CS64 [--A] SS =9f00 0009f000 f300 DPL=3 DS16 [-WA] DS =0018 00c09300 DPL=0 DS [-WA] FS = f300 DPL=3 DS16 [-WA] GS = f300 DPL=3 DS16 [-WA] LDT= 8200 DPL=0 LDT TR = 8b00 DPL=0 TSS64-busy GDT= 0009f080 0020 IDT= CR0=8011 CR2= CR3=0009c000 CR4=0030 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER=0501 Code=00 00 00 80 0f 22 c0 ea 70 f0 09 00 08 00 48 b8 c0 d4 b0 80 ff ff ff ff ff e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 99 20 00 ff ff 00 00 Freeze occurs immediately after FreeBSD kernel messages: Copyright (c) 1992-2012 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 r...@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64 CPU: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz (2925.91-MHz K8-class CPU) Origin = GenuineIntel Id = 0x106a5 Family = 6 Model = 1a Stepping = 5 Features=0xf83fbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,SS Features2=0x80982201SSE3,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,HV AMD Features=0x28100800SYSCALL,NX,RDTSCP,LM AMD Features2=0x1LAHF real memory = 1073741824 (1024 MB) avail memory = 1011343360 (964 MB) Event timer LAPIC quality 400 ACPI APIC Table: BOCHS BXPCAPIC so just prior to probing of SMP. 
This also applies to FreeBSD-7.3-RELEASE-amd64 and FreeBSD-9.1-RC3-amd64
(other releases not tested). When qemu-kvm is started without SMP (1 vcpu),
the amd64 FreeBSD kernel boots correctly. I did not notice this problem
(SMP) for the i386 versions (FreeBSD-7.3-RELEASE-i386,
FreeBSD-9.0-RELEASE-i386, FreeBSD-9.1-RC3-i386).

Additional info:
- KVM Host OS: Fedora Core 17
- CPUs on my KVM host -- Xeons X5570

# cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping	: 5
microcode	: 0x11
cpu MHz		: 2926.183
cache size	: 8192 KB
physical id	: 1
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 16
initial apicid	: 16
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm ida dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips	: 5852.36
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

- kernel (from FC17 repo): 3.6.9 (kernel-3.6.9-2.fc17.x86_64)
- qemu version: qemu-kvm 1.0.1 (qemu-kvm-1.0.1-2.fc17.x86_64)
- neither the -no-kvm-irqchip nor -no-kvm-pit switch helps
- with the -no-kvm switch FreeBSD boots correctly
- a Linux guest (x86_64 with SMP) works perfectly OK

I suspect that this bug is related in some way to the hardware. I tested
the same KVM-host system (exact clone) with the same guest (FreeBSD-amd64)
on another machine (i3-2120 workstation) and have not noticed similar
problems with SMP.

I will be grateful for any hints.
Regards,
Artur Samborski
Re: [PATCH 3/4] KVM: PPC: BookE: Implement EPR exit
On 01/04/2013 05:41:42 PM, Alexander Graf wrote:
> @@ -408,6 +411,11 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
> 		set_guest_esr(vcpu, vcpu->arch.queued_esr);
> 	if (update_dear == true)
> 		set_guest_dear(vcpu, vcpu->arch.queued_dear);
> +	if (update_epr == true) {
> +		kvm_make_request(KVM_REQ_EPR_EXIT, vcpu);
> +		/* Indicate that we want to recheck requests */
> +		allowed = 2;
> +	}

We shouldn't need allowed = 2 anymore.

-Scott
Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
On Mon, Jan 07, 2013 at 11:52:21AM -0200, Marcelo Tosatti wrote: On Mon, Jan 07, 2013 at 10:02:36AM +0800, Yang Zhang wrote: From: Yang Zhang yang.z.zh...@intel.com Virtual interrupt delivery avoids KVM to inject vAPIC interrupts manually, which is fully taken care of by the hardware. This needs some special awareness into existing interrupr injection path: - for pending interrupt, instead of direct injection, we may need update architecture specific indicators before resuming to guest. - A pending interrupt, which is masked by ISR, should be also considered in above update action, since hardware will decide when to inject it at right time. Current has_interrupt and get_interrupt only returns a valid vector from injection p.o.v. Signed-off-by: Kevin Tian kevin.t...@intel.com Signed-off-by: Yang Zhang yang.z.zh...@intel.com --- arch/ia64/kvm/lapic.h |6 ++ arch/x86/include/asm/kvm_host.h |8 ++ arch/x86/include/asm/vmx.h | 11 +++ arch/x86/kvm/irq.c | 56 +++- arch/x86/kvm/lapic.c| 87 +++--- arch/x86/kvm/lapic.h| 29 +- arch/x86/kvm/svm.c | 36 arch/x86/kvm/vmx.c | 190 ++- arch/x86/kvm/x86.c | 11 ++- include/linux/kvm_host.h|2 + virt/kvm/ioapic.c | 41 + virt/kvm/ioapic.h |1 + virt/kvm/irq_comm.c | 20 13 files changed, 451 insertions(+), 47 deletions(-) diff --git a/arch/ia64/kvm/lapic.h b/arch/ia64/kvm/lapic.h index c5f92a9..cb59eb4 100644 --- a/arch/ia64/kvm/lapic.h +++ b/arch/ia64/kvm/lapic.h @@ -27,4 +27,10 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq); #define kvm_apic_present(x) (true) #define kvm_lapic_enabled(x) (true) +static inline void kvm_update_eoi_exitmap(struct kvm *kvm, + struct kvm_lapic_irq *irq) +{ + /* IA64 has no apicv supporting, do nothing here */ +} + #endif diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c431b33..135603f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -697,6 +697,13 @@ struct kvm_x86_ops { void (*enable_nmi_window)(struct kvm_vcpu 
*vcpu); void (*enable_irq_window)(struct kvm_vcpu *vcpu); void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr); + int (*has_virtual_interrupt_delivery)(struct kvm_vcpu *vcpu); + void (*update_apic_irq)(struct kvm_vcpu *vcpu, int max_irr); + void (*update_eoi_exitmap)(struct kvm *kvm, struct kvm_lapic_irq *irq); + void (*update_exitmap_start)(struct kvm_vcpu *vcpu); + void (*update_exitmap_end)(struct kvm_vcpu *vcpu); + void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu); + void (*restore_rvi)(struct kvm_vcpu *vcpu); int (*set_tss_addr)(struct kvm *kvm, unsigned int addr); int (*get_tdp_level)(void); u64 (*get_mt_mask)(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio); @@ -991,6 +998,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long hva); int kvm_test_age_hva(struct kvm *kvm, unsigned long hva); void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte); int cpuid_maxphyaddr(struct kvm_vcpu *vcpu); +int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v); int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu); int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu); int kvm_cpu_get_interrupt(struct kvm_vcpu *v); diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h index 44c3f7e..d1ab331 100644 --- a/arch/x86/include/asm/vmx.h +++ b/arch/x86/include/asm/vmx.h @@ -62,6 +62,7 @@ #define EXIT_REASON_MCE_DURING_VMENTRY 41 #define EXIT_REASON_TPR_BELOW_THRESHOLD 43 #define EXIT_REASON_APIC_ACCESS 44 +#define EXIT_REASON_EOI_INDUCED 45 #define EXIT_REASON_EPT_VIOLATION 48 #define EXIT_REASON_EPT_MISCONFIG 49 #define EXIT_REASON_WBINVD 54 @@ -143,6 +144,7 @@ #define SECONDARY_EXEC_WBINVD_EXITING 0x0040 #define SECONDARY_EXEC_UNRESTRICTED_GUEST 0x0080 #define SECONDARY_EXEC_APIC_REGISTER_VIRT 0x0100 +#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY0x0200 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x0400 #define SECONDARY_EXEC_ENABLE_INVPCID 0x1000 @@ -180,6 +182,7 @@ enum vmcs_field { GUEST_GS_SELECTOR = 0x080a, GUEST_LDTR_SELECTOR = 0x080c, 
GUEST_TR_SELECTOR = 0x080e, + GUEST_INTR_STATUS = 0x0810, HOST_ES_SELECTOR= 0x0c00, HOST_CS_SELECTOR= 0x0c02, HOST_SS_SELECTOR= 0x0c04, @@ -207,6 +210,14 @@
Re: [PATCH 3/4] KVM: PPC: BookE: Implement EPR exit
On 07.01.2013, at 18:47, Scott Wood wrote:

> On 01/04/2013 05:41:42 PM, Alexander Graf wrote:
>> @@ -408,6 +411,11 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
>> 		set_guest_esr(vcpu, vcpu->arch.queued_esr);
>> 	if (update_dear == true)
>> 		set_guest_dear(vcpu, vcpu->arch.queued_dear);
>> +	if (update_epr == true) {
>> +		kvm_make_request(KVM_REQ_EPR_EXIT, vcpu);
>> +		/* Indicate that we want to recheck requests */
>> +		allowed = 2;
>> +	}
>
> We shouldn't need allowed = 2 anymore.

Thanks for noticing :). I already removed this one locally today but
wanted to wait for further comments on the series before sending out
a v3.


Alex
Re: [Qemu-devel] [PATCH qom-cpu 00/11] disable-kvm_mmu + -cpu check/enforce fixes (v2)
On 04.01.2013 23:01, Eduardo Habkost wrote:
> Eduardo Habkost (11):
[...]
>   target-i386: kvm: -cpu host: Use GET_SUPPORTED_CPUID for SVM features
>   target-i386: kvm: Enable all supported KVM features for -cpu host
>   target-i386: check/enforce: Fix CPUID leaf numbers on error messages
>   target-i386: check/enforce: Do not ignore hypervisor flag
>   target-i386: check/enforce: Check all CPUID.8001H.EDX bits
>   target-i386: check/enforce: Check SVM flag support as well
>   target-i386: check/enforce: Eliminate check_feat field
[snip]

Thanks, applied patches 3-9 to qom-cpu queue (fixing some typos in
commit messages):
https://github.com/afaerber/qemu-cpu/commits/qom-cpu

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
[PATCH 3/4] kvm tools: arm: make .dtb dumping a command-line option
It can sometimes be useful to dump the .dtb file generated by kvmtool when debugging a guest. Currently, this is achieved by rebuilding the tool and changing some #defines, which is fairly clumsy to use. This patch adds a new command-line option for ARM, allowing the dtb to be dumped to a named file at runtime. Signed-off-by: Will Deacon will.dea...@arm.com --- tools/kvm/arm/fdt.c | 18 ++ tools/kvm/arm/include/kvm/kvm-config-arch.h | 8 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/tools/kvm/arm/fdt.c b/tools/kvm/arm/fdt.c index c7f4b52..e52c10c 100644 --- a/tools/kvm/arm/fdt.c +++ b/tools/kvm/arm/fdt.c @@ -13,9 +13,6 @@ #include linux/kernel.h #include linux/sizes.h -#define DEBUG 0 -#define DEBUG_FDT_DUMP_FILE/tmp/kvmtool.dtb - static char kern_cmdline[COMMAND_LINE_SIZE]; bool kvm__load_firmware(struct kvm *kvm, const char *firmware_filename) @@ -28,25 +25,21 @@ int kvm__arch_setup_firmware(struct kvm *kvm) return 0; } -#if DEBUG -static void dump_fdt(void *fdt) +static void dump_fdt(const char *dtb_file, void *fdt) { int count, fd; - fd = open(DEBUG_FDT_DUMP_FILE, O_CREAT | O_TRUNC | O_RDWR, 0666); + fd = open(dtb_file, O_CREAT | O_TRUNC | O_RDWR, 0666); if (fd 0) - die(Failed to write dtb to %s, DEBUG_FDT_DUMP_FILE); + die(Failed to write dtb to %s, dtb_file); count = write(fd, fdt, FDT_MAX_SIZE); if (count 0) die_perror(Failed to dump dtb); - pr_info(Wrote %d bytes to dtb %s\n, count, DEBUG_FDT_DUMP_FILE); + pr_info(Wrote %d bytes to dtb %s\n, count, dtb_file); close(fd); } -#else -static void dump_fdt(void *fdt) { } -#endif #define DEVICE_NAME_MAX_LEN 32 static void generate_virtio_mmio_node(void *fdt, struct virtio_mmio *vmmio) @@ -143,7 +136,8 @@ static int setup_fdt(struct kvm *kvm) _FDT(fdt_open_into(fdt, fdt_dest, FDT_MAX_SIZE)); _FDT(fdt_pack(fdt_dest)); - dump_fdt(fdt_dest); + if (kvm-cfg.arch.dump_dtb_filename) + dump_fdt(kvm-cfg.arch.dump_dtb_filename, fdt_dest); return 0; } late_init(setup_fdt); diff --git 
a/tools/kvm/arm/include/kvm/kvm-config-arch.h b/tools/kvm/arm/include/kvm/kvm-config-arch.h index 60f61de..f63f302 100644 --- a/tools/kvm/arm/include/kvm/kvm-config-arch.h +++ b/tools/kvm/arm/include/kvm/kvm-config-arch.h @@ -1,7 +1,15 @@ #ifndef KVM__KVM_CONFIG_ARCH_H #define KVM__KVM_CONFIG_ARCH_H +#include kvm/parse-options.h + struct kvm_config_arch { + const char *dump_dtb_filename; }; +#define OPT_ARCH_RUN(pfx, cfg) \ + pfx,\ + OPT_STRING('\0', dump-dtb, (cfg)-dump_dtb_filename, \ + .dtb file, Dump generated .dtb to specified file), + #endif /* KVM__KVM_CONFIG_ARCH_H */ -- 1.8.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] ARM updates for kvmtool
Hello kvm hackers, This patch series introduces some updates to the ARM (AArch32) kvm tools code: - virtio mmio fixes to deal with guest page sizes != 4k (in preparation for AArch64, which I will post separately). - .dtb dumping via the lkvm command line - Support for PSCI firmware as a replacement to the spin-table based SMP boot code The last option was implemented after discussion on the linux-arm-kernel list when adding support for the mach-virt platform. I hope to upstream the kernel-side part of the implementation for 3.9 and expect the kvm bits to follow once that has been merged. All feedback welcome. Will Will Deacon (4): kvm tools: virtio: remove hardcoded assumptions about guest page size kvm tools: pedantry: fix annoying typo kvm tools: arm: make .dtb dumping a command-line option kvm tools: arm: add support for PSCI firmware in place of spin-tables tools/kvm/Makefile | 5 +- tools/kvm/arm/aarch32/cortex-a15.c | 8 +-- tools/kvm/arm/aarch32/include/kvm/kvm-arch.h | 1 - tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h | 12 + tools/kvm/arm/aarch32/kvm-cpu.c| 59 ++ tools/kvm/arm/aarch32/smp-pen.S| 39 -- tools/kvm/arm/fdt.c| 54 +++- tools/kvm/arm/include/arm-common/gic.h | 2 - tools/kvm/arm/include/arm-common/kvm-arch.h| 5 -- .../arm/include/{kvm = arm-common}/kvm-cpu-arch.h | 6 +-- tools/kvm/arm/include/kvm/kvm-config-arch.h| 8 +++ tools/kvm/arm/kvm-cpu.c| 4 +- tools/kvm/arm/kvm.c| 1 + tools/kvm/arm/smp.c| 21 tools/kvm/include/kvm/virtio.h | 14 + tools/kvm/kvm.c| 2 +- tools/kvm/virtio/9p.c | 7 +-- tools/kvm/virtio/balloon.c | 7 +-- tools/kvm/virtio/blk.c | 7 +-- tools/kvm/virtio/console.c | 7 +-- tools/kvm/virtio/mmio.c| 8 +-- tools/kvm/virtio/net.c | 8 +-- tools/kvm/virtio/pci.c | 4 +- tools/kvm/virtio/rng.c | 7 +-- tools/kvm/virtio/scsi.c| 7 +-- 25 files changed, 114 insertions(+), 189 deletions(-) create mode 100644 tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h delete mode 100644 tools/kvm/arm/aarch32/smp-pen.S rename tools/kvm/arm/include/{kvm = 
arm-common}/kvm-cpu-arch.h (87%) delete mode 100644 tools/kvm/arm/smp.c -- 1.8.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] kvm tools: pedantry: fix annoying typo
s/extention/extension/ I should get out more...

Signed-off-by: Will Deacon <will.dea...@arm.com>
---
 tools/kvm/kvm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/kvm/kvm.c b/tools/kvm/kvm.c
index 3ea6339..a6b3c23 100644
--- a/tools/kvm/kvm.c
+++ b/tools/kvm/kvm.c
@@ -291,7 +291,7 @@ int kvm__init(struct kvm *kvm)
 	}
 
 	if (kvm__check_extensions(kvm)) {
-		pr_err("A required KVM extention is not supported by OS");
+		pr_err("A required KVM extension is not supported by OS");
 		ret = -ENOSYS;
 		goto err_vm_fd;
 	}
-- 
1.8.0
[PATCH 1/4] kvm tools: virtio: remove hardcoded assumptions about guest page size
virtio-based PCI devices deal only with 4k memory granules, making direct use of the VIRTIO_PCI_VRING_ALIGN and VIRTIO_PCI_QUEUE_ADDR_SHIFT constants when initialising the virtqueues for a device. For MMIO-based devices, the guest page size is arbitrary and may differ from that of the host (this is the case on AArch64, where both 4k and 64k pages are supported). This patch fixes the virtio drivers to honour the guest page size passed when configuring the virtio device and align the virtqueues accordingly. Signed-off-by: Will Deacon will.dea...@arm.com --- tools/kvm/include/kvm/virtio.h | 14 ++ tools/kvm/virtio/9p.c | 7 --- tools/kvm/virtio/balloon.c | 7 --- tools/kvm/virtio/blk.c | 7 --- tools/kvm/virtio/console.c | 7 --- tools/kvm/virtio/mmio.c| 8 tools/kvm/virtio/net.c | 8 tools/kvm/virtio/pci.c | 4 +++- tools/kvm/virtio/rng.c | 7 --- tools/kvm/virtio/scsi.c| 7 --- 10 files changed, 37 insertions(+), 39 deletions(-) diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h index 5dc2544..924279b 100644 --- a/tools/kvm/include/kvm/virtio.h +++ b/tools/kvm/include/kvm/virtio.h @@ -43,17 +43,6 @@ static inline bool virt_queue__available(struct virt_queue *vq) return vq-vring.avail-idx != vq-last_avail_idx; } -/* - * Warning: on 32-bit hosts, shifting pfn left may cause a truncation of pfn values - * higher than 4GB - thus, pointing to the wrong area in guest virtual memory space - * and breaking the virt queue which owns this pfn. 
- */ -static inline void *guest_pfn_to_host(struct kvm *kvm, u32 pfn) -{ - return guest_flat_to_host(kvm, (unsigned long)pfn VIRTIO_PCI_QUEUE_ADDR_SHIFT); -} - - struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len); bool virtio_queue__should_signal(struct virt_queue *vq); @@ -81,7 +70,8 @@ struct virtio_ops { u8 *(*get_config)(struct kvm *kvm, void *dev); u32 (*get_host_features)(struct kvm *kvm, void *dev); void (*set_guest_features)(struct kvm *kvm, void *dev, u32 features); - int (*init_vq)(struct kvm *kvm, void *dev, u32 vq, u32 pfn); + int (*init_vq)(struct kvm *kvm, void *dev, u32 vq, u32 page_size, + u32 align, u32 pfn); int (*notify_vq)(struct kvm *kvm, void *dev, u32 vq); int (*get_pfn_vq)(struct kvm *kvm, void *dev, u32 vq); int (*get_size_vq)(struct kvm *kvm, void *dev, u32 vq); diff --git a/tools/kvm/virtio/9p.c b/tools/kvm/virtio/9p.c index 4665876..60865dd 100644 --- a/tools/kvm/virtio/9p.c +++ b/tools/kvm/virtio/9p.c @@ -1254,7 +1254,8 @@ static void set_guest_features(struct kvm *kvm, void *dev, u32 features) p9dev-features = features; } -static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn) +static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align, + u32 pfn) { struct p9_dev *p9dev = dev; struct p9_dev_job *job; @@ -1265,10 +1266,10 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn) queue = p9dev-vqs[vq]; queue-pfn = pfn; - p = guest_pfn_to_host(kvm, queue-pfn); + p = guest_flat_to_host(kvm, queue-pfn * page_size); job = p9dev-jobs[vq]; - vring_init(queue-vring, VIRTQUEUE_NUM, p, VIRTIO_PCI_VRING_ALIGN); + vring_init(queue-vring, VIRTQUEUE_NUM, p, align); *job= (struct p9_dev_job) { .vq = queue, diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c index 9edce87..d1b64fa 100644 --- a/tools/kvm/virtio/balloon.c +++ b/tools/kvm/virtio/balloon.c @@ -193,7 +193,8 @@ static void set_guest_features(struct kvm *kvm, void *dev, u32 features) 
bdev-features = features; } -static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn) +static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 page_size, u32 align, + u32 pfn) { struct bln_dev *bdev = dev; struct virt_queue *queue; @@ -203,10 +204,10 @@ static int init_vq(struct kvm *kvm, void *dev, u32 vq, u32 pfn) queue = bdev-vqs[vq]; queue-pfn = pfn; - p = guest_pfn_to_host(kvm, queue-pfn); + p = guest_flat_to_host(kvm, queue-pfn * page_size); thread_pool__init_job(bdev-jobs[vq], kvm, virtio_bln_do_io, queue); - vring_init(queue-vring, VIRTIO_BLN_QUEUE_SIZE, p, VIRTIO_PCI_VRING_ALIGN); + vring_init(queue-vring, VIRTIO_BLN_QUEUE_SIZE, p, align); return 0; } diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c index ec57e96..44ac44b 100644 --- a/tools/kvm/virtio/blk.c +++ b/tools/kvm/virtio/blk.c @@ -156,7 +156,8 @@ static void set_guest_features(struct kvm *kvm, void *dev, u32 features)
[PATCH 4/4] kvm tools: arm: add support for PSCI firmware in place of spin-tables
ARM has recently published a document describing a firmware interface for CPU power management, which can be used for booting secondary cores on an SMP platform, amongst other things. As part of the mach-virt upstreaming for the kernel (that is, the virtual platform targetted by kvmtool), it was suggested that we use this interface instead of the current spin-table based approach. This patch implements PSCI support in kvmtool for ARM, removing a fair amount of code in the process. Signed-off-by: Will Deacon will.dea...@arm.com --- tools/kvm/Makefile | 5 +- tools/kvm/arm/aarch32/cortex-a15.c | 8 +-- tools/kvm/arm/aarch32/include/kvm/kvm-arch.h | 1 - tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h | 12 + tools/kvm/arm/aarch32/kvm-cpu.c| 59 ++ tools/kvm/arm/aarch32/smp-pen.S| 39 -- tools/kvm/arm/fdt.c| 36 + tools/kvm/arm/include/arm-common/gic.h | 2 - tools/kvm/arm/include/arm-common/kvm-arch.h| 5 -- .../arm/include/{kvm = arm-common}/kvm-cpu-arch.h | 6 +-- tools/kvm/arm/kvm-cpu.c| 4 +- tools/kvm/arm/kvm.c| 1 + tools/kvm/arm/smp.c| 21 13 files changed, 62 insertions(+), 137 deletions(-) create mode 100644 tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h delete mode 100644 tools/kvm/arm/aarch32/smp-pen.S rename tools/kvm/arm/include/{kvm = arm-common}/kvm-cpu-arch.h (87%) delete mode 100644 tools/kvm/arm/smp.c diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index a83dd10..33aa4d8 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -160,18 +160,15 @@ endif # ARM OBJS_ARM_COMMON:= arm/fdt.o arm/gic.o arm/ioport.o arm/irq.o \ - arm/kvm.o arm/kvm-cpu.o arm/smp.o + arm/kvm.o arm/kvm-cpu.o HDRS_ARM_COMMON:= arm/include ifeq ($(ARCH), arm) DEFINES += -DCONFIG_ARM OBJS+= $(OBJS_ARM_COMMON) OBJS+= arm/aarch32/cortex-a15.o OBJS+= arm/aarch32/kvm-cpu.o - OBJS+= arm/aarch32/smp-pen.o ARCH_INCLUDE:= $(HDRS_ARM_COMMON) ARCH_INCLUDE+= -Iarm/aarch32/include - ASFLAGS += -D__ASSEMBLY__ - ASFLAGS += -I$(ARCH_INCLUDE) CFLAGS += -march=armv7-a CFLAGS += 
-I../../scripts/dtc/libfdt OTHEROBJS += $(LIBFDT_OBJS) diff --git a/tools/kvm/arm/aarch32/cortex-a15.c b/tools/kvm/arm/aarch32/cortex-a15.c index eac0bb9..8031747 100644 --- a/tools/kvm/arm/aarch32/cortex-a15.c +++ b/tools/kvm/arm/aarch32/cortex-a15.c @@ -31,12 +31,8 @@ static void generate_cpu_nodes(void *fdt, struct kvm *kvm) _FDT(fdt_property_string(fdt, device_type, cpu)); _FDT(fdt_property_string(fdt, compatible, arm,cortex-a15)); - if (kvm-nrcpus 1) { - _FDT(fdt_property_string(fdt, enable-method, -spin-table)); - _FDT(fdt_property_cell(fdt, cpu-release-addr, - kvm-arch.smp_jump_guest_start)); - } + if (kvm-nrcpus 1) + _FDT(fdt_property_string(fdt, enable-method, psci)); _FDT(fdt_property_cell(fdt, reg, cpu)); _FDT(fdt_end_node(fdt)); diff --git a/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h b/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h index f236895..ca79b24 100644 --- a/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h +++ b/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h @@ -15,7 +15,6 @@ #define ARM_KERN_OFFSET0x8000 -#define ARM_SMP_PEN_SIZE PAGE_SIZE #define ARM_VIRTIO_MMIO_SIZE (ARM_GIC_DIST_BASE - ARM_LOMAP_MMIO_AREA) #define ARM_PCI_MMIO_SIZE (ARM_LOMAP_MEMORY_AREA - ARM_LOMAP_AXI_AREA) diff --git a/tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h b/tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h new file mode 100644 index 000..b9fda07 --- /dev/null +++ b/tools/kvm/arm/aarch32/include/kvm/kvm-cpu-arch.h @@ -0,0 +1,12 @@ +#ifndef KVM__KVM_CPU_ARCH_H +#define KVM__KVM_CPU_ARCH_H + +#include kvm/kvm.h + +#include arm-common/kvm-cpu-arch.h + +#define ARM_VCPU_FEATURE_FLAGS(kvm, cpuid) { \ + [0] = (!!(cpuid) KVM_ARM_VCPU_POWER_OFF),\ +} + +#endif /* KVM__KVM_CPU_ARCH_H */ diff --git a/tools/kvm/arm/aarch32/kvm-cpu.c b/tools/kvm/arm/aarch32/kvm-cpu.c index f00a2f1..a528789 100644 --- a/tools/kvm/arm/aarch32/kvm-cpu.c +++ b/tools/kvm/arm/aarch32/kvm-cpu.c @@ -21,38 +21,33 @@ void kvm_cpu__reset_vcpu(struct kvm_cpu *vcpu) if (ioctl(vcpu-vcpu_fd, KVM_SET_ONE_REG, 
reg) 0)
[PATCH qom-cpu 1/7] kvm: Add fake KVM constants to avoid #ifdefs on KVM-specific code
Any KVM-specific code that uses these constants must check that kvm_enabled() is true before using them.

Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Gleb Natapov <g...@redhat.com>
Cc: Marcelo Tosatti <mtosa...@redhat.com>
---
 include/sysemu/kvm.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 3db19ff..15f9658 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -21,6 +21,20 @@
 #ifdef CONFIG_KVM
 #include <linux/kvm.h>
 #include <linux/kvm_para.h>
+#else
+/* These constants must never be used at runtime if kvm_enabled() is false.
+ * They exist so we don't need #ifdefs around KVM-specific code that already
+ * checks kvm_enabled() properly.
+ */
+#define KVM_CPUID_SIGNATURE      0
+#define KVM_CPUID_FEATURES       0
+#define KVM_FEATURE_CLOCKSOURCE  0
+#define KVM_FEATURE_NOP_IO_DELAY 0
+#define KVM_FEATURE_MMU_OP       0
+#define KVM_FEATURE_CLOCKSOURCE2 0
+#define KVM_FEATURE_ASYNC_PF     0
+#define KVM_FEATURE_STEAL_TIME   0
+#define KVM_FEATURE_PV_EOI       0
 #endif
 
 extern int kvm_allowed;
-- 
1.7.11.7
[PATCH qom-cpu 0/7] disable kvm_mmu + -cpu enforce fixes (v3)
Changes on v3: - Patches 3-9 from v2 are now already on qom-cpu tree - Remove CONFIG_KVM #ifdefs by declaring fake KVM_* #defines on sysemu/kvm.h - Refactor code that uses the feature word arrays (to make it easier to add a new feature name array) - Add feature name array for CPUID leaf 0xC001 Changes on v2: - Now both the kvm_mmu-disable and -cpu enforce changes are on the same series - Coding style fixes Git tree for reference: git://github.com/ehabkost/qemu-hacks.git cpu-enforce-all.v3 https://github.com/ehabkost/qemu-hacks/tree/cpu-enforce-all.v3 The changes are a bit intrusive, but: - The longer we take to make enforce strict as it should (and make libvirt finally use it), more users will have VMs with migration-unsafe unpredictable guest ABIs. For this reason, I would like to get this into QEMU 1.4. - The changes in this series should affect only users that are already using the enforce flag, and I believe whoever is using the enforce flag really want the strict behavior introduced by this series. Eduardo Habkost (7): kvm: Add fake KVM constants to avoid #ifdefs on KVM-specific code target-i386: Don't set any KVM flag by default if KVM is disabled target-i386: Disable kvm_mmu by default target-i386/cpu: Introduce FeatureWord typedefs target-i386: kvm_check_features_against_host(): Use feature_word_info target-i386/cpu.c: Add feature name array for ext4_features target-i386: check/enforce: Check all feature words include/sysemu/kvm.h | 14 target-i386/cpu.c| 193 --- target-i386/cpu.h| 15 3 files changed, 150 insertions(+), 72 deletions(-) -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH qom-cpu 2/7] target-i386: Don't set any KVM flag by default if KVM is disabled
This is a cleanup that tries to solve two small issues:

- We don't need a separate kvm_pv_eoi_features variable just to keep a constant calculated at compile-time, and this style would require adding a separate variable (that's declared twice because of the CONFIG_KVM ifdef) for each feature that's going to be enabled/disabled by machine-type compat code.
- The pc-1.3 code is setting the kvm_pv_eoi flag on cpuid_kvm_features even when KVM is disabled at runtime. This small inconsistency in the cpuid_kvm_features field isn't a problem today because cpuid_kvm_features is ignored by the TCG code, but it may cause unexpected problems later when refactoring the CPUID handling code.

This patch eliminates the kvm_pv_eoi_features variable and simply uses kvm_enabled() inside the enable_kvm_pv_eoi() compat function, so it enables kvm_pv_eoi only if KVM is enabled. I believe this makes the behavior of enable_kvm_pv_eoi() clearer and easier to understand.

Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
---
Cc: kvm@vger.kernel.org
Cc: Michael S.
Tsirkin m...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Changes v2: - Coding style fix Changes v3: - Eliminate #ifdef by using the fake KVM_FEATURE_PV_EOI #define --- target-i386/cpu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 951e206..40400ac 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -164,15 +164,15 @@ static uint32_t kvm_default_features = (1 KVM_FEATURE_CLOCKSOURCE) | (1 KVM_FEATURE_ASYNC_PF) | (1 KVM_FEATURE_STEAL_TIME) | (1 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT); -static const uint32_t kvm_pv_eoi_features = (0x1 KVM_FEATURE_PV_EOI); #else static uint32_t kvm_default_features = 0; -static const uint32_t kvm_pv_eoi_features = 0; #endif void enable_kvm_pv_eoi(void) { -kvm_default_features |= kvm_pv_eoi_features; +if (kvm_enabled()) { +kvm_default_features |= (1UL KVM_FEATURE_PV_EOI); +} } void host_cpuid(uint32_t function, uint32_t count, -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH qom-cpu 6/7] target-i386/cpu.c: Add feature name array for ext4_features
Feature names were taken from the X86_FEATURE_* constants in the Linux kernel code. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: Gleb Natapov g...@redhat.com --- target-i386/cpu.c | 17 + 1 file changed, 17 insertions(+) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 4b3ee63..a54c464 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -95,6 +95,17 @@ static const char *ext3_feature_name[] = { NULL, NULL, NULL, NULL, }; +static const char *ext4_feature_name[] = { +NULL, NULL, xstore,xstore-en, +NULL, NULL, xcrypt,xcrypt-en, +ace2, ace2-en, phe, phe-en, +pmm, pmm-en, NULL, NULL, +NULL, NULL, NULL, NULL, +NULL, NULL, NULL, NULL, +NULL, NULL, NULL, NULL, +NULL, NULL, NULL, NULL, +}; + static const char *kvm_feature_name[] = { kvmclock, kvm_nopiodelay, kvm_mmu, kvmclock, kvm_asyncpf, kvm_steal_time, kvm_pv_eoi, NULL, @@ -147,6 +158,10 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { .feat_names = ext3_feature_name, .cpuid_eax = 0x8001, .cpuid_reg = R_ECX, }, +[FEAT_C000_0001_EDX] = { +.feat_names = ext4_feature_name, +.cpuid_eax = 0xC001, .cpuid_reg = R_EDX, +}, [FEAT_KVM] = { .feat_names = kvm_feature_name, .cpuid_eax = KVM_CPUID_FEATURES, .cpuid_reg = R_EAX, @@ -1412,6 +1427,7 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features) x86_cpu_def-ext_features |= plus_features[FEAT_1_ECX]; x86_cpu_def-ext2_features |= plus_features[FEAT_8000_0001_EDX]; x86_cpu_def-ext3_features |= plus_features[FEAT_8000_0001_ECX]; +x86_cpu_def-ext4_features |= plus_features[FEAT_C000_0001_EDX]; x86_cpu_def-kvm_features |= plus_features[FEAT_KVM]; x86_cpu_def-svm_features |= plus_features[FEAT_SVM]; x86_cpu_def-cpuid_7_0_ebx_features |= plus_features[FEAT_7_0_EBX]; @@ -1419,6 +1435,7 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features) x86_cpu_def-ext_features = ~minus_features[FEAT_1_ECX]; x86_cpu_def-ext2_features = ~minus_features[FEAT_8000_0001_EDX]; x86_cpu_def-ext3_features = 
~minus_features[FEAT_8000_0001_ECX]; +x86_cpu_def-ext4_features = ~minus_features[FEAT_C000_0001_EDX]; x86_cpu_def-kvm_features = ~minus_features[FEAT_KVM]; x86_cpu_def-svm_features = ~minus_features[FEAT_SVM]; x86_cpu_def-cpuid_7_0_ebx_features = ~minus_features[FEAT_7_0_EBX]; -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH qom-cpu 3/7] target-i386: Disable kvm_mmu by default
KVM_CAP_PV_MMU capability reporting was removed from the kernel since v2.6.33 (see commit a68a6a7282373), and the feature was completely removed from the kernel since v3.3 (see commit fb92045843). It doesn't make sense to keep it enabled by default, as it would cause unnecessary hassle when using the enforce flag.

This disables kvm_mmu on all machine-types. With this fix, the possible scenarios when migrating from QEMU <= 1.3 to QEMU 1.4 are:

  src kernel | dst kernel | Result
  -----------+------------+----------------------------------------------------
  >= 2.6.33  | any        | kvm_mmu was already disabled and will stay disabled
  <= 2.6.32  | >= 3.3     | correct live migration is impossible
  <= 2.6.32  | <= 3.2     | kvm_mmu will be disabled on next guest reboot *

* If they are running kernel <= 2.6.32 and want kvm_mmu to be kept enabled on guest reboot, they can explicitly add +kvm_mmu to the QEMU command-line. Using 2.6.33 and higher, it is not possible to enable kvm_mmu explicitly anymore.

Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
---
Cc: kvm@vger.kernel.org
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Gleb Natapov <g...@redhat.com>
Cc: Marcelo Tosatti <mtosa...@redhat.com>
Cc: libvir-l...@redhat.com
Cc: Jiri Denemark <jdene...@redhat.com>

Changes v2:
 - Coding style fix
 - Removed redundant comments above machine init functions

Changes v3:
 - Eliminate per-machine-type compatibility code
---
 target-i386/cpu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 40400ac..b09b625 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -159,7 +159,6 @@ int enforce_cpuid = 0;
 #if defined(CONFIG_KVM)
 static uint32_t kvm_default_features = (1 << KVM_FEATURE_CLOCKSOURCE) |
         (1 << KVM_FEATURE_NOP_IO_DELAY) |
-        (1 << KVM_FEATURE_MMU_OP) |
         (1 << KVM_FEATURE_CLOCKSOURCE2) |
         (1 << KVM_FEATURE_ASYNC_PF) |
         (1 << KVM_FEATURE_STEAL_TIME) |
-- 
1.7.11.7
[PATCH qom-cpu 5/7] target-i386: kvm_check_features_against_host(): Use feature_word_info
Instead of carrying the CPUID leaf/register and feature name array on the model_features_t struct, move that information into feature_word_info so it can be reused by other functions. The goal is to eventually kill model_features_t entirely, but to do that we have to either convert x86_def_t.features to an array or use offsetof() inside FeatureWordInfo (to replace the pointers inside model_features_t). So by now just move most of the model_features_t fields to FeatureWordInfo except for the two pointers to local arguments. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- target-i386/cpu.c | 73 +-- 1 file changed, 49 insertions(+), 24 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 7d62d48..4b3ee63 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -126,16 +126,39 @@ static const char *cpuid_7_0_ebx_feature_name[] = { typedef struct FeatureWordInfo { const char **feat_names; +uint32_t cpuid_eax; /* Input EAX for CPUID */ +int cpuid_reg; /* R_* register constant */ } FeatureWordInfo; static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { -[FEAT_1_EDX] = { .feat_names = feature_name }, -[FEAT_1_ECX] = { .feat_names = ext_feature_name }, -[FEAT_8000_0001_EDX] = { .feat_names = ext2_feature_name }, -[FEAT_8000_0001_ECX] = { .feat_names = ext3_feature_name }, -[FEAT_KVM] = { .feat_names = kvm_feature_name }, -[FEAT_SVM] = { .feat_names = svm_feature_name }, -[FEAT_7_0_EBX] = { .feat_names = cpuid_7_0_ebx_feature_name }, +[FEAT_1_EDX] = { +.feat_names = feature_name, +.cpuid_eax = 1, .cpuid_reg = R_EDX, +}, +[FEAT_1_ECX] = { +.feat_names = ext_feature_name, +.cpuid_eax = 1, .cpuid_reg = R_ECX, +}, +[FEAT_8000_0001_EDX] = { +.feat_names = ext2_feature_name, +.cpuid_eax = 0x8001, .cpuid_reg = R_EDX, +}, +[FEAT_8000_0001_ECX] = { +.feat_names = ext3_feature_name, +.cpuid_eax = 0x8001, .cpuid_reg = R_ECX, +}, +[FEAT_KVM] = { +.feat_names = kvm_feature_name, +.cpuid_eax = KVM_CPUID_FEATURES, .cpuid_reg = R_EAX, +}, +[FEAT_SVM] = { 
+.feat_names = svm_feature_name, +.cpuid_eax = 0x800A, .cpuid_reg = R_EDX, +}, +[FEAT_7_0_EBX] = { +.feat_names = cpuid_7_0_ebx_feature_name, +.cpuid_eax = 7, .cpuid_reg = R_EBX, +}, }; const char *get_register_name_32(unsigned int reg) @@ -162,9 +185,7 @@ const char *get_register_name_32(unsigned int reg) typedef struct model_features_t { uint32_t *guest_feat; uint32_t *host_feat; -const char **flag_names; -uint32_t cpuid; -int reg; +FeatureWord feat_word; } model_features_t; int check_cpuid = 0; @@ -935,19 +956,19 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def) #endif /* CONFIG_KVM */ } -static int unavailable_host_feature(struct model_features_t *f, uint32_t mask) +static int unavailable_host_feature(FeatureWordInfo *f, uint32_t mask) { int i; for (i = 0; i 32; ++i) if (1 i mask) { -const char *reg = get_register_name_32(f-reg); +const char *reg = get_register_name_32(f-cpuid_reg); assert(reg); fprintf(stderr, warning: host doesn't support requested feature: CPUID.%02XH:%s%s%s [bit %d]\n, -f-cpuid, reg, -f-flag_names[i] ? . : , -f-flag_names[i] ? f-flag_names[i] : , i); +f-cpuid_eax, reg, +f-feat_names[i] ? . : , +f-feat_names[i] ? 
f-feat_names[i] : , i); break; } return 0; @@ -965,25 +986,29 @@ static int kvm_check_features_against_host(x86_def_t *guest_def) int rv, i; struct model_features_t ft[] = { {guest_def-features, host_def.features, -feature_name, 0x0001, R_EDX}, +FEAT_1_EDX }, {guest_def-ext_features, host_def.ext_features, -ext_feature_name, 0x0001, R_ECX}, +FEAT_1_ECX }, {guest_def-ext2_features, host_def.ext2_features, -ext2_feature_name, 0x8001, R_EDX}, +FEAT_8000_0001_EDX }, {guest_def-ext3_features, host_def.ext3_features, -ext3_feature_name, 0x8001, R_ECX} +FEAT_8000_0001_ECX }, }; assert(kvm_enabled()); kvm_cpu_fill_host(host_def); -for (rv = 0, i = 0; i ARRAY_SIZE(ft); ++i) -for (mask = 1; mask; mask = 1) +for (rv = 0, i = 0; i ARRAY_SIZE(ft); ++i) { +FeatureWord w = ft[i].feat_word; +FeatureWordInfo *wi = feature_word_info[w]; +for (mask = 1; mask; mask = 1) { if (*ft[i].guest_feat mask !(*ft[i].host_feat mask)) { -unavailable_host_feature(ft[i], mask); -rv = 1; -} +
[PATCH qom-cpu 7/7] target-i386: check/enforce: Check all feature words
This adds the following feature words to the list of flags to be checked by kvm_check_features_against_host(): - cpuid_7_0_ebx_features - ext4_features - kvm_features - svm_features This will ensure the enforce flag works as it should: it won't allow QEMU to be started unless every flag that was requested by the user or defined in the CPU model is supported by the host. This patch may cause existing configurations where enforce wasn't preventing QEMU from being started to abort QEMU. But that's exactly the point of this patch: if a flag was not supported by the host and QEMU wasn't aborting, it was a bug in the enforce code. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- Cc: Gleb Natapov g...@redhat.com Cc: Marcelo Tosatti mtosa...@redhat.com Cc: kvm@vger.kernel.org Cc: libvir-l...@redhat.com Cc: Jiri Denemark jdene...@redhat.com CCing libvirt people, as this is directly related to the planned usage of the enforce flag by libvirt. The libvirt team probably has a problem in their hands: libvirt should use enforce to make sure all requested flags are making their way into the guest (so the resulting CPU is always the same, on any host), but users may have existing working configurations where a flag is not supported by the guest and the user really doesn't care about it. Those configurations will necessarily break when libvirt starts using enforce. One example where it may cause trouble for common setups: pc-1.3 wants the kvm_pv_eoi flag enabled by default (so enforce will make sure it is enabled), but the user may have an existing VM running on a host without pv_eoi support. That setup is unsafe today because live-migration between different host kernel versions may enable/disable pv_eoi silently (that's why we need the enforce flag to be used by libvirt), but the user probably would like to be able to live-migrate that VM anyway (and have libvirt to just do the right thing). 
One possible solution to libvirt is to use enforce only on newer machine-types, so existing machines with older machine-types will keep the unsafe host-dependent-ABI behavior, but at least would keep live-migration working in case the user is careful. I really don't know what the libvirt team prefers, but that's the situation today. The longer we take to make enforce strict as it should and make libvirt finally use it, more users will have VMs with migration-unsafe unpredictable guest ABIs. Changes v2: - Coding style fix Changes v3: - Added ext4_feature_name array --- target-i386/cpu.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index a54c464..68cabcf 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -989,8 +989,9 @@ static int unavailable_host_feature(FeatureWordInfo *f, uint32_t mask) return 0; } -/* best effort attempt to inform user requested cpu flags aren't making - * their way to the guest. +/* Check if all requested cpu flags are making their way to the guest + * + * Returns 0 if all flags are supported by the host, non-zero otherwise. * * This function may be called only if KVM is enabled. */ @@ -1008,6 +1009,14 @@ static int kvm_check_features_against_host(x86_def_t *guest_def) FEAT_8000_0001_EDX }, {guest_def-ext3_features, host_def.ext3_features, FEAT_8000_0001_ECX }, +{guest_def-ext4_features, host_def.ext4_features, +FEAT_C000_0001_EDX }, +{guest_def-cpuid_7_0_ebx_features, host_def.cpuid_7_0_ebx_features, +FEAT_7_0_EBX }, +{guest_def-svm_features, host_def.svm_features, +FEAT_SVM }, +{guest_def-kvm_features, host_def.kvm_features, +FEAT_KVM }, }; assert(kvm_enabled()); -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH qom-cpu 4/7] target-i386/cpu: Introduce FeatureWord typedefs
This introduces a FeatureWord enum, FeatureWordInfo struct (with generation information about a feature word), and a FeatureWordArray typedef, and changes add_flagname_to_bitmaps() code and cpu_x86_parse_featurestr() to use the new typedefs instead of separate variables for each feature word. This will help us keep the code at kvm_check_features_against_host(), cpu_x86_parse_featurestr() and add_flagname_to_bitmaps() sane while adding new feature name arrays. Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- target-i386/cpu.c | 97 +++ target-i386/cpu.h | 15 + 2 files changed, 63 insertions(+), 49 deletions(-) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index b09b625..7d62d48 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -124,6 +124,20 @@ static const char *cpuid_7_0_ebx_feature_name[] = { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, }; +typedef struct FeatureWordInfo { +const char **feat_names; +} FeatureWordInfo; + +static FeatureWordInfo feature_word_info[FEATURE_WORDS] = { +[FEAT_1_EDX] = { .feat_names = feature_name }, +[FEAT_1_ECX] = { .feat_names = ext_feature_name }, +[FEAT_8000_0001_EDX] = { .feat_names = ext2_feature_name }, +[FEAT_8000_0001_ECX] = { .feat_names = ext3_feature_name }, +[FEAT_KVM] = { .feat_names = kvm_feature_name }, +[FEAT_SVM] = { .feat_names = svm_feature_name }, +[FEAT_7_0_EBX] = { .feat_names = cpuid_7_0_ebx_feature_name }, +}; + const char *get_register_name_32(unsigned int reg) { static const char *reg_names[CPU_NB_REGS32] = { @@ -271,23 +285,20 @@ static bool lookup_feature(uint32_t *pval, const char *s, const char *e, return found; } -static void add_flagname_to_bitmaps(const char *flagname, uint32_t *features, -uint32_t *ext_features, -uint32_t *ext2_features, -uint32_t *ext3_features, -uint32_t *kvm_features, -uint32_t *svm_features, -uint32_t *cpuid_7_0_ebx_features) +static void add_flagname_to_bitmaps(const char *flagname, +FeatureWordArray words) { -if (!lookup_feature(features, flagname, NULL, 
feature_name) -!lookup_feature(ext_features, flagname, NULL, ext_feature_name) -!lookup_feature(ext2_features, flagname, NULL, ext2_feature_name) -!lookup_feature(ext3_features, flagname, NULL, ext3_feature_name) -!lookup_feature(kvm_features, flagname, NULL, kvm_feature_name) -!lookup_feature(svm_features, flagname, NULL, svm_feature_name) -!lookup_feature(cpuid_7_0_ebx_features, flagname, NULL, -cpuid_7_0_ebx_feature_name)) -fprintf(stderr, CPU feature %s not found\n, flagname); +FeatureWord w; +for (w = 0; w FEATURE_WORDS; w++) { +FeatureWordInfo *wi = feature_word_info[w]; +if (wi-feat_names +lookup_feature(words[w], flagname, NULL, wi-feat_names)) { +break; +} +} +if (w == FEATURE_WORDS) { +fprintf(stderr, CPU feature %s not found\n, flagname); +} } typedef struct x86_def_t { @@ -1256,35 +1267,23 @@ static int cpu_x86_parse_featurestr(x86_def_t *x86_cpu_def, char *features) unsigned int i; char *featurestr; /* Single 'key=value string being parsed */ /* Features to be added */ -uint32_t plus_features = 0, plus_ext_features = 0; -uint32_t plus_ext2_features = 0, plus_ext3_features = 0; -uint32_t plus_kvm_features = kvm_default_features, plus_svm_features = 0; -uint32_t plus_7_0_ebx_features = 0; +FeatureWordArray plus_features = { +[FEAT_KVM] = kvm_default_features, +}; /* Features to be removed */ -uint32_t minus_features = 0, minus_ext_features = 0; -uint32_t minus_ext2_features = 0, minus_ext3_features = 0; -uint32_t minus_kvm_features = 0, minus_svm_features = 0; -uint32_t minus_7_0_ebx_features = 0; +FeatureWordArray minus_features = { 0 }; uint32_t numvalue; -add_flagname_to_bitmaps(hypervisor, plus_features, -plus_ext_features, plus_ext2_features, plus_ext3_features, -plus_kvm_features, plus_svm_features, plus_7_0_ebx_features); +add_flagname_to_bitmaps(hypervisor, plus_features); featurestr = features ? 
strtok(features, ,) : NULL; while (featurestr) { char *val; if (featurestr[0] == '+') { -add_flagname_to_bitmaps(featurestr + 1, plus_features, -plus_ext_features, plus_ext2_features, -plus_ext3_features, plus_kvm_features, -plus_svm_features, plus_7_0_ebx_features); +add_flagname_to_bitmaps(featurestr + 1, plus_features); } else if
[PATCH 0/2] Add support for ARMv8 CPUs to kvmtool
Hello again, These two patches add support for ARMv8 processors running an AArch64 instance of kvm to kvmtool. Both AArch32 and AArch64 guests are supported and, in the case of the latter, the guest page size may be either 64k or 4k. This depends on the ARM updates series I just posted: https://lists.cs.columbia.edu/pipermail/kvmarm/2013-January/004505.html Feedback welcome, Will Will Deacon (2): kvm tools: add support for ARMv8 processors kvm tools: arm: align guest memory buffer to maximum page size tools/kvm/Makefile | 14 +- tools/kvm/arm/aarch32/include/kvm/kvm-arch.h | 20 +-- .../kvm/arm/aarch32/include/kvm/kvm-config-arch.h | 8 ++ tools/kvm/arm/aarch64/cortex-a57.c | 95 tools/kvm/arm/aarch64/include/kvm/barrier.h| 8 ++ tools/kvm/arm/aarch64/include/kvm/kvm-arch.h | 17 +++ .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h | 10 ++ tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h | 13 ++ tools/kvm/arm/aarch64/kvm-cpu.c| 160 + tools/kvm/arm/fdt.c| 2 +- tools/kvm/arm/include/arm-common/kvm-arch.h| 32 - .../include/{kvm = arm-common}/kvm-config-arch.h | 8 +- tools/kvm/arm/kvm.c| 26 +++- 13 files changed, 381 insertions(+), 32 deletions(-) create mode 100644 tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h create mode 100644 tools/kvm/arm/aarch64/cortex-a57.c create mode 100644 tools/kvm/arm/aarch64/include/kvm/barrier.h create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-arch.h create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h create mode 100644 tools/kvm/arm/aarch64/kvm-cpu.c rename tools/kvm/arm/include/{kvm = arm-common}/kvm-config-arch.h (61%) -- 1.8.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] kvm tools: arm: align guest memory buffer to maximum page size
If we're running a guest with a larger page size than the host, interesting things start to happen when communicating via a virtio-mmio device because the idea of buffer alignment between the guest and the host will be off by the misalignment of the guest memory buffer allocated by the host. This causes things like the index field of vring.used to be accessed at different addresses on the guest and the host, leading to deadlock. Fix this problem by allocating guest memory aligned to the maximum possible page size for the architecture (64K). Signed-off-by: Will Deacon will.dea...@arm.com --- tools/kvm/arm/include/arm-common/kvm-arch.h | 10 ++ tools/kvm/arm/kvm.c | 24 ++-- 2 files changed, 28 insertions(+), 6 deletions(-) diff --git a/tools/kvm/arm/include/arm-common/kvm-arch.h b/tools/kvm/arm/include/arm-common/kvm-arch.h index 46ee7e2..7860e17 100644 --- a/tools/kvm/arm/include/arm-common/kvm-arch.h +++ b/tools/kvm/arm/include/arm-common/kvm-arch.h @@ -37,6 +37,16 @@ static inline bool arm_addr_in_pci_mmio_region(u64 phys_addr) } struct kvm_arch { + /* +* We may have to align the guest memory for virtio, so keep the +* original pointers here for munmap. +*/ + void*ram_alloc_start; + u64 ram_alloc_size; + + /* +* Guest addresses for memory layout. 
+*/ u64 memory_guest_start; u64 kern_guest_start; u64 initrd_guest_start; diff --git a/tools/kvm/arm/kvm.c b/tools/kvm/arm/kvm.c index 9eff927..1bcfce3 100644 --- a/tools/kvm/arm/kvm.c +++ b/tools/kvm/arm/kvm.c @@ -7,6 +7,7 @@ #include linux/kernel.h #include linux/kvm.h +#include linux/sizes.h struct kvm_ext kvm_req_ext[] = { { DEFINE_KVM_EXT(KVM_CAP_IRQCHIP) }, @@ -41,7 +42,7 @@ void kvm__init_ram(struct kvm *kvm) void kvm__arch_delete_ram(struct kvm *kvm) { - munmap(kvm-ram_start, kvm-ram_size); + munmap(kvm-arch.ram_alloc_start, kvm-arch.ram_alloc_size); } void kvm__arch_periodic_poll(struct kvm *kvm) @@ -56,13 +57,24 @@ void kvm__arch_set_cmdline(char *cmdline, bool video) void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size) { - /* Allocate guest memory. */ + /* +* Allocate guest memory. We must align out buffer to 64K to +* correlate with the maximum guest page size for virtio-mmio. +*/ kvm-ram_size = min(ram_size, (u64)ARM_MAX_MEMORY(kvm)); - kvm-ram_start = mmap_anon_or_hugetlbfs(kvm, hugetlbfs_path, kvm-ram_size); - if (kvm-ram_start == MAP_FAILED) + kvm-arch.ram_alloc_size = kvm-ram_size + SZ_64K; + kvm-arch.ram_alloc_start = mmap_anon_or_hugetlbfs(kvm, hugetlbfs_path, + kvm-arch.ram_alloc_size); + + if (kvm-arch.ram_alloc_start == MAP_FAILED) die(Failed to map %lld bytes for guest memory (%d), - kvm-ram_size, errno); - madvise(kvm-ram_start, kvm-ram_size, MADV_MERGEABLE); + kvm-arch.ram_alloc_size, errno); + + kvm-ram_start = (void *)ALIGN((unsigned long)kvm-arch.ram_alloc_start, + SZ_64K); + + madvise(kvm-arch.ram_alloc_start, kvm-arch.ram_alloc_size, + MADV_MERGEABLE); /* Initialise the virtual GIC. */ if (gic__init_irqchip(kvm)) -- 1.8.0 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] kvm tools: add support for ARMv8 processors
This patch adds support for ARMv8 processors (more specifically, Cortex-A57) to kvmtool. Both AArch64 and AArch32 guests are supported, so the existing AArch32 code is slightly restructured to allow for re-use of much of the current code. The implementation closely follows the ARMv7 code and reuses much of the work written there. Tested-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Will Deacon will.dea...@arm.com --- tools/kvm/Makefile | 14 +- tools/kvm/arm/aarch32/include/kvm/kvm-arch.h | 20 +-- .../kvm/arm/aarch32/include/kvm/kvm-config-arch.h | 8 ++ tools/kvm/arm/aarch64/cortex-a57.c | 95 tools/kvm/arm/aarch64/include/kvm/barrier.h| 8 ++ tools/kvm/arm/aarch64/include/kvm/kvm-arch.h | 17 +++ .../kvm/arm/aarch64/include/kvm/kvm-config-arch.h | 10 ++ tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h | 13 ++ tools/kvm/arm/aarch64/kvm-cpu.c| 160 + tools/kvm/arm/fdt.c| 2 +- tools/kvm/arm/include/arm-common/kvm-arch.h| 22 ++- .../include/{kvm = arm-common}/kvm-config-arch.h | 8 +- tools/kvm/arm/kvm.c| 2 +- 13 files changed, 353 insertions(+), 26 deletions(-) create mode 100644 tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h create mode 100644 tools/kvm/arm/aarch64/cortex-a57.c create mode 100644 tools/kvm/arm/aarch64/include/kvm/barrier.h create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-arch.h create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-config-arch.h create mode 100644 tools/kvm/arm/aarch64/include/kvm/kvm-cpu-arch.h create mode 100644 tools/kvm/arm/aarch64/kvm-cpu.c rename tools/kvm/arm/include/{kvm = arm-common}/kvm-config-arch.h (61%) diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index 33aa4d8..0c59faa 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -103,7 +103,7 @@ OBJS+= virtio/mmio.o # Translate uname -m into ARCH string ARCH ?= $(shell uname -m | sed -e s/i.86/i386/ -e s/ppc.*/powerpc/ \ - -e s/armv7.*/arm/) + -e s/armv7.*/arm/ -e s/aarch64.*/arm64/) ifeq ($(ARCH),i386) ARCH := x86 @@ -174,6 +174,18 @@ ifeq 
($(ARCH), arm) OTHEROBJS += $(LIBFDT_OBJS) endif +# ARM64 +ifeq ($(ARCH), arm64) + DEFINES += -DCONFIG_ARM64 + OBJS+= $(OBJS_ARM_COMMON) + OBJS+= arm/aarch64/cortex-a57.o + OBJS+= arm/aarch64/kvm-cpu.o + ARCH_INCLUDE:= $(HDRS_ARM_COMMON) + ARCH_INCLUDE+= -Iarm/aarch64/include + CFLAGS += -I../../scripts/dtc/libfdt + OTHEROBJS += $(LIBFDT_OBJS) +endif + ### ifeq (,$(ARCH_INCLUDE)) diff --git a/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h b/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h index ca79b24..1632e3c 100644 --- a/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h +++ b/tools/kvm/arm/aarch32/include/kvm/kvm-arch.h @@ -1,28 +1,12 @@ #ifndef KVM__KVM_ARCH_H #define KVM__KVM_ARCH_H -#include linux/const.h - -#define ARM_LOMAP_MMIO_AREA_AC(0x, UL) -#define ARM_LOMAP_AXI_AREA _AC(0x4000, UL) -#define ARM_LOMAP_MEMORY_AREA _AC(0x8000, UL) -#define ARM_LOMAP_MAX_MEMORY _AC(0x7fff, UL) - #define ARM_GIC_DIST_SIZE 0x1000 -#define ARM_GIC_DIST_BASE (ARM_LOMAP_AXI_AREA - ARM_GIC_DIST_SIZE) #define ARM_GIC_CPUI_SIZE 0x2000 -#define ARM_GIC_CPUI_BASE (ARM_GIC_DIST_BASE - ARM_GIC_CPUI_SIZE) - -#define ARM_KERN_OFFSET0x8000 - -#define ARM_VIRTIO_MMIO_SIZE (ARM_GIC_DIST_BASE - ARM_LOMAP_MMIO_AREA) -#define ARM_PCI_MMIO_SIZE (ARM_LOMAP_MEMORY_AREA - ARM_LOMAP_AXI_AREA) -#define ARM_MEMORY_AREAARM_LOMAP_MEMORY_AREA -#define ARM_MAX_MEMORY ARM_LOMAP_MAX_MEMORY +#define ARM_KERN_OFFSET(...) 0x8000 -#define KVM_PCI_MMIO_AREA ARM_LOMAP_AXI_AREA -#define KVM_VIRTIO_MMIO_AREA ARM_LOMAP_MMIO_AREA +#define ARM_MAX_MEMORY(...)ARM_LOMAP_MAX_MEMORY #include arm-common/kvm-arch.h diff --git a/tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h b/tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h new file mode 100644 index 000..acf0d23 --- /dev/null +++ b/tools/kvm/arm/aarch32/include/kvm/kvm-config-arch.h @@ -0,0 +1,8 @@ +#ifndef KVM__KVM_CONFIG_ARCH_H +#define KVM__KVM_CONFIG_ARCH_H + +#define ARM_OPT_ARCH_RUN(...) 
+ +#include arm-common/kvm-config-arch.h + +#endif /* KVM__KVM_CONFIG_ARCH_H */ diff --git a/tools/kvm/arm/aarch64/cortex-a57.c b/tools/kvm/arm/aarch64/cortex-a57.c new file mode 100644 index 000..4fd11ba --- /dev/null +++ b/tools/kvm/arm/aarch64/cortex-a57.c @@ -0,0 +1,95 @@ +#include kvm/fdt.h +#include kvm/kvm.h +#include kvm/kvm-cpu.h +#include kvm/util.h + +#include arm-common/gic.h + +#include linux/byteorder.h +#include linux/types.h + +#define CPU_NAME_MAX_LEN 8 +static void generate_cpu_nodes(void *fdt, struct
[PATCH 3/4] KVM: PPC: BookE: Implement EPR exit
The External Proxy Facility in FSL BookE chips allows the interrupt controller to automatically acknowledge an interrupt as soon as a core gets its pending external interrupt delivered. Today, user space implements the interrupt controller, so we need to check on it during such a cycle. This patch implements logic for user space to enable EPR exiting, disable EPR exiting and EPR exiting itself, so that user space can acknowledge an interrupt when an external interrupt has successfully been delivered into the guest vcpu. Signed-off-by: Alexander Graf ag...@suse.de --- v1 - v2: - rework update_epr logic - add documentation for ENABLE_CAP on EPR cap v2 - v3: - remove leftover 'allowed==2' logic --- Documentation/virtual/kvm/api.txt | 40 +- arch/powerpc/include/asm/kvm_host.h |2 + arch/powerpc/include/asm/kvm_ppc.h |9 +++ arch/powerpc/kvm/booke.c| 14 +++- arch/powerpc/kvm/powerpc.c | 10 include/linux/kvm_host.h|1 + include/uapi/linux/kvm.h|6 + 7 files changed, 79 insertions(+), 3 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 9cf591d..66bf7cf 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2238,8 +2238,8 @@ executed a memory-mapped I/O instruction which could not be satisfied by kvm. The 'data' member contains the written data if 'is_write' is true, and should be filled by application code otherwise. -NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR - and KVM_EXIT_PAPR the corresponding +NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR, + KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. 
Userspace @@ -2342,6 +2342,25 @@ The possible hypercalls are defined in the Power Architecture Platform Requirements (PAPR) document available from www.power.org (free developer registration required to access it). + /* KVM_EXIT_EPR */ + struct { + __u32 epr; + } epr; + +On FSL BookE PowerPC chips, the interrupt controller has a fast patch +interrupt acknowledge path to the core. When the core successfully +delivers an interrupt, it automatically populates the EPR register with +the interrupt vector number and acknowledges the interrupt inside +the interrupt controller. + +In case the interrupt controller lives in user space, we need to do +the interrupt acknowledge cycle through it to fetch the next to be +delivered interrupt vector using this exit. + +It gets triggered whenever both KVM_CAP_PPC_EPR are enabled and an +external interrupt has just been delivered into the guest. User space +should put the acknowledged interrupt vector into the 'epr' field. + /* Fix the size of the union. */ char padding[256]; }; @@ -2463,3 +2482,20 @@ For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV: where num_sets is the tlb_sizes[] value divided by the tlb_ways[] value. - The tsize field of mas1 shall be set to 4K on TLB0, even though the hardware ignores this value for TLB0. + +6.4 KVM_CAP_PPC_EPR + +Architectures: ppc +Parameters: args[0] defines whether the proxy facility is active +Returns: 0 on success; -1 on error + +This capability enables or disables the delivery of interrupts through the +external proxy facility. + +When enabled (args[0] != 0), every time the guest gets an external interrupt +delivered, it automatically exits into user space with a KVM_EXIT_EPR exit +to receive the topmost interrupt vector. + +When disabled (args[0] == 0), behavior is as if this facility is unsupported. + +When this capability is enabled, KVM_EXIT_EPR can occur. 
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index ab49c6c..8a72d59 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -520,6 +520,8 @@ struct kvm_vcpu_arch { u8 sane; u8 cpu_type; u8 hcall_needed; + u8 epr_enabled; + u8 epr_needed; u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */ diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 5f5f69a..493630e 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -264,6 +264,15 @@ static inline void kvm_linear_init(void) {} #endif +static inline void kvmppc_set_epr(struct kvm_vcpu *vcpu, u32 epr) +{ +#ifdef CONFIG_KVM_BOOKE_HV + mtspr(SPRN_GEPR, epr); +#elif defined(CONFIG_BOOKE) + vcpu-arch.epr = epr; +#endif +} + int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
[PATCH 0/4] KVM: PPC: BookE: Add EPR user space support v3
The FSL MPIC implementation contains a feature called "external proxy facility" which allows for interrupts to be acknowledged in the MPIC as soon as a core accepts its pending external interrupt. This patch set implements all the necessary pieces to support this from the kernel-space side.

v1 -> v2:
  - do an explicit requests check rather than play with return values
  - rework update_epr logic
  - add documentation for ENABLE_CAP on EPR cap

v2 -> v3:
  - remove leftover 'allowed==2' logic

Alexander Graf (3):
  KVM: PPC: BookE: Emulate mfspr on EPR
  KVM: PPC: BookE: Implement EPR exit
  KVM: PPC: BookE: Add EPR ONE_REG sync

Mihai Caraman (1):
  KVM: PPC: BookE: Allow irq deliveries to inject requests

 Documentation/virtual/kvm/api.txt   | 41 +-
 arch/powerpc/include/asm/kvm_host.h |  2 +
 arch/powerpc/include/asm/kvm_ppc.h  |  9 +++
 arch/powerpc/include/uapi/asm/kvm.h |  6 +-
 arch/powerpc/kvm/booke.c            | 40 +-
 arch/powerpc/kvm/booke_emulate.c    |  3 ++
 arch/powerpc/kvm/powerpc.c          | 10
 include/linux/kvm_host.h            |  1 +
 include/uapi/linux/kvm.h            |  6 +
 9 files changed, 114 insertions(+), 4 deletions(-)
[PATCH 2/4] KVM: PPC: BookE: Emulate mfspr on EPR
The EPR register is potentially valid for PR KVM as well, so we need to emulate accesses to it. It's only defined for reading, so only handle the mfspr case.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/booke_emulate.c | 3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index 4685b8c..27a4b28 100644
--- a/arch/powerpc/kvm/booke_emulate.c
+++ b/arch/powerpc/kvm/booke_emulate.c
@@ -269,6 +269,9 @@ int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
 	case SPRN_ESR:
 		*spr_val = vcpu->arch.shared->esr;
 		break;
+	case SPRN_EPR:
+		*spr_val = vcpu->arch.epr;
+		break;
 	case SPRN_CSRR0:
 		*spr_val = vcpu->arch.csrr0;
 		break;
--
1.6.0.2
[PATCH 1/4] KVM: PPC: BookE: Allow irq deliveries to inject requests
From: Mihai Caraman mihai.cara...@freescale.com

When injecting an interrupt into guest context, we usually don't need to check for requests anymore. At least not until today. With the introduction of EPR, we will have to create a request when the guest has successfully accepted an external interrupt though.

So we need to prepare the interrupt delivery to abort guest entry gracefully. Otherwise we'd delay the EPR request.

Signed-off-by: Alexander Graf ag...@suse.de
---
v1 -> v2:
  - do an explicit requests check rather than play with return values
---
 arch/powerpc/kvm/booke.c | 5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 69f1140..964f447 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -581,6 +581,11 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 	kvmppc_core_check_exceptions(vcpu);

+	if (vcpu->requests) {
+		/* Exception delivery raised request; start over */
+		return 1;
+	}
+
 	if (vcpu->arch.shared->msr & MSR_WE) {
 		local_irq_enable();
 		kvm_vcpu_block(vcpu);
--
1.6.0.2
[PATCH 4/4] KVM: PPC: BookE: Add EPR ONE_REG sync
We need to be able to read and write the contents of the EPR register from user space. This patch implements that logic through the ONE_REG API and declares its (never implemented) SREGS counterpart as deprecated. Signed-off-by: Alexander Graf ag...@suse.de --- Documentation/virtual/kvm/api.txt |1 + arch/powerpc/include/uapi/asm/kvm.h |6 +- arch/powerpc/kvm/booke.c| 21 + 3 files changed, 27 insertions(+), 1 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 66bf7cf..6601973 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1774,6 +1774,7 @@ registers, find a list below: PPC | KVM_REG_PPC_VPA_SLB | 128 PPC | KVM_REG_PPC_VPA_DTL | 128 PPC | KVM_REG_PPC_EPCR | 32 + PPC | KVM_REG_PPC_EPR | 32 4.69 KVM_GET_ONE_REG diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 2fba8a6..16064d0 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -114,7 +114,10 @@ struct kvm_regs { /* Embedded Floating Point (SPE) -- IVOR32-34 if KVM_SREGS_E_IVOR */ #define KVM_SREGS_E_SPE(1 9) -/* External Proxy (EXP) -- EPR */ +/* + * DEPRECATED! USE ONE_REG FOR THIS ONE! 
+ * External Proxy (EXP) -- EPR + */ #define KVM_SREGS_EXP (1 10) /* External PID (E.PD) -- EPSC/EPLC */ @@ -412,5 +415,6 @@ struct kvm_get_htab_header { #define KVM_REG_PPC_VPA_DTL(KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x84) #define KVM_REG_PPC_EPCR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x85) +#define KVM_REG_PPC_EPR(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x86) #endif /* __LINUX_KVM_POWERPC_H */ diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 940ec80..8779cd4 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -300,6 +300,15 @@ static void set_guest_esr(struct kvm_vcpu *vcpu, u32 esr) #endif } +static unsigned long get_guest_epr(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_KVM_BOOKE_HV + return mfspr(SPRN_GEPR); +#else + return vcpu-arch.epr; +#endif +} + /* Deliver the interrupt of the corresponding priority, if possible. */ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, unsigned int priority) @@ -1405,6 +1414,11 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) vcpu-arch.dbg_reg.dac[dac], sizeof(u64)); break; } + case KVM_REG_PPC_EPR: { + u32 epr = get_guest_epr(vcpu); + r = put_user(epr, (u32 __user *)(long)reg-addr); + break; + } #if defined(CONFIG_64BIT) case KVM_REG_PPC_EPCR: r = put_user(vcpu-arch.epcr, (u32 __user *)(long)reg-addr); @@ -1437,6 +1451,13 @@ int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) (u64 __user *)(long)reg-addr, sizeof(u64)); break; } + case KVM_REG_PPC_EPR: { + u32 new_epr; + r = get_user(new_epr, (u32 __user *)(long)reg-addr); + if (!r) + kvmppc_set_epr(vcpu, new_epr); + break; + } #if defined(CONFIG_64BIT) case KVM_REG_PPC_EPCR: { u32 new_epcr; -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] KVM: Write protect the updated slot only when we start dirty logging
On Tue, Dec 18, 2012 at 04:26:47PM +0900, Takuya Yoshikawa wrote:
> This is needed to make kvm_mmu_slot_remove_write_access() rmap based:
> otherwise we may end up using invalid rmap's.
>
> Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp

Why? memslot->arch.rmap[] has been properly allocated at this point.

> ---
>  arch/x86/kvm/x86.c  | 9 -
>  virt/kvm/kvm_main.c | 1 -
>  2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1c9c834..9451efa 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6897,7 +6897,14 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>  	spin_lock(&kvm->mmu_lock);
>  	if (nr_mmu_pages)
>  		kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
> -	kvm_mmu_slot_remove_write_access(kvm, mem->slot);
> +	/*
> +	 * Write protect all pages for dirty logging.
> +	 * Existing largepage mappings are destroyed here and new ones will
> +	 * not be created until the end of the logging.
> +	 */
> +	if ((mem->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
> +	    !(old.flags & KVM_MEM_LOG_DIRTY_PAGES))
> +		kvm_mmu_slot_remove_write_access(kvm, mem->slot);
>  	spin_unlock(&kvm->mmu_lock);
>  	/*
>  	 * If memory slot is created, or moved, we need to clear all
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index bd31096..0ef5daa 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -805,7 +805,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
>  		if (kvm_create_dirty_bitmap(&new) < 0)
>  			goto out_free;
> -		/* destroy any largepage mappings for dirty tracking */
>  	}
>
>  	if (!npages || base_gfn != old.base_gfn) {
> --
> 1.7.5.4
Re: [PATCH 0/7] KVM: Alleviate mmu_lock hold time when we start dirty logging
On Tue, Dec 18, 2012 at 04:25:58PM +0900, Takuya Yoshikawa wrote: This patch set makes kvm_mmu_slot_remove_write_access() rmap based and adds conditional rescheduling to it. The motivation for this change is of course to reduce the mmu_lock hold time when we start dirty logging for a large memory slot. You may not see the problem if you just give 8GB or less of the memory to the guest with THP enabled on the host -- this is for the worst case. Neat. Looks good, except patch 1 - a) don't understand why it is necessary and b) not confident its safe - isnt clearing necessary for KVM_SET_MEMORY instances other than !(old.flags LOG_DIRTY) (new.flags LOG_DIRTY) IMPORTANT NOTE (not about this patch set): I have hit the following bug many times with the current next branch, even WITHOUT my patches. Although I do not know a way to reproduce this yet, it seems that something was broken around slot-dirty_bitmap. I am now investigating the new code in __kvm_set_memory_region(). The bug: [ 575.238063] BUG: unable to handle kernel paging request at 0002efe83a77 [ 575.238185] IP: [a05f9619] mark_page_dirty_in_slot+0x19/0x20 [kvm] [ 575.238308] PGD 0 [ 575.238343] Oops: 0002 [#1] SMP The call trace: [ 575.241207] Call Trace: [ 575.241257] [a05f96b1] kvm_write_guest_cached+0x91/0xb0 [kvm] [ 575.241370] [a0610db9] kvm_arch_vcpu_ioctl_run+0x1109/0x12c0 [kvm] [ 575.241488] [a060fd55] ? kvm_arch_vcpu_ioctl_run+0xa5/0x12c0 [kvm] [ 575.241595] [81679194] ? mutex_lock_killable_nested+0x274/0x340 [ 575.241706] [a05faf80] ? kvm_set_ioapic_irq+0x20/0x20 [kvm] [ 575.241813] [a05f71c9] kvm_vcpu_ioctl+0x559/0x670 [kvm] [ 575.241913] [a05f8a58] ? kvm_vm_ioctl+0x1b8/0x570 [kvm] [ 575.242007] [8101b9d3] ? native_sched_clock+0x13/0x80 [ 575.242125] [8101ba49] ? sched_clock+0x9/0x10 [ 575.242208] [8109015d] ? sched_clock_cpu+0xbd/0x110 [ 575.242298] [811a914c] ? fget_light+0x3c/0x140 [ 575.242381] [8119dfa8] do_vfs_ioctl+0x98/0x570 [ 575.242463] [811a91b1] ? 
fget_light+0xa1/0x140 [ 575.246393] [811a914c] ? fget_light+0x3c/0x140 [ 575.250363] [8119e511] sys_ioctl+0x91/0xb0 [ 575.254327] [81684c19] system_call_fastpath+0x16/0x1b Takuya Yoshikawa (7): KVM: Write protect the updated slot only when we start dirty logging KVM: MMU: Remove unused parameter level from __rmap_write_protect() KVM: MMU: Make kvm_mmu_slot_remove_write_access() rmap based KVM: x86: Remove unused slot_bitmap from kvm_mmu_page KVM: Make kvm_mmu_change_mmu_pages() take mmu_lock by itself KVM: Make kvm_mmu_slot_remove_write_access() take mmu_lock by itself KVM: Conditionally reschedule when kvm_mmu_slot_remove_write_access() takes a long time Documentation/virtual/kvm/mmu.txt |7 arch/x86/include/asm/kvm_host.h |5 --- arch/x86/kvm/mmu.c| 56 +++- arch/x86/kvm/x86.c| 13 +--- virt/kvm/kvm_main.c |1 - 5 files changed, 38 insertions(+), 44 deletions(-) -- 1.7.5.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 5/5] KVM: x86: improve reexecute_instruction
On Sat, Jan 05, 2013 at 04:16:37PM +0800, Xiao Guangrong wrote: On 01/05/2013 06:44 AM, Marcelo Tosatti wrote: index b0a3678..44c6992 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4756,15 +4756,8 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu) static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2) { gpa_t gpa = cr2; + gfn_t gfn; pfn_t pfn; - unsigned int indirect_shadow_pages; - - spin_lock(vcpu-kvm-mmu_lock); - indirect_shadow_pages = vcpu-kvm-arch.indirect_shadow_pages; - spin_unlock(vcpu-kvm-mmu_lock); - - if (!indirect_shadow_pages) - return false; This renders the previous patch obsolete, pretty much (please fold). Will try. if (!vcpu-arch.mmu.direct_map) { /* @@ -4781,13 +4774,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2) return true; } - /* - * if emulation was due to access to shadowed page table - * and it failed try to unshadow page and re-enter the - * guest to let CPU execute the instruction. - */ - if (kvm_mmu_unprotect_page(vcpu-kvm, gpa_to_gfn(gpa))) - return true; + gfn = gpa_to_gfn(gpa); /* * Do not retry the unhandleable instruction if it faults on the @@ -4795,13 +4782,38 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, unsigned long cr2) * retry instruction - write #PF - emulation fail - retry * instruction - ... */ - pfn = gfn_to_pfn(vcpu-kvm, gpa_to_gfn(gpa)); - if (!is_error_noslot_pfn(pfn)) { - kvm_release_pfn_clean(pfn); + pfn = gfn_to_pfn(vcpu-kvm, gfn); + + /* + * If the instruction failed on the error pfn, it can not be fixed, + * report the error to userspace. + */ + if (is_error_noslot_pfn(pfn)) + return false; + + kvm_release_pfn_clean(pfn); + + /* The instructions are well-emulated on direct mmu. */ + if (vcpu-arch.mmu.direct_map) { !direct_map? No. This logic is, if it is direct mmu, we just unprotect the page shadowed by nested mmu, then let guest retry the instruction, no need to detect unhandlable instruction. 
+ unsigned int indirect_shadow_pages; + + spin_lock(vcpu-kvm-mmu_lock); + indirect_shadow_pages = vcpu-kvm-arch.indirect_shadow_pages; + spin_unlock(vcpu-kvm-mmu_lock); + + if (indirect_shadow_pages) + kvm_mmu_unprotect_page(vcpu-kvm, gfn); + return true; } - return false; + kvm_mmu_unprotect_page(vcpu-kvm, gfn); + + /* If the target gfn is used as page table, the fault can + * not be avoided by unprotecting shadow page and it will + * be reported to userspace. + */ + return !vcpu-arch.target_gfn_is_pt; } The idea was How about recording the gfn number for shadow pages that have been shadowed in the current pagefault run? (which is cheap, compared to shadowing these pages). If failed instruction emulation is write to one of these gfns, then fail. If i understood correctly, i do not think it is simpler than the way in this patch. There is the change to apply the idea: diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c431b33..2163de8 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -502,6 +502,8 @@ struct kvm_vcpu_arch { u64 msr_val; struct gfn_to_hva_cache data; } pv_eoi; + + gfn_t pt_gfns[4]; }; struct kvm_lpage_info { diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 0453fa0..ac4210f 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -523,6 +523,18 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu, return false; } +static void FNAME(cache_pt_gfns)(struct kvm_vcpu *vcpu, struct guest_walker *walker) +{ + int level; + + /* Reset all gfns to -1, then we can detect the levels which is not used in guest. */ + for (level = 0; level 4; level++) + vcpu-arch.pt_gfns[level] = (gfn_t)(-1); + + for (level = walker-level; level = walker-max_level; level++) + vcpu-arch.pt_gfns[level - 1] = walker-table_gfn[level - 1]; +} + /* * Page fault handler. 
There are several causes for a page fault: * - there is no shadow pte for the guest pte @@ -576,6 +588,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, return 0; } + FNAME(cache_pt_gfns)(vcpu, walker); + if (walker.level = PT_DIRECTORY_LEVEL) force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn) || FNAME(is_self_change_mapping)(vcpu, walker, user_fault); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index
Re: [RFC PATCH 0/4] MSI affinity for assigned devices
On Mon, 2013-01-07 at 20:14 +, Krishna J wrote: Hi Alex, MSI routing updates aren't currently handled by pci-assign or vfio-pci (when using KVM acceleration), which means that trying to set interrupt SMP affinity in the guest has no effect unless MSI is completely disabled and re-enabled. This series fixes this for both device assignment backends using similar schemes. We store the last MSIMessage programmed to KVM and do updates to the MSI route when it changes. pci-assign takes a little bit of refactoring to make this happen cleanly. Thanks, I am using the MSI affinity for assigned devices patch 1 to 4. I have setup the guest such that VCPU0 is pinned to PCPU1, VCPU1 is pinned to PCPU2, VCPU2 is pinned to PCPU3 and VCPU3 is pinned to PCPU4. I do this by taskset after the guest boots. I then start generating interrupts affined to VCPU3. I see all the interrupts directly delivered to VCPU 3. Now i do the same test but interrupt affined to VCPU 2. Although the interrupts are delivered to VCPU2 there are lot of Rescheduling interrupts in VCPU 3. I have checked the smp_affinity and it is updated to VCPU 2. Wanted to know your feedback on this usecase and what might be the impact. CPU0 CPU1 CPU2 CPU3 0:211 0 0 0 IO-APIC-edge timer 4: 60940 0 0 0 IO-APIC-edge serial 8: 65 0 0 0 IO-APIC-edge rtc0 9: 0 0 0 0 IO-APIC-fasteoi acpi 40: 0 0 0 0 PCI-MSI-edge virtio1-config 41: 1910 0 0 0 PCI-MSI-edge virtio1-requests 42: 0 0 0 0 PCI-MSI-edge virtio0-config 43:127 0 0 0 PCI-MSI-edge virtio0-input 44: 1 0 0 0 PCI-MSI-edge virtio0-output 45: 1 0 3377 11194 PCI-MSI-edge FPGA_DEV NMI: 0 0 0 0 Non-maskable interrupts LOC: 225880 231572 223670 223612 Local timer interrupts SPU: 0 0 0 0 Spurious interrupts PMI: 0 0 0 0 Performance monitoring interrupts IWI: 0 0 0 0 IRQ work interrupts RTR: 0 0 0 0 APIC ICR read retries RES: 14 20 21 3398 Rescheduling interrupts--- Many RES Interrtups!! 
CAL:          0      14      14      16   Function call interrupts
TLB:          0       0       0       0   TLB shootdowns

I don't know, but I'll fix the line wrap for anyone else who wants to have a look. The count looks roughly similar to the number of interrupts to VCPU2. Is your application somehow tied to VCPU3? Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM: VMX: fix incorrect cached cpl value with real/v8086 modes (v3)
CPL is always 0 in real mode, and always 3 in virtual 8086 mode. Using values other than those can cause failures on operations that check CPL. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 55dfc37..dd2a85c 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1696,7 +1696,6 @@ static unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu) static void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags) { __set_bit(VCPU_EXREG_RFLAGS, (ulong *)vcpu->arch.regs_avail); - __clear_bit(VCPU_EXREG_CPL, (ulong *)vcpu->arch.regs_avail); to_vmx(vcpu)->rflags = rflags; if (to_vmx(vcpu)->rmode.vm86_active) { to_vmx(vcpu)->rmode.save_rflags = rflags; @@ -3110,7 +3109,6 @@ static void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) vmcs_writel(CR0_READ_SHADOW, cr0); vmcs_writel(GUEST_CR0, hw_cr0); vcpu->arch.cr0 = cr0; - __clear_bit(VCPU_EXREG_CPL, (ulong *)vcpu->arch.regs_avail); } static u64 construct_eptp(unsigned long root_hpa) @@ -3220,8 +3218,10 @@ static u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg) return vmx_read_guest_seg_base(to_vmx(vcpu), seg); } -static int __vmx_get_cpl(struct kvm_vcpu *vcpu) +static int vmx_get_cpl(struct kvm_vcpu *vcpu) { + struct vcpu_vmx *vmx = to_vmx(vcpu); + if (!is_protmode(vcpu)) return 0; @@ -3229,13 +3229,6 @@ static int __vmx_get_cpl(struct kvm_vcpu *vcpu) (kvm_get_rflags(vcpu) & X86_EFLAGS_VM)) /* if virtual 8086 */ return 3; - return vmx_read_guest_seg_selector(to_vmx(vcpu), VCPU_SREG_CS) & 3; -} - -static int vmx_get_cpl(struct kvm_vcpu *vcpu) -{ - struct vcpu_vmx *vmx = to_vmx(vcpu); - /* * If we enter real mode with cs.sel & 3 != 0, the normal CPL calculations * fail; use the cache instead. 
@@ -3246,7 +3239,7 @@ static int vmx_get_cpl(struct kvm_vcpu *vcpu) if (!test_bit(VCPU_EXREG_CPL, (ulong *)vcpu->arch.regs_avail)) { __set_bit(VCPU_EXREG_CPL, (ulong *)vcpu->arch.regs_avail); - vmx->cpl = __vmx_get_cpl(vcpu); + vmx->cpl = vmx_read_guest_seg_selector(vmx, VCPU_SREG_CS) & 3; } return vmx->cpl;
Re: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote:

ioapic_write (or any other ioapic update)
  lock()
  perform update
  make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*)
  unlock()

(*) Similarly to TLB flush.

The advantage is that all work becomes vcpu local. The end result is much simpler code. What complexity will it remove? Synchronization between multiple CPUs (except the KVM_REQ_ bit processing, which is infrastructure shared by other parts of KVM). We agreed that performance is a non-issue here.
Re: [PATCH] KVM: mmu: remove unused trace event
On Tue, Dec 25, 2012 at 02:34:06PM +0200, Gleb Natapov wrote: trace_kvm_mmu_delay_free_pages() is no longer used. Signed-off-by: Gleb Natapov g...@redhat.com diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h Applied, thanks.
Re: [PATCH v5 0/7] s390: Host support for channel I/O.
On Thu, Dec 20, 2012 at 03:32:05PM +0100, Cornelia Huck wrote: Hi, here's the next iteration of the host patches to support channel I/O against kvm/next. Changes from v4 are on the style side; mainly using defines instead of magic numbers and using helper functions for decoding instructions. Patches 1 and 2 are new (and can be applied independently of the channel I/O patches); some things Alex pointed out in the patches apply to existing code as well. Please consider for kvm/next. Cornelia Huck (7): KVM: s390: Constify intercept handler tables. KVM: s390: Decoding helper functions. KVM: s390: Support for I/O interrupts. KVM: s390: Add support for machine checks. KVM: s390: In-kernel handling of I/O instructions. KVM: s390: Base infrastructure for enabling capabilities. KVM: s390: Add support for channel I/O instructions. Applied, thanks.
[PATCH] vfio-pci: [NOT FOR COMMIT] Add support for legacy MMIO I/O port towards VGA support
Create two new legacy regions, one for MMIO space below 1MB and another for 64k of I/O port space. For devices of PCI class VGA these ranges will be exposed and allow direct access to the device at the PCI defined VGA addresses, 0xa, 0x3b0, 0x3c0. VFIO makes use of the host VGA arbiter to manage host chipset config to route each access to the correct device. Signed-off-by: Alex Williamson alex.william...@redhat.com --- drivers/vfio/pci/vfio_pci.c | 74 --- drivers/vfio/pci/vfio_pci_private.h |6 + drivers/vfio/pci/vfio_pci_rdwr.c| 170 +++ include/uapi/linux/vfio.h |3 + 4 files changed, 197 insertions(+), 56 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index b28e66c..8a09c33 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -223,9 +223,14 @@ static long vfio_pci_ioctl(void *device_data, if (vdev-reset_works) info.flags |= VFIO_DEVICE_FLAGS_RESET; - info.num_regions = VFIO_PCI_NUM_REGIONS; + info.num_regions = VFIO_PCI_CONFIG_REGION_INDEX + 1; info.num_irqs = VFIO_PCI_NUM_IRQS; + if ((vdev-pdev-class 8) == PCI_CLASS_DISPLAY_VGA) { + info.flags |= VFIO_DEVICE_FLAGS_VGA; + info.num_regions += 2; + } + return copy_to_user((void __user *)arg, info, minsz); } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { @@ -285,6 +290,26 @@ static long vfio_pci_ioctl(void *device_data, info.flags = VFIO_REGION_INFO_FLAG_READ; break; } + case VFIO_PCI_LEGACY_MMIO_REGION_INDEX: + if ((pdev-class 8) != PCI_CLASS_DISPLAY_VGA) + return -EINVAL; + + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 1024 * 1024; + info.flags = VFIO_REGION_INFO_FLAG_READ | +VFIO_REGION_INFO_FLAG_WRITE; + + break; + case VFIO_PCI_LEGACY_IOPORT_REGION_INDEX: + if ((pdev-class 8) != PCI_CLASS_DISPLAY_VGA) + return -EINVAL; + + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 64 * 1024; + info.flags = VFIO_REGION_INFO_FLAG_READ | +VFIO_REGION_INFO_FLAG_WRITE; + + break; default: return -EINVAL; } @@ -376,14 +401,25 @@ 
static ssize_t vfio_pci_read(void *device_data, char __user *buf, if (index = VFIO_PCI_NUM_REGIONS) return -EINVAL; - if (index == VFIO_PCI_CONFIG_REGION_INDEX) + switch (index) { + case VFIO_PCI_CONFIG_REGION_INDEX: return vfio_pci_config_readwrite(vdev, buf, count, ppos, false); - else if (index == VFIO_PCI_ROM_REGION_INDEX) - return vfio_pci_mem_readwrite(vdev, buf, count, ppos, false); - else if (pci_resource_flags(pdev, index) IORESOURCE_IO) - return vfio_pci_io_readwrite(vdev, buf, count, ppos, false); - else if (pci_resource_flags(pdev, index) IORESOURCE_MEM) + case VFIO_PCI_ROM_REGION_INDEX: return vfio_pci_mem_readwrite(vdev, buf, count, ppos, false); + case VFIO_PCI_LEGACY_MMIO_REGION_INDEX: + return vfio_pci_legacy_mem_readwrite(vdev, buf, count, +ppos, false); + case VFIO_PCI_LEGACY_IOPORT_REGION_INDEX: + return vfio_pci_legacy_io_readwrite(vdev, buf, count, + ppos, false); + default: + if (pci_resource_flags(pdev, index) IORESOURCE_IO) + return vfio_pci_io_readwrite(vdev, buf, count, +ppos, false); + if (pci_resource_flags(pdev, index) IORESOURCE_MEM) + return vfio_pci_mem_readwrite(vdev, buf, count, + ppos, false); + } return -EINVAL; } @@ -398,17 +434,25 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf, if (index = VFIO_PCI_NUM_REGIONS) return -EINVAL; - if (index == VFIO_PCI_CONFIG_REGION_INDEX) + switch (index) { + case VFIO_PCI_CONFIG_REGION_INDEX: return vfio_pci_config_readwrite(vdev, (char __user *)buf, count, ppos, true); - else if (index == VFIO_PCI_ROM_REGION_INDEX) + case VFIO_PCI_ROM_REGION_INDEX: return -EINVAL; - else if
[PATCH 0/1] vfio-pci: Towards VGA support
vfio makes a nice interface to start looking at supporting VGA devices assigned to virtual machines (i.e. userspace drivers) because we can so easily add additional ranges for a device. In this patch we add legacy MMIO (below 1MB) and I/O port (64k) to devices with PCI class code VGA. We can then use the kernel VGA arbiter service to change chipset routing for each access to the VGA ranges defined in the PCI spec. The rest of the region space not used by VGA is left inaccessible until we add a future feature that needs some other legacy range. There's also a qemu userspace companion series to this which learns how to look for this new feature flag and set up ranges. Together they get a step closer to supporting vfio-based VGA assignment, but it doesn't yet work. I'm posting in this broken state both for archival purposes as well as the hope that someone has ideas of what might be missing or be able to pick up and run with this code. Some cards are able to get through execution of their VGA BIOS with these patches, but none that I've seen sync the monitor to VGA text mode from seabios. With a hack in qemu for a card-specific backdoor on a Radeon HD5450 I've been able to get syslinux graphics mode to work, and Windows will use it during normal bootup. I have no idea what might be missing for VGA text mode. Thanks, Alex --- Alex Williamson (1): vfio-pci: [NOT FOR COMMIT] Add support for legacy MMIO I/O port towards VGA support drivers/vfio/pci/vfio_pci.c | 74 --- drivers/vfio/pci/vfio_pci_private.h |6 + drivers/vfio/pci/vfio_pci_rdwr.c| 170 +++ include/uapi/linux/vfio.h |3 + 4 files changed, 197 insertions(+), 56 deletions(-)
[PATCH 0/3] Towards vfio-base VGA device assignment
This is the companion series to the vfio-pci kernel series for VGA support. Combined, these don't do as much as you'd hope. Patch 1 is simply a header update for the matching vfio kernel changes. Patch 2 is the meat of the changes, enabling vfio-pci to claim access to VGA ranges and pass them through to the kernel. The last patch is a hard-coded hack specific to my system and only known to be needed on a Radeon HD5450, where there seems to be a backdoor for the VGA BIOS to find the physical address of the device (the physical device is at 0x4000, the virtual device is at 0xc000). With this and the kernel patch, some devices are able to get through VGA BIOS execution. The HD5450 can even sync the monitor and show the correct thing on the screen if you run something that uses VGA graphics mode. Seabios seems to think VBE works, but for some reason VGA text mode doesn't work; the monitor turns off. So, like the kernel side, I'm posting these for archival purposes and with hopes that someone may have some ideas on what's still missing. Thanks, Alex --- Alex Williamson (3): qemu: [NOT FOR COMMIT] Update linux headers for vfio VGA vfio-pci: [NOT FOR COMMIT] Add support for VGA MMIO and I/O port access vfio-pci: [NOT FOR COMMIT] Hack around HD5450 I/O port backdoor hw/vfio_pci.c| 182 ++ linux-headers/asm-powerpc/kvm.h | 86 linux-headers/asm-powerpc/kvm_para.h | 13 +- linux-headers/linux/kvm.h| 21 +++- linux-headers/linux/kvm_para.h |6 + linux-headers/linux/vfio.h |9 +- linux-headers/linux/virtio_config.h |6 + linux-headers/linux/virtio_ring.h|6 + 8 files changed, 305 insertions(+), 24 deletions(-)
[PATCH 1/3] qemu: [NOT FOR COMMIT] Update linux headers for vfio VGA
Signed-off-by: Alex Williamson alex.william...@redhat.com --- linux-headers/asm-powerpc/kvm.h | 86 ++ linux-headers/asm-powerpc/kvm_para.h | 13 +++-- linux-headers/linux/kvm.h| 21 ++-- linux-headers/linux/kvm_para.h |6 +- linux-headers/linux/vfio.h |9 ++-- linux-headers/linux/virtio_config.h |6 +- linux-headers/linux/virtio_ring.h|6 +- 7 files changed, 124 insertions(+), 23 deletions(-) diff --git a/linux-headers/asm-powerpc/kvm.h b/linux-headers/asm-powerpc/kvm.h index 1bea4d8..2fba8a6 100644 --- a/linux-headers/asm-powerpc/kvm.h +++ b/linux-headers/asm-powerpc/kvm.h @@ -221,6 +221,12 @@ struct kvm_sregs { __u32 dbsr; /* KVM_SREGS_E_UPDATE_DBSR */ __u32 dbcr[3]; + /* +* iac/dac registers are 64bit wide, while this API +* interface provides only lower 32 bits on 64 bit +* processors. ONE_REG interface is added for 64bit +* iac/dac registers. +*/ __u32 iac[4]; __u32 dac[2]; __u32 dvc[2]; @@ -325,6 +331,86 @@ struct kvm_book3e_206_tlb_params { __u32 reserved[8]; }; +/* For KVM_PPC_GET_HTAB_FD */ +struct kvm_get_htab_fd { + __u64 flags; + __u64 start_index; + __u64 reserved[2]; +}; + +/* Values for kvm_get_htab_fd.flags */ +#define KVM_GET_HTAB_BOLTED_ONLY ((__u64)0x1) +#define KVM_GET_HTAB_WRITE ((__u64)0x2) + +/* + * Data read on the file descriptor is formatted as a series of + * records, each consisting of a header followed by a series of + * `n_valid' HPTEs (16 bytes each), which are all valid. Following + * those valid HPTEs there are `n_invalid' invalid HPTEs, which + * are not represented explicitly in the stream. The same format + * is used for writing. 
+ */ +struct kvm_get_htab_header { + __u32 index; + __u16 n_valid; + __u16 n_invalid; +}; + #define KVM_REG_PPC_HIOR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x1) +#define KVM_REG_PPC_IAC1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x2) +#define KVM_REG_PPC_IAC2 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x3) +#define KVM_REG_PPC_IAC3 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x4) +#define KVM_REG_PPC_IAC4 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x5) +#define KVM_REG_PPC_DAC1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x6) +#define KVM_REG_PPC_DAC2 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x7) +#define KVM_REG_PPC_DABR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x8) +#define KVM_REG_PPC_DSCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x9) +#define KVM_REG_PPC_PURR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xa) +#define KVM_REG_PPC_SPURR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb) +#define KVM_REG_PPC_DAR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc) +#define KVM_REG_PPC_DSISR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd) +#define KVM_REG_PPC_AMR(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xe) +#define KVM_REG_PPC_UAMOR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xf) + +#define KVM_REG_PPC_MMCR0 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x10) +#define KVM_REG_PPC_MMCR1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x11) +#define KVM_REG_PPC_MMCRA (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x12) + +#define KVM_REG_PPC_PMC1 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18) +#define KVM_REG_PPC_PMC2 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x19) +#define KVM_REG_PPC_PMC3 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1a) +#define KVM_REG_PPC_PMC4 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1b) +#define KVM_REG_PPC_PMC5 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1c) +#define KVM_REG_PPC_PMC6 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1d) +#define KVM_REG_PPC_PMC7 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1e) +#define KVM_REG_PPC_PMC8 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x1f) + +/* 32 floating-point registers */ +#define KVM_REG_PPC_FPR0 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x20) +#define KVM_REG_PPC_FPR(n) (KVM_REG_PPC_FPR0 + (n)) +#define KVM_REG_PPC_FPR31 (KVM_REG_PPC | 
KVM_REG_SIZE_U64 | 0x3f) + +/* 32 VMX/Altivec vector registers */ +#define KVM_REG_PPC_VR0(KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x40) +#define KVM_REG_PPC_VR(n) (KVM_REG_PPC_VR0 + (n)) +#define KVM_REG_PPC_VR31 (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x5f) + +/* 32 double-width FP registers for VSX */ +/* High-order halves overlap with FP regs */ +#define KVM_REG_PPC_VSR0 (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x60) +#define KVM_REG_PPC_VSR(n) (KVM_REG_PPC_VSR0 + (n)) +#define KVM_REG_PPC_VSR31 (KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x7f) + +/* FP and vector status/control registers */ +#define KVM_REG_PPC_FPSCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x80) +#define KVM_REG_PPC_VSCR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x81) + +/* Virtual
[PATCH 2/3] vfio-pci: [NOT FOR COMMIT] Add support for VGA MMIO and I/O port access
With this, some VGA cards can make it through VGA BIOS init, but I have yet to see one sync the monitor in VGA text mode. Only tested with -vga none. This adds a new option to vfio-pci, vga=on, which enables legacy VGA ranges. Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/vfio_pci.c | 173 + 1 file changed, 172 insertions(+), 1 deletion(-) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index 94c61ab..846e8de 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -59,6 +59,15 @@ typedef struct VFIOBAR { uint8_t nr; /* cache the BAR number for debug */ } VFIOBAR; +typedef struct VFIOLegacyIO { +off_t fd_offset; +int fd; +MemoryRegion mem; +off_t region_offset; +size_t size; +uint32_t flags; +} VFIOLegacyIO; + typedef struct VFIOINTx { bool pending; /* interrupt pending */ bool kvm_accel; /* set when QEMU bypass through KVM enabled */ @@ -126,10 +135,15 @@ typedef struct VFIODevice { int nr_vectors; /* Number of MSI/MSIX vectors currently in use */ int interrupt; /* Current interrupt type */ VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */ +VFIOLegacyIO vga[3]; /* 0xa, 0x3b0, 0x3c0 */ PCIHostDeviceAddress host; QLIST_ENTRY(VFIODevice) next; struct VFIOGroup *group; +uint32_t features; +#define VFIO_FEATURE_ENABLE_VGA_BIT 0 +#define VFIO_FEATURE_ENABLE_VGA (1 VFIO_FEATURE_ENABLE_VGA_BIT) bool reset_works; +bool has_vga; } VFIODevice; typedef struct VFIOGroup { @@ -958,6 +972,87 @@ static const MemoryRegionOps vfio_bar_ops = { .endianness = DEVICE_LITTLE_ENDIAN, }; +static void vfio_legacy_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) +{ +VFIOLegacyIO *io = opaque; +union { +uint8_t byte; +uint16_t word; +uint32_t dword; +uint64_t qword; +} buf; +off_t offset = io-fd_offset + io-region_offset + addr; + +switch (size) { +case 1: +buf.byte = data; +break; +case 2: +buf.word = cpu_to_le16(data); +break; +case 4: +buf.dword = cpu_to_le32(data); +break; +default: +hw_error(vfio: unsupported write size, %d bytes\n, size); +break; +} + +if 
(pwrite(io-fd, buf, size, offset) != size) { +error_report(%s(,0x%HWADDR_PRIx, 0x%PRIx64, %d) failed: %m\n, + __func__, io-region_offset + addr, data, size); +} + +DPRINTF(%s(0x%HWADDR_PRIx, 0x%PRIx64, %d)\n, +__func__, io-region_offset + addr, data, size); +} + +static uint64_t vfio_legacy_read(void *opaque, hwaddr addr, unsigned size) +{ +VFIOLegacyIO *io = opaque; +union { +uint8_t byte; +uint16_t word; +uint32_t dword; +uint64_t qword; +} buf; +uint64_t data = 0; +off_t offset = io-fd_offset + io-region_offset + addr; + +if (pread(io-fd, buf, size, offset) != size) { +error_report(%s(,0x%HWADDR_PRIx, %d) failed: %m\n, + __func__, io-region_offset + addr, size); +return (uint64_t)-1; +} + +switch (size) { +case 1: +data = buf.byte; +break; +case 2: +data = le16_to_cpu(buf.word); +break; +case 4: +data = le32_to_cpu(buf.dword); +break; +default: +hw_error(vfio: unsupported read size, %d bytes\n, size); +break; +} + +DPRINTF(%s(0x%HWADDR_PRIx, %d) = 0x%PRIx64\n, +__func__, io-region_offset + addr, size, data); + +return data; +} + +static const MemoryRegionOps vfio_legacy_ops = { +.read = vfio_legacy_read, +.write = vfio_legacy_write, +.endianness = DEVICE_LITTLE_ENDIAN, +}; + /* * PCI config space */ @@ -1498,6 +1593,27 @@ static void vfio_map_bars(VFIODevice *vdev) for (i = 0; i PCI_ROM_SLOT; i++) { vfio_map_bar(vdev, i); } + +if (vdev-has_vga (vdev-features VFIO_FEATURE_ENABLE_VGA)) { +memory_region_init_io(vdev-vga[0].mem, vfio_legacy_ops, + vdev-vga[0], vfio-vga-mmio@0xa, + 0xc - 0xa); +memory_region_add_subregion_overlap(pci_address_space(vdev-pdev), +0xa, vdev-vga[0].mem, 1); +memory_region_set_coalescing(vdev-vga[0].mem); + +memory_region_init_io(vdev-vga[1].mem, vfio_legacy_ops, + vdev-vga[1], vfio-vga-io@0x3b0, + 0x3bc - 0x3b0); +memory_region_add_subregion_overlap(pci_address_space_io(vdev-pdev), +0x3b0, vdev-vga[1].mem, 1); + +memory_region_init_io(vdev-vga[2].mem, vfio_legacy_ops, + vdev-vga[2], vfio-vga-io@0x3c0, + 0x3e0 - 0x3c0); +
[PATCH 3/3] vfio-pci: [NOT FOR COMMIT] Hack around HD5450 I/O port backdoor
This is a hack specific to my system which I haven't even attempted to generalize yet. The ATI/AMD Radeon HD5450 VGA BIOS appears to have a backdoor to determine the physical address of the device. It reads a value matching the top byte of the I/O port BAR from a register in VGA I/O port space, then uses in/out to that address during BIOS execution. On my setup the I/O port BAR is at 0x4000 physically and emulated for the guest at 0xc000. So I simply look for this access and replace 0x40 with 0xc0. That's enough for it to get through BIOS init, but it's still only partially functional (no VGA text mode). Signed-off-by: Alex Williamson alex.william...@redhat.com --- hw/vfio_pci.c |9 + 1 file changed, 9 insertions(+) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index 846e8de..5db076f 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -1041,6 +1041,15 @@ static uint64_t vfio_legacy_read(void *opaque, hwaddr addr, unsigned size) break; } +/* XXX - Complete hardcoded hack, need to figure out how common this is and + * come up with a device quirk and match host phys to guest phys. This is + * only known to be needed for an ATI/AMD Radeon HD5450 which stores the + * upper byte of the I/O port address in this unused VGA I/O port register. + */ +if (io->region_offset == 0x3c0 && addr == 3 && size == 1 && data == 0x40) { +data = 0xc0; +} + DPRINTF(%s(0x%HWADDR_PRIx, %d) = 0x%PRIx64\n, __func__, io->region_offset + addr, size, data);
Re: [PATCH] KVM: MMU: simplify folding of dirty bit into accessed_dirty
On Thu, Dec 27, 2012 at 02:44:58PM +0200, Gleb Natapov wrote: MMU code tries to avoid if()s that HW is not able to predict reliably by using bitwise operations to streamline code execution, but in the case of the dirty bit folding this gains us nothing, since write_fault is checked right before the folding code. Let's just piggyback onto the if() to make the code more clear. Signed-off-by: Gleb Natapov g...@redhat.com Applied, thanks.
Re: FreeBSD-amd64 fails to start with SMP on quemu-kvm
On Mon, Jan 07, 2013 at 06:13:22PM +0100, Artur Samborski wrote: Hello, When I try to run FreeBSD-amd64 on more than 1 vcpu in qemu-kvm (Fedora Core 17), e.g. to run FreeBSD-9.0-RELEASE-amd64 with: qemu-kvm -m 1024m -cpu host -smp 2 -cdrom /storage/iso/FreeBSD-9.0-RELEASE-amd64-dvd1.iso it freezes KVM with:

KVM internal error. Suberror: 1
emulation failure
RAX=80b0d4c0 RBX=0009f000 RCX=c080 RDX=
RSI=d238 RDI= RBP= RSP=
R8 = R9 = R10= R11=
R12= R13= R14= R15=
RIP=0009f076 RFL=00010086 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES = f300 DPL=3 DS16 [-WA]
CS =0008 00209900 DPL=0 CS64 [--A]
SS =9f00 0009f000 f300 DPL=3 DS16 [-WA]
DS =0018 00c09300 DPL=0 DS [-WA]
FS = f300 DPL=3 DS16 [-WA]
GS = f300 DPL=3 DS16 [-WA]
LDT= 8200 DPL=0 LDT
TR = 8b00 DPL=0 TSS64-busy
GDT= 0009f080 0020
IDT=
CR0=8011 CR2= CR3=0009c000 CR4=0030
DR0= DR1= DR2= DR3=
DR6=0ff0 DR7=0400
EFER=0501
Code=00 00 00 80 0f 22 c0 ea 70 f0 09 00 08 00 48 b8 c0 d4 b0 80 ff ff ff ff ff e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 99 20 00 ff ff 00 00

Artur, Can you check whether https://patchwork-mail.kernel.org/patch/1942681/ fixes your problem? TIA
Re: [RESEND PATCH] pci-assign: Enable MSIX on device to match guest
On Mon, Jan 07, 2013 at 06:01:19PM +0200, Michael S. Tsirkin wrote: On Sun, Jan 06, 2013 at 09:30:31PM -0700, Alex Williamson wrote: When a guest enables MSIX on a device we evaluate the MSIX vector table, typically find no unmasked vectors, and don't switch the device to MSIX mode. This generally works fine and the device will be switched once the guest enables and therefore unmasks a vector. Unfortunately some drivers enable MSIX, then use interfaces to send commands between VF and PF or PF and firmware that act based on the host state of the device. These therefore may break when MSIX is managed lazily. This change re-enables the previous test used to enable MSIX (see qemu-kvm a6b402c9), which basically guesses whether a vector will be used based on the data field of the vector table. Cc: qemu-sta...@nongnu.org Signed-off-by: Alex Williamson alex.william...@redhat.com Acked-by: Michael S. Tsirkin m...@redhat.com --- Michael has now ack'd this patch as the correct initial first step, so I'm resending with that included. I'm actually not sure what the expected upstream path is for this file now that it's part of qemu. There's no entry for hw/kvm/* in MAINTAINERS nor anything specifically for this file. Is kvm still upstream for this, through the uq branch, or is it qemu for anything not specifically part of a kvm interface? Anthony, Gleb, Marcelo, Michael, feel free to add this to your tree; any path is fine by me. Thanks, Alex I can merge this if there are no other takers. Go for it. 
hw/kvm/pci-assign.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c index 8ee9428..896cfe8 100644 --- a/hw/kvm/pci-assign.c +++ b/hw/kvm/pci-assign.c @@ -1031,6 +1031,19 @@ static bool assigned_dev_msix_masked(MSIXTableEntry *entry) return (entry->ctrl & cpu_to_le32(0x1)) != 0; } +/* + * When MSI-X is first enabled the vector table typically has all the + * vectors masked, so we can't use that as the obvious test to figure out + * how many vectors to initially enable. Instead we look at the data field + * because this is what worked for pci-assign for a long time. This makes + * sure the physical MSI-X state tracks the guest's view, which is important + * for some VF/PF and PF/fw communication channels. + */ +static bool assigned_dev_msix_skipped(MSIXTableEntry *entry) +{ +return !entry->data; +} + static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) { AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev); @@ -1041,7 +1054,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) /* Get the usable entry number for allocating */ for (i = 0; i < adev->msix_max; i++, entry++) { -if (assigned_dev_msix_masked(entry)) { +if (assigned_dev_msix_skipped(entry)) { continue; } entries_nr++; } @@ -1070,7 +1083,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev) for (i = 0; i < adev->msix_max; i++, entry++) { adev->msi_virq[i] = -1; -if (assigned_dev_msix_masked(entry)) { +if (assigned_dev_msix_skipped(entry)) { continue; }
Re: [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high
On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov g...@redhat.com wrote: On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote: Reading the spec, it is clear that most modes normally leave the IRQ output line high, and only pulse it low to generate a leading edge. Especially the most commonly used mode 2. The KVM i8254 model does not try to emulate the duration of the pulse at all, so just swap the high/low settings to leave it high most of the time. This fix is a prerequisite to improving the i8259 model to handle the trailing edge of an interrupt request as indicated in its spec: If it gets a trailing edge of an IRQ line before it starts to service the interrupt, the request should be canceled. See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz or search the net for 23124406.pdf. Risks: There is a risk that migrating a running guest between versions with and without this patch will lose or gain a single timer interrupt during the migration process. The only case where Can you elaborate on how exactly this can happen? Do not see it. KVM 8254: In the corrected model, when the count expires, the model briefly pulses output low and then high again, with the low-to-high transition being what triggers the interrupt. In the old model, when the count expires, the model expects the output line to already be low, and briefly pulses it high (triggering the interrupt) and then low again. But if the line was already high (because it migrated from the corrected model), this won't generate a new leading edge (low to high) and won't trigger a new interrupt (the first post-back-migration pulse turns into a simple trailing edge instead of a pulse). Unless there is something I'm missing? The qemu 8254 model actually models each edge at independent clock ticks instead of combining both into a very brief pulse at one time. 
I've found it handy to draw out old and new timing diagrams on paper (for each mode), and then carefully think about what happens with respect to levels and edges when you transition back and forth between old and new models at various points in the timing cycle. (Note I've spent more time examining the qemu models than the kvm models.) this is likely to be serious is probably losing a single-shot (mode 4) interrupt, but if my understanding of how things work is good, then that should only be possible if a whole slew of conditions are all met:
1. The guest is configured to run in a tickless mode (like modern Linux).
2. The guest is for some reason still using the i8254 rather than something more modern like an HPET. (The combination of 1 and 2 should be rare.) This is not so rare. For performance reasons it is better to not have HPET at all. In fact -no-hpet is how I would advise anyone to run qemu. In a later email you mention that Linux prefers a timer in the APIC. I don't know much about the APIC (advanced interrupt controller), and wasn't even aware it had its own timer. The big question is if we can safely just fix the i825* models, or if we need something more subtle to avoid breaking commonly used guests like modern Linux (support both corrected and old models, or only fix IRQ2 instead of all IRQs, or similar subtlety).
3. The migration is going from a fixed version back to the old version. (Not sure how common this is, but it should be rarer than migrating from old to new.)
4. There are not going to be any timely events/interrupts (keyboard, network, process sleeps, etc.) that cause the guest to reset the PIT mode 4 one-shot counter soon enough.
This combination should be rare enough that more complicated solutions are not worth the effort. 
Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net --- arch/x86/kvm/i8254.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index c1d30b2..cd4ec60 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work) } spin_unlock(&ps->inject_lock); if (inject) { -kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); +/* Clear previous interrupt, then create a rising + * edge to request another interrupt, and leave it at + * level=1 until time to inject another one. + */ kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0); +kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1); /* * Provides NMI watchdog support via Virtual Wire mode. -- 1.7.10.2.484.gcd07cc5 -- Gleb.
Re: [PATCH KVM v2 2/4] KVM: additional i8254 output fixes
On Mon, 7 Jan 2013 14:04:03 +0200, Gleb Natapov g...@redhat.com wrote: On Wed, Dec 26, 2012 at 10:39:54PM -0700, Matthew Ogilvie wrote: Make pit_get_out() consistent with spec. Currently pit_get_out() doesn't affect IRQ0, but it can be read by the guest in other ways. This makes it consistent with proposed changes in qemu's i8254 model as well. See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz or search the net for 23124406.pdf. Signed-off-by: Matthew Ogilvie mmogilvi_q...@miniinfo.net --- arch/x86/kvm/i8254.c | 44 ++-- 1 file changed, 34 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index cd4ec60..fd38938 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -144,6 +144,10 @@ static int pit_get_count(struct kvm *kvm, int channel) WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock)); +/* FIXME: Add some way to represent a paused timer and return + * the paused-at counter value, to better model gate pausing, + * wait until next CLK pulse to load counter logic, etc. 
+ */ t = kpit_elapsed(kvm, c, channel); d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC); @@ -155,8 +159,7 @@ static int pit_get_count(struct kvm *kvm, int channel) counter = (c->count - d) & 0xffff; break; case 3: -/* XXX: may be incorrect for odd counts */ -counter = c->count - (mod_64((2 * d), c->count)); +counter = (c->count - (mod_64((2 * d), c->count))) & 0xfffe; break; default: counter = c->count - mod_64(d, c->count); @@ -180,20 +183,18 @@ static int pit_get_out(struct kvm *kvm, int channel) switch (c->mode) { default: case 0: -out = (d >= c->count); -break; case 1: -out = (d < c->count); +out = (d >= c->count); break; case 2: -out = ((mod_64(d, c->count) == 0) && (d != 0)); +out = (mod_64(d, c->count) != (c->count - 1) || c->gate == 0); break; case 3: -out = (mod_64(d, c->count) < ((c->count + 1) >> 1)); +out = (mod_64(d, c->count) < ((c->count + 1) >> 1) || c->gate == 0); break; case 4: case 5: -out = (d == c->count); +out = (d != c->count); break; } @@ -367,7 +368,7 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val) /* * The largest possible initial count is 0; this is equivalent - * to 2^16 for binary counting and 10^4 for BCD counting. + * to pow(2,16) for binary counting and pow(10,4) for BCD counting. */ if (val == 0) val = 0x10000; @@ -376,6 +377,26 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val) if (channel != 0) { ps->channels[channel].count_load_time = ktime_get(); + +/* In gate-triggered one-shot modes, + * indirectly model some pit_get_out() + * cases by setting the load time way + * back until gate-triggered. + * (Generally only affects reading status + * from channel 2 speaker, + * due to hard-wired gates on other + * channels.) + * + * FIXME: This might be redesigned if a paused + * timer state is added for pit_get_count(). 
+ */ +if (ps->channels[channel].mode == 1 || +ps->channels[channel].mode == 5) { +u64 delta = muldiv64(val+2, NSEC_PER_SEC, KVM_PIT_FREQ); +ps->channels[channel].count_load_time = + ktime_sub(ps->channels[channel].count_load_time, + ns_to_ktime(delta)); I do not understand what you are trying to do here. You assume that the trigger will happen 2 clocks after the counter is loaded? Modes 1 and 5 are single-shot, and they do not start counting until GATE is triggered, potentially well after the count is loaded. So this is attempting to model the "start of countdown has not been triggered" state as being mostly identical to the "already triggered and also expired some number of clocks (2) ago" state. It might be clearer to have a way to explicitly model a paused countdown, but such a mechanism doesn't currently exist. Note that modeling modes 1 and 5 is fairly low priority, because channel 0's GATE line is generally hard-wired such that GATE edges/triggers are impossible. But it may still be somewhat relevant to the PC speaker channel, or if someone might want to use this in a model of non-PC hardware. +} return; } @@ -383,7 +404,6 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val) * mode 1 is one shot, mode 2 is period, otherwise del timer */ switch
RE: [PATCH v8 2/3] x86, apicv: add virtual interrupt delivery support
Marcelo Tosatti wrote on 2013-01-08: On Mon, Jan 07, 2013 at 07:48:43PM +0200, Gleb Natapov wrote: ioapic_write (or any other ioapic update) lock() perform update make_all_vcpus_request(KVM_REQ_UPDATE_EOI_BITMAP) (*) unlock() (*) Similarly to TLB flush. The advantage is that all work becomes vcpu-local. The end result is much simpler code. What complexity will it remove? Synchronization between multiple CPUs (except the KVM_REQ_ bit processing, which is infrastructure shared by other parts of KVM). We agreed that performance is a non-issue here. The current logic is this: ioapic_write lock() perform update make request on each vcpu kick each vcpu unlock() The only difference is the way the request is made. And the complex part is performing the update. With your suggestion, we still need to do the update. Why do you think it is much simpler? Best regards, Yang
[kvm:linux-next 16/16] arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift'
tree: git://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next head: 908e7d7999bcce70ac52e7f390a8f5cbc55948de commit: 908e7d7999bcce70ac52e7f390a8f5cbc55948de [16/16] KVM: MMU: simplify folding of dirty bit into accessed_dirty config: make ARCH=x86_64 allmodconfig All warnings: In file included from arch/x86/kvm/mmu.c:3482:0: arch/x86/kvm/paging_tmpl.h: In function 'paging64_walk_addr_generic': arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' [-Wunused-variable] In file included from arch/x86/kvm/mmu.c:3486:0: arch/x86/kvm/paging_tmpl.h: In function 'paging32_walk_addr_generic': arch/x86/kvm/paging_tmpl.h:154:57: warning: unused variable 'shift' [-Wunused-variable] vim +/shift +154 arch/x86/kvm/paging_tmpl.h 8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity 2012-09-16 138 walker-ptes[level] = pte; 8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity 2012-09-16 139} 8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity 2012-09-16 140return 0; 8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity 2012-09-16 141 } 8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity 2012-09-16 142 ac79c978 drivers/kvm/paging_tmpl.h Avi Kivity 2007-01-05 143 /* ac79c978 drivers/kvm/paging_tmpl.h Avi Kivity 2007-01-05 144 * Fetch a guest pte for a guest virtual address ac79c978 drivers/kvm/paging_tmpl.h Avi Kivity 2007-01-05 145 */ 1e301feb arch/x86/kvm/paging_tmpl.h Joerg Roedel 2010-09-10 146 static int FNAME(walk_addr_generic)(struct guest_walker *walker, 1e301feb arch/x86/kvm/paging_tmpl.h Joerg Roedel 2010-09-10 147 struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, 33770780 arch/x86/kvm/paging_tmpl.h Xiao Guangrong 2010-09-28 148 gva_t addr, u32 access) 6aa8b732 drivers/kvm/paging_tmpl.h Avi Kivity 2006-12-10 149 { 8cbc7069 arch/x86/kvm/paging_tmpl.h Avi Kivity 2012-09-16 150int ret; 42bf3f0a drivers/kvm/paging_tmpl.h Avi Kivity 2007-10-17 151 pt_element_t pte; b7233635 arch/x86/kvm/paging_tmpl.h Borislav Petkov 2011-05-30 152 pt_element_t __user *uninitialized_var(ptep_user); cea0f0e7 
drivers/kvm/paging_tmpl.h Avi Kivity 2007-01-05 153gfn_t table_gfn; b514c30f arch/x86/kvm/paging_tmpl.h Avi Kivity 2012-09-16 @154 unsigned index, pt_access, pte_access, accessed_dirty, shift; 42bf3f0a drivers/kvm/paging_tmpl.h Avi Kivity 2007-10-17 155gpa_t pte_gpa; 134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01 156int offset; 134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01 157const int write_fault = access & PFERR_WRITE_MASK; 134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01 158const int user_fault = access & PFERR_USER_MASK; 134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01 159const int fetch_fault = access & PFERR_FETCH_MASK; 134291bf arch/x86/kvm/paging_tmpl.h Takuya Yoshikawa 2011-07-01 160u16 errcode = 0; 13d22b6a arch/x86/kvm/paging_tmpl.h Avi Kivity 2012-09-12 161gpa_t real_gpa; 13d22b6a arch/x86/kvm/paging_tmpl.h Avi Kivity 2012-09-12 162gfn_t gfn; --- 0-DAY kernel build testing backend Open Source Technology Center Fengguang Wu, Yuanhan Liu Intel Corporation
Re: [PATCH v2 1/5] virtio: add functions for piecewise addition of buffers
Paolo Bonzini pbonz...@redhat.com writes: Il 07/01/2013 01:02, Rusty Russell ha scritto: Paolo Bonzini pbonz...@redhat.com writes: Il 02/01/2013 06:03, Rusty Russell ha scritto: Paolo Bonzini pbonz...@redhat.com writes: The virtqueue_add_buf function has two limitations: 1) it requires the caller to provide all the buffers in a single call; 2) it does not support chained scatterlists: the buffers must be provided as an array of struct scatterlist; Chained scatterlists are a horrible interface, but that doesn't mean we shouldn't support them if there's a need. I think I once even had a patch which passed two chained sgs, rather than a combo sg and two length numbers. It's very old, but I've pasted it below. Duplicating the implementation by having another interface is pretty nasty; I think I'd prefer the chained scatterlists, if that's optimal for you. Unfortunately, that cannot work because not all architectures support chained scatterlists. WHAT? I can't figure out what an arch needs to do to support this? It needs to use the iterator functions in its DMA driver. But we don't care for virtio. All archs we care about support them, though, so I think we can ignore this issue for now. Kind of... In principle all QEMU-supported arches can use virtio, and the speedup can be quite useful. And there is no Kconfig symbol for SG chains that I can use to disable virtio-scsi on unsupported arches. :/ Well, we #error if it's not supported. Then the lazy architectures can fix it. Cheers, Rusty.
Re: [PATCH 0/4] ARM updates for kvmtool
On Mon, Jan 7, 2013 at 1:14 PM, Will Deacon will.dea...@arm.com wrote: - virtio mmio fixes to deal with guest page sizes != 4k (in preparation for AArch64, which I will post separately). - .dtb dumping via the lkvm command line - Support for PSCI firmware as a replacement to the spin-table based SMP boot code The last option was implemented after discussion on the linux-arm-kernel list when adding support for the mach-virt platform. I completely missed that, it would have been nice if the kvmarm list were cc'ed on those discussions. I hope to upstream the kernel-side part of the implementation for 3.9 and expect the kvm bits to follow once that has been merged. All feedback welcome. Very cool, I'm looking forward to trying this out, hopefully I'll find cycles this week. -Christoffer
[PATCH v5 1/5] KVM: MMU: fix Dirty bit missed if CR0.WP = 0
If the write-fault access is from supervisor and CR0.WP is not set on the vcpu, kvm will fix it by adjusting pte access - it sets the W bit on pte and clears U bit. This is the chance that kvm can change pte access from readonly to writable Unfortunately, the pte access is the access of 'direct' shadow page table, means direct sp.role.access = pte_access, then we will create a writable spte entry on the readonly shadow page table. It will cause Dirty bit is not tracked when two guest ptes point to the same large page. Note, it does not have other impact except Dirty bit since cr0.wp is encoded into sp.role It can be fixed by adjusting pte access before establishing shadow page table. Also, after that, no mmu specified code exists in the common function and drop two parameters in set_spte Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- arch/x86/kvm/mmu.c | 47 --- arch/x86/kvm/paging_tmpl.h | 30 +++ 2 files changed, 38 insertions(+), 39 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 01d7c2a..2a3c890 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2342,8 +2342,7 @@ static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, } static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, - unsigned pte_access, int user_fault, - int write_fault, int level, + unsigned pte_access, int level, gfn_t gfn, pfn_t pfn, bool speculative, bool can_unsync, bool host_writable) { @@ -2378,9 +2377,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, spte |= (u64)pfn PAGE_SHIFT; - if ((pte_access ACC_WRITE_MASK) - || (!vcpu-arch.mmu.direct_map write_fault -!is_write_protection(vcpu) !user_fault)) { + if (pte_access ACC_WRITE_MASK) { /* * There are two cases: @@ -2399,19 +2396,6 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE; - if (!vcpu-arch.mmu.direct_map -!(pte_access ACC_WRITE_MASK)) { - spte = ~PT_USER_MASK; - /* -* If we converted a user page to a kernel page, 
-* so that the kernel can write to it when cr0.wp=0, -* then we should prevent the kernel from executing it -* if SMEP is enabled. -*/ - if (kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)) - spte |= PT64_NX_MASK; - } - /* * Optimization: for pte sync, if spte was writable the hash * lookup is unnecessary (and expensive). Write protection @@ -2442,18 +2426,15 @@ done: static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pt_access, unsigned pte_access, -int user_fault, int write_fault, -int *emulate, int level, gfn_t gfn, -pfn_t pfn, bool speculative, -bool host_writable) +int write_fault, int *emulate, int level, gfn_t gfn, +pfn_t pfn, bool speculative, bool host_writable) { int was_rmapped = 0; int rmap_count; - pgprintk(%s: spte %llx access %x write_fault %d - user_fault %d gfn %llx\n, + pgprintk(%s: spte %llx access %x write_fault %d gfn %llx\n, __func__, *sptep, pt_access, -write_fault, user_fault, gfn); +write_fault, gfn); if (is_rmap_spte(*sptep)) { /* @@ -2477,9 +2458,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, was_rmapped = 1; } - if (set_spte(vcpu, sptep, pte_access, user_fault, write_fault, - level, gfn, pfn, speculative, true, - host_writable)) { + if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative, + true, host_writable)) { if (write_fault) *emulate = 1; kvm_mmu_flush_tlb(vcpu); @@ -2571,10 +2551,9 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, return -1; for (i = 0; i ret; i++, gfn++, start++) - mmu_set_spte(vcpu, start, ACC_ALL, -access, 0, 0, NULL, -sp-role.level, gfn, -page_to_pfn(pages[i]), true, true); + mmu_set_spte(vcpu, start, ACC_ALL, access, 0, NULL, +sp-role.level, gfn, page_to_pfn(pages[i]), +true, true); return 0; } @@ -2636,8 +2615,8
[PATCH v5 2/5] KVM: MMU: fix infinite fault access retry
We have two issues in current code: - if target gfn is used as its page table, guest will refault then kvm will use small page size to map it. We need two #PF to fix its shadow page table - sometimes, say a exception is triggered during vm-exit caused by #PF (see handle_exception() in vmx.c), we remove all the shadow pages shadowed by the target gfn before go into page fault path, it will cause infinite loop: delete shadow pages shadowed by the gfn - try to use large page size to map the gfn - retry the access -... To fix these, we can adjust page size early if the target gfn is used as page table Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- arch/x86/kvm/mmu.c | 13 - arch/x86/kvm/paging_tmpl.h | 35 ++- 2 files changed, 38 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 2a3c890..54fc61e 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2380,15 +2380,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (pte_access ACC_WRITE_MASK) { /* -* There are two cases: -* - the one is other vcpu creates new sp in the window -* between mapping_level() and acquiring mmu-lock. -* - the another case is the new sp is created by itself -* (page-fault path) when guest uses the target gfn as -* its page table. -* Both of these cases can be fixed by allowing guest to -* retry the access, it will refault, then we can establish -* the mapping by using small page. +* Other vcpu creates new sp in the window between +* mapping_level() and acquiring mmu-lock. We can +* allow guest to retry the access, the mapping can +* be fixed if guest refault. 
*/ if (level > PT_PAGE_TABLE_LEVEL && has_wrprotected_page(vcpu->kvm, gfn, level)) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 7c575e7..67b390d 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -487,6 +487,38 @@ out_gpte_changed: return 0; } + /* + * To see whether the mapped gfn can write its page table in the current + * mapping. + * + * It is the helper function of FNAME(page_fault). When guest uses large page + * size to map the writable gfn which is used as current page table, we should + * force kvm to use small page size to map it because a new shadow page will be + * created when kvm establishes the shadow page table, which stops kvm using large + * page size. Doing it early can avoid unnecessary #PF and emulation. + * + * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok + * since the PDPT is always shadowed, that means, we can not use large page + * size to map the gfn which is used as PDPT. + */ +static bool +FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu, + struct guest_walker *walker, int user_fault) +{ + int level; + gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1); + + if (!(walker->pte_access & ACC_WRITE_MASK || + (!is_write_protection(vcpu) && !user_fault))) + return false; + + for (level = walker->level; level <= walker->max_level; level++) + if (!((walker->gfn ^ walker->table_gfn[level - 1]) & mask)) + return true; + + return false; +} + /* * Page fault handler. 
There are several causes for a page fault: * - there is no shadow pte for the guest pte @@ -541,7 +573,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, } if (walker.level >= PT_DIRECTORY_LEVEL) - force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn); + force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn) + || FNAME(is_self_change_mapping)(vcpu, &walker, user_fault); else force_pt_level = 1; if (!force_pt_level) { -- 1.7.7.6
[PATCH v5 3/5] KVM: x86: clean up reexecute_instruction
Little cleanup for reexecute_instruction, also use gpa_to_gfn in retry_instruction Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- arch/x86/kvm/x86.c | 13 ++--- 1 files changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1c9c834..08cacd9 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4761,19 +4761,18 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva) if (tdp_enabled) return false; + gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL); + if (gpa == UNMAPPED_GVA) + return true; /* let cpu generate fault */ + /* * if emulation was due to access to shadowed page table * and it failed try to unshadow page and re-enter the * guest to let CPU execute the instruction. */ - if (kvm_mmu_unprotect_page_virt(vcpu, gva)) + if (kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa))) return true; - gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL); - - if (gpa == UNMAPPED_GVA) - return true; /* let cpu generate fault */ - /* * Do not retry the unhandleable instruction if it faults on the * readonly host memory, otherwise it will goto an infinite loop: @@ -4828,7 +4827,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt, if (!vcpu->arch.mmu.direct_map) gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL); - kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT); + kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)); return true; } -- 1.7.7.6
[PATCH v5 4/5] KVM: x86: let reexecute_instruction work for tdp
Currently, reexecute_instruction refused to retry all instructions if tdp is enabled. If nested npt is used, the emulation may be caused by shadow page, it can be fixed by dropping the shadow page. And the only condition that tdp can not retry the instruction is the access fault on error pfn Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- arch/x86/kvm/x86.c | 61 --- 1 files changed, 43 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 08cacd9..6f13e03 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4753,25 +4753,25 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu) return r; } -static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva) +static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2) { - gpa_t gpa; + gpa_t gpa = cr2; pfn_t pfn; - if (tdp_enabled) - return false; - - gpa = kvm_mmu_gva_to_gpa_system(vcpu, gva, NULL); - if (gpa == UNMAPPED_GVA) - return true; /* let cpu generate fault */ + if (!vcpu-arch.mmu.direct_map) { + /* +* Write permission should be allowed since only +* write access need to be emulated. +*/ + gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2, NULL); - /* -* if emulation was due to access to shadowed page table -* and it failed try to unshadow page and re-enter the -* guest to let CPU execute the instruction. -*/ - if (kvm_mmu_unprotect_page(vcpu-kvm, gpa_to_gfn(gpa))) - return true; + /* +* If the mapping is invalid in guest, let cpu retry +* it to generate fault. +*/ + if (gpa == UNMAPPED_GVA) + return true; + } /* * Do not retry the unhandleable instruction if it faults on the @@ -4780,12 +4780,37 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva) * instruction - ... */ pfn = gfn_to_pfn(vcpu-kvm, gpa_to_gfn(gpa)); - if (!is_error_noslot_pfn(pfn)) { - kvm_release_pfn_clean(pfn); + + /* +* If the instruction failed on the error pfn, it can not be fixed, +* report the error to userspace. 
+*/ + if (is_error_noslot_pfn(pfn)) + return false; + + kvm_release_pfn_clean(pfn); + + /* The instructions are well-emulated on direct mmu. */ + if (vcpu->arch.mmu.direct_map) { + unsigned int indirect_shadow_pages; + + spin_lock(&vcpu->kvm->mmu_lock); + indirect_shadow_pages = vcpu->kvm->arch.indirect_shadow_pages; + spin_unlock(&vcpu->kvm->mmu_lock); + + if (indirect_shadow_pages) + kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)); + return true; } - return false; + /* +* if emulation was due to access to shadowed page table +* and it failed try to unshadow page and re-enter the +* guest to let CPU execute the instruction. +*/ + kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)); + return true; } static bool retry_instruction(struct x86_emulate_ctxt *ctxt, -- 1.7.7.6
[PATCH v5 5/5] KVM: x86: improve reexecute_instruction
The current reexecute_instruction can not well detect the failed instruction emulation. It allows guest to retry all the instructions except it accesses on error pfn For example, some cases are nested-write-protect - if the page we want to write is used as PDE but it chains to itself. Under this case, we should stop the emulation and report the case to userspace Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- arch/x86/include/asm/kvm_host.h |7 +++ arch/x86/kvm/paging_tmpl.h | 27 --- arch/x86/kvm/x86.c |8 +++- 3 files changed, 34 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c431b33..d6ab8d2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -502,6 +502,13 @@ struct kvm_vcpu_arch { u64 msr_val; struct gfn_to_hva_cache data; } pv_eoi; + + /* +* Indicate whether the access faults on its page table in guest +* which is set when fix page fault and used to detect unhandeable +* instruction. +*/ + bool write_fault_to_shadow_pgtable; }; struct kvm_lpage_info { diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 67b390d..df50560 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -497,26 +497,34 @@ out_gpte_changed: * created when kvm establishes shadow page table that stop kvm using large * page size. Do it early can avoid unnecessary #PF and emulation. * + * @write_fault_to_shadow_pgtable will return true if the fault gfn is + * currently used as its page table. + * * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok * since the PDPT is always shadowed, that means, we can not use large page * size to map the gfn which is used as PDPT. 
*/ static bool FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu, - struct guest_walker *walker, int user_fault) + struct guest_walker *walker, int user_fault, + bool *write_fault_to_shadow_pgtable) { int level; gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker-level) - 1); + bool self_changed = false; if (!(walker-pte_access ACC_WRITE_MASK || (!is_write_protection(vcpu) !user_fault))) return false; - for (level = walker-level; level = walker-max_level; level++) - if (!((walker-gfn ^ walker-table_gfn[level - 1]) mask)) - return true; + for (level = walker-level; level = walker-max_level; level++) { + gfn_t gfn = walker-gfn ^ walker-table_gfn[level - 1]; + + self_changed |= !(gfn mask); + *write_fault_to_shadow_pgtable |= !gfn; + } - return false; + return self_changed; } /* @@ -544,7 +552,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, int level = PT_PAGE_TABLE_LEVEL; int force_pt_level; unsigned long mmu_seq; - bool map_writable; + bool map_writable, is_self_change_mapping; pgprintk(%s: addr %lx err %x\n, __func__, addr, error_code); @@ -572,9 +580,14 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, return 0; } + vcpu-arch.write_fault_to_shadow_pgtable = false; + + is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu, + walker, user_fault, vcpu-arch.write_fault_to_shadow_pgtable); + if (walker.level = PT_DIRECTORY_LEVEL) force_pt_level = mapping_level_dirty_bitmap(vcpu, walker.gfn) - || FNAME(is_self_change_mapping)(vcpu, walker, user_fault); + || is_self_change_mapping; else force_pt_level = 1; if (!force_pt_level) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6f13e03..2957012 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4810,7 +4810,13 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t cr2) * guest to let CPU execute the instruction. 
*/ kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)); - return true; + + /* +* If the access faults on its page table, it can not +* be fixed by unprotecting shadow page and it should +* be reported to userspace. +*/ + return !vcpu->arch.write_fault_to_shadow_pgtable; } static bool retry_instruction(struct x86_emulate_ctxt *ctxt, -- 1.7.7.6
[no subject]
On Sun, Jan 06, 2013 at 02:36:13PM +0800, Asias He wrote: This drops the cmd completion list spin lock and makes the cmd completion queue lock-less. Signed-off-by: Asias He as...@redhat.com Nicholas, any feedback? --- drivers/vhost/tcm_vhost.c | 46 +- drivers/vhost/tcm_vhost.h | 2 +- 2 files changed, 14 insertions(+), 34 deletions(-) diff --git a/drivers/vhost/tcm_vhost.c b/drivers/vhost/tcm_vhost.c index b20df5c..3720604 100644 --- a/drivers/vhost/tcm_vhost.c +++ b/drivers/vhost/tcm_vhost.c @@ -47,6 +47,7 @@ #include linux/vhost.h #include linux/virtio_net.h /* TODO vhost.h currently depends on this */ #include linux/virtio_scsi.h +#include linux/llist.h #include vhost.c #include vhost.h @@ -64,8 +65,7 @@ struct vhost_scsi { struct vhost_virtqueue vqs[3]; struct vhost_work vs_completion_work; /* cmd completion work item */ - struct list_head vs_completion_list; /* cmd completion queue */ - spinlock_t vs_completion_lock;/* protects s_completion_list */ + struct llist_head vs_completion_list; /* cmd completion queue */ }; /* Local pointer to allocated TCM configfs fabric module */ @@ -301,9 +301,7 @@ static void vhost_scsi_complete_cmd(struct tcm_vhost_cmd *tv_cmd) { struct vhost_scsi *vs = tv_cmd-tvc_vhost; - spin_lock_bh(vs-vs_completion_lock); - list_add_tail(tv_cmd-tvc_completion_list, vs-vs_completion_list); - spin_unlock_bh(vs-vs_completion_lock); + llist_add(tv_cmd-tvc_completion_list, vs-vs_completion_list); vhost_work_queue(vs-dev, vs-vs_completion_work); } @@ -347,27 +345,6 @@ static void vhost_scsi_free_cmd(struct tcm_vhost_cmd *tv_cmd) kfree(tv_cmd); } -/* Dequeue a command from the completion list */ -static struct tcm_vhost_cmd *vhost_scsi_get_cmd_from_completion( - struct vhost_scsi *vs) -{ - struct tcm_vhost_cmd *tv_cmd = NULL; - - spin_lock_bh(vs-vs_completion_lock); - if (list_empty(vs-vs_completion_list)) { - spin_unlock_bh(vs-vs_completion_lock); - return NULL; - } - - list_for_each_entry(tv_cmd, vs-vs_completion_list, - tvc_completion_list) { - 
list_del(tv_cmd-tvc_completion_list); - break; - } - spin_unlock_bh(vs-vs_completion_lock); - return tv_cmd; -} - /* Fill in status and signal that we are done processing this command * * This is scheduled in the vhost work queue so we are called with the owner @@ -377,12 +354,18 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work) { struct vhost_scsi *vs = container_of(work, struct vhost_scsi, vs_completion_work); + struct virtio_scsi_cmd_resp v_rsp; struct tcm_vhost_cmd *tv_cmd; + struct llist_node *llnode; + struct se_cmd *se_cmd; + int ret; - while ((tv_cmd = vhost_scsi_get_cmd_from_completion(vs))) { - struct virtio_scsi_cmd_resp v_rsp; - struct se_cmd *se_cmd = tv_cmd-tvc_se_cmd; - int ret; + llnode = llist_del_all(vs-vs_completion_list); + while (llnode) { + tv_cmd = llist_entry(llnode, struct tcm_vhost_cmd, + tvc_completion_list); + llnode = llist_next(llnode); + se_cmd = tv_cmd-tvc_se_cmd; pr_debug(%s tv_cmd %p resid %u status %#02x\n, __func__, tv_cmd, se_cmd-residual_count, se_cmd-scsi_status); @@ -426,7 +409,6 @@ static struct tcm_vhost_cmd *vhost_scsi_allocate_cmd( pr_err(Unable to allocate struct tcm_vhost_cmd\n); return ERR_PTR(-ENOMEM); } - INIT_LIST_HEAD(tv_cmd-tvc_completion_list); tv_cmd-tvc_tag = v_req-tag; tv_cmd-tvc_task_attr = v_req-task_attr; tv_cmd-tvc_exp_data_len = exp_data_len; @@ -859,8 +841,6 @@ static int vhost_scsi_open(struct inode *inode, struct file *f) return -ENOMEM; vhost_work_init(s-vs_completion_work, vhost_scsi_complete_cmd_work); - INIT_LIST_HEAD(s-vs_completion_list); - spin_lock_init(s-vs_completion_lock); s-vqs[VHOST_SCSI_VQ_CTL].handle_kick = vhost_scsi_ctl_handle_kick; s-vqs[VHOST_SCSI_VQ_EVT].handle_kick = vhost_scsi_evt_handle_kick; diff --git a/drivers/vhost/tcm_vhost.h b/drivers/vhost/tcm_vhost.h index 7e87c63..47ee80b 100644 --- a/drivers/vhost/tcm_vhost.h +++ b/drivers/vhost/tcm_vhost.h @@ -34,7 +34,7 @@ struct tcm_vhost_cmd { /* Sense buffer that will be mapped into outgoing status */ unsigned 
char tvc_sense_buf[TRANSPORT_SENSE_BUFFER]; /* Completed commands list, serviced from vhost worker thread */ - struct list_head tvc_completion_list; + struct llist_node tvc_completion_list; }; struct tcm_vhost_nexus { -- 1.7.11.7
Re: [PATCH 11/12] virtio-net: migration support for multiqueue
On Fri, Dec 28, 2012 at 06:32:03PM +0800, Jason Wang wrote:
> This patch adds migration support for multiqueue virtio-net. The
> version was bumped to 12.
> 
> Signed-off-by: Jason Wang <jasow...@redhat.com>
> ---
>  hw/virtio-net.c |   45 +++++++++++++++++++++++++++++++++------------
>  1 files changed, 35 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index aaeef1b..ca4b804 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -21,7 +21,7 @@
>  #include "virtio-net.h"
>  #include "vhost_net.h"
>  
> -#define VIRTIO_NET_VM_VERSION    11
> +#define VIRTIO_NET_VM_VERSION    12

Please don't, use a subsection instead.

>  #define MAC_TABLE_ENTRIES    64
>  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
> @@ -1058,16 +1058,18 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue, int ctrl)
>  
>  static void virtio_net_save(QEMUFile *f, void *opaque)
>  {
> +    int i;
>      VirtIONet *n = opaque;
> -    VirtIONetQueue *q = &n->vqs[0];
>  
> -    /* At this point, backend must be stopped, otherwise
> -     * it might keep writing to memory. */
> -    assert(!q->vhost_started);
> +    for (i = 0; i < n->max_queues; i++) {
> +        /* At this point, backend must be stopped, otherwise
> +         * it might keep writing to memory. */
> +        assert(!n->vqs[i].vhost_started);
> +    }
>      virtio_save(&n->vdev, f);
>  
>      qemu_put_buffer(f, n->mac, ETH_ALEN);
> -    qemu_put_be32(f, q->tx_waiting);
> +    qemu_put_be32(f, n->vqs[0].tx_waiting);
>      qemu_put_be32(f, n->mergeable_rx_bufs);
>      qemu_put_be16(f, n->status);
>      qemu_put_byte(f, n->promisc);
> @@ -1083,13 +1085,17 @@ static void virtio_net_save(QEMUFile *f, void *opaque)
>      qemu_put_byte(f, n->nouni);
>      qemu_put_byte(f, n->nobcast);
>      qemu_put_byte(f, n->has_ufo);
> +    qemu_put_be16(f, n->max_queues);

Above is specified by user so seems unnecessary in the migration stream.
Below should only be put if relevant: check host feature bit set and/or
max_queues > 1.

> +    qemu_put_be16(f, n->curr_queues);
> +    for (i = 1; i < n->curr_queues; i++) {
> +        qemu_put_be32(f, n->vqs[i].tx_waiting);
> +    }
>  }
>  
>  static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>  {
>      VirtIONet *n = opaque;
> -    VirtIONetQueue *q = &n->vqs[0];
> -    int ret, i;
> +    int ret, i, link_down;
>  
>      if (version_id < 2 || version_id > VIRTIO_NET_VM_VERSION)
>          return -EINVAL;
> @@ -1100,7 +1106,7 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>      }
>  
>      qemu_get_buffer(f, n->mac, ETH_ALEN);
> -    q->tx_waiting = qemu_get_be32(f);
> +    n->vqs[0].tx_waiting = qemu_get_be32(f);
>  
>      virtio_net_set_mrg_rx_bufs(n, qemu_get_be32(f));
>  
> @@ -1170,6 +1176,22 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>          }
>      }
>  
> +    if (version_id >= 12) {
> +        if (n->max_queues != qemu_get_be16(f)) {
> +            error_report("virtio-net: different max_queues");
> +            return -1;
> +        }
> +
> +        n->curr_queues = qemu_get_be16(f);
> +        for (i = 1; i < n->curr_queues; i++) {
> +            n->vqs[i].tx_waiting = qemu_get_be32(f);
> +        }
> +    }
> +
> +    virtio_net_set_queues(n);
> +    /* Must do this again, since we may have more than one active queues. */

s/queues/queue/

Also I didn't understand why it's here. It seems that virtio has a vm
running callback, and that will invoke virtio_net_set_status after vm
load. No?

> +    virtio_net_set_status(&n->vdev, n->status);
> +
>      /* Find the first multicast entry in the saved MAC filter */
>      for (i = 0; i < n->mac_table.in_use; i++) {
>          if (n->mac_table.macs[i * ETH_ALEN] & 1) {
> @@ -1180,7 +1202,10 @@ static int virtio_net_load(QEMUFile *f, void *opaque, int version_id)
>  
>      /* nc.link_down can't be migrated, so infer link_down according
>       * to link status bit in n->status */
> -    qemu_get_queue(n->nic)->link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> +    link_down = (n->status & VIRTIO_NET_S_LINK_UP) == 0;
> +    for (i = 0; i < n->max_queues; i++) {
> +        qemu_get_subqueue(n->nic, i)->link_down = link_down;
> +    }
>  
>      return 0;
>  }
> -- 
> 1.7.1
Re: [PATCH 07/12] virtio: introduce virtio_queue_del()
On Fri, Dec 28, 2012 at 06:31:59PM +0800, Jason Wang wrote:
> Some devices (such as virtio-net) need the ability to destroy or
> re-order the virtqueues; this patch adds a helper to do this.
> 
> Signed-off-by: Jason Wang <jasowang>

Actually it's del_queue, unlike what the subject says :)

> ---
>  hw/virtio.c |    9 +++++++++
>  hw/virtio.h |    2 ++
>  2 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/virtio.c b/hw/virtio.c
> index f40a8c5..bc3c9c3 100644
> --- a/hw/virtio.c
> +++ b/hw/virtio.c
> @@ -700,6 +700,15 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>      return &vdev->vq[i];
>  }
>  
> +void virtio_del_queue(VirtIODevice *vdev, int n)
> +{
> +    if (n < 0 || n >= VIRTIO_PCI_QUEUE_MAX) {
> +        abort();
> +    }
> +
> +    vdev->vq[n].vring.num = 0;
> +}
> +
>  void virtio_irq(VirtQueue *vq)
>  {
>      trace_virtio_irq(vq);
> diff --git a/hw/virtio.h b/hw/virtio.h
> index 7c17f7b..f6cb0f9 100644
> --- a/hw/virtio.h
> +++ b/hw/virtio.h
> @@ -138,6 +138,8 @@ VirtQueue *virtio_add_queue(VirtIODevice *vdev, int queue_size,
>                              void (*handle_output)(VirtIODevice *,
>                                                    VirtQueue *));
>  
> +void virtio_del_queue(VirtIODevice *vdev, int n);
> +
>  void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>                      unsigned int len);
>  void virtqueue_flush(VirtQueue *vq, unsigned int count);
> -- 
> 1.7.1
Re: [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high
On Mon, Jan 07, 2013 at 06:17:22PM -0600, mmogi...@miniinfo.net wrote:
> On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov <g...@redhat.com> wrote:
> > On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> > > Reading the spec, it is clear that most modes normally leave the IRQ
> > > output line high, and only pulse it low to generate a leading edge,
> > > especially the most commonly used mode 2. The KVM i8254 model does
> > > not try to emulate the duration of the pulse at all, so just swap
> > > the high/low settings to leave it high most of the time.
> > >
> > > This fix is a prerequisite to improving the i8259 model to handle
> > > the trailing edge of an interrupt request as indicated in its spec:
> > > if it gets a trailing edge of an IRQ line before it starts to
> > > service the interrupt, the request should be canceled.
> > >
> > > See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> > > or search the net for 23124406.pdf.
> > >
> > > Risks: There is a risk that migrating a running guest between
> > > versions with and without this patch will lose or gain a single
> > > timer interrupt during the migration process. The only case where
> >
> > Can you elaborate on how exactly this can happen? Do not see it.
>
> KVM 8254: In the corrected model, when the count expires, the model
> briefly pulses output low and then high again, with the low-to-high
> transition being what triggers the interrupt. In the old model, when
> the count expires, the model expects the output line to already be
> low, and briefly pulses it high (triggering the interrupt) and then
> low again. But if the line was already high (because it migrated from
> the corrected model), this won't generate a new leading edge (low to
> high) and won't trigger a new interrupt (the first post-back-migration
> pulse turns into a simple trailing edge instead of a pulse). Unless
> there is something I'm missing?

No, I missed that pic->last_irr/ioapic->irr will be migrated as 1. But
this means that the next interrupt after migration from new to old will
always be lost.

What about clearing the pit bit from last_irr/irr before migration?
Should not affect new->new migration and should fix new->old one. The
only problem is that we may need to consult the irq routing table to
know how the pit is connected to the ioapic. Still do not see how we
can gain one interrupt.

> The qemu 8254 model actually models each edge at independent clock
> ticks instead of combining both into a very brief pulse at one time.
> I've found it handy to draw out old and new timing diagrams on paper
> (for each mode), and then carefully think about what happens with
> respect to levels and edges when you transition back and forth between
> old and new models at various points in the timing cycle. (Note I've
> spent more time examining the qemu models rather than the kvm models.)

Yes, drawing it definitely helps :)

> > > this is likely to be serious is probably losing a single-shot
> > > (mode 4) interrupt, but if my understanding of how things work is
> > > good, then that should only be possible if a whole slew of
> > > conditions are all met:
> > >
> > > 1. The guest is configured to run in a tickless mode (like modern
> > >    Linux).
> > > 2. The guest is for some reason still using the i8254 rather than
> > >    something more modern like an HPET. (The combination of 1 and 2
> > >    should be rare.)
> >
> > This is not so rare. For performance reasons it is better to not
> > have an HPET at all. In fact -no-hpet is how I would advise anyone
> > to run qemu.
>
> In a later email you mention that Linux prefers a timer in the APIC.
> I don't know much about the APIC (Advanced Programmable Interrupt
> Controller), and wasn't even aware it had its own timer.
>
> The big question is if we can safely just fix the i825* models, or if
> we need something more subtle to avoid breaking commonly used guests
> like modern Linux (support both corrected and old models, or only fix
> IRQ2 instead of all IRQs, or similar subtlety).

Migration may happen while the guest is running firmware. Who knows
what those are doing. If the fix is as easy as I described above we
should go for it.

> > > 3. The migration is going from a fixed version back to the old
> > >    version. (Not sure how common this is, but it should be rarer
> > >    than migrating from old to new.)
> > > 4. There are not going to be any timely events/interrupts
> > >    (keyboard, network, process sleeps, etc) that cause the guest
> > >    to reset the PIT mode 4 one-shot counter soon enough.
> > >
> > > This combination should be rare enough that more complicated
> > > solutions are not worth the effort.
> > >
> > > Signed-off-by: Matthew Ogilvie <mmogilvi_q...@miniinfo.net>
> > > ---
> > >  arch/x86/kvm/i8254.c | 6 +-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > > index c1d30b2..cd4ec60 100644
> > > --- a/arch/x86/kvm/i8254.c
> > > +++ b/arch/x86/kvm/i8254.c
> > > @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
> > >  	}
> > >  	spin_unlock(&ps->inject_lock);
> > >  	if (inject) {
> > > -		kvm_set_irq(kvm,
Re: [PATCH KVM v2 2/4] KVM: additional i8254 output fixes
On Mon, Jan 07, 2013 at 06:35:47PM -0600, mmogi...@miniinfo.net wrote:
> On Mon, 7 Jan 2013 14:04:03 +0200, Gleb Natapov <g...@redhat.com> wrote:
> > On Wed, Dec 26, 2012 at 10:39:54PM -0700, Matthew Ogilvie wrote:
> > > Make pit_get_out() consistent with the spec. Currently pit_get_out()
> > > doesn't affect IRQ0, but it can be read by the guest in other ways.
> > > This makes it consistent with proposed changes in qemu's i8254
> > > model as well.
> > >
> > > See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> > > or search the net for 23124406.pdf.
> > >
> > > Signed-off-by: Matthew Ogilvie <mmogilvi_q...@miniinfo.net>
> > > ---
> > >  arch/x86/kvm/i8254.c | 44 ++++++++++++++++++++++++++++++----------
> > >  1 file changed, 34 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > > index cd4ec60..fd38938 100644
> > > --- a/arch/x86/kvm/i8254.c
> > > +++ b/arch/x86/kvm/i8254.c
> > > @@ -144,6 +144,10 @@ static int pit_get_count(struct kvm *kvm, int channel)
> > >  
> > >  	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
> > >  
> > > +	/* FIXME: Add some way to represent a paused timer and return
> > > +	 * the paused-at counter value, to better model gate pausing,
> > > +	 * "wait until next CLK pulse to load counter" logic, etc.
> > > +	 */
> > >  	t = kpit_elapsed(kvm, c, channel);
> > >  	d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
> > >  
> > > @@ -155,8 +159,7 @@ static int pit_get_count(struct kvm *kvm, int channel)
> > >  		counter = (c->count - d) & 0xffff;
> > >  		break;
> > >  	case 3:
> > > -		/* XXX: may be incorrect for odd counts */
> > > -		counter = c->count - (mod_64((2 * d), c->count));
> > > +		counter = (c->count - (mod_64((2 * d), c->count))) & 0xfffe;
> > >  		break;
> > >  	default:
> > >  		counter = c->count - mod_64(d, c->count);
> > > @@ -180,20 +183,18 @@ static int pit_get_out(struct kvm *kvm, int channel)
> > >  	switch (c->mode) {
> > >  	default:
> > >  	case 0:
> > > -		out = (d >= c->count);
> > > -		break;
> > >  	case 1:
> > > -		out = (d < c->count);
> > > +		out = (d >= c->count);
> > >  		break;
> > >  	case 2:
> > > -		out = ((mod_64(d, c->count) == 0) && (d != 0));
> > > +		out = (mod_64(d, c->count) != (c->count - 1) || c->gate == 0);
> > >  		break;
> > >  	case 3:
> > > -		out = (mod_64(d, c->count) < ((c->count + 1) >> 1));
> > > +		out = (mod_64(d, c->count) < ((c->count + 1) >> 1) || c->gate == 0);
> > >  		break;
> > >  	case 4:
> > >  	case 5:
> > > -		out = (d == c->count);
> > > +		out = (d != c->count);
> > >  		break;
> > >  	}
> > > @@ -367,7 +368,7 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
> > >  	/*
> > >  	 * The largest possible initial count is 0; this is equivalent
> > > -	 * to 2^16 for binary counting and 10^4 for BCD counting.
> > > +	 * to pow(2,16) for binary counting and pow(10,4) for BCD counting.
> > >  	 */
> > >  	if (val == 0)
> > >  		val = 0x10000;
> > > @@ -376,6 +377,26 @@ static void pit_load_count(struct kvm *kvm, int channel, u32 val)
> > >  
> > >  	if (channel != 0) {
> > >  		ps->channels[channel].count_load_time = ktime_get();
> > > +
> > > +		/* In gate-triggered one-shot modes,
> > > +		 * indirectly model some pit_get_out()
> > > +		 * cases by setting the load time way
> > > +		 * back until gate-triggered.
> > > +		 * (Generally only affects reading status
> > > +		 * from the channel 2 speaker,
> > > +		 * due to hard-wired gates on other
> > > +		 * channels.)
> > > +		 *
> > > +		 * FIXME: This might be redesigned if a paused
> > > +		 * timer state is added for pit_get_count().
> > > +		 */
> > > +		if (ps->channels[channel].mode == 1 ||
> > > +		    ps->channels[channel].mode == 5) {
> > > +			u64 delta = muldiv64(val + 2, NSEC_PER_SEC, KVM_PIT_FREQ);
> > > +			ps->channels[channel].count_load_time =
> > > +				ktime_sub(ps->channels[channel].count_load_time,
> > > +					  ns_to_ktime(delta));
> >
> > I do not understand what you are trying to do here. You assume that
> > the trigger will happen 2 clocks after the counter is loaded?
>
> Modes 1 and 5 are single-shot, and they do not start counting until
> GATE is triggered, potentially well after the count is loaded. So
> this is attempting to model the "start of countdown has not been
> triggered" state as being mostly identical to the "already triggered
> and also expired some number of clocks (2) ago" state.

So this is still not accurate. This assumes that the guest loads the
counter and then immediately triggers the gate. If between loading the
counter and triggering the gate the guest does something else for a
long time, the result will still not be accurate.

> It might be clearer to have a way to explicitly model a paused
> countdown, but such a mechanism doesn't currently exist.

If it's worth doing, it's worth doing right. Should not be hard. Like
setting channels[channel].count_load_time on trigger instead of during
count loading.

> Note that modeling modes 1 and 5 is fairly low priority,
Re: [PATCH 3/4] KVM: PPC: BookE: Implement EPR exit
On 01/04/2013 05:41:42 PM, Alexander Graf wrote:
> @@ -408,6 +411,11 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
>  			set_guest_esr(vcpu, vcpu->arch.queued_esr);
>  		if (update_dear == true)
>  			set_guest_dear(vcpu, vcpu->arch.queued_dear);
> +		if (update_epr == true) {
> +			kvm_make_request(KVM_REQ_EPR_EXIT, vcpu);
> +			/* Indicate that we want to recheck requests */
> +			allowed = 2;
> +		}

We shouldn't need "allowed = 2" anymore.

-Scott
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] KVM: PPC: BookE: Add EPR user space support v3
The FSL MPIC implementation contains a feature called "external proxy
facility" which allows for interrupts to be acknowledged in the MPIC as
soon as a core accepts its pending external interrupt.

This patch set implements all the necessary pieces to support this from
the kernel space side.

v1 -> v2:
  - do an explicit requests check rather than play with return values
  - rework update_epr logic
  - add documentation for ENABLE_CAP on EPR cap

v2 -> v3:
  - remove leftover 'allowed == 2' logic

Alexander Graf (3):
  KVM: PPC: BookE: Emulate mfspr on EPR
  KVM: PPC: BookE: Implement EPR exit
  KVM: PPC: BookE: Add EPR ONE_REG sync

Mihai Caraman (1):
  KVM: PPC: BookE: Allow irq deliveries to inject requests

 Documentation/virtual/kvm/api.txt   |   41 +-
 arch/powerpc/include/asm/kvm_host.h |    2 +
 arch/powerpc/include/asm/kvm_ppc.h  |    9 +++
 arch/powerpc/include/uapi/asm/kvm.h |    6 -
 arch/powerpc/kvm/booke.c            |   40 +-
 arch/powerpc/kvm/booke_emulate.c    |    3 ++
 arch/powerpc/kvm/powerpc.c          |   10
 include/linux/kvm_host.h            |    1 +
 include/uapi/linux/kvm.h            |    6 +
 9 files changed, 114 insertions(+), 4 deletions(-)
[PATCH 3/4] KVM: PPC: BookE: Implement EPR exit
The External Proxy Facility in FSL BookE chips allows the interrupt
controller to automatically acknowledge an interrupt as soon as a core
gets its pending external interrupt delivered.

Today, user space implements the interrupt controller, so we need to
check on it during such a cycle.

This patch implements logic for user space to enable EPR exiting,
disable EPR exiting, and EPR exiting itself, so that user space can
acknowledge an interrupt when an external interrupt has successfully
been delivered into the guest vcpu.

Signed-off-by: Alexander Graf <ag...@suse.de>

---
v1 -> v2:
  - rework update_epr logic
  - add documentation for ENABLE_CAP on EPR cap

v2 -> v3:
  - remove leftover 'allowed == 2' logic
---
 Documentation/virtual/kvm/api.txt   |   40 +-
 arch/powerpc/include/asm/kvm_host.h |    2 +
 arch/powerpc/include/asm/kvm_ppc.h  |    9 +++
 arch/powerpc/kvm/booke.c            |   14 +++-
 arch/powerpc/kvm/powerpc.c          |   10
 include/linux/kvm_host.h            |    1 +
 include/uapi/linux/kvm.h            |    6 +
 7 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 9cf591d..66bf7cf 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2238,8 +2238,8 @@ executed a memory-mapped I/O instruction which could not be satisfied
 by kvm.  The 'data' member contains the written data if 'is_write' is
 true, and should be filled by application code otherwise.
 
-NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR
-      and KVM_EXIT_PAPR the corresponding
+NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR,
+      KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding
 operations are complete (and guest state is consistent) only after
 userspace has re-entered the kernel with KVM_RUN.  The kernel side
 will first finish incomplete operations and then check for pending
 signals.  Userspace
@@ -2342,6 +2342,25 @@ The possible hypercalls are defined in the Power Architecture Platform
 Requirements (PAPR) document available from www.power.org (free
 developer registration required to access it).
 
+		/* KVM_EXIT_EPR */
+		struct {
+			__u32 epr;
+		} epr;
+
+On FSL BookE PowerPC chips, the interrupt controller has a fast
+interrupt acknowledge path to the core. When the core successfully
+delivers an interrupt, it automatically populates the EPR register with
+the interrupt vector number and acknowledges the interrupt inside
+the interrupt controller.
+
+In case the interrupt controller lives in user space, we need to do
+the interrupt acknowledge cycle through it to fetch the next to be
+delivered interrupt vector, using this exit.
+
+It gets triggered whenever both KVM_CAP_PPC_EPR is enabled and an
+external interrupt has just been delivered into the guest. User space
+should put the acknowledged interrupt vector into the 'epr' field.
+
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -2463,3 +2482,20 @@ For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV:
    where "num_sets" is the tlb_sizes[] value divided by the
    tlb_ways[] value.
  - The tsize field of mas1 shall be set to 4K on TLB0, even though the
    hardware ignores this value for TLB0.
+
+6.4 KVM_CAP_PPC_EPR
+
+Architectures: ppc
+Parameters: args[0] defines whether the proxy facility is active
+Returns: 0 on success; -1 on error
+
+This capability enables or disables the delivery of interrupts through
+the external proxy facility.
+
+When enabled (args[0] != 0), every time the guest gets an external
+interrupt delivered, it automatically exits into user space with a
+KVM_EXIT_EPR exit to receive the topmost interrupt vector.
+
+When disabled (args[0] == 0), behavior is as if this facility is
+unsupported.
+
+When this capability is enabled, KVM_EXIT_EPR can occur.
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ab49c6c..8a72d59 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -520,6 +520,8 @@ struct kvm_vcpu_arch {
 	u8 sane;
 	u8 cpu_type;
 	u8 hcall_needed;
+	u8 epr_enabled;
+	u8 epr_needed;
 
 	u32 cpr0_cfgaddr; /* holds the last set cpr0_cfgaddr */
 
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 5f5f69a..493630e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -264,6 +264,15 @@ static inline void kvm_linear_init(void)
 {}
 #endif
 
+static inline void kvmppc_set_epr(struct kvm_vcpu *vcpu, u32 epr)
+{
+#ifdef CONFIG_KVM_BOOKE_HV
+	mtspr(SPRN_GEPR, epr);
+#elif defined(CONFIG_BOOKE)
+	vcpu->arch.epr = epr;
+#endif
+}
+
 int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,