RE: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
-----Original Message-----
From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard
Sent: Saturday, January 09, 2010 3:30 AM
To: Alexander Graf
Cc: kvm@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu
Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly

>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index 338baf9..e283e44 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -82,8 +82,9 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
>>  	set_bit(priority, &vcpu->arch.pending_exceptions);
>>  }
>>
>> -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
>> +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
>>  {
>> +	/* BookE does flags in ESR, so ignore those we get here */
>>  	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
>>  }
>
> Actually, I think Book E prematurely sets ESR, since it's done before
> the program interrupt is actually delivered. Architecturally, I'm not
> sure if it's a problem, but philosophically I've always wanted it to
> work the way you've just implemented for Book S.

ESR is updated not only by program but by data_tlb, data_storage, etc.
Should we rearrange them all?

Also DEAR has the same situation as ESR. Should it be updated when we
decide to inject the interrupt into the guest?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: Fix kvm_coalesced_mmio_ring duplicate allocation
Commit 0953ca73 "KVM: Simplify coalesced mmio initialization" allocates
kvm_coalesced_mmio_ring in kvm_coalesced_mmio_init(), but didn't discard
the original allocation...

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 virt/kvm/kvm_main.c | 17 -----------------
 1 files changed, 0 insertions(+), 17 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7c5c873..2b0974a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -371,9 +371,6 @@ static struct kvm *kvm_create_vm(void)
 {
 	int r = 0, i;
 	struct kvm *kvm = kvm_arch_create_vm();
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-	struct page *page;
-#endif
 
 	if (IS_ERR(kvm))
 		goto out;
@@ -402,23 +399,9 @@ static struct kvm *kvm_create_vm(void)
 		}
 	}
 
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
-	if (!page) {
-		cleanup_srcu_struct(&kvm->srcu);
-		goto out_err;
-	}
-
-	kvm->coalesced_mmio_ring =
-			(struct kvm_coalesced_mmio_ring *)page_address(page);
-#endif
-
 	r = kvm_init_mmu_notifier(kvm);
 	if (r) {
 		cleanup_srcu_struct(&kvm->srcu);
-#ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
-		put_page(page);
-#endif
 		goto out_err;
 	}
--
1.5.4.5
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/20/2010 07:18 PM, Rik van Riel wrote:
> On 01/20/2010 07:00 AM, Avi Kivity wrote:
>> On 01/20/2010 12:02 PM, Gleb Natapov wrote:
>>> I can inject the event as a HW interrupt on a vector greater than 32,
>>> but not go through the APIC, so EOI will not be required. This sounds
>>> non-architectural, and I am not sure the kernel has entry point code
>>> for this kind of event; it has one for exceptions and one for
>>> interrupts that go through __do_IRQ(), which assumes that interrupts
>>> should be ACKed.
>> Further, we start to interact with the TPR; Linux doesn't use the TPR
>> or cr8, but if it does one day we don't want it interfering with apf.
> That's not an issue, is it? The guest will tell the host what vector
> to use for pseudo page faults.

And kill 15 other vectors?

--
error compiling committee.c: too many arguments to function
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/20/2010 08:45 PM, H. Peter Anvin wrote:
> On 01/20/2010 04:00 AM, Avi Kivity wrote:
>> On 01/20/2010 12:02 PM, Gleb Natapov wrote:
>>> I can inject the event as a HW interrupt on a vector greater than 32,
>>> but not go through the APIC, so EOI will not be required. This sounds
>>> non-architectural, and I am not sure the kernel has entry point code
>>> for this kind of event; it has one for exceptions and one for
>>> interrupts that go through __do_IRQ(), which assumes that interrupts
>>> should be ACKed.
>> Further, we start to interact with the TPR; Linux doesn't use the TPR
>> or cr8, but if it does one day we don't want it interfering with apf.
> I don't think the TPR would be involved unless you involve the APIC
> (which you absolutely don't want to do.) What I'm trying to figure out
> is if you could inject this vector as an external interrupt and still
> have it deliver if IF=0, or if it would cause any other funnies.

No, and it poses problems further down the line if the hardware
virtualizes more and more of the APIC, as seems likely to happen.

External interrupts are asynchronous events, so they're likely not
guaranteed to be delivered on an instruction boundary like exceptions
are. Things like the interrupt shadow will affect them as well.

> At that point, you do not want to go through the do_IRQ path but
> rather through your own exception vector entry point (it would be an
> entry point which doesn't get an error code, like #UD.)

An error code would actually be useful.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/20/2010 07:43 PM, H. Peter Anvin wrote:
> On 01/20/2010 02:02 AM, Gleb Natapov wrote:
>>> You can have the guest OS take an exception on a vector above 31
>>> just fine; you just need it to tell the hypervisor which vector it,
>>> the OS, assigned for this purpose.
>> VMX doesn't allow injecting a hardware exception with a vector
>> greater than 31. SDM 3B section 23.2.1.3.
> OK, you're right. I had missed that... I presume it was done for
> implementation reasons.

My expectation is that it was done for forward compatibility reasons.

>> I can inject the event as a HW interrupt on a vector greater than 32,
>> but not go through the APIC, so EOI will not be required. This sounds
>> non-architectural, and I am not sure the kernel has entry point code
>> for this kind of event; it has one for exceptions and one for
>> interrupts that go through __do_IRQ(), which assumes that interrupts
>> should be ACKed.
> You can also just emulate the state transition -- since you know
> you're dealing with a flat protected-mode or long-mode OS (and just
> make that a condition of enabling the feature) you don't have to deal
> with all the strange combinations of directions that an unrestricted
> x86 event can take.

Since it's an exception, it is unconditional.

Do you mean create the stack frame manually? I'd really like to avoid
that for many reasons, one of which is performance (need to do all the
virt-to-phys walks manually); the other is that we're certain to end up
with something horribly underspecified. I'd really like to keep as
close as possible to the hardware. For the alternative approach, see
Xen.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On Thu, Jan 21, 2010 at 11:02:19AM +0200, Avi Kivity wrote:
> On 01/20/2010 07:43 PM, H. Peter Anvin wrote:
>> On 01/20/2010 02:02 AM, Gleb Natapov wrote:
>>>> You can have the guest OS take an exception on a vector above 31
>>>> just fine; you just need it to tell the hypervisor which vector it,
>>>> the OS, assigned for this purpose.
>>> VMX doesn't allow injecting a hardware exception with a vector
>>> greater than 31. SDM 3B section 23.2.1.3.
>> OK, you're right. I had missed that... I presume it was done for
>> implementation reasons.
> My expectation is that it was done for forward compatibility reasons.
>>> I can inject the event as a HW interrupt on a vector greater than
>>> 32, but not go through the APIC, so EOI will not be required. This
>>> sounds non-architectural, and I am not sure the kernel has entry
>>> point code for this kind of event; it has one for exceptions and one
>>> for interrupts that go through __do_IRQ(), which assumes that
>>> interrupts should be ACKed.
>> You can also just emulate the state transition -- since you know
>> you're dealing with a flat protected-mode or long-mode OS (and just
>> make that a condition of enabling the feature) you don't have to deal
>> with all the strange combinations of directions that an unrestricted
>> x86 event can take.
> Since it's an exception, it is unconditional.
>
> Do you mean create the stack frame manually? I'd really like to avoid
> that for many reasons, one of which is performance (need to do all the
> virt-to-phys walks manually); the other is that we're certain to end
> up with something horribly underspecified. I'd really like to keep as
> close as possible to the hardware. For the alternative approach, see
> Xen.

That and our event injection path can't play with guest memory right
now since it is done from atomic context.

--
			Gleb.
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/21/2010 11:04 AM, Gleb Natapov wrote:
>> Do you mean create the stack frame manually? I'd really like to avoid
>> that for many reasons, one of which is performance (need to do all
>> the virt-to-phys walks manually); the other is that we're certain to
>> end up with something horribly underspecified. I'd really like to
>> keep as close as possible to the hardware. For the alternative
>> approach, see Xen.
> That and our event injection path can't play with guest memory right
> now since it is done from atomic context.

That's true (I'd like to fix that though, for the real mode stuff).

--
error compiling committee.c: too many arguments to function
[PATCH] kvm: Flush coalesced MMIO buffer periodically
The default action of coalesced MMIO is to cache writes in the buffer until:

1. The buffer is full.
2. Or an exit to QEmu happens for another reason.

But this can result in a very late write when:

1. Each MMIO write is small.
2. The writing interval is big.
3. There is no need for input or frequent access to other devices.

This issue was observed in an experimental embedded system. The test
image simply prints "test" every 1 second. The output in QEmu meets
expectations, but the output in KVM is delayed for seconds.

Per Avi's suggestion, I add periodic flushing of the coalesced MMIO
buffer in the QEmu IO thread. This way, we don't need an explicit vcpu
exit to QEmu to handle this issue. The current synchronization rate is
1/25 s.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 qemu-kvm.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++--
 qemu-kvm.h |  2 ++
 2 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 599c3d6..38f890c 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -463,6 +463,12 @@ static void kvm_create_vcpu(CPUState *env, int id)
         goto err_fd;
     }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state->coalesced_mmio && !kvm_state->coalesced_mmio_ring)
+        kvm_state->coalesced_mmio_ring = (void *) env->kvm_run +
+            kvm_state->coalesced_mmio * PAGE_SIZE;
+#endif
+
     return;
 err_fd:
     close(env->kvm_fd);
@@ -927,8 +933,7 @@ int kvm_run(CPUState *env)
 
 #if defined(KVM_CAP_COALESCED_MMIO)
     if (kvm_state->coalesced_mmio) {
-        struct kvm_coalesced_mmio_ring *ring =
-            (void *) run + kvm_state->coalesced_mmio * PAGE_SIZE;
+        struct kvm_coalesced_mmio_ring *ring = kvm_state->coalesced_mmio_ring;
         while (ring->first != ring->last) {
             cpu_physical_memory_rw(ring->coalesced_mmio[ring->first].phys_addr,
                                    &ring->coalesced_mmio[ring->first].data[0],
@@ -2073,6 +2078,29 @@ static void io_thread_wakeup(void *opaque)
     }
 }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+
+/* flush interval is 1/25 second */
+#define KVM_COALESCED_MMIO_FLUSH_INTERVAL 4000LL
+
+static void flush_coalesced_mmio_buffer(void *opaque)
+{
+    if (kvm_state->coalesced_mmio_ring) {
+        struct kvm_coalesced_mmio_ring *ring =
+            kvm_state->coalesced_mmio_ring;
+        while (ring->first != ring->last) {
+            cpu_physical_memory_rw(ring->coalesced_mmio[ring->first].phys_addr,
+                                   &ring->coalesced_mmio[ring->first].data[0],
+                                   ring->coalesced_mmio[ring->first].len, 1);
+            smp_wmb();
+            ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
+        }
+    }
+    qemu_mod_timer(kvm_state->coalesced_mmio_timer,
+                   qemu_get_clock(host_clock) + KVM_COALESCED_MMIO_FLUSH_INTERVAL);
+}
+#endif
+
 int kvm_main_loop(void)
 {
     int fds[2];
@@ -2117,6 +2145,15 @@ int kvm_main_loop(void)
     io_thread_sigfd = sigfd;
     cpu_single_env = NULL;
 
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state->coalesced_mmio) {
+        kvm_state->coalesced_mmio_timer =
+            qemu_new_timer(host_clock, flush_coalesced_mmio_buffer, NULL);
+        qemu_mod_timer(kvm_state->coalesced_mmio_timer,
+            qemu_get_clock(host_clock) + KVM_COALESCED_MMIO_FLUSH_INTERVAL);
+    }
+#endif
+
     while (1) {
         main_loop_wait(1000);
         if (qemu_shutdown_requested()) {
@@ -2135,6 +2172,12 @@ int kvm_main_loop(void)
         }
     }
 
+#ifdef KVM_CAP_COALESCED_MMIO
+    if (kvm_state->coalesced_mmio) {
+        qemu_del_timer(kvm_state->coalesced_mmio_timer);
+        qemu_free_timer(kvm_state->coalesced_mmio_timer);
+    }
+#endif
     pause_all_threads();
     pthread_mutex_unlock(&qemu_mutex);

diff --git a/qemu-kvm.h b/qemu-kvm.h
index 6b3e5a1..17f9d1b 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -1144,6 +1144,8 @@ typedef struct KVMState {
     int fd;
     int vmfd;
     int coalesced_mmio;
+    struct kvm_coalesced_mmio_ring *coalesced_mmio_ring;
+    struct QEMUTimer *coalesced_mmio_timer;
     int broken_set_mem_region;
     int migration_log;
     int vcpu_events;
--
1.5.4.5
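The flush loop above drains a single-producer/single-consumer ring by walking `first` toward `last` with wraparound. Below is a standalone sketch of that drain logic; the toy `struct ring` and `ring_flush()` are hypothetical stand-ins for `struct kvm_coalesced_mmio_ring` and the flush function, with the MMIO write replaced by copying entries out so the behavior is observable:

```c
#include <assert.h>
#include <stdint.h>

#define RING_MAX 8  /* stands in for KVM_COALESCED_MMIO_MAX */

/* Toy model of the coalesced-MMIO ring: the producer (kernel) advances
 * 'last' after filling an entry; the consumer (QEMU) advances 'first'
 * after replaying one. Layout is simplified from the real struct. */
struct ring {
    uint32_t first, last;
    uint64_t entries[RING_MAX];
};

/* Drain everything pending, returning how many entries were flushed.
 * The wraparound arithmetic mirrors the flush loop in the patch. */
static int ring_flush(struct ring *r, uint64_t *out)
{
    int n = 0;
    while (r->first != r->last) {
        out[n++] = r->entries[r->first];
        /* in the real code a write barrier (smp_wmb) publishes the
         * consumed entry before 'first' is advanced */
        r->first = (r->first + 1) % RING_MAX;
    }
    return n;
}
```

The modulo step is what lets `first` chase `last` across the end of the array, which is why the loop condition is inequality rather than `first < last`.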
Re: [PATCH] kvm: Flush coalesced MMIO buffer periodically
On 01/21/2010 11:37 AM, Sheng Yang wrote:
> The default action of coalesced MMIO is to cache writes in the buffer
> until:
> 1. The buffer is full.
> 2. Or an exit to QEmu happens for another reason.
>
> But this can result in a very late write when:
> 1. Each MMIO write is small.
> 2. The writing interval is big.
> 3. There is no need for input or frequent access to other devices.
>
> This issue was observed in an experimental embedded system. The test
> image simply prints "test" every 1 second. The output in QEmu meets
> expectations, but the output in KVM is delayed for seconds.
>
> Per Avi's suggestion, I add periodic flushing of the coalesced MMIO
> buffer in the QEmu IO thread. This way, we don't need an explicit vcpu
> exit to QEmu to handle this issue. The current synchronization rate is
> 1/25 s.

I'm not sure that a new timer is needed. If the only problem case is
the display, maybe we can flush coalesced mmio from the vga refresh
timer. That ensures that we flush exactly when needed, and don't have
extra timers.

--
error compiling committee.c: too many arguments to function
Re: [PATCH 0/5] Debug register emulation fixes and optimizations (reloaded)
On 01/20/2010 07:20 PM, Jan Kiszka wrote:
> Major parts of this series were already posted a while ago during the
> debug register switch optimizations. This version now comes with an
> additional fix for VMX (patch 1) and a rework of mov dr emulation for
> SVM.

Looks good.

--
error compiling committee.c: too many arguments to function
[PATCH] kvm-s390: fix potential array overrun in intercept handling
Avi, Marcelo,

kvm_handle_sie_intercept uses a jump table to get the intercept handler
for a SIE intercept. Static code analysis revealed a potential problem:
the intercept_funcs jump table was defined to contain (0x48 >> 2)
entries, but we only checked for code > 0x48, which would cause an
off-by-one array overflow if code == 0x48. Since the table is only
populated up to (0x28 >> 2), we can reduce the jump table size while
fixing the off-by-one.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
(patch was refreshed with -U8 to see the full jump table.)

 arch/s390/kvm/intercept.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/s390/kvm/intercept.c
===================================================================
--- linux-2.6.orig/arch/s390/kvm/intercept.c
+++ linux-2.6/arch/s390/kvm/intercept.c
@@ -208,32 +208,32 @@ static int handle_instruction_and_prog(s
 	if (rc == -ENOTSUPP)
 		vcpu->arch.sie_block->icptcode = 0x04;
 	if (rc)
 		return rc;
 	return rc2;
 }
 
-static const intercept_handler_t intercept_funcs[0x48 >> 2] = {
+static const intercept_handler_t intercept_funcs[(0x28 >> 2) + 1] = {
 	[0x00 >> 2] = handle_noop,
 	[0x04 >> 2] = handle_instruction,
 	[0x08 >> 2] = handle_prog,
 	[0x0C >> 2] = handle_instruction_and_prog,
 	[0x10 >> 2] = handle_noop,
 	[0x14 >> 2] = handle_noop,
 	[0x1C >> 2] = kvm_s390_handle_wait,
 	[0x20 >> 2] = handle_validity,
 	[0x28 >> 2] = handle_stop,
 };
 
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 {
 	intercept_handler_t func;
 	u8 code = vcpu->arch.sie_block->icptcode;
 
-	if (code & 3 || code > 0x48)
+	if (code & 3 || code > 0x28)
 		return -ENOTSUPP;
 	func = intercept_funcs[code >> 2];
 	if (func)
 		return func(vcpu);
 	return -ENOTSUPP;
 }
[PATCHv2] kvm-s390: fix potential array overrun in intercept handling
v2: apply Avi's suggestions about ARRAY_SIZE.

kvm_handle_sie_intercept uses a jump table to get the intercept handler
for a SIE intercept. Static code analysis revealed a potential problem:
the intercept_funcs jump table was defined to contain (0x48 >> 2)
entries, but we only checked for code > 0x48, which would cause an
off-by-one array overflow if code == 0x48. Use the compiler and
ARRAY_SIZE to automatically set the limits.

Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
(patch was refreshed with -U8 to see the full jump table.)

 arch/s390/kvm/intercept.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6/arch/s390/kvm/intercept.c
===================================================================
--- linux-2.6.orig/arch/s390/kvm/intercept.c
+++ linux-2.6/arch/s390/kvm/intercept.c
@@ -208,32 +208,32 @@ static int handle_instruction_and_prog(s
 	if (rc == -ENOTSUPP)
 		vcpu->arch.sie_block->icptcode = 0x04;
 	if (rc)
 		return rc;
 	return rc2;
 }
 
-static const intercept_handler_t intercept_funcs[0x48 >> 2] = {
+static const intercept_handler_t intercept_funcs[] = {
 	[0x00 >> 2] = handle_noop,
 	[0x04 >> 2] = handle_instruction,
 	[0x08 >> 2] = handle_prog,
 	[0x0C >> 2] = handle_instruction_and_prog,
 	[0x10 >> 2] = handle_noop,
 	[0x14 >> 2] = handle_noop,
 	[0x1C >> 2] = kvm_s390_handle_wait,
 	[0x20 >> 2] = handle_validity,
 	[0x28 >> 2] = handle_stop,
 };
 
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu)
 {
 	intercept_handler_t func;
 	u8 code = vcpu->arch.sie_block->icptcode;
 
-	if (code & 3 || code > 0x48)
+	if (code & 3 || (code >> 2) >= ARRAY_SIZE(intercept_funcs))
 		return -ENOTSUPP;
 	func = intercept_funcs[code >> 2];
 	if (func)
 		return func(vcpu);
 	return -ENOTSUPP;
 }
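The v2 pattern — letting the compiler size the table from its highest designated initializer and bounds-checking with ARRAY_SIZE — can be shown in isolation. The handler names and return values below are simplified stand-ins for the s390 intercept handlers, not the real ones:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef int (*handler_t)(void);

static int handle_noop(void) { return 0; }
static int handle_stop(void) { return 1; }

/* Sparse jump table indexed by (code >> 2). With no explicit bound,
 * the compiler sizes it from the highest initialized index: here
 * (0x28 >> 2) + 1 == 11 entries, holes initialized to NULL. */
static const handler_t handlers[] = {
    [0x00 >> 2] = handle_noop,
    [0x28 >> 2] = handle_stop,
};

/* Bounds-check against ARRAY_SIZE so the table can grow or shrink
 * without the range check going stale -- the bug the patch fixed. */
static int dispatch(unsigned code)
{
    if ((code & 3) || (code >> 2) >= ARRAY_SIZE(handlers))
        return -1;                  /* misaligned or out of range */
    if (!handlers[code >> 2])
        return -1;                  /* hole in the table */
    return handlers[code >> 2]();
}
```

The original v1 bug is exactly the kind this removes: a table declared `[0x48 >> 2]` has valid indices 0..(0x48 >> 2) - 1, so accepting `code == 0x48` reads one past the end, while `>= ARRAY_SIZE(...)` stays correct for any table length.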
Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling
> -	if (code & 3 || code > 0x48)
> +	if (code & 3 || (code >> 2) >= ARRAY_SIZE(intercept_funcs))
> 		return -ENOTSUPP;

Not that it matters for this patch, but -ENOTSUPP should not leak to
userspace. Not sure if it does somewhere, but it is used all over the
place within arch/s390/kvm... Use -EOPNOTSUPP or something similar
instead.
Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling
On Thursday, 21 January 2010 12:24:18, Heiko Carstens wrote:
>> -	if (code & 3 || code > 0x48)
>> +	if (code & 3 || (code >> 2) >= ARRAY_SIZE(intercept_funcs))
>> 		return -ENOTSUPP;
> Not that it matters for this patch, but -ENOTSUPP should not leak to
> userspace. Not sure if it does somewhere, but it is used all over the
> place within arch/s390/kvm... Use -EOPNOTSUPP or something similar
> instead.

AFAICS it does not leak to userspace; ENOTSUPP is an internal code.
See kvm_arch_vcpu_ioctl_run:
[...]
	if (rc == -ENOTSUPP) {
		/* intercept cannot be handled in-kernel, prepare kvm-run */
		kvm_run->exit_reason = KVM_EXIT_S390_SIEIC;
		kvm_run->s390_sieic.icptcode = vcpu->arch.sie_block->icptcode;
		kvm_run->s390_sieic.ipa = vcpu->arch.sie_block->ipa;
		kvm_run->s390_sieic.ipb = vcpu->arch.sie_block->ipb;
		rc = 0;
	}
[...]
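The kvm_arch_vcpu_ioctl_run() snippet quoted above is the boundary where the internal code gets translated before reaching userspace. A minimal model of that translation — the ENOTSUPP value (kernel-internal, not in userspace errno.h) and the exit-reason constant are stand-ins here, not the authoritative definitions:

```c
#include <assert.h>

#define ENOTSUPP 524            /* kernel-internal code; illustrative here */
#define KVM_EXIT_S390_SIEIC 13  /* illustrative exit-reason value */

/* Model of the run-loop epilogue: an internal -ENOTSUPP from the
 * intercept dispatcher is converted into a userspace exit reason,
 * and the ioctl itself then returns success. Any other error is
 * passed through unchanged. */
static int finish_run(int rc, int *exit_reason)
{
    if (rc == -ENOTSUPP) {
        *exit_reason = KVM_EXIT_S390_SIEIC; /* let userspace handle it */
        rc = 0;
    }
    return rc;
}
```

This is why the internal code never leaks: every path back to the ioctl return value funnels through this translation.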
[PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.
From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001 From: Liu, Jinsong jinsong@intel.com Date: Fri, 22 Jan 2010 03:18:46 +0800 Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt. 1. setup madt bios_info structure, so that static dsdt get run-time madt info like checksum address, lapic address, max cpu numbers, with least hardcode magic number (realmode address of bios_info). 2. setup vcpu add/remove dsdt infrastructure, including processor related acpi objects and control methods. vcpu add/remove will trigger SCI and then control method _L02. By matching madt, vcpu number and add/remove action were found, then by notify control method, it will notify OS acpi driver. Signed-off-by: Liu, Jinsong jinsong@intel.com --- src/acpi-dsdt.dsl | 131 - src/acpi-dsdt.hex | 441 ++--- src/acpi.c|7 + src/biosvar.h | 14 ++ src/post.c| 13 ++ 5 files changed, 582 insertions(+), 24 deletions(-) diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index cc31112..ed78489 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -700,8 +700,11 @@ DefinitionBlock ( Return (0x01) } +/* + * _L02 method for CPU notification + */ Method(_L02) { -Return(0x01) +Return(\_PR.PRSC()) } Method(_L03) { Return(0x01) @@ -744,4 +747,130 @@ DefinitionBlock ( } } + +Scope (\_PR) +{ +/* BIOS_INFO_PHYSICAL_ADDRESS == 0xEA000 */ +OperationRegion(BIOS, SystemMemory, 0xEA000, 16) +Field(BIOS, DwordAcc, NoLock, Preserve) +{ +MSUA, 32, /* MADT checksum address */ +MAPA, 32, /* MADT LAPIC0 address */ +PBYT, 32, /* bytes of max vcpus bitmap */ +PBIT, 32 /* bits of last byte of max vcpus bitmap */ +} + +OperationRegion(MSUM, SystemMemory, MSUA, 1) +Field(MSUM, ByteAcc, NoLock, Preserve) +{ +MSU, 8/* MADT checksum */ +} + +#define gen_processor(nr, name) \ +Processor (C##name, nr, 0xb010, 0x06) { \ +Name (_HID, ACPI0007) \ +OperationRegion(MATR, SystemMemory, Add(MAPA, Multiply(nr,8)), 8) \ +Field (MATR, ByteAcc, NoLock, Preserve) \ +{ \ +MAT, 64 \ +} \ +Field 
(MATR, ByteAcc, NoLock, Preserve) \ +{ \ +Offset(4),\ +FLG, 1\ +} \ +Method(_MAT, 0) { \ +Return(ToBuffer(MAT)) \ +} \ +Method (_STA) { \ +If (FLG) { Return(0xF) } Else { Return(0x9) } \ +} \ +Method (_EJ0, 1, NotSerialized) { \ +Sleep (0xC8) \ +} \ +} \ + +gen_processor(0, 0) +gen_processor(1, 1) +gen_processor(2, 2) +gen_processor(3, 3) +gen_processor(4, 4) +gen_processor(5, 5) +gen_processor(6, 6) +gen_processor(7, 7) +gen_processor(8, 8) +gen_processor(9, 9) +gen_processor(10, A) +gen_processor(11, B) +gen_processor(12, C) +gen_processor(13, D) +gen_processor(14, E) + + +Method (NTFY, 2) { +#define gen_ntfy(nr)\ +If (LEqual(Arg0, 0x##nr)) { \ +If (LNotEqual(Arg1, \_PR.C##nr.FLG)) { \ +Store (Arg1, \_PR.C##nr.FLG)\ +If (LEqual(Arg1, 1)) { \ +Notify(C##nr, 1)\ +Subtract(\_PR.MSU, 1, \_PR.MSU) \ +} Else {
[PATCH] Debug vcpu add
From 479e84d9ce9d7d78d845f438071a4b1a44aca0bb Mon Sep 17 00:00:00 2001
From: Liu, Jinsong jinsong@intel.com
Date: Fri, 22 Jan 2010 03:30:33 +0800
Subject: [PATCH] Debug vcpu add

Add a 'kvm_vcpu_inited' check so that adding a vcpu will not cause a
segmentation fault. This is especially necessary when a vcpu is
hot-added after the guest OS is ready.

Signed-off-by: Liu, Jinsong jinsong@intel.com
---
 qemu-kvm.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 599c3d6..bdf90b4 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1618,7 +1618,7 @@ static void kvm_do_load_mpstate(void *_env)
 
 void kvm_load_mpstate(CPUState *env)
 {
-    if (kvm_enabled() && qemu_system_ready)
+    if (kvm_enabled() && qemu_system_ready && kvm_vcpu_inited(env))
         on_vcpu(env, kvm_do_load_mpstate, env);
 }
--
1.6.5.6
vcpu hotplug support
Avi,

I just sent 2 patches for KVM vcpu hotplug support.
1 is a seabios patch: Setup vcpu add/remove infrastructure, including madt bios_info and dsdt
2 is a qemu-kvm patch: Debug vcpu add

Thanks,
Jinsong
Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
On 21.01.2010, at 09:09, Liu Yu-B13201 wrote:

> -----Original Message-----
> From: kvm-ppc-ow...@vger.kernel.org
> [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard
> Sent: Saturday, January 09, 2010 3:30 AM
> To: Alexander Graf
> Cc: kvm@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu
> Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
>
>>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>>> index 338baf9..e283e44 100644
>>> --- a/arch/powerpc/kvm/booke.c
>>> +++ b/arch/powerpc/kvm/booke.c
>>> @@ -82,8 +82,9 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
>>>  	set_bit(priority, &vcpu->arch.pending_exceptions);
>>>  }
>>>
>>> -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
>>> +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags)
>>>  {
>>> +	/* BookE does flags in ESR, so ignore those we get here */
>>>  	kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM);
>>>  }
>>
>> Actually, I think Book E prematurely sets ESR, since it's done before
>> the program interrupt is actually delivered. Architecturally, I'm not
>> sure if it's a problem, but philosophically I've always wanted it to
>> work the way you've just implemented for Book S.
>
> ESR is updated not only by program but by data_tlb, data_storage, etc.
> Should we rearrange them all?
>
> Also DEAR has the same situation as ESR. Should it be updated when we
> decide to inject the interrupt into the guest?

If that's what the hardware does, then yes. I'm good with taking small
steps though. So if you don't have the time to convert all of the
handlers, you can easily start off with program interrupts.

Alex
Re: vcpu hotplug support
On 01/21/2010 01:54 PM, Liu, Jinsong wrote:
> Avi,
>
> I just sent 2 patches for KVM vcpu hotplug support.
> 1 is a seabios patch: Setup vcpu add/remove infrastructure, including madt bios_info and dsdt
> 2 is a qemu-kvm patch: Debug vcpu add

The patches look reasonable (of course I'd like to see Gleb review
them), but please send the seabios patch to the seabios mailing list
(seab...@seabios.org) so we don't have to diverge.

--
error compiling committee.c: too many arguments to function
Re: [PATCH] Debug vcpu add
On Thursday 21 January 2010 19:50:17 Liu, Jinsong wrote:
> From 479e84d9ce9d7d78d845f438071a4b1a44aca0bb Mon Sep 17 00:00:00 2001
> From: Liu, Jinsong jinsong@intel.com
> Date: Fri, 22 Jan 2010 03:30:33 +0800
> Subject: [PATCH] Debug vcpu add

Jinsong, this name is pretty strange... I think something like "Fix
vcpu hot add feature" would be more proper...

--
regards
Yang, Sheng

> Add 'kvm_vcpu_inited' check so that adding a vcpu will not cause a
> segmentation fault. This is especially necessary when a vcpu is
> hot-added after the guest OS is ready.
>
> Signed-off-by: Liu, Jinsong jinsong@intel.com
> ---
>  qemu-kvm.c | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/qemu-kvm.c b/qemu-kvm.c
> index 599c3d6..bdf90b4 100644
> --- a/qemu-kvm.c
> +++ b/qemu-kvm.c
> @@ -1618,7 +1618,7 @@ static void kvm_do_load_mpstate(void *_env)
>
>  void kvm_load_mpstate(CPUState *env)
>  {
> -    if (kvm_enabled() && qemu_system_ready)
> +    if (kvm_enabled() && qemu_system_ready && kvm_vcpu_inited(env))
>          on_vcpu(env, kvm_do_load_mpstate, env);
>  }
Re: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.
On Thu, Jan 21, 2010 at 07:48:23PM +0800, Liu, Jinsong wrote: From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001 From: Liu, Jinsong jinsong@intel.com Date: Fri, 22 Jan 2010 03:18:46 +0800 Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt. 1. setup madt bios_info structure, so that static dsdt get run-time madt info like checksum address, lapic address, max cpu numbers, with least hardcode magic number (realmode address of bios_info). 2. setup vcpu add/remove dsdt infrastructure, including processor related acpi objects and control methods. vcpu add/remove will trigger SCI and then control method _L02. By matching madt, vcpu number and add/remove action were found, then by notify control method, it will notify OS acpi driver. Signed-off-by: Liu, Jinsong jinsong@intel.com It looks like AML code is a port of what we had in BOCHS bios with minor changes. Can you detail what is changed and why for easy review please? And this still doesn't work with Windows I assume. --- src/acpi-dsdt.dsl | 131 - src/acpi-dsdt.hex | 441 ++--- src/acpi.c|7 + src/biosvar.h | 14 ++ src/post.c| 13 ++ 5 files changed, 582 insertions(+), 24 deletions(-) diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index cc31112..ed78489 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -700,8 +700,11 @@ DefinitionBlock ( Return (0x01) } +/* + * _L02 method for CPU notification + */ Method(_L02) { -Return(0x01) +Return(\_PR.PRSC()) } Method(_L03) { Return(0x01) @@ -744,4 +747,130 @@ DefinitionBlock ( } } + +Scope (\_PR) +{ +/* BIOS_INFO_PHYSICAL_ADDRESS == 0xEA000 */ +OperationRegion(BIOS, SystemMemory, 0xEA000, 16) +Field(BIOS, DwordAcc, NoLock, Preserve) +{ +MSUA, 32, /* MADT checksum address */ +MAPA, 32, /* MADT LAPIC0 address */ +PBYT, 32, /* bytes of max vcpus bitmap */ +PBIT, 32 /* bits of last byte of max vcpus bitmap */ Why do you need PBYT/PBIT? Adds complexity for no apparent reason. 
+} + +OperationRegion(MSUM, SystemMemory, MSUA, 1) +Field(MSUM, ByteAcc, NoLock, Preserve) +{ +MSU, 8/* MADT checksum */ +} + +#define gen_processor(nr, name) \ +Processor (C##name, nr, 0xb010, 0x06) { \ +Name (_HID, ACPI0007) \ +OperationRegion(MATR, SystemMemory, Add(MAPA, Multiply(nr,8)), 8) \ +Field (MATR, ByteAcc, NoLock, Preserve) \ +{ \ +MAT, 64 \ +} \ +Field (MATR, ByteAcc, NoLock, Preserve) \ +{ \ +Offset(4), \ +FLG, 1 \ +} \ +Method(_MAT, 0) { \ +Return(ToBuffer(MAT)) \ +} \ +Method (_STA) { \ +If (FLG) { Return(0xF) } Else { Return(0x9) } \ +} \ +Method (_EJ0, 1, NotSerialized) { \ +Sleep (0xC8) \ +} \ Why _EJ0 is needed? +} \ + +gen_processor(0, 0) +gen_processor(1, 1) +gen_processor(2, 2) +gen_processor(3, 3) +gen_processor(4, 4) +gen_processor(5, 5) +gen_processor(6, 6) +gen_processor(7, 7) +gen_processor(8, 8) +gen_processor(9, 9) +gen_processor(10, A) +gen_processor(11, B) +gen_processor(12, C) +
[PATCH] fix checking of cr0 validity
The "Move to/from Control Registers" chapter of the Intel SDM says: "Reserved bits in CR0 remain clear after any load of those registers; attempts to set them have no impact." The "Control Registers" chapter says: "Bits 63:32 of CR0 are reserved and must be written with zeros. Writing a nonzero value to any of the upper 32 bits results in a general-protection exception, #GP(0)." This patch tries to implement this twisted logic. Signed-off-by: Gleb Natapov g...@redhat.com Reported-by: Lorenzo Martignoni martig...@gmail.com diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 47c6e23..1df691d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -430,12 +430,16 @@ void kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { cr0 |= X86_CR0_ET; - if (cr0 & CR0_RESERVED_BITS) { +#ifdef CONFIG_X86_64 + if (cr0 & 0xffffffff00000000ul) { printk(KERN_DEBUG "set_cr0: 0x%lx #GP, reserved bits 0x%lx\n", cr0, kvm_read_cr0(vcpu)); kvm_inject_gp(vcpu, 0); return; } +#endif + + cr0 &= ~CR0_RESERVED_BITS; if ((cr0 & X86_CR0_NW) && !(cr0 & X86_CR0_CD)) { printk(KERN_DEBUG "set_cr0: #GP, CD == 0 && NW == 1\n"); -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/8] cr0/cr4/efer/fpu miscellaneous bits
Mostly trivial cleanups, with the exception of a patch activating the fpu on clts. Avi Kivity (8): KVM: Allow kvm_load_guest_fpu() even when !vcpu->fpu_active KVM: Drop kvm_{load,put}_guest_fpu() exports KVM: Activate fpu on clts KVM: Add a helper for checking if the guest is in protected mode KVM: Move cr0/cr4/efer related helpers to x86.h KVM: Rename vcpu->shadow_efer to efer KVM: Optimize kvm_read_cr[04]_bits() KVM: trace guest fpu loads and unloads arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/kvm/emulate.c | 10 -- arch/x86/kvm/kvm_cache_regs.h | 9 +++-- arch/x86/kvm/mmu.c | 3 ++- arch/x86/kvm/mmu.h | 24 arch/x86/kvm/svm.c | 20 +--- arch/x86/kvm/vmx.c | 19 ++- arch/x86/kvm/x86.c | 31 --- arch/x86/kvm/x86.h | 30 ++ include/trace/events/kvm.h | 19 +++ 10 files changed, 103 insertions(+), 65 deletions(-)
[PATCH 1/8] KVM: Allow kvm_load_guest_fpu() even when !vcpu->fpu_active
This allows accessing the guest fpu from the instruction emulator, as well as being symmetric with kvm_put_guest_fpu(). Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 47c6e23..e3145d5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4251,7 +4251,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) preempt_disable(); kvm_x86_ops->prepare_guest_switch(vcpu); - kvm_load_guest_fpu(vcpu); + if (vcpu->fpu_active) + kvm_load_guest_fpu(vcpu); local_irq_disable(); @@ -5297,7 +5298,7 @@ EXPORT_SYMBOL_GPL(fx_init); void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) { - if (!vcpu->fpu_active || vcpu->guest_fpu_loaded) + if (vcpu->guest_fpu_loaded) return; vcpu->guest_fpu_loaded = 1; -- 1.6.5.3
[PATCH 3/8] KVM: Activate fpu on clts
Assume that if the guest executes clts, it knows what it's doing, and load the guest fpu to prevent an #NM exception. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm.c | 8 +++- arch/x86/kvm/vmx.c | 1 + arch/x86/kvm/x86.c | 1 + 4 files changed, 10 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index a1f0b5d..bf3ec76 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -512,6 +512,7 @@ struct kvm_x86_ops { void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg); unsigned long (*get_rflags)(struct kvm_vcpu *vcpu); void (*set_rflags)(struct kvm_vcpu *vcpu, unsigned long rflags); + void (*fpu_activate)(struct kvm_vcpu *vcpu); void (*fpu_deactivate)(struct kvm_vcpu *vcpu); void (*tlb_flush)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 8d7cb62..0f3738a 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1259,12 +1259,17 @@ static int ud_interception(struct vcpu_svm *svm) return 1; } -static int nm_interception(struct vcpu_svm *svm) +static void svm_fpu_activate(struct kvm_vcpu *vcpu) { + struct vcpu_svm *svm = to_svm(vcpu); svm->vmcb->control.intercept_exceptions &= ~(1 << NM_VECTOR); svm->vcpu.fpu_active = 1; update_cr0_intercept(svm); +} +static int nm_interception(struct vcpu_svm *svm) +{ + svm_fpu_activate(&svm->vcpu); return 1; } @@ -2971,6 +2976,7 @@ static struct kvm_x86_ops svm_x86_ops = { .cache_reg = svm_cache_reg, .get_rflags = svm_get_rflags, .set_rflags = svm_set_rflags, + .fpu_activate = svm_fpu_activate, .fpu_deactivate = svm_fpu_deactivate, .tlb_flush = svm_flush_tlb, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 7375ae1..372bc38 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3011,6 +3011,7 @@ static int handle_cr(struct kvm_vcpu *vcpu) vmcs_writel(CR0_READ_SHADOW, kvm_read_cr0(vcpu)); trace_kvm_cr_write(0, kvm_read_cr0(vcpu));
skip_emulated_instruction(vcpu); + vmx_fpu_activate(vcpu); return 1; case 1: /*mov from cr*/ switch (cr) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index feca59f..09207ba 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3266,6 +3266,7 @@ int emulate_invlpg(struct kvm_vcpu *vcpu, gva_t address) int emulate_clts(struct kvm_vcpu *vcpu) { kvm_x86_ops->set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~X86_CR0_TS)); + kvm_x86_ops->fpu_activate(vcpu); return X86EMUL_CONTINUE; } -- 1.6.5.3
[PATCH 4/8] KVM: Add a helper for checking if the guest is in protected mode
Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/emulate.c | 9 - arch/x86/kvm/vmx.c | 4 ++-- arch/x86/kvm/x86.c | 7 +++ arch/x86/kvm/x86.h | 6 ++ 4 files changed, 15 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 0f89e32..e46f276 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -32,6 +32,7 @@ #include <linux/module.h> #include <asm/kvm_emulate.h> +#include "x86.h" #include "mmu.h" /* for is_long_mode() */ /* @@ -1515,7 +1516,7 @@ emulate_syscall(struct x86_emulate_ctxt *ctxt) /* syscall is not available in real mode */ if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL - || !kvm_read_cr0_bits(ctxt->vcpu, X86_CR0_PE)) + || !is_protmode(ctxt->vcpu)) return -1; setup_syscalls_segments(ctxt, &cs, &ss); @@ -1568,8 +1569,7 @@ emulate_sysenter(struct x86_emulate_ctxt *ctxt) return -1; /* inject #GP if in real mode or paging is disabled */ - if (ctxt->mode == X86EMUL_MODE_REAL || - !kvm_read_cr0_bits(ctxt->vcpu, X86_CR0_PE)) { + if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) { kvm_inject_gp(ctxt->vcpu, 0); return -1; } @@ -1634,8 +1634,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt) return -1; /* inject #GP if in real mode or paging is disabled */ - if (ctxt->mode == X86EMUL_MODE_REAL - || !kvm_read_cr0_bits(ctxt->vcpu, X86_CR0_PE)) { + if (ctxt->mode == X86EMUL_MODE_REAL || !is_protmode(ctxt->vcpu)) { kvm_inject_gp(ctxt->vcpu, 0); return -1; } diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 372bc38..cd78049 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1853,7 +1853,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu, static int vmx_get_cpl(struct kvm_vcpu *vcpu) { - if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) /* if real mode */ + if (!is_protmode(vcpu)) return 0; if (vmx_get_rflags(vcpu) & X86_EFLAGS_VM) /* if virtual 8086 */ @@ -2108,7 +2108,7 @@ static bool cs_ss_rpl_check(struct kvm_vcpu *vcpu) static bool guest_state_valid(struct kvm_vcpu *vcpu) { /* real mode guest
state checks */ - if (!kvm_read_cr0_bits(vcpu, X86_CR0_PE)) { + if (!is_protmode(vcpu)) { if (!rmode_segment_valid(vcpu, VCPU_SREG_CS)) return false; if (!rmode_segment_valid(vcpu, VCPU_SREG_SS)) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 09207ba..6cdead0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3798,8 +3798,7 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu) * hypercall generates UD from non zero cpl and real mode * per HYPER-V spec */ - if (kvm_x86_ops->get_cpl(vcpu) != 0 || - !kvm_read_cr0_bits(vcpu, X86_CR0_PE)) { + if (kvm_x86_ops->get_cpl(vcpu) != 0 || !is_protmode(vcpu)) { kvm_queue_exception(vcpu, UD_VECTOR); return 0; } @@ -4763,7 +4762,7 @@ int kvm_load_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector, { struct kvm_segment kvm_seg; - if (is_vm86_segment(vcpu, seg) || !(kvm_read_cr0_bits(vcpu, X86_CR0_PE))) + if (is_vm86_segment(vcpu, seg) || !is_protmode(vcpu)) return kvm_load_realmode_segment(vcpu, selector, seg); if (load_segment_descriptor_to_kvm_desct(vcpu, selector, &kvm_seg)) return 1; @@ -5115,7 +5114,7 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu, /* Older userspace won't unhalt the vcpu on reset.
*/ if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 && sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 && - !(kvm_read_cr0_bits(vcpu, X86_CR0_PE))) + !is_protmode(vcpu)) vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; vcpu_put(vcpu); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 5eadea5..f783d8f 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -2,6 +2,7 @@ #define ARCH_X86_KVM_X86_H #include <linux/kvm_host.h> +#include "kvm_cache_regs.h" static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) { @@ -35,4 +36,9 @@ static inline bool kvm_exception_is_soft(unsigned int nr) struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu, u32 function, u32 index); +static inline bool is_protmode(struct kvm_vcpu *vcpu) +{ + return kvm_read_cr0_bits(vcpu, X86_CR0_PE); +} + #endif -- 1.6.5.3
[PATCH 2/8] KVM: Drop kvm_{load,put}_guest_fpu() exports
Not used anymore. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c | 2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e3145d5..feca59f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5305,7 +5305,6 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) kvm_fx_save(&vcpu->arch.host_fx_image); kvm_fx_restore(&vcpu->arch.guest_fx_image); } -EXPORT_SYMBOL_GPL(kvm_load_guest_fpu); void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) { @@ -5318,7 +5317,6 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) kvm_fx_restore(&vcpu->arch.host_fx_image); ++vcpu->stat.fpu_reload; set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests); } -EXPORT_SYMBOL_GPL(kvm_put_guest_fpu); void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) { -- 1.6.5.3
[PATCH 6/8] KVM: Rename vcpu-shadow_efer to efer
None of the other registers have the shadow_ prefix. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu.c | 2 +- arch/x86/kvm/svm.c | 12 ++-- arch/x86/kvm/vmx.c | 14 +++--- arch/x86/kvm/x86.c | 14 +++--- arch/x86/kvm/x86.h | 2 +- 6 files changed, 23 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index bf3ec76..76bf686 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -277,7 +277,7 @@ struct kvm_vcpu_arch { unsigned long cr8; u32 hflags; u64 pdptrs[4]; /* pae */ - u64 shadow_efer; + u64 efer; u64 apic_base; struct kvm_lapic *apic; /* kernel irqchip context */ int32_t apic_arb_prio; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 6f7158f..599c422 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -237,7 +237,7 @@ static int is_cpuid_PSE36(void) static int is_nx(struct kvm_vcpu *vcpu) { - return vcpu->arch.shadow_efer & EFER_NX; + return vcpu->arch.efer & EFER_NX; } static int is_shadow_present_pte(u64 pte) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 0f3738a..0242fdd 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -231,7 +231,7 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer) efer &= ~EFER_LME; to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME; - vcpu->arch.shadow_efer = efer; + vcpu->arch.efer = efer; } static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr, @@ -990,14 +990,14 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) struct vcpu_svm *svm = to_svm(vcpu); #ifdef CONFIG_X86_64 - if (vcpu->arch.shadow_efer & EFER_LME) { + if (vcpu->arch.efer & EFER_LME) { if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) { - vcpu->arch.shadow_efer |= EFER_LMA; + vcpu->arch.efer |= EFER_LMA; svm->vmcb->save.efer |= EFER_LMA | EFER_LME; } if (is_paging(vcpu) && !(cr0 & X86_CR0_PG)) { - vcpu->arch.shadow_efer &= ~EFER_LMA; + vcpu->arch.efer &= ~EFER_LMA; svm->vmcb->save.efer &= ~(EFER_LMA |
EFER_LME); } } @@ -1361,7 +1361,7 @@ static int vmmcall_interception(struct vcpu_svm *svm) static int nested_svm_check_permissions(struct vcpu_svm *svm) { - if (!(svm->vcpu.arch.shadow_efer & EFER_SVME) + if (!(svm->vcpu.arch.efer & EFER_SVME) || !is_paging(&svm->vcpu)) { kvm_queue_exception(&svm->vcpu, UD_VECTOR); return 1; } @@ -1764,7 +1764,7 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm) hsave->save.ds = vmcb->save.ds; hsave->save.gdtr = vmcb->save.gdtr; hsave->save.idtr = vmcb->save.idtr; - hsave->save.efer = svm->vcpu.arch.shadow_efer; + hsave->save.efer = svm->vcpu.arch.efer; hsave->save.cr0 = kvm_read_cr0(&svm->vcpu); hsave->save.cr4 = svm->vcpu.arch.cr4; hsave->save.rflags = vmcb->save.rflags; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index cd78049..d4a6260 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -618,7 +618,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) u64 guest_efer; u64 ignore_bits; - guest_efer = vmx->vcpu.arch.shadow_efer; + guest_efer = vmx->vcpu.arch.efer; /* * NX is emulated; LMA and LME handled by hardware; SCE meaningless @@ -963,7 +963,7 @@ static void setup_msrs(struct vcpu_vmx *vmx) * if efer.sce is enabled. */ index = __find_msr_index(vmx, MSR_K6_STAR); - if ((index >= 0) && (vmx->vcpu.arch.shadow_efer & EFER_SCE)) + if ((index >= 0) && (vmx->vcpu.arch.efer & EFER_SCE)) move_msr_up(vmx, index, save_nmsrs++); } #endif @@ -1608,7 +1608,7 @@ static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer) * of this msr depends on is_long_mode().
*/ vmx_load_host_state(to_vmx(vcpu)); - vcpu->arch.shadow_efer = efer; + vcpu->arch.efer = efer; if (!msr) return; if (efer & EFER_LMA) { @@ -1640,13 +1640,13 @@ static void enter_lmode(struct kvm_vcpu *vcpu) (guest_tr_ar & ~AR_TYPE_MASK) | AR_TYPE_BUSY_64_TSS); } - vcpu->arch.shadow_efer |= EFER_LMA; - vmx_set_efer(vcpu, vcpu->arch.shadow_efer); + vcpu->arch.efer |= EFER_LMA; + vmx_set_efer(vcpu, vcpu->arch.efer); } static void exit_lmode(struct kvm_vcpu *vcpu) { - vcpu->arch.shadow_efer &= ~EFER_LMA; + vcpu->arch.efer &= ~EFER_LMA;
[PATCH 8/8] KVM: trace guest fpu loads and unloads
Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/x86.c | 2 ++ include/trace/events/kvm.h | 19 +++ 2 files changed, 21 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8b42c19..06a03c1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5304,6 +5304,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) vcpu->guest_fpu_loaded = 1; kvm_fx_save(&vcpu->arch.host_fx_image); kvm_fx_restore(&vcpu->arch.guest_fx_image); + trace_kvm_fpu(1); } void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) @@ -5316,6 +5317,7 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) kvm_fx_restore(&vcpu->arch.host_fx_image); ++vcpu->stat.fpu_reload; set_bit(KVM_REQ_DEACTIVATE_FPU, &vcpu->requests); + trace_kvm_fpu(0); } void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu) diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h index dbe1084..8abdc12 100644 --- a/include/trace/events/kvm.h +++ b/include/trace/events/kvm.h @@ -145,6 +145,25 @@ TRACE_EVENT(kvm_mmio, __entry->len, __entry->gpa, __entry->val) ); +#define kvm_fpu_load_symbol \ + {0, "unload"}, \ + {1, "load"} + +TRACE_EVENT(kvm_fpu, + TP_PROTO(int load), + TP_ARGS(load), + + TP_STRUCT__entry( + __field(u32, load) + ), + + TP_fast_assign( + __entry->load = load; + ), + + TP_printk("%s", __print_symbolic(__entry->load, kvm_fpu_load_symbol)) +); + #endif /* _TRACE_KVM_MAIN_H */ /* This part must be outside protection */ -- 1.6.5.3
[PATCH 5/8] KVM: Move cr0/cr4/efer related helpers to x86.h
They have more general scope than the mmu. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/emulate.c | 1 - arch/x86/kvm/mmu.c | 1 + arch/x86/kvm/mmu.h | 24 arch/x86/kvm/x86.h | 24 4 files changed, 25 insertions(+), 25 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index e46f276..a2adec8 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -33,7 +33,6 @@ #include <asm/kvm_emulate.h> #include "x86.h" -#include "mmu.h" /* for is_long_mode() */ /* * Opcode effective-address decode tables. diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ff2b2e8..6f7158f 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -18,6 +18,7 @@ */ #include "mmu.h" +#include "x86.h" #include "kvm_cache_regs.h" #include <linux/kvm_host.h> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 599159f..61ef5a6 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -58,30 +58,6 @@ static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu) return kvm_mmu_load(vcpu); } -static inline int is_long_mode(struct kvm_vcpu *vcpu) -{ -#ifdef CONFIG_X86_64 - return vcpu->arch.shadow_efer & EFER_LMA; -#else - return 0; -#endif -} - -static inline int is_pae(struct kvm_vcpu *vcpu) -{ - return kvm_read_cr4_bits(vcpu, X86_CR4_PAE); -} - -static inline int is_pse(struct kvm_vcpu *vcpu) -{ - return kvm_read_cr4_bits(vcpu, X86_CR4_PSE); -} - -static inline int is_paging(struct kvm_vcpu *vcpu) -{ - return kvm_read_cr0_bits(vcpu, X86_CR0_PG); -} - static inline int is_present_gpte(unsigned long pte) { return pte & PT_PRESENT_MASK; } diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index f783d8f..2dc24a7 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -41,4 +41,28 @@ static inline bool is_protmode(struct kvm_vcpu *vcpu) return kvm_read_cr0_bits(vcpu, X86_CR0_PE); } +static inline int is_long_mode(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_X86_64 + return vcpu->arch.shadow_efer & EFER_LMA; +#else + return 0; +#endif +} + +static inline int is_pae(struct
kvm_vcpu *vcpu) +{ + return kvm_read_cr4_bits(vcpu, X86_CR4_PAE); +} + +static inline int is_pse(struct kvm_vcpu *vcpu) +{ + return kvm_read_cr4_bits(vcpu, X86_CR4_PSE); +} + +static inline int is_paging(struct kvm_vcpu *vcpu) +{ + return kvm_read_cr0_bits(vcpu, X86_CR0_PG); +} + #endif -- 1.6.5.3
[PATCH 7/8] KVM: Optimize kvm_read_cr[04]_bits()
'mask' is always a constant, so we can check whether it includes a bit that might be owned by the guest very cheaply, and avoid the decache call. Saves a few hundred bytes of module text. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/kvm/kvm_cache_regs.h | 9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index 6b419a3..5a109c6 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -1,6 +1,9 @@ #ifndef ASM_KVM_CACHE_REGS_H #define ASM_KVM_CACHE_REGS_H +#define KVM_POSSIBLE_CR0_GUEST_BITS X86_CR0_TS +#define KVM_POSSIBLE_CR4_GUEST_BITS X86_CR4_PGE + static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, enum kvm_reg reg) { @@ -40,7 +43,8 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index) static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask) { - if (mask & vcpu->arch.cr0_guest_owned_bits) + ulong tmask = mask & KVM_POSSIBLE_CR0_GUEST_BITS; + if (tmask & vcpu->arch.cr0_guest_owned_bits) kvm_x86_ops->decache_cr0_guest_bits(vcpu); return vcpu->arch.cr0 & mask; } @@ -52,7 +56,8 @@ static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu) static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask) { - if (mask & vcpu->arch.cr4_guest_owned_bits) + ulong tmask = mask & KVM_POSSIBLE_CR4_GUEST_BITS; + if (tmask & vcpu->arch.cr4_guest_owned_bits) kvm_x86_ops->decache_cr4_guest_bits(vcpu); return vcpu->arch.cr4 & mask; } -- 1.6.5.3
How to debug Ubuntu 8.04 LTS guest crash during install?
Hello: I am using kvm on a CentOS 5.4 server. I am trying to install the TurnkeyLinux Core appliance found here: http://www.turnkeylinux.org/core I downloaded the ISO file from the web site. Then, I used this command to install it: virt-install -n tkl-core -r 512 --vcpus=1 --check-cpu --os-type=linux --os-variant=ubuntuhardy -v --accelerate -c /tmp/turnkey-core-2009.10-hardy-x86.iso -f /var/lib/libvirt/images/tkl-core.img -s 15 -b br0 --vnc --noautoconsole When I connect to the VNC console, I get the Turnkey linux options screen. I select "Install to hard disk" from there and it seems to start the install, but it crashes during the installer startup. This is repeatable, so there has to be a way to debug it. I tried turning on the debug option for virt-install but that did not give me any useful info. Any ideas how to debug this? Thanks, Neil -- Neil Aggarwal, (281)846-8957, http://UnmeteredVPS.net/cpanel cPanel/WHM preinstalled on a virtual server for only $40/month! No overage charges, 7 day free trial, PayPal, Google Checkout
Luvalley-5 has been released (with whitepaper!): enables arbitrary OS to run VMs without any modification
Luvalley is a lightweight type-1 Virtual Machine Monitor (VMM). Part of its source code is derived from KVM, to virtualize CPU instructions and the memory management unit (MMU). However, its overall architecture is completely different from KVM and somewhat like Xen: Luvalley runs outside of Linux, just like Xen. Any operating system, including Linux, can be used as Luvalley's scheduler, memory manager, physical device driver provider and virtual IO device emulator. Currently, Luvalley supports Linux and Windows. That is to say, one may run Luvalley to boot a Linux or Windows, and then run multiple virtualized operating systems on top of that Linux or Windows. If you are interested in the Luvalley project, you may download the source code as well as the whitepaper from http://sourceforge.net/projects/luvalley/ The main changes of this release (Luvalley-5) are: * The derived code is updated from KVM-83 to KVM-88 * Supports both Intel and AMD CPUs * Automatically identifies Intel and AMD CPUs This release (Luvalley-5) includes: * Luvalley whitepaper (the first edition) * Luvalley binary and source code tarball * Readme, changelog and release notes files
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
john cooper wrote: Chris Wright wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: To be honest all possible naming schemes for '-cpu name' are just as unfriendly as each other. The only user friendly option is '-cpu host'. IMHO, we should just pick a concise naming scheme & document it. Given they are all equally unfriendly, the one that has consistency with vmware naming seems like a mild winner. Heh, I completely agree, and was just saying the same thing to John earlier today. May as well be -cpu {foo,bar,baz} since the meaning for those command line options must be well-documented in the man page. I can appreciate the concern of wanting to get this as correct as possible. But ultimately we just need three unique tags which ideally have some relation to their associated architectures. The diatribes available from /proc/cpuinfo, while generally accurate, don't really offer any more of a clue to the model group, and in their unmodified form are rather unwieldy as command line flags. I agree. I'd underline that this patch is for migration purposes only, so you don't want to specify an exact CPU, but more like a class of CPUs. If you look into the available CPUID features in each CPU, you will find that there are only a few groups, with currently three for each vendor being a good guess. /proc/cpuinfo just prints out marketing names, which have only a mild relationship to a feature-related technical CPU model. Maybe we can use a generation approach like the AMD Opteron ones for Intel, too. These G1/G2/G3 names are just arbitrary and have no roots within AMD. I think that an exact CPU model specification is out of scope for this patch and maybe even for QEMU. One could create a database with CPU names and associated CPUID flags and provide an external tool to generate a QEMU command line out of this. Keeping this database up-to-date (especially for desktop CPU models) is a burden that the QEMU project does not want to bear.
This is from an EVC kb article[1]: Here is a pointer to a more detailed version: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003212 We probably should also add an option to dump out the full set of qemu-side cpuid flags for the benefit of users and upper level tools. You mean like this one? http://lists.gnu.org/archive/html/qemu-devel/2009-09/msg01228.html Resending this patch set is on my plan for next week. What is the state of this patch? Will it go in soon? Then I'd rebase my patch set on top of it. Regards, Andre. -- Andre Przywara AMD-OSRC (Dresden) Tel: x29712
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
On 01/20/2010 07:18 PM, john cooper wrote: Chris Wright wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: To be honest all possible naming schemes for '-cpu name' are just as unfriendly as each other. The only user friendly option is '-cpu host'. IMHO, we should just pick a concise naming scheme & document it. Given they are all equally unfriendly, the one that has consistency with vmware naming seems like a mild winner. Heh, I completely agree, and was just saying the same thing to John earlier today. May as well be -cpu {foo,bar,baz} since the meaning for those command line options must be well-documented in the man page. I can appreciate the concern of wanting to get this as correct as possible. This is the root of the trouble. At the qemu layer, we try to focus on being correct. Management tools are typically the layer that deals with being correct. A good compromise is making things user tunable, which means that a downstream can make correctness decisions without forcing those decisions on upstream. In this case, the idea would be to introduce a new option, say something like -cpu-def. The syntax would be: -cpu-def name=coreduo,level=10,family=6,model=14,stepping=8,features=+vme+mtrr+clflush+mca+sse3+monitor,xlevel=0x80000008,model_id=Genuine Intel(R) CPU T2600 @ 2.16GHz Which is not that exciting since it just lets you do -cpu coreduo in a much more complex way. However, if we take advantage of the current config support, you can have: [cpu-def] name=coreduo level=10 family=6 model=14 stepping=8 features=+vme+mtrr+clflush+mca+sse3.. model_id=Genuine Intel... And that can be stored in a config file. We should then parse /etc/qemu/target-<targetname>.conf by default. We'll move the current x86_defs table into this config file and then downstreams/users can define whatever compatibility classes they want.
With this feature, I'd be inclined to take correct compatibility classes like Nehalem as part of the default qemurc that we install, because it's easily overridden by a user. It then becomes just a suggestion on our part versus a guarantee. It should just be a matter of adding qemu_cpudefs_opts to qemu-config.[ch], taking a new command line option that parses the argument via QemuOpts, then passing the parsed options to a target-specific function that builds the table of supported cpus. Regards, Anthony Liguori
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On 01/21/2010 01:02 AM, Avi Kivity wrote: You can also just emulate the state transition -- since you know you're dealing with a flat protected-mode or long-mode OS (and just make that a condition of enabling the feature) you don't have to deal with all the strange combinations of directions that an unrestricted x86 event can take. Since it's an exception, it is unconditional. Do you mean create the stack frame manually? I'd really like to avoid that for many reasons, one of which is performance (need to do all the virt-to-phys walks manually), the other is that we're certain to end up with something horribly underspecified. I'd really like to keep as close as possible to the hardware. For the alternative approach, see Xen. I obviously didn't mean to do something which didn't look like a hardware-delivered exception. That by itself provides a tight spec. The performance issue is real, of course. Obviously, the design of VT-x was before my time at Intel, so I'm not familiar with why the tradeoffs were made the way they were. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf.
[PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce modified in a way that introduces some code duplication on the one hand, but reduces the risk of regressing existing eventfd users on the other hand. KVM needs a way to atomically remove itself from the eventfd ->poll() wait queue head, in order to correctly handle its IRQfd deassign operation. This patch introduces such an API, plus a way to read an eventfd from its context. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Avi, Davidel, how about only including the following part for -stable then? Reason is, I still would like to be able to use irqfd there, and getting spurious interrupts 100% of the time unmask is done isn't a very good idea IMO ... fs/eventfd.c | 35 +++ include/linux/eventfd.h | 9 + 2 files changed, 44 insertions(+), 0 deletions(-) diff --git a/fs/eventfd.c b/fs/eventfd.c index 8b47e42..ea9c18a 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -135,6 +135,41 @@ static unsigned int eventfd_poll(struct file *file, poll_table *wait) return events; } +static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt) +{ + *cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count; + ctx->count -= *cnt; +} + +/** + * eventfd_ctx_remove_wait_queue - Read the current counter and removes wait queue. + * @ctx: [in] Pointer to eventfd context. + * @wait: [in] Wait queue to be removed. + * @cnt: [out] Pointer to the 64bit counter value. + * + * Returns zero if successful, or the following error codes: + * + * -EAGAIN : The operation would have blocked. + * + * This is used to atomically remove a wait queue entry from the eventfd wait + * queue head, and read/reset the counter value.
+ */ +int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait, + __u64 *cnt) +{ + unsigned long flags; + + spin_lock_irqsave(&ctx->wqh.lock, flags); + eventfd_ctx_do_read(ctx, cnt); + __remove_wait_queue(&ctx->wqh, wait); + if (*cnt != 0 && waitqueue_active(&ctx->wqh)) + wake_up_locked_poll(&ctx->wqh, POLLOUT); + spin_unlock_irqrestore(&ctx->wqh.lock, flags); + + return *cnt != 0 ? 0 : -EAGAIN; +} +EXPORT_SYMBOL_GPL(eventfd_ctx_remove_wait_queue); + static ssize_t eventfd_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) { diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index 94dd103..85eac48 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -10,6 +10,7 @@ #include <linux/fcntl.h> #include <linux/file.h> +#include <linux/wait.h> /* * CAREFUL: Check include/asm-generic/fcntl.h when defining @@ -34,6 +35,8 @@ struct file *eventfd_fget(int fd); struct eventfd_ctx *eventfd_ctx_fdget(int fd); struct eventfd_ctx *eventfd_ctx_fileget(struct file *file); int eventfd_signal(struct eventfd_ctx *ctx, int n); +int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_t *wait, + __u64 *cnt); #else /* CONFIG_EVENTFD */ @@ -61,6 +64,12 @@ static inline void eventfd_ctx_put(struct eventfd_ctx *ctx) } +static inline int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, + wait_queue_t *wait, __u64 *cnt) +{ + return -ENOSYS; +} + #endif #endif /* _LINUX_EVENTFD_H */ -- 1.6.6.144.g5c3af
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Anthony Liguori wrote: On 01/20/2010 07:18 PM, john cooper wrote: I can appreciate the concern of wanting to get this as correct as possible. This is the root of the trouble. At the qemu layer, we try to focus on being correct. Management tools are typically the layer that deals with being correct. A good compromise is making things user tunable which means that a downstream can make correctness decisions without forcing those decisions on upstream. Conceptually I agree with such a malleable approach -- actually I prefer it. I thought however it was too much infrastructure to foist on the problem just to add a few more models into the mix. The only reservation which comes to mind is that of logistics. This may ruffle the code some and impact others such as Andre who seem to have existing patches relative to the current structure. Anyone have strong objections to this approach before I have a look at an implementation? Thanks, -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, 21 Jan 2010, Michael S. Tsirkin wrote: This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce modified in a way that introduces some code duplication on the one hand, but reduces the risk of regressing existing eventfd users on the other hand. KVM needs a wait to atomically remove themselves from the eventfd -poll() wait queue head, in order to handle correctly their IRQfd deassign operation. This patch introduces such API, plus a way to read an eventfd from its context. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Avi, Davidel, how about only including the following part for -stable then? Reason is, I still would like to be able to use irqfd there, and getting spurious interrupts 100% of times unmask is done isn't a very good idea IMO ... It's the same thing. Unless there are *real* problems in KVM due to the spurious ints, I still think this is .33 material. - Davide -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
On Thu, Jan 21, 2010 at 2:39 PM, Andre Przywara andre.przyw...@amd.com wrote: john cooper wrote: Chris Wright wrote: * Daniel P. Berrange (berra...@redhat.com) wrote: To be honest all possible naming schemes for '-cpu name' are just as unfriendly as each other. The only user friendly option is '-cpu host'. IMHO, we should just pick a concise naming scheme document it. Given they are all equally unfriendly, the one that has consistency with vmware naming seems like a mild winner. Heh, I completely agree, and was just saying the same thing to John earlier today. May as well be -cpu {foo,bar,baz} since the meaning for those command line options must be well-documented in the man page. I can appreciate the concern of wanting to get this as correct as possible. But ultimately we just need three unique tags which ideally have some relation to their associated architectures. The diatribes available from /proc/cpuinfo while generally accurate don't really offer any more of a clue to the model group, and in their unmodified form are rather unwieldy as command line flags. I agree. I'd underline that this patch is for migration purposes only, so you don't want to specify an exact CPU, but more like a class of CPUs. If you look into the available CPUID features in each CPU, you will find that there are only a few groups, with currently three for each vendor being a good guess. /proc/cpuinfo just prints out marketing names, which have only a mild relationship to a feature-related technical CPU model. Maybe we can use a generation approach like the AMD Opteron ones for Intel, too. These G1/G2/G3 names are just arbitrary and have no roots within AMD. I think that an exact CPU model specification is out of scope for this patch and maybe even for QEMU. One could create a database with CPU names and associated CPUID flags and provide an external tool to generate a QEMU command line out of this. 
Keeping this database up-to-date (especially for desktop CPU models) is a burden that the QEMU project does not want to bear. This is from an EVC kb article[1]: Here is a pointer to a more detailed version: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003212 We probably should also add an option to dump out the full set of qemu-side cpuid flags for the benefit of users and upper level tools. You mean like this one? http://lists.gnu.org/archive/html/qemu-devel/2009-09/msg01228.html Resending this patch set is on my plan for next week. What is the state of this patch? Will it go in soon? Then I'd rebase my patch set on top of it. FYI, a similar CPU flag mechanism has been implemented for Sparc and x86, unifying these would be cool. 
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 06:58 PM, Davide Libenzi wrote: On Thu, 21 Jan 2010, Michael S. Tsirkin wrote: This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce modified in a way that introduces some code duplication on the one hand, but reduces the risk of regressing existing eventfd users on the other hand. KVM needs a way to atomically remove itself from the eventfd ->poll() wait queue head, in order to handle correctly its IRQfd deassign operation. This patch introduces such an API, plus a way to read an eventfd from its context. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Avi, Davide, how about only including the following part for -stable then? Reason is, I still would like to be able to use irqfd there, and getting spurious interrupts 100% of the time unmask is done isn't a very good idea IMO ... It's the same thing. Unless there are *real* problems in KVM due to the spurious ints, I still think this is .33 material. I agree. But I think we can solve this in another way in .32: we can clear the eventfd from irqfd->inject work, which is in process context. The new stuff is only needed for lockless clearing, no? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. 
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:13 PM, Avi Kivity wrote: But I think we can solve this in another way in .32: we can clear the eventfd from irqfd->inject work, which is in process context. The new stuff is only needed for lockless clearing, no? I meant atomic clearing, when we inject interrupts from the irqfd atomic context. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. 
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 07:13:13PM +0200, Avi Kivity wrote: On 01/21/2010 06:58 PM, Davide Libenzi wrote: On Thu, 21 Jan 2010, Michael S. Tsirkin wrote: This is a backport of commit: 03db343a6320f780937078433fa7d8da955e6fce modified in a way that introduces some code duplication on the one hand, but reduces the risk of regressing existing eventfd users on the other hand. KVM needs a wait to atomically remove themselves from the eventfd -poll() wait queue head, in order to handle correctly their IRQfd deassign operation. This patch introduces such API, plus a way to read an eventfd from its context. Signed-off-by: Michael S. Tsirkinm...@redhat.com --- Avi, Davidel, how about only including the following part for -stable then? Reason is, I still would like to be able to use irqfd there, and getting spurious interrupts 100% of times unmask is done isn't a very good idea IMO ... It's the same thing. Unless there are *real* problems in KVM due to the spurious ints, I still think this is .33 material. I agree. But I think we can solve this in another way in .32: we can clear the eventfd from irqfd-inject work, which is in process context. The new stuff is only needed for lockless clearing, no? No, AFAIK there's no way to clear the counter from kernel without this patch. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:23 PM, Michael S. Tsirkin wrote: I agree. But I think we can solve this in another way in .32: we can clear the eventfd from irqfd-inject work, which is in process context. The new stuff is only needed for lockless clearing, no? No, AFAIK there's no way to clear the counter from kernel without this patch. Can't you read from the file? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
repeatable hang with loop mount and heavy IO in guest
I've tried various guests, including most recent Fedora12 kernels, custom 2.6.32.x. All of them hang around the same point (~1GB written) when I do heavy IO write inside the guest. I have waited 30 minutes to see if the guest would recover, but it just sits there, not writing back any data, not doing anything - but certainly not allowing any new IO writes. The host has some load on it, but nothing heavy enough to completely hang a guest for that long. mount -o loop some_image.fs ./somewhere dd if=/dev/zero of=/somewhere/zero bs=512 then after ~1GB: sync Host is running: 2.6.31.4 QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88) Guests are booted with elevator=noop as the filesystems are stored as files, accessed as virtio disks. The hung backtraces always look similar to these: [ 361.460136] INFO: task loop0:2097 blocked for more than 120 seconds. [ 361.460139] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 361.460142] loop0 D 88000b92c848 0 2097 2 0x0080 [ 361.460148] 88000b92c5d0 0046 880008c1f810 880009829fd8 [ 361.460153] 880009829fd8 880009829fd8 88000a21ee80 88000b92c5d0 [ 361.460157] 880009829610 8181b768 880001af33b0 0002 [ 361.460161] Call Trace: [ 361.460216] [8105bf12] ? sync_page+0x0/0x43 [ 361.460253] [8151383e] ? io_schedule+0x2c/0x43 [ 361.460257] [8105bf50] ? sync_page+0x3e/0x43 [ 361.460261] [81513a2a] ? __wait_on_bit+0x41/0x71 [ 361.460264] [8105c092] ? wait_on_page_bit+0x6a/0x70 [ 361.460283] [810385a7] ? wake_bit_function+0x0/0x23 [ 361.460287] [81064975] ? shrink_page_list+0x3e5/0x61e [ 361.460291] [81513992] ? schedule_timeout+0xa3/0xbe [ 361.460305] [81038579] ? autoremove_wake_function+0x0/0x2e [ 361.460308] [8106538f] ? shrink_zone+0x7e1/0xaf6 [ 361.460310] [81061725] ? determine_dirtyable_memory+0xd/0x17 [ 361.460314] [810637da] ? isolate_pages_global+0xa3/0x216 [ 361.460316] [81062712] ? mark_page_accessed+0x2a/0x39 [ 361.460335] [810a61db] ? __find_get_block+0x13b/0x15c [ 361.460337] [81065ed4] ?
try_to_free_pages+0x1ab/0x2c9 [ 361.460340] [81063737] ? isolate_pages_global+0x0/0x216 [ 361.460343] [81060baf] ? __alloc_pages_nodemask+0x394/0x564 [ 361.460350] [8108250c] ? __slab_alloc+0x137/0x44f [ 361.460371] [812cc4c1] ? radix_tree_preload+0x1f/0x6a [ 361.460374] [81082a08] ? kmem_cache_alloc+0x5d/0x88 [ 361.460376] [812cc4c1] ? radix_tree_preload+0x1f/0x6a [ 361.460379] [8105c0b5] ? add_to_page_cache_locked+0x1d/0xf1 [ 361.460381] [8105c1b0] ? add_to_page_cache_lru+0x27/0x57 [ 361.460384] [8105c25a] ? grab_cache_page_write_begin+0x7a/0xa0 [ 361.460399] [81104620] ? ext3_write_begin+0x7e/0x201 [ 361.460417] [8134648f] ? do_lo_send_aops+0xa1/0x174 [ 361.460420] [81081948] ? virt_to_head_page+0x9/0x2a [ 361.460422] [8134686b] ? loop_thread+0x309/0x48a [ 361.460425] [813463ee] ? do_lo_send_aops+0x0/0x174 [ 361.460427] [81038579] ? autoremove_wake_function+0x0/0x2e [ 361.460430] [81346562] ? loop_thread+0x0/0x48a [ 361.460432] [8103819b] ? kthread+0x78/0x80 [ 361.460441] [810238df] ? finish_task_switch+0x2b/0x78 [ 361.460454] [81002f6a] ? child_rip+0xa/0x20 [ 361.460460] [81012ac3] ? native_pax_close_kernel+0x0/0x32 [ 361.460463] [81038123] ? kthread+0x0/0x80 [ 361.460469] [81002f60] ? child_rip+0x0/0x20 [ 361.460471] INFO: task kjournald:2098 blocked for more than 120 seconds. [ 361.460473] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 361.460474] kjournald D 88000b92e558 0 2098 2 0x0080 [ 361.460477] 88000b92e2e0 0046 88000aad9840 88000983ffd8 [ 361.460480] 88000983ffd8 88000983ffd8 81808e00 88000b92e2e0 [ 361.460483] 88000983fcf0 8181b768 880001af3c40 0002 [ 361.460486] Call Trace: [ 361.460488] [810a6b16] ? sync_buffer+0x0/0x3c [ 361.460491] [8151383e] ? io_schedule+0x2c/0x43 [ 361.460494] [810a6b4e] ? sync_buffer+0x38/0x3c [ 361.460496] [81513a2a] ? __wait_on_bit+0x41/0x71 [ 361.460499] [810a6b16] ? sync_buffer+0x0/0x3c [ 361.460501] [81513ac4] ? out_of_line_wait_on_bit+0x6a/0x76 [ 361.460504] [810385a7] ? 
wake_bit_function+0x0/0x23 [ 361.460514] [8113edad] ? journal_commit_transaction+0x769/0xbb8 [ 361.460517] [810238df] ? finish_task_switch+0x2b/0x78 [ 361.460519] [815137d9] ? thread_return+0x40/0x79 [ 361.460522] [8114162d] ? kjournald+0xc7/0x1cb [ 361.460525] [81038579] ? autoremove_wake_function+0x0/0x2e [
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 07:33:02PM +0200, Avi Kivity wrote: On 01/21/2010 07:23 PM, Michael S. Tsirkin wrote: I agree. But I think we can solve this in another way in .32: we can clear the eventfd from irqfd-inject work, which is in process context. The new stuff is only needed for lockless clearing, no? No, AFAIK there's no way to clear the counter from kernel without this patch. Can't you read from the file? IMO no, the read could block. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote: Can't you read from the file? IMO no, the read could block. But you're in process context. An eventfd never blocks. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 07:47:40PM +0200, Avi Kivity wrote: On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote: Can't you read from the file? IMO no, the read could block. But you're in process context. An eventfd never blocks. Yes it blocks if counter is 0. And we don't know it's not 0 unless we read :) catch-22. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, 21 Jan 2010, Avi Kivity wrote: On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote: Can't you read from the file? IMO no, the read could block. But you're in process context. An eventfd never blocks. Can you control the eventfd flags? Because if yes, O_NONBLOCK will never block. - Davide -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
john cooper wrote: kvm itself can modify flags exported from qemu to a guest. I would hope for an option to request that qemu doesn't run if the guest won't get the cpuid flags requested on the command line. -- Jamie -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 09:50:34AM -0800, Davide Libenzi wrote: On Thu, 21 Jan 2010, Avi Kivity wrote: On 01/21/2010 07:32 PM, Michael S. Tsirkin wrote: Can't you read from the file? IMO no, the read could block. But you're in process context. An eventfd never blocks. Can you control the eventfd flags? Because if yes, O_NONBLOCK will never block. - Davide Userspace can but kvm can't. 
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
john cooper wrote: I foresee wanting to iterate over the models and pick the latest one which a host supports - on the grounds that you have done the hard work of ensuring it is a reasonably good performer, while probably working on another host of similar capability when a new host is made available. That's a fairly close use case to that of safe migration which was one of the primary motivations to identify the models being discussed. Although presentation and administration of such was considered the domain of management tools. My hypothetical script which iterates over models in that way is a management tool, and would use qemu to help do its job. Do you mean that more powerful management tools to support safe migration will maintain _their own_ processor model tables, and perform their calculations using their own tables instead of querying qemu, and therefore not have any need of qemu's built in table? If so, I favour more strongly Anthony's suggestion that the processor model table lives in a config file (eventually), as that file could be shared between management tools and qemu itself without duplication. -- Jamie -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
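[Editorial note: to make the shared-config-file idea concrete, such a table might look something like the fragment below. The syntax and field names here are purely hypothetical, invented for illustration -- not an existing QEMU format:]

```
# hypothetical cpu-models.conf, shared between qemu and management tools
[cpudef "Nehalem"]
   level    = 11
   vendor   = "GenuineIntel"
   family   = 6
   model    = 26
   stepping = 3
   features = "sse3 ssse3 sse4.1 sse4.2 popcnt"
```

The point of such a file is exactly the one made above: qemu and the management layer would consult the same definitions, so neither has to duplicate (or risk diverging from) the other's model table.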
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:45 PM, Michael S. Tsirkin wrote: But you're in process context. An eventfd never blocks. Yes it blocks if counter is 0. And we don't know it's not 0 unless we read :) catch-22. Ah yes, I forgot. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On 01/21/2010 07:56 PM, Avi Kivity wrote: On 01/21/2010 07:45 PM, Michael S. Tsirkin wrote: But you're in process context. An eventfd never blocks. Yes it blocks if counter is 0. And we don't know it's not 0 unless we read :) catch-22. Ah yes, I forgot. Well, you can poll it and then read it... this introduces a new race (if userspace does a read in parallel) but it's limited to kvm and buggy userspace. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Use compile_prog as rest of configure
On Wed, Jan 20, 2010 at 12:46:28PM +0100, Juan Quintela wrote: This substitution got missed somehow Signed-off-by: Juan Quintela quint...@redhat.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Fix kvm_coalesced_mmio_ring duplicate allocation
On Thu, Jan 21, 2010 at 04:20:04PM +0800, Sheng Yang wrote: The commit 0953ca73 KVM: Simplify coalesced mmio initialization allocate kvm_coalesced_mmio_ring in the kvm_coalesced_mmio_init(), but didn't discard the original allocation... Signed-off-by: Sheng Yang sh...@linux.intel.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] qemu-kvm: Use kvm-kmod headers if available
On Tue, Jan 12, 2010 at 10:21:27PM +0100, Jan Kiszka wrote: Since kvm-kmod-2.6.32.2 we have an alternative source for recent KVM kernel headers. Use it when available and not overruled by --kerneldir. If there is no kvm-kmod and no --kerneldir, we continue to fall back to the qemu-kvm's kernel headers. Applied both, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling
On Thu, Jan 21, 2010 at 12:19:07PM +0100, Christian Borntraeger wrote: v2: apply Avi's suggestions about ARRAY_SIZE. kvm_handle_sie_intercept uses a jump table to get the intercept handler for a SIE intercept. Static code analysis revealed a potential problem: the intercept_funcs jump table was defined to contain (0x48 >> 2) entries, but we only checked for code > 0x48, which would cause an off-by-one array overflow if code == 0x48. Use the compiler and ARRAY_SIZE to automatically set the limits. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Applied and queued for .33, CC: stable, thanks. 
Re: [RFC] [PATCH] Use macros for x86_emulate_ops to avoid future mistakes
On Wed, Jan 20, 2010 at 04:47:21PM +0900, Takuya Yoshikawa wrote: The return values from x86_emulate_ops are defined in kvm_emulate.h as macros X86EMUL_*. But in emulate.c, we are comparing the return values from these ops with 0 to check if they're X86EMUL_CONTINUE or not: X86EMUL_CONTINUE is defined as 0 now. To avoid possible mistakes in the future, this patch substitutes X86EMUL_CONTINUE for 0 that are being compared with the return values from x86_emulate_ops. We think that there are more places we should use these macros, but the meanings of rc values in x86_emulate_insn() were not so clear at a glance. If we use proper macros in this function, we would be able to follow the flow of each emulation more easily and, maybe, more securely. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH] pci passthrough: zap option rom scanning.
On Wed, Jan 20, 2010 at 11:58:48AM +0100, Gerd Hoffmann wrote: Nowdays (qemu 0.12) seabios loads option roms from pci rom bars. So there is no need any more to scan for option roms and have qemu load them. Zap the code. Signed-off-by: Gerd Hoffmann kra...@redhat.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 1/3] eventfd: allow atomic read and waitqueue remove
On Thu, Jan 21, 2010 at 07:57:22PM +0200, Avi Kivity wrote: On 01/21/2010 07:56 PM, Avi Kivity wrote: On 01/21/2010 07:45 PM, Michael S. Tsirkin wrote: But you're in process context. An eventfd never blocks. Yes it blocks if counter is 0. And we don't know it's not 0 unless we read :) catch-22. Ah yes, I forgot. Well, you can poll it and then read it... this introduces a new race (if userspace does a read in parallel) but it's limited to kvm and buggy userspace. I would rather not require that userspace never reads this fd. You are right that it does not now, but adding this as requirement looks like exporting an implementation bug to userspace. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
john cooper wrote: I can appreciate the argument above, however the goal was choosing names with some basis in reality. These were recommended by our contacts within Intel, are used by VmWare to describe their similar cpu models, and arguably have fallen to defacto usage as evidenced by such sources as: http://en.wikipedia.org/wiki/Conroe_(microprocessor) http://en.wikipedia.org/wiki/Penryn_(microprocessor) http://en.wikipedia.org/wiki/Nehalem_(microarchitecture) (Aside: I can confirm they haven't fallen into de facto usage anywhere in my vicinity :-) I wonder if the contact within Intel are living in a bit of a bubble where these names are more familiar than the outside world.) I think we can all agree that there is no point looking for a familiar -cpu naming scheme because there aren't any familiar and meaningful names these days. used by VmWare to describe their similar cpu models If the same names are being used, I see some merit in qemu's list matching VMware's cpu models *exactly* (in capabilities, not id strings), to aid migration from VMware. Is that feasible? Do they match already? I suspect whatever we choose of reasonable length as a model tag for -cpu some further detail is going to be required. That was the motivation to augment the table as above with an instance of a LCD for that associated class. I'm not a typical user: I know quite a lot about x86 architecture; I just haven't kept up to date enough to know the code/model names. Typical users will know less about them. Understood. One thought I had to further clarify what is going on under the hood was to dump the cpuid flags for each model as part of (or in addition to) the above table. But this seems a bit extreme and kvm itself can modify flags exported from qemu to a guest. Here's another idea. It would be nice if qemu could tell the user which of the built-in -cpu choices is the most featureful subset of their own host. With -cpu host implemented, finding that is probably quite easy. 
Users with multiple hosts will get a better feel for what the -cpu names mean that way, probably better than any documentation would give them, because they probably have not much idea what CPU families they have anyway. (cat /proc/cpuinfo doesn't clarify, as I found). And it would give a simple, effective, quick indication of what they must choose if they want a VM image that runs on more than one of their hosts without a management tool. -- Jamie 
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Jamie Lokier wrote: Do you mean that more powerful management tools to support safe migration will maintain _their own_ processor model tables, and perform their calculations using their own tables instead of querying qemu, and therefore not have any need of qemu's built in table? I would expect so. IIRC that is what the libvirt folks have in mind for example. But we're also trying to simplify the use case of the lonesome user at one with the qemu CLI. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
Jamie Lokier wrote: I think we can all agree that there is no point looking for a familiar -cpu naming scheme because there aren't any familiar and meaningful names these days. Even if we dismiss the Intel coined names as internal code names, there is still VMW's use of them in this space which we can either align with or attempt to displace. All considered I don't see any motivation nor gain in doing the latter. Anyway it doesn't appear likely we're going to resolve this to our collective satisfaction with a hard-wired naming scheme. It would be nice if qemu could tell the user which of the built-in -cpu choices is the most featureful subset of their own host. With -cpu host implemented, finding that is probably quite easy. This should be doable although it may not be as simple as traversing a hierarchy of features and picking one with the most host flags present. In any case this should be fairly detachable from settling the immediate issue. -john -- john.coo...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM Virtual CPU time profiling
Hi All, Is there a way in KVM to measure the real physical (CPU) time consumed by each running Virtual CPU? (I want to do time profiling of the virtual machines running on host system) Also, is there an explanation somewhere on how Virtual CPU scheduling is achieved in KVM? Thanks Abhishek -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Add definitions for current cpu models..
On 01/21/2010 10:43 AM, john cooper wrote: Anthony Liguori wrote: On 01/20/2010 07:18 PM, john cooper wrote: I can appreciate the concern of wanting to get this as correct as possible. This is the root of the trouble. At the qemu layer, we try to focus on being correct. Management tools are typically the layer that deals with being correct. A good compromise is making things user tunable which means that a downstream can make correctness decisions without forcing those decisions on upstream. Conceptually I agree with such a malleable approach -- actually I prefer it. I thought however it was too much infrastructure to foist on the problem just to add a few more models into the mix. See list for patches. I didn't do the cpu bits but it should be very obvious how to do that now. Regards, Anthony Liguori The only reservation which comes to mind is that of logistics. This may ruffle the code some and impact others such as Andre who seem to have existing patches relative to the current structure. Anyone have strong objections to this approach before I have a look at an implementation? Thanks, -john
Re: repeatable hang with loop mount and heavy IO in guest
Some months ago I also thought elevator=noop should be a good idea. But it isn't. It works good as long as you only do short IO requests. Try using deadline in host and guest. Robert On 01/21/10 18:26, Antoine Martin wrote: I've tried various guests, including most recent Fedora12 kernels, custom 2.6.32.x All of them hang around the same point (~1GB written) when I do heavy IO write inside the guest. I have waited 30 minutes to see if the guest would recover, but it just sits there, not writing back any data, not doing anything - but certainly not allowing any new IO writes. The host has some load on it, but nothing heavy enough to completely hand a guest for that long. mount -o loop some_image.fs ./somewhere bs=512 dd if=/dev/zero of=/somewhere/zero then after ~1GB: sync Host is running: 2.6.31.4 QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88) Guests are booted with elevator=noop as the filesystems are stored as files, accessed as virtio disks. The hung backtraces always look similar to these: [ 361.460136] INFO: task loop0:2097 blocked for more than 120 seconds. [ 361.460139] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 361.460142] loop0 D 88000b92c848 0 2097 2 0x0080 [ 361.460148] 88000b92c5d0 0046 880008c1f810 880009829fd8 [ 361.460153] 880009829fd8 880009829fd8 88000a21ee80 88000b92c5d0 [ 361.460157] 880009829610 8181b768 880001af33b0 0002 [ 361.460161] Call Trace: [ 361.460216] [8105bf12] ? sync_page+0x0/0x43 [ 361.460253] [8151383e] ? io_schedule+0x2c/0x43 [ 361.460257] [8105bf50] ? sync_page+0x3e/0x43 [ 361.460261] [81513a2a] ? __wait_on_bit+0x41/0x71 [ 361.460264] [8105c092] ? wait_on_page_bit+0x6a/0x70 [ 361.460283] [810385a7] ? wake_bit_function+0x0/0x23 [ 361.460287] [81064975] ? shrink_page_list+0x3e5/0x61e [ 361.460291] [81513992] ? schedule_timeout+0xa3/0xbe [ 361.460305] [81038579] ? autoremove_wake_function+0x0/0x2e [ 361.460308] [8106538f] ? shrink_zone+0x7e1/0xaf6 [ 361.460310] [81061725] ? 
determine_dirtyable_memory+0xd/0x17 [ 361.460314] [810637da] ? isolate_pages_global+0xa3/0x216 [ 361.460316] [81062712] ? mark_page_accessed+0x2a/0x39 [ 361.460335] [810a61db] ? __find_get_block+0x13b/0x15c [ 361.460337] [81065ed4] ? try_to_free_pages+0x1ab/0x2c9 [ 361.460340] [81063737] ? isolate_pages_global+0x0/0x216 [ 361.460343] [81060baf] ? __alloc_pages_nodemask+0x394/0x564 [ 361.460350] [8108250c] ? __slab_alloc+0x137/0x44f [ 361.460371] [812cc4c1] ? radix_tree_preload+0x1f/0x6a [ 361.460374] [81082a08] ? kmem_cache_alloc+0x5d/0x88 [ 361.460376] [812cc4c1] ? radix_tree_preload+0x1f/0x6a [ 361.460379] [8105c0b5] ? add_to_page_cache_locked+0x1d/0xf1 [ 361.460381] [8105c1b0] ? add_to_page_cache_lru+0x27/0x57 [ 361.460384] [8105c25a] ? grab_cache_page_write_begin+0x7a/0xa0 [ 361.460399] [81104620] ? ext3_write_begin+0x7e/0x201 [ 361.460417] [8134648f] ? do_lo_send_aops+0xa1/0x174 [ 361.460420] [81081948] ? virt_to_head_page+0x9/0x2a [ 361.460422] [8134686b] ? loop_thread+0x309/0x48a [ 361.460425] [813463ee] ? do_lo_send_aops+0x0/0x174 [ 361.460427] [81038579] ? autoremove_wake_function+0x0/0x2e [ 361.460430] [81346562] ? loop_thread+0x0/0x48a [ 361.460432] [8103819b] ? kthread+0x78/0x80 [ 361.460441] [810238df] ? finish_task_switch+0x2b/0x78 [ 361.460454] [81002f6a] ? child_rip+0xa/0x20 [ 361.460460] [81012ac3] ? native_pax_close_kernel+0x0/0x32 [ 361.460463] [81038123] ? kthread+0x0/0x80 [ 361.460469] [81002f60] ? child_rip+0x0/0x20 [ 361.460471] INFO: task kjournald:2098 blocked for more than 120 seconds. [ 361.460473] echo 0 /proc/sys/kernel/hung_task_timeout_secs disables this message. [ 361.460474] kjournald D 88000b92e558 0 2098 2 0x0080 [ 361.460477] 88000b92e2e0 0046 88000aad9840 88000983ffd8 [ 361.460480] 88000983ffd8 88000983ffd8 81808e00 88000b92e2e0 [ 361.460483] 88000983fcf0 8181b768 880001af3c40 0002 [ 361.460486] Call Trace: [ 361.460488] [810a6b16] ? sync_buffer+0x0/0x3c [ 361.460491] [8151383e] ? io_schedule+0x2c/0x43 [ 361.460494] [810a6b4e] ? 
sync_buffer+0x38/0x3c [ 361.460496] [81513a2a] ? __wait_on_bit+0x41/0x71 [ 361.460499] [810a6b16] ? sync_buffer+0x0/0x3c [ 361.460501] [81513ac4] ? out_of_line_wait_on_bit+0x6a/0x76 [ 361.460504] [810385a7] ? wake_bit_function+0x0/0x23 [ 361.460514] [8113edad] ?
Re: repeatable hang with loop mount and heavy IO in guest
On Thursday 21 January 2010 21:08:38 RW wrote: Some months ago I also thought elevator=noop should be a good idea. But it isn't. It works good as long as you only do short IO requests. Try using deadline in host and guest. Robert @Robert: I've been using noop on all of my KVMs and didn't have any problems so far, never had any crash too. Do you have any performance data or comparisons between noop and deadline io schedulers? Cheers, Thomas
Re: repeatable hang with loop mount and heavy IO in guest
No sorry, I don't have any performance data with noop. I haven't even had a crash. BUT I've experienced severe I/O degradation with noop. Once I've written a big chunk of data (e.g. a simple rsync -av /usr /opt) with noop it works for a while and after a few seconds I saw heavy writes which made the VM virtually unusable. As far as I remember it was kjournald which caused the writes. I've written a mail to the list some months ago with some benchmarks: http://article.gmane.org/gmane.comp.emulators.kvm.devel/41112/match=benchmark There're some I/O benchmarks in there. You can't get the graphs currently since tauceti.net is offline until Monday. I haven't tested noop in these benchmarks because of the problems mentioned above. But it compares deadline and cfq a little bit on a HP DL 380 G6 server. Robert On 01/21/10 22:08, Thomas Beinicke wrote: On Thursday 21 January 2010 21:08:38 RW wrote: Some months ago I also thought elevator=noop should be a good idea. But it isn't. It works good as long as you only do short IO requests. Try using deadline in host and guest. Robert @Robert: I've been using noop on all of my KVMs and didn't have any problems so far, never had any crash too. Do you have any performance data or comparisons between noop and deadline io schedulers? Cheers, Thomas
Re: [PATCHv2] kvm-s390: fix potential array overrun in intercept handling
On 21.01.2010, at 18:36, Marcelo Tosatti wrote: On Thu, Jan 21, 2010 at 12:19:07PM +0100, Christian Borntraeger wrote: v2: apply Avi's suggestions about ARRAY_SIZE. kvm_handle_sie_intercept uses a jump table to get the intercept handler for a SIE intercept. Static code analysis revealed a potential problem: the intercept_funcs jump table was defined to contain (0x48 >> 2) entries, but we only checked for code > 0x48, which would cause an off-by-one array overflow if code == 0x48. Use the compiler and ARRAY_SIZE to automatically set the limits. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Applied and queued for .33, CC: stable, thanks. Yes. Christian, please get this into 2.6.32-stable. Alex
RE: Some keys don't repeat in 64 bit Windows 7 kvm guest
I am now running qemu-kvm 0.11.1: $ kvm -h | head -1 QEMU PC emulator version 0.11.1 (qemu-kvm-0.11.1), Copyright (c) 2003-2008 Fabrice Bellard My Windows 7 guest detected a lot of new hardware, but I still have the same key repeating problem. I think I will just leave this alone for now since I am going to be away from my office (and this machine) for several weeks. When I return, I plan on doing a clean install of everything. If I still have this issue, I will report back. Thanks to everyone for your help. -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Jimmy Crossley Sent: Saturday, January 16, 2010 21:33 To: 'Jim Paris' Cc: 'Gleb Natapov'; kvm@vger.kernel.org Subject: RE: Some keys don't repeat in 64 bit Widows 7 kvm guest From: j...@jim.sh [mailto:j...@jim.sh] On Behalf Of Jim Paris Sent: Saturday, January 16, 2010 20:40 To: Jimmy Crossley Cc: 'Gleb Natapov'; kvm@vger.kernel.org Subject: Re: Some keys don't repeat in 64 bit Widows 7 kvm guest Jimmy Crossley wrote: Thanks for the quick response, Gleb. You are right - we should not spend our time troubleshooting an issue with something this old. I'll try downloading all the sources and headers I need to build kvm-88. I think I'll need another Debian install, since this is a production machine and I don't want to destabilize it. Go ahead and laugh - I ran Debian stable for years before finally deciding I could risk running testing. Debian testing still has the kvm package at version 72, but the new package name qemu-kvm is at version 0.11.0 which is quite a bit newer. -jim It looks like I need to switch to qemu-kvm. That kvm package that I have Installed (72+dfsg=5+squeeze1) is not in the squeeze repositories any more. It sure is hard to keep up with everything. Thanks, Jim. 
Jimmy Crossley CoNetrix 5214 68th Street Suite 200 Lubbock TX 79424 jcross...@conetrix.com http://www.conetrix.com tel: 806-687-8600 800-356-6568 fax: 806-687-8511 This e-mail message (and attachments) may contain confidential CoNetrix information. If you are not the intended recipient, you cannot use, distribute or copy the message or attachments. In such a case, please notify the sender by return e-mail immediately and erase all copies of the message and attachments. Opinions, conclusions and other information in this message and attachments that do not relate to official business are neither given nor endorsed by CoNetrix.
qemu-kvm-0.12.2 hangs when booting grub, when kvm is disabled
Hi, With this small disk image: http://psy.jim.sh/~jim/tmp/diskimage.gz and the new qemu-kvm-0.12.2: $ kvm --version QEMU PC emulator version 0.12.2 (qemu-kvm-0.12.2), Copyright (c) 2003-2008 Fabrice Bellard I can successfully boot to a grub prompt with: $ kvm -drive file=diskimage,boot=on However, if kvm gets disabled: $ kvm -no-kvm -drive file=diskimage,boot=on then the boot hangs at GRUB Loading, please wait... and consumes 100% CPU. -jim
Re: [PATCH 0/5] Debug register emulation fixes and optimizations (reloaded)
On Wed, Jan 20, 2010 at 06:20:20PM +0100, Jan Kiszka wrote: Major parts of this series were already posted a while ago during the debug register switch optimizations. This version now comes with an additional fix for VMX (patch 1) and a rework of mov dr emulation for SVM. Find this series also at git://git.kiszka.org/linux-kvm.git queues/debugregs Jan Kiszka (5): KVM: VMX: Fix exceptions of mov to dr KVM: VMX: Fix emulation of DR4 and DR5 KVM: VMX: Clean up DR6 emulation KVM: SVM: Clean up and enhance mov dr emulation KVM: SVM: Trap all debug register accesses arch/x86/include/asm/kvm_host.h |5 +- arch/x86/kvm/svm.c | 78 +-- arch/x86/kvm/vmx.c | 67 +++-- arch/x86/kvm/x86.c | 19 + 4 files changed, 84 insertions(+), 85 deletions(-) Applied, thanks.
RE: vcpu hotplug support
Avi Kivity wrote: On 01/21/2010 01:54 PM, Liu, Jinsong wrote: Avi, I just sent 2 patches for KVM vcpu hotplug support. 1 is a seabios patch: Setup vcpu add/remove infrastructure, including madt bios_info and dsdt 2 is a qemu-kvm patch: Debug vcpu add The patches look reasonable (of course I'd like to see Gleb review it), but please send the seabios patch to the seabios mailing list (seab...@seabios.org) so we don't have to diverge. Thanks for the reminder! I have sent it to seabios. Jinsong
RE: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.
Gleb Natapov wrote: On Thu, Jan 21, 2010 at 07:48:23PM +0800, Liu, Jinsong wrote: From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001 From: Liu, Jinsong jinsong@intel.com Date: Fri, 22 Jan 2010 03:18:46 +0800 Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt. 1. setup madt bios_info structure, so that the static dsdt gets run-time madt info like checksum address, lapic address, and max cpu number, with the least hard-coded magic numbers (realmode address of bios_info). 2. setup vcpu add/remove dsdt infrastructure, including processor related acpi objects and control methods. vcpu add/remove will trigger SCI and then control method _L02. By matching the madt, the vcpu number and add/remove action are found, then via the notify control method the OS acpi driver is notified. Signed-off-by: Liu, Jinsong jinsong@intel.com It looks like the AML code is a port of what we had in the BOCHS bios with minor changes. Can you detail what is changed and why for easy review please? And this still doesn't work with Windows I assume. Yes, my work is based on the BOCHS infrastructure, thanks BOCHS :) I just changed some minor points: 1. explicitly define the return value of '_MAT' as 'buffer', otherwise some linux acpi drivers (i.e. linux 2.6.30) would hit a parse error, since they handle it as 'integer' not 'buffer'; 2. keep a correct 'checksum' of the madt on vcpu add/remove, otherwise a 'checksum error' is reported when using acpi tools to get madt info after we add/remove a vcpu; 3. add '_EJ0' so that linux has an acpi obj under /sys/devices/LNXSYSTM:00, which is needed for vcpu remove; 4. in Method(PRSC, 0), just scan the 'xxx' vcpus that qemu gets from the cmdline para 'maxcpus=xxx', not all 256 vcpus, otherwise under some dsdt processor defines it will result in errors; 5. use 1 hard-coded bios_info structure address to replace '0x514', so that it can transfer more madt info to dsdt; Thanks, Jinsong
[PATCH] kvm: Flush coalesced MMIO buffer periodically
The default action of coalesced MMIO is, cache the writing in buffer, until: 1. The buffer is full. 2. Or the exit to QEmu due to other reasons. But this would result in a very late writing in some condition. 1. The each time write to MMIO content is small. 2. The writing interval is big. 3. No need for input or accessing other devices frequently. This issue was observed in a experimental embbed system. The test image simply print test every 1 seconds. The output in QEmu meets expectation, but the output in KVM is delayed for seconds. Per Avi's suggestion, I hooked a flushing for coalesced MMIO buffer in VGA update handler. By this way, We don't need vcpu explicit exit to QEmu to handle this issue. Signed-off-by: Sheng Yang sh...@linux.intel.com --- Like this? qemu-kvm.c | 26 -- qemu-kvm.h |6 ++ vl.c |2 ++ 3 files changed, 32 insertions(+), 2 deletions(-) diff --git a/qemu-kvm.c b/qemu-kvm.c index 599c3d6..a9b5107 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -463,6 +463,12 @@ static void kvm_create_vcpu(CPUState *env, int id) goto err_fd; } +#ifdef KVM_CAP_COALESCED_MMIO +if (kvm_state-coalesced_mmio !kvm_state-coalesced_mmio_ring) +kvm_state-coalesced_mmio_ring = (void *) env-kvm_run + + kvm_state-coalesced_mmio * PAGE_SIZE; +#endif + return; err_fd: close(env-kvm_fd); @@ -927,8 +933,7 @@ int kvm_run(CPUState *env) #if defined(KVM_CAP_COALESCED_MMIO) if (kvm_state-coalesced_mmio) { -struct kvm_coalesced_mmio_ring *ring = -(void *) run + kvm_state-coalesced_mmio * PAGE_SIZE; +struct kvm_coalesced_mmio_ring *ring = kvm_state-coalesced_mmio_ring; while (ring-first != ring-last) { cpu_physical_memory_rw(ring-coalesced_mmio[ring-first].phys_addr, ring-coalesced_mmio[ring-first].data[0], @@ -2073,6 +2078,23 @@ static void io_thread_wakeup(void *opaque) } } +#ifdef KVM_CAP_COALESCED_MMIO +void kvm_flush_coalesced_mmio_buffer(void) +{ +if (kvm_state-coalesced_mmio_ring) { +struct kvm_coalesced_mmio_ring *ring = +kvm_state-coalesced_mmio_ring; +while (ring-first != 
ring-last) { +cpu_physical_memory_rw(ring-coalesced_mmio[ring-first].phys_addr, + ring-coalesced_mmio[ring-first].data[0], + ring-coalesced_mmio[ring-first].len, 1); +smp_wmb(); +ring-first = (ring-first + 1) % KVM_COALESCED_MMIO_MAX; +} +} +} +#endif + int kvm_main_loop(void) { int fds[2]; diff --git a/qemu-kvm.h b/qemu-kvm.h index 6b3e5a1..8188ff6 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -1125,6 +1125,11 @@ static inline int kvm_set_migration_log(int enable) return kvm_physical_memory_set_dirty_tracking(enable); } +#ifdef KVM_CAP_COALESCED_MMIO +void kvm_flush_coalesced_mmio_buffer(void); +#else +void kvm_flush_coalesced_mmio_buffer(void) {} +#endif int kvm_irqchip_in_kernel(void); #ifdef CONFIG_KVM @@ -1144,6 +1149,7 @@ typedef struct KVMState { int fd; int vmfd; int coalesced_mmio; +struct kvm_coalesced_mmio_ring *coalesced_mmio_ring; int broken_set_mem_region; int migration_log; int vcpu_events; diff --git a/vl.c b/vl.c index 9edea10..64902f2 100644 --- a/vl.c +++ b/vl.c @@ -3235,6 +3235,7 @@ static void gui_update(void *opaque) interval = dcl-gui_timer_interval; dcl = dcl-next; } +kvm_flush_coalesced_mmio_buffer(); qemu_mod_timer(ds-gui_timer, interval + qemu_get_clock(rt_clock)); } @@ -3242,6 +3243,7 @@ static void nographic_update(void *opaque) { uint64_t interval = GUI_REFRESH_INTERVAL; +kvm_flush_coalesced_mmio_buffer(); qemu_mod_timer(nographic_timer, interval + qemu_get_clock(rt_clock)); } -- 1.5.4.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM Virtual CPU time profiling
On Friday 22 January 2010 02:41:35 Saksena, Abhishek wrote: Hi All, Is there a way in KVM to measure the real physical (CPU) time consumed by each running Virtual CPU? (I want to do time profiling of the virtual machines running on host system) Also, is there an explanation somewhere on how Virtual CPU scheduling is achieved in KVM? Thanks Each VM is a QEmu process, and each vcpu is a thread of it (but not all the threads are vcpus). Currently the KVM related scheduling algorithm is the same as for other host threads/processes. You can get the thread_id of each vcpu in the QEmu monitor, by: (qemu) info cpus Then, you can do anything you want with it, e.g. using top to get each thread/vcpu's CPU time. :) -- regards Yang, Sheng
Re: Unable to single-step in kvm, always results in a resume
So now I can step instructions but my breakpoints do not work. I have verified that disabling kvm restores the breakpoint functionality. Any suggestions? Thanks, Nicholas Jan Kiszka wrote: Hi Nicholas, please don't drop CCs on reply. Nicholas Amon wrote: Hi Jan, Thanks for responding. Yes, I am able to step instructions when I disable kvm w/ the no-kvm option. My host kernel is 64bit 2.6.27 and the program that I am debugging is 32 bit but starts in real mode. But the KVM module I am running is from kvm-88. Is there any way I can check the version definitively? kvm modules issue a message when being loaded, check your kernel log. qemu-kvm gives you the version via -version. OK, the problems you see are likely related to the very old versions you use. Update to recent kvm-kmod (2.6.32 series) and qemu-kvm (0.12 series) and retry. Jan Thanks, Nicholas Jan Kiszka wrote: Jan Kiszka wrote: Nicholas Amon wrote: Hi All, I am trying to single-step through my kernel using qemu and kvm. I have run qemu via: qemu-system-x86_64 -s -S -hda /home/nickamon/lab1/obj/kernel.img and also connected to the process using gdb. Problem is that whenever I try and step instruction, it seems to resume my kernel rather than allowing me to progress instruction by instruction. I have built the kvm snapshot from git and still no luck. Tried following the code for a few hours and have no luck. Any suggestions? What's your host kernel or kvm-kmod version? ...and does -no-kvm make any difference (except that it's much slower)? Jan -- Nicholas Amon Senior Software Engineer Xceedium Inc. Office: 201-536-1000 x127 Cell: 732-236-7698 na...@xceedium.com
PCI Passthrough Problem
I'm trying once again to get PCI passthrough working (KVM 84 on Ubuntu 9.10), and I'm getting this error : LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin /usr/bin/kvm -S -M pc-0.11 -m 4096 -smp 4 -name mailserver -uuid 76a83471-e94a-3658-fa61-8eceaa74ffc2 -monitor unix:/var/run/libvirt/qemu/mailserver.monitor,server,nowait -localtime -boot c -drive file=,if=ide,media=cdrom,index=2 -drive file=/var/lib/libvirt/images/mailserver.img,if=virtio,index=0,boot=on -drive file=/var/lib/libvirt/images/mailserver-2.img,if=virtio,index=1 -net nic,macaddr=54:52:00:1b:b2:56,vlan=0,model=virtio,name=virtio.0 -net tap,fd=17,vlan=0,name=tap.0 -serial pty -parallel none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-us -vga cirrus -pcidevice host=0a:01.0 char device redirected to /dev/pts/0 get_real_device: /sys/bus/pci/devices/:0a:01.0/config: Permission denied init_assigned_device: Error: Couldn't get real device (0a:01.0)! Failed to initialize assigned device host=0a:01.0 Any thoughts? -- Aaron Clausen mightymartia...@gmail.com
Re: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt.
On Fri, Jan 22, 2010 at 10:15:44AM +0800, Liu, Jinsong wrote: Gleb Natapov wrote: On Thu, Jan 21, 2010 at 07:48:23PM +0800, Liu, Jinsong wrote: From cb997030cba02e7e74a29b3d942aeba9808ed293 Mon Sep 17 00:00:00 2001 From: Liu, Jinsong jinsong@intel.com Date: Fri, 22 Jan 2010 03:18:46 +0800 Subject: [PATCH] Setup vcpu add/remove infrastructure, including madt bios_info and dsdt. 1. setup madt bios_info structure, so that static dsdt get run-time madt info like checksum address, lapic address, max cpu numbers, with least hardcode magic number (realmode address of bios_info). 2. setup vcpu add/remove dsdt infrastructure, including processor related acpi objects and control methods. vcpu add/remove will trigger SCI and then control method _L02. By matching madt, vcpu number and add/remove action were found, then by notify control method, it will notify OS acpi driver. Signed-off-by: Liu, Jinsong jinsong@intel.com It looks like AML code is a port of what we had in BOCHS bios with minor changes. Can you detail what is changed and why for easy review please? And this still doesn't work with Windows I assume. Yes, my work is based on BOCHS infrastructure, thanks BOCHS :) I just change some minor points: 1. explicitly define returen value of '_MAT' as 'buffer', otherwise some linux acpi driver (i.e. linux 2.6.30) would parse error which will handle it as 'integer' not 'buffer'; 2. keep correct 'checksum' of madt when vcpu add/remove, otherwise it will report 'checksum error' when using acpi tools to get madt info if we add/remove vcpu; 3. add '_EJ0' so that linux has acpi obj under /sys/devices/LNXSYSTM:00, which is need for vcpu remove; 4. on Method(PRSC, 0), just scan 'xxx' vcpus that qemu get from cmdline para 'maxcpus=xxx', not all 256 vcpus, otherwise under some dsdt processor define, it will result error; What kind of errors? Qemu should never set bit over maxcpus in PRS. 5. 
use 1 hard-coded bios_info structure address to replace '0x514', so that it can transfer more madt info to dsdt; Thanks, Jinsong -- Gleb.
Re: PCI Passthrough Problem
On Thu, Jan 21, 2010 at 09:24:36PM -0800, Aaron Clausen wrote: I'm trying once again to get PCI passthrough working (KVM 84 on Ubuntu 9.10), and I'm getting this error : LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin /usr/bin/kvm -S -M pc-0.11 -m 4096 -smp 4 -name mailserver -uuid 76a83471-e94a-3658-fa61-8eceaa74ffc2 -monitor unix:/var/run/libvirt/qemu/mailserver.monitor,server,nowait -localtime -boot c -drive file=,if=ide,media=cdrom,index=2 -drive file=/var/lib/libvirt/images/mailserver.img,if=virtio,index=0,boot=on -drive file=/var/lib/libvirt/images/mailserver-2.img,if=virtio,index=1 -net nic,macaddr=54:52:00:1b:b2:56,vlan=0,model=virtio,name=virtio.0 -net tap,fd=17,vlan=0,name=tap.0 -serial pty -parallel none -usb -usbdevice tablet -vnc 127.0.0.1:0 -k en-us -vga cirrus -pcidevice host=0a:01.0 char device redirected to /dev/pts/0 get_real_device: /sys/bus/pci/devices/:0a:01.0/config: Permission denied init_assigned_device: Error: Couldn't get real device (0a:01.0)! Failed to initialize assigned device host=0a:01.0 It seems libvirt had a problem initializing the PCI device; you could manually unbind this device from the host kernel driver and try the above command again. To unbind the device please refer to: http://www.linux-kvm.org/page/How_to_assign_devices_with_VT-d_in_KVM Any thoughts? -- Aaron Clausen mightymartia...@gmail.com
[PATCH 1/2] KVM: x86: Fix probable memory leak of vcpu->arch.mce_banks
vcpu->arch.mce_banks is allocated in kvm_arch_vcpu_init(), but never freed anywhere, which may cause a memory leak. So this patch fixes it by freeing it in kvm_arch_vcpu_uninit(). Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com --- arch/x86/kvm/x86.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f25b52e..1ddcad4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5089,6 +5089,7 @@ fail: void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) { + kfree(vcpu->arch.mce_banks); kvm_free_lapic(vcpu); down_read(&vcpu->kvm->slots_lock); kvm_mmu_destroy(vcpu); -- 1.6.2.2
[PATCH 2/2] KVM: x86: Fix leak of lapic data in kvm_arch_vcpu_init()
In kvm_arch_vcpu_init(), if the memory allocation for vcpu->arch.mce_banks fails, the error path does not free the lapic data. This patch fixes it. Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com --- arch/x86/kvm/x86.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6651dbf..f25b52e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5072,12 +5072,13 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) GFP_KERNEL); if (!vcpu->arch.mce_banks) { r = -ENOMEM; - goto fail_mmu_destroy; + goto fail_free_lapic; } vcpu->arch.mcg_cap = KVM_MAX_MCE_BANKS; return 0; - +fail_free_lapic: + kvm_free_lapic(vcpu); fail_mmu_destroy: kvm_mmu_destroy(vcpu); fail_free_pio_data: -- 1.6.2.2
[PATCH 1/2 v2] KVM: x86: Fix probable memory leak of vcpu->arch.mce_banks
vcpu->arch.mce_banks is allocated in kvm_arch_vcpu_init(), but never freed anywhere, which may cause a memory leak. So this patch fixes it by freeing it in kvm_arch_vcpu_uninit(). Signed-off-by: Wei Yongjun yj...@cn.fujitsu.com --- arch/x86/kvm/x86.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 56a90a6..c27ebb1 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5470,6 +5470,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) { int idx; + kfree(vcpu->arch.mce_banks); kvm_free_lapic(vcpu); idx = srcu_read_lock(&vcpu->kvm->srcu); kvm_mmu_destroy(vcpu);
Re: Some keys don't repeat in 64 bit Windows 7 kvm guest
On Thu, Jan 21, 2010 at 05:35:08PM -0600, Jimmy Crossley wrote: I am now running qemu-kvm 0.11.1: $ kvm -h | head -1 QEMU PC emulator version 0.11.1 (qemu-kvm-0.11.1), Copyright (c) 2003-2008 Fabrice Bellard My Windows 7 guest detected a lot of new hardware, but I still have the same key repeating problem. I think I will just leave this alone for now since I am going to be away from my office (and this machine) for several weeks. When I return, I plan on doing a clean install of everything. If I still have this issue, I will report back. qemu-kvm-0.11.1 is still pretty old. The latest version is qemu-kvm-0.12 and you need to update your kernel modules too. A similar-sounding problem was fixed by kernel changes a while ago. Thanks to everyone for your help. -Original Message- From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Jimmy Crossley Sent: Saturday, January 16, 2010 21:33 To: 'Jim Paris' Cc: 'Gleb Natapov'; kvm@vger.kernel.org Subject: RE: Some keys don't repeat in 64 bit Windows 7 kvm guest From: j...@jim.sh [mailto:j...@jim.sh] On Behalf Of Jim Paris Sent: Saturday, January 16, 2010 20:40 To: Jimmy Crossley Cc: 'Gleb Natapov'; kvm@vger.kernel.org Subject: Re: Some keys don't repeat in 64 bit Windows 7 kvm guest Jimmy Crossley wrote: Thanks for the quick response, Gleb. You are right - we should not spend our time troubleshooting an issue with something this old. I'll try downloading all the sources and headers I need to build kvm-88. I think I'll need another Debian install, since this is a production machine and I don't want to destabilize it. Go ahead and laugh - I ran Debian stable for years before finally deciding I could risk running testing. Debian testing still has the kvm package at version 72, but the new package name qemu-kvm is at version 0.11.0 which is quite a bit newer. -jim It looks like I need to switch to qemu-kvm. 
The kvm package that I have installed (72+dfsg-5+squeeze1) is not in the squeeze repositories any more. It sure is hard to keep up with everything. Thanks, Jim. Jimmy Crossley CoNetrix 5214 68th Street Suite 200 Lubbock TX 79424 jcross...@conetrix.com http://www.conetrix.com tel: 806-687-8600 800-356-6568 fax: 806-687-8511 -- Gleb.
Re: [PATCH v3 04/12] Add handle page fault PV helper.
On Thu, Jan 21, 2010 at 07:47:22AM -0800, H. Peter Anvin wrote: On 01/21/2010 01:02 AM, Avi Kivity wrote: You can also just emulate the state transition -- since you know you're dealing with a flat protected-mode or long-mode OS (and just make that a condition of enabling the feature), you don't have to deal with all the strange combinations of directions that an unrestricted x86 event can take. Since it's an exception, it is unconditional. Do you mean create the stack frame manually? I'd really like to avoid that for many reasons, one of which is performance (we'd need to do all the virt-to-phys walks manually); the other is that we're certain to end up with something horribly underspecified. I'd really like to keep as close as possible to the hardware. For the alternative approach, see Xen. I obviously didn't mean to do something which didn't look like a hardware-delivered exception. That by itself provides a tight spec. The performance issue is real, of course. Obviously, the design of VT-x was before my time at Intel, so I'm not familiar with why the tradeoffs were made the way they were. Is it so out of the question to reserve an exception below 32 for PV use? -- Gleb.
Re: repeatable hang with loop mount and heavy IO in guest
Antoine Martin wrote: I've tried various guests, including most recent Fedora 12 kernels and custom 2.6.32.x builds. All of them hang around the same point (~1GB written) when I do heavy IO write inside the guest. [...] Host is running: 2.6.31.4 QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88) Please update to the latest version and repeat. kvm-88 is ancient, and _lots_ of stuff has been fixed and changed since that time; I doubt anyone here will try to dig into kvm-88 problems. Current kvm is qemu-kvm-0.12.2, released yesterday. /mjt
Re: [PATCH 3/3] kvmppc/e500: fix tlbcfg emulation
On 21.01.2010, at 04:22, Liu Yu-B13201 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, January 20, 2010 6:47 PM To: Liu Yu-B13201 Cc: kvm-ppc@vger.kernel.org; a...@redhat.com; hol...@penguinppc.org Subject: Re: [PATCH 3/3] kvmppc/e500: fix tlbcfg emulation Importance: High On 20.01.2010, at 09:03, Liu Yu wrote: Signed-off-by: Liu Yu yu@freescale.com --- arch/powerpc/kvm/e500_emulate.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c index 95f8ec8..97337dd 100644 --- a/arch/powerpc/kvm/e500_emulate.c +++ b/arch/powerpc/kvm/e500_emulate.c @@ -165,7 +165,7 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt) case SPRN_TLB0CFG: { - ulong tmp = SPRN_TLB0CFG; + ulong tmp = mfspr(SPRN_TLB0CFG); Does this SPR value change? I hope not :-). If not, better read it once on init and then use it from there. Out of curiosity, is reading it once meant to get better performance? If so, I would think a read from a register is faster than a read from memory. Well, performance and clean structure. Nothing should keep us from having different parameters in the guest than we have in the host. Also, as soon as nesting comes into play, reads from memory are definitely faster. But if you think it's not worth the effort, keep it as it is. Alex
Re: [PATCH 2/3] kvmppc/e500: Add PVR/PIR init for E500
On 21.01.2010, at 04:30, Liu Yu-B13201 wrote: -Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Wednesday, January 20, 2010 6:45 PM To: Liu Yu-B13201 Cc: kvm-ppc@vger.kernel.org; a...@redhat.com; hol...@penguinppc.org Subject: Re: [PATCH 2/3] kvmppc/e500: Add PVR/PIR init for E500 Importance: High On 20.01.2010, at 09:03, Liu Yu wrote: Signed-off-by: Liu Yu yu@freescale.com --- arch/powerpc/kvm/e500.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c index 64949ee..fd3683d 100644 --- a/arch/powerpc/kvm/e500.c +++ b/arch/powerpc/kvm/e500.c @@ -60,6 +60,10 @@ int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu) kvmppc_e500_tlb_setup(vcpu_e500); + /* Registers init */ + vcpu->arch.pvr = mfspr(SPRN_PVR); + vcpu->vcpu_id = mfspr(SPRN_PIR); Is this correct? IIUC this should be the number of the vcpu. So if you virtualize a 2-core system, but both vcpu init functions run on core 1, this will break, right? Since kvm booke doesn't support virtualizing more than one core, can we put a comment here for now? Sure. I'll need to do something clever about it on Book3S as well anyways. Also, do you really need to set vcpu_id? If you just don't touch it it'll be 0. Shouldn't that be enough if you're only running a single guest core? Alex
Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly
On 21.01.2010, at 09:09, Liu Yu-B13201 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard Sent: Saturday, January 09, 2010 3:30 AM To: Alexander Graf Cc: k...@vger.kernel.org; kvm-ppc; Benjamin Herrenschmidt; Liu Yu Subject: Re: [PATCH 7/9] KVM: PPC: Emulate trap SRR1 flags properly diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 338baf9..e283e44 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -82,8 +82,9 @@ static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu, set_bit(priority, &vcpu->arch.pending_exceptions); } -void kvmppc_core_queue_program(struct kvm_vcpu *vcpu) +void kvmppc_core_queue_program(struct kvm_vcpu *vcpu, ulong flags) { + /* BookE does flags in ESR, so ignore those we get here */ kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_PROGRAM); } Actually, I think Book E prematurely sets ESR, since it's done before the program interrupt is actually delivered. Architecturally, I'm not sure if it's a problem, but philosophically I've always wanted it to work the way you've just implemented for Book S. ESR is updated not only by program but also by data_tlb, data_storage, etc. Should we rearrange them all? Also, DEAR has the same situation as ESR. Should it be updated when we decide to inject the interrupt into the guest? If that's what the hardware does, then yes. I'm good with taking small steps though. So if you don't have the time to convert all of the handlers, you can easily start off with program interrupts. Alex