Re: [PATCH 6/6] vhost_net: remove the max pending check
On 08/25/2013 07:53 PM, Michael S. Tsirkin wrote: On Fri, Aug 23, 2013 at 04:55:49PM +0800, Jason Wang wrote: On 08/20/2013 10:48 AM, Jason Wang wrote: On 08/16/2013 06:02 PM, Michael S. Tsirkin wrote: On Fri, Aug 16, 2013 at 01:16:30PM +0800, Jason Wang wrote: We used to limit the max pending DMAs to prevent the guest from pinning too many pages. But this could be removed since: - We have the sk_wmem_alloc check in both tun/macvtap to do the same work - This max pending check was almost useless since it was only done when there were no new buffers coming from the guest. The guest can easily exceed the limitation. - We already check upend_idx != done_idx and switch to non-zerocopy then. So even if all vq->heads were used, we can still do the packet transmission. We can but performance will suffer. The check was in fact only done when no new buffers were submitted from the guest. So if the guest keeps sending, the check won't be done. If we really want to do this, we should do it unconditionally. Anyway, I will test to see the result. There's a bug in PATCH 5/6, the check: nvq->upend_idx != nvq->done_idx makes zerocopy always disabled since we initialize both upend_idx and done_idx to zero. So I changed it to: (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx. But what I would really like to try is to limit ubuf_info to VHOST_MAX_PEND. I think this has a chance to improve performance since we'll be using less cache. Maybe, but it in fact decreases the vq size to VHOST_MAX_PEND. Of course this means we must fix the code to really never submit more than VHOST_MAX_PEND requests. Want to try? Ok, sure. With this change on top, I didn't see a performance difference w/ and w/o this patch. Did you try small message sizes btw (like 1K)? Or just netperf default of 64K? I just tested multiple sessions of TCP_RR. Will test TCP_STREAM also.
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
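The index check discussed in the message above can be illustrated with a small standalone ring model. This is only a sketch, not the vhost code: `RING_SIZE` stands in for UIO_MAXIOV (its value here is an assumption), and the struct and function names are hypothetical.

```c
#include <stdbool.h>

#define RING_SIZE 1024 /* stand-in for UIO_MAXIOV; value assumed for illustration */

/* Hypothetical model of the two indices discussed in the thread. */
struct ring {
    unsigned int upend_idx; /* next slot to fill with a pending DMA */
    unsigned int done_idx;  /* oldest slot not yet completed */
};

/* Check as quoted from PATCH 5/6: false whenever the indices are equal,
 * which includes the freshly initialized state (both zero). */
static bool can_zerocopy_buggy(const struct ring *r)
{
    return r->upend_idx != r->done_idx;
}

/* Revised check: false only when advancing upend_idx would run into
 * done_idx, i.e. when the ring is full. */
static bool can_zerocopy_fixed(const struct ring *r)
{
    return (r->upend_idx + 1) % RING_SIZE != r->done_idx;
}
```

With both indices at zero, the first check reports that zerocopy cannot be used while the second allows it, which matches the bug described in the message.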
Re: [PATCH ] Documentation/kvm: Update cpuid documentation for steal time and pv eoi
On Fri, Aug 23, 2013 at 05:34:47PM +0530, Raghavendra K T wrote: Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- While adding documentation for pvspinlock, I found that these two should be updated. I have based this on top of pvspinlock kvm host patchset (V12) I would change the description to merely say what the CPUID bits mean, and what they mean is exactly that an MSR is valid. Use KVM_FEATURE_ASYNC_PF as a template. Documentation/virtual/kvm/cpuid.txt | 9 + 1 file changed, 9 insertions(+) diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt index 22ff659..15a5ac20 100644 --- a/Documentation/virtual/kvm/cpuid.txt +++ b/Documentation/virtual/kvm/cpuid.txt @@ -43,6 +43,15 @@ KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs KVM_FEATURE_ASYNC_PF || 4 || async pf can be enabled by || || writing to msr 0x4b564d02 -- +KVM_FEATURE_STEAL_TIME || 5 || guest accounts fine granularity + || || task steal time. I'm not sure what this phrase means. Steal time is a host feature, not a guest feature: IIUC if this bit is set, the hypervisor can pass the guest information about how much time was spent running other processes outside the VM. enabled when + || || schedstat or task delay accounting + || || is supported by the host. I think it's enabled by guest, not by host. +-- +KVM_FEATURE_PV_EOI || 6 || overrides the generic EOI + || || implementation with an optimized + || || version. More exactly with a paravirtualized version. +-- KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit || || before enabling paravirtualized || || spinlock support. -- 1.7.11.7
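As a rough illustration of how a guest might consume the bits in this table, the sketch below tests individual flags in a feature bitmap. The bit numbers are taken from the quoted hunk; the helper name is hypothetical, and the bitmap is assumed to be the value the guest read from KVM's CPUID features leaf (the sketch does not execute CPUID itself).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the cpuid.txt hunk quoted above. */
#define KVM_FEATURE_ASYNC_PF   4
#define KVM_FEATURE_STEAL_TIME 5
#define KVM_FEATURE_PV_EOI     6
#define KVM_FEATURE_PV_UNHALT  7

/* Hypothetical helper: 'features' is assumed to hold the OR of
 * (1 << flag) values reported by the hypervisor. */
static bool kvm_feature_set(uint32_t features, unsigned int bit)
{
    return ((features >> bit) & 1u) != 0;
}
```

Following the review comment, a set bit would only tell the guest that the corresponding interface is valid to use; whether the guest then enables it is a separate step.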
Re: [PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer
On 2013-08-25 17:26, Arthur Chunqi Li wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 happens for a reason not emulated by L1, the preemption timer value should be saved on such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 49 - 1 file changed, 44 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..6aa320e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER)) + nested_vmx_exit_ctls_high &= + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) + nested_vmx_pinbased_ctls_high &= + (~PIN_BASED_VMX_PREEMPTION_TIMER); nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_fix_preempt(struct kvm_vcpu *vcpu) nested_adjust_preemption_timer - just preempt can be misleading. +{ + u64 delta_guest_tsc; + u32 preempt_val, preempt_bit, delta_preempt_val; + + preempt_bit = native_read_msr(MSR_IA32_VMX_MISC) & 0x1F; This is rather preemption_timer_scale. And if there is no symbolic value for the bitmask, please introduce one.
+ delta_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, + native_read_tsc()) - vcpu->arch.last_guest_tsc; + delta_preempt_val = delta_guest_tsc >> preempt_bit; + preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + if (preempt_val - delta_preempt_val < 0) + preempt_val = 0; + else + preempt_val -= delta_preempt_val; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val); The rest is unfortunately wrong. It has to be split into two parts: Part one, the calculation of L1's TSC value and its storing in nested_vmx, has to be done on vmexit. Part two, reading the current TSC, calculating the time spent in L0 and converting it into L1 TSC time, has to be done right before vmentry of L2. Arthur, please make sure that your test case detects the current breakage of preemption timer emulation properly, both wrt missing save/restore and also regarding missing L0 time compensation, and then check that your KVM patch fixes it based on the unit test results. Jan +} /* * The guest has exited. See if we can fix it or if we need userspace * assistance. @@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) else vmx->nested.nested_run_pending = 0; - if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) { - nested_vmx_vmexit(vcpu); - return 1; + if (is_guest_mode(vcpu)) { + if (nested_vmx_exit_handled(vcpu)) { + nested_vmx_vmexit(vcpu); + return 1; + } else + nested_fix_preempt(vcpu); } if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) { @@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 exec_control; + u32 exit_control; vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector); vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector); @@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER * bits are further modified by vmx_set_efer() below.
*/ - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); + exit_control = vmcs_config.vmexit_ctrl; + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) + exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + vmcs_write32(VM_EXIT_CONTROLS, exit_control); /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are * emulated by vmx_set_efer(), below. @@ -8089,6 +8119,15 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) vmcs12->guest_pending_dbg_exceptions =
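The timer adjustment under review can be sketched as a pure function, independent of the KVM code. This is an illustration under stated assumptions, not the patch: the function and macro names are hypothetical, the TSC values are passed in by the caller, and the scale is taken from bits 4:0 of the IA32_VMX_MISC value (the timer counts down once each time that TSC bit changes). It also compares before subtracting, since a check of the form `preempt_val - delta_preempt_val < 0` on unsigned 32-bit operands would never be true.

```c
#include <stdint.h>

/* Hypothetical symbolic name for the bitmask the review asks for:
 * bits 4:0 of IA32_VMX_MISC give the TSC-to-timer shift. */
#define PREEMPTION_TIMER_SCALE_MASK 0x1Fu

/* Returns the timer value left after the TSC advanced from tsc_at_exit
 * to tsc_now, saturating at zero. All inputs are supplied by the caller;
 * nothing is read from hardware here. */
static uint32_t adjust_preemption_timer(uint32_t timer_val,
                                        uint64_t tsc_at_exit,
                                        uint64_t tsc_now,
                                        uint64_t vmx_misc)
{
    unsigned int scale = (unsigned int)(vmx_misc & PREEMPTION_TIMER_SCALE_MASK);
    uint64_t elapsed_ticks = (tsc_now - tsc_at_exit) >> scale;

    /* Compare first so the unsigned subtraction cannot wrap around. */
    if (elapsed_ticks >= timer_val)
        return 0;
    return timer_val - (uint32_t)elapsed_ticks;
}
```

In terms of the two parts described in the review, `tsc_at_exit` would correspond to the value stored on vmexit and `tsc_now` to the value read right before the vmentry of L2.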
Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
On 25/08/2013 17:04, Alexander Graf wrote: On 24.08.2013, at 21:14, Yann Droneaud wrote: KVM uses anon_inode_getfd() to allocate file descriptors as part of some of its ioctls. But those ioctls lack a flag argument that would allow userspace to choose options for the newly opened file descriptor. In such cases it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, whether the file descriptor is going to be inherited across exec(). This patch sets the O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() so that file descriptors are not leaked across exec(). Signed-off-by: Yann Droneaud ydrone...@opteya.com Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com Reviewed-by: Alexander Graf ag...@suse.de Would it make sense to simply inherit the O_CLOEXEC flag from the parent kvm fd instead? That would give user space the power to keep fds across exec() if it wants to. Does it make sense to use non-O_CLOEXEC file descriptors with KVM at all? Besides fork() not being supported by KVM, as described in Documentation/virtual/kvm/api.txt, the VMAs of the parent process go away as soon as you exec(). I'm not sure how you can use the inherited file descriptor in a sensible way after exec(). Paolo
Re: [Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended
On 23/08/2013 13:33, Andrew Jones wrote: Does smp_cpus map to the current number of cpus, or to the number of possible cpus? If it maps to the number of possible cpus, then this is the right place. If the former, then I guess it'll take more thought. I've added Igor (still on vacation) to this reply, but regardless I vote we worry about hot-plug limit checking in a different patch. smp_cpus is the initial number, max_cpus is the number of possible cpus. Paolo
Re: [Qemu-devel] [PATCH] kvm: warn if num cpus is greater than num recommended
- Original Message - On 23/08/2013 13:33, Andrew Jones wrote: Does smp_cpus map to the current number of cpus, or to the number of possible cpus? If it maps to the number of possible cpus, then this is the right place. If the former, then I guess it'll take more thought. I've added Igor (still on vacation) to this reply, but regardless I vote we worry about hot-plug limit checking in a different patch. smp_cpus is the initial number, max_cpus is the number of possible cpus. Yeah, I noticed that this issue is at least partially addressed already with my v2, which incorporates Marcelo's check against the number of hotpluggable cpus (max_cpus). drew
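One plausible shape for such a check is sketched below. It is not the patch under discussion: the function, the result codes, and the choice to compare both smp_cpus and max_cpus against a recommended (soft) and a hard limit are assumptions made for illustration, and the limits are passed in rather than queried from KVM.

```c
#include <stdio.h>

/* Hypothetical result codes for the sketch. */
enum vcpu_check { VCPU_OK = 0, VCPU_WARN = 1, VCPU_ERROR = 2 };

/* smp_cpus: initial number of cpus; max_cpus: number of possible
 * (hotpluggable) cpus. The two limits are supplied by the caller. */
static enum vcpu_check check_vcpu_counts(int smp_cpus, int max_cpus,
                                         int recommended, int hard_limit)
{
    if (smp_cpus > hard_limit || max_cpus > hard_limit) {
        fprintf(stderr, "Number of cpus exceeds the supported maximum (%d)\n",
                hard_limit);
        return VCPU_ERROR;
    }
    if (smp_cpus > recommended || max_cpus > recommended) {
        fprintf(stderr, "Warning: number of cpus exceeds the recommended limit (%d)\n",
                recommended);
        return VCPU_WARN;
    }
    return VCPU_OK;
}
```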
Re: Partial huge page backing with KVM/qemu
On Mon, Aug 26, 2013 at 02:09:57AM +, Chris Leduc wrote: -Original Message- From: Gleb Natapov [mailto:g...@redhat.com] Sent: Sunday, August 25, 2013 1:52 AM To: Chris Leduc Cc: kvm@vger.kernel.org Subject: Re: Partial huge page backing with KVM/qemu On Sat, Aug 24, 2013 at 12:32:07AM +, Chris Leduc wrote: Hi - In a KVM/qemu environment is it possible for the host to back only a portion of the guests memory with huge pages? In some situations it may not be desirable to back the entirety of a guest's memory with huge pages (as can be done via libvirt memoryBacking option). What are those situations? For example to limit a guest with 64GB of total memory to use 4GB of huge pages for fast lookup memory. This takes advantage of the 4 TLB entries for 1G pages on a Sandy/Ivy Bridge processor to ensure a page walk is never necessary for this fast memory. An example is a high performance data plane application. The remainder of the less frequently accessed memory can be in normal pages. When two level paging (EPT) is in use combined mappings are stored in TLB, not linear mappings (see 28.3.1). I am not sure those will ever use 1G TLB. Not with KVM anyway since KVM does not use 1G pages for EPT tables since the chance to get as much of contiguous memory on a running system is close to zero. What would be very useful is to request huge pages in the guest, either at boot time or dynamically, and have the host back them with physical huge pages, but not back the rest of the normal page guest memory with huge pages from the host. The equivalent in Xen is setting allowsuperpage=1 on the hypervisor boot line. As far as I can tell this disables/enables use of huge pages by XEN vm, not something you say you want. The Xen documentation is not clear on this, but in practice this flag allows the host to back up guest huge page requests with physical huge pages. 
So the guest could for example add hugepages=N to its boot line and these pages would be backed in the host with corresponding physical huge pages. Allow me to be sceptical on this :) With shadow paging sure, same is true for KVM: if the guest maps memory with a huge page and the memory is contiguous on the host too, KVM will create a huge shadow page, but with two level paging the hypervisor has no idea how the guest's page tables look. The best it can do is to map the entire guest physical memory using huge pages. From experimentation with KVM, requesting hugepages at guest boot time (without memory backing enabled) will result in guest hugepages backed by host normal pages. What do you mean by requesting hugepages at guest boot time and how have you checked that guest hugepages are backed by host normal pages? Do you have THP enabled? Without THP you need to back the guest's memory with huge pages using -mem-path /hugepagesfs. But again only 2MB pages are supported. -- Gleb.
Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
On 26.08.2013 09:39, Paolo Bonzini wrote: On 25/08/2013 17:04, Alexander Graf wrote: On 24.08.2013, at 21:14, Yann Droneaud wrote: This patch sets the O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() so that file descriptors are not leaked across exec(). Signed-off-by: Yann Droneaud ydrone...@opteya.com Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com Reviewed-by: Alexander Graf ag...@suse.de Would it make sense to simply inherit the O_CLOEXEC flag from the parent kvm fd instead? That would give user space the power to keep fds across exec() if it wants to. Does it make sense to use non-O_CLOEXEC file descriptors with KVM at all? Besides fork() not being supported by KVM, as described in Documentation/virtual/kvm/api.txt, the VMAs of the parent process go away as soon as you exec(). I'm not sure how you can use the inherited file descriptor in a sensible way after exec(). Sounds a lot like InfiniBand subsystem behavior: IB file descriptors are of no use across exec() since memory mappings tied to those fds won't be available in the new process: https://lkml.org/lkml/2013/7/8/380 http://mid.gmane.org/f58540dc64fec1ac0e496dfcd3cc1...@meuh.org Regards. -- Yann Droneaud OPTEYA
Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
On 26/08/2013 10:23, Yann Droneaud wrote: Sounds a lot like InfiniBand subsystem behavior: IB file descriptors are of no use across exec() since memory mappings tied to those fds won't be available in the new process: https://lkml.org/lkml/2013/7/8/380 http://mid.gmane.org/f58540dc64fec1ac0e496dfcd3cc1...@meuh.org Yes, it is very similar. Paolo
[PATCH V13 1/4] kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi
This is needed by both guest and host. Originally-from: Srivatsa Vaddagiri va...@linux.vnet.ibm.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com Acked-by: Gleb Natapov g...@redhat.com Acked-by: Ingo Molnar mi...@kernel.org --- arch/x86/include/uapi/asm/kvm_para.h | 1 + include/uapi/linux/kvm_para.h | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h index 06fdbd9..94dc8ca 100644 --- a/arch/x86/include/uapi/asm/kvm_para.h +++ b/arch/x86/include/uapi/asm/kvm_para.h @@ -23,6 +23,7 @@ #define KVM_FEATURE_ASYNC_PF 4 #define KVM_FEATURE_STEAL_TIME 5 #define KVM_FEATURE_PV_EOI 6 +#define KVM_FEATURE_PV_UNHALT 7 /* The last 8 bits are used to indicate how to interpret the flags field * in pvclock structure. If no bits are set, all flags are ignored. diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h index cea2c5c..2841f86 100644 --- a/include/uapi/linux/kvm_para.h +++ b/include/uapi/linux/kvm_para.h @@ -19,6 +19,7 @@ #define KVM_HC_MMU_OP 2 #define KVM_HC_FEATURES 3 #define KVM_HC_PPC_MAP_MAGIC_PAGE 4 +#define KVM_HC_KICK_CPU 5 /* * hypercalls use architecture specific -- 1.7.11.7
[PATCH V13 3/4] kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic
Note that we are using APIC_DM_REMRD which has reserved usage. In the future, if APIC_DM_REMRD usage is standardized, then we should find some other way or go back to the old method. Suggested-by: Gleb Natapov g...@redhat.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com Acked-by: Gleb Natapov g...@redhat.com Acked-by: Ingo Molnar mi...@kernel.org --- arch/x86/kvm/lapic.c | 5 - arch/x86/kvm/x86.c | 25 ++--- 2 files changed, 10 insertions(+), 20 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index afc1124..48c13c9 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -706,7 +706,10 @@ out: break; case APIC_DM_REMRD: - apic_debug("Ignoring delivery mode 3\n"); + result = 1; + vcpu->arch.pv.pv_unhalted = 1; + kvm_make_request(KVM_REQ_EVENT, vcpu); + kvm_vcpu_kick(vcpu); break; case APIC_DM_SMI: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1e73dab..640d112 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5502,27 +5502,14 @@ int kvm_hv_hypercall(struct kvm_vcpu *vcpu) */ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned long flags, int apicid) { - struct kvm_vcpu *vcpu = NULL; - int i; + struct kvm_lapic_irq lapic_irq; - kvm_for_each_vcpu(i, vcpu, kvm) { - if (!kvm_apic_present(vcpu)) - continue; + lapic_irq.shorthand = 0; + lapic_irq.dest_mode = 0; + lapic_irq.dest_id = apicid; - if (kvm_apic_match_dest(vcpu, 0, 0, apicid, 0)) - break; - } - if (vcpu) { - /* -* Setting unhalt flag here can result in spurious runnable -* state when unhalt reset does not happen in vcpu_block. -* But that is harmless since that should soon result in halt.
-*/ - vcpu->arch.pv.pv_unhalted = true; - /* We need everybody see unhalt before vcpu unblocks */ - smp_wmb(); - kvm_vcpu_kick(vcpu); - } + lapic_irq.delivery_mode = APIC_DM_REMRD; + kvm_irq_delivery_to_apic(kvm, 0, &lapic_irq, NULL); } int kvm_emulate_hypercall(struct kvm_vcpu *vcpu) -- 1.7.11.7
[PATCH V13 0/4] Paravirtualized ticket spinlocks for KVM host
This series forms the kvm host part of paravirtual spinlock based against kvm tree. Please refer to https://lkml.org/lkml/2013/8/9/265 for kvm guest and Xen, x86 part merged to -tip spinlocks. Please note that: kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi is a common patch for both guest and host. Changes since V12: fold the patch 3 into patch 2 for bisection. (Eric Northup) Raghavendra K T (3): kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock Srivatsa Vaddagiri (1): kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks Documentation/virtual/kvm/cpuid.txt | 4 Documentation/virtual/kvm/hypercalls.txt | 14 ++ arch/x86/include/asm/kvm_host.h | 5 + arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/lapic.c | 5 - arch/x86/kvm/x86.c | 31 ++- include/uapi/linux/kvm_para.h | 1 + 8 files changed, 61 insertions(+), 3 deletions(-) -- 1.7.11.7
[PATCH V13 4/4] Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock
KVM_HC_KICK_CPU hypercall added to wakeup halted vcpu in paravirtual spinlock enabled guest. KVM_FEATURE_PV_UNHALT enables guest to check whether pv spinlock can be enabled in guest. Thanks Vatsa for rewriting KVM_HC_KICK_CPU Cc: Rob Landley r...@landley.net Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com Acked-by: Gleb Natapov g...@redhat.com Acked-by: Ingo Molnar mi...@kernel.org --- Documentation/virtual/kvm/cpuid.txt | 4 Documentation/virtual/kvm/hypercalls.txt | 14 ++ 2 files changed, 18 insertions(+) diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt index 83afe65..22ff659 100644 --- a/Documentation/virtual/kvm/cpuid.txt +++ b/Documentation/virtual/kvm/cpuid.txt @@ -43,6 +43,10 @@ KVM_FEATURE_CLOCKSOURCE2 || 3 || kvmclock available at msrs KVM_FEATURE_ASYNC_PF || 4 || async pf can be enabled by || || writing to msr 0x4b564d02 -- +KVM_FEATURE_PV_UNHALT || 7 || guest checks this feature bit + || || before enabling paravirtualized + || || spinlock support. +-- KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||24 || host will warn if no guest-side || || per-cpu warps are expected in || || kvmclock. diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt index ea113b5..022198e 100644 --- a/Documentation/virtual/kvm/hypercalls.txt +++ b/Documentation/virtual/kvm/hypercalls.txt @@ -64,3 +64,17 @@ Purpose: To enable communication between the hypervisor and guest there is a shared page that contains parts of supervisor visible register state. The guest can map this shared page to access its supervisor register through memory using this hypercall. + +5. 
KVM_HC_KICK_CPU + +Architecture: x86 +Status: active +Purpose: Hypercall used to wake up a vcpu from HLT state +Usage example: A vcpu of a paravirtualized guest that is busy-waiting in guest +kernel mode for an event to occur (e.g. a spinlock to become available) can +execute the HLT instruction once it has busy-waited for more than a threshold +time interval. Execution of the HLT instruction would cause the hypervisor to put +the vcpu to sleep until occurrence of an appropriate event. Another vcpu of the +same guest can wake up the sleeping vcpu by issuing the KVM_HC_KICK_CPU hypercall, +specifying the APIC ID (a1) of the vcpu to be woken up. An additional argument (a0) +is reserved in the hypercall for future use. -- 1.7.11.7
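The halt and kick interplay described in this text can be pictured with a toy, single-threaded model. It is not KVM code: the struct and function names are made up, and it only assumes what the series itself mentions, namely a per-vcpu "unhalted" flag that the kick sets and that the blocking path checks.

```c
#include <stdbool.h>

/* Toy model of one vcpu; both fields are illustrative only. */
struct toy_vcpu {
    bool halted;      /* vcpu is currently asleep in HLT */
    bool pv_unhalted; /* a kick has been delivered and not yet consumed */
};

/* Models the kick hypercall: record the kick and wake the target. */
static void toy_kick(struct toy_vcpu *v)
{
    v->pv_unhalted = true;
    v->halted = false;
}

/* Models HLT: returns true if the vcpu really goes to sleep.
 * A pending kick is consumed instead, so a kick that arrives
 * just before the halt is not lost. */
static bool toy_halt(struct toy_vcpu *v)
{
    if (v->pv_unhalted) {
        v->pv_unhalted = false;
        return false;
    }
    v->halted = true;
    return true;
}
```

In this model a kick that wakes a sleeping vcpu leaves the flag set, so the next halt returns immediately once before the vcpu sleeps again, a spurious but harmless wakeup.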
Re: [PATCH v2] KVM: nVMX: Fully support of nested VMX preemption timer
On Mon, Aug 26, 2013 at 3:23 PM, Jan Kiszka jan.kis...@web.de wrote: On 2013-08-25 17:26, Arthur Chunqi Li wrote: This patch contains the following two changes: 1. Fix the bug in nested preemption timer support. If a vmexit L2->L0 happens for a reason not emulated by L1, the preemption timer value should be saved on such exits. 2. Add support of Save VMX-preemption timer value VM-Exit controls to nVMX. With this patch, nested VMX preemption timer features are fully supported. Signed-off-by: Arthur Chunqi Li yzt...@gmail.com --- arch/x86/kvm/vmx.c | 49 - 1 file changed, 44 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 57b4e12..6aa320e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; + if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER)) + nested_vmx_exit_ctls_high &= + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); + if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) + nested_vmx_pinbased_ctls_high &= + (~PIN_BASED_VMX_PREEMPTION_TIMER); nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | VM_EXIT_LOAD_IA32_EFER); @@ -6706,6 +6713,22 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2) *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } +static void nested_fix_preempt(struct kvm_vcpu *vcpu) nested_adjust_preemption_timer - just preempt can be misleading. +{ + u64 delta_guest_tsc; + u32 preempt_val, preempt_bit, delta_preempt_val; + + preempt_bit = native_read_msr(MSR_IA32_VMX_MISC) & 0x1F; This is rather preemption_timer_scale. And if there is no symbolic value for the bitmask, please introduce one.
+ delta_guest_tsc = kvm_x86_ops->read_l1_tsc(vcpu, + native_read_tsc()) - vcpu->arch.last_guest_tsc; + delta_preempt_val = delta_guest_tsc >> preempt_bit; + preempt_val = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); + if (preempt_val - delta_preempt_val < 0) + preempt_val = 0; + else + preempt_val -= delta_preempt_val; + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val); The rest is unfortunately wrong. It has to be split into two parts: Part one, the calculation of L1's TSC value and its storing in nested_vmx, has to be done on vmexit. Part two, reading the current TSC, calculating the time spent in L0 and converting it into L1 TSC time, has to be done right before vmentry of L2. As we discussed yesterday, the calculated L1 TSC value is not saved in nested_vmx, in order to avoid adding code to the hot path of vmexit. Instead, we use vcpu->arch.last_guest_tsc as the value stored on vmexit (which has been done already). And the value of part two is calculated in nested_fix_preempt() above (see the variable delta_guest_tsc, which stores the TSC time consumed in L0). Since vmx_handle_exit is the last function called in the vmexit path, I think it's OK to put part two here. Arthur, please make sure that your test case detects the current breakage of preemption timer emulation properly, both wrt missing save/restore and also regarding missing L0 time compensation, and then check that your KVM patch fixes it based on the unit test results. OK, I will commit a patch to kvm-unit-tests to test these changes. Arthur Jan +} /* * The guest has exited. See if we can fix it or if we need userspace * assistance.
@@ -6734,9 +6757,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) else vmx->nested.nested_run_pending = 0; - if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) { - nested_vmx_vmexit(vcpu); - return 1; + if (is_guest_mode(vcpu)) { + if (nested_vmx_exit_handled(vcpu)) { + nested_vmx_vmexit(vcpu); + return 1; + } else + nested_fix_preempt(vcpu); } if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) { @@ -7517,6 +7543,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { struct vcpu_vmx *vmx = to_vmx(vcpu); u32 exec_control; + u32 exit_control; vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector); vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector); @@ -7690,7 +7717,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
Re: [PATCH V13 0/4] Paravirtualized ticket spinlocks for KVM host
On Mon, Aug 26, 2013 at 02:18:32PM +0530, Raghavendra K T wrote: This series forms the kvm host part of paravirtual spinlock based against kvm tree. Please refer to https://lkml.org/lkml/2013/8/9/265 for kvm guest and Xen, x86 part merged to -tip spinlocks. Please note that: kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi is a common patch for both guest and host. Thanks, applied. The patchset is not against kvm.git queue though, so I had to fix one minor conflict manually. Changes since V12: fold the patch 3 into patch 2 for bisection. (Eric Northup) Raghavendra K T (3): kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic Documentation/kvm : Add documentation on Hypercalls and features used for PV spinlock Srivatsa Vaddagiri (1): kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks Documentation/virtual/kvm/cpuid.txt | 4 Documentation/virtual/kvm/hypercalls.txt | 14 ++ arch/x86/include/asm/kvm_host.h | 5 + arch/x86/include/uapi/asm/kvm_para.h | 1 + arch/x86/kvm/cpuid.c | 3 ++- arch/x86/kvm/lapic.c | 5 - arch/x86/kvm/x86.c | 31 ++- include/uapi/linux/kvm_para.h | 1 + 8 files changed, 61 insertions(+), 3 deletions(-) -- 1.7.11.7 -- Gleb.
Re: [PATCH 0/2] kvm: use anon_inode_getfd() with O_CLOEXEC flag
On Sat, Aug 24, 2013 at 10:14:06PM +0200, Yann Droneaud wrote: Hi, Following a patchset asking to change calls to get_unused_fd() [1] to use O_CLOEXEC, Alex Williamson [2][3] decided to change VFIO to use the flag. Since it's a related subsystem to KVM, using O_CLOEXEC for file descriptors created by KVM might be applicable too. I'm suggesting to change calls to anon_inode_getfd() to use O_CLOEXEC as the default flag. This patchset should be reviewed to make sure it does not break existing userspace programs. BTW, if it's not applicable, I would suggest that new ioctls be added to the KVM subsystem; those ioctls would have a flag field added to their arguments. Such a flag would let userspace choose the open flag to use. See for example other APIs using anon_inode_getfd() such as fanotify, inotify, signalfd and timerfd. You might be interested to read: - Secure File Descriptor Handling (Ulrich Drepper, 2008) http://udrepper.livejournal.com/20407.html - Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) http://danwalsh.livejournal.com/53603.html Applied, thanks. Regards. [1] http://lkml.kernel.org/r/cover.1376327678.git.ydrone...@opteya.com [2] http://lkml.kernel.org/r/1377186804.25163.17.ca...@ul30vt.home [3] http://lkml.kernel.org/r/20130822171744.1297.13711.st...@bling.home Yann Droneaud (2): kvm: use anon_inode_getfd() with O_CLOEXEC flag ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- arch/powerpc/kvm/book3s_64_vio.c | 2 +- arch/powerpc/kvm/book3s_hv.c | 2 +- virt/kvm/kvm_main.c | 6 +++--- 4 files changed, 6 insertions(+), 6 deletions(-) -- 1.8.3.1 -- Gleb.
Re: [PATCH V13 0/4] Paravirtualized ticket spinlocks for KVM host
On 08/26/2013 03:34 PM, Gleb Natapov wrote:
> On Mon, Aug 26, 2013 at 02:18:32PM +0530, Raghavendra K T wrote:
>> This series forms the kvm host part of paravirtual spinlock based against kvm tree. Please refer to https://lkml.org/lkml/2013/8/9/265 for kvm guest and Xen, x86 part merged to -tip spinlocks. Please note that: kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi is a common patch for both guest and host.
> Thanks, applied. The patchset is not against kvm.git queue though, so I had to fix one minor conflict manually.

Thank you Gleb.
Re: [PATCH 2/3] KVM: ARM: Get rid of KVM_HPAGE_ defines
On Sun, Aug 25, 2013 at 04:27:14PM +0100, Alexander Graf wrote:
> On 25.08.2013, at 16:18, Peter Maydell wrote:
>> On 25 August 2013 15:48, Gleb Natapov g...@redhat.com wrote:
>>> On Sun, Aug 25, 2013 at 03:29:17PM +0100, Peter Maydell wrote:
>>>> Smiley noted, but this is pretty unlikely since it's not possible to lie to the guest about which mode it's in, so you can't make a guest think it's in Hyp mode.
>>> I suspected this, but forgot most that I read about Hyp mode by now. Need to refresh my memory ASAP. Is it impossible even with a lot of emulation? Can guest detect that it is not in a Hyp mode without trapping into hypervisor?
>> Yes. The current mode is in the low bits of the CPSR, which is readable without causing a trap. This is just the most obvious roadblock; I bet there are more. If you really had to run Hyp mode code in a VM you probably have to do it by having it all emulated via TCG.
> Or in an in-kernel instruction emulator that we have lying around anyways. For kvm-in-kvm that should be good enough, as we only need to execute a few instructions in HYP mode.

Will require emulation on each trap to Hyp mode though. But since you already have ideas about nested Hyp I consider it done :)

--
Gleb.
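As a rough illustration of Peter's point that the current mode sits in the low bits of the CPSR, here is a small C sketch that decodes the mode field from a CPSR value. The macro and function names are made up for this example, and the encodings are the ARMv7 mode numbers as I understand them from the architecture manual; on real hardware the value would come from an `mrs` instruction, which the guest can execute without trapping.

```c
#include <stdint.h>

/* ARMv7 CPSR mode field M[4:0] (names are illustrative). */
#define CPSR_MODE_MASK 0x1fu
#define CPSR_MODE_USR  0x10u
#define CPSR_MODE_SVC  0x13u
#define CPSR_MODE_HYP  0x1au

/* Extract the processor mode from a CPSR value. */
static inline uint32_t cpsr_mode(uint32_t cpsr)
{
	return cpsr & CPSR_MODE_MASK;
}

/* Non-zero if the CPSR value says the CPU is in Hyp mode. */
static inline int cpsr_is_hyp(uint32_t cpsr)
{
	return cpsr_mode(cpsr) == CPSR_MODE_HYP;
}
```

A guest kernel that believes it was booted in Hyp mode could run exactly this kind of check and notice that it is really in SVC mode, which is why the discussion turns to emulation.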
[PATCH 3/3] KVM: PPC: Book3S HV: Implement timebase offset for guests
This allows guests to have a different timebase origin from the host. This is needed for migration, where a guest can migrate from one host to another and the two hosts might have a different timebase origin. However, the timebase seen by the guest must not go backwards, and should go forwards only by a small amount corresponding to the time taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg interface using the new KVM_REG_PPC_TB_OFFSET identifier. This value defaults to 0 and is not modified by KVM. On entering the guest, this value is added onto the timebase, and on exiting the guest, it is subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40 (timebase upper 40 bits) register. Writing to the TBU40 register only alters the upper 40 bits of the timebase, leaving the lower 24 bits unchanged. This provides a way to modify the timebase for guest migration without disturbing the synchronization of the timebase registers across CPU cores. This means that userspace must supply a value for the offset that has zeroes in the lower 24 bits. If the lower 24 bits are non-zero, they are ignored and taken as zeroes.

Timebase values stored in KVM structures (struct kvm_vcpu, struct kvmppc_vcore, etc.) are stored as host timebase values. The timebase values in the dispatch trace log need to be guest timebase values, however, since that is read directly by the guest.

This moves the setting of vcpu->arch.dec_expires on guest exit to a point after we have restored the host timebase so that vcpu->arch.dec_expires is a host timebase value.
Signed-off-by: Paul Mackerras pau...@samba.org --- Documentation/virtual/kvm/api.txt | 1 + arch/powerpc/include/asm/kvm_host.h | 2 ++ arch/powerpc/include/asm/reg.h | 1 + arch/powerpc/include/uapi/asm/kvm.h | 3 ++ arch/powerpc/kernel/asm-offsets.c | 1 + arch/powerpc/kvm/book3s_hv.c| 8 +- arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 +++-- 7 files changed, 56 insertions(+), 10 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 8b4d984..88f4653 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1815,6 +1815,7 @@ registers, find a list below: PPC | KVM_REG_PPC_TLB3PS | 32 PPC | KVM_REG_PPC_EPTCFG | 32 PPC | KVM_REG_PPC_ICP_STATE | 64 + PPC | KVM_REG_PPC_TB_OFFSET| 64 ARM registers are mapped using the lower 32 bits. The upper 16 of that is the register group type, or coprocessor number: diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 91b833d..702d88b 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -607,6 +607,8 @@ struct kvm_vcpu_arch { spinlock_t tbacct_lock; u64 busy_stolen; u64 busy_preempt; + + u64 tb_offset; /* guest timebase - host timebase */ #endif }; diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index 4a9e408..72f8798 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -243,6 +243,7 @@ #define SPRN_TBRU 0x10D /* Time Base Read Upper Register (user, R/O) */ #define SPRN_TBWL 0x11C /* Time Base Lower Register (super, R/W) */ #define SPRN_TBWU 0x11D /* Time Base Upper Register (super, R/W) */ +#define SPRN_TBU40 0x11E /* Timebase upper 40 bits (hyper, R/W) */ #define SPRN_SPURR 0x134 /* Scaled PURR */ #define SPRN_HSPRG00x130 /* Hypervisor Scratch 0 */ #define SPRN_HSPRG10x131 /* Hypervisor Scratch 1 */ diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index fb0a8a9..9935321 100644 --- 
a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -504,6 +504,9 @@ struct kvm_get_htab_header { #define KVM_REG_PPC_TLB3PS (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9a) #define KVM_REG_PPC_EPTCFG (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9b) +/* Timebase offset */ +#define KVM_REG_PPC_TB_OFFSET (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x9c) + /* PPC64 eXternal Interrupt Controller Specification */ #define KVM_DEV_XICS_GRP_SOURCES 1 /* 64-bit source attributes */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 822b6ba..62acafd 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -488,6 +488,7 @@ int main(void) DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar)); DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr)); DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty)); + DEFINE(VCPU_TB_OFFSET, offsetof(struct kvm_vcpu, arch.tb_offset)); #endif #ifdef CONFIG_PPC_BOOK3S DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
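The offset arithmetic described in the commit message of this patch can be sketched in plain C. This is only a model of the rule stated there (offset added on guest entry, subtracted on exit, lower 24 bits treated as zero); the function and macro names are hypothetical, and the real patch does this in assembly through the TBU40 register rather than with 64-bit adds.

```c
#include <stdint.h>

/* TBU40 writes only touch the upper 40 bits of the 64-bit timebase. */
#define TB_LOW_BITS 24
#define TB_LOW_MASK ((UINT64_C(1) << TB_LOW_BITS) - 1)

/* Offset as it takes effect: the lower 24 bits are ignored. */
static inline uint64_t tb_offset_effective(uint64_t tb_offset)
{
	return tb_offset & ~TB_LOW_MASK;
}

/* Guest entry: guest timebase = host timebase + offset. */
static inline uint64_t tb_host_to_guest(uint64_t host_tb, uint64_t tb_offset)
{
	return host_tb + tb_offset_effective(tb_offset);
}

/* Guest exit: host timebase = guest timebase - offset. */
static inline uint64_t tb_guest_to_host(uint64_t guest_tb, uint64_t tb_offset)
{
	return guest_tb - tb_offset_effective(tb_offset);
}
```

Because the effective offset always has zeroes in its lower 24 bits, adding it never changes the lower 24 bits of the timebase, which is the property that keeps the timebase registers synchronized across cores.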
[PATCH 0/3] Some fixes for PPC HV-style KVM
Here are 3 patches that add two PMU (performance monitor unit) registers to the set being context-switched on guest entry and exit, and implement a per-guest timebase offset that is needed when we migrate a guest from one host to another that has a different timebase origin. The first patch just adds some one_reg register definitions for extra PMU registers, including some that exist on POWER8. These new registers aren't yet handled by the kernel code, but their definitions are included here so as to reserve the numbers. These patches are against Alex Graf's kvm-ppc-queue branch. Paul. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers
Currently we are not saving and restoring the SIAR and SDAR registers in the PMU (performance monitor unit) on guest entry and exit. The result is that performance monitoring tools in the guest could get false information about where a program was executing and what data it was accessing at the time of a performance monitor interrupt. This fixes it by saving and restoring these registers along with the other PMU registers on guest entry/exit.

This also provides a way for userspace to access these values for a vcpu via the one_reg interface.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h     |  2 ++
 arch/powerpc/kernel/asm-offsets.c       |  2 ++
 arch/powerpc/kvm/book3s_hv.c            | 12 ++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 ++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3328353..91b833d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -498,6 +498,8 @@ struct kvm_vcpu_arch {
 
 	u64 mmcr[3];
 	u32 pmc[8];
+	u64 siar;
+	u64 sdar;
 
 #ifdef CONFIG_KVM_EXIT_TIMING
 	struct mutex exit_timing_lock;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index a67c76e..822b6ba 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,8 @@ int main(void)
 	DEFINE(VCPU_PRODDED, offsetof(struct kvm_vcpu, arch.prodded));
 	DEFINE(VCPU_MMCR, offsetof(struct kvm_vcpu, arch.mmcr));
 	DEFINE(VCPU_PMC, offsetof(struct kvm_vcpu, arch.pmc));
+	DEFINE(VCPU_SIAR, offsetof(struct kvm_vcpu, arch.siar));
+	DEFINE(VCPU_SDAR, offsetof(struct kvm_vcpu, arch.sdar));
 	DEFINE(VCPU_SLB, offsetof(struct kvm_vcpu, arch.slb));
 	DEFINE(VCPU_SLB_MAX, offsetof(struct kvm_vcpu, arch.slb_max));
 	DEFINE(VCPU_SLB_NR, offsetof(struct kvm_vcpu, arch.slb_nr));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2b95c45..9df824f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -771,6 +771,12 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
 		}
 		break;
 #endif /* CONFIG_VSX */
+	case KVM_REG_PPC_SIAR:
+		*val = get_reg_val(id, vcpu->arch.siar);
+		break;
+	case KVM_REG_PPC_SDAR:
+		*val = get_reg_val(id, vcpu->arch.sdar);
+		break;
 	case KVM_REG_PPC_VPA_ADDR:
 		spin_lock(&vcpu->arch.vpa_update_lock);
 		*val = get_reg_val(id, vcpu->arch.vpa.next_gpa);
@@ -855,6 +861,12 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
 		}
 		break;
 #endif /* CONFIG_VSX */
+	case KVM_REG_PPC_SIAR:
+		vcpu->arch.siar = set_reg_val(id, *val);
+		break;
+	case KVM_REG_PPC_SDAR:
+		vcpu->arch.sdar = set_reg_val(id, *val);
+		break;
 	case KVM_REG_PPC_VPA_ADDR:
 		addr = set_reg_val(id, *val);
 		r = -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 60dce5b..2e1dd6c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -198,6 +198,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
 	ld	r6, VCPU_MMCR + 16(r4)
 	mtspr	SPRN_MMCR1, r5
 	mtspr	SPRN_MMCRA, r6
+BEGIN_FTR_SECTION
+	ld	r7, VCPU_SIAR(r4)
+	ld	r8, VCPU_SDAR(r4)
+	mtspr	SPRN_SIAR, r7
+	mtspr	SPRN_SDAR, r8
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 	mtspr	SPRN_MMCR0, r3
 	isync
 
@@ -1125,6 +1131,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 	std	r4, VCPU_MMCR(r9)
 	std	r5, VCPU_MMCR + 8(r9)
 	std	r6, VCPU_MMCR + 16(r9)
+BEGIN_FTR_SECTION
+	mfspr	r7, SPRN_SIAR
+	mfspr	r8, SPRN_SDAR
+	std	r7, VCPU_SIAR(r9)
+	std	r8, VCPU_SDAR(r9)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 	mfspr	r3, SPRN_PMC1
 	mfspr	r4, SPRN_PMC2
 	mfspr	r5, SPRN_PMC3
--
1.8.4.rc3
[PATCH 1/3] KVM: PPC: Book3S HV: Add one_reg definitions for more PMU registers
This adds one_reg register numbers for two performance monitor registers that exist on POWER7 and later processors (SIAR and SDAR) and three that will be introduced on POWER8 (MMCR2, MMCRS and SIER). Signed-off-by: Paul Mackerras pau...@samba.org --- Documentation/virtual/kvm/api.txt | 5 + arch/powerpc/include/uapi/asm/kvm.h | 5 + 2 files changed, 10 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 66dd2aa..8b4d984 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1765,6 +1765,11 @@ registers, find a list below: PPC | KVM_REG_PPC_MMCR0 | 64 PPC | KVM_REG_PPC_MMCR1 | 64 PPC | KVM_REG_PPC_MMCRA | 64 + PPC | KVM_REG_PPC_MMCR2 | 64 + PPC | KVM_REG_PPC_MMCRS | 64 + PPC | KVM_REG_PPC_SIAR | 64 + PPC | KVM_REG_PPC_SDAR | 64 + PPC | KVM_REG_PPC_SIER | 64 PPC | KVM_REG_PPC_PMC1 | 32 PPC | KVM_REG_PPC_PMC2 | 32 PPC | KVM_REG_PPC_PMC3 | 32 diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 0fb1a6e..fb0a8a9 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -429,6 +429,11 @@ struct kvm_get_htab_header { #define KVM_REG_PPC_MMCR0 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x10) #define KVM_REG_PPC_MMCR1 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x11) #define KVM_REG_PPC_MMCRA (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x12) +#define KVM_REG_PPC_MMCR2 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x13) +#define KVM_REG_PPC_MMCRS (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x14) +#define KVM_REG_PPC_SIAR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x15) +#define KVM_REG_PPC_SDAR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x16) +#define KVM_REG_PPC_SIER (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x17) #define KVM_REG_PPC_PMC1 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18) #define KVM_REG_PPC_PMC2 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x19) -- 1.8.4.rc3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info 
at http://vger.kernel.org/majordomo-info.html
Re: [PATCH ] Documentation/kvm: Update cpuid documentation for steal time and pv eoi
On 08/26/2013 12:37 PM, Michael S. Tsirkin wrote:
> I would change the description to merely say what the CPUID bits mean, and what they mean is exactly that an MSR is valid. Use KVM_FEATURE_ASYNC_PF as a template.

Thank you for the review. I am changing the doc accordingly by adding MSR info. Please refer below.

>> +KVM_FEATURE_STEAL_TIME || 5 || guest accounts fine granularity
>> + || || task steal time.
> I'm not sure what this phrase means. Steal time is a host feature, not a guest feature: IIUC if this bit is set, the hypervisor can pass the guest information about how much time was spent running other processes outside the VM.

Okay. I guess I need some help here. I took this from the PARAVIRT_TIME_ACCOUNTING config help. I also saw that the guest is actually returning the steal time in kvm_steal_clock().

>> enabled when
>> + || || schedstat or task delay accounting
>> + || || is supported by the host.
> I think it's enabled by guest, not by host.

True. My understanding was that the guest enables it when the host has schedstat or task delay accounting on. I referred to this hunk in kvm/cpuid.c:

	if (sched_info_on())
		entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);

and sched_info_on() is true when schedstat or task delay accounting is on. Does this look good?

Enabled by writing to msr 0x4b564d03. The feature is enabled by guest when host has schedstat or task delay accounting support.

>> +KVM_FEATURE_PV_EOI || 6 || overrides the generic EOI
>> + || || implementation with an optimized
>> + || || version.
> More exactly with a paravirtualized version.

Okay. So how does this sound?

overrides the generic EOI implementation with a paravirtualized version. This feature is enabled by writing to msr 0x4b564d04.
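To make the "CPUID bit means an MSR is valid" reading concrete, here is a small C sketch that tests the two feature bits discussed in this thread and maps each to the MSR number quoted above. The bit positions (5 and 6) and MSR values come from the message; the helper names `kvm_has_feature` and `kvm_feature_msr` are invented for this example, and obtaining `eax` from the KVM features CPUID leaf is left out.

```c
#include <stdint.h>

/* Bit positions in the KVM features CPUID leaf (EAX), from the thread. */
#define KVM_FEATURE_STEAL_TIME 5
#define KVM_FEATURE_PV_EOI     6

/* MSR numbers quoted in the thread. */
#define MSR_KVM_STEAL_TIME 0x4b564d03u
#define MSR_KVM_PV_EOI_EN  0x4b564d04u

/* Hypothetical helper: 1 if the given feature bit is set in eax. */
static inline int kvm_has_feature(uint32_t eax, unsigned int bit)
{
	return (eax >> bit) & 1u;
}

/*
 * Hypothetical helper: MSR the guest writes to enable a feature,
 * or 0 for features this sketch does not know about.
 */
static inline uint32_t kvm_feature_msr(unsigned int bit)
{
	switch (bit) {
	case KVM_FEATURE_STEAL_TIME:
		return MSR_KVM_STEAL_TIME;
	case KVM_FEATURE_PV_EOI:
		return MSR_KVM_PV_EOI_EN;
	default:
		return 0;
	}
}
```

Under this reading the host advertises the bit, and the guest then decides whether to enable the feature by writing the corresponding MSR.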
[no subject]
subscribe kvm -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
mmapping physical memory
Hi all I am using IVSHMEM to mmap /dev/mem into guest. The mmap works fine on QEMU without KVM support enabled, but with KVM i get kernel errors: * (with EPT enabled) [ 746.940720] [ cut here ] [ 746.948612] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257! [ 746.949067] invalid opcode: [#1] SMP [ 746.949393] Modules linked in: rte_kni(OF) igb_uio(OF) ebtable_nat(F) xt_CHECKSUM(F) bridge(F) stp(F) llc(F) nf_conntrack_netbios_ns(F) nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_mangle(F) ip6t_REJECT(F) nf_conntrack_ipv6(F) nf_defrag_ipv6(F) bnep(F) bluetooth(F) rfkill(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) iptable_mangle(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) ip6table_filter(F) ip6_tables(F) be2iscsi(F) iscsi_boot_sysfs(F) bnx2i(F) cnic(F) uio(F) cxgb4i(F) cxgb4(F) cxgb3i(F) cxgb3(F) libcxgbi(F) ib_iser(F) rdma_cm(F) ib_addr(F) iw_cm(F) ib_cm(F) ib_sa(F) ib_mad(F) ib_core(F) iscsi_tcp(F) libiscsi_tcp(F) libiscsi(F) scsi_transport_iscsi(F) iTCO_wdt(F) iTCO_vendor_support(F) acpi_cpufreq(F) mperf(F) coretemp(F) shpchp(F) [ 747.014963] lpc_ich(F) mfd_core(F) i2c_i801(F) ioatdma(F) microcode(F) joydev(F) i7core_edac(F) edac_core(F) vhost_net(F) tun(F) macvtap(F) macvlan(F) kvm_intel(F) kvm(F) uinput(F) crc32_pclmul(F) crc32c_intel(F) ghash_clmulni_intel(F) ast(F) ixgbe(F) igb(F) drm_kms_helper(F) e1000e(F) dca(F) ttm(F) ptp(F) drm(F) i2c_algo_bit(F) pps_core(F) mdio(F) i2c_core(F) sunrpc(F) [last unloaded: rte_kni] [ 747.136764] CPU 8 [ 747.136909] Pid: 2501, comm: qemu-system-x86 Tainted: GF O 3.9.11-200.no_strict_dev_mem.fc18.x86_64 #1 Intel Corporation S5520HC/S5520HC [ 747.228668] RIP: 0010:[a018c43a] [a018c43a] __gfn_to_pfn_memslot+0x36a/0x3e0 [kvm] [ 747.259705] RSP: 0018:880130d39ae8 EFLAGS: 00010246 [ 747.291580] RAX: RBX: RCX: 8801effeb000 [ 747.322598] RDX: 001c3c00 RSI: 7fd11f00 RDI: ea00070f [ 747.354242] RBP: 880130d39b58 R08: 0126 R09: 880130d39c2f [ 747.385123] R10: 
R11: 7fd14000 R12: 7fd11f01 [ 747.415981] R13: 880130d39ba7 R14: 8801c3bcb4f0 R15: 8802b4538001 [ 747.447877] FS: 7fd35c1e9700() GS:8801e9c8() knlGS: [ 747.479010] CS: 0010 DS: ES: CR0: 8005003b [ 747.510220] CR2: 7fe2ffc0 CR3: 0001e66c4000 CR4: 27e0 [ 747.542410] DR0: DR1: DR2: [ 747.573780] DR3: DR6: 0ff0 DR7: 0400 [ 747.604759] Process qemu-system-x86 (pid: 2501, threadinfo 880130d38000, task 8801c3bcb4f0) [ 747.637044] Stack: [ 747.668362] 880130d39af8 81083798 880130d39b48 7fd11f00 [ 747.700654] 001c3c00 00ff8802b3272a90 0380 8802b3272a80 [ 747.731895] 0380 000fc000 880130d39c38 880365fe8000 [ 747.763068] Call Trace: [ 747.793746] [81083798] ? hrtimer_start+0x18/0x20 [ 747.824435] [a018c530] __gfn_to_pfn+0x60/0x70 [kvm] [ 747.855267] [a018c61a] gfn_to_pfn_async+0x1a/0x20 [kvm] [ 747.884586] [a01a703a] try_async_pf+0x4a/0x1d0 [kvm] [ 747.914146] [a01aea2a] tdp_page_fault+0xfa/0x210 [kvm] [ 747.943000] [a01a89a1] kvm_mmu_page_fault+0x31/0x100 [kvm] [ 747.972271] [a02135ce] handle_ept_violation+0x5e/0x100 [kvm_intel] [ 748.000620] [a02189f6] vmx_handle_exit+0xf6/0x7c0 [kvm_intel] [ 748.029860] [a01bbe38] ? kvm_apic_has_interrupt+0x28/0xe0 [kvm] [ 748.058214] [a0210370] ? vmx_invpcid_supported+0x20/0x20 [kvm_intel] [ 748.086496] [a01a281b] kvm_arch_vcpu_ioctl_run+0x2fb/0x11a0 [kvm] [ 748.114711] [a019de67] ? kvm_arch_vcpu_load+0x57/0x1e0 [kvm] [ 748.142788] [a018a0ee] kvm_vcpu_ioctl+0x26e/0x5f0 [kvm] [ 748.170647] [810b7340] ? do_futex+0x100/0xad0 [ 748.198558] [811232b4] ? perf_event_context_sched_in+0x94/0xc0 [ 748.226194] [811abe07] do_vfs_ioctl+0x97/0x580 [ 748.253809] [8129d027] ? 
file_has_perm+0x97/0xb0 [ 748.281110] [811ac381] sys_ioctl+0x91/0xb0 [ 748.307911] [816604d9] system_call_fastpath+0x16/0x1b [ 748.88] Code: ff ff 49 29 d2 4c 89 d2 48 c1 ea 0c 48 03 90 98 00 00 00 48 89 d7 48 89 55 b0 e8 92 d6 ff ff 84 c0 48 8b 55 b0 0f 85 bf fe ff ff 0f 0b 0f 1f 40 00 48 ba 00 00 00 00 00 00 f0 7f e9 aa fe ff ff [ 748.392724] RIP [a018c43a] __gfn_to_pfn_memslot+0x36a/0x3e0 [kvm] [ 748.419435] RSP 880130d39ae8 [ 748.524222] ---[ end trace 854a37c471141217 ]--- *** (with EPT disabled) [ 559.581338] [ cut here ] [ 559.581701] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257!
Re: [PATCH v2] tile: support KVM for tilegx
On Sun, Aug 25, 2013 at 09:26:47PM -0400, Chris Metcalf wrote:
> On 8/25/2013 7:39 AM, Gleb Natapov wrote:
>> On Mon, Aug 12, 2013 at 04:24:11PM -0400, Chris Metcalf wrote:
>>> This change provides the initial framework support for KVM on tilegx. Basic virtual disk and networking is supported.
>> This needs to be broken down to more reviewable patches.
> I already broke out one pre-requisite patch that wasn't strictly KVM-related: https://lkml.org/lkml/2013/8/12/339
> In addition, we've separately arranged to support booting our kernels in a way that is compatible with the Tilera booter running at the highest privilege level, which enables multiple kernel privilege levels: https://lkml.org/lkml/2013/5/2/468
> How would you recommend further breaking down this patch? It's pretty much just the basic support for minimal KVM. I suppose I could break out all the I/O related stuff into a separate patch, though it wouldn't amount to much; perhaps the console could also be broken out separately. Any other suggestions?

First of all, please break out host and guest bits. Also the I/O related stuff, as you suggest (so that guest PV bits will be in a separate patch), and the changes to common code (not much as far as I can see) with an explanation of why they are needed. (Why is kvm_vcpu_kick() not needed, for instance?)

>> Also can you describe the implementation a little bit? Does the tile arch have a virtualization extension that this implementation uses, or is it a trap and emulate approach? If the latter, does it run unmodified guest kernels? What userspace are you using with this implementation?
> We could do full virtualization via trap and emulate, but we've elected to do a para-virtualized approach. Userspace runs at PL (privilege level) 0, the guest kernel runs at PL1, and the host runs at PL2. We have available per-PL resources for various things, and take advantage of having two on-chip timers (for example) to handle timing for the host and guest kernels. We run the same userspace with either the host or the guest.

OK, thanks for the explanation. Why have you decided to do PV over trap and emulate?

--
Gleb.
Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support
On Wed, Aug 14, 2013 at 10:51:14AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2013-08-01 at 14:44 +1000, Alexey Kardashevskiy wrote:
>> This is to reserve a capability number for upcoming support of H_PUT_TCE_INDIRECT and H_STUFF_TCE pseries hypercalls which support multiple DMA map/unmap operations per one call.
> Gleb, any chance you can put this (and the next one) into a tree to lock in the numbers ?

Applied it. Sorry for the slow response, I was on vacation and am still going through the email backlog.

> I've been wanting to apply the whole series to powerpc-next, that stuff has been simmering for way too long and is in a good enough shape imho, but I need the capabilities and ioctl numbers locked in your tree first.
> Cheers, Ben.
>> Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
>> ---
>> Changes: 2013/07/16: * changed the number
>> Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
>> ---
>>  include/uapi/linux/kvm.h | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index acccd08..99c2533 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
>>  #define KVM_CAP_PPC_RTAS 91
>>  #define KVM_CAP_IRQ_XICS 92
>>  #define KVM_CAP_ARM_EL1_32BIT 93
>> +#define KVM_CAP_SPAPR_MULTITCE 94
>>
>>  #ifdef KVM_CAP_IRQ_ROUTING

--
Gleb.
Re: mmapping physical memory
Hi Anatoly,

On Mon, Aug 26, 2013 at 12:58:25PM +0100, Anatoly Burakov wrote:
> Hi all I am using IVSHMEM to mmap /dev/mem into guest. The mmap works fine on QEMU without KVM support enabled, but with KVM i get kernel errors:
> (with EPT enabled)
> [ 746.940720] ------------[ cut here ]------------
> [ 746.948612] kernel BUG at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1257!

So the problem is KVM cannot do put_page on a pfn coming from a /dev/mem mapping, but it cannot handle VM_PFNMAP mappings without PageReserved set.

During kvm_release_page_* KVM only has the pfn number of the page, and it has to decide if this page is refcounted or not, solely based on the pfn number. So if the page is not set as referenced it cannot allow a mapping to be established, or later during spte teardown put_page would run on the /dev/mem memory leading to memory corruption.

The above BUG_ON isn't just a false positive, but it shows a limitation in the KVM page fault ability to map any kind of memory coming from the host (including /dev/mem mappings).

So I'm suggesting to drop FOLL_GET in the page fault and kvm_release_page_* after the spte establishment, and to rely entirely on the mmu notifier and the kvm_mmu lock. This would be done by adding a vcpu->in_progress_fault_addr that is set before calling gup in hva_to_pfn, cleared in the mmu notifier code within kvm->mmu_lock, and checked within kvm->mmu_lock during spte establishment to know whether the page pointer became stale and we shall bail out and repeat the fault or not.

We'll still need to use FOLL_GET and set_page_dirty in some cases, like after modifying the page in places like emulator_cmpxchg_emulated. Those places cannot depend on the mmu notifier, and the dirty bit set in the pte isn't enough because the page can be swapped out to disk and marked clean before kmap_atomic runs. But 99% of the hva_to_pfn calls are coming from the KVM secondary MMU page faults; they're protected by the mmu notifier and they can skip the refcounting completely, including FOLL_GET.

And then because we won't have to run put_page at all anymore, the above BUG will disappear too.

In terms of performance, I estimate the only cons will be an ATOMIC_ONCE(vcpu->in_progress_fault_addr) = addr per-thread cacheline local and lockless initialization before calling gup in hva_to_pfn, and the pros will be the removal of all refcounting atomic_inc/dec and set_page_dirty from all the KVM page faults.
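The bookkeeping proposed above can be modelled in a few lines of C. This is a single-threaded sketch under stated assumptions, not KVM code: the struct, the sentinel and all three function names are hypothetical, and in the real proposal the invalidate and commit steps would run under kvm->mmu_lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no fault in progress". */
#define NO_FAULT_ADDR UINT64_MAX

/* Hypothetical stand-in for the per-vcpu state in the proposal. */
struct vcpu_model {
	uint64_t in_progress_fault_addr;
};

/* Step 1: record the address before calling gup in hva_to_pfn. */
static void fault_begin(struct vcpu_model *v, uint64_t addr)
{
	v->in_progress_fault_addr = addr;
}

/*
 * Step 2: an MMU notifier invalidation of [start, end) clears the
 * in-progress address if it falls inside the invalidated range.
 */
static void notifier_invalidate(struct vcpu_model *v,
				uint64_t start, uint64_t end)
{
	if (v->in_progress_fault_addr >= start &&
	    v->in_progress_fault_addr < end)
		v->in_progress_fault_addr = NO_FAULT_ADDR;
}

/*
 * Step 3: at spte establishment, the fault may proceed only if the
 * address is still recorded; otherwise the page pointer may be stale
 * and the caller should bail out and repeat the fault.
 */
static bool fault_commit(struct vcpu_model *v, uint64_t addr)
{
	bool still_valid = (v->in_progress_fault_addr == addr);

	v->in_progress_fault_addr = NO_FAULT_ADDR;
	return still_valid;
}
```

A caller would loop on `fault_begin()`, the page lookup, and `fault_commit()` until the commit succeeds; no page reference is taken anywhere in that loop, which is the point of the proposal.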
Windows Server 2008R2 KVM guest performance issues
I've been trying to track down the cause of some serious performance issues with a Windows 2008R2 KVM guest. So far, I've been unable to determine what exactly is causing the issue.

When the guest is under load, I see very high kernel CPU usage, as well as terrible guest performance. The workload on the guest is approximately 1/4 of what we'd run unvirtualized on the same hardware. Even at that level, we max out every vCPU in the guest. While the guest runs, I see very high kernel CPU usage (based on `htop` output).

Host setup:
Linux nj1058 3.10.8-1.el6.elrepo.x86_64 #1 SMP Tue Aug 20 18:48:29 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
CentOS 6
qemu 1.6.0
2x Intel E5-2630 (virtualization extensions turned on, total of 24 cores including hyperthread cores)
24GB memory
swap file is enabled, but unused

Guest setup:
Windows Server 2008R2 (64 bit)
24 vCPUs
16 GB memory
VirtIO disk and network drivers installed

/qemu16/bin/qemu-system-x86_64 -name VMID100 -S -machine pc-i440fx-1.6,accel=kvm,usb=off -cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 15259 -smp 24,sockets=1,cores=12,threads=2 -uuid 90301200-8d47-6bb3-0623-bed7c8b1dd7c -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/libvirt111/var/lib/libvirt/qemu/VMID100.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc,driftfix=slew -no-hpet -boot c -usb -drive file=/dev/vmimages/VMID100,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:2c:6d,bus=pci.0,addr=0x3 -vnc 127.0.0.1:100 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

The beginning of `perf top` output:

Samples: 62M of event 'cycles', Event count (approx.): 642019289177
 64.69%  [kernel]            [k] _raw_spin_lock
  2.59%  qemu-system-x86_64  [.] 0x001e688d
  1.90%  [kernel]            [k] native_write_msr_safe
  0.84%  [kvm]               [k] vcpu_enter_guest
  0.80%  [kernel]            [k] __schedule
  0.77%  [kvm_intel]         [k] vmx_vcpu_run
  0.68%  [kernel]            [k] effective_load
  0.65%  [kernel]            [k] update_cfs_shares
  0.62%  [kernel]            [k] _raw_spin_lock_irq
  0.61%  [kernel]            [k] native_read_msr_safe
  0.56%  [kernel]            [k] enqueue_entity

I've captured 20,000 lines of kvm trace output. This can be found at https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace

So far, I've tried the following with very little effect:
* Disable HPET on the guest
* Enable hv_relaxed, hv_vapic, hv_spinlocks
* Enable SR-IOV
* Pin vCPUs to physical CPUs
* Forcing x2apic enabled in the guest (bcdedit /set x2apicpolicy yes)
* bcdedit /set useplatformclock yes and no

Any suggestions as to what I can do to get better performance out of this guest? Or reasons why I'm seeing such high kernel CPU usage with it?
Re: FAQ on linux-kvm.org has broken link
Hi,

> 1. Try the latest vanilla kernel on the host (Linux 3.10.5). This way you can rule out fixed bugs in vhost_net or tap.

For the last two weeks it went fine for a couple of days each time. This evening was really bad again: uptimes of 5-10 minutes. This is with 3.11-rc4.

> 2. Get the system into the bad state and then do some deeper. Start with outgoing ping, instrument guest driver and host vhost_net functions to see what the drivers are doing, inspect the transmit vring, etc.
> #1 is probably the best next step. If it fails and you still have time

Yup, very much.

> to work on a solution we can start digging deeper with #2.

I had a small script running which showed the amount of traffic coming in and going out each second. I did not see any increase or decrease in the amount, only that when the problem happens RX stays but TX goes to 0.

Folkert van Heusden

--
MultiTail na wan makriki wrokosani fu tan luku den logfile nanga san den commando spiti puru. Piki puru spesrutu sani, wroko nanga difrenti kroru, tya kon makandra, nanga wan lo moro. http://www.vanheusden.com/multitail/
--
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
Re: [PATCH-v3 1/4] idr: Percpu ida
On Tue, Aug 20, 2013 at 02:31:57PM -0700, Andrew Morton wrote:
On Fri, 16 Aug 2013 23:09:06 + Nicholas A. Bellinger n...@linux-iscsi.org wrote:

From: Kent Overstreet k...@daterainc.com

Percpu frontend for allocating ids. With percpu allocation (that works), it's impossible to guarantee it will always be possible to allocate all nr_tags - typically, some will be stuck on a remote percpu freelist where the current job can't get to them.

We do guarantee that it will always be possible to allocate at least (nr_tags / 2) tags - this is done by keeping track of which and how many cpus have tags on their percpu freelists. On allocation failure if enough cpus have tags that there could potentially be (nr_tags / 2) tags stuck on remote percpu freelists, we then pick a remote cpu at random to steal from.

Note that there's no cpu hotplug notifier - we don't care, because steal_tags() will eventually get the down cpu's tags. We _could_ satisfy more allocations if we had a notifier - but we'll still meet our guarantees and it's absolutely not a correctness issue, so I don't think it's worth the extra code.

...

include/linux/idr.h | 53 +
lib/idr.c | 316 +--

I don't think this should be in idr.[ch] at all. It has no relationship with the existing code. Apart from duplicating its functionality :(

Well, in the full patch series it does make use of the non-percpu ida. I'm still hoping to get the ida/idr rewrites in.
Re: [PATCH-v3 1/4] idr: Percpu ida
On Wed, Aug 21, 2013 at 06:25:58PM +, Christoph Lameter wrote:
On Fri, 16 Aug 2013, Nicholas A. Bellinger wrote:

+	spinlock_t lock;

Remove the spinlock.

As Andrew noted, the spinlock is needed because of tag stealing. (You don't think I'd stick a spinlock on a percpu data structure without a real reason, would you?)

+	unsigned nr_free;
+	unsigned freelist[];
+};
+
+static inline void move_tags(unsigned *dst, unsigned *dst_nr,
+			     unsigned *src, unsigned *src_nr,
+			     unsigned nr)
+{
+	*src_nr -= nr;
+	memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
+	*dst_nr += nr;
+}
+
+static inline unsigned alloc_local_tag(struct percpu_ida *pool,
+				       struct percpu_ida_cpu *tags)

Pass the __percpu offset and not the tags pointer.

Why? It just changes where the this_cpu_ptr

+{
+	int tag = -ENOSPC;
+
+	spin_lock(&tags->lock);

Interrupts are already disabled. Drop the spinlock.

+	if (tags->nr_free)
+		tag = tags->freelist[--tags->nr_free];

You can keep this or avoid address calculation through segment prefixes. F.e.

if (__this_cpu_read(tags->nrfree) {
	int n = __this_cpu_dec_return(tags->nr_free);
	tag = __this_cpu_read(tags->freelist[n]);
}

Can you explain what the point of that change would be? It sounds like it's preferable to do it that way and avoid this_cpu_ptr() for some reason, but you're not explaining why.
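For readers following the move_tags() hunk quoted in this thread, here is a minimal userspace sketch of what that helper does, assuming plain arrays stand in for the percpu freelists. move_tags() mirrors the quoted code; steal_half() is a hypothetical caller added only for illustration and is not taken from the patch, which picks a remote cpu at random and has its own policy for how much to steal.

```c
#include <assert.h>
#include <string.h>

/*
 * Userspace copy of the move_tags() helper quoted from the patch:
 * move the last `nr` tags of `src` onto the end of `dst`.
 * The caller must make sure `dst` has room for `nr` more entries.
 */
static void move_tags(unsigned *dst, unsigned *dst_nr,
		      unsigned *src, unsigned *src_nr,
		      unsigned nr)
{
	assert(nr <= *src_nr);	/* cannot take more than src holds */
	*src_nr -= nr;
	memcpy(dst + *dst_nr, src + *src_nr, sizeof(unsigned) * nr);
	*dst_nr += nr;
}

/*
 * Hypothetical illustration only: take half of a "remote" freelist
 * (rounded up) and append it to the local one. Returns the number of
 * tags moved.
 */
static unsigned steal_half(unsigned *local, unsigned *local_nr,
			   unsigned *remote, unsigned *remote_nr)
{
	unsigned nr = (*remote_nr + 1) / 2;

	move_tags(local, local_nr, remote, remote_nr, nr);
	return nr;
}
```

In the kernel version both freelists are shared between cpus once stealing is allowed, which is the reason given above for keeping the spinlock; this single-threaded sketch leaves locking out.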
Re: kernel 3.10.1 - NMI received for unknown reason
On 25.08.2013 13:45, Gleb Natapov wrote:
On Fri, Aug 09, 2013 at 09:14:13PM +0200, Stefan Pietsch wrote:
On 04.08.2013 14:44, Gleb Natapov wrote:
On Fri, Aug 02, 2013 at 08:24:38AM +0200, Stefan Pietsch wrote:
On 31.07.2013 11:20, Gleb Natapov wrote:
On Wed, Jul 31, 2013 at 11:10:01AM +0200, Stefan Pietsch wrote:
On 30.07.2013 07:31, Gleb Natapov wrote:

What happens if you run perf on your host (perf record -a)? Do you see the same NMI messages?

It seems that perf record -a triggers some delayed NMI messages. They appear about 20 or 30 minutes after the command. This seems strange.

Definitely strange. KVM guest is not running in parallel, correct? 20, 30 minutes after perf stopped running or it is running all of the time?

No, the KVM guest is not running in parallel. But I'm not able to clearly reproduce the NMI messages with perf record. I start perf record -a and after some minutes I stop the recording. After that it seems NMI messages appear within a random period of time. So, I cannot tell what triggers the messages.

When you run KVM with the coreduo cpu model it emulates a PMU, which basically makes it a perf front end. If you can reproduce the messages with perf too, it probably means that the problem is not in KVM itself. If you disable the NMI watchdog in the guest the messages may go away. Can you send your guest's dmesg when you boot it with coreduo mode?

The NMI messages appear in the host only. The guest runs as usual.

I understand that. But enabling the guest NMI watchdog is what makes KVM use the perf subsystem and likely causes these host messages. Try to disable the NMI watchdog in the guest and see what happens.

I disabled the watchdog in the guest by booting the kernel with nmi_watchdog=0. This does not produce any NMI errors.
Re: Windows Server 2008R2 KVM guest performance issues
On 8/26/2013 3:15 PM, Brian Rak wrote:

I've been trying to track down the cause of some serious performance issues with a Windows 2008R2 KVM guest. So far, I've been unable to determine what exactly is causing the issue.

When the guest is under load, I see very high kernel CPU usage, as well as terrible guest performance. The workload on the guest is approximately 1/4 of what we'd run unvirtualized on the same hardware. Even at that level, we max out every vCPU in the guest. While the guest runs, I see very high kernel CPU usage (based on `htop` output).

Host setup:
Linux nj1058 3.10.8-1.el6.elrepo.x86_64 #1 SMP Tue Aug 20 18:48:29 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux
CentOS 6
qemu 1.6.0
2x Intel E5-2630 (virtualization extensions turned on, total of 24 cores including hyperthread cores)
24GB memory
swap file is enabled, but unused

Guest setup:
Windows Server 2008R2 (64 bit)
24 vCPUs
16 GB memory
VirtIO disk and network drivers installed

/qemu16/bin/qemu-system-x86_64 -name VMID100 -S -machine pc-i440fx-1.6,accel=kvm,usb=off -cpu host,hv_relaxed,hv_vapic,hv_spinlocks=0x1000 -m 15259 -smp 24,sockets=1,cores=12,threads=2 -uuid 90301200-8d47-6bb3-0623-bed7c8b1dd7c -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/libvirt111/var/lib/libvirt/qemu/VMID100.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc,driftfix=slew -no-hpet -boot c -usb -drive file=/dev/vmimages/VMID100,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=18,id=hostnet0,vhost=on,vhostfd=19 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:00:2c:6d,bus=pci.0,addr=0x3 -vnc 127.0.0.1:100 -k en-us -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

The beginning of `perf top` output:

Samples: 62M of event 'cycles', Event count (approx.): 642019289177

64.69% [kernel] [k] _raw_spin_lock
2.59% qemu-system-x86_64 [.] 0x001e688d
1.90% [kernel] [k] native_write_msr_safe
0.84% [kvm] [k] vcpu_enter_guest
0.80% [kernel] [k] __schedule
0.77% [kvm_intel] [k] vmx_vcpu_run
0.68% [kernel] [k] effective_load
0.65% [kernel] [k] update_cfs_shares
0.62% [kernel] [k] _raw_spin_lock_irq
0.61% [kernel] [k] native_read_msr_safe
0.56% [kernel] [k] enqueue_entity

I've captured 20,000 lines of kvm trace output. This can be found at https://gist.github.com/devicenull/fa8f49d4366060029ee4/raw/fb89720d34b43920be22e3e9a1d88962bf305da8/trace

So far, I've tried the following with very little effect:
* Disable HPET on the guest
* Enable hv_relaxed, hv_vapic, hv_spinlocks
* Enable SR-IOV
* Pin vCPUs to physical CPUs
* Forcing x2apic enabled in the guest (bcdedit /set x2apicpolicy yes)
* bcdedit /set useplatformclock yes and no

Any suggestions as to what I can do to get better performance out of this guest? Or reasons why I'm seeing such high kernel CPU usage with it?

I've done some additional research on this, and I believe that 'kvm_pio: pio_read at 0xb008 size 4 count 1' is related to Windows trying to read the PM timer. This timer appears to use the TSC in some cases (I think). I found this patchset: http://www.spinics.net/lists/kvm/msg91214.html which doesn't appear to be applied yet. Does it seem reasonable that this patchset would eliminate the need for Windows to read from the PM timer continuously?
Is fallback vhost_net to qemu for live migrate available?
Hi all,

I am participating in a project that is trying to port vhost_net to Xen. By changing the memory copy and notify mechanisms, virtio-net with vhost_net can currently run on Xen with good performance. TCP receive throughput of a single vNIC went from 2.77 Gbps up to 6 Gbps. On the VM receive side, I replaced grant_copy with grant_map + memcopy, which efficiently reduces the cost of the grant_table spin_lock in dom0, so whole-server TCP performance went from 5.33 Gbps up to 9.5 Gbps.

Now I am considering live migration of vhost_net on Xen. vhost_net uses vhost_log for live migration on KVM, but qemu on Xen doesn't manage the whole memory of the VM, so I am trying to fall back the datapath from vhost_net to qemu during live migration, and switch the datapath from qemu back to vhost_net after the VM has migrated to the new server.

My question is: why doesn't vhost_net do the same fallback for live migration on KVM, instead of using vhost_log to mark dirty pages? Is there any fundamental flaw in the idea of falling back the datapath from vhost_net to qemu for live migration?

Any questions about the details of vhost_net on Xen are welcome.

Thanks
Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support
On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:

Gleb, any chance you can put this (and the next one) into a tree to lock in the numbers ?

Applied it. Sorry for the slow response, I was on vacation and am still going through the email backlog.

Thanks. Since it's not in a topic branch that I can pull, I'm going to just cherry-pick them. However, they are in your queue branch, not next branch. Should I still assume this is a stable branch and that the numbers aren't going to change ?

Cheers,
Ben.
Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support
On Tue, 2013-08-27 at 14:19 +1000, Benjamin Herrenschmidt wrote:
On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:

Gleb, any chance you can put this (and the next one) into a tree to lock in the numbers ?

Applied it. Sorry for the slow response, I was on vacation and am still going through the email backlog.

Thanks. Since it's not in a topic branch that I can pull, I'm going to just cherry-pick them. However, they are in your queue branch, not next branch. Should I still assume this is a stable branch and that the numbers aren't going to change ?

Oh and Alexey mentions that there are two capabilities and you only applied one :-)

Cheers,
Ben.
Re: Is fallback vhost_net to qemu for live migrate available?
On Tue, Aug 27, 2013 at 11:32:31AM +0800, Qin Chuanyu wrote:

Hi all,

I am participating in a project that is trying to port vhost_net to Xen. By changing the memory copy and notify mechanisms, virtio-net with vhost_net can currently run on Xen with good performance. TCP receive throughput of a single vNIC went from 2.77 Gbps up to 6 Gbps. On the VM receive side, I replaced grant_copy with grant_map + memcopy, which efficiently reduces the cost of the grant_table spin_lock in dom0, so whole-server TCP performance went from 5.33 Gbps up to 9.5 Gbps.

Now I am considering live migration of vhost_net on Xen. vhost_net uses vhost_log for live migration on KVM, but qemu on Xen doesn't manage the whole memory of the VM, so I am trying to fall back the datapath from vhost_net to qemu during live migration, and switch the datapath from qemu back to vhost_net after the VM has migrated to the new server.

My question is: why doesn't vhost_net do the same fallback for live migration on KVM, instead of using vhost_log to mark dirty pages? Is there any fundamental flaw in the idea of falling back the datapath from vhost_net to qemu for live migration?

Any questions about the details of vhost_net on Xen are welcome.

Thanks

It should work, in practice. However, one issue with this approach that I see is that you are running two instances of virtio-net on the host: qemu and vhost-net, doubling your security surface for guest to host attack.

I don't exactly see why it matters that qemu doesn't manage the whole memory of a VM - vhost only needs to log memory writes that it performs.
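To make the "vhost only needs to log memory writes that it performs" point concrete, here is a simplified sketch of dirty-page logging into a bitmap. It is not the actual vhost log code: the function name log_write(), the 4 KiB page size and the in-process bitmap are all assumptions chosen for illustration.

```c
#include <limits.h>
#include <stdint.h>

#define LOG_PAGE_SHIFT 12	/* assumed 4 KiB logging granularity */
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)

/*
 * Mark every page touched by a write of `len` bytes at guest address
 * `addr` as dirty. One bit per page; the caller sizes `bitmap` to
 * cover the guest address range being logged.
 */
static void log_write(unsigned long *bitmap, uint64_t addr, uint64_t len)
{
	uint64_t first, last, page;

	if (len == 0)
		return;

	first = addr >> LOG_PAGE_SHIFT;
	last = (addr + len - 1) >> LOG_PAGE_SHIFT;

	for (page = first; page <= last; page++)
		bitmap[page / BITS_PER_WORD] |= 1UL << (page % BITS_PER_WORD);
}
```

The writer only has to know the guest addresses it wrote to; it does not need a view of all guest memory, which is the argument made in the reply above.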
Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
On 25/08/2013 17:04, Alexander Graf wrote:
On 24.08.2013, at 21:14, Yann Droneaud wrote:

KVM uses anon_inode_getfd() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor.

In such a case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec().

This patch sets the O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec().

Signed-off-by: Yann Droneaud ydrone...@opteya.com
Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com

Reviewed-by: Alexander Graf ag...@suse.de

Would it make sense to simply inherit the O_CLOEXEC flag from the parent kvm fd instead? That would give user space the power to keep fds across exec() if it wants to.

Does it make sense to use non-O_CLOEXEC file descriptors with KVM at all? Besides fork() not being supported by KVM, as described in Documentation/virtual/kvm/api.txt, the VMAs of the parent process go away as soon as you exec(). I'm not sure how you can use the inherited file descriptor in a sensible way after exec().

Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
On 26.08.2013 09:39, Paolo Bonzini wrote:
On 25/08/2013 17:04, Alexander Graf wrote:
On 24.08.2013, at 21:14, Yann Droneaud wrote:

This patch sets the O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec().

Signed-off-by: Yann Droneaud ydrone...@opteya.com
Link: http://lkml.kernel.org/r/cover.1377372576.git.ydrone...@opteya.com

Reviewed-by: Alexander Graf ag...@suse.de

Would it make sense to simply inherit the O_CLOEXEC flag from the parent kvm fd instead? That would give user space the power to keep fds across exec() if it wants to.

Does it make sense to use non-O_CLOEXEC file descriptors with KVM at all? Besides fork() not being supported by KVM, as described in Documentation/virtual/kvm/api.txt, the VMAs of the parent process go away as soon as you exec(). I'm not sure how you can use the inherited file descriptor in a sensible way after exec().

Sounds a lot like InfiniBand subsystem behavior: IB file descriptors are of no use across exec() since memory mappings tied to those fds won't be available in the new process:

https://lkml.org/lkml/2013/7/8/380
http://mid.gmane.org/f58540dc64fec1ac0e496dfcd3cc1...@meuh.org

Regards.

--
Yann Droneaud
OPTEYA
Re: [PATCH 2/2] ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag
On 26/08/2013 10:23, Yann Droneaud wrote:

Sounds a lot like InfiniBand subsystem behavior: IB file descriptors are of no use across exec() since memory mappings tied to those fds won't be available in the new process:

https://lkml.org/lkml/2013/7/8/380
http://mid.gmane.org/f58540dc64fec1ac0e496dfcd3cc1...@meuh.org

Yes, it is very similar.

Paolo
Re: [PATCH 0/2] kvm: use anon_inode_getfd() with O_CLOEXEC flag
On Sat, Aug 24, 2013 at 10:14:06PM +0200, Yann Droneaud wrote:

Hi,

Following a patchset asking to change calls to get_unused_flag() [1] to use O_CLOEXEC, Alex Williamson [2][3] decided to change VFIO to use the flag.

Since it's a related subsystem to KVM, using O_CLOEXEC for file descriptors created by KVM might be applicable too.

I'm suggesting to change calls to anon_inode_getfd() to use O_CLOEXEC as default flag.

This patchset should be reviewed to not break existing userspace programs.

BTW, if it's not applicable, I would suggest that new ioctls be added to the KVM subsystem; those ioctls would have a flag field added to their arguments. Such a flag would let userspace choose the open flag to use. See for example other APIs using anon_inode_getfd() such as fanotify, inotify, signalfd and timerfd.

You might be interested to read:

- Secure File Descriptor Handling (Ulrich Drepper, 2008) http://udrepper.livejournal.com/20407.html
- Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) http://danwalsh.livejournal.com/53603.html

Applied, thanks.

Regards.

[1] http://lkml.kernel.org/r/cover.1376327678.git.ydrone...@opteya.com
[2] http://lkml.kernel.org/r/1377186804.25163.17.ca...@ul30vt.home
[3] http://lkml.kernel.org/r/20130822171744.1297.13711.st...@bling.home

Yann Droneaud (2):
  kvm: use anon_inode_getfd() with O_CLOEXEC flag
  ppc: kvm: use anon_inode_getfd() with O_CLOEXEC flag

 arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
 arch/powerpc/kvm/book3s_64_vio.c    | 2 +-
 arch/powerpc/kvm/book3s_hv.c        | 2 +-
 virt/kvm/kvm_main.c                 | 6 +++---
 4 files changed, 6 insertions(+), 6 deletions(-)

--
1.8.3.1

--
Gleb.
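As a userspace illustration of the close-on-exec behaviour discussed in this thread, the sketch below opens a file with and without O_CLOEXEC and reads the flag back with fcntl(F_GETFD). The helper names fd_is_cloexec() and open_ro() are made up for this example; they only show the flag that the patches make anon_inode_getfd() request inside the kernel, and say nothing about the KVM ioctls themselves.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if fd is marked close-on-exec, 0 if not, -1 on error. */
static int fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) ? 1 : 0;
}

/*
 * Open `path` read-only. When `cloexec` is non-zero the flag is set
 * atomically at open time, so there is no window in which another
 * thread could fork()+exec() and leak the descriptor.
 */
static int open_ro(const char *path, int cloexec)
{
	return open(path, O_RDONLY | (cloexec ? O_CLOEXEC : 0));
}
```

Setting FD_CLOEXEC afterwards with fcntl(F_SETFD) would leave exactly the race that the cover letter wants to avoid, which is why the flag is passed at creation time.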
Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly
On 26.08.2013, at 05:28, Aneesh Kumar K.V wrote:
Alexander Graf ag...@suse.de writes:
On 23.08.2013, at 04:31, Aneesh Kumar K.V wrote:
Alexander Graf ag...@suse.de writes:
On 22.08.2013, at 12:37, Aneesh Kumar K.V wrote:

From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com

Isn't this you?

Yes. The patches are generated using git format-patch and sent by git send-email. That's how it always created patches for me. I am not sure if there is a config I can change to avoid having From:

We should be able to copy up to count bytes

Why?

Without this we end up doing

+struct kvm_get_htab_buf {
+	struct kvm_get_htab_header header;
+	/*
+	 * Older kernel required one extra byte.
+	 */
+	unsigned long hpte[3];
+} hpte_buf;

even though we are only looking for one hpte entry.

Ok, please give me an example with real numbers and why it breaks.

http://mid.gmane.org/1376995766-16526-4-git-send-email-aneesh.ku...@linux.vnet.ibm.com

Didn't quite get what you are looking for. As explained before, we now need to pass an array with array size 3 even though we know we need to read only 2 entries because kernel doesn't loop correctly.

But we need to do that regardless, because newer QEMU needs to be able to run on older kernels, no?

Alex
[PATCH 2/3] KVM: PPC: Book3S HV: Save/restore SIAR and SDAR along with other PMU registers
Currently we are not saving and restoring the SIAR and SDAR registers in the PMU (performance monitor unit) on guest entry and exit. The result is that performance monitoring tools in the guest could get false information about where a program was executing and what data it was accessing at the time of a performance monitor interrupt. This fixes it by saving and restoring these registers along with the other PMU registers on guest entry/exit.

This also provides a way for userspace to access these values for a vcpu via the one_reg interface.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 arch/powerpc/include/asm/kvm_host.h     |  2 ++
 arch/powerpc/kernel/asm-offsets.c       |  2 ++
 arch/powerpc/kvm/book3s_hv.c            | 12 ++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 ++++++++++++
 4 files changed, 28 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3328353..91b833d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -498,6 +498,8 @@ struct kvm_vcpu_arch {
 
 	u64 mmcr[3];
 	u32 pmc[8];
+	u64 siar;
+	u64 sdar;
 
 #ifdef CONFIG_KVM_EXIT_TIMING
 	struct mutex exit_timing_lock;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index a67c76e..822b6ba 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -506,6 +506,8 @@ int main(void)
 	DEFINE(VCPU_PRODDED, offsetof(struct kvm_vcpu, arch.prodded));
 	DEFINE(VCPU_MMCR, offsetof(struct kvm_vcpu, arch.mmcr));
 	DEFINE(VCPU_PMC, offsetof(struct kvm_vcpu, arch.pmc));
+	DEFINE(VCPU_SIAR, offsetof(struct kvm_vcpu, arch.siar));
+	DEFINE(VCPU_SDAR, offsetof(struct kvm_vcpu, arch.sdar));
 	DEFINE(VCPU_SLB, offsetof(struct kvm_vcpu, arch.slb));
 	DEFINE(VCPU_SLB_MAX, offsetof(struct kvm_vcpu, arch.slb_max));
 	DEFINE(VCPU_SLB_NR, offsetof(struct kvm_vcpu, arch.slb_nr));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2b95c45..9df824f 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -771,6 +771,12 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
 		}
 		break;
 #endif /* CONFIG_VSX */
+	case KVM_REG_PPC_SIAR:
+		*val = get_reg_val(id, vcpu->arch.siar);
+		break;
+	case KVM_REG_PPC_SDAR:
+		*val = get_reg_val(id, vcpu->arch.sdar);
+		break;
 	case KVM_REG_PPC_VPA_ADDR:
 		spin_lock(&vcpu->arch.vpa_update_lock);
 		*val = get_reg_val(id, vcpu->arch.vpa.next_gpa);
@@ -855,6 +861,12 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, union kvmppc_one_reg *val)
 		}
 		break;
 #endif /* CONFIG_VSX */
+	case KVM_REG_PPC_SIAR:
+		vcpu->arch.siar = set_reg_val(id, *val);
+		break;
+	case KVM_REG_PPC_SDAR:
+		vcpu->arch.sdar = set_reg_val(id, *val);
+		break;
 	case KVM_REG_PPC_VPA_ADDR:
 		addr = set_reg_val(id, *val);
 		r = -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 60dce5b..2e1dd6c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -198,6 +198,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
 	ld	r6, VCPU_MMCR + 16(r4)
 	mtspr	SPRN_MMCR1, r5
 	mtspr	SPRN_MMCRA, r6
+BEGIN_FTR_SECTION
+	ld	r7, VCPU_SIAR(r4)
+	ld	r8, VCPU_SDAR(r4)
+	mtspr	SPRN_SIAR, r7
+	mtspr	SPRN_SDAR, r8
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 	mtspr	SPRN_MMCR0, r3
 	isync
 
@@ -1125,6 +1131,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 	std	r4, VCPU_MMCR(r9)
 	std	r5, VCPU_MMCR + 8(r9)
 	std	r6, VCPU_MMCR + 16(r9)
+BEGIN_FTR_SECTION
+	mfspr	r7, SPRN_SIAR
+	mfspr	r8, SPRN_SDAR
+	std	r7, VCPU_SIAR(r9)
+	std	r8, VCPU_SDAR(r9)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 	mfspr	r3, SPRN_PMC1
 	mfspr	r4, SPRN_PMC2
 	mfspr	r5, SPRN_PMC3
--
1.8.4.rc3
[PATCH 1/3] KVM: PPC: Book3S HV: Add one_reg definitions for more PMU registers
This adds one_reg register numbers for two performance monitor registers that exist on POWER7 and later processors (SIAR and SDAR) and three that will be introduced on POWER8 (MMCR2, MMCRS and SIER).

Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt   | 5 +++++
 arch/powerpc/include/uapi/asm/kvm.h | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 66dd2aa..8b4d984 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1765,6 +1765,11 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_MMCR0 | 64
   PPC   | KVM_REG_PPC_MMCR1 | 64
   PPC   | KVM_REG_PPC_MMCRA | 64
+  PPC   | KVM_REG_PPC_MMCR2 | 64
+  PPC   | KVM_REG_PPC_MMCRS | 64
+  PPC   | KVM_REG_PPC_SIAR  | 64
+  PPC   | KVM_REG_PPC_SDAR  | 64
+  PPC   | KVM_REG_PPC_SIER  | 64
   PPC   | KVM_REG_PPC_PMC1  | 32
   PPC   | KVM_REG_PPC_PMC2  | 32
   PPC   | KVM_REG_PPC_PMC3  | 32
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..fb0a8a9 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -429,6 +429,11 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_MMCR0	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x10)
 #define KVM_REG_PPC_MMCR1	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x11)
 #define KVM_REG_PPC_MMCRA	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x12)
+#define KVM_REG_PPC_MMCR2	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x13)
+#define KVM_REG_PPC_MMCRS	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x14)
+#define KVM_REG_PPC_SIAR	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x15)
+#define KVM_REG_PPC_SDAR	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x16)
+#define KVM_REG_PPC_SIER	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x17)
 
 #define KVM_REG_PPC_PMC1	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18)
 #define KVM_REG_PPC_PMC2	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x19)
--
1.8.4.rc3
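For readers unfamiliar with how these one_reg identifiers are put together, the following standalone sketch composes two of the IDs from the patch and decodes the register size from them. The numeric values of KVM_REG_PPC, KVM_REG_SIZE_U32/U64, KVM_REG_SIZE_MASK and KVM_REG_SIZE_SHIFT are not part of this patch; they are reproduced here from memory of include/uapi/linux/kvm.h and should be treated as assumptions to check against your kernel headers. one_reg_size_bytes() is a made-up helper name.

```c
#include <stdint.h>

/* Assumed values, as recalled from include/uapi/linux/kvm.h. */
#define KVM_REG_PPC		0x1000000000000000ULL
#define KVM_REG_SIZE_SHIFT	52
#define KVM_REG_SIZE_MASK	0x00f0000000000000ULL
#define KVM_REG_SIZE_U32	0x0020000000000000ULL
#define KVM_REG_SIZE_U64	0x0030000000000000ULL

/* Two of the identifiers from the patch above. */
#define KVM_REG_PPC_SIAR	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x15)
#define KVM_REG_PPC_PMC1	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x18)

/* Size in bytes encoded in a one_reg id: 1 << (size field). */
static unsigned one_reg_size_bytes(uint64_t id)
{
	return 1u << ((id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT);
}
```

Under these assumptions the 64-bit SIAR id decodes to 8 bytes and the 32-bit PMC1 id to 4 bytes, matching the "64" and "32" columns added to api.txt.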
[PATCH 3/3] KVM: PPC: Book3S HV: Implement timebase offset for guests
This allows guests to have a different timebase origin from the host. This is needed for migration, where a guest can migrate from one host to another and the two hosts might have a different timebase origin. However, the timebase seen by the guest must not go backwards, and should go forwards only by a small amount corresponding to the time taken for the migration.

Therefore this provides a new per-vcpu value accessed via the one_reg interface using the new KVM_REG_PPC_TB_OFFSET identifier. This value defaults to 0 and is not modified by KVM. On entering the guest, this value is added onto the timebase, and on exiting the guest, it is subtracted from the timebase.

This is only supported for recent POWER hardware which has the TBU40 (timebase upper 40 bits) register. Writing to the TBU40 register only alters the upper 40 bits of the timebase, leaving the lower 24 bits unchanged. This provides a way to modify the timebase for guest migration without disturbing the synchronization of the timebase registers across CPU cores. This means that userspace must supply a value for the offset that has zeroes in the lower 24 bits. If the lower 24 bits are non-zero, they are ignored and taken as zeroes.

Timebase values stored in KVM structures (struct kvm_vcpu, struct kvmppc_vcore, etc.) are stored as host timebase values. The timebase values in the dispatch trace log need to be guest timebase values, however, since that is read directly by the guest. This moves the setting of vcpu->arch.dec_expires on guest exit to a point after we have restored the host timebase so that vcpu->arch.dec_expires is a host timebase value.

Signed-off-by: Paul Mackerras pau...@samba.org
---
 Documentation/virtual/kvm/api.txt       | 1 +
 arch/powerpc/include/asm/kvm_host.h     | 2 ++
 arch/powerpc/include/asm/reg.h          | 1 +
 arch/powerpc/include/uapi/asm/kvm.h     | 3 +++
 arch/powerpc/kernel/asm-offsets.c       | 1 +
 arch/powerpc/kvm/book3s_hv.c            | 8 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 50 +++--
 7 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 8b4d984..88f4653 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1815,6 +1815,7 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_TLB3PS    | 32
   PPC   | KVM_REG_PPC_EPTCFG    | 32
   PPC   | KVM_REG_PPC_ICP_STATE | 64
+  PPC   | KVM_REG_PPC_TB_OFFSET | 64
 
 ARM registers are mapped using the lower 32 bits. The upper 16 of that
 is the register group type, or coprocessor number:
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 91b833d..702d88b 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -607,6 +607,8 @@ struct kvm_vcpu_arch {
 	spinlock_t tbacct_lock;
 	u64 busy_stolen;
 	u64 busy_preempt;
+
+	u64 tb_offset;		/* guest timebase - host timebase */
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 4a9e408..72f8798 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -243,6 +243,7 @@
 #define SPRN_TBRU	0x10D	/* Time Base Read Upper Register (user, R/O) */
 #define SPRN_TBWL	0x11C	/* Time Base Lower Register (super, R/W) */
 #define SPRN_TBWU	0x11D	/* Time Base Upper Register (super, R/W) */
+#define SPRN_TBU40	0x11E	/* Timebase upper 40 bits (hyper, R/W) */
 #define SPRN_SPURR	0x134	/* Scaled PURR */
 #define SPRN_HSPRG0	0x130	/* Hypervisor Scratch 0 */
 #define SPRN_HSPRG1	0x131	/* Hypervisor Scratch 1 */
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index fb0a8a9..9935321 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -504,6 +504,9 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_TLB3PS	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9a)
 #define KVM_REG_PPC_EPTCFG	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x9b)
 
+/* Timebase offset */
+#define KVM_REG_PPC_TB_OFFSET	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x9c)
+
 /* PPC64 eXternal Interrupt Controller Specification */
 #define KVM_DEV_XICS_GRP_SOURCES	1	/* 64-bit source attributes */
 
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 822b6ba..62acafd 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -488,6 +488,7 @@ int main(void)
 	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
 	DEFINE(VCPU_VPA, offsetof(struct kvm_vcpu, arch.vpa.pinned_addr));
 	DEFINE(VCPU_VPA_DIRTY, offsetof(struct kvm_vcpu, arch.vpa.dirty));
+	DEFINE(VCPU_TB_OFFSET, offsetof(struct kvm_vcpu, arch.tb_offset));
 #endif
 #ifdef CONFIG_PPC_BOOK3S
 	DEFINE(VCPU_VCPUID, offsetof(struct kvm_vcpu, vcpu_id));
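The rule that the lower 24 bits of the offset are ignored can be illustrated with a small arithmetic sketch. This is plain C modelling the behaviour described in the commit message, not the assembly in book3s_hv_rmhandlers.S; the names tb_offset_effective() and guest_tb() are invented for this example.

```c
#include <stdint.h>

#define TB_LOW_BITS 24	/* TBU40 writes leave the lower 24 bits untouched */

/*
 * The offset as the commit message says it is applied: any non-zero
 * lower 24 bits supplied by userspace are taken as zeroes.
 */
static uint64_t tb_offset_effective(uint64_t requested)
{
	return requested & ~((UINT64_C(1) << TB_LOW_BITS) - 1);
}

/* Guest timebase = host timebase + effective offset (modulo 2^64). */
static uint64_t guest_tb(uint64_t host_tb, uint64_t requested_offset)
{
	return host_tb + tb_offset_effective(requested_offset);
}
```

Because only the upper 40 bits change, the low 24 bits of the guest timebase always equal those of the host timebase, which is what keeps the timebase registers synchronized across cores.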
[PATCH 0/3] Some fixes for PPC HV-style KVM
Here are 3 patches that add two PMU (performance monitor unit) registers to the set being context-switched on guest entry and exit, and implement a per-guest timebase offset that is needed when we migrate a guest from one host to another that has a different timebase origin.

The first patch just adds some one_reg register definitions for extra PMU registers, including some that exist on POWER8. These new registers aren't yet handled by the kernel code, but their definitions are included here so as to reserve the numbers.

These patches are against Alex Graf's kvm-ppc-queue branch.

Paul.
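
To make the migration case concrete, here is a minimal sketch of the arithmetic, assuming only the definition given in the patch (tb_offset is guest timebase minus host timebase). The function names are hypothetical and time elapsed during the migration itself is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Guest-visible timebase for a given host timebase and per-guest offset.
 * Unsigned arithmetic wraps modulo 2^64, so "negative" offsets work too. */
uint64_t guest_tb(uint64_t host_tb, uint64_t tb_offset)
{
    return host_tb + tb_offset;
}

/* Offset the destination host would need so that the guest timebase
 * continues from the value it had on the source host. */
uint64_t tb_offset_for(uint64_t saved_guest_tb, uint64_t dest_host_tb)
{
    return saved_guest_tb - dest_host_tb;
}

/* Example: the destination host's timebase is ahead of the guest's. */
void tb_offset_example(void)
{
    uint64_t saved = guest_tb(1000000, 0);       /* guest TB on the source */
    uint64_t off = tb_offset_for(saved, 5000000); /* wraps below zero */

    assert(guest_tb(5000000, off) == saved);
}
```

The point of the sketch is only that a single 64-bit offset per guest is enough to hide a different timebase origin on the destination host.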
Re: [PATCH] powerpc/kvm: Handle the boundary condition correctly
Alexander Graf ag...@suse.de writes:

> On 26.08.2013, at 05:28, Aneesh Kumar K.V wrote:
>
>> Alexander Graf ag...@suse.de writes:
>>
>>> On 23.08.2013, at 04:31, Aneesh Kumar K.V wrote:
>>>
>>>> Alexander Graf ag...@suse.de writes:
>>>>
>>>>> On 22.08.2013, at 12:37, Aneesh Kumar K.V wrote:
>>>>>
>>>>>> From: Aneesh Kumar K.V aneesh.ku...@linux.vnet.ibm.com
>>>>>
>>>>> Isn't this you?
>>>>
>>>> Yes. The patches are generated using git format-patch and sent by git send-email. That's how it has always created patches for me. I am not sure if there is a config I can change to avoid having From:
>>>>
>>>>>> We should be able to copy up to count bytes
>>>>>
>>>>> Why?
>>>>
>>>> Without this we end up doing
>>>>
>>>> +struct kvm_get_htab_buf {
>>>> +    struct kvm_get_htab_header header;
>>>> +    /*
>>>> +     * Older kernel required one extra byte.
>>>> +     */
>>>> +    unsigned long hpte[3];
>>>> +} hpte_buf;
>>>>
>>>> even though we are only looking for one hpte entry.
>>>
>>> Ok, please give me an example with real numbers and why it breaks.
>>
>> http://mid.gmane.org/1376995766-16526-4-git-send-email-aneesh.ku...@linux.vnet.ibm.com
>>
>> Didn't quite get what you are looking for. As explained before, we now need to pass an array of size 3 even though we know we need to read only 2 entries, because the kernel doesn't loop correctly.
>
> But we need to do that regardless, because newer QEMU needs to be able to run on older kernels, no?

Yes. So user space will have to pass an array of size 3. But that should not prevent us from fixing this, right?

-aneesh
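
Since the thread asks for an example with real numbers, here is a simplified model of the boundary condition. It is not the kernel's code: the loop below is an illustration that assumes the reader writes one 8-byte header followed by 16-byte HPTEs and compares the running total against count with either a strict or an inclusive bound. HDR_SIZE, HPTE_SIZE and the function names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HDR_SIZE  8   /* assumed size of struct kvm_get_htab_header */
#define HPTE_SIZE 16  /* one HPTE: two 64-bit words, as in the thread */

/* How many HPTEs fit into a user buffer of 'count' bytes.
 * inclusive == false models a strict '<' bound (older behaviour as
 * described in the thread); inclusive == true models '<='. */
size_t hptes_returned(size_t count, size_t available, bool inclusive)
{
    size_t used = HDR_SIZE;  /* one header precedes the entries */
    size_t n = 0;

    if (count < HDR_SIZE)
        return 0;
    while (n < available) {
        size_t end = used + HPTE_SIZE;
        bool fits = inclusive ? (end <= count) : (end < count);

        if (!fits)
            break;
        used = end;
        n++;
    }
    return n;
}

/* Real numbers: header + hpte[2] is 24 bytes, header + hpte[3] is 32. */
void boundary_example(void)
{
    assert(hptes_returned(24, 1, false) == 0); /* exact fit is rejected */
    assert(hptes_returned(32, 1, false) == 1); /* padded buffer works */
    assert(hptes_returned(24, 1, true) == 1);  /* inclusive bound: exact fit works */
}
```

In this model a caller that wants a single entry gets nothing back from an exactly sized 24-byte buffer under the strict bound, which is why the workaround pads the array to hpte[3]; with an inclusive bound the exact size is enough, and the padded buffer keeps working on kernels with either behaviour.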
Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support
On Wed, Aug 14, 2013 at 10:51:14AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2013-08-01 at 14:44 +1000, Alexey Kardashevskiy wrote:
> > This is to reserve a capability number for upcoming support
> > of H_PUT_TCE_INDIRECT and H_STUFF_TCE pseries hypercalls
> > which support multiple DMA map/unmap operations per one call.
>
> Gleb, any chance you can put this (and the next one) into a tree to
> lock in the numbers ?

Applied it. Sorry for the slow response, I was on vacation and am still going through the email backlog.

> I've been wanting to apply the whole series to powerpc-next, that
> stuff has been simmering for way too long and is in a good enough shape
> imho, but I need the capabilities and ioctl numbers locked in your tree
> first.
>
> Cheers,
> Ben.
>
> > Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
> > ---
> > Changes:
> > 2013/07/16:
> > * changed the number
> >
> > Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
> > ---
> >  include/uapi/linux/kvm.h | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index acccd08..99c2533 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
> >  #define KVM_CAP_PPC_RTAS 91
> >  #define KVM_CAP_IRQ_XICS 92
> >  #define KVM_CAP_ARM_EL1_32BIT 93
> > +#define KVM_CAP_SPAPR_MULTITCE 94
> >
> >  #ifdef KVM_CAP_IRQ_ROUTING

--
Gleb.
Re: [PATCH 02/10] KVM: PPC: reserve a capability number for multitce support
On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:
> > Gleb, any chance you can put this (and the next one) into a tree to
> > lock in the numbers ?
>
> Applied it. Sorry for the slow response, I was on vacation and am still
> going through the email backlog.

Thanks. Since it's not in a topic branch that I can pull, I'm going to just cherry-pick them.

However, they are in your queue branch, not your next branch. Should I still assume this is a stable branch and that the numbers aren't going to change ?

Cheers,
Ben.