Re: Nested paging in nested SVM setup
Hi all, Please excuse me for reviving a two-month-old thread, but I only recently had time to investigate the issue further. On 18.06.2014 18:47, Jan Kiszka wrote: On 2014-06-18 13:36, Valentine Sinitsyn wrote: If we want to provide useful nested SVM support, this must be feasible. If there is a bug, it has to be fixed. Looks like it is a bug in KVM. I had a chance to run the same code bare-metal ([1], line 310 is uncommented for the bare-metal case but present for nested SVM), and it seems to work as expected. However, when I trace it in a nested SVM setup, after some successful APIC reads and writes, I get the following:
qemu-system-x86-1968 [001] 220417.681261: kvm_nested_vmexit: rip: 0x8104f5b8 reason: npf ext_inf1: 0x0001000f ext_inf2: 0xfee00300 ext_int: 0x ext_int_err: 0x
qemu-system-x86-1968 [001] 220417.681261: kvm_page_fault: address fee00300 error_code f
qemu-system-x86-1968 [001] 220417.681263: kvm_emulate_insn: 0:8104f5b8:89 04 25 00 93 5f ff (prot64)
qemu-system-x86-1968 [001] 220417.681268: kvm_inj_exception: (0x23c)
qemu-system-x86-1968 [001] 220417.681269: kvm_entry: vcpu 0
qemu-system-x86-1968 [001] 220417.681271: kvm_exit: reason rip 0x8104f5b8 info 0 0
You can see the problem here: the code tries to access an APIC MMIO register, which is trapped by KVM's MMU code (at the nested page table walk). During MMIO access emulation, KVM decides to inject exception 0x23c (which looks wrong, as no exception with this number is defined). After that things go awry (note the empty reason in the last line; the VMCB is certainly not in a state KVM expects/supports). I'm no KVM expert, and would be grateful for debugging suggestions (or maybe even assistance). Many thanks for the help. 1.
https://github.com/vsinitsyn/jailhouse/blob/amd-v/hypervisor/arch/x86/svm.c#L301 -- Regards, Valentine Sinitsyn -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] KVM: vmx: fix ept reserved bits for 1-GByte page
Il 20/08/2014 05:17, Wanpeng Li ha scritto: + else if (spte & (1ULL << 7)) You have to check level == 1 specifically here, or add... + /* + * 1GB/2MB page, bits 29:12 or 20:12 reserved respectively, + * level == 1 if the hypervisor is using the ignored bit 7. + */ + mask |= (PAGE_SIZE << ((level - 1) * 9)) - PAGE_SIZE; + else ... if (level > 1) here. Otherwise, you're marking bits 6:3 as reserved for 4K pages. This should cause a WARN, because KVM puts 0110 in those bits: ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; (in vmx_get_mt_mask: writeback memory, ignore PAT memory type from the guest's page tables) How are you testing this patch? Paolo + /* bits 6:3 reserved */ + mask |= 0x78;
Re: Nested paging in nested SVM setup
Il 20/08/2014 08:46, Valentine Sinitsyn ha scritto: You can see the problem here: the code tries to access an APIC MMIO register, which is trapped by KVM's MMU code (at the nested page table walk). During MMIO access emulation, KVM decides to inject exception 0x23c (which looks wrong, as no exception with this number is defined). After that things go awry (note the empty reason in the last line; the VMCB is certainly not in a state KVM expects/supports). I'm no KVM expert, and would be grateful for debugging suggestions (or maybe even assistance). Is the 0x23c always the same? Can you try this patch? diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 204422de3fed..194e9300a31b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -346,6 +346,7 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu, kvm_make_request(KVM_REQ_EVENT, vcpu); + WARN_ON(nr > 0x1f); if (!vcpu->arch.exception.pending) { queue: vcpu->arch.exception.pending = true; Paolo
Re: [PATCH v4] KVM: nVMX: nested TPR shadow/threshold emulation
Hi Paolo, On Tue, Aug 19, 2014 at 10:34:20AM +0200, Paolo Bonzini wrote: Il 19/08/2014 10:30, Wanpeng Li ha scritto: +if (vmx->nested.virtual_apic_page) +nested_release_page(vmx->nested.virtual_apic_page); +vmx->nested.virtual_apic_page = + nested_get_page(vcpu, vmcs12->virtual_apic_page_addr); +if (!vmx->nested.virtual_apic_page) +exec_control &= +~CPU_BASED_TPR_SHADOW; +else +vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, +page_to_phys(vmx->nested.virtual_apic_page)); + +/* + * If CR8 load exits are enabled, CR8 store exits are enabled, + * and virtualize APIC access is disabled, the processor would + * never notice. Doing it unconditionally is not correct, but + * it is the simplest thing. + */ +if (!(exec_control & CPU_BASED_TPR_SHADOW) && +!((exec_control & CPU_BASED_CR8_LOAD_EXITING) && +(exec_control & CPU_BASED_CR8_STORE_EXITING))) +nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); + You aren't checking virtualize APIC access here, but the comment mentions it. As the comment says, failing the entry unconditionally could be the simplest thing, which means moving the nested_vmx_failValid call inside the if (!vmx->nested.virtual_apic_page). If you want to check all of CR8_LOAD/CR8_STORE/VIRTUALIZE_APIC_ACCESS, please mention in the comment that failing the vm entry is _not_ what the processor does but it's basically the only possibility we have. In that case, I would also place the if within the if (!vmx->nested.virtual_apic_page): it also simplifies the condition because you don't have to check CPU_BASED_TPR_SHADOW anymore. You can send v5 with these changes, and I'll apply it for 3.18. Thanks! Do you mean this? + /* +* Failing the vm entry is _not_ what the processor does
+* but it's basically the only possibility we have. +*/ + if (!vmx->nested.virtual_apic_page) + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); Regards, Wanpeng Li Paolo
Re: [PATCH/RFC] KVM: track pid for VCPU only on KVM_RUN ioctl
On 20/08/14 01:22, Wanpeng Li wrote: On Tue, Aug 19, 2014 at 04:04:03PM +0200, Christian Borntraeger wrote: On 18/08/14 07:02, Wanpeng Li wrote: Hi Christian, On Tue, Aug 05, 2014 at 04:44:14PM +0200, Christian Borntraeger wrote: We currently track the pid of the task that runs the VCPU in vcpu_load. Since we call vcpu_load for all kinds of ioctls on a CPU, this causes hiccups due to synchronize_rcu if one CPU is modified by another CPU or the main thread (e.g. initialization, reset). We track the pid only for the purpose of yielding, so let's update the pid only in the KVM_RUN ioctl. In addition, don't do a synchronize_rcu on startup (pid == 0). This speeds up guest boot time on s390 noticeably for some configs, e.g. HZ=100, no full state tracking, 64 guest cpus 32 host cpus. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com CC: Rik van Riel r...@redhat.com CC: Raghavendra K T raghavendra...@linux.vnet.ibm.com CC: Michael Mueller m...@linux.vnet.ibm.com --- virt/kvm/kvm_main.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 9ae9135..ebc8f54 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -124,14 +124,6 @@ int vcpu_load(struct kvm_vcpu *vcpu) if (mutex_lock_killable(&vcpu->mutex)) return -EINTR; One question: - if (unlikely(vcpu->pid != current->pids[PIDTYPE_PID].pid)) { When will vcpu->pid and current->pids[PIDTYPE_PID].pid be different? If two different threads call an ioctl on a vcpu fd. (It must be an ioctl that does vcpu_load - almost all except for some interrupt injections) Thanks for your explanation. When can this happen? In general, by using clone and doing an ioctl in the new thread on a pre-existing fd. In qemu, e.g. by issuing a kvm_ioctl on a vcpu from the main thread or another cpu.
Re: [PATCH 7/9] KVM: VMX: abstract ple_window modifiers
Il 19/08/2014 22:35, Radim Krčmář ha scritto: They were almost identical and thus merged with a loathable macro. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- This solution is hopefully more acceptable than function pointers. I think a little amount of duplication is not a problem. Paolo arch/x86/kvm/vmx.c | 53 +++++++++++++++++++---------------------------------- 1 file changed, 19 insertions(+), 34 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index a236a9f..c6cfb71 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5694,42 +5694,27 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu) out: return ret; } -static void grow_ple_window(struct kvm_vcpu *vcpu) -{ - struct vcpu_vmx *vmx = to_vmx(vcpu); - int old = vmx->ple_window; - int new; - - if (ple_window_grow < 1) - new = ple_window; - else if (ple_window_grow < ple_window) - new = old * ple_window_grow; - else - new = old + ple_window_grow; - - vmx->ple_window = min(new, ple_window_max); - - trace_kvm_ple_window_grow(vcpu->vcpu_id, vmx->ple_window, old); +#define make_ple_window_modifier(type, oplt, opge, cmp, bound) \ +static void type##_ple_window(struct kvm_vcpu *vcpu) \ +{ \ + struct vcpu_vmx *vmx = to_vmx(vcpu); \ + int old = vmx->ple_window; \ + int new; \ +\ + if (ple_window_##type < 1) \ + new = ple_window; \ + else if (ple_window_##type < ple_window) \ + new = old oplt ple_window_##type; \ + else \ + new = old opge ple_window_##type; \ +\ + vmx->ple_window = cmp(new, bound); \ +\ + trace_kvm_ple_window_##type(vcpu->vcpu_id, vmx->ple_window, old); \ } -static void shrink_ple_window(struct kvm_vcpu *vcpu) -{ - struct vcpu_vmx *vmx = to_vmx(vcpu); - int old = vmx->ple_window; - int new; - - if (ple_window_shrink < 1) - new = ple_window; - else if (ple_window_shrink < ple_window) - new = old / ple_window_shrink; - else - new = old - ple_window_shrink; - - vmx->ple_window = max(new, ple_window); - - trace_kvm_ple_window_shrink(vcpu->vcpu_id, vmx->ple_window, old); -} +make_ple_window_modifier(grow, *, +, min, ple_window_max)
+make_ple_window_modifier(shrink, /, -, max, ple_window) /* * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
Re: [PATCH 3/9] KVM: VMX: make PLE window per-vcpu
Il 19/08/2014 22:35, Radim Krčmář ha scritto: Change PLE window into per-vcpu variable, seeded from module parameter, to allow greater flexibility. Brings in a small overhead on every vmentry. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- I've been thinking about a general hierarchical per-vcpu variable model, but it's hard to have current performance and sane code. arch/x86/kvm/vmx.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 2b306f9..eaa5574 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -484,6 +484,9 @@ struct vcpu_vmx { /* Support for a guest hypervisor (nested VMX) */ struct nested_vmx nested; + + /* Dynamic PLE window. */ + int ple_window; }; enum segment_cache_field { @@ -4403,6 +4406,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) if (ple_gap) { vmcs_write32(PLE_GAP, ple_gap); vmcs_write32(PLE_WINDOW, ple_window); Is this necessary? + vmx->ple_window = ple_window; } vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0); @@ -7387,6 +7391,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) if (vmx->emulation_required) return; + if (ple_gap) + vmcs_write32(PLE_WINDOW, vmx->ple_window); + if (vmx->nested.sync_shadow_vmcs) { copy_vmcs12_to_shadow(vmx); vmx->nested.sync_shadow_vmcs = false;
Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
Il 19/08/2014 22:35, Radim Krčmář ha scritto: Every increase of ple_window_grow creates potential overflows. They are not serious, because we clamp ple_window and userspace is expected to fix ple_window_max within a second. --- arch/x86/kvm/vmx.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index d7f58e8..6873a0b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -138,7 +138,9 @@ module_param(ple_window, int, S_IRUGO | S_IWUSR); /* Default doubles per-vcpu window every exit. */ static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW; -module_param(ple_window_grow, int, S_IRUGO | S_IWUSR); +static struct kernel_param_ops ple_window_grow_ops; +module_param_cb(ple_window_grow, &ple_window_grow_ops, +&ple_window_grow, S_IRUGO | S_IWUSR); /* Default resets per-vcpu window every exit to ple_window. */ static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK; @@ -5717,6 +5719,36 @@ static void type##_ple_window(struct kvm_vcpu *vcpu) \ make_ple_window_modifier(grow, *, +) /* grow_ple_window */ make_ple_window_modifier(shrink, /, -) /* shrink_ple_window */ +static void clamp_ple_window_max(void) +{ + int maximum; + + if (ple_window_grow < 1) + return; + + if (ple_window_grow < ple_window) + maximum = INT_MAX / ple_window_grow; + else + maximum = INT_MAX - ple_window_grow; + + ple_window_max = clamp(ple_window_max, ple_window, maximum); +} I think avoiding overflows is better. In fact, I think you should call this function for ple_window_max too. You could keep the ple_window_max variable at the user-set value.
Whenever ple_window_grow or ple_window_max are changed, you can set an internal variable (let's call it ple_window_actual_max, but I'm not wed to this name) to the computed value, and then do: if (ple_window_grow < 1 || ple_window_actual_max < ple_window) new = ple_window; else if (ple_window_grow < ple_window) new = max(ple_window_actual_max, old) * ple_window_grow; else new = max(ple_window_actual_max, old) + ple_window_grow; (I think the || in the first if can be eliminated with some creativity in clamp_ple_window_max). Paolo +static int set_ple_window_grow(const char *arg, const struct kernel_param *kp) +{ + int ret; + + clamp_ple_window_max(); + ret = param_set_int(arg, kp); + + return ret; +} + +static struct kernel_param_ops ple_window_grow_ops = { + .set = set_ple_window_grow, + .get = param_get_int, +}; + /* * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
Il 20/08/2014 09:16, Paolo Bonzini ha scritto: Il 19/08/2014 22:35, Radim Krčmář ha scritto: Every increase of ple_window_grow creates potential overflows. They are not serious, because we clamp ple_window and userspace is expected to fix ple_window_max within a second. --- arch/x86/kvm/vmx.c | 34 +++++++++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index d7f58e8..6873a0b 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -138,7 +138,9 @@ module_param(ple_window, int, S_IRUGO | S_IWUSR); /* Default doubles per-vcpu window every exit. */ static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW; -module_param(ple_window_grow, int, S_IRUGO | S_IWUSR); +static struct kernel_param_ops ple_window_grow_ops; +module_param_cb(ple_window_grow, &ple_window_grow_ops, +&ple_window_grow, S_IRUGO | S_IWUSR); /* Default resets per-vcpu window every exit to ple_window. */ static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK; @@ -5717,6 +5719,36 @@ static void type##_ple_window(struct kvm_vcpu *vcpu) \ make_ple_window_modifier(grow, *, +) /* grow_ple_window */ make_ple_window_modifier(shrink, /, -) /* shrink_ple_window */ +static void clamp_ple_window_max(void) +{ +int maximum; + +if (ple_window_grow < 1) +return; + +if (ple_window_grow < ple_window) +maximum = INT_MAX / ple_window_grow; +else +maximum = INT_MAX - ple_window_grow; + +ple_window_max = clamp(ple_window_max, ple_window, maximum); +} I think avoiding overflows is better. In fact, I think you should call this function for ple_window_max too. You could keep the ple_window_max variable at the user-set value.
Whenever ple_window_grow or ple_window_max are changed, you can set an internal variable (let's call it ple_window_actual_max, but I'm not wed to this name) to the computed value, and then do: if (ple_window_grow < 1 || ple_window_actual_max < ple_window) new = ple_window; else if (ple_window_grow < ple_window) new = max(ple_window_actual_max, old) * ple_window_grow; else new = max(ple_window_actual_max, old) + ple_window_grow; Ehm, this should of course be min. Paolo (I think the || in the first if can be eliminated with some creativity in clamp_ple_window_max). Paolo +static int set_ple_window_grow(const char *arg, const struct kernel_param *kp) +{ +int ret; + +clamp_ple_window_max(); +ret = param_set_int(arg, kp); + +return ret; +} + +static struct kernel_param_ops ple_window_grow_ops = { +.set = set_ple_window_grow, +.get = param_get_int, +}; + /* * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
Re: [PATCH 5/9] KVM: VMX: clamp PLE window
Il 19/08/2014 22:35, Radim Krčmář ha scritto: Modifications could produce unwanted values of the PLE window (low or negative). Use ple_window and the maximal value that cannot overflow as bounds. ple_window_max defaults to a very high value, but it would make sense to set it to some fraction of the scheduler tick. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 66259fd..e1192fb 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -144,6 +144,10 @@ module_param(ple_window_grow, int, S_IRUGO); static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK; module_param(ple_window_shrink, int, S_IRUGO); +/* Default is to compute the maximum so we can never overflow. */ +static int ple_window_max = INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW; +module_param(ple_window_max, int, S_IRUGO); + extern const ulong vmx_return; #define NR_AUTOLOAD_MSRS 8 @@ -5704,7 +5708,7 @@ static void grow_ple_window(struct kvm_vcpu *vcpu) else new = old + ple_window_grow; - vmx->ple_window = new; + vmx->ple_window = min(new, ple_window_max); } Please introduce a dynamic overflow-avoiding ple_window_max (like what you have in patch 9) already in patch 4... static void shrink_ple_window(struct kvm_vcpu *vcpu) @@ -5720,7 +5724,7 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu) else new = old - ple_window_shrink; - vmx->ple_window = new; + vmx->ple_window = max(new, ple_window); ... and also squash this in patch 4. This patch can then introduce the ple_window_max module parameter (using module_param_cb to avoid overflows). Paolo } /*
[PATCH v4] KVM: vmx: fix ept reserved bits for 1-GByte page
The EPT misconfig handler in kvm checks which reason led to the EPT misconfiguration after vmexit. One of the reasons is that an EPT paging-structure entry is configured with settings reserved for future functionality. However, the handler can't identify whether a paging-structure entry with bits reserved for a 1-GByte page is configured, since a PDPTE that points to a 1-GByte page reserves bits 29:12 instead of the bits 7:3 that are reserved for a PDPTE referencing an EPT Page Directory. This patch fixes it by reserving bits 29:12 for 1-GByte pages. Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com --- v3 -> v4: * don't mask bits 6:3 as reserved for 4K pages v2 -> v3: * return 0xf8 for level == 4 * check spte & (1ULL << 7) if level == 1 * (rsvd_mask & 0x38) == 0 for large page or leaf page v1 -> v2: * same if statement covers both 2MB and 1GB pages * return 0xf8 for level == 4 * get the level by checking the return value of ept_rsvd_mask --- arch/x86/kvm/vmx.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index cad37d5..286c283 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5521,17 +5521,18 @@ static u64 ept_rsvd_mask(u64 spte, int level) for (i = 51; i > boot_cpu_data.x86_phys_bits; i--) mask |= (1ULL << i); - if (level > 2) + if (level == 4) /* bits 7:3 reserved */ mask |= 0xf8; - else if (level == 2) { - if (spte & (1ULL << 7)) - /* 2MB ref, bits 20:12 reserved */ - mask |= 0x1ff000; - else - /* bits 6:3 reserved */ - mask |= 0x78; - } + else if (spte & (1ULL << 7)) + /* +* 1GB/2MB page, bits 29:12 or 20:12 reserved respectively, +* level == 1 if the hypervisor is using the ignored bit 7.
+*/ + mask |= (PAGE_SIZE << ((level - 1) * 9)) - PAGE_SIZE; + else if (level > 1) + /* bits 6:3 reserved */ + mask |= 0x78; return mask; } @@ -5561,7 +5562,8 @@ static void ept_misconfig_inspect_spte(struct kvm_vcpu *vcpu, u64 spte, WARN_ON(1); } - if (level == 1 || (level == 2 && (spte & (1ULL << 7)))) { + /* bits 5:3 are _not_ reserved for large page or leaf page */ + if ((rsvd_bits & 0x38) == 0) { u64 ept_mem_type = (spte & 0x38) >> 3; if (ept_mem_type == 2 || ept_mem_type == 3 || -- 1.9.1
Re: [PATCH v3] KVM: vmx: fix ept reserved bits for 1-GByte page
On Wed, Aug 20, 2014 at 08:51:38AM +0200, Paolo Bonzini wrote: Il 20/08/2014 05:17, Wanpeng Li ha scritto: +else if (spte & (1ULL << 7)) You have to check level == 1 specifically here, or add... +/* + * 1GB/2MB page, bits 29:12 or 20:12 reserved respectively, + * level == 1 if the hypervisor is using the ignored bit 7. + */ +mask |= (PAGE_SIZE << ((level - 1) * 9)) - PAGE_SIZE; +else ... if (level > 1) here. Otherwise, you're marking bits 6:3 as reserved for 4K pages. This should cause a WARN, because KVM puts 0110 in those bits: ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT; (in vmx_get_mt_mask: writeback memory, ignore PAT memory type from the guest's page tables) Got it. Regards, Wanpeng Li How are you testing this patch? Paolo +/* bits 6:3 reserved */ +mask |= 0x78;
Re: Nested paging in nested SVM setup
Hi Paolo, On 20.08.2014 12:55, Paolo Bonzini wrote: Is the 0x23c always the same? No, it's just a garbage - I've seen other values as well (0x80 last time). Can you try this patch? Sure. It does print a warning: [ 2176.722098] [ cut here ] [ 2176.722118] WARNING: CPU: 0 PID: 1488 at /home/val/kvm-kmod/x86/x86.c:368 kvm_multiple_exception+0x121/0x130 [kvm]() [ 2176.722121] Modules linked in: kvm_amd(O) kvm(O) amd_freq_sensitivity snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel aesni_intel snd_hda_controller radeon snd_hda_codec ipmi_si aes_x86_64 ipmi_msghandler snd_hwdep ttm r8169 ppdev mii lrw gf128mul snd_pcm glue_helper drm_kms_helper snd_timer fam15h_power evdev drm shpchp snd ablk_helper cryptd microcode mac_hid soundcore serio_raw pcspkr i2c_algo_bit k10temp i2c_piix4 i2c_core parport_pc parport hwmon edac_core tpm_tis edac_mce_amd tpm video button acpi_cpufreq processor ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common atkbd libps2 ahci libahci ohci_pci ohci_hcd ehci_pci xhci_hcd libata ehci_hcd usbcore scsi_mod usb_common i8042 serio [last unloaded: kvm] [ 2176.722217] CPU: 0 PID: 1488 Comm: qemu-system-x86 Tainted: G W O 3.16.1-1-ARCH #1 [ 2176.71] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./IMB-A180, BIOS L0.17 05/24/2013 [ 2176.74] 25350f51 8800919fbbc0 8152ae6c [ 2176.79] 8800919fbbf8 8106e45d 880037f68000 [ 2176.722234] 0080 0001 81a4 [ 2176.722239] Call Trace: [ 2176.722250] [8152ae6c] dump_stack+0x4d/0x6f [ 2176.722257] [8106e45d] warn_slowpath_common+0x7d/0xa0 [ 2176.722262] [8106e58a] warn_slowpath_null+0x1a/0x20 [ 2176.722275] [a0651e41] kvm_multiple_exception+0x121/0x130 [kvm] [ 2176.722288] [a06594f8] x86_emulate_instruction+0x548/0x640 [kvm] [ 2176.722303] [a06653e1] kvm_mmu_page_fault+0x91/0xf0 [kvm] [ 2176.722310] [a04eb6a7] pf_interception+0xd7/0x180 [kvm_amd] [ 2176.722317] [8104e876] ? 
native_apic_mem_write+0x6/0x10 [ 2176.722323] [a04ef261] handle_exit+0x141/0x9d0 [kvm_amd] [ 2176.722335] [a065512c] ? kvm_set_cr8+0x1c/0x20 [kvm] [ 2176.722341] [a04ea3e0] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd] [ 2176.722355] [a065adc7] kvm_arch_vcpu_ioctl_run+0x597/0x1210 [kvm] [ 2176.722368] [a065705b] ? kvm_arch_vcpu_load+0xbb/0x200 [kvm] [ 2176.722378] [a064a152] kvm_vcpu_ioctl+0x2b2/0x5c0 [kvm] [ 2176.722384] [810b66b4] ? __wake_up+0x44/0x50 [ 2176.722390] [81200dcc] ? fsnotify+0x28c/0x370 [ 2176.722397] [811d4a70] do_vfs_ioctl+0x2d0/0x4b0 [ 2176.722403] [811df18e] ? __fget+0x6e/0xb0 [ 2176.722408] [811d4cd1] SyS_ioctl+0x81/0xa0 [ 2176.722414] [81530be9] system_call_fastpath+0x16/0x1b [ 2176.722418] ---[ end trace b0f81744c5a5ea4a ]--- Thanks, Valentine -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] KVM: nVMX: nested TPR shadow/threshold emulation
Il 20/08/2014 08:59, Wanpeng Li ha scritto: + /* + * Failing the vm entry is _not_ what the processor does + * but it's basically the only possibility we have. * We could still enter the guest if CR8 load exits are * enabled, CR8 store exits are enabled, and virtualize APIC * access is disabled; in this case the processor would never * use the TPR shadow and we could simply clear the bit from * the execution control. But such a configuration is useless, * so let's keep the code simple. + */ + if (!vmx->nested.virtual_apic_page) + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); I thought so, but I'm afraid it's too late to do nested_vmx_failValid here. Without a test case, I'd be more confident if you moved the nested_release_page/nested_get_page to a separate function, that nested_vmx_run calls before enter_guest_mode. The same function can map apic_access_page too, for cleanliness. Something like this: if (cpu_has_secondary_exec_ctrls() && nested_cpu_has(vmcs12, CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) && (vmcs12->secondary_vm_exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) { if (vmx->nested.apic_access_page) /* shouldn't happen */ nested_release_page(vmx->nested.apic_access_page); vmx->nested.apic_access_page = nested_get_page(vcpu, vmcs12->apic_access_addr); } if (...) { /* do the same for virtual_apic_page if CPU_BASED_TPR_SHADOW is set... */ /* * Failing the vm entry is _not_ what the processor does * but it's basically the only possibility we have. * We could still enter the guest if CR8 load exits are * enabled, CR8 store exits are enabled, and virtualize APIC * access is disabled; in this case the processor would never * use the TPR shadow and we could simply clear the bit from * the execution control. But such a configuration is useless, * so let's keep the code simple. */ if (!vmx->nested.virtual_apic_page) return -EFAULT; } return 0; ... Then nested_vmx_run can do the nested_vmx_failValid if the function returns an error.
Paolo
Re: [PATCH 1/9] KVM: add kvm_arch_sched_in
On 19/08/14 22:35, Radim Krčmář wrote: --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu) if (vcpu->preempted) vcpu->preempted = false; + kvm_arch_sched_in(vcpu, cpu); + kvm_arch_vcpu_load(vcpu, cpu); } Why can't you reuse kvm_arch_vcpu_load? It's also called on each sched_in and is architecture specific. Christian
[Bug 82761] DMAR:[fault reason 06] PTE Read access is not set
https://bugzilla.kernel.org/show_bug.cgi?id=82761 --- Comment #8 from Ansa89 ansalonistef...@gmail.com --- (In reply to Alex Williamson from comment #6) Are these 3 separate NICs plugged into PCI slots on the motherboard or is this a single triple-port card with embedded PCIe-to-PCI bridge? They are 3 separate NICs plugged into 3 separate PCI slots. You might be able to run the IOMMU in passthrough mode with iommu=pt r8169.use_dac=1, but note the warning in modinfo: use_dac: Enable PCI DAC. Unsafe on 32 bit PCI slot. Unfortunately if you don't enable use_dac, then intel_iommu will ignore the passthrough option for these devices. I tried using intel_iommu=pt, but it didn't work (it resulted in VT-d being disabled). However, with intel_iommu=on iommu=pt the errors remain (probably because I didn't add r8169.use_dac=1). I'm on a 64 bit system, but I think it has nothing to do with a 32 bit PCI slot. Also note that this problem has nothing to do with Virtualization/KVM. Drivers/Network or perhaps Drivers/PCI would be a more appropriate classification. I searched for an IOMMU section but it doesn't exist. I will probably change the classification to Drivers/PCI. (In reply to Alex Williamson from comment #7) I'm guessing this might be the motherboard here: MSI ZH77A-G43 Yes, that is my motherboard. Since you're apparently trying to use VT-d on this system for KVM and therefore presumably device assignment, I'll note that you will never be able to successfully assign the conventional PCI devices separately between guests or between host and guests. The IOMMU does not have the granularity to create separate IOMMU domains per PCI slot in this topology. Also, some (all?) Realtek NICs have some strange backdoors to PCI configuration space that make them poor targets for PCI device assignment: Yes, I'm trying to do device assignment, but not with those NICs: I want to pass only the nVidia PCIe VGA card to the guest, while all NICs (and the integrated VGA card) will remain available to the host.
It would be nice if there were a way to exclude these NICs from the IOMMU (or something like that). SIDE NOTE: in the qemu commit they talk about RTL8168, but I have real RTL8169 devices (the only RTL8168 device is the integrated NIC, and for that device I'm using the r8168 driver from Realtek, compiled by hand). -- You are receiving this mail because: You are watching the assignee of the bug.
Re: Nested paging in nested SVM setup
Il 20/08/2014 09:37, Valentine Sinitsyn ha scritto: Hi Paolo, On 20.08.2014 12:55, Paolo Bonzini wrote: Is the 0x23c always the same? No, it's just a garbage - I've seen other values as well (0x80 last time). Can you try this patch? Sure. It does print a warning: [ 2176.722098] [ cut here ] [ 2176.722118] WARNING: CPU: 0 PID: 1488 at /home/val/kvm-kmod/x86/x86.c:368 kvm_multiple_exception+0x121/0x130 [kvm]() [ 2176.722121] Modules linked in: kvm_amd(O) kvm(O) amd_freq_sensitivity snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel snd_hda_intel aesni_intel snd_hda_controller radeon snd_hda_codec ipmi_si aes_x86_64 ipmi_msghandler snd_hwdep ttm r8169 ppdev mii lrw gf128mul snd_pcm glue_helper drm_kms_helper snd_timer fam15h_power evdev drm shpchp snd ablk_helper cryptd microcode mac_hid soundcore serio_raw pcspkr i2c_algo_bit k10temp i2c_piix4 i2c_core parport_pc parport hwmon edac_core tpm_tis edac_mce_amd tpm video button acpi_cpufreq processor ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common atkbd libps2 ahci libahci ohci_pci ohci_hcd ehci_pci xhci_hcd libata ehci_hcd usbcore scsi_mod usb_common i8042 serio [last unloaded: kvm] [ 2176.722217] CPU: 0 PID: 1488 Comm: qemu-system-x86 Tainted: G W O 3.16.1-1-ARCH #1 [ 2176.71] Hardware name: To Be Filled By O.E.M. 
To Be Filled By O.E.M./IMB-A180, BIOS L0.17 05/24/2013 [ 2176.74] 25350f51 8800919fbbc0 8152ae6c [ 2176.79] 8800919fbbf8 8106e45d 880037f68000 [ 2176.722234] 0080 0001 81a4 [ 2176.722239] Call Trace: [ 2176.722250] [8152ae6c] dump_stack+0x4d/0x6f [ 2176.722257] [8106e45d] warn_slowpath_common+0x7d/0xa0 [ 2176.722262] [8106e58a] warn_slowpath_null+0x1a/0x20 [ 2176.722275] [a0651e41] kvm_multiple_exception+0x121/0x130 [kvm] [ 2176.722288] [a06594f8] x86_emulate_instruction+0x548/0x640 [kvm] [ 2176.722303] [a06653e1] kvm_mmu_page_fault+0x91/0xf0 [kvm] [ 2176.722310] [a04eb6a7] pf_interception+0xd7/0x180 [kvm_amd] [ 2176.722317] [8104e876] ? native_apic_mem_write+0x6/0x10 [ 2176.722323] [a04ef261] handle_exit+0x141/0x9d0 [kvm_amd] [ 2176.722335] [a065512c] ? kvm_set_cr8+0x1c/0x20 [kvm] [ 2176.722341] [a04ea3e0] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd] [ 2176.722355] [a065adc7] kvm_arch_vcpu_ioctl_run+0x597/0x1210 [kvm] [ 2176.722368] [a065705b] ? kvm_arch_vcpu_load+0xbb/0x200 [kvm] [ 2176.722378] [a064a152] kvm_vcpu_ioctl+0x2b2/0x5c0 [kvm] [ 2176.722384] [810b66b4] ? __wake_up+0x44/0x50 [ 2176.722390] [81200dcc] ? fsnotify+0x28c/0x370 [ 2176.722397] [811d4a70] do_vfs_ioctl+0x2d0/0x4b0 [ 2176.722403] [811df18e] ? __fget+0x6e/0xb0 [ 2176.722408] [811d4cd1] SyS_ioctl+0x81/0xa0 [ 2176.722414] [81530be9] system_call_fastpath+0x16/0x1b [ 2176.722418] ---[ end trace b0f81744c5a5ea4a ]--- Thanks, Valentine I audited the various places that return X86EMUL_PROPAGATE_FAULT and I think the culprit is this code in paging_tmpl.h: real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn), access); if (real_gpa == UNMAPPED_GVA) return 0; It returns zero without setting fault.vector. Another patch...
I will post parts of it separately; if I am right you should get 0xfe as the vector and a WARN from the gva_to_gpa function. diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index ef297919a691..e5bf13003cd2 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -527,6 +527,7 @@ static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg) static int emulate_exception(struct x86_emulate_ctxt *ctxt, int vec, u32 error, bool valid) { + WARN_ON(vec > 0x1f); ctxt->exception.vector = vec; ctxt->exception.error_code = error; ctxt->exception.error_code_valid = valid; @@ -3016,7 +3015,7 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt) ctxt->dst.val = swab64(ctxt->src.val); break; default: - return X86EMUL_PROPAGATE_FAULT; + BUG(); } return X86EMUL_CONTINUE; } @@ -4829,8 +4828,10 @@ writeback: ctxt->eip = ctxt->_eip; done: - if (rc == X86EMUL_PROPAGATE_FAULT) + if (rc == X86EMUL_PROPAGATE_FAULT) { + WARN_ON(ctxt->exception.vector > 0x1f); ctxt->have_exception = true; + } if (rc == X86EMUL_INTERCEPTED) return
Re: [PATCH v4] KVM: vmx: fix ept reserved bits for 1-GByte page
On 20/08/2014 09:31, Wanpeng Li wrote: The EPT misconfig handler in KVM checks, after a vmexit, which reason led to the EPT misconfiguration. One of the reasons is that an EPT paging-structure entry is configured with settings reserved for future functionality. However, the handler can't identify whether the reserved bits of a paging-structure entry for a 1-GByte page are set, since a PDPTE that points to a 1-GByte page reserves bits 29:12 instead of bits 7:3, which are reserved for a PDPTE that references an EPT page directory. This patch fixes it by reserving bits 29:12 for the 1-GByte page case. Thanks, the patch looks good. Can you describe how you detected the problem and how you're testing for it? Paolo
Re: [PATCH v4] KVM: vmx: fix ept reserved bits for 1-GByte page
On Wed, Aug 20, 2014 at 10:13:07AM +0200, Paolo Bonzini wrote: On 20/08/2014 09:31, Wanpeng Li wrote: The EPT misconfig handler in KVM checks, after a vmexit, which reason led to the EPT misconfiguration. One of the reasons is that an EPT paging-structure entry is configured with settings reserved for future functionality. However, the handler can't identify whether the reserved bits of a paging-structure entry for a 1-GByte page are set, since a PDPTE that points to a 1-GByte page reserves bits 29:12 instead of bits 7:3, which are reserved for a PDPTE that references an EPT page directory. This patch fixes it by reserving bits 29:12 for the 1-GByte page case. Thanks, the patch looks good. Can you describe how you detected the problem and how you're testing for it? I found the issue by reviewing the code. Regards, Wanpeng Li Paolo
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 20/08/2014 03:03, David Matlack wrote: On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: On 08/19/2014 05:03 PM, Paolo Bonzini wrote: On 19/08/2014 10:50, Xiao Guangrong wrote: Okay, what confused me is that it seemed the single-line patch was OK to you. :) No, it was late and I was confused. :) Now, do we really need to care about case 2? As David said: Sorry, I didn't explain myself very well: since we can get a single wrong mmio exit no matter what, it has to be handled in userspace. So my point was, it doesn't really help to fix that one very specific way that it can happen, because it can just happen in other ways (e.g. the memslots update occurs after is_noslot_pfn() and before the mmio exit). What's your idea? I think if you always treat the low bit as zero in mmio sptes, you can do that without losing a bit of the generation. What you did avoids caching an invalid generation number into the spte, but actually, if we can figure it out when we check the mmio access, that's OK. The updated patch I posted should fix it; that way avoids doubling the increment of the number. Yes. Okay, if you're interested in increasing the number twice, there is a simpler one: This wastes a bit in the mmio spte though. My idea is to increase the memslots generation twice, but drop the low bit in the mmio spte. Yeah, really smart idea. :) Paolo/David, would you mind making a patch for this (+ the comments in David's patch)? Paolo, since it was your idea would you like to write it? I don't mind either way. Sure, I'll post the patch for review. Paolo
Re: [PATCH] vhost: Add polling mode
On 10/08/14 10:30, Razya Ladelsky wrote: From: Razya Ladelsky ra...@il.ibm.com Date: Thu, 31 Jul 2014 09:47:20 +0300 Subject: [PATCH] vhost: Add polling mode When vhost is waiting for buffers from the guest driver (e.g., more packets to send in vhost-net's transmit queue), it normally goes to sleep and waits for the guest to kick it. This kick involves a PIO in the guest, and therefore an exit (and possibly userspace involvement in translating this PIO exit into a file descriptor event), all of which hurts performance. If the system is under-utilized (has cpu time to spare), vhost can continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. This patch adds an optional polling mode to vhost, that can be enabled via a kernel module parameter, poll_start_rate. When polling is active for a virtqueue, the guest is asked to disable notification (kicks), and the worker thread continuously checks for new buffers. When it does discover new buffers, it simulates a kick by invoking the underlying backend driver (such as vhost-net), which thinks it got a real kick from the guest, and acts accordingly. If the underlying driver asks not to be kicked, we disable polling on this virtqueue. We start polling on a virtqueue when we notice it has work to do. Polling on this virtqueue is later disabled after 3 seconds of polling turning up no new work, as in this case we are better off returning to the exit-based notification mechanism. The default timeout of 3 seconds can be changed with the poll_stop_idle kernel module parameter. This polling approach makes lot of sense for new HW with posted-interrupts for which we have exitless host-to-guest notifications. But even with support for posted interrupts, guest-to-host communication still causes exits. Polling adds the missing part. When systems are overloaded, there won't be enough cpu time for the various vhost threads to poll their guests' devices. 
For these scenarios, we plan to add support for vhost threads that can be shared by multiple devices, even of multiple vms. Our ultimate goal is to implement the I/O acceleration features described in: KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) https://www.youtube.com/watch?v=9EyweibHfEs and https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html I ran some experiments with TCP stream netperf and filebench (having 2 threads performing random reads) benchmarks on an IBM System x3650 M4. I have two machines, A and B. A hosts the vms, B runs the netserver. The vms (on A) run netperf, its destination server is running on B. All runs loaded the guests in a way that they were (cpu) saturated. For example, I ran netperf with 64B messages, which is heavily loading the vm (which is why its throughput is low). The idea was to get it 100% loaded, so we can see that the polling is getting it to produce higher throughput. The system had two cores per guest, as to allow for both the vcpu and the vhost thread to run concurrently for maximum throughput (but I didn't pin the threads to specific cores). My experiments were fair in a sense that for both cases, with or without polling, I run both threads, vcpu and vhost, on 2 cores (set their affinity that way). The only difference was whether polling was enabled/disabled. Results: Netperf, 1 vm: The polling patch improved throughput by ~33% (1516 MB/sec - 2046 MB/sec). Number of exits/sec decreased 6x. The same improvement was shown when I tested with 3 vms running netperf (4086 MB/sec - 5545 MB/sec). filebench, 1 vm: ops/sec improved by 13% with the polling patch. Number of exits was reduced by 31%. The same experiment with 3 vms running filebench showed similar numbers. Signed-off-by: Razya Ladelsky ra...@il.ibm.com Gave it a quick try on s390/kvm. As expected it makes no difference for big streaming workload like iperf. uperf with a 1-1 round robin got indeed faster by about 30%. 
The high CPU consumption is something that bothers me though, as virtualized systems tend to be full. +static int poll_start_rate = 0; +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR); +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling."); + +static int poll_stop_idle = 3*HZ; /* 3 seconds */ +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR); +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work."); This seems ridiculously high. Even one jiffy is an eternity, so setting it to 1 as a default would reduce the CPU overhead for most cases. If we don't have a packet in one millisecond, we can surely go back to the kick approach, I think. Christian
Re: [PATCH] KVM: x86: Warn on APIC base relocation
CC'ing the KVM mailing list, which I forgot. On Aug 20, 2014, at 11:12 AM, Nadav Amit na...@cs.technion.ac.il wrote: APIC base relocation is unsupported by KVM. If anyone uses it, the least we should do is report a warning in the hypervisor. Note that kvm-unit-tests performs APIC base relocation, and causes the warning to be printed. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/lapic.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 08e8a89..6655e20 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1416,6 +1416,11 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value) apic->base_address = apic->vcpu->arch.apic_base & MSR_IA32_APICBASE_BASE; + if ((value & MSR_IA32_APICBASE_ENABLE) && + apic->base_address != APIC_DEFAULT_PHYS_BASE) + printk_once(KERN_WARNING + "APIC base relocation is unsupported by KVM\n"); + /* with FSB delivery interrupt, we can restart APIC functionality */ apic_debug("apic base msr is 0x%016" PRIx64 ", and base address is 0x%lx.\n", apic->vcpu->arch.apic_base, apic->base_address); -- 1.9.1
[PATCH v2 1/2] KVM: Introduce gfn_to_hva_memslot_prot
To support read-only memory regions on arm and arm64, we have a need to resolve a gfn to an hva given a pointer to a memslot to avoid looping through the memslots twice and to reuse the hva error checking of gfn_to_hva_prot(), add a new gfn_to_hva_memslot_prot() function and refactor gfn_to_hva_prot() to use this function. Acked-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- Changelog[v2]: - Fix typo in patch title include/linux/kvm_host.h | 2 ++ virt/kvm/kvm_main.c | 11 +-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a4c33b3..85875e0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -528,6 +528,8 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn); unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn); unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable); unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn); +unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn, + bool *writable); void kvm_release_page_clean(struct page *page); void kvm_release_page_dirty(struct page *page); void kvm_set_page_accessed(struct page *page); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 33712fb..36b887d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1075,9 +1075,9 @@ EXPORT_SYMBOL_GPL(gfn_to_hva); * If writable is set to false, the hva returned by this function is only * allowed to be read. 
*/ -unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable) +unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, + gfn_t gfn, bool *writable) { - struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); unsigned long hva = __gfn_to_hva_many(slot, gfn, NULL, false); if (!kvm_is_error_hva(hva) && writable) @@ -1086,6 +1086,13 @@ unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable) return hva; } +unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable) +{ + struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn); + + return gfn_to_hva_memslot_prot(slot, gfn, writable); +} + static int kvm_read_hva(void *data, void __user *hva, int len) { return __copy_from_user(data, hva, len); -- 2.0.0
[PATCH v2 2/2] arm/arm64: KVM: Support KVM_CAP_READONLY_MEM
When userspace loads code and data in a read-only memory regions, KVM needs to be able to handle this on arm and arm64. Specifically this is used when running code directly from a read-only flash device; the common scenario is a UEFI blob loaded with the -bios option in QEMU. Note that the MMIO exit on writes to a read-only memory is ABI and can be used to emulate block-erase style flash devices. Acked-by: Marc Zyngier marc.zyng...@arm.com Signed-off-by: Christoffer Dall christoffer.d...@linaro.org --- arch/arm/include/uapi/asm/kvm.h | 1 + arch/arm/kvm/arm.c| 1 + arch/arm/kvm/mmu.c| 15 --- arch/arm64/include/uapi/asm/kvm.h | 1 + 4 files changed, 11 insertions(+), 7 deletions(-) diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h index e6ebdd3..51257fd 100644 --- a/arch/arm/include/uapi/asm/kvm.h +++ b/arch/arm/include/uapi/asm/kvm.h @@ -25,6 +25,7 @@ #define __KVM_HAVE_GUEST_DEBUG #define __KVM_HAVE_IRQ_LINE +#define __KVM_HAVE_READONLY_MEM #define KVM_REG_SIZE(id) \ (1U (((id) KVM_REG_SIZE_MASK) KVM_REG_SIZE_SHIFT)) diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c index a99e0cd..3ab3e60 100644 --- a/arch/arm/kvm/arm.c +++ b/arch/arm/kvm/arm.c @@ -188,6 +188,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_ONE_REG: case KVM_CAP_ARM_PSCI: case KVM_CAP_ARM_PSCI_0_2: + case KVM_CAP_READONLY_MEM: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c index 16e7994..dcbe01e 100644 --- a/arch/arm/kvm/mmu.c +++ b/arch/arm/kvm/mmu.c @@ -747,14 +747,13 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap) } static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, - struct kvm_memory_slot *memslot, + struct kvm_memory_slot *memslot, unsigned long hva, unsigned long fault_status) { int ret; bool write_fault, writable, hugetlb = false, force_pte = false; unsigned long mmu_seq; gfn_t gfn = fault_ipa PAGE_SHIFT; - unsigned long hva = 
gfn_to_hva(vcpu->kvm, gfn); struct kvm *kvm = vcpu->kvm; struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache; struct vm_area_struct *vma; @@ -863,7 +862,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) unsigned long fault_status; phys_addr_t fault_ipa; struct kvm_memory_slot *memslot; - bool is_iabt; + unsigned long hva; + bool is_iabt, write_fault, writable; gfn_t gfn; int ret, idx; @@ -884,7 +884,10 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) idx = srcu_read_lock(&vcpu->kvm->srcu); gfn = fault_ipa >> PAGE_SHIFT; - if (!kvm_is_visible_gfn(vcpu->kvm, gfn)) { + memslot = gfn_to_memslot(vcpu->kvm, gfn); + hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable); + write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu)); + if (kvm_is_error_hva(hva) || (write_fault && !writable)) { if (is_iabt) { /* Prefetch Abort on I/O address */ kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu)); @@ -910,9 +913,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run) goto out_unlock; } - memslot = gfn_to_memslot(vcpu->kvm, gfn); - - ret = user_mem_abort(vcpu, fault_ipa, memslot, fault_status); + ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status); if (ret == 0) ret = 1; out_unlock: diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h index e633ff8..f4ec5a6 100644 --- a/arch/arm64/include/uapi/asm/kvm.h +++ b/arch/arm64/include/uapi/asm/kvm.h @@ -37,6 +37,7 @@ #define __KVM_HAVE_GUEST_DEBUG #define __KVM_HAVE_IRQ_LINE +#define __KVM_HAVE_READONLY_MEM #define KVM_REG_SIZE(id) \ (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT)) -- 2.0.0
[Questions]
Hi MST, I saw "reduce networking latency" in the Networking Todo; the idea is to allow handling short packets from softirq or VCPU context. If from softirq context, how could the softirq copy an skb to guest memory? If the method is to use the mm_struct of QEMU, would it be expensive? If from VCPU context, maybe the internal operation of the virtual machine will have a significant delay. Thanks! -- Best Wishes! Zhang Jie
[PATCH v5 2/2] KVM: nVMX: introduce apic_access_and_virtual_page_valid
Introduce apic_access_and_virtual_page_valid() to check the valid of nested apic access page and virtual apic page earlier. Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com --- arch/x86/kvm/vmx.c | 82 ++ 1 file changed, 46 insertions(+), 36 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index caf239d..02bc07d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -7838,6 +7838,50 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu, kvm_inject_page_fault(vcpu, fault); } +static bool apic_access_and_virtual_page_valid(struct kvm_vcpu *vcpu, + struct vmcs12 *vmcs12) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + + if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) { + if (!PAGE_ALIGNED(vmcs12-apic_access_addr)) + /*TODO: Also verify bits beyond physical address width are 0*/ + return false; + + /* +* Translate L1 physical address to host physical +* address for vmcs02. Keep the page pinned, so this +* physical address remains valid. We keep a reference +* to it so we can release it later. +*/ + if (vmx-nested.apic_access_page) /* shouldn't happen */ + nested_release_page(vmx-nested.apic_access_page); + vmx-nested.apic_access_page = + nested_get_page(vcpu, vmcs12-apic_access_addr); + } + + if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) { + if (vmx-nested.virtual_apic_page) /* shouldn't happen */ + nested_release_page(vmx-nested.virtual_apic_page); + vmx-nested.virtual_apic_page = + nested_get_page(vcpu, vmcs12-virtual_apic_page_addr); + + /* +* Failing the vm entry is _not_ what the processor does +* but it's basically the only possibility we have. +* We could still enter the guest if CR8 load exits are +* enabled, CR8 store exits are enabled, and virtualize APIC +* access is disabled; in this case the processor would never +* use the TPR shadow and we could simply clear the bit from +* the execution control. But such a configuration is useless, +* so let's keep the code simple. 
+*/ + if (!vmx-nested.virtual_apic_page) + return false; + } + return true; +} + static void vmx_start_preemption_timer(struct kvm_vcpu *vcpu) { u64 preemption_timeout = get_vmcs12(vcpu)-vmx_preemption_timer_value; @@ -7984,16 +8028,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) if (exec_control SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) { /* -* Translate L1 physical address to host physical -* address for vmcs02. Keep the page pinned, so this -* physical address remains valid. We keep a reference -* to it so we can release it later. -*/ - if (vmx-nested.apic_access_page) /* shouldn't happen */ - nested_release_page(vmx-nested.apic_access_page); - vmx-nested.apic_access_page = - nested_get_page(vcpu, vmcs12-apic_access_addr); - /* * If translation failed, no matter: This feature asks * to exit when accessing the given address, and if it * can never be accessed, this feature won't do @@ -8040,30 +8074,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) exec_control |= vmcs12-cpu_based_vm_exec_control; if (exec_control CPU_BASED_TPR_SHADOW) { - if (vmx-nested.virtual_apic_page) - nested_release_page(vmx-nested.virtual_apic_page); - vmx-nested.virtual_apic_page = - nested_get_page(vcpu, vmcs12-virtual_apic_page_addr); - if (!vmx-nested.virtual_apic_page) - exec_control = - ~CPU_BASED_TPR_SHADOW; - else - vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, + vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, page_to_phys(vmx-nested.virtual_apic_page)); - - /* -* Failing the vm entry is _not_ what the processor does -* but it's basically the only possibility we have. -* We could still enter the guest if CR8 load exits are -* enabled, CR8
[PATCH v5 1/2] KVM: nVMX: nested TPR shadow/threshold emulation
This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411 TPR shadow/threshold feature is important to speed up the Windows guest. Besides, it is a must feature for certain VMM. We map virtual APIC page address and TPR threshold from L1 VMCS. If TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested in, we inject it into L1 VMM for handling. Reviewed-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com --- v4 - v5: * moving the nested_vmx_failValid call inside the if (!vmx-nested.virtual_apic_page) v3 - v4: * add Paolo's Reviewed-by * unconditionally fail the vmentry, with a comment * setup the TPR_SHADOW/virtual_apic_page of vmcs02 based on vmcs01 if L2 owns the APIC v2 - v3: * nested vm entry failure if both tpr shadow and cr8 exiting bits are not set v1 - v2: * don't take L0's virtualize APIC accesses setting into account * virtual_apic_page do exactly the same thing that is done for apic_access_page * add the tpr threshold field to the read-write fields for shadow VMCS arch/x86/kvm/vmx.c | 51 +-- 1 file changed, 49 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 286c283..caf239d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -379,6 +379,7 @@ struct nested_vmx { * we must keep them pinned while L2 runs. 
*/ struct page *apic_access_page; + struct page *virtual_apic_page; u64 msr_ia32_feature_control; struct hrtimer preemption_timer; @@ -533,6 +534,7 @@ static int max_shadow_read_only_fields = ARRAY_SIZE(shadow_read_only_fields); static unsigned long shadow_read_write_fields[] = { + TPR_THRESHOLD, GUEST_RIP, GUEST_RSP, GUEST_CR0, @@ -2330,7 +2332,7 @@ static __init void nested_vmx_setup_ctls_msrs(void) CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING | CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING | CPU_BASED_RDPMC_EXITING | CPU_BASED_RDTSC_EXITING | - CPU_BASED_PAUSE_EXITING | + CPU_BASED_PAUSE_EXITING | CPU_BASED_TPR_SHADOW | CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; /* * We can allow some features even when not supported by the @@ -6150,6 +6152,10 @@ static void free_nested(struct vcpu_vmx *vmx) nested_release_page(vmx-nested.apic_access_page); vmx-nested.apic_access_page = 0; } + if (vmx-nested.virtual_apic_page) { + nested_release_page(vmx-nested.virtual_apic_page); + vmx-nested.virtual_apic_page = 0; + } nested_free_all_saved_vmcss(vmx); } @@ -6938,7 +6944,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu) case EXIT_REASON_MCE_DURING_VMENTRY: return 0; case EXIT_REASON_TPR_BELOW_THRESHOLD: - return 1; + return nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW); case EXIT_REASON_APIC_ACCESS: return nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES); @@ -7059,6 +7065,12 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu) static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr) { + struct vmcs12 *vmcs12 = get_vmcs12(vcpu); + + if (is_guest_mode(vcpu) + nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) + return; + if (irr == -1 || tpr irr) { vmcs_write32(TPR_THRESHOLD, 0); return; @@ -8026,6 +8038,37 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) exec_control = ~CPU_BASED_VIRTUAL_NMI_PENDING; exec_control = ~CPU_BASED_TPR_SHADOW; exec_control |= vmcs12-cpu_based_vm_exec_control; + + if 
(exec_control CPU_BASED_TPR_SHADOW) { + if (vmx-nested.virtual_apic_page) + nested_release_page(vmx-nested.virtual_apic_page); + vmx-nested.virtual_apic_page = + nested_get_page(vcpu, vmcs12-virtual_apic_page_addr); + if (!vmx-nested.virtual_apic_page) + exec_control = + ~CPU_BASED_TPR_SHADOW; + else + vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, + page_to_phys(vmx-nested.virtual_apic_page)); + + /* +* Failing the vm entry is _not_ what the processor does +* but it's basically the only possibility we have. +* We could still enter the guest if CR8 load exits are +* enabled, CR8 store exits are enabled, and virtualize APIC +* access is disabled; in this case the processor would never +* use the TPR shadow and we could simply clear the bit from +* the
Re: Nested paging in nested SVM setup
On 20.08.2014 14:11, Paolo Bonzini wrote: Another patch... I will post parts of it separately; if I am right you should get 0xfe as the vector and a WARN from the gva_to_gpa function. I confirm the vector is 0xfe; however, I see no warnings from gva_to_gpa() - only from emulate_exception(): [ 3417.251967] [ cut here ] [ 3417.251983] WARNING: CPU: 1 PID: 1584 at /home/val/kvm-kmod/x86/emulate.c:4839 x86_emulate_insn+0xb33/0xb70 [kvm]() I can see both warnings if I move 'WARN(walker.fault.vector > 0x1f)' from gva_to_gpa() to gva_to_gpa_nested(), however: [ 3841.420019] WARNING: CPU: 0 PID: 1945 at /home/val/kvm-kmod/x86/paging_tmpl.h:903 paging64_gva_to_gpa_nested+0xd1/0xe0 [kvm]() [ 3841.420457] WARNING: CPU: 0 PID: 1945 at /home/val/kvm-kmod/x86/emulate.c:4839 x86_emulate_insn+0xb33/0xb70 [kvm]() Thanks, Valentine
Re: [Questions]
On Wed, Aug 20, 2014 at 05:37:01PM +0800, Zhangjie (HZ) wrote: Hi MST, I saw "reduce networking latency" in the Networking Todo; the idea is to allow handling short packets from softirq or VCPU context. If from softirq context, how could the softirq copy an skb to guest memory? If the method is to use the mm_struct of QEMU, would it be expensive? I have some very rough patches to explain this part of the idea. Will dig them out for you. If from VCPU context, maybe the internal operation of the virtual machine will have a significant delay. We'd have to find a good heuristic here. Maybe for a small number of very short packets the delay won't be significant. Thanks! -- Best Wishes! Zhang Jie
[PATCH 1/2] KVM: x86: Clarify PMU related features bit manipulation
kvm_pmu_cpuid_update performs a lot of bit manipulation operations, when in fact there are already unions that can be used instead. Change the bit manipulation to use the unions for clarity. This patch does not change the functionality. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/pmu.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 3dd6acc..8e6b7d8 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -15,6 +15,7 @@ #include <linux/types.h> #include <linux/kvm_host.h> #include <linux/perf_event.h> +#include <asm/perf_event.h> #include "x86.h" #include "cpuid.h" #include "lapic.h" @@ -463,7 +464,8 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu) { struct kvm_pmu *pmu = &vcpu->arch.pmu; struct kvm_cpuid_entry2 *entry; - unsigned bitmap_len; + union cpuid10_eax eax; + union cpuid10_edx edx; pmu->nr_arch_gp_counters = 0; pmu->nr_arch_fixed_counters = 0; @@ -475,25 +477,27 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu) entry = kvm_find_cpuid_entry(vcpu, 0xa, 0); if (!entry) return; + eax.full = entry->eax; + edx.full = entry->edx; - pmu->version = entry->eax & 0xff; + pmu->version = eax.split.version_id; if (!pmu->version) return; - pmu->nr_arch_gp_counters = min((int)(entry->eax >> 8) & 0xff, INTEL_PMC_MAX_GENERIC); - pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << ((entry->eax >> 16) & 0xff)) - 1; - bitmap_len = (entry->eax >> 24) & 0xff; - pmu->available_event_types = ~entry->ebx & ((1ull << bitmap_len) - 1); + pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters, INTEL_PMC_MAX_GENERIC); + pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1; + pmu->available_event_types = ~entry->ebx & ((1ull << eax.split.mask_length) - 1); if (pmu->version == 1) { pmu->nr_arch_fixed_counters = 0; } else { - pmu->nr_arch_fixed_counters = min((int)(entry->edx & 0x1f), INTEL_PMC_MAX_FIXED); + pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed, INTEL_PMC_MAX_FIXED); pmu->counter_bitmask[KVM_PMC_FIXED] = - ((u64)1 << ((entry->edx >> 5) & 0xff)) - 1; + ((u64)1 << edx.split.bit_width_fixed) - 1; } pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) | -- 1.9.1
[PATCH 2/2] KVM: x86: pmu: Enabling PMU v3
Currently guest PMU version 3 is not supported (versions up to 2 are reported). The features PMU v3 adds are: AnyThread, extended reporting of capabilities in CPUID leaf 0AH, and a variable number of performance counters. While most of the support is already present, the version reported is still 2, since dealing with AnyThread is complicated. Nonetheless, OSes may assume that features other than AnyThread are unsupported because the reported version is 2.

This patch checks whether the guest vCPU uses SMT. If not, it reports PMU v3. When PMU v3 is used, the AnyThread bit is ignored, but does not trigger faults.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/pmu.c              | 11 ++++++++---
 arch/x86/kvm/svm.c              | 15 +++++++++++++++
 arch/x86/kvm/vmx.c              | 16 ++++++++++++++++
 5 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4bda61b..8c8401b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -327,6 +327,7 @@ struct kvm_pmu {
	u64 counter_bitmask[2];
	u64 global_ctrl_mask;
	u64 reserved_bits;
+	u64 fixed_ctrl_reserved_bits;
	u8 version;
	struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..0d7b729 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -406,7 +406,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
	if (!cap.version)
		memset(&cap, 0, sizeof(cap));

-	eax.split.version_id = min(cap.version, 2);
+	eax.split.version_id = min(cap.version, 3);
	eax.split.num_counters = cap.num_counters_gp;
	eax.split.bit_width = cap.bit_width_gp;
	eax.split.mask_length = cap.events_mask_len;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 8e6b7d8..2ad7101 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -383,7 +383,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
	case MSR_CORE_PERF_FIXED_CTR_CTRL:
		if (pmu->fixed_ctr_ctrl == data)
			return 0;
-		if (!(data & 0xfffffffffffff444ull)) {
+		if (!(data & pmu->fixed_ctrl_reserved_bits)) {
			reprogram_fixed_counters(pmu, data);
			return 0;
		}
@@ -472,7 +472,7 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
	pmu->counter_bitmask[KVM_PMC_GP] = 0;
	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
	pmu->version = 0;
-	pmu->reserved_bits = 0xffffffff00200000ull;
+	pmu->reserved_bits = 0xffffffff00000000ull;

	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
	if (!entry)
@@ -504,6 +504,13 @@ void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu)
		(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
	pmu->global_ctrl_mask = ~pmu->global_ctrl;
+	pmu->fixed_ctrl_reserved_bits =
+		~((1ull << pmu->nr_arch_fixed_counters * 4) - 1);
+	if (pmu->version == 2) {
+		/* No support for anythread */
+		pmu->reserved_bits |= 0x20;
+		pmu->fixed_ctrl_reserved_bits |= 0x444ull;
+	}

	entry = kvm_find_cpuid_entry(vcpu, 7, 0);
	if (entry &&
	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM))
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1f49c86..963a9c0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4057,6 +4057,21 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)

 static void svm_cpuid_update(struct kvm_vcpu *vcpu)
 {
+	struct kvm_cpuid_entry2 *best;
+
+	/* If SMT, then PMU v3 is unsupported because of the anythread bit */
+	best = kvm_find_cpuid_entry(vcpu, 0x8000001e, 0);
+	if (best && ((best->ebx >> 8) & 3) > 0) {
+		best = kvm_find_cpuid_entry(vcpu, 0xa, 0);
+		if (best) {
+			union cpuid10_eax eax;
+
+			eax.full = best->eax;
+			eax.split.version_id =
+				min_t(int, eax.split.version_id, 2);
+			best->eax = eax.full;
+		}
+	}
 }

 static void svm_set_supported_cpuid(u32 func, struct kvm_cpuid_entry2 *entry)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cad37d5..437b131 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7726,6 +7726,7 @@ static void vmx_cpuid_update(struct kvm_vcpu *vcpu)
	struct kvm_cpuid_entry2 *best;
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	u32 exec_control;
+	bool smt = false;
[PATCH 0/2] KVM: x86: Enabling PMU v3 on non-SMT VMs
This patch-set enables PMU v3 on non-SMT VMs. All the PMU v3 features are already in KVM except the AnyThread support. However, AnyThread is only important on SMT machines, and can be ignored otherwise. Reporting PMU v3 can be useful for OSes that rely on the version, and not on other CPUID fields. Thanks for reviewing the code. Note that it was not tested on an AMD machine.

Nadav Amit (2):
  KVM: x86: Clarify PMU related features bit manipulation
  KVM: x86: pmu: Enabling PMU v3

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/pmu.c              | 35 ++++++++++++++++++-----------
 arch/x86/kvm/svm.c              | 15 +++++++++++++++
 arch/x86/kvm/vmx.c              | 16 ++++++++++++++++
 5 files changed, 56 insertions(+), 13 deletions(-)

-- 
1.9.1
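For illustration, the reserved-bit mask that patch 2/2 computes for IA32_FIXED_CTR_CTRL can be sketched standalone: each fixed counter owns a 4-bit control field, so with n counters everything above bit 4*n is reserved. The helper name is mine, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone version of the mask computed in patch 2/2:
 * with nr_fixed_counters fixed counters, the low nr*4 bits of
 * IA32_FIXED_CTR_CTRL are valid control fields and every higher bit
 * is reserved. */
static uint64_t fixed_ctrl_reserved_bits(unsigned int nr_fixed_counters)
{
	return ~((1ull << (nr_fixed_counters * 4)) - 1);
}
```

With 3 fixed counters the low 12 bits are writable; a write touching only those bits passes the reserved-bit check, while any higher bit trips it.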
Re: [PATCH] vhost: Add polling mode
On Wed, Aug 20, 2014 at 10:41:32AM +0200, Christian Borntraeger wrote:
On 10/08/14 10:30, Razya Ladelsky wrote:
From: Razya Ladelsky ra...@il.ibm.com
Date: Thu, 31 Jul 2014 09:47:20 +0300
Subject: [PATCH] vhost: Add polling mode

When vhost is waiting for buffers from the guest driver (e.g., more packets to send in vhost-net's transmit queue), it normally goes to sleep and waits for the guest to kick it. This kick involves a PIO in the guest, and therefore an exit (and possibly userspace involvement in translating this PIO exit into a file descriptor event), all of which hurts performance. If the system is under-utilized (has cpu time to spare), vhost can continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. This patch adds an optional polling mode to vhost, that can be enabled via a kernel module parameter, poll_start_rate.

When polling is active for a virtqueue, the guest is asked to disable notification (kicks), and the worker thread continuously checks for new buffers. When it does discover new buffers, it simulates a kick by invoking the underlying backend driver (such as vhost-net), which thinks it got a real kick from the guest, and acts accordingly. If the underlying driver asks not to be kicked, we disable polling on this virtqueue.

We start polling on a virtqueue when we notice it has work to do. Polling on this virtqueue is later disabled after 3 seconds of polling turning up no new work, as in this case we are better off returning to the exit-based notification mechanism. The default timeout of 3 seconds can be changed with the poll_stop_idle kernel module parameter.

This polling approach makes a lot of sense for new HW with posted interrupts, for which we have exitless host-to-guest notifications. But even with support for posted interrupts, guest-to-host communication still causes exits. Polling adds the missing part.
When systems are overloaded, there won't be enough cpu time for the various vhost threads to poll their guests' devices. For these scenarios, we plan to add support for vhost threads that can be shared by multiple devices, even of multiple vms. Our ultimate goal is to implement the I/O acceleration features described in: KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) https://www.youtube.com/watch?v=9EyweibHfEs and https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html

I ran some experiments with TCP stream netperf and filebench (having 2 threads performing random reads) benchmarks on an IBM System x3650 M4. I have two machines, A and B. A hosts the vms, B runs the netserver. The vms (on A) run netperf, whose destination server is running on B. All runs loaded the guests in a way that they were (cpu) saturated. For example, I ran netperf with 64B messages, which heavily loads the vm (which is why its throughput is low). The idea was to get it 100% loaded, so we can see that the polling is getting it to produce higher throughput. The system had two cores per guest, so as to allow both the vcpu and the vhost thread to run concurrently for maximum throughput (but I didn't pin the threads to specific cores). My experiments were fair in the sense that for both cases, with or without polling, I ran both threads, vcpu and vhost, on 2 cores (set their affinity that way). The only difference was whether polling was enabled/disabled.

Results:

Netperf, 1 vm: The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 MB/sec). Number of exits/sec decreased 6x. The same improvement was shown when I tested with 3 vms running netperf (4086 MB/sec -> 5545 MB/sec).

filebench, 1 vm: ops/sec improved by 13% with the polling patch. Number of exits was reduced by 31%. The same experiment with 3 vms running filebench showed similar numbers.

Signed-off-by: Razya Ladelsky ra...@il.ibm.com

Gave it a quick try on s390/kvm.
As expected it makes no difference for a big streaming workload like iperf. uperf with a 1-1 round robin got indeed faster by about 30%. The high CPU consumption is something that bothers me though, as virtualized systems tend to be full.

+static int poll_start_rate = 0;
+module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.");
+
+static int poll_stop_idle = 3*HZ; /* 3 seconds */
+module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work.");

This seems ridiculously high. Even one jiffy is an eternity, so setting it to 1 as a default would reduce the CPU overhead for most cases. If we don't have a packet in one
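The start/stop policy described in the commit message can be modeled in a few lines of userspace C. All names and the jiffy bookkeeping are illustrative; vhost's real implementation drives this from virtqueue state inside the worker thread:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the two module parameters.  HZ == 100 is an
 * assumption made only for this sketch. */
static int poll_start_rate = 1;		/* kicks per jiffy to begin polling */
static int poll_stop_idle = 3 * 100;	/* 3 seconds at HZ == 100 */

/* Start polling a virtqueue once its kick rate reaches
 * poll_start_rate events per jiffy; a rate of 0 disables polling. */
static bool should_start_polling(int kicks, int elapsed_jiffies)
{
	if (poll_start_rate == 0 || elapsed_jiffies == 0)
		return false;
	return kicks / elapsed_jiffies >= poll_start_rate;
}

/* Stop polling after poll_stop_idle jiffies with no work found. */
static bool should_stop_polling(int jiffies_since_last_work)
{
	return jiffies_since_last_work > poll_stop_idle;
}
```

Christian's comment applies directly to `poll_stop_idle`: lowering the default from 3*HZ toward a single jiffy shrinks the window during which an idle virtqueue burns CPU.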
Re: [PATCH v5 2/2] KVM: nVMX: introduce apic_access_and_virtual_page_valid
On 20/08/2014 11:45, Wanpeng Li wrote:
Introduce apic_access_and_virtual_page_valid() to check the validity of the nested apic access page and virtual apic page earlier.

Signed-off-by: Wanpeng Li wanpeng...@linux.intel.com
---
 arch/x86/kvm/vmx.c | 82 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index caf239d..02bc07d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7838,6 +7838,50 @@ static void vmx_inject_page_fault_nested(struct kvm_vcpu *vcpu,
	kvm_inject_page_fault(vcpu, fault);
 }

+static bool apic_access_and_virtual_page_valid(struct kvm_vcpu *vcpu,
+					       struct vmcs12 *vmcs12)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+	if (nested_cpu_has2(vmcs12, SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES)) {
+		if (!PAGE_ALIGNED(vmcs12->apic_access_addr))
+			/* TODO: Also verify bits beyond physical address width are 0 */
+			return false;
+
+		/*
+		 * Translate L1 physical address to host physical
+		 * address for vmcs02. Keep the page pinned, so this
+		 * physical address remains valid. We keep a reference
+		 * to it so we can release it later.
+		 */
+		if (vmx->nested.apic_access_page) /* shouldn't happen */
+			nested_release_page(vmx->nested.apic_access_page);
+		vmx->nested.apic_access_page =
+			nested_get_page(vcpu, vmcs12->apic_access_addr);
+	}
+
+	if (nested_cpu_has(vmcs12, CPU_BASED_TPR_SHADOW)) {
+		if (vmx->nested.virtual_apic_page) /* shouldn't happen */
+			nested_release_page(vmx->nested.virtual_apic_page);
+		vmx->nested.virtual_apic_page =
+			nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
+
+		/*
+		 * Failing the vm entry is _not_ what the processor does
+		 * but it's basically the only possibility we have.
+		 * We could still enter the guest if CR8 load exits are
+		 * enabled, CR8 store exits are enabled, and virtualize APIC
+		 * access is disabled; in this case the processor would never
+		 * use the TPR shadow and we could simply clear the bit from
+		 * the execution control.  But such a configuration is useless,
+		 * so let's keep the code simple.
+		 */
+		if (!vmx->nested.virtual_apic_page)
+			return false;
+	}
+	return true;
+}
+
 static void vmx_start_preemption_timer(struct kvm_vcpu *vcpu)
 {
	u64 preemption_timeout = get_vmcs12(vcpu)->vmx_preemption_timer_value;
@@ -7984,16 +8028,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
	if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
		/*
-		 * Translate L1 physical address to host physical
-		 * address for vmcs02. Keep the page pinned, so this
-		 * physical address remains valid. We keep a reference
-		 * to it so we can release it later.
-		 */
-		if (vmx->nested.apic_access_page) /* shouldn't happen */
-			nested_release_page(vmx->nested.apic_access_page);
-		vmx->nested.apic_access_page =
-			nested_get_page(vcpu, vmcs12->apic_access_addr);
-		/*
		 * If translation failed, no matter: This feature asks
		 * to exit when accessing the given address, and if it
		 * can never be accessed, this feature won't do
@@ -8040,30 +8074,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
	exec_control |= vmcs12->cpu_based_vm_exec_control;

	if (exec_control & CPU_BASED_TPR_SHADOW) {
-		if (vmx->nested.virtual_apic_page)
-			nested_release_page(vmx->nested.virtual_apic_page);
-		vmx->nested.virtual_apic_page =
-			nested_get_page(vcpu, vmcs12->virtual_apic_page_addr);
-		if (!vmx->nested.virtual_apic_page)
-			exec_control &= ~CPU_BASED_TPR_SHADOW;
-		else
-			vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
+		vmcs_write64(VIRTUAL_APIC_PAGE_ADDR,
			page_to_phys(vmx->nested.virtual_apic_page));
-
-		/*
-		 * Failing the vm entry is _not_ what the processor does
-		 * but it's basically the only possibility we have.
-		 * We could still enter the guest if CR8 load exits are
-		 *
Re: [PATCH] vhost: Add polling mode
On Sun, Aug 10, 2014 at 11:30:35AM +0300, Razya Ladelsky wrote:
From: Razya Ladelsky ra...@il.ibm.com
Date: Thu, 31 Jul 2014 09:47:20 +0300
Subject: [PATCH] vhost: Add polling mode

When vhost is waiting for buffers from the guest driver (e.g., more packets to send in vhost-net's transmit queue), it normally goes to sleep and waits for the guest to kick it. This kick involves a PIO in the guest, and therefore an exit (and possibly userspace involvement in translating this PIO exit into a file descriptor event), all of which hurts performance. If the system is under-utilized (has cpu time to spare), vhost can continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. This patch adds an optional polling mode to vhost, that can be enabled via a kernel module parameter, poll_start_rate.

When polling is active for a virtqueue, the guest is asked to disable notification (kicks), and the worker thread continuously checks for new buffers. When it does discover new buffers, it simulates a kick by invoking the underlying backend driver (such as vhost-net), which thinks it got a real kick from the guest, and acts accordingly. If the underlying driver asks not to be kicked, we disable polling on this virtqueue.

We start polling on a virtqueue when we notice it has work to do. Polling on this virtqueue is later disabled after 3 seconds of polling turning up no new work, as in this case we are better off returning to the exit-based notification mechanism. The default timeout of 3 seconds can be changed with the poll_stop_idle kernel module parameter.

This polling approach makes a lot of sense for new HW with posted interrupts, for which we have exitless host-to-guest notifications. But even with support for posted interrupts, guest-to-host communication still causes exits. Polling adds the missing part. When systems are overloaded, there won't be enough cpu time for the various vhost threads to poll their guests' devices.
For these scenarios, we plan to add support for vhost threads that can be shared by multiple devices, even of multiple vms. Our ultimate goal is to implement the I/O acceleration features described in: KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) https://www.youtube.com/watch?v=9EyweibHfEs and https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html

I ran some experiments with TCP stream netperf and filebench (having 2 threads performing random reads) benchmarks on an IBM System x3650 M4. I have two machines, A and B. A hosts the vms, B runs the netserver. The vms (on A) run netperf, whose destination server is running on B. All runs loaded the guests in a way that they were (cpu) saturated. For example, I ran netperf with 64B messages, which heavily loads the vm (which is why its throughput is low). The idea was to get it 100% loaded, so we can see that the polling is getting it to produce higher throughput. The system had two cores per guest, so as to allow both the vcpu and the vhost thread to run concurrently for maximum throughput (but I didn't pin the threads to specific cores). My experiments were fair in the sense that for both cases, with or without polling, I ran both threads, vcpu and vhost, on 2 cores (set their affinity that way). The only difference was whether polling was enabled/disabled.

Results:

Netperf, 1 vm: The polling patch improved throughput by ~33% (1516 MB/sec -> 2046 MB/sec). Number of exits/sec decreased 6x. The same improvement was shown when I tested with 3 vms running netperf (4086 MB/sec -> 5545 MB/sec).

filebench, 1 vm: ops/sec improved by 13% with the polling patch. Number of exits was reduced by 31%. The same experiment with 3 vms running filebench showed similar numbers.

Signed-off-by: Razya Ladelsky ra...@il.ibm.com

This really needs a more thorough benchmarking report, including system data.
One good example for a related patch: http://lwn.net/Articles/551179/ though for virtualization, we need data about the host as well, and if you want to look at streaming benchmarks, you need to test different message sizes and measure packet size. For now, commenting on the patches assuming that will be forthcoming.

---
 drivers/vhost/net.c   |   6 +-
 drivers/vhost/scsi.c  |   6 +-
 drivers/vhost/vhost.c | 245 ++++++++++++++++++++++++++++++++++++++++++++++--
 drivers/vhost/vhost.h |  38 +++++++-
 4 files changed, 277 insertions(+), 18 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 971a760..558aecb 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct file *f)
	}
	vhost_dev_init(&dev, vqs, VHOST_NET_VQ_MAX);

-	vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
-	vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
+
Re: [PATCH 0/2] KVM: x86: Enabling PMU v3 on non-SMT VMs
On 20/08/2014 12:25, Nadav Amit wrote:
This patch-set enables PMU v3 on non-SMT VMs. All the PMU v3 features are already in KVM except the AnyThread support. However, AnyThread is only important on SMT machines, and can be ignored otherwise. Reporting PMU v3 can be useful for OSes that rely on the version, and not on other CPUID fields. Thanks for reviewing the code. Note that it was not tested on an AMD machine.

Nadav Amit (2):
  KVM: x86: Clarify PMU related features bit manipulation
  KVM: x86: pmu: Enabling PMU v3

 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/pmu.c              | 35 ++++++++++++++++++-----------
 arch/x86/kvm/svm.c              | 15 +++++++++++++++
 arch/x86/kvm/vmx.c              | 16 ++++++++++++++++
 5 files changed, 56 insertions(+), 13 deletions(-)

For now I've reviewed patch 1 and will apply that to kvm/queue.

Paolo
Re: [PATCH] vhost: Add polling mode
On Tue, Aug 19, 2014 at 11:36:31AM +0300, Razya Ladelsky wrote:
That was just one example. There are many other possibilities. Either actually make the systems load all host CPUs equally, or divide throughput by host CPU.

The polling patch adds this capability to vhost, reducing costly exit overhead when the vm is loaded. In order to load the vm I ran netperf with a message size of 256:

Without polling: 2480 Mbits/sec, utilization: vm - 100%, vhost - 64%
With polling: 4160 Mbits/sec, utilization: vm - 100%, vhost - 100%

Therefore, throughput/cpu without polling is 15.1, and 20.8 with polling.

Can you please present results in a form that makes it possible to see the effect on various configurations and workloads? Here's one example where this was done: https://lkml.org/lkml/2014/8/14/495 You really should also provide data about your host configuration (missing in the above link).

My intention was to load vhost as close as possible to 100% utilization without polling, in order to compare it to the polling utilization case (where vhost is always at 100%). The best use case, of course, would be when the shared vhost thread work (TBD) is integrated, and then vhost will actually be using its polling cycles to handle requests of multiple devices (even from multiple vms).

Thanks, Razya

--
MST
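For reference, the throughput-per-cpu numbers quoted in this exchange are simply throughput divided by total host CPU utilization (vm% + vhost%); a tiny helper reproduces them:

```c
#include <assert.h>

/* Throughput per consumed host CPU: Mbits/sec divided by the summed
 * utilization of the vcpu and vhost threads (in percent of one core). */
static double throughput_per_cpu(double mbits, double vm_util, double vhost_util)
{
	return mbits / (vm_util + vhost_util);
}
```

2480 / (100 + 64) gives roughly 15.1 without polling; 4160 / (100 + 100) gives 20.8 with polling, matching the figures above.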
[PATCH] KVM: emulate: warn on invalid or uninitialized exception numbers
These were reported when running Jailhouse on AMD processors. Initialize ctxt->exception.vector with an invalid exception number, and warn if it remained invalid even though the emulator got an X86EMUL_PROPAGATE_FAULT return code.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/emulate.c | 5 ++++-
 arch/x86/kvm/x86.c     | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4fbf4b598f92..e5bf13003cd2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -527,6 +527,7 @@ static unsigned long seg_base(struct x86_emulate_ctxt *ctxt, int seg)
 static int emulate_exception(struct x86_emulate_ctxt *ctxt, int vec,
			     u32 error, bool valid)
 {
+	WARN_ON(vec > 0x1f);
	ctxt->exception.vector = vec;
	ctxt->exception.error_code = error;
	ctxt->exception.error_code_valid = valid;
@@ -4827,8 +4828,10 @@ writeback:
	ctxt->eip = ctxt->_eip;

done:
-	if (rc == X86EMUL_PROPAGATE_FAULT)
+	if (rc == X86EMUL_PROPAGATE_FAULT) {
+		WARN_ON(ctxt->exception.vector > 0x1f);
		ctxt->have_exception = true;
+	}
	if (rc == X86EMUL_INTERCEPTED)
		return EMULATION_INTERCEPTED;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 737b4bdac41c..cd718c01cdf1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5248,6 +5248,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,

	ctxt->interruptibility = 0;
	ctxt->have_exception = false;
+	ctxt->exception.vector = -1;
	ctxt->perm_ok = false;

	ctxt->ud = emulation_type & EMULTYPE_TRAP_UD;
-- 
1.8.3.1
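The invariant behind the new WARN_ON is simply that x86 exception vectors occupy 0..31 (0x1f), so an injected vector like the 0x23c seen in the Jailhouse trace is bogus. A standalone version of the check:

```c
#include <assert.h>
#include <stdbool.h>

/* x86 architectural exception vectors are 0..31; anything outside
 * that range injected as an exception indicates emulator state
 * corruption, which is exactly what the patch's WARN_ON catches. */
static bool valid_exception_vector(int vec)
{
	return vec >= 0 && vec <= 0x1f;
}
```

With the patch, a vector left at the -1 sentinel (or a garbage value like 0x23c) trips the warning instead of being silently injected.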
[PATCH] KVM: emulate: do not return X86EMUL_PROPAGATE_FAULT explicitly
Always get it through emulate_exception or emulate_ts. This ensures that the ctxt->exception fields have been populated.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/emulate.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef297919a691..4fbf4b598f92 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1549,8 +1549,7 @@ load:
	ctxt->ops->set_segment(ctxt, selector, &seg_desc, base3, seg);
	return X86EMUL_CONTINUE;
exception:
-	emulate_exception(ctxt, err_vec, err_code, true);
-	return X86EMUL_PROPAGATE_FAULT;
+	return emulate_exception(ctxt, err_vec, err_code, true);
 }

 static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
@@ -2723,8 +2722,7 @@ static int emulator_do_task_switch(struct x86_emulate_ctxt *ctxt,
	if (!next_tss_desc.p ||
	    ((desc_limit < 0x67 && (next_tss_desc.type & 8)) ||
	     desc_limit < 0x2b)) {
-		emulate_ts(ctxt, tss_selector & 0xfffc);
-		return X86EMUL_PROPAGATE_FAULT;
+		return emulate_ts(ctxt, tss_selector & 0xfffc);
	}

	if (reason == TASK_SWITCH_IRET || reason == TASK_SWITCH_JMP) {
@@ -3016,7 +3014,7 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)
		ctxt->dst.val = swab64(ctxt->src.val);
		break;
	default:
-		return X86EMUL_PROPAGATE_FAULT;
+		BUG();
	}
	return X86EMUL_CONTINUE;
 }
-- 
1.8.3.1
Re: [PATCH] virt/kvm/assigned-dev.c: Set 'dev->irq_source_id' to '-1' after free it
On 08/20/2014 08:01 AM, Chen Gang wrote:
By the way, at present, I use Qemu as the user mode program; is there a common test for both Qemu and KVM/Xen? And is a PC enough for the common test? Oh, I find Qemu has "make check" just like gcc/binutils, so for each of my patches, next, I shall run at least:

  ./configure
  make
  make check

And I also welcome any additional ideas, suggestions or completions about testing for kvm/xen/qemu. Thanks.

On 08/20/2014 07:58 AM, Chen Gang wrote:
On 08/19/2014 11:49 PM, Paolo Bonzini wrote:
On 19/08/2014 17:44, Chen Gang wrote:
Hello maintainers: Please help check this patch, when you have time.

Hi, it's already on its way to 3.17-rc2, but I first have to run a bunch of tests.

OK, thanks. I can also try the test, although I am not quite familiar with KVM. Since I plan to focus on KVM/Xen next, I shall construct the related environments for their common tests, at least. I am just constructing the gcc common test environments on a new PC; is a PC also enough for the KVM/Xen common tests? Welcome any ideas, suggestions or completions about it (especially information about the KVM/Xen common tests). Thanks.

--
Chen Gang

Open share and attitude like air water and life which God blessed
[PATCH] KVM: x86: do not check CS.DPL against RPL during task switch
This reverts the check added by commit 5045b468037d (KVM: x86: check CS.DPL against RPL during task switch, 2014-05-15). Although the CS.DPL=CS.RPL check is mentioned in table 7-1 of the SDM as causing a #TSS exception, it is not mentioned in table 6-6 that lists invalid TSS conditions which cause #TSS exceptions. In fact it causes some tests to fail, which pass on bare-metal. Keep the rest of the commit, since we will find new uses for it in 3.18.

Reported-by: Nadav Amit na...@cs.technion.ac.il
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/emulate.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef117b842334..03954f7900f5 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1491,9 +1491,6 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
			goto exception;
		break;
	case VCPU_SREG_CS:
-		if (in_task_switch && rpl != dpl)
-			goto exception;
-
		if (!(seg_desc.type & 8))
			goto exception;
-- 
1.8.3.1
[PATCH] KVM: x86: raise invalid TSS exceptions during a task switch
Conditions that would usually trigger a general protection fault should instead raise #TS.

Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/emulate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 03954f7900f5..ef297919a691 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1468,7 +1468,7 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
		return ret;

	err_code = selector & 0xfffc;
-	err_vec = GP_VECTOR;
+	err_vec = in_task_switch ? TS_VECTOR : GP_VECTOR;

	/* can't load system descriptor into segment selector */
	if (seg <= VCPU_SREG_GS && !seg_desc.s)
-- 
1.8.3.1
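As a side note, the error code pushed with these segment-load exceptions is the faulting selector with its low two (RPL) bits masked off, i.e. `selector & 0xfffc`, as the context line above shows. Standalone:

```c
#include <assert.h>
#include <stdint.h>

/* The error code for #TS/#GP on a bad segment load: the selector's
 * index and TI bit survive, the RPL bits (1:0) are cleared. */
static uint16_t segment_error_code(uint16_t selector)
{
	return selector & 0xfffc;
}
```

For example, selector 0x2b (index 5, RPL 3) yields error code 0x28.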
[GIT PULL] KVM changes for 3.17-rc2
Linus,

The following changes since commit 7d1311b93e58ed55f3a31cc8f94c4b8fe988a2b9:

  Linux 3.17-rc1 (2014-08-16 10:40:26 -0600)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to 30d1e0e806e5b2fadc297ba78f2d7afd6ba309cf:

  virt/kvm/assigned-dev.c: Set 'dev->irq_source_id' to '-1' after free it (2014-08-19 15:12:28 +0200)

Reverting a 3.16 patch, fixing two bugs in device assignment (one has a CVE), and fixing some problems introduced during the merge window (the CMA bug came in via Andrew, the x86 ones via yours truly).

Alexey Kardashevskiy (1):
      PC, KVM, CMA: Fix regression caused by wrong get_order() use

Chen Gang (1):
      virt/kvm/assigned-dev.c: Set 'dev->irq_source_id' to '-1' after free it

Michael S. Tsirkin (1):
      kvm: iommu: fix the third parameter of kvm_iommu_put_pages (CVE-2014-3601)

Nadav Amit (1):
      KVM: x86: Avoid emulating instructions on #UD mistakenly

Paolo Bonzini (2):
      KVM: x86: do not check CS.DPL against RPL during task switch
      Revert "KVM: x86: Increase the number of fixed MTRR regs to 10"

 arch/powerpc/kvm/book3s_hv_builtin.c |  6 +++---
 arch/x86/include/asm/kvm_host.h      |  2 +-
 arch/x86/kvm/emulate.c               | 11 +++++------
 virt/kvm/assigned-dev.c              |  4 +++-
 virt/kvm/iommu.c                     | 19 +++++++++----------
 5 files changed, 21 insertions(+), 21 deletions(-)
[PATCH] KVM: x86: Keep masked bits unmodified on kvm_set_shared_msr
Currently, when an MSR is updated using kvm_set_shared_msr, the masked bits are zeroed. This behavior is currently valid since the only MSR with a partial mask is EFER, in which only SCE might be unmasked. However, using kvm_set_shared_msr for other purposes becomes impossible. This patch keeps the masked bits unmodified while setting a shared MSR.

Signed-off-by: Nadav Amit na...@cs.technion.ac.il
---
 arch/x86/kvm/x86.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5f5edb6..ee42410 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -236,6 +236,7 @@ void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
	if (((value ^ smsr->values[slot].curr) & mask) == 0)
		return;
+	value = (smsr->values[slot].curr & ~mask) | (value & mask);
	smsr->values[slot].curr = value;
	wrmsrl(shared_msrs_global.msrs[slot], value);
	if (!smsr->registered) {
-- 
1.9.1
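The semantics of the one-line fix, in isolation: bits outside `mask` keep their current value instead of being zeroed. A hypothetical standalone helper (not the kernel function itself):

```c
#include <assert.h>
#include <stdint.h>

/* Merge a new MSR value into the current one: only the bits selected
 * by mask are taken from `value`; everything else is preserved from
 * `curr`.  This is the expression the patch inserts before the write. */
static uint64_t masked_msr_update(uint64_t curr, uint64_t value, uint64_t mask)
{
	return (curr & ~mask) | (value & mask);
}
```

Before the patch, the unmasked bits of `curr` were simply discarded; after it, a caller updating only SCE in EFER, say, leaves the remaining EFER bits intact.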
Re: [PATCH 7/9] KVM: VMX: abstract ple_window modifiers
2014-08-20 09:02+0200, Paolo Bonzini:
On 19/08/2014 22:35, Radim Krčmář wrote:
They were almost identical and thus merged with a loathable macro.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
This solution is hopefully more acceptable than function pointers.

I think a little amount of duplication is not a problem.

Ok, I'll drop this patch from v2.
Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
2014-08-20 09:16+0200, Paolo Bonzini:
On 19/08/2014 22:35, Radim Krčmář wrote:
Every increase of ple_window_grow creates potential overflows. They are not serious, because we clamp ple_window and userspace is expected to fix ple_window_max within a second.
---
I think avoiding overflows is better. In fact, I think you should call this function for ple_window_max too.

(Ack, I just wanted to avoid the worst userspace error, which is why PW_max hasn't changed when PW_grow got smaller and we could overflow.)

You could keep the ple_window_max variable at the user-set value. Whenever ple_window_grow or ple_window_max are changed, you can set an internal variable (let's call it ple_window_actual_max, but I'm not wed to this name) to the computed value, and then do:

	if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
		new = ple_window;
	else if (ple_window_grow < ple_window)
		new = max(ple_window_actual_max, old) * ple_window_grow;
	else
		new = max(ple_window_actual_max, old) + ple_window_grow;

Oh, I like that this can get rid of all overflows; ple_window_actual_max (PW_effective_max?) is going to be set to ple_window_max [/-] ple_window_grow in v2. (I think the || in the first if can be eliminated with some creativity in clamp_ple_window_max.) To do it, we'll want to intercept changes to ple_window as well. (I disliked this patch a lot even before :)
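A hypothetical userspace model of the grow-and-clamp idea discussed in this thread (not Paolo's exact snippet): small grow values act as a multiplier, large ones as an increment, and the result is clamped between ple_window and a precomputed ple_window_actual_max so it cannot overflow. All values below are illustrative:

```c
#include <assert.h>

/* Illustrative parameters; the real ones are module parameters. */
static int ple_window = 4096;
static int ple_window_grow = 2;
static int ple_window_actual_max = 1 << 20;	/* precomputed safe cap */

static int max_int(int a, int b) { return a > b ? a : b; }
static int min_int(int a, int b) { return a < b ? a : b; }

/* Grow the per-vcpu PLE window: a grow value below ple_window is
 * treated as a multiplier, anything larger as an increment (the
 * heuristic from the series), then the result is clamped to
 * [ple_window, ple_window_actual_max] so it never overflows or
 * shrinks below the base. */
static int grow_ple_window(int old)
{
	int new;

	if (ple_window_grow < 1)
		new = ple_window;		/* growing disabled */
	else if (ple_window_grow < ple_window)
		new = old * ple_window_grow;	/* multiplier */
	else
		new = old + ple_window_grow;	/* increment */

	return min_int(max_int(new, ple_window), ple_window_actual_max);
}
```

With a cap chosen as ple_window_max divided by ple_window_grow (as the reply proposes for v2), the multiplication in the middle branch can never exceed the user-set maximum.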
Re: [PATCH 5/9] KVM: VMX: clamp PLE window
2014-08-20 09:18+0200, Paolo Bonzini: On 19/08/2014 22:35, Radim Krčmář wrote: Modifications could produce unwanted values of the PLE window (low or negative). Use ple_window and the maximal value that cannot overflow as bounds. ple_window_max defaults to a very high value, but it would make sense to set it to some fraction of the scheduler tick. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- Please introduce a dynamic overflow-avoiding ple_window_max (like what you have in patch 9) already in patch 4... static void shrink_ple_window(struct kvm_vcpu *vcpu) @@ -5720,7 +5724,7 @@ static void shrink_ple_window(struct kvm_vcpu *vcpu) else new = old - ple_window_shrink; - vmx->ple_window = new; + vmx->ple_window = max(new, ple_window); ... and also squash this in patch 4. This patch can then introduce the ple_window_max module parameter (using module_param_cb to avoid overflows). Will do. --- It is going to make the patches slightly harder to review; are we doing it because git doesn't bisect on series boundaries?
Re: [PATCH 3/9] KVM: VMX: make PLE window per-vcpu
2014-08-20 09:13+0200, Paolo Bonzini: On 19/08/2014 22:35, Radim Krčmář wrote: enum segment_cache_field { @@ -4403,6 +4406,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) if (ple_gap) { vmcs_write32(PLE_GAP, ple_gap); vmcs_write32(PLE_WINDOW, ple_window); Is this necessary? V2, thanks.
Re: [PATCH 1/9] KVM: add kvm_arch_sched_in
2014-08-20 09:47+0200, Christian Borntraeger: On 19/08/14 22:35, Radim Krčmář wrote: --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu) if (vcpu->preempted) vcpu->preempted = false; + kvm_arch_sched_in(vcpu, cpu); + kvm_arch_vcpu_load(vcpu, cpu); } Why can't you reuse kvm_arch_vcpu_load? It's also called on each sched_in and is architecture specific. kvm_arch_vcpu_load is also called from kvm_vcpu_ioctl, so we'd be shrinking unnecessarily. (sched_in gives us a bit of useful information about the state of the system, kvm_vcpu_ioctl not that much.)
Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
On 20/08/2014 14:41, Radim Krčmář wrote: if (ple_window_grow < 1 || ple_window_actual_max < ple_window) new = ple_window; else if (ple_window_grow < ple_window) new = max(ple_window_actual_max, old) * ple_window_grow; else new = max(ple_window_actual_max, old) + ple_window_grow; Oh, I like that this can get rid of all overflows; ple_window_actual_max (PW_effective_max?) is going to be set to ple_window_max [/-] ple_window_grow in v2. (I think the || in the first if can be eliminated with some creativity in clamp_ple_window_max.) To do it, we'll want to intercept changes to ple_window as well. (I disliked this patch a lot even before :) What about setting ple_window_actual_max to 0 if ple_window_grow is 0 (instead of just returning)? Then the if (ple_window_actual_max < ple_window) will always fail and you'll go through new = ple_window. But perhaps it's more gross and worthless than creative. :) Paolo
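For readers following along, here is a small standalone sketch of the overflow-free scheme discussed in this subthread. The names and the exact update rule (ple_window_effective_max, division for multiplicative growth, subtraction for additive growth) are illustrative assumptions drawn from the discussion, not the code that was eventually merged:

```c
#include <assert.h>

/* Illustrative defaults; in KVM these are module parameters. */
static int ple_window = 4096;
static int ple_window_grow = 2;
static int ple_window_effective_max;	/* internal, recomputed on changes */

/* Recompute the internal cap whenever ple_window_grow or the user-set
 * ple_window_max changes, so that growing from any clamped value can
 * never overflow an int. */
static void update_ple_window_effective_max(int ple_window_max)
{
	if (ple_window_grow < 1)
		ple_window_effective_max = 0;	/* growing disabled */
	else if (ple_window_grow < ple_window)
		ple_window_effective_max = ple_window_max / ple_window_grow;
	else
		ple_window_effective_max = ple_window_max - ple_window_grow;
}

static int grow_ple_window(int old)
{
	if (ple_window_grow < 1 || ple_window_effective_max < ple_window)
		return ple_window;		/* growing disabled or max too low */
	if (old > ple_window_effective_max)
		old = ple_window_effective_max;	/* clamp before growing */
	if (ple_window_grow < ple_window)
		return old * ple_window_grow;	/* multiplicative growth */
	return old + ple_window_grow;		/* additive growth */
}
```

Note how setting the cap to 0 when growing is disabled makes the `ple_window_grow < 1` clause of the first test redundant, which is the simplification suggested above.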
[PATCH v4 4/6] KVM: PPC: Move ONE_REG AltiVec support to powerpc
Move ONE_REG AltiVec support to powerpc generic layer. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - split ONE_REG powerpc generic and ONE_REG AltiVec v3: - make ONE_REG AltiVec support powerpc generic v2: - add comment describing VCSR register representation in KVM vs kernel arch/powerpc/include/uapi/asm/kvm.h | 5 + arch/powerpc/kvm/book3s.c | 42 - arch/powerpc/kvm/powerpc.c | 42 + 3 files changed, 47 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 3ca357a..ab4d473 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -476,6 +476,11 @@ struct kvm_get_htab_header { /* FP and vector status/control registers */ #define KVM_REG_PPC_FPSCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x80) +/* + * VSCR register is documented as a 32-bit register in the ISA, but it can + * only be accesses via a vector register. Expose VSCR as a 32-bit register + * even though the kernel represents it as a 128-bit vector. + */ #define KVM_REG_PPC_VSCR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x81) /* Virtual processor areas */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 26868e2..1b5adda 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -558,25 +558,6 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_FPSCR: *val = get_reg_val(id, vcpu-arch.fp.fpscr); break; -#ifdef CONFIG_ALTIVEC - case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - val-vval = vcpu-arch.vr.vr[id - KVM_REG_PPC_VR0]; - break; - case KVM_REG_PPC_VSCR: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - *val = get_reg_val(id, vcpu-arch.vr.vscr.u[3]); - break; - case KVM_REG_PPC_VRSAVE: - *val = get_reg_val(id, vcpu-arch.vrsave); - break; -#endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX case KVM_REG_PPC_VSR0 ... 
KVM_REG_PPC_VSR31: if (cpu_has_feature(CPU_FTR_VSX)) { @@ -653,29 +634,6 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_FPSCR: vcpu-arch.fp.fpscr = set_reg_val(id, *val); break; -#ifdef CONFIG_ALTIVEC - case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - vcpu-arch.vr.vr[id - KVM_REG_PPC_VR0] = val-vval; - break; - case KVM_REG_PPC_VSCR: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - vcpu-arch.vr.vscr.u[3] = set_reg_val(id, *val); - break; - case KVM_REG_PPC_VRSAVE: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - vcpu-arch.vrsave = set_reg_val(id, *val); - break; -#endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31: if (cpu_has_feature(CPU_FTR_VSX)) { diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 1326116..19d4755 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -941,6 +941,25 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) if (r == -EINVAL) { r = 0; switch (reg-id) { +#ifdef CONFIG_ALTIVEC + case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31: + if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { + r = -ENXIO; + break; + } + val.vval = vcpu-arch.vr.vr[reg-id - KVM_REG_PPC_VR0]; + break; + case KVM_REG_PPC_VSCR: + if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { + r = -ENXIO; + break; + } + val = get_reg_val(reg-id,
[PATCH] KVM: x86: Replace X86_FEATURE_NX offset with the definition
Replace the open-coded bit shift for X86_FEATURE_NX with the bit(X86_FEATURE_NX) definition. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/cpuid.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 38a0afe..f4bad87 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -112,8 +112,8 @@ static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu) break; } } - if (entry && (entry->edx & (1 << 20)) && !is_efer_nx()) { - entry->edx &= ~(1 << 20); + if (entry && (entry->edx & bit(X86_FEATURE_NX)) && !is_efer_nx()) { + entry->edx &= ~bit(X86_FEATURE_NX); printk(KERN_INFO "kvm: guest NX capability removed\n"); } } -- 1.9.1
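For context, the reason `bit(X86_FEATURE_NX)` and `(1 << 20)` are interchangeable here is that KVM's `bit()` helper masks the feature number down to its position within one 32-bit CPUID word. A minimal sketch, with the two definitions reproduced for illustration:

```c
#include <assert.h>

/* X86_FEATURE_NX lives in CPUID word 1 (CPUID 0x80000001, EDX), bit 20,
 * so the kernel encodes it as word*32 + bit. */
#define X86_FEATURE_NX (1 * 32 + 20)

/* KVM's helper: reduce a feature number to a mask within its 32-bit word. */
static unsigned int bit(int bitno)
{
	return 1u << (bitno & 31);
}
```

Using the named constant instead of the magic shift keeps the CPUID word implicit and the intent explicit.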
[PATCH v4 6/6] KVM: PPC: Booke: Add ONE_REG support for IVPR and IVORs
Add ONE_REG support for IVPR and IVORs registers. Implement IVPR, IVORs 0-15 and 35 in booke common layer. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - add ONE_REG IVPR - use IVPR, IVOR2 and IVOR8 setters - add api documentation for ONE_REG IVPR and IVORs v3: - new patch Documentation/virtual/kvm/api.txt | 7 ++ arch/powerpc/include/uapi/asm/kvm.h | 25 +++ arch/powerpc/kvm/booke.c| 145 arch/powerpc/kvm/e500.c | 42 ++- arch/powerpc/kvm/e500mc.c | 16 5 files changed, 233 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index beae3fd..cd7b171 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1917,6 +1917,13 @@ registers, find a list below: PPC | KVM_REG_PPC_TM_VSCR | 32 PPC | KVM_REG_PPC_TM_DSCR | 64 PPC | KVM_REG_PPC_TM_TAR| 64 + PPC | KVM_REG_PPC_IVPR | 64 + PPC | KVM_REG_PPC_IVOR0 | 32 + ... + PPC | KVM_REG_PPC_IVOR15| 32 + PPC | KVM_REG_PPC_IVOR32| 32 + ... + PPC | KVM_REG_PPC_IVOR37| 32 | | MIPS | KVM_REG_MIPS_R0 | 64 ... 
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index ab4d473..c97f119 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -564,6 +564,31 @@ struct kvm_get_htab_header { #define KVM_REG_PPC_SPRG9 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xba) #define KVM_REG_PPC_DBSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbb) +/* Booke IVPR IVOR registers */ +#define KVM_REG_PPC_IVPR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbc) +#define KVM_REG_PPC_IVOR0 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbd) +#define KVM_REG_PPC_IVOR1 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbe) +#define KVM_REG_PPC_IVOR2 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf) +#define KVM_REG_PPC_IVOR3 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc0) +#define KVM_REG_PPC_IVOR4 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc1) +#define KVM_REG_PPC_IVOR5 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc2) +#define KVM_REG_PPC_IVOR6 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc3) +#define KVM_REG_PPC_IVOR7 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc4) +#define KVM_REG_PPC_IVOR8 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc5) +#define KVM_REG_PPC_IVOR9 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc6) +#define KVM_REG_PPC_IVOR10 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc7) +#define KVM_REG_PPC_IVOR11 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc8) +#define KVM_REG_PPC_IVOR12 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc9) +#define KVM_REG_PPC_IVOR13 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xca) +#define KVM_REG_PPC_IVOR14 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcb) +#define KVM_REG_PPC_IVOR15 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcc) +#define KVM_REG_PPC_IVOR32 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcd) +#define KVM_REG_PPC_IVOR33 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xce) +#define KVM_REG_PPC_IVOR34 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcf) +#define KVM_REG_PPC_IVOR35 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd0) +#define KVM_REG_PPC_IVOR36 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd1) +#define KVM_REG_PPC_IVOR37 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd2) + /* Transactional Memory 
checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs */ diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index d4df648..1cb2a2a 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1570,6 +1570,75 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, int r = 0; switch (id) { + case KVM_REG_PPC_IVPR: + *val = get_reg_val(id, vcpu-arch.ivpr); + break; + case KVM_REG_PPC_IVOR0: + *val = get_reg_val(id, + vcpu-arch.ivor[BOOKE_IRQPRIO_CRITICAL]); + break; + case KVM_REG_PPC_IVOR1: + *val = get_reg_val(id, + vcpu-arch.ivor[BOOKE_IRQPRIO_MACHINE_CHECK]); + break; + case KVM_REG_PPC_IVOR2: + *val = get_reg_val(id, + vcpu-arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE]); + break; + case KVM_REG_PPC_IVOR3: + *val = get_reg_val(id, + vcpu-arch.ivor[BOOKE_IRQPRIO_INST_STORAGE]); + break; + case KVM_REG_PPC_IVOR4: + *val = get_reg_val(id, + vcpu-arch.ivor[BOOKE_IRQPRIO_EXTERNAL]); + break; + case KVM_REG_PPC_IVOR5: + *val = get_reg_val(id, + vcpu-arch.ivor[BOOKE_IRQPRIO_ALIGNMENT]); + break; + case
[PATCH v4 0/6] KVM: PPC: Book3e: AltiVec support
Add KVM Book3e AltiVec support. Changes: v4: - use CONFIG_SPE_POSSIBLE and a new ifdef for CONFIG_ALTIVEC - remove SPE handlers from bookehv - split ONE_REG powerpc generic and ONE_REG AltiVec - add setters for IVPR, IVOR2 and IVOR8 - add api documentation for ONE_REG IVPR and IVORs - don't enable e6500 core since hardware threads are not yet supported v3: - use distinct SPE/AltiVec exception handlers - make ONE_REG AltiVec support powerpc generic - add ONE_REG IVORs support v2: - integrate Paul's FP/VMX/VSX changes that landed in kvm-ppc-queue in January and take into account feedback Mihai Caraman (6): KVM: PPC: Book3E: Increase FPU laziness KVM: PPC: Book3e: Add AltiVec support KVM: PPC: Make ONE_REG powerpc generic KVM: PPC: Move ONE_REG AltiVec support to powerpc KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8 emulation KVM: PPC: Booke: Add ONE_REG support for IVPR and IVORs Documentation/virtual/kvm/api.txt | 7 + arch/powerpc/include/uapi/asm/kvm.h | 30 +++ arch/powerpc/kvm/book3s.c | 151 -- arch/powerpc/kvm/booke.c | 371 -- arch/powerpc/kvm/booke.h | 43 +--- arch/powerpc/kvm/booke_emulate.c | 15 +- arch/powerpc/kvm/bookehv_interrupts.S | 9 +- arch/powerpc/kvm/e500.c | 42 +++- arch/powerpc/kvm/e500_emulate.c | 20 ++ arch/powerpc/kvm/e500mc.c | 18 +- arch/powerpc/kvm/powerpc.c| 97 + 11 files changed, 576 insertions(+), 227 deletions(-) -- 1.7.11.7 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 3/6] KVM: PPC: Make ONE_REG powerpc generic
Make ONE_REG generic for server and embedded architectures by moving kvm_vcpu_ioctl_get_one_reg() and kvm_vcpu_ioctl_set_one_reg() functions to powerpc layer. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - split ONE_REG powerpc generic and ONE_REG AltiVec v3: - make ONE_REG AltiVec support powerpc generic v2: - add comment describing VCSR register representation in KVM vs kernel arch/powerpc/kvm/book3s.c | 121 +++-- arch/powerpc/kvm/booke.c | 91 +- arch/powerpc/kvm/powerpc.c | 55 + 3 files changed, 138 insertions(+), 129 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index dd03f6b..26868e2 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -535,33 +535,28 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) return -ENOTSUPP; } -int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) +int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, + union kvmppc_one_reg *val) { - int r; - union kvmppc_one_reg val; - int size; + int r = 0; long int i; - size = one_reg_size(reg-id); - if (size sizeof(val)) - return -EINVAL; - - r = vcpu-kvm-arch.kvm_ops-get_one_reg(vcpu, reg-id, val); + r = vcpu-kvm-arch.kvm_ops-get_one_reg(vcpu, id, val); if (r == -EINVAL) { r = 0; - switch (reg-id) { + switch (id) { case KVM_REG_PPC_DAR: - val = get_reg_val(reg-id, kvmppc_get_dar(vcpu)); + *val = get_reg_val(id, kvmppc_get_dar(vcpu)); break; case KVM_REG_PPC_DSISR: - val = get_reg_val(reg-id, kvmppc_get_dsisr(vcpu)); + *val = get_reg_val(id, kvmppc_get_dsisr(vcpu)); break; case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31: - i = reg-id - KVM_REG_PPC_FPR0; - val = get_reg_val(reg-id, VCPU_FPR(vcpu, i)); + i = id - KVM_REG_PPC_FPR0; + *val = get_reg_val(id, VCPU_FPR(vcpu, i)); break; case KVM_REG_PPC_FPSCR: - val = get_reg_val(reg-id, vcpu-arch.fp.fpscr); + *val = get_reg_val(id, vcpu-arch.fp.fpscr); break; #ifdef CONFIG_ALTIVEC case KVM_REG_PPC_VR0 ... 
KVM_REG_PPC_VR31: @@ -569,110 +564,94 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) r = -ENXIO; break; } - val.vval = vcpu-arch.vr.vr[reg-id - KVM_REG_PPC_VR0]; + val-vval = vcpu-arch.vr.vr[id - KVM_REG_PPC_VR0]; break; case KVM_REG_PPC_VSCR: if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { r = -ENXIO; break; } - val = get_reg_val(reg-id, vcpu-arch.vr.vscr.u[3]); + *val = get_reg_val(id, vcpu-arch.vr.vscr.u[3]); break; case KVM_REG_PPC_VRSAVE: - val = get_reg_val(reg-id, vcpu-arch.vrsave); + *val = get_reg_val(id, vcpu-arch.vrsave); break; #endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31: if (cpu_has_feature(CPU_FTR_VSX)) { - long int i = reg-id - KVM_REG_PPC_VSR0; - val.vsxval[0] = vcpu-arch.fp.fpr[i][0]; - val.vsxval[1] = vcpu-arch.fp.fpr[i][1]; + i = id - KVM_REG_PPC_VSR0; + val-vsxval[0] = vcpu-arch.fp.fpr[i][0]; + val-vsxval[1] = vcpu-arch.fp.fpr[i][1]; } else { r = -ENXIO; } break; #endif /* CONFIG_VSX */ - case KVM_REG_PPC_DEBUG_INST: { - u32 opcode = INS_TW; - r = copy_to_user((u32 __user *)(long)reg-addr, -opcode, sizeof(u32)); + case KVM_REG_PPC_DEBUG_INST: + *val = get_reg_val(id, INS_TW); break; - } #ifdef CONFIG_KVM_XICS case KVM_REG_PPC_ICP_STATE: if (!vcpu-arch.icp) { r = -ENXIO; break; }
[PATCH v4 2/6] KVM: PPC: Book3e: Add AltiVec support
Add AltiVec support in KVM for Book3e. FPU support gracefully reuse host infrastructure so follow the same approach for AltiVec. Book3e specification defines shared interrupt numbers for SPE and AltiVec units. Still SPE is present in e200/e500v2 cores while AltiVec is present in e6500 core. So we can currently decide at compile-time which of the SPE or AltiVec units to support exclusively by using CONFIG_SPE_POSSIBLE and CONFIG_PPC_E500MC defines. As Alexander Graf suggested, keep SPE and AltiVec exception handlers distinct to improve code readability. Guests have the privilege to enable AltiVec, so we always need to support AltiVec in KVM and implicitly in host to reflect interrupts and to save/restore the unit context. KVM will be loaded on cores with AltiVec unit only if CONFIG_ALTIVEC is defined. Use this define to guard KVM AltiVec logic. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - use CONFIG_SPE_POSSIBLE and a new ifdef for CONFIG_ALTIVEC - remove SPE handlers from bookehv - update commit message v3: - use distinct SPE/AltiVec exception handlers v2: - integrate Paul's FP/VMX/VSX changes arch/powerpc/kvm/booke.c | 74 ++- arch/powerpc/kvm/booke.h | 6 +++ arch/powerpc/kvm/bookehv_interrupts.S | 9 + arch/powerpc/kvm/e500_emulate.c | 20 ++ 4 files changed, 101 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 91e7217..8ace612 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -168,6 +168,40 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu) #endif } +/* + * Simulate AltiVec unavailable fault to load guest state + * from thread to AltiVec unit. + * It requires to be called with preemption disabled. 
+ */ +static inline void kvmppc_load_guest_altivec(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ALTIVEC + if (cpu_has_feature(CPU_FTR_ALTIVEC)) { + if (!(current-thread.regs-msr MSR_VEC)) { + enable_kernel_altivec(); + load_vr_state(vcpu-arch.vr); + current-thread.vr_save_area = vcpu-arch.vr; + current-thread.regs-msr |= MSR_VEC; + } + } +#endif +} + +/* + * Save guest vcpu AltiVec state into thread. + * It requires to be called with preemption disabled. + */ +static inline void kvmppc_save_guest_altivec(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ALTIVEC + if (cpu_has_feature(CPU_FTR_ALTIVEC)) { + if (current-thread.regs-msr MSR_VEC) + giveup_altivec(current); + current-thread.vr_save_area = NULL; + } +#endif +} + static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) { /* Synchronize guest's desire to get debug interrupts into shadow MSR */ @@ -375,9 +409,15 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, case BOOKE_IRQPRIO_ITLB_MISS: case BOOKE_IRQPRIO_SYSCALL: case BOOKE_IRQPRIO_FP_UNAVAIL: +#ifdef CONFIG_SPE_POSSIBLE case BOOKE_IRQPRIO_SPE_UNAVAIL: case BOOKE_IRQPRIO_SPE_FP_DATA: case BOOKE_IRQPRIO_SPE_FP_ROUND: +#endif +#ifdef CONFIG_ALTIVEC + case BOOKE_IRQPRIO_ALTIVEC_UNAVAIL: + case BOOKE_IRQPRIO_ALTIVEC_ASSIST: +#endif case BOOKE_IRQPRIO_AP_UNAVAIL: allowed = 1; msr_mask = MSR_CE | MSR_ME | MSR_DE; @@ -697,6 +737,17 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) kvmppc_load_guest_fp(vcpu); #endif +#ifdef CONFIG_ALTIVEC + /* Save userspace AltiVec state in stack */ + if (cpu_has_feature(CPU_FTR_ALTIVEC)) + enable_kernel_altivec(); + /* +* Since we can't trap on MSR_VEC in GS-mode, we consider the guest +* as always using the AltiVec. 
+*/ + kvmppc_load_guest_altivec(vcpu); +#endif + /* Switch to guest debug context */ debug = vcpu-arch.dbg_reg; switch_booke_debug_regs(debug); @@ -719,6 +770,10 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) kvmppc_save_guest_fp(vcpu); #endif +#ifdef CONFIG_ALTIVEC + kvmppc_save_guest_altivec(vcpu); +#endif + out: vcpu-mode = OUTSIDE_GUEST_MODE; return ret; @@ -1025,7 +1080,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_ROUND); r = RESUME_GUEST; break; -#else +#elif defined(CONFIG_SPE_POSSIBLE) case BOOKE_INTERRUPT_SPE_UNAVAIL: /* * Guest wants SPE, but host kernel doesn't support it. Send @@ -1046,6 +1101,22 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, run-hw.hardware_exit_reason = exit_nr; r = RESUME_HOST; break; +#endif /* CONFIG_SPE_POSSIBLE
[PATCH v4 5/6] KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8 emulation
Add setter functions for IVPR, IVOR2 and IVOR8 emulation in preparation for ONE_REG support. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - new patch - add api documentation for ONE_REG IVPR and IVORs arch/powerpc/kvm/booke.c | 24 arch/powerpc/kvm/booke.h | 3 +++ arch/powerpc/kvm/booke_emulate.c | 15 +++ 3 files changed, 30 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 831c1b4..d4df648 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1782,6 +1782,30 @@ void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits) update_timer_ints(vcpu); } +void kvmppc_set_ivpr(struct kvm_vcpu *vcpu, ulong new_ivpr) +{ + vcpu->arch.ivpr = new_ivpr; +#ifdef CONFIG_KVM_BOOKE_HV + mtspr(SPRN_GIVPR, new_ivpr); +#endif +} + +void kvmppc_set_ivor2(struct kvm_vcpu *vcpu, u32 new_ivor) +{ + vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE] = new_ivor; +#ifdef CONFIG_KVM_BOOKE_HV + mtspr(SPRN_GIVOR2, new_ivor); +#endif +} + +void kvmppc_set_ivor8(struct kvm_vcpu *vcpu, u32 new_ivor) +{ + vcpu->arch.ivor[BOOKE_IRQPRIO_SYSCALL] = new_ivor; +#ifdef CONFIG_KVM_BOOKE_HV + mtspr(SPRN_GIVOR8, new_ivor); +#endif +} + void kvmppc_decrementer_func(unsigned long data) { struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h index 22ba08e..0242530 100644 --- a/arch/powerpc/kvm/booke.h +++ b/arch/powerpc/kvm/booke.h @@ -80,6 +80,9 @@ void kvmppc_set_epcr(struct kvm_vcpu *vcpu, u32 new_epcr); void kvmppc_set_tcr(struct kvm_vcpu *vcpu, u32 new_tcr); void kvmppc_set_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits); void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits); +void kvmppc_set_ivpr(struct kvm_vcpu *vcpu, ulong new_ivpr); +void kvmppc_set_ivor2(struct kvm_vcpu *vcpu, u32 new_ivor); +void kvmppc_set_ivor8(struct kvm_vcpu *vcpu, u32 new_ivor); int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned int inst, int *advance);
diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index 92bc668..94c64e3 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -191,10 +191,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) break; case SPRN_IVPR: - vcpu->arch.ivpr = spr_val; -#ifdef CONFIG_KVM_BOOKE_HV - mtspr(SPRN_GIVPR, spr_val); -#endif + kvmppc_set_ivpr(vcpu, spr_val); break; case SPRN_IVOR0: vcpu->arch.ivor[BOOKE_IRQPRIO_CRITICAL] = spr_val; @@ -203,10 +200,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) vcpu->arch.ivor[BOOKE_IRQPRIO_MACHINE_CHECK] = spr_val; break; case SPRN_IVOR2: - vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE] = spr_val; -#ifdef CONFIG_KVM_BOOKE_HV - mtspr(SPRN_GIVOR2, spr_val); -#endif + kvmppc_set_ivor2(vcpu, spr_val); break; case SPRN_IVOR3: vcpu->arch.ivor[BOOKE_IRQPRIO_INST_STORAGE] = spr_val; @@ -224,10 +218,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) vcpu->arch.ivor[BOOKE_IRQPRIO_FP_UNAVAIL] = spr_val; break; case SPRN_IVOR8: - vcpu->arch.ivor[BOOKE_IRQPRIO_SYSCALL] = spr_val; -#ifdef CONFIG_KVM_BOOKE_HV - mtspr(SPRN_GIVOR8, spr_val); -#endif + kvmppc_set_ivor8(vcpu, spr_val); break; case SPRN_IVOR9: vcpu->arch.ivor[BOOKE_IRQPRIO_AP_UNAVAIL] = spr_val; -- 1.7.11.7
[PATCH v4 1/6] KVM: PPC: Book3E: Increase FPU laziness
Increase FPU laziness by loading the guest state into the unit before entering the guest instead of doing it on each vcpu schedule. Without this improvement an interrupt may claim floating point corrupting guest state. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - update commit message v3: - no changes v2: - remove fpu_active - add descriptive comments arch/powerpc/kvm/booke.c | 43 --- arch/powerpc/kvm/booke.h | 34 -- arch/powerpc/kvm/e500mc.c | 2 -- 3 files changed, 36 insertions(+), 43 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 074b7fc..91e7217 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -124,6 +124,40 @@ static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu) } #endif +/* + * Load up guest vcpu FP state if it's needed. + * It also set the MSR_FP in thread so that host know + * we're holding FPU, and then host can help to save + * guest vcpu FP state if other threads require to use FPU. + * This simulates an FP unavailable fault. + * + * It requires to be called with preemption disabled. + */ +static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_PPC_FPU + if (!(current-thread.regs-msr MSR_FP)) { + enable_kernel_fp(); + load_fp_state(vcpu-arch.fp); + current-thread.fp_save_area = vcpu-arch.fp; + current-thread.regs-msr |= MSR_FP; + } +#endif +} + +/* + * Save guest vcpu FP state into thread. + * It requires to be called with preemption disabled. + */ +static inline void kvmppc_save_guest_fp(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_PPC_FPU + if (current-thread.regs-msr MSR_FP) + giveup_fpu(current); + current-thread.fp_save_area = NULL; +#endif +} + static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu) { #if defined(CONFIG_PPC_FPU) !defined(CONFIG_KVM_BOOKE_HV) @@ -658,12 +692,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) /* * Since we can't trap on MSR_FP in GS-mode, we consider the guest -* as always using the FPU. 
Kernel usage of FP (via -* enable_kernel_fp()) in this thread must not occur while -* vcpu-fpu_active is set. +* as always using the FPU. */ - vcpu-fpu_active = 1; - kvmppc_load_guest_fp(vcpu); #endif @@ -687,8 +717,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) #ifdef CONFIG_PPC_FPU kvmppc_save_guest_fp(vcpu); - - vcpu-fpu_active = 0; #endif out: @@ -1194,6 +1222,7 @@ out: else { /* interrupts now hard-disabled */ kvmppc_fix_ee_before_entry(); + kvmppc_load_guest_fp(vcpu); } } diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h index f753543..e73d513 100644 --- a/arch/powerpc/kvm/booke.h +++ b/arch/powerpc/kvm/booke.h @@ -116,40 +116,6 @@ extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn, extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val); -/* - * Load up guest vcpu FP state if it's needed. - * It also set the MSR_FP in thread so that host know - * we're holding FPU, and then host can help to save - * guest vcpu FP state if other threads require to use FPU. - * This simulates an FP unavailable fault. - * - * It requires to be called with preemption disabled. - */ -static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu) -{ -#ifdef CONFIG_PPC_FPU - if (vcpu-fpu_active !(current-thread.regs-msr MSR_FP)) { - enable_kernel_fp(); - load_fp_state(vcpu-arch.fp); - current-thread.fp_save_area = vcpu-arch.fp; - current-thread.regs-msr |= MSR_FP; - } -#endif -} - -/* - * Save guest vcpu FP state into thread. - * It requires to be called with preemption disabled. 
- */ -static inline void kvmppc_save_guest_fp(struct kvm_vcpu *vcpu) -{ -#ifdef CONFIG_PPC_FPU - if (vcpu-fpu_active (current-thread.regs-msr MSR_FP)) - giveup_fpu(current); - current-thread.fp_save_area = NULL; -#endif -} - static inline void kvmppc_clear_dbsr(void) { mtspr(SPRN_DBSR, mfspr(SPRN_DBSR)); diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c index 000cf82..4549349 100644 --- a/arch/powerpc/kvm/e500mc.c +++ b/arch/powerpc/kvm/e500mc.c @@ -145,8 +145,6 @@ static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu) kvmppc_e500_tlbil_all(vcpu_e500); __get_cpu_var(last_vcpu_of_lpid)[vcpu-kvm-arch.lpid] = vcpu; } - - kvmppc_load_guest_fp(vcpu); } static void kvmppc_core_vcpu_put_e500mc(struct kvm_vcpu *vcpu) -- 1.7.11.7 -- To unsubscribe from this list:
[PATCH 2/2] KVM: vmx: Reflect misc_enables in real CPU
IA32_MISC_ENABLE MSR has two bits that affect the actual results which can be observed by the guest: fast string enable, and FOPCODE compatibility. Guests may wish to change the default settings of these bits. Linux usually enables fast-string by default. However, when fast string is enabled data breakpoints are only recognized on boundaries between data-groups. On some old CPUs enabling fast-string also resulted in single-step not occurring upon each iteration. FOPCODE compatibility can be used to analyze program performance by recording the last instruction executed before FSAVE/FSTENV/FXSAVE. This patch saves and restores these bits in IA32_MISC_ENABLE if they are supported upon entry to guest and exit to userspace respectively. To avoid possible issues, fast-string can only be enabled by the guest if the host enabled them. The physical CPU version is checked to ensure no shared bits are reconfigured in the process. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/svm.c | 7 ++ arch/x86/kvm/vmx.c | 56 + arch/x86/kvm/x86.c | 2 +- 4 files changed, 65 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4bda61b..879b930 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -699,6 +699,7 @@ struct kvm_x86_ops { void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long cr3); int (*set_cr4)(struct kvm_vcpu *vcpu, unsigned long cr4); void (*set_efer)(struct kvm_vcpu *vcpu, u64 efer); + void (*set_misc_enable)(struct kvm_vcpu *vcpu, u64 data); void (*get_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); void (*set_idt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 1f49c86..378e50e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -480,6 +480,11 @@ static void svm_set_efer(struct kvm_vcpu *vcpu, u64 
efer) mark_dirty(to_svm(vcpu)-vmcb, VMCB_CR); } +static void svm_set_misc_enable(struct kvm_vcpu *vcpu, u64 data) +{ + vcpu-arch.ia32_misc_enable_msr = data; +} + static int is_external_interrupt(u32 info) { info = SVM_EVTINJ_TYPE_MASK | SVM_EVTINJ_VALID; @@ -1152,6 +1157,7 @@ static void init_vmcb(struct vcpu_svm *svm) init_sys_seg(save-tr, SEG_TYPE_BUSY_TSS16); svm_set_efer(svm-vcpu, 0); + svm_set_misc_enable(svm-vcpu, 0); save-dr6 = 0x0ff0; kvm_set_rflags(svm-vcpu, 2); save-rip = 0xfff0; @@ -4338,6 +4344,7 @@ static struct kvm_x86_ops svm_x86_ops = { .set_cr3 = svm_set_cr3, .set_cr4 = svm_set_cr4, .set_efer = svm_set_efer, + .set_misc_enable = svm_set_misc_enable, .get_idt = svm_get_idt, .set_idt = svm_set_idt, .get_gdt = svm_get_gdt, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 45bab55..2d2efd0 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -809,6 +809,8 @@ static const struct kvm_vmx_segment_field { }; static u64 host_efer; +static u64 host_misc_enable; +static u64 guest_misc_enable_mask; static void ept_save_pdptrs(struct kvm_vcpu *vcpu); @@ -1609,6 +1611,33 @@ static void reload_tss(void) load_TR_desc(); } +static void __init update_guest_misc_enable_mask(void) +{ + /* Calculating which of the IA32_MISC_ENABLE bits should be reflected + in hardware */ + struct cpuinfo_x86 *c = boot_cpu_data; + u64 data; + + guest_misc_enable_mask = 0; + + /* Core/Atom architecture share fast-string and x86 compat */ + if (c-x86 != 6 || c-x86_model 0xd) + return; + + if (rdmsrl_safe(MSR_IA32_MISC_ENABLE, data) 0) + return; + if (boot_cpu_has(X86_FEATURE_REP_GOOD)) + guest_misc_enable_mask |= MSR_IA32_MISC_ENABLE_FAST_STRING; + + preempt_disable(); + if (wrmsrl_safe(MSR_IA32_MISC_ENABLE, + data | MSR_IA32_MISC_ENABLE_X87_COMPAT) = 0) { + guest_misc_enable_mask |= MSR_IA32_MISC_ENABLE_X87_COMPAT; + wrmsrl(MSR_IA32_MISC_ENABLE, data); + } + preempt_enable(); +} + static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) { u64 
guest_efer; @@ -3126,6 +3155,8 @@ static __init int hardware_setup(void) if (!cpu_has_vmx_apicv()) enable_apicv = 0; + update_guest_misc_enable_mask(); + if (enable_apicv) kvm_x86_ops-update_cr8_intercept = NULL; else { @@ -3315,6 +3346,28 @@ static void vmx_set_efer(struct kvm_vcpu *vcpu, u64 efer) setup_msrs(vmx); } +static void vmx_set_misc_enable(struct kvm_vcpu *vcpu, u64 data) +{ + struct vcpu_vmx *vmx = to_vmx(vcpu); + +
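The save/restore described in the patch reduces to a bit-mixing step: only bits that the host supports (the precomputed mask) are taken from the guest's IA32_MISC_ENABLE value; every other bit keeps the host's setting. A minimal userspace model of that idea — the function name is hypothetical and this is a sketch of the masking logic, not the kernel code; bit positions are from the SDM:

```c
#include <assert.h>
#include <stdint.h>

/* IA32_MISC_ENABLE bit positions per the Intel SDM. */
#define MISC_ENABLE_FAST_STRING (1ULL << 0)
#define MISC_ENABLE_X87_COMPAT  (1ULL << 2)

/* The value written to hardware keeps the host's setting for every
 * bit outside guest_mask and takes the guest's setting for every bit
 * inside it. */
uint64_t mix_misc_enable(uint64_t host_val, uint64_t guest_val,
                         uint64_t guest_mask)
{
    return (host_val & ~guest_mask) | (guest_val & guest_mask);
}
```

With an empty mask the guest's writes have no hardware effect, which models the "fast-string can only be enabled by the guest if the host enabled it" restriction.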
[PATCH] x86: Test debug exceptions with disabled fast-string
x86 allows to enable fast strings, sacrificing the precision of debug watchpoints. Previously, KVM did not reflect the guest fast strings settings in the actual MSR, resulting always in imprecise exception. This test checks whether disabled fast strings causes the debug trap on rep-string to occur on the precise iteration. A debug watchpoint which is not cache-line aligned is set, and 128 bytes are set using rep-string operation. The iteration in which the debug exception occurred is then checked. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- x86/debug.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/x86/debug.c b/x86/debug.c index 34e56fb..eb96dbe 100644 --- a/x86/debug.c +++ b/x86/debug.c @@ -11,10 +11,13 @@ #include libcflat.h #include desc.h +#include msr.h +#include processor.h -static volatile unsigned long bp_addr[10], dr6[10]; +static volatile unsigned long bp_addr[10], dr6[10], rcx[10]; static volatile unsigned int n; static volatile unsigned long value; +static unsigned char dst[128] __attribute__ ((aligned(64))); static unsigned long get_dr6(void) { @@ -43,6 +46,7 @@ static void handle_db(struct ex_regs *regs) { bp_addr[n] = regs-rip; dr6[n] = get_dr6(); + rcx[n] = regs-rcx; if (dr6[n] 0x1) regs-rflags |= (1 16); @@ -60,7 +64,7 @@ static void handle_bp(struct ex_regs *regs) int main(int ac, char **av) { - unsigned long start; + unsigned long start, misc_enable; setup_idt(); handle_exception(DB_VECTOR, handle_db); @@ -109,5 +113,18 @@ hw_wp: n == 1 bp_addr[0] == ((unsigned long)hw_wp) dr6[0] == 0x4ff2); + misc_enable = rdmsr(MSR_IA32_MISC_ENABLE); + wrmsr(MSR_IA32_MISC_ENABLE, + misc_enable ~MSR_IA32_MISC_ENABLE_FAST_STRING); + + n = 0; + set_dr1((void *)dst[59]); + set_dr7(0x0010040a); + + asm volatile(rep stosb\n\t : : D(dst), c(128) : cc, memory); + + report(hw watchpoint with disabled fast-string, rcx[0] == 128-1-59); + wrmsr(MSR_IA32_MISC_ENABLE, misc_enable); + return report_summary(); } -- 1.9.1 -- To unsubscribe 
from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
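The expected rcx value in the test's report can be sanity-checked with simple arithmetic: rep stosb decrements rcx once per stored byte, and a precise data breakpoint on dst[59] traps after the 60th store. A trivial model (the helper name is hypothetical):

```c
#include <assert.h>

/* rcx starts at total and is decremented once per completed store; a
 * precise watchpoint at index wp traps after wp + 1 iterations, so
 * rcx holds the number of remaining iterations. */
int rcx_at_trap(int total, int wp)
{
    return total - (wp + 1);
}
```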
[PATCH 1/2] KVM: x86: update cpuid according to IA32_MISC_ENABLE
Virtual BIOS may use the Limit CPUID Maxval and XD Bit Disable fields in IA32_MISC_ENABLE. These two fields update the CPUID, and in the case of XD Bit Disable also disable NX support. This patch reflects this behavior in CPUID, and disables NX bit accordingly. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/cpuid.c | 20 arch/x86/kvm/vmx.c | 8 ++-- 2 files changed, 26 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 38a0afe..ff7f429 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -757,6 +757,25 @@ static struct kvm_cpuid_entry2* check_cpuid_limit(struct kvm_vcpu *vcpu, return kvm_find_cpuid_entry(vcpu, maxlevel-eax, index); } +static void cpuid_override(struct kvm_vcpu *vcpu, u32 function, u32 index, + u32 *eax, u32 *ebx, u32 *ecx, u32 *edx) +{ + switch (function) { + case 0: + if (vcpu-arch.ia32_misc_enable_msr + MSR_IA32_MISC_ENABLE_LIMIT_CPUID) + *eax = min_t(u32, *eax, 3); + break; + case 1: + if (vcpu-arch.ia32_misc_enable_msr + MSR_IA32_MISC_ENABLE_XD_DISABLE) + *edx = ~bit(X86_FEATURE_NX); + break; + default: + break; + } +} + void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx) { u32 function = *eax, index = *ecx; @@ -774,6 +793,7 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, u32 *ecx, u32 *edx) *edx = best-edx; } else *eax = *ebx = *ecx = *edx = 0; + cpuid_override(vcpu, function, index, eax, ebx, ecx, edx); trace_kvm_cpuid(function, *eax, *ebx, *ecx, *edx); } EXPORT_SYMBOL_GPL(kvm_cpuid); diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index cad37d5..45bab55 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1633,9 +1633,13 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset) vmx-guest_msrs[efer_offset].mask = ~ignore_bits; clear_atomic_switch_msr(vmx, MSR_EFER); + /* Clear nx according if xd_disable is on */ + guest_efer = vmx-vcpu.arch.efer; + if (vmx-vcpu.arch.ia32_misc_enable_msr + 
MSR_IA32_MISC_ENABLE_XD_DISABLE) + guest_efer = ~EFER_NX; /* On ept, can't emulate nx, and must switch nx atomically */ - if (enable_ept ((vmx-vcpu.arch.efer ^ host_efer) EFER_NX)) { - guest_efer = vmx-vcpu.arch.efer; + if (enable_ept ((guest_efer ^ host_efer) EFER_NX)) { if (!(guest_efer EFER_LMA)) guest_efer = ~EFER_LME; add_atomic_switch_msr(vmx, MSR_EFER, guest_efer, host_efer); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
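The two overrides can be modeled in isolation. In IA32_MISC_ENABLE, Limit-CPUID-Maxval is bit 22 and XD-Bit-Disable is bit 34 (per the SDM), and NX is reported as bit 20 of its CPUID word. The helper below is a hypothetical userspace sketch of the transformation, not the kvm function:

```c
#include <assert.h>
#include <stdint.h>

/* IA32_MISC_ENABLE bit positions, per the Intel SDM. */
#define MISC_LIMIT_CPUID (1ULL << 22)
#define MISC_XD_DISABLE  (1ULL << 34)
/* NX capability bit within its CPUID word. */
#define CPUID_EDX_NX     (1u << 20)

/* Clamp the maximum basic CPUID leaf to 3 when Limit-CPUID is set,
 * and hide the NX capability bit when XD-Disable is set. */
void cpuid_override_model(uint64_t misc_enable, uint32_t *max_leaf,
                          uint32_t *nx_edx)
{
    if ((misc_enable & MISC_LIMIT_CPUID) && *max_leaf > 3)
        *max_leaf = 3;
    if (misc_enable & MISC_XD_DISABLE)
        *nx_edx &= ~CPUID_EDX_NX;
}
```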
Re: [PATCH] KVM: x86: Replace X86_FEATURE_NX offset with the definition
Il 20/08/2014 15:38, Nadav Amit ha scritto: Replace reference to X86_FEATURE_NX using bit shift with the defined X86_FEATURE_NX. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/kvm/cpuid.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 38a0afe..f4bad87 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -112,8 +112,8 @@ static void cpuid_fix_nx_cap(struct kvm_vcpu *vcpu) break; } } - if (entry (entry-edx (1 20)) !is_efer_nx()) { - entry-edx = ~(1 20); + if (entry (entry-edx bit(X86_FEATURE_NX)) !is_efer_nx()) { + entry-edx = ~bit(X86_FEATURE_NX); printk(KERN_INFO kvm: guest NX capability removed\n); } } Applying, thanks. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
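The equivalence the patch relies on is easy to check: X86_FEATURE_NX is defined as word 1, bit 20 (index 1*32 + 20), and kvm's bit() helper reduces a feature index to its position within its 32-bit CPUID word, so the new form reproduces the old `1 << 20` literal. A sketch of the helper, equivalent in effect (the name `feature_bit` is used here to avoid shadowing; the kernel's macro is `bit()`):

```c
#include <assert.h>

/* Feature index: CPUID word 1, bit 20 (NX in 0x80000001 EDX). */
#define X86_FEATURE_NX (1 * 32 + 20)

/* Reduce a feature index to its bit position within its 32-bit word. */
unsigned int feature_bit(unsigned int x)
{
    return 1u << (x & 31);
}
```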
[PATCH 0/2] KVM: vmx: Supporting IA32_MISC_ENABLE MSR
The IA32_MISC_ENABLE MSR has several bits that affect the behavior of the CPU and are visible to the guest. This patch set makes the vCPU behave correctly with respect to: fast-string enable, FPU fopcode compatibility mode, limit CPUID maxval, and XD-bit disable. Thanks for reviewing the patches. Nadav Amit (2): KVM: x86: update cpuid according to IA32_MISC_ENABLE KVM: vmx: Reflect misc_enables in real CPU arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/cpuid.c | 20 + arch/x86/kvm/svm.c | 7 + arch/x86/kvm/vmx.c | 64 +++-- arch/x86/kvm/x86.c | 2 +- 5 files changed, 91 insertions(+), 3 deletions(-) -- 1.9.1
[PATCH] kvm tools: balloon: fix overflow in PFN to address conversion
Fix trivial overflow of u32 value Signed-off-by: Konstantin Khlebnikov koc...@gmail.com --- tools/kvm/virtio/balloon.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/kvm/virtio/balloon.c b/tools/kvm/virtio/balloon.c index f7dfb0b..84c4bb0 100644 --- a/tools/kvm/virtio/balloon.c +++ b/tools/kvm/virtio/balloon.c @@ -64,7 +64,7 @@ static bool virtio_bln_do_io_request(struct kvm *kvm, struct bln_dev *bdev, stru for (i = 0 ; i len ; i++) { void *guest_ptr; - guest_ptr = guest_flat_to_host(kvm, ptrs[i] VIRTIO_BALLOON_PFN_SHIFT); + guest_ptr = guest_flat_to_host(kvm, (u64)ptrs[i] VIRTIO_BALLOON_PFN_SHIFT); if (queue == bdev-vqs[VIRTIO_BLN_INFLATE]) { madvise(guest_ptr, 1 VIRTIO_BALLOON_PFN_SHIFT, MADV_DONTNEED); bdev-config.actual++; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
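The overflow being fixed is the classic shift-before-widen bug: ptrs[i] is a u32, so shifting it left by VIRTIO_BALLOON_PFN_SHIFT happens in 32-bit arithmetic and guest addresses at or above 4 GiB lose their high bits; casting to u64 first widens the operand before the shift. A standalone illustration (the buggy variant's truncation is made explicit with a cast so the example stays well-defined C):

```c
#include <assert.h>
#include <stdint.h>

#define PFN_SHIFT 12 /* VIRTIO_BALLOON_PFN_SHIFT: 4 KiB pages */

/* Buggy shape: the shift result is truncated to 32 bits, as happens
 * when the shift is performed on a u32 operand. */
uint64_t pfn_to_addr_buggy(uint32_t pfn)
{
    return (uint32_t)((uint64_t)pfn << PFN_SHIFT);
}

/* Fixed shape: widen first, then shift. */
uint64_t pfn_to_addr_fixed(uint32_t pfn)
{
    return (uint64_t)pfn << PFN_SHIFT;
}
```

PFN 0x100000 marks the 4 GiB boundary: below it both forms agree, at and above it the unfixed form wraps.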
Re: [PATCH v6 2/7] random, timekeeping: Collect timekeeping entropy in the timekeeping code
On Thu, Aug 14, 2014 at 12:43 AM, Andy Lutomirski l...@amacapital.net wrote: Currently, init_std_data calls ktime_get_real(). This imposes awkward constraints on when init_std_data can be called, and init_std_data is unlikely to collect the full unpredictable data available to the timekeeping code, especially after resume. Remove this code from random.c and add the appropriate add_device_randomness calls to timekeeping.c instead. Cc: John Stultz john.stu...@linaro.org Signed-off-by: Andy Lutomirski l...@amacapital.net --- drivers/char/random.c | 2 -- kernel/time/timekeeping.c | 11 +++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index 7673e60..8dc3e3a 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1263,12 +1263,10 @@ static void seed_entropy_store(void *ctx, u32 data) static void init_std_data(struct entropy_store *r) { int i; - ktime_t now = ktime_get_real(); unsigned long rv; char log_prefix[128]; r-last_pulled = jiffies; - mix_pool_bytes(r, now, sizeof(now), NULL); for (i = r-poolinfo-poolbytes; i 0; i -= sizeof(rv)) { rv = random_get_entropy(); mix_pool_bytes(r, rv, sizeof(rv), NULL); diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 32d8d6a..9609db9 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -23,6 +23,7 @@ #include linux/stop_machine.h #include linux/pvclock_gtod.h #include linux/compiler.h +#include linux/random.h #include tick-internal.h #include ntp_internal.h @@ -835,6 +836,9 @@ void __init timekeeping_init(void) memcpy(shadow_timekeeper, timekeeper, sizeof(timekeeper)); write_seqcount_end(timekeeper_seq); + + add_device_randomness(tk, sizeof(tk)); + So I can't (and really don't want to) vouch for the correctness side of this. 
The initial idea of using the structure instead of reading the time worried me a bit, but we have already read the clocksource and stored it in cycle_last, so there's a wee bit more than just the RTC time and a bunch of zeros in the timekeeper structure. Though on some systems the read_persistent_clock call can't access the RTC at timekeeping_init, so I'm not sure we're really getting that much more than the cycle_last clocksource value here. Probably should add something like this to the RTC hctosys logic. thanks -john
Re: virt-install: failed to initialize KVM: Permission denied
On 08/19/2014 02:38 PM, arnaud gaboury wrote: $ uname -r 3.16.1-1-ARCH - As a regular user, member of the libvirt group, I run this command to create a basic VM: virt-install --connect qemu:///system --name=test --ram 2048 --cpu host-model-only --os-variant=win7 --disk /myVM/test --boot cdrom,hd --virt-type kvm --graphics spice --controller scsi,model=virtio-scsi --cdrom=/drawer/myIso/w8.iso It returns an error : -- --- Starting install... ERRORinternal error: process exited while connecting to monitor: Could not access KVM kernel module: Permission denied failed to initialize KVM: Permission denied - $ getfacl /dev/kvm # file: dev/kvm # owner: root # group: kvm user::rw- user:martinus:rw- group::rw- mask::rw- other::--- The command return seems to indicate rights are correct. $ lsmod return kvm kvm_intel are loaded. If I run the virt-install with qemu:///session, I do not have this issue and can create the VM. I found many entries about the KVM permission issue, but with no clear answer to solve it. When connecting to qemu:///system, libvirt does not run VMs as your regular user. What user libvirtd uses though is dependent on how it's configured. On Fedora, qemu VMs are run as the 'qemu' user. If that's how it's configured on your distro, the above permissions would block use of /dev/kvm. Here's how permissions look on Fedora 20 for me: $ ls -l /dev/kvm crw-rw-rw-+ 1 root kvm 10, 232 Aug 8 09:51 /dev/kvm $ getfacl /dev/kvm getfacl: Removing leading '/' from absolute path names # file: dev/kvm # owner: root # group: kvm user::rw- user:crobinso:rw- group::rw- mask::rw- other::rw- Those permissive permissions are set by a udev rule installed by qemu-system-x86: $ cat /lib/udev/rules.d/80-kvm.rules KERNEL==kvm, GROUP=kvm, MODE=0666 So perhaps your distro should do the same. - Cole -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
2014-08-20 15:15+0200, Paolo Bonzini: Il 20/08/2014 14:41, Radim Krčmář ha scritto:
if (ple_window_grow < 1 || ple_window_actual_max < ple_window)
	new = ple_window;
else if (ple_window_grow < ple_window)
	new = max(ple_window_actual_max, old) * ple_window_grow;
else
	new = max(ple_window_actual_max, old) + ple_window_grow;
Oh, I like that this can get rid of all overflows; ple_window_actual_max (PW_effective_max?) is going to be set to ple_window_max [/-] ple_window_grow in v2. (I think the || in the first if can be eliminated with some creativity in clamp_ple_window_max.) To do it, we'll want to intercept changes to ple_window as well. (I disliked this patch a lot even before :) What about setting ple_window_actual_max to 0 if ple_window_grow is 0 (instead of just returning)? Then the if (ple_window_actual_max < ple_window) will always fail and you'll go through new = ple_window. But perhaps it's more gross and worthless than creative. :) That code can't use PW directly, because PW_actual_max needs to be one PW_grow below PW_max, so I'd rather enforce a minimal PW_actual_max. Btw. without extra code, we are still going to overflow on races when changing PW_grow; should they be covered as well? (+ There is a bug in this patch -- clamp_ple_window_max() should be after param_set_int() ... damned unreviewed last-second changes.)
Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
Il 20/08/2014 17:31, Radim Krčmář ha scritto: Btw. without extra code, we are still going to overflow on races when changing PW_grow, should they be covered as well? You mean because there is no spinlock or similar protecting the changes? I guess you could use a seqlock. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: virt-install: failed to initialize KVM: Permission denied
Those permissive permissions are set by a udev rule installed by qemu-system-x86: $ cat /lib/udev/rules.d/80-kvm.rules KERNEL=="kvm", GROUP="kvm", MODE="0666" So perhaps your distro should do the same. I have it as /lib/udev/rules.d/65-kvm.rules. In fact, I solved my issue by setting user:group to qemu:kvm in /etc/libvirt/qemu.conf.
qemu-kvm process soft lockups cpu, results in server crash
Hi, I have encountered several crashes of my server caused by the qemu-kvm process locking up a CPU. Shortly beforehand, conntrack starts dropping packets, but I have no idea how strongly the lockup is related to the dropped packets. Environment: CentOS 6.5, kernel 2.6.32-431.11.2, x86_64, qemu-kvm-tools-0.12. Thanks. --log---
Aug 18 22:07:05 localhost kernel: [4625821.185649] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.192085] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.198608] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.205021] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.211432] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.217874] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.224301] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.230764] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.237219] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost kernel: [4625821.243664] nf_conntrack: table full, dropping packet.
Aug 18 22:07:05 localhost ./gbalancer-0.5.1[19991]: dial tcp 10.200.86.30:3306: i/o timeout Aug 18 22:07:06 localhost ./gbalancer-0.5.1[19991]: dial tcp 10.200.86.31:3306: i/o timeout Aug 18 22:07:06 localhost ./gbalancer-0.5.1[19991]: wrangler: detected server 10.200.86.32:3306 is down Aug 18 22:07:06 localhost ./gbalancer-0.5.1[19991]: wrangler: detected server 10.200.86.31:3306 is down Aug 18 22:07:06 localhost ./gbalancer-0.5.1[19991]: wrangler: detected server 10.200.86.30:3306 is down Aug 18 22:07:07 localhost kernel: [4625822.875756] [ cut here ] Aug 18 22:07:07 localhost kernel: [4625822.881685] WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted) Aug 18 22:07:07 localhost kernel: [4625822.892399] Hardware name: IBM System x3550 M4: -[7914ON9]- Aug 18 22:07:07 localhost kernel: [4625822.899346] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out Aug 18 22:07:07 localhost kernel: [4625822.907052] Modules linked in: iptable_filter iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables tcp_diag inet_diag netconsole xt_CHECKSUM configfs ip6table_filter ip6_tables ebtable_nat ebtables cpufreq_ondemand acpi_cpufreq freq_table mperf ipmi_watchdog ipmi_poweroff ipmi_devintf bridge bonding 8021q garp stp llc ipv6 vhost_net macvtap macvlan tun kvm_intel kvm cdc_ether usbnet mii microcode iTCO_wdt iTCO_vendor_support igb dca i2c_algo_bit ptp pps_core sg ics932s401 i2c_i801 i2c_core lpc_ich mfd_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: nf_conntrack] Aug 18 22:07:07 localhost kernel: [4625822.979601] Pid: 0, comm: swapper Not tainted 2.6.32-431.11.2.el6.x86_64 #1 Aug 18 22:07:07 localhost kernel: [4625822.988093] Call Trace: Aug 18 22:07:07 localhost kernel: [4625822.991527] IRQ [81071e27] ? warn_slowpath_common+0x87/0xc0 Aug 18 22:07:07 localhost kernel: [4625822.999861] [81071f16] ? 
warn_slowpath_fmt+0x46/0x50 Aug 18 22:07:07 localhost kernel: [4625823.007193] [8147bc8b] ? dev_watchdog+0x26b/0x280 Aug 18 22:07:07 localhost kernel: [4625823.014239] [8105dd5c] ? scheduler_tick+0xcc/0x260 Aug 18 22:07:07 localhost kernel: [4625823.021369] [8147ba20] ? dev_watchdog+0x0/0x280 Aug 18 22:07:07 localhost kernel: [4625823.028204] [81084ae7] ? run_timer_softirq+0x197/0x340 Aug 18 22:07:07 localhost kernel: [4625823.035716] [810ac8e5] ? tick_dev_program_event+0x65/0xc0 Aug 18 22:07:07 localhost kernel: [4625823.043526] [8107a8e1] ? __do_softirq+0xc1/0x1e0 Aug 18 22:07:07 localhost kernel: [4625823.050447] [810ac9ba] ? tick_program_event+0x2a/0x30 Aug 18 22:07:07 localhost kernel: [4625823.057877] [8100c30c] ? call_softirq+0x1c/0x30 Aug 18 22:07:07 localhost kernel: [4625823.064744] [8100fa75] ? do_softirq+0x65/0xa0 Aug 18 22:07:07 localhost kernel: [4625823.071385] [8107a795] ? irq_exit+0x85/0x90 Aug 18 22:07:07 localhost kernel: [4625823.077831] [815316ca] ? smp_apic_timer_interrupt+0x4a/0x60 Aug 18 22:07:07 localhost kernel: [4625823.085832] [8100bb93] ? apic_timer_interrupt+0x13/0x20 Aug 18 22:07:07 localhost kernel: [4625823.093612] EOI [812e0bee] ? intel_idle+0xde/0x170 Aug 18 22:07:07 localhost kernel: [4625823.101047] [812e0bd1] ? intel_idle+0xc1/0x170 Aug 18 22:07:07 localhost kernel: [4625823.107876] [81426b67] ? cpuidle_idle_call+0xa7/0x140 Aug 18 22:07:07 localhost kernel: [4625823.115287] [81009fc6] ? cpu_idle+0xb6/0x110 Aug 18 22:07:07 localhost kernel: [4625823.121822] [8152143c] ? start_secondary+0x2ac/0x2ef Aug 18 22:07:07 localhost
Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
2014-08-20 17:34+0200, Paolo Bonzini: Il 20/08/2014 17:31, Radim Krčmář ha scritto: Btw. without extra code, we are still going to overflow on races when changing PW_grow, should they be covered as well? You mean because there is no spinlock or similar protecting the changes? I guess you could use a seqlock. Yes, for example between a modification of ple_window, new = min(old, PW_actual_max) * PW_grow, which gets compiled into something like this:
1) tmp = min(old, PW_actual_max)
2) new = tmp * PW_grow
and a write to increase PW_grow:
3) PW_actual_max = min(PW_max / new_PW_grow, PW_actual_max)
4) PW_grow = new_PW_grow
5) PW_actual_max = PW_max / new_PW_grow
3 and 4 can execute between 1 and 2, which could overflow. I don't think they are important enough to warrant the significant performance hit of locking. Or even more checks that would prevent it in a lockless way. (I'd just see that the result is set to something legal, and also drop line 3, because it does not help things that much.)
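The interleaving can be made concrete: if the reader's two steps observe values from two different writer states — tmp computed against the old PW_actual_max, the multiplication done with the new PW_grow — the product can exceed INT_MAX even though each writer state is individually safe. A sketch with hypothetical names, computed in 64 bits so the overflow is observable rather than undefined behavior:

```c
#include <assert.h>
#include <limits.h>

/* Step 1 uses actual_max, step 2 uses grow; passing values taken from
 * two different writer states models the race described above. */
long long grown_window(int old, int actual_max, int grow)
{
    int tmp = old < actual_max ? old : actual_max; /* step 1 */
    return (long long)tmp * grow;                  /* step 2 */
}
```

With actual_max still clamped for grow == 2 but grow already bumped to 4, the product overflows a 32-bit int; with a consistent state it cannot.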
Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
Il 20/08/2014 18:01, Radim Krčmář ha scritto: 2014-08-20 17:34+0200, Paolo Bonzini: Il 20/08/2014 17:31, Radim Krčmář ha scritto: Btw. without extra code, we are still going to overflow on races when changing PW_grow, should they be covered as well? You mean because there is no spinlock or similar protecting the changes? I guess you could use a seqlock. Yes, for example between a modification of ple_window new = min(old, PW_actual_max) * PW_grow which gets compiled into something like this: 1) tmp = min(old, PW_actual_max) 2) new = tmp * PW_grow and a write to increase PW_grow 3) PW_actual_max = min(PW_max / new_PW_grow, PW_actual_max) 4) PW_grow = new_PW_grow 5) PW_actual_max = PW_max / new_PW_grow 3 and 4 can exectute between 1 and 2, which could overflow. I don't think they are important enough to warrant a significant performance hit of locking. A seqlock just costs two memory accesses to the same (shared) cache line as the PW data, and a non-taken branch. I don't like code that is unsafe by design... Paolo Or even more checks that would prevent it in a lockless way. (I'd just see that the result is set to something legal and also drop line 3, because it does not help things that much.) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 9/9] KVM: VMX: automatic PLE window maximum
2014-08-20 18:03+0200, Paolo Bonzini: Il 20/08/2014 18:01, Radim Krčmář ha scritto: 2014-08-20 17:34+0200, Paolo Bonzini: Il 20/08/2014 17:31, Radim Krčmář ha scritto: Btw. without extra code, we are still going to overflow on races when changing PW_grow, should they be covered as well? You mean because there is no spinlock or similar protecting the changes? I guess you could use a seqlock. Yes, for example between a modification of ple_window new = min(old, PW_actual_max) * PW_grow which gets compiled into something like this: 1) tmp = min(old, PW_actual_max) 2) new = tmp * PW_grow and a write to increase PW_grow 3) PW_actual_max = min(PW_max / new_PW_grow, PW_actual_max) 4) PW_grow = new_PW_grow 5) PW_actual_max = PW_max / new_PW_grow 3 and 4 can exectute between 1 and 2, which could overflow. I don't think they are important enough to warrant a significant performance hit of locking. A seqlock just costs two memory accesses to the same (shared) cache line as the PW data, and a non-taken branch. Oh, seqlock readers do not have to write to shared memory, so it is acceptable ... I don't like code that is unsafe by design... I wouldn't say it is unsafe, because VCPU's PW is always greater than module's PW. We are just going to PLE exit sooner than expected. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: LPC IOMMU and VFIO MicroConference - Call for Participation
Ok folks, it's time to submit your discussion proposals for the LPC IOMMU and VFIO uconf. If you added an idea to the wiki, now is the time to formally propose it as a discussion topic. If you have ideas how to make the IOMMU or VFIO subsystems better, now is the time to propose it. If you can't figure out how to make something work in the current infrastructure, now is the time to propose a discussion. If you're adding new features and want to make sure we can support them, now is the time to propose a discussion. I don't think we've seen a formal schedule yet, but many of us have conflicts with KVM Forum this year and I expect the LPC planning committee to take that into account, so please submit your proposals anyway and feel free to note your availability/conflicts in the Note to organizers section. LPC is full, but there is a waiting list and the sooner you can get on it, the more likely you are to be registered. I expect uconf discussion leads to have an advantage in moving through the queue and we may be able to provide discounted registration for discussion leads. Thanks, Alex On Tue, 2014-08-12 at 11:20 +0200, Joerg Roedel wrote: LPC IOMMU and VFIO MicroConference - Call for Participation === We are pleased to announce that this year there will be the first IOMMU and VFIO MicroConference held at Linux Plumbers Conference in Düsseldorf. An initial request for support of this micro conference generated, among others, the following possible topic ideas: * Improving generic IOMMU code and move code out of drivers * IOMMU device error handling * IOMMU Power Management * Virtualizing IOMMUs * Interface between IOMMUs an memory management More suggested topics can be found at the wiki page of the micro conference: http://wiki.linuxplumbersconf.org/2014:iommu_microconference We now ask for formal proposals for these discussions along with any other topics or problems that need to be discussed in this area. 
The format of the micro conference will be roughly half-hour slots for each topic, where the discussion lead gives a short introduction to the problem and maybe sketches possible solutions. The rest of the slot is open for discussion, so that we can come to an agreement on how to move forward. Please submit your formal proposal on the Linux Plumbers website (OpenID login required) by August 31st at: http://www.linuxplumbersconf.org/2014/how-to-submit-microconference-discussions-topics/ Hope to see you in Düsseldorf! Joerg Roedel and Alex Williamson
[PATCH v2 3/6] KVM: VMX: make PLE window per-VCPU
Change PLE window into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Brings in a small overhead on every vmentry. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 2b306f9..18e0e52 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -484,6 +484,9 @@ struct vcpu_vmx { /* Support for a guest hypervisor (nested VMX) */ struct nested_vmx nested; + + /* Dynamic PLE window. */ + int ple_window; }; enum segment_cache_field { @@ -4402,7 +4405,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) if (ple_gap) { vmcs_write32(PLE_GAP, ple_gap); - vmcs_write32(PLE_WINDOW, ple_window); + vmx-ple_window = ple_window; } vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0); @@ -7387,6 +7390,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu) if (vmx-emulation_required) return; + if (ple_gap) + vmcs_write32(PLE_WINDOW, vmx-ple_window); + if (vmx-nested.sync_shadow_vmcs) { copy_vmcs12_to_shadow(vmx); vmx-nested.sync_shadow_vmcs = false; -- 2.0.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 6/6] KVM: VMX: runtime knobs for dynamic PLE window
ple_window is updated on every vmentry, so there is no reason to have it read-only anymore. ple_window* weren't writable to prevent runtime overflow races; they are prevented by a seqlock. Signed-off-by: Radim Krčmář rkrc...@redhat.com --- arch/x86/kvm/vmx.c | 48 +--- 1 file changed, 37 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f63ac5d..bd73fa1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -132,24 +132,29 @@ module_param(nested, bool, S_IRUGO); #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX\ INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW +static struct kernel_param_ops param_ops_ple_t; +#define param_check_ple_t(name, p) __param_check(name, p, int) + +static DEFINE_SEQLOCK(ple_window_seqlock); + static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP; module_param(ple_gap, int, S_IRUGO); static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW; -module_param(ple_window, int, S_IRUGO); +module_param(ple_window, ple_t, S_IRUGO | S_IWUSR); /* Default doubles per-vcpu window every exit. */ static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW; -module_param(ple_window_grow, int, S_IRUGO); +module_param(ple_window_grow, ple_t, S_IRUGO | S_IWUSR); /* Default resets per-vcpu window every exit to ple_window. */ static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK; -module_param(ple_window_shrink, int, S_IRUGO); +module_param(ple_window_shrink, int, S_IRUGO | S_IWUSR); /* Default is to compute the maximum so we can never overflow. 
*/ static int ple_window_actual_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX; static int ple_window_max= KVM_VMX_DEFAULT_PLE_WINDOW_MAX; -module_param(ple_window_max, int, S_IRUGO); +module_param(ple_window_max, ple_t, S_IRUGO | S_IWUSR); extern const ulong vmx_return; @@ -5730,13 +5735,19 @@ static void modify_ple_window(struct kvm_vcpu *vcpu, int grow) struct vcpu_vmx *vmx = to_vmx(vcpu); int old = vmx-ple_window; int new; + unsigned seq; - if (grow) - new = __grow_ple_window(old) - else - new = __shrink_ple_window(old, ple_window_shrink, ple_window); + do { + seq = read_seqbegin(ple_window_seqlock); - vmx-ple_window = max(new, ple_window); + if (grow) + new = __grow_ple_window(old); + else + new = __shrink_ple_window(old, ple_window_shrink, + ple_window); + + vmx-ple_window = max(new, ple_window); + } while (read_seqretry(ple_window_seqlock, seq)); trace_kvm_ple_window(grow, vcpu-vcpu_id, vmx-ple_window, old); } @@ -5750,6 +5761,23 @@ static void update_ple_window_actual_max(void) ple_window_grow, INT_MIN); } +static int param_set_ple_t(const char *arg, const struct kernel_param *kp) +{ + int ret; + + write_seqlock(ple_window_seqlock); + ret = param_set_int(arg, kp); + update_ple_window_actual_max(); + write_sequnlock(ple_window_seqlock); + + return ret; +} + +static struct kernel_param_ops param_ops_ple_t = { + .set = param_set_ple_t, + .get = param_get_int, +}; + /* * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE * exiting, so only get here on cpu with PAUSE-Loop-Exiting. @@ -9153,8 +9181,6 @@ static int __init vmx_init(void) } else kvm_disable_tdp(); - update_ple_window_actual_max(); - return 0; out7: -- 2.0.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 4/6] KVM: VMX: dynamise PLE window
Window is increased on every PLE exit and decreased on every sched_in.
The idea is that we don't want to PLE exit if there is no preemption going on.
We do this with sched_in() because it does not hold rq lock.

There are two new kernel parameters for changing the window:
 ple_window_grow and ple_window_shrink
ple_window_grow affects the window on PLE exit and ple_window_shrink does it
on sched_in; depending on their value, the window is modified like this:
(ple_window is kvm_intel's global)

            ple_window_shrink/ |
  ple_window_grow  | PLE exit           | sched_in
 ------------------+--------------------+---------------------
 < 1               |  = ple_window      |  = ple_window
 < ple_window      | *= ple_window_grow | /= ple_window_shrink
 otherwise         | += ple_window_grow | -= ple_window_shrink

A third new parameter, ple_window_max, controls a maximal ple_window; a
minimum equals to ple_window.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/kvm/vmx.c | 80 --
 1 file changed, 78 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 18e0e52..e63d7ac 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -125,14 +125,32 @@ module_param(nested, bool, S_IRUGO);
  * Time is measured based on a counter that runs at the same rate as the TSC,
  * refer SDM volume 3b section 21.6.13 & 22.1.3.
  */
-#define KVM_VMX_DEFAULT_PLE_GAP    128
-#define KVM_VMX_DEFAULT_PLE_WINDOW 4096
+#define KVM_VMX_DEFAULT_PLE_GAP           128
+#define KVM_VMX_DEFAULT_PLE_WINDOW        4096
+#define KVM_VMX_DEFAULT_PLE_WINDOW_GROW   2
+#define KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK 0
+#define KVM_VMX_DEFAULT_PLE_WINDOW_MAX    \
+		INT_MAX / KVM_VMX_DEFAULT_PLE_WINDOW_GROW
+
 static int ple_gap = KVM_VMX_DEFAULT_PLE_GAP;
 module_param(ple_gap, int, S_IRUGO);
 
 static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW;
 module_param(ple_window, int, S_IRUGO);
 
+/* Default doubles per-vcpu window every exit. */
+static int ple_window_grow = KVM_VMX_DEFAULT_PLE_WINDOW_GROW;
+module_param(ple_window_grow, int, S_IRUGO);
+
+/* Default resets per-vcpu window every exit to ple_window. */
+static int ple_window_shrink = KVM_VMX_DEFAULT_PLE_WINDOW_SHRINK;
+module_param(ple_window_shrink, int, S_IRUGO);
+
+/* Default is to compute the maximum so we can never overflow. */
+static int ple_window_actual_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
+static int ple_window_max        = KVM_VMX_DEFAULT_PLE_WINDOW_MAX;
+module_param(ple_window_max, int, S_IRUGO);
+
 extern const ulong vmx_return;
 
 #define NR_AUTOLOAD_MSRS 8
@@ -5679,12 +5697,66 @@ out:
 	return ret;
 }
 
+static int __grow_ple_window(int val)
+{
+	if (ple_window_grow < 1)
+		return ple_window;
+
+	val = min(val, ple_window_actual_max);
+
+	if (ple_window_grow < ple_window)
+		val *= ple_window_grow;
+	else
+		val += ple_window_grow;
+
+	return val;
+}
+
+static int __shrink_ple_window(int val, int shrinker, int minimum)
+{
+	if (shrinker < 1)
+		return ple_window;
+
+	if (shrinker < ple_window)
+		val /= shrinker;
+	else
+		val -= shrinker;
+
+	return max(val, minimum);
+}
+
+static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int new;
+
+	if (grow)
+		new = __grow_ple_window(vmx->ple_window);
+	else
+		new = __shrink_ple_window(vmx->ple_window, ple_window_shrink,
+		                          ple_window);
+
+	vmx->ple_window = max(new, ple_window);
+}
+#define grow_ple_window(vcpu)   modify_ple_window(vcpu, 1)
+#define shrink_ple_window(vcpu) modify_ple_window(vcpu, 0)
+
+static void update_ple_window_actual_max(void)
+{
+	ple_window_actual_max =
+			__shrink_ple_window(max(ple_window_max, ple_window),
+			                    ple_window_grow, INT_MIN);
+}
+
 /*
  * Indicate a busy-waiting vcpu in spinlock. We do not enable the PAUSE
  * exiting, so only get here on cpu with PAUSE-Loop-Exiting.
  */
 static int handle_pause(struct kvm_vcpu *vcpu)
 {
+	if (ple_gap)
+		grow_ple_window(vcpu);
+
 	skip_emulated_instruction(vcpu);
 	kvm_vcpu_on_spin(vcpu);
 
@@ -8854,6 +8926,8 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 
 void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+	if (ple_gap)
+		shrink_ple_window(vcpu);
 }
 
 static struct kvm_x86_ops vmx_x86_ops = {
@@ -9077,6 +9151,8 @@ static int __init vmx_init(void)
 	} else
 		kvm_disable_tdp();
 
+	update_ple_window_actual_max();
+
 	return 0;
 
 out7:
-- 
2.0.4
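The grow/shrink arithmetic from the table and diff above can be modeled as a self-contained userspace sketch. The parameter names mirror the kvm_intel module parameters, but the functions here are illustrative stand-ins, not the kernel code itself:

```c
#include <assert.h>
#include <limits.h>

static int ple_window = 4096;
static int ple_window_grow = 2;
static int ple_window_shrink = 0;
static int ple_window_actual_max = INT_MAX / 2;   /* INT_MAX / default grow */

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* PLE exit: < 1 resets, < ple_window multiplies, otherwise adds. */
static int grow_window(int val)
{
	if (ple_window_grow < 1)
		return ple_window;

	val = min_int(val, ple_window_actual_max); /* pre-clamp vs overflow */

	if (ple_window_grow < ple_window)
		val *= ple_window_grow;
	else
		val += ple_window_grow;

	return val;
}

/* sched_in: < 1 resets, < ple_window divides, otherwise subtracts. */
static int shrink_window(int val, int shrinker, int minimum)
{
	if (shrinker < 1)
		return ple_window;

	if (shrinker < ple_window)
		val /= shrinker;
	else
		val -= shrinker;

	return max_int(val, minimum);
}
```

With the series' defaults (grow = 2, shrink = 0), a PLE exit doubles the window and a sched_in resets it to `ple_window`, which is exactly the "expand on contention, reset when we actually get scheduled" policy the commit message describes.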
[PATCH v2 5/6] KVM: trace kvm_ple_window
Tracepoint for dynamic PLE window, fired on every potential change.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/kvm/trace.h | 25 +
 arch/x86/kvm/vmx.c   |  8 +---
 arch/x86/kvm/x86.c   |  1 +
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index e850a7d..4b8e6cb 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -848,6 +848,31 @@ TRACE_EVENT(kvm_track_tsc,
 		  __print_symbolic(__entry->host_clock, host_clocks))
 );
 
+TRACE_EVENT(kvm_ple_window,
+	TP_PROTO(int grow, unsigned int vcpu_id, int new, int old),
+	TP_ARGS(grow, vcpu_id, new, old),
+
+	TP_STRUCT__entry(
+		__field(         int,    grow )
+		__field(unsigned int, vcpu_id )
+		__field(         int,     new )
+		__field(         int,     old )
+	),
+
+	TP_fast_assign(
+		__entry->grow    = grow;
+		__entry->vcpu_id = vcpu_id;
+		__entry->new     = new;
+		__entry->old     = old;
+	),
+
+	TP_printk("vcpu %u: ple_window %d %s %d",
+		  __entry->vcpu_id,
+		  __entry->new,
+		  __entry->grow ? "+" : "-",
+		  __entry->old)
+);
+
 #endif /* CONFIG_X86_64 */
 
 #endif /* _TRACE_KVM_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e63d7ac..f63ac5d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5728,15 +5728,17 @@ static int __shrink_ple_window(int val, int shrinker, int minimum)
 static void modify_ple_window(struct kvm_vcpu *vcpu, int grow)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int old = vmx->ple_window;
 	int new;
 
 	if (grow)
-		new = __grow_ple_window(vmx->ple_window);
+		new = __grow_ple_window(old);
 	else
-		new = __shrink_ple_window(vmx->ple_window, ple_window_shrink,
-		                          ple_window);
+		new = __shrink_ple_window(old, ple_window_shrink, ple_window);
 
 	vmx->ple_window = max(new, ple_window);
+
+	trace_kvm_ple_window(grow, vcpu->vcpu_id, vmx->ple_window, old);
 }
 #define grow_ple_window(vcpu)   modify_ple_window(vcpu, 1)
 #define shrink_ple_window(vcpu) modify_ple_window(vcpu, 0)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5696ee7..814b20c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7648,3 +7648,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window);
-- 
2.0.4
[PATCH v2 2/6] KVM: x86: introduce sched_in to kvm_x86_ops
sched_in preempt notifier is available for x86, allow its use in specific
virtualization technologies as well.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/svm.c              | 6 ++++++
 arch/x86/kvm/vmx.c              | 6 ++++++
 arch/x86/kvm/x86.c              | 1 +
 4 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5724601..358e2f3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -772,6 +772,8 @@ struct kvm_x86_ops {
 	bool (*mpx_supported)(void);
 
 	int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
+
+	void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
 };
 
 struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ddf7427..4baf1bc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4305,6 +4305,10 @@ static void svm_handle_external_intr(struct kvm_vcpu *vcpu)
 	local_irq_enable();
 }
 
+static void svm_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
 	.cpu_has_kvm_support = has_svm,
 	.disabled_by_bios = is_disabled,
@@ -4406,6 +4410,8 @@ static struct kvm_x86_ops svm_x86_ops = {
 
 	.check_intercept = svm_check_intercept,
 	.handle_external_intr = svm_handle_external_intr,
+
+	.sched_in = svm_sched_in,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bfe11cf..2b306f9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8846,6 +8846,10 @@ static int vmx_check_intercept(struct kvm_vcpu *vcpu,
 	return X86EMUL_CONTINUE;
 }
 
+void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 static struct kvm_x86_ops vmx_x86_ops = {
 	.cpu_has_kvm_support = cpu_has_kvm_support,
 	.disabled_by_bios = vmx_disabled_by_bios,
@@ -8951,6 +8955,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
 	.mpx_supported = vmx_mpx_supported,
 
 	.check_nested_events = vmx_check_nested_events,
+
+	.sched_in = vmx_sched_in,
 };
 
 static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7c214f..5696ee7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7148,6 +7148,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+	kvm_x86_ops->sched_in(vcpu, cpu);
 }
 
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
-- 
2.0.4
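The patch follows KVM's usual ops-table dispatch: the generic arch hook forwards to whichever vendor module (vmx or svm) registered its ops. A standalone model of that pattern, with all names illustrative rather than the real kernel symbols:

```c
#include <assert.h>

/* Stand-in for struct kvm_x86_ops: vendor modules fill in callbacks. */
struct x86_ops_model {
	void (*sched_in)(int vcpu_id, int cpu);
};

static int last_vcpu = -1, last_cpu = -1;

/* Stand-in for vmx_sched_in(); the real one later shrinks the PLE window. */
static void vmx_sched_in_model(int vcpu_id, int cpu)
{
	last_vcpu = vcpu_id;
	last_cpu = cpu;
}

static struct x86_ops_model vmx_ops_model = {
	.sched_in = vmx_sched_in_model,
};

/* Set at module init, like kvm_x86_ops. */
static struct x86_ops_model *x86_ops_model = &vmx_ops_model;

/* Stand-in for kvm_arch_sched_in(): a one-line forwarder. */
static void arch_sched_in_model(int vcpu_id, int cpu)
{
	x86_ops_model->sched_in(vcpu_id, cpu);
}
```

This indirection is what lets patch 4/6 put the VMX-specific `shrink_ple_window()` call behind a vendor-neutral arch hook while SVM keeps an empty stub.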
[PATCH v2 0/6] Dynamic Pause Loop Exiting window.
v1 -> v2:
 * squashed [v1 4/9] and [v1 5/9] (clamping)
 * dropped [v1 7/9] (CPP abstractions)
 * merged core of [v1 9/9] into [v1 4/9] (automatic maximum)
 * reworked kernel_param_ops: closer to pure int [v2 6/6]
 * introduced ple_window_actual_max & reworked clamping [v2 4/6]
 * added seqlock for parameter modifications [v2 6/6]

---
PLE does not scale in its current form. When increasing VCPU count above
150, one can hit soft lockups because of runqueue lock contention. (Which
says a lot about performance.)

The main reason is that kvm_ple_loop cycles through all VCPUs. Replacing it
with a scalable solution would be ideal, but it has already been well
optimized for various workloads, so this series tries to alleviate one
different major problem while minimizing a chance of regressions: we have
too many useless PLE exits.

Just increasing PLE window would help some cases, but it still spirals out
of control. By increasing the window after every PLE exit, we can limit the
amount of useless ones, so we don't reach the state where CPUs spend 99% of
the time waiting for a lock.

HP confirmed that this series prevents soft lockups and TSC sync errors on
large guests.

Radim Krčmář (6):
  KVM: add kvm_arch_sched_in
  KVM: x86: introduce sched_in to kvm_x86_ops
  KVM: VMX: make PLE window per-VCPU
  KVM: VMX: dynamise PLE window
  KVM: trace kvm_ple_window
  KVM: VMX: runtime knobs for dynamic PLE window

 arch/arm/kvm/arm.c              |   4 ++
 arch/mips/kvm/mips.c            |   4 ++
 arch/powerpc/kvm/powerpc.c      |   4 ++
 arch/s390/kvm/kvm-s390.c        |   4 ++
 arch/x86/include/asm/kvm_host.h |   2 +
 arch/x86/kvm/svm.c              |   6 ++
 arch/x86/kvm/trace.h            |  25 ++++
 arch/x86/kvm/vmx.c              | 124 ++--
 arch/x86/kvm/x86.c              |   6 ++
 include/linux/kvm_host.h        |   2 +
 virt/kvm/kvm_main.c             |   2 +
 11 files changed, 179 insertions(+), 4 deletions(-)

-- 
2.0.4
[PATCH v2 1/6] KVM: add kvm_arch_sched_in
Introduce preempt notifiers for architecture specific code.
Advantage over creating a new notifier in every arch is slightly simpler
code and guaranteed call order with respect to kvm_sched_in.

Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/arm/kvm/arm.c         | 4 ++++
 arch/mips/kvm/mips.c       | 4 ++++
 arch/powerpc/kvm/powerpc.c | 4 ++++
 arch/s390/kvm/kvm-s390.c   | 4 ++++
 arch/x86/kvm/x86.c         | 4 ++++
 include/linux/kvm_host.h   | 2 ++
 virt/kvm/kvm_main.c        | 2 ++
 7 files changed, 24 insertions(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a99e0cd..9f788eb 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -288,6 +288,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	vcpu->cpu = cpu;
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index cd71141..2362df2 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1002,6 +1002,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
 				  struct kvm_translation *tr)
 {
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 4c79284..cbc432f 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -720,6 +720,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 	kvmppc_subarch_vcpu_uninit(vcpu);
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 #ifdef CONFIG_BOOKE
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index ce81eb2..a3c324e 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -555,6 +555,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 	/* Nothing todo */
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	save_fp_ctl(&vcpu->arch.host_fpregs.fpc);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..d7c214f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7146,6 +7146,10 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 	static_key_slow_dec(&kvm_no_apic_vcpu);
 }
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
+{
+}
+
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	if (type)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..ebd7236 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -624,6 +624,8 @@ void kvm_arch_exit(void);
 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
 
+void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu);
+
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..d3c3ed0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3123,6 +3123,8 @@ static void kvm_sched_in(struct preempt_notifier *pn, int cpu)
 	if (vcpu->preempted)
 		vcpu->preempted = false;
 
+	kvm_arch_sched_in(vcpu, cpu);
+
 	kvm_arch_vcpu_load(vcpu, cpu);
 }
 
-- 
2.0.4
[RESEND][PATCH] kvm-unit-tests: x86: pmu: call measure for every counter in check_counters_many
In the check_counters_many function, measure was only being called on the
last counter, causing the pmu test to fail. This ensures that measure is
called for each counter in the array before calling verify_counter.

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 x86/pmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/x86/pmu.c b/x86/pmu.c
index 5c85146..3402d1e 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -287,11 +287,11 @@ static void check_counters_many(void)
 		n++;
 	}
 
-	measure(cnt, n);
-
-	for (i = 0; i < n; i++)
+	for (i = 0; i < n; i++) {
+		measure(&cnt[i], 1);
 		if (!verify_counter(&cnt[i]))
 			break;
+	}
 
 	report("all counters", i == n);
 }
-- 
1.9.1
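The shape of the fix is easy to see in a standalone model: the broken version measured only via a single call outside the loop, while the fixed loop measures each counter before verifying it. Everything below is an illustrative stand-in, not the kvm-unit-tests code:

```c
#include <assert.h>

#define NCOUNTERS 4

static int measured[NCOUNTERS];

/* Stand-ins for measure()/verify_counter() on a single counter. */
static void measure_model(int idx) { measured[idx] = 1; }
static int verify_model(int idx)   { return measured[idx]; }

/* Fixed loop: measure each counter, then verify it. */
static int check_counters_many_model(void)
{
	int i;

	for (i = 0; i < NCOUNTERS; i++) {
		measure_model(i);
		if (!verify_model(i))
			break;
	}
	return i == NCOUNTERS;	/* 1 if all counters passed */
}
```

In the buggy version, only the last `measure` call took effect, so `verify` failed on the first unmeasured counter and `i == n` was never true.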
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On 08/19/2014 09:36 AM, Chai Wen wrote:
On 08/19/2014 04:38 AM, Don Zickus wrote:
On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote:
* Don Zickus dzic...@redhat.com wrote:

So I agree with the motivation of this improvement, but is this
implementation namespace-safe?

What namespace are you worried about colliding with? I thought
softlockup_ would provide the safety?? Maybe I am missing something
obvious. :-(

I meant PID namespaces - a PID in itself isn't guaranteed to be unique
across the system.

Ah, I don't think we thought about that. Is there a better way to do
this? Is there a domain id or something that can be OR'd with the pid?

What is always unique is the task pointer itself. We use pids when we
interface with user-space - but we don't really do that here, right?

No, I don't believe so.

Ok, so saving 'current' and comparing that should be enough, correct?

I am not sure of the safety about using pid here with namespace.
But as to the pointer of process, is there a chance that we got a
'historical' address saved in the 'softlockup_warn_pid(or address)_saved'
and the current hogging process happened to get the same task pointer
address? If it never happens, I think the comparing of address is ok.

Hi Ingo, what do you think of Don's solution ('comparing of task pointer')?

Anyway, this is just an additional check for some very special cases, so I
think the issue that I am concerned about above is not a problem at all.
And after learning some concepts about PID namespaces, I think comparing
task pointers is reliable for dealing with PID namespaces here.

And Don, if you want me to re-post this patch, please let me know.

thanks
chai wen

Cheers,
Don

--
Regards
Chai Wen
Re: LPC IOMMU and VFIO MicroConference - Call for Participation
Hi Alex and Joerg,

I have my travel request approved but missed the registration window.
Hope I will be lucky :)

Regards!
Gerry

On 2014/8/21 1:10, Alex Williamson wrote:

Ok folks, it's time to submit your discussion proposals for the LPC IOMMU
and VFIO uconf. If you added an idea to the wiki, now is the time to
formally propose it as a discussion topic. If you have ideas how to make
the IOMMU or VFIO subsystems better, now is the time to propose it. If you
can't figure out how to make something work in the current infrastructure,
now is the time to propose a discussion. If you're adding new features and
want to make sure we can support them, now is the time to propose a
discussion.

I don't think we've seen a formal schedule yet, but many of us have
conflicts with KVM Forum this year and I expect the LPC planning committee
to take that into account, so please submit your proposals anyway and feel
free to note your availability/conflicts in the "Note to organizers"
section. LPC is full, but there is a waiting list and the sooner you can
get on it, the more likely you are to be registered. I expect uconf
discussion leads to have an advantage in moving through the queue and we
may be able to provide discounted registration for discussion leads.

Thanks,
Alex

On Tue, 2014-08-12 at 11:20 +0200, Joerg Roedel wrote:

LPC IOMMU and VFIO MicroConference - Call for Participation
===========================================================

We are pleased to announce that this year there will be the first IOMMU
and VFIO MicroConference held at Linux Plumbers Conference in Düsseldorf.

An initial request for support of this micro conference generated, among
others, the following possible topic ideas:

 * Improving generic IOMMU code and moving code out of drivers
 * IOMMU device error handling
 * IOMMU power management
 * Virtualizing IOMMUs
 * Interface between IOMMUs and memory management

More suggested topics can be found at the wiki page of the micro
conference:

	http://wiki.linuxplumbersconf.org/2014:iommu_microconference

We now ask for formal proposals for these discussions along with any other
topics or problems that need to be discussed in this area. The format of
the micro conference will be roughly half-hour slots for each topic, where
the discussion lead gives a short introduction to the problem and maybe
sketches possible solutions. The rest of the slot is open for discussions
so that we come to an agreement how to move forward.

Please submit your formal proposal on the Linux Plumbers website (OpenID
login required) until August 31st at:

	http://www.linuxplumbersconf.org/2014/how-to-submit-microconference-discussions-topics/

Hope to see you in Düsseldorf!

	Joerg Roedel and Alex Williamson
Re: [PATCH 2/5] softlockup: make detector be aware of task switch of processes hogging cpu
On Thu, Aug 21, 2014 at 09:37:04AM +0800, Chai Wen wrote:
On 08/19/2014 09:36 AM, Chai Wen wrote:
On 08/19/2014 04:38 AM, Don Zickus wrote:
On Mon, Aug 18, 2014 at 09:02:00PM +0200, Ingo Molnar wrote:
* Don Zickus dzic...@redhat.com wrote:

So I agree with the motivation of this improvement, but is this
implementation namespace-safe?

What namespace are you worried about colliding with? I thought
softlockup_ would provide the safety?? Maybe I am missing something
obvious. :-(

I meant PID namespaces - a PID in itself isn't guaranteed to be unique
across the system.

Ah, I don't think we thought about that. Is there a better way to do
this? Is there a domain id or something that can be OR'd with the pid?

What is always unique is the task pointer itself. We use pids when we
interface with user-space - but we don't really do that here, right?

No, I don't believe so.

Ok, so saving 'current' and comparing that should be enough, correct?

I am not sure of the safety about using pid here with namespace.
But as to the pointer of process, is there a chance that we got a
'historical' address saved in the 'softlockup_warn_pid(or address)_saved'
and the current hogging process happened to get the same task pointer
address? If it never happens, I think the comparing of address is ok.

Hi Ingo, what do you think of Don's solution ('comparing of task pointer')?

Anyway, this is just an additional check for some very special cases, so I
think the issue that I am concerned about above is not a problem at all.
And after learning some concepts about PID namespaces, I think comparing
task pointers is reliable for dealing with PID namespaces here.

And Don, if you want me to re-post this patch, please let me know.

Sure, just quickly test with the task pointer to make sure it still works
and then re-post.

Cheers,
Don
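The approach the thread converges on can be sketched in a few lines: remember the hogging task by its `task_struct` pointer rather than by PID, since PIDs are not unique across PID namespaces. All names below are illustrative, not the watchdog's real symbols:

```c
#include <assert.h>
#include <stddef.h>

/* Last task seen hogging the CPU; compare by pointer identity, as a
 * task pointer (unlike a PID) is unique while the task is alive. */
static const void *softlockup_task_saved;

/* Returns 1 if the same task is still hogging (suppress the duplicate
 * warning), 0 if a different task took over (warn again). */
static int same_hog(const void *task)
{
	if (softlockup_task_saved == task)
		return 1;

	softlockup_task_saved = task;
	return 0;
}
```

Chai Wen's stale-pointer concern (a freed task's address being reused by a new task) is the one theoretical gap: pointer equality then falsely suppresses one warning, which the thread judges acceptable for this extra check.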
[PATCH v2 1/2] powerpc/booke: Restrict SPE exception handlers to e200/e500 cores
SPE exception handlers are now defined for 32-bit e500mc cores even though
the SPE unit is not present and CONFIG_SPE is undefined. Restrict SPE
exception handlers to e200/e500 cores, adding CONFIG_SPE_POSSIBLE, and
consequently guard the __setup_ivors and __setup_cpu functions.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Cc: Scott Wood scottw...@freescale.com
Cc: Alexander Graf ag...@suse.de
---
v2:
 - use CONFIG_PPC_E500MC without CONFIG_E500
 - use elif defined()

 arch/powerpc/kernel/cpu_setup_fsl_booke.S | 12 +++-
 arch/powerpc/kernel/cputable.c            |  5 +
 arch/powerpc/kernel/head_fsl_booke.S      | 18 +-
 arch/powerpc/platforms/Kconfig.cputype    |  6 +-
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/cpu_setup_fsl_booke.S b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
index 4f1393d..dddba3e 100644
--- a/arch/powerpc/kernel/cpu_setup_fsl_booke.S
+++ b/arch/powerpc/kernel/cpu_setup_fsl_booke.S
@@ -91,6 +91,7 @@ _GLOBAL(setup_altivec_idle)
 
 	blr
 
+#ifdef CONFIG_PPC_E500MC
 _GLOBAL(__setup_cpu_e6500)
 	mflr	r6
 #ifdef CONFIG_PPC64
@@ -107,14 +108,20 @@ _GLOBAL(__setup_cpu_e6500)
 	bl	__setup_cpu_e5500
 	mtlr	r6
 	blr
+#endif /* CONFIG_PPC_E500MC */
 
 #ifdef CONFIG_PPC32
+#ifdef CONFIG_E200
 _GLOBAL(__setup_cpu_e200)
 	/* enable dedicated debug exception handling resources (Debug APU) */
 	mfspr	r3,SPRN_HID0
 	ori	r3,r3,HID0_DAPUEN@l
 	mtspr	SPRN_HID0,r3
 	b	__setup_e200_ivors
+#endif /* CONFIG_E200 */
+
+#ifdef CONFIG_E500
+#ifndef CONFIG_PPC_E500MC
 _GLOBAL(__setup_cpu_e500v1)
 _GLOBAL(__setup_cpu_e500v2)
 	mflr	r4
@@ -129,6 +136,7 @@ _GLOBAL(__setup_cpu_e500v2)
 #endif
 	mtlr	r4
 	blr
+#else /* CONFIG_PPC_E500MC */
 _GLOBAL(__setup_cpu_e500mc)
 _GLOBAL(__setup_cpu_e5500)
 	mflr	r5
@@ -159,7 +167,9 @@ _GLOBAL(__setup_cpu_e5500)
 2:	mtlr	r5
 	blr
-#endif
+#endif /* CONFIG_PPC_E500MC */
+#endif /* CONFIG_E500 */
+#endif /* CONFIG_PPC32 */
 
 #ifdef CONFIG_PPC_BOOK3E_64
 _GLOBAL(__restore_cpu_e6500)
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 0c15764..df979c5f 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -2051,6 +2051,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 #endif /* CONFIG_PPC32 */
 #ifdef CONFIG_E500
 #ifdef CONFIG_PPC32
+#ifndef CONFIG_PPC_E500MC
 	{	/* e500 */
 		.pvr_mask		= 0x,
 		.pvr_value		= 0x8020,
@@ -2090,6 +2091,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.machine_check		= machine_check_e500,
 		.platform		= "ppc8548",
 	},
+#else
 	{	/* e500mc */
 		.pvr_mask		= 0x,
 		.pvr_value		= 0x8023,
@@ -2108,7 +2110,9 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.machine_check		= machine_check_e500mc,
 		.platform		= "ppce500mc",
 	},
+#endif /* CONFIG_PPC_E500MC */
 #endif /* CONFIG_PPC32 */
+#ifdef CONFIG_PPC_E500MC
 	{	/* e5500 */
 		.pvr_mask		= 0x,
 		.pvr_value		= 0x8024,
@@ -2152,6 +2156,7 @@ static struct cpu_spec __initdata cpu_specs[] = {
 		.machine_check		= machine_check_e500mc,
 		.platform		= "ppce6500",
 	},
+#endif /* CONFIG_PPC_E500MC */
 #ifdef CONFIG_PPC32
 	{	/* default match */
 		.pvr_mask		= 0x,
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index b497188..90f487f 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -613,6 +613,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
 	mfspr	r10, SPRN_SPRG_RSCRATCH0
 	b	InstructionStorage
 
+/* Define SPE handlers for e200 and e500v2 */
 #ifdef CONFIG_SPE
 	/* SPE Unavailable */
 	START_EXCEPTION(SPEUnavailable)
@@ -622,10 +623,10 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
 	b	fast_exception_return
 1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	EXC_XFER_EE_LITE(0x2010, KernelSPE)
-#else
+#elif defined(CONFIG_SPE_POSSIBLE)
 	EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \
 		  unknown_exception, EXC_XFER_EE)
-#endif /* CONFIG_SPE */
+#endif /* CONFIG_SPE_POSSIBLE */
 
 	/* SPE Floating Point Data */
 #ifdef CONFIG_SPE
@@ -635,12 +636,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
 
 	/* SPE Floating Point Round */
 	EXCEPTION(0x2050, SPE_FP_ROUND, SPEFloatingPointRound, \
 		  SPEFloatingPointRoundException, EXC_XFER_EE)
-#else
+#elif defined(CONFIG_SPE_POSSIBLE)
 	EXCEPTION(0x2040,
[PATCH v2 2/2] powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers
Book3E specification defines shared interrupt numbers for SPE and AltiVec
units. Still, SPE is present in e200/e500v2 cores while AltiVec is present
in the e6500 core, so we can currently decide at compile-time which unit to
support exclusively. As Alexander Graf suggested, this will improve code
readability, especially in KVM.

Use distinct defines to identify SPE/AltiVec interrupt numbers, reverting
c58ce397 and 6b310fc5 patches that added common defines.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
Cc: Scott Wood scottw...@freescale.com
Cc: Alexander Graf ag...@suse.de
---
 arch/powerpc/kernel/exceptions-64e.S | 4 ++--
 arch/powerpc/kernel/head_fsl_booke.S | 8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
index bb9cac6..3e68d1c 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -635,7 +635,7 @@ interrupt_end_book3e:
 
 /* Altivec Unavailable Interrupt */
 	START_EXCEPTION(altivec_unavailable);
-	NORMAL_EXCEPTION_PROLOG(0x200, BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL,
+	NORMAL_EXCEPTION_PROLOG(0x200, BOOKE_INTERRUPT_ALTIVEC_UNAVAIL,
				PROLOG_ADDITION_NONE)
 	/* we can probably do a shorter exception entry for that one... */
 	EXCEPTION_COMMON(0x200)
@@ -658,7 +658,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
 /* AltiVec Assist */
 	START_EXCEPTION(altivec_assist);
 	NORMAL_EXCEPTION_PROLOG(0x220,
-				BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST,
+				BOOKE_INTERRUPT_ALTIVEC_ASSIST,
				PROLOG_ADDITION_NONE)
 	EXCEPTION_COMMON(0x220)
 	INTS_DISABLE
diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
index 90f487f..fffd1f9 100644
--- a/arch/powerpc/kernel/head_fsl_booke.S
+++ b/arch/powerpc/kernel/head_fsl_booke.S
@@ -617,27 +617,27 @@ END_FTR_SECTION_IFSET(CPU_FTR_EMB_HV)
 #ifdef CONFIG_SPE
 	/* SPE Unavailable */
 	START_EXCEPTION(SPEUnavailable)
-	NORMAL_EXCEPTION_PROLOG(SPE_ALTIVEC_UNAVAIL)
+	NORMAL_EXCEPTION_PROLOG(SPE_UNAVAIL)
 	beq	1f
 	bl	load_up_spe
 	b	fast_exception_return
 1:	addi	r3,r1,STACK_FRAME_OVERHEAD
 	EXC_XFER_EE_LITE(0x2010, KernelSPE)
 #elif defined(CONFIG_SPE_POSSIBLE)
-	EXCEPTION(0x2020, SPE_ALTIVEC_UNAVAIL, SPEUnavailable, \
+	EXCEPTION(0x2020, SPE_UNAVAIL, SPEUnavailable, \
		  unknown_exception, EXC_XFER_EE)
 #endif /* CONFIG_SPE_POSSIBLE */
 
 	/* SPE Floating Point Data */
 #ifdef CONFIG_SPE
-	EXCEPTION(0x2030, SPE_FP_DATA_ALTIVEC_ASSIST, SPEFloatingPointData,
+	EXCEPTION(0x2030, SPE_FP_DATA, SPEFloatingPointData,
		  SPEFloatingPointException, EXC_XFER_EE)
 
 	/* SPE Floating Point Round */
 	EXCEPTION(0x2050, SPE_FP_ROUND, SPEFloatingPointRound, \
		  SPEFloatingPointRoundException, EXC_XFER_EE)
 #elif defined(CONFIG_SPE_POSSIBLE)
-	EXCEPTION(0x2040, SPE_FP_DATA_ALTIVEC_ASSIST, SPEFloatingPointData,
+	EXCEPTION(0x2040, SPE_FP_DATA, SPEFloatingPointData,
		  unknown_exception, EXC_XFER_EE)
 	EXCEPTION(0x2050, SPE_FP_ROUND, SPEFloatingPointRound, \
		  unknown_exception, EXC_XFER_EE)
-- 
1.7.11.7
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 0/6] KVM: PPC: Book3e: AltiVec support
Add KVM Book3e AltiVec support.

Changes:
v4:
 - use CONFIG_SPE_POSSIBLE and a new ifdef for CONFIG_ALTIVEC
 - remove SPE handlers from bookehv
 - split ONE_REG powerpc generic and ONE_REG AltiVec
 - add setters for IVPR, IVOR2 and IVOR8
 - add api documentation for ONE_REG IVPR and IVORs
 - don't enable e6500 core since hardware threads are not yet supported
v3:
 - use distinct SPE/AltiVec exception handlers
 - make ONE_REG AltiVec support powerpc generic
 - add ONE_REG IVORs support
v2:
 - integrate Paul's FP/VMX/VSX changes that landed in kvm-ppc-queue in
   January and take into account feedback

Mihai Caraman (6):
  KVM: PPC: Book3E: Increase FPU laziness
  KVM: PPC: Book3e: Add AltiVec support
  KVM: PPC: Make ONE_REG powerpc generic
  KVM: PPC: Move ONE_REG AltiVec support to powerpc
  KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8 emulation
  KVM: PPC: Booke: Add ONE_REG support for IVPR and IVORs

 Documentation/virtual/kvm/api.txt     |   7 +
 arch/powerpc/include/uapi/asm/kvm.h   |  30 +++
 arch/powerpc/kvm/book3s.c             | 151 --
 arch/powerpc/kvm/booke.c              | 371 --
 arch/powerpc/kvm/booke.h              |  43 +---
 arch/powerpc/kvm/booke_emulate.c      |  15 +-
 arch/powerpc/kvm/bookehv_interrupts.S |   9 +-
 arch/powerpc/kvm/e500.c               |  42 +++-
 arch/powerpc/kvm/e500_emulate.c       |  20 ++
 arch/powerpc/kvm/e500mc.c             |  18 +-
 arch/powerpc/kvm/powerpc.c            |  97 +
 11 files changed, 576 insertions(+), 227 deletions(-)

-- 
1.7.11.7
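The ONE_REG ids this series adds follow KVM's standard encoding: an id combines the architecture, the register size, and an index. A sketch of how userspace would compose such an id is below; the constants mirror `uapi/linux/kvm.h`, but the helper name is illustrative and the vcpu-fd ioctl plumbing (`struct kvm_one_reg` + `KVM_GET_ONE_REG`/`KVM_SET_ONE_REG`) is assumed rather than shown:

```c
#include <assert.h>
#include <stdint.h>

/* Field values as defined in uapi/linux/kvm.h */
#define KVM_REG_PPC       0x3000000000000000ULL
#define KVM_REG_SIZE_U32  0x0020000000000000ULL
#define KVM_REG_SIZE_U64  0x0030000000000000ULL

/* Compose a 32-bit PPC ONE_REG id from its index byte. */
static uint64_t ppc_reg_u32(uint64_t index)
{
	return KVM_REG_PPC | KVM_REG_SIZE_U32 | index;
}
```

With a real vcpu fd, userspace would then fill `struct kvm_one_reg reg = { .id = ppc_reg_u32(0xbd), .addr = (uintptr_t)&val };` and issue `ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg)`; `0xbd` is the index the later 6/6 patch assigns to KVM_REG_PPC_IVOR0.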
[PATCH v4 6/6] KVM: PPC: Booke: Add ONE_REG support for IVPR and IVORs
Add ONE_REG support for IVPR and IVORs registers. Implement IVPR, IVORs 0-15 and 35 in booke common layer. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - add ONE_REG IVPR - use IVPR, IVOR2 and IVOR8 setters - add api documentation for ONE_REG IVPR and IVORs v3: - new patch Documentation/virtual/kvm/api.txt | 7 ++ arch/powerpc/include/uapi/asm/kvm.h | 25 +++ arch/powerpc/kvm/booke.c| 145 arch/powerpc/kvm/e500.c | 42 ++- arch/powerpc/kvm/e500mc.c | 16 5 files changed, 233 insertions(+), 2 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index beae3fd..cd7b171 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1917,6 +1917,13 @@ registers, find a list below: PPC | KVM_REG_PPC_TM_VSCR | 32 PPC | KVM_REG_PPC_TM_DSCR | 64 PPC | KVM_REG_PPC_TM_TAR| 64 + PPC | KVM_REG_PPC_IVPR | 64 + PPC | KVM_REG_PPC_IVOR0 | 32 + ... + PPC | KVM_REG_PPC_IVOR15| 32 + PPC | KVM_REG_PPC_IVOR32| 32 + ... + PPC | KVM_REG_PPC_IVOR37| 32 | | MIPS | KVM_REG_MIPS_R0 | 64 ... 
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index ab4d473..c97f119 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -564,6 +564,31 @@ struct kvm_get_htab_header { #define KVM_REG_PPC_SPRG9 (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xba) #define KVM_REG_PPC_DBSR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbb) +/* Booke IVPR IVOR registers */ +#define KVM_REG_PPC_IVPR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xbc) +#define KVM_REG_PPC_IVOR0 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbd) +#define KVM_REG_PPC_IVOR1 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbe) +#define KVM_REG_PPC_IVOR2 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xbf) +#define KVM_REG_PPC_IVOR3 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc0) +#define KVM_REG_PPC_IVOR4 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc1) +#define KVM_REG_PPC_IVOR5 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc2) +#define KVM_REG_PPC_IVOR6 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc3) +#define KVM_REG_PPC_IVOR7 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc4) +#define KVM_REG_PPC_IVOR8 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc5) +#define KVM_REG_PPC_IVOR9 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc6) +#define KVM_REG_PPC_IVOR10 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc7) +#define KVM_REG_PPC_IVOR11 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc8) +#define KVM_REG_PPC_IVOR12 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xc9) +#define KVM_REG_PPC_IVOR13 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xca) +#define KVM_REG_PPC_IVOR14 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcb) +#define KVM_REG_PPC_IVOR15 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcc) +#define KVM_REG_PPC_IVOR32 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcd) +#define KVM_REG_PPC_IVOR33 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xce) +#define KVM_REG_PPC_IVOR34 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xcf) +#define KVM_REG_PPC_IVOR35 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd0) +#define KVM_REG_PPC_IVOR36 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd1) +#define KVM_REG_PPC_IVOR37 (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xd2) + /* Transactional Memory 
checkpointed state: * This is all GPRs, all VSX regs and a subset of SPRs */ diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index d4df648..1cb2a2a 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1570,6 +1570,75 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, int r = 0; switch (id) { + case KVM_REG_PPC_IVPR: + *val = get_reg_val(id, vcpu->arch.ivpr); + break; + case KVM_REG_PPC_IVOR0: + *val = get_reg_val(id, + vcpu->arch.ivor[BOOKE_IRQPRIO_CRITICAL]); + break; + case KVM_REG_PPC_IVOR1: + *val = get_reg_val(id, + vcpu->arch.ivor[BOOKE_IRQPRIO_MACHINE_CHECK]); + break; + case KVM_REG_PPC_IVOR2: + *val = get_reg_val(id, + vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE]); + break; + case KVM_REG_PPC_IVOR3: + *val = get_reg_val(id, + vcpu->arch.ivor[BOOKE_IRQPRIO_INST_STORAGE]); + break; + case KVM_REG_PPC_IVOR4: + *val = get_reg_val(id, + vcpu->arch.ivor[BOOKE_IRQPRIO_EXTERNAL]); + break; + case KVM_REG_PPC_IVOR5: + *val = get_reg_val(id, + vcpu->arch.ivor[BOOKE_IRQPRIO_ALIGNMENT]); + break; + case
[PATCH v4 3/6] KVM: PPC: Make ONE_REG powerpc generic
Make ONE_REG generic for server and embedded architectures by moving kvm_vcpu_ioctl_get_one_reg() and kvm_vcpu_ioctl_set_one_reg() functions to powerpc layer. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - split ONE_REG powerpc generic and ONE_REG AltiVec v3: - make ONE_REG AltiVec support powerpc generic v2: - add comment describing VCSR register representation in KVM vs kernel arch/powerpc/kvm/book3s.c | 121 +++-- arch/powerpc/kvm/booke.c | 91 +- arch/powerpc/kvm/powerpc.c | 55 + 3 files changed, 138 insertions(+), 129 deletions(-) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index dd03f6b..26868e2 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -535,33 +535,28 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) return -ENOTSUPP; } -int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) +int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, + union kvmppc_one_reg *val) { - int r; - union kvmppc_one_reg val; - int size; + int r = 0; long int i; - size = one_reg_size(reg->id); - if (size > sizeof(val)) - return -EINVAL; - - r = vcpu->kvm->arch.kvm_ops->get_one_reg(vcpu, reg->id, &val); + r = vcpu->kvm->arch.kvm_ops->get_one_reg(vcpu, id, val); if (r == -EINVAL) { r = 0; - switch (reg->id) { + switch (id) { case KVM_REG_PPC_DAR: - val = get_reg_val(reg->id, kvmppc_get_dar(vcpu)); + *val = get_reg_val(id, kvmppc_get_dar(vcpu)); break; case KVM_REG_PPC_DSISR: - val = get_reg_val(reg->id, kvmppc_get_dsisr(vcpu)); + *val = get_reg_val(id, kvmppc_get_dsisr(vcpu)); break; case KVM_REG_PPC_FPR0 ... KVM_REG_PPC_FPR31: - i = reg->id - KVM_REG_PPC_FPR0; - val = get_reg_val(reg->id, VCPU_FPR(vcpu, i)); + i = id - KVM_REG_PPC_FPR0; + *val = get_reg_val(id, VCPU_FPR(vcpu, i)); break; case KVM_REG_PPC_FPSCR: - val = get_reg_val(reg->id, vcpu->arch.fp.fpscr); + *val = get_reg_val(id, vcpu->arch.fp.fpscr); break; #ifdef CONFIG_ALTIVEC case KVM_REG_PPC_VR0 ...
KVM_REG_PPC_VR31: @@ -569,110 +564,94 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) r = -ENXIO; break; } - val.vval = vcpu->arch.vr.vr[reg->id - KVM_REG_PPC_VR0]; + val->vval = vcpu->arch.vr.vr[id - KVM_REG_PPC_VR0]; break; case KVM_REG_PPC_VSCR: if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { r = -ENXIO; break; } - val = get_reg_val(reg->id, vcpu->arch.vr.vscr.u[3]); + *val = get_reg_val(id, vcpu->arch.vr.vscr.u[3]); break; case KVM_REG_PPC_VRSAVE: - val = get_reg_val(reg->id, vcpu->arch.vrsave); + *val = get_reg_val(id, vcpu->arch.vrsave); break; #endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31: if (cpu_has_feature(CPU_FTR_VSX)) { - long int i = reg->id - KVM_REG_PPC_VSR0; - val.vsxval[0] = vcpu->arch.fp.fpr[i][0]; - val.vsxval[1] = vcpu->arch.fp.fpr[i][1]; + i = id - KVM_REG_PPC_VSR0; + val->vsxval[0] = vcpu->arch.fp.fpr[i][0]; + val->vsxval[1] = vcpu->arch.fp.fpr[i][1]; } else { r = -ENXIO; } break; #endif /* CONFIG_VSX */ - case KVM_REG_PPC_DEBUG_INST: { - u32 opcode = INS_TW; - r = copy_to_user((u32 __user *)(long)reg->addr, &opcode, sizeof(u32)); + case KVM_REG_PPC_DEBUG_INST: + *val = get_reg_val(id, INS_TW); break; - } #ifdef CONFIG_KVM_XICS case KVM_REG_PPC_ICP_STATE: if (!vcpu->arch.icp) { r = -ENXIO; break; }
[PATCH v4 4/6] KVM: PPC: Move ONE_REG AltiVec support to powerpc
Move ONE_REG AltiVec support to powerpc generic layer. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - split ONE_REG powerpc generic and ONE_REG AltiVec v3: - make ONE_REG AltiVec support powerpc generic v2: - add comment describing VCSR register representation in KVM vs kernel arch/powerpc/include/uapi/asm/kvm.h | 5 + arch/powerpc/kvm/book3s.c | 42 - arch/powerpc/kvm/powerpc.c | 42 + 3 files changed, 47 insertions(+), 42 deletions(-) diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h index 3ca357a..ab4d473 100644 --- a/arch/powerpc/include/uapi/asm/kvm.h +++ b/arch/powerpc/include/uapi/asm/kvm.h @@ -476,6 +476,11 @@ struct kvm_get_htab_header { /* FP and vector status/control registers */ #define KVM_REG_PPC_FPSCR (KVM_REG_PPC | KVM_REG_SIZE_U64 | 0x80) +/* + * VSCR register is documented as a 32-bit register in the ISA, but it can + * only be accessed via a vector register. Expose VSCR as a 32-bit register + * even though the kernel represents it as a 128-bit vector. + */ #define KVM_REG_PPC_VSCR (KVM_REG_PPC | KVM_REG_SIZE_U32 | 0x81) /* Virtual processor areas */ diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 26868e2..1b5adda 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -558,25 +558,6 @@ int kvmppc_get_one_reg(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_FPSCR: *val = get_reg_val(id, vcpu->arch.fp.fpscr); break; -#ifdef CONFIG_ALTIVEC - case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - val->vval = vcpu->arch.vr.vr[id - KVM_REG_PPC_VR0]; - break; - case KVM_REG_PPC_VSCR: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - *val = get_reg_val(id, vcpu->arch.vr.vscr.u[3]); - break; - case KVM_REG_PPC_VRSAVE: - *val = get_reg_val(id, vcpu->arch.vrsave); - break; -#endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX case KVM_REG_PPC_VSR0 ...
KVM_REG_PPC_VSR31: if (cpu_has_feature(CPU_FTR_VSX)) { @@ -653,29 +634,6 @@ int kvmppc_set_one_reg(struct kvm_vcpu *vcpu, u64 id, case KVM_REG_PPC_FPSCR: vcpu->arch.fp.fpscr = set_reg_val(id, *val); break; -#ifdef CONFIG_ALTIVEC - case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - vcpu->arch.vr.vr[id - KVM_REG_PPC_VR0] = val->vval; - break; - case KVM_REG_PPC_VSCR: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - vcpu->arch.vr.vscr.u[3] = set_reg_val(id, *val); - break; - case KVM_REG_PPC_VRSAVE: - if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { - r = -ENXIO; - break; - } - vcpu->arch.vrsave = set_reg_val(id, *val); - break; -#endif /* CONFIG_ALTIVEC */ #ifdef CONFIG_VSX case KVM_REG_PPC_VSR0 ... KVM_REG_PPC_VSR31: if (cpu_has_feature(CPU_FTR_VSX)) { diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 1326116..19d4755 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -941,6 +941,25 @@ int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg) if (r == -EINVAL) { r = 0; switch (reg->id) { +#ifdef CONFIG_ALTIVEC + case KVM_REG_PPC_VR0 ... KVM_REG_PPC_VR31: + if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { + r = -ENXIO; + break; + } + val.vval = vcpu->arch.vr.vr[reg->id - KVM_REG_PPC_VR0]; + break; + case KVM_REG_PPC_VSCR: + if (!cpu_has_feature(CPU_FTR_ALTIVEC)) { + r = -ENXIO; + break; + } + val = get_reg_val(reg->id,
[PATCH v4 2/6] KVM: PPC: Book3e: Add AltiVec support
Add AltiVec support in KVM for Book3e. FPU support gracefully reuses the host infrastructure, so follow the same approach for AltiVec. The Book3e specification defines shared interrupt numbers for the SPE and AltiVec units. Still, SPE is present in e200/e500v2 cores while AltiVec is present in the e6500 core. So we can currently decide at compile time which of the SPE or AltiVec units to support exclusively, by using the CONFIG_SPE_POSSIBLE and CONFIG_PPC_E500MC defines. As Alexander Graf suggested, keep the SPE and AltiVec exception handlers distinct to improve code readability. Guests have the privilege to enable AltiVec, so we always need to support AltiVec in KVM (and implicitly in the host) to reflect interrupts and to save/restore the unit context. KVM will be loaded on cores with an AltiVec unit only if CONFIG_ALTIVEC is defined. Use this define to guard the KVM AltiVec logic. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - use CONFIG_SPE_POSSIBLE and a new ifdef for CONFIG_ALTIVEC - remove SPE handlers from bookehv - update commit message v3: - use distinct SPE/AltiVec exception handlers v2: - integrate Paul's FP/VMX/VSX changes arch/powerpc/kvm/booke.c | 74 ++- arch/powerpc/kvm/booke.h | 6 +++ arch/powerpc/kvm/bookehv_interrupts.S | 9 + arch/powerpc/kvm/e500_emulate.c | 20 ++ 4 files changed, 101 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 91e7217..8ace612 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -168,6 +168,40 @@ static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu) #endif } +/* + * Simulate AltiVec unavailable fault to load guest state + * from thread to AltiVec unit. + * It requires to be called with preemption disabled.
+ */ +static inline void kvmppc_load_guest_altivec(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ALTIVEC + if (cpu_has_feature(CPU_FTR_ALTIVEC)) { + if (!(current->thread.regs->msr & MSR_VEC)) { + enable_kernel_altivec(); + load_vr_state(&vcpu->arch.vr); + current->thread.vr_save_area = &vcpu->arch.vr; + current->thread.regs->msr |= MSR_VEC; + } + } +#endif +} + +/* + * Save guest vcpu AltiVec state into thread. + * It requires to be called with preemption disabled. + */ +static inline void kvmppc_save_guest_altivec(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_ALTIVEC + if (cpu_has_feature(CPU_FTR_ALTIVEC)) { + if (current->thread.regs->msr & MSR_VEC) + giveup_altivec(current); + current->thread.vr_save_area = NULL; + } +#endif +} + static void kvmppc_vcpu_sync_debug(struct kvm_vcpu *vcpu) { /* Synchronize guest's desire to get debug interrupts into shadow MSR */ @@ -375,9 +409,15 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, case BOOKE_IRQPRIO_ITLB_MISS: case BOOKE_IRQPRIO_SYSCALL: case BOOKE_IRQPRIO_FP_UNAVAIL: +#ifdef CONFIG_SPE_POSSIBLE case BOOKE_IRQPRIO_SPE_UNAVAIL: case BOOKE_IRQPRIO_SPE_FP_DATA: case BOOKE_IRQPRIO_SPE_FP_ROUND: +#endif +#ifdef CONFIG_ALTIVEC + case BOOKE_IRQPRIO_ALTIVEC_UNAVAIL: + case BOOKE_IRQPRIO_ALTIVEC_ASSIST: +#endif case BOOKE_IRQPRIO_AP_UNAVAIL: allowed = 1; msr_mask = MSR_CE | MSR_ME | MSR_DE; @@ -697,6 +737,17 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) kvmppc_load_guest_fp(vcpu); #endif +#ifdef CONFIG_ALTIVEC + /* Save userspace AltiVec state in stack */ + if (cpu_has_feature(CPU_FTR_ALTIVEC)) + enable_kernel_altivec(); + /* +* Since we can't trap on MSR_VEC in GS-mode, we consider the guest +* as always using the AltiVec.
+*/ + kvmppc_load_guest_altivec(vcpu); +#endif + /* Switch to guest debug context */ debug = vcpu->arch.dbg_reg; switch_booke_debug_regs(&debug); @@ -719,6 +770,10 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) kvmppc_save_guest_fp(vcpu); #endif +#ifdef CONFIG_ALTIVEC + kvmppc_save_guest_altivec(vcpu); +#endif + out: vcpu->mode = OUTSIDE_GUEST_MODE; return ret; @@ -1025,7 +1080,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_ROUND); r = RESUME_GUEST; break; -#else +#elif defined(CONFIG_SPE_POSSIBLE) case BOOKE_INTERRUPT_SPE_UNAVAIL: /* * Guest wants SPE, but host kernel doesn't support it. Send @@ -1046,6 +1101,22 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, run->hw.hardware_exit_reason = exit_nr; r = RESUME_HOST; break; +#endif /* CONFIG_SPE_POSSIBLE
[PATCH v4 1/6] KVM: PPC: Book3E: Increase FPU laziness
Increase FPU laziness by loading the guest state into the unit before entering the guest instead of doing it on each vcpu schedule. Without this improvement an interrupt may claim floating point corrupting guest state. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - update commit message v3: - no changes v2: - remove fpu_active - add descriptive comments arch/powerpc/kvm/booke.c | 43 --- arch/powerpc/kvm/booke.h | 34 -- arch/powerpc/kvm/e500mc.c | 2 -- 3 files changed, 36 insertions(+), 43 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 074b7fc..91e7217 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -124,6 +124,40 @@ static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu) } #endif +/* + * Load up guest vcpu FP state if it's needed. + * It also set the MSR_FP in thread so that host know + * we're holding FPU, and then host can help to save + * guest vcpu FP state if other threads require to use FPU. + * This simulates an FP unavailable fault. + * + * It requires to be called with preemption disabled. + */ +static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_PPC_FPU + if (!(current->thread.regs->msr & MSR_FP)) { + enable_kernel_fp(); + load_fp_state(&vcpu->arch.fp); + current->thread.fp_save_area = &vcpu->arch.fp; + current->thread.regs->msr |= MSR_FP; + } +#endif +} + +/* + * Save guest vcpu FP state into thread. + * It requires to be called with preemption disabled. + */ +static inline void kvmppc_save_guest_fp(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_PPC_FPU + if (current->thread.regs->msr & MSR_FP) + giveup_fpu(current); + current->thread.fp_save_area = NULL; +#endif +} + static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu) { #if defined(CONFIG_PPC_FPU) && !defined(CONFIG_KVM_BOOKE_HV) @@ -658,12 +692,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) /* * Since we can't trap on MSR_FP in GS-mode, we consider the guest -* as always using the FPU.
Kernel usage of FP (via -* enable_kernel_fp()) in this thread must not occur while -* vcpu->fpu_active is set. +* as always using the FPU. */ - vcpu->fpu_active = 1; - kvmppc_load_guest_fp(vcpu); #endif @@ -687,8 +717,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) #ifdef CONFIG_PPC_FPU kvmppc_save_guest_fp(vcpu); - - vcpu->fpu_active = 0; #endif out: @@ -1194,6 +1222,7 @@ out: else { /* interrupts now hard-disabled */ kvmppc_fix_ee_before_entry(); + kvmppc_load_guest_fp(vcpu); } } diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h index f753543..e73d513 100644 --- a/arch/powerpc/kvm/booke.h +++ b/arch/powerpc/kvm/booke.h @@ -116,40 +116,6 @@ extern int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, int sprn, extern int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val); -/* - * Load up guest vcpu FP state if it's needed. - * It also set the MSR_FP in thread so that host know - * we're holding FPU, and then host can help to save - * guest vcpu FP state if other threads require to use FPU. - * This simulates an FP unavailable fault. - * - * It requires to be called with preemption disabled. - */ -static inline void kvmppc_load_guest_fp(struct kvm_vcpu *vcpu) -{ -#ifdef CONFIG_PPC_FPU - if (vcpu->fpu_active && !(current->thread.regs->msr & MSR_FP)) { - enable_kernel_fp(); - load_fp_state(&vcpu->arch.fp); - current->thread.fp_save_area = &vcpu->arch.fp; - current->thread.regs->msr |= MSR_FP; - } -#endif -} - -/* - * Save guest vcpu FP state into thread. - * It requires to be called with preemption disabled.
- */ -static inline void kvmppc_save_guest_fp(struct kvm_vcpu *vcpu) -{ -#ifdef CONFIG_PPC_FPU - if (vcpu->fpu_active && (current->thread.regs->msr & MSR_FP)) - giveup_fpu(current); - current->thread.fp_save_area = NULL; -#endif -} - static inline void kvmppc_clear_dbsr(void) { mtspr(SPRN_DBSR, mfspr(SPRN_DBSR)); diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c index 000cf82..4549349 100644 --- a/arch/powerpc/kvm/e500mc.c +++ b/arch/powerpc/kvm/e500mc.c @@ -145,8 +145,6 @@ static void kvmppc_core_vcpu_load_e500mc(struct kvm_vcpu *vcpu, int cpu) kvmppc_e500_tlbil_all(vcpu_e500); __get_cpu_var(last_vcpu_of_lpid)[vcpu->kvm->arch.lpid] = vcpu; } - - kvmppc_load_guest_fp(vcpu); } static void kvmppc_core_vcpu_put_e500mc(struct kvm_vcpu *vcpu) -- 1.7.11.7
[PATCH v4 5/6] KVM: PPC: Booke: Add setter functions for IVPR, IVOR2 and IVOR8 emulation
Add setter functions for IVPR, IVOR2 and IVOR8 emulation in preparation for ONE_REG support. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v4: - new patch - add api documentation for ONE_REG IVPR and IVORs arch/powerpc/kvm/booke.c | 24 arch/powerpc/kvm/booke.h | 3 +++ arch/powerpc/kvm/booke_emulate.c | 15 +++ 3 files changed, 30 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index 831c1b4..d4df648 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1782,6 +1782,30 @@ void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits) update_timer_ints(vcpu); } +void kvmppc_set_ivpr(struct kvm_vcpu *vcpu, ulong new_ivpr) +{ + vcpu->arch.ivpr = new_ivpr; +#ifdef CONFIG_KVM_BOOKE_HV + mtspr(SPRN_GIVPR, new_ivpr); +#endif +} + +void kvmppc_set_ivor2(struct kvm_vcpu *vcpu, u32 new_ivor) +{ + vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE] = new_ivor; +#ifdef CONFIG_KVM_BOOKE_HV + mtspr(SPRN_GIVOR2, new_ivor); +#endif +} + +void kvmppc_set_ivor8(struct kvm_vcpu *vcpu, u32 new_ivor) +{ + vcpu->arch.ivor[BOOKE_IRQPRIO_SYSCALL] = new_ivor; +#ifdef CONFIG_KVM_BOOKE_HV + mtspr(SPRN_GIVOR8, new_ivor); +#endif +} + void kvmppc_decrementer_func(unsigned long data) { struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data; diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h index 22ba08e..0242530 100644 --- a/arch/powerpc/kvm/booke.h +++ b/arch/powerpc/kvm/booke.h @@ -80,6 +80,9 @@ void kvmppc_set_epcr(struct kvm_vcpu *vcpu, u32 new_epcr); void kvmppc_set_tcr(struct kvm_vcpu *vcpu, u32 new_tcr); void kvmppc_set_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits); void kvmppc_clr_tsr_bits(struct kvm_vcpu *vcpu, u32 tsr_bits); +void kvmppc_set_ivpr(struct kvm_vcpu *vcpu, ulong new_ivpr); +void kvmppc_set_ivor2(struct kvm_vcpu *vcpu, u32 new_ivor); +void kvmppc_set_ivor8(struct kvm_vcpu *vcpu, u32 new_ivor); int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned int inst, int *advance);
diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index 92bc668..94c64e3 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -191,10 +191,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) break; case SPRN_IVPR: - vcpu->arch.ivpr = spr_val; -#ifdef CONFIG_KVM_BOOKE_HV - mtspr(SPRN_GIVPR, spr_val); -#endif + kvmppc_set_ivpr(vcpu, spr_val); break; case SPRN_IVOR0: vcpu->arch.ivor[BOOKE_IRQPRIO_CRITICAL] = spr_val; @@ -203,10 +200,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) vcpu->arch.ivor[BOOKE_IRQPRIO_MACHINE_CHECK] = spr_val; break; case SPRN_IVOR2: - vcpu->arch.ivor[BOOKE_IRQPRIO_DATA_STORAGE] = spr_val; -#ifdef CONFIG_KVM_BOOKE_HV - mtspr(SPRN_GIVOR2, spr_val); -#endif + kvmppc_set_ivor2(vcpu, spr_val); break; case SPRN_IVOR3: vcpu->arch.ivor[BOOKE_IRQPRIO_INST_STORAGE] = spr_val; @@ -224,10 +218,7 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, ulong spr_val) vcpu->arch.ivor[BOOKE_IRQPRIO_FP_UNAVAIL] = spr_val; break; case SPRN_IVOR8: - vcpu->arch.ivor[BOOKE_IRQPRIO_SYSCALL] = spr_val; -#ifdef CONFIG_KVM_BOOKE_HV - mtspr(SPRN_GIVOR8, spr_val); -#endif + kvmppc_set_ivor8(vcpu, spr_val); break; case SPRN_IVOR9: vcpu->arch.ivor[BOOKE_IRQPRIO_AP_UNAVAIL] = spr_val; -- 1.7.11.7
Re: [PATCH v2 1/2] powerpc/booke: Restrict SPE exception handlers to e200/e500 cores
On Wed, 2014-08-20 at 16:09 +0300, Mihai Caraman wrote: SPE exception handlers are now defined for 32-bit e500mc cores even though the SPE unit is not present and CONFIG_SPE is undefined. Restrict SPE exception handlers to e200/e500 cores by adding CONFIG_SPE_POSSIBLE, and consequently guard the __setup_ivors and __setup_cpu functions. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Cc: Scott Wood scottw...@freescale.com Cc: Alexander Graf ag...@suse.de --- v2: - use CONFIG_PPC_E500MC without CONFIG_E500 - use elif defined() arch/powerpc/kernel/cpu_setup_fsl_booke.S | 12 +++- arch/powerpc/kernel/cputable.c | 5 + arch/powerpc/kernel/head_fsl_booke.S | 18 +- arch/powerpc/platforms/Kconfig.cputype | 6 +- 4 files changed, 34 insertions(+), 7 deletions(-) Acked-by: Scott Wood scottw...@freescale.com -Scott
Re: [PATCH v2 2/2] powerpc/booke: Revert SPE/AltiVec common defines for interrupt numbers
On Wed, 2014-08-20 at 16:09 +0300, Mihai Caraman wrote: Book3E specification defines shared interrupt numbers for SPE and AltiVec units. Still SPE is present in e200/e500v2 cores while AltiVec is present in e6500 core. So we can currently decide at compile-time which unit to support exclusively. As Alexander Graf suggested, this will improve code readability especially in KVM. Use distinct defines to identify SPE/AltiVec interrupt numbers, reverting c58ce397 and 6b310fc5 patches that added common defines. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com Cc: Scott Wood scottw...@freescale.com Cc: Alexander Graf ag...@suse.de --- arch/powerpc/kernel/exceptions-64e.S | 4 ++-- arch/powerpc/kernel/head_fsl_booke.S | 8 2 files changed, 6 insertions(+), 6 deletions(-) Acked-by: Scott Wood scottw...@freescale.com -Scott