Re: qemu-kvm crash with
On Thu, Mar 24, 2011 at 1:38 PM, Conor Murphy conor_murphy_v...@hotmail.com wrote: #4 _int_free (av=<value optimized out>, p=0x7fa24c0009f0, have_lock=0) at malloc.c:4795 #5 0x004a18fe in qemu_vfree (ptr=0x7fa24c000a00) at oslib-posix.c:76 #6 0x0045af3d in handle_aiocb_rw (aiocb=0x7fa2dc034cd0) at posix-aio-compat.c:301 I don't see a way for a double-free to occur, so I think something has overwritten the memory preceding the allocated buffer. In gdb you could inspect the aiocb structure to look at its aio_iov[], aio_niov, and aio_nbytes fields. They might be invalid or corrupted somehow. You could also dump out the memory before 0x7fa24c000a00, specifically 0x7fa24c0009f0, to see if you notice any pattern or printable characters that give a clue as to what has corrupted the memory here. Are you running qemu-kvm.git/master? Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] fix regression caused by e48672fa25e879f7ae21785c7efd187738139593
On 03/09/2011 05:36 PM, Nikola Ciprich wrote: commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time_update(vcpu), breaking 32bit SMP guests using kvm-clock. Fix this by moving (new) clock update function to proper place. Signed-off-by: Nikola Ciprichnikola.cipr...@linuxbox.cz --- diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4c27144..ba3f76f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2101,8 +2101,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (check_tsc_unstable()) { kvm_x86_ops-adjust_tsc_offset(vcpu, -tsc_delta); vcpu-arch.tsc_catchup = 1; - kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); } + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); if (vcpu-cpu != cpu) kvm_migrate_timers(vcpu); vcpu-cpu = cpu; So something bothers me still about this bug. What you did correctly restores the old behavior - but it shouldn't be fixing a bug. The only reason you need to schedule an update for the KVM clock area is if a new VCPU has been created, you have an unstable TSC.. or something changes the VM's kvmclock offset. So this change could in fact be hiding an underlying bug - either an unstable TSC is not being properly reported, the KVM clock offset is being changed, we are missing a KVM clock update for secondary VCPUs - or something else we don't yet understand is going on. Nikola, can you try the patch below, which reverts your change and attempts to fix other possible sources of the problem, and see if it still reproduces? 
Thanks, Zach diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 58f517b..42618fb 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2127,8 +2127,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) if (check_tsc_unstable()) { kvm_x86_ops-adjust_tsc_offset(vcpu, -tsc_delta); vcpu-arch.tsc_catchup = 1; + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); } - kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + if (vcpu-cpu == -1) + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); if (vcpu-cpu != cpu) kvm_migrate_timers(vcpu); vcpu-cpu = cpu; @@ -3534,6 +3536,8 @@ long kvm_arch_vm_ioctl(struct file *filp, struct kvm_clock_data user_ns; u64 now_ns; s64 delta; + struct kvm_vcpu *vcpu; + int i; r = -EFAULT; if (copy_from_user(user_ns, argp, sizeof(user_ns))) @@ -3549,6 +3553,8 @@ long kvm_arch_vm_ioctl(struct file *filp, delta = user_ns.clock - now_ns; local_irq_enable(); kvm-arch.kvmclock_offset = delta; + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); break; } case KVM_GET_CLOCK: {
[PATCH 1/6] KVM: SVM: Implement infrastructure for TSC_RATE_MSR
This patch enhances the kvm_amd module with functions to support the TSC_RATE_MSR which can be used to set a given tsc frequency for the guest vcpu. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/msr-index.h |1 + arch/x86/kvm/svm.c | 54 +- 2 files changed, 54 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index fd5a1f3..a7b3e40 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -114,6 +114,7 @@ complete list. */ #define MSR_AMD64_PATCH_LEVEL 0x008b +#define MSR_AMD64_TSC_RATIO0xc104 #define MSR_AMD64_NB_CFG 0xc001001f #define MSR_AMD64_PATCH_LOADER 0xc0010020 #define MSR_AMD64_OSVW_ID_LENGTH 0xc0010140 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2a19322..2ce734c 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -63,6 +63,8 @@ MODULE_LICENSE(GPL); #define DEBUGCTL_RESERVED_BITS (~(0x3fULL)) +#define TSC_RATIO_RSVD 0xff00ULL + static bool erratum_383_found __read_mostly; static const u32 host_save_user_msrs[] = { @@ -144,8 +146,13 @@ struct vcpu_svm { unsigned int3_injected; unsigned long int3_rip; u32 apf_reason; + + u64 tsc_ratio; }; +static DEFINE_PER_CPU(u64, current_tsc_ratio); +#define TSC_RATIO_DEFAULT 0x01ULL + #define MSR_INVALID0xU static struct svm_direct_access_msrs { @@ -569,6 +576,10 @@ static int has_svm(void) static void svm_hardware_disable(void *garbage) { + /* Make sure we clean up behind us */ + if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) + wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT); + cpu_svm_disable(); } @@ -610,6 +621,11 @@ static int svm_hardware_enable(void *garbage) wrmsrl(MSR_VM_HSAVE_PA, page_to_pfn(sd-save_area) PAGE_SHIFT); + if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) { + wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT); + __get_cpu_var(current_tsc_ratio) = TSC_RATIO_DEFAULT; + } + svm_init_erratum_383(); return 0; @@ -854,6 +870,32 @@ static void init_sys_seg(struct vmcb_seg *seg, 
uint32_t type) seg-base = 0; } +static u64 __scale_tsc(u64 ratio, u64 tsc) +{ + u64 mult, frac, _tsc; + + mult = ratio 32; + frac = ratio ((1ULL 32) - 1); + + _tsc = tsc; + _tsc *= mult; + _tsc += (tsc 32) * frac; + _tsc += ((tsc ((1ULL 32) - 1)) * frac) 32; + + return _tsc; +} + +static u64 svm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc) +{ + struct vcpu_svm *svm = to_svm(vcpu); + u64 _tsc = tsc; + + if (svm-tsc_ratio != TSC_RATIO_DEFAULT) + _tsc = __scale_tsc(svm-tsc_ratio, tsc); + + return _tsc; +} + static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) { struct vcpu_svm *svm = to_svm(vcpu); @@ -1048,6 +1090,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id) goto out; } + svm-tsc_ratio = TSC_RATIO_DEFAULT; + err = kvm_vcpu_init(svm-vcpu, kvm, id); if (err) goto free_svm; @@ -1141,6 +1185,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) for (i = 0; i NR_HOST_SAVE_USER_MSRS; i++) rdmsrl(host_save_user_msrs[i], svm-host_user_msrs[i]); + + if (static_cpu_has(X86_FEATURE_TSCRATEMSR) + svm-tsc_ratio != __get_cpu_var(current_tsc_ratio)) { + __get_cpu_var(current_tsc_ratio) = svm-tsc_ratio; + wrmsrl(MSR_AMD64_TSC_RATIO, svm-tsc_ratio); + } } static void svm_vcpu_put(struct kvm_vcpu *vcpu) @@ -2813,7 +2863,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data) case MSR_IA32_TSC: { struct vmcb *vmcb = get_host_vmcb(svm); - *data = vmcb-control.tsc_offset + native_read_tsc(); + *data = vmcb-control.tsc_offset + + svm_scale_tsc(vcpu, native_read_tsc()); + break; } case MSR_STAR: -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/6] KVM: X86: Delegate tsc-offset calculation to architecture code
With TSC scaling in SVM the tsc-offset needs to be calculated differently. This patch propagates this calculation into the architecture specific modules so that this complexity can be handled there. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_host.h |2 ++ arch/x86/kvm/svm.c | 10 ++ arch/x86/kvm/vmx.c |6 ++ arch/x86/kvm/x86.c | 10 +- 4 files changed, 23 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 9958dd8..7f48528 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -591,6 +591,8 @@ struct kvm_x86_ops { void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz); void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset); + u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc); + void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2); const struct trace_print_flags *exit_reasons_str; }; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index f6d66c2..38a4bcc 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -954,6 +954,15 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment) mark_dirty(svm-vmcb, VMCB_INTERCEPTS); } +static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc) +{ + u64 tsc; + + tsc = svm_scale_tsc(vcpu, native_read_tsc()); + + return target_tsc - tsc; +} + static void init_vmcb(struct vcpu_svm *svm) { struct vmcb_control_area *control = svm-vmcb-control; @@ -4039,6 +4048,7 @@ static struct kvm_x86_ops svm_x86_ops = { .set_tsc_khz = svm_set_tsc_khz, .write_tsc_offset = svm_write_tsc_offset, .adjust_tsc_offset = svm_adjust_tsc_offset, + .compute_tsc_offset = svm_compute_tsc_offset, .set_tdp_cr3 = set_tdp_cr3, }; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0e5dfc6..c4f077a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1184,6 +1184,11 @@ static void vmx_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment) 
vmcs_write64(TSC_OFFSET, offset + adjustment); } +static u64 vmx_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc) +{ + return target_tsc - native_read_tsc(); +} + /* * Reads an msr value (of 'msr_index') into 'pdata'. * Returns 0 on success, non-0 otherwise. @@ -4509,6 +4514,7 @@ static struct kvm_x86_ops vmx_x86_ops = { .set_tsc_khz = vmx_set_tsc_khz, .write_tsc_offset = vmx_write_tsc_offset, .adjust_tsc_offset = vmx_adjust_tsc_offset, + .compute_tsc_offset = vmx_compute_tsc_offset, .set_tdp_cr3 = vmx_set_cr3, }; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 47dd6ed..2f0b552 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -990,7 +990,7 @@ static u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu) return __this_cpu_read(cpu_tsc_khz); } -static inline u64 nsec_to_cycles(u64 nsec) +static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec) { u64 ret; @@ -998,7 +998,7 @@ static inline u64 nsec_to_cycles(u64 nsec) if (kvm_tsc_changes_freq()) printk_once(KERN_WARNING kvm: unreliable cycle conversion on adjustable rate TSC\n); - ret = nsec * __this_cpu_read(cpu_tsc_khz); + ret = nsec * vcpu_tsc_khz(vcpu); do_div(ret, USEC_PER_SEC); return ret; } @@ -1028,7 +1028,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data) s64 sdiff; raw_spin_lock_irqsave(kvm-arch.tsc_write_lock, flags); - offset = data - native_read_tsc(); + offset = kvm_x86_ops-compute_tsc_offset(vcpu, data); ns = get_kernel_ns(); elapsed = ns - kvm-arch.last_tsc_nsec; sdiff = data - kvm-arch.last_tsc_write; @@ -1044,13 +1044,13 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data) * In that case, for a reliable TSC, we can match TSC offsets, * or make a best guest using elapsed value. 
*/ - if (sdiff nsec_to_cycles(5ULL * NSEC_PER_SEC) + if (sdiff nsec_to_cycles(vcpu, 5ULL * NSEC_PER_SEC) elapsed 5ULL * NSEC_PER_SEC) { if (!check_tsc_unstable()) { offset = kvm-arch.last_tsc_offset; pr_debug(kvm: matched tsc offset for %llu\n, data); } else { - u64 delta = nsec_to_cycles(elapsed); + u64 delta = nsec_to_cycles(vcpu, elapsed); offset += delta; pr_debug(kvm: adjusted tsc offset by %llu\n, delta); } -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] KVM: X86: Implement call-back to propagate virtual_tsc_khz
This patch implements a call-back into the architecture code to allow the propagation of changes to the virtual tsc_khz of the vcpu. On SVM it updates the tsc_ratio variable, on VMX it does nothing. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/svm.c | 33 + arch/x86/kvm/vmx.c | 11 +++ 3 files changed, 45 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 0344b94..9958dd8 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -588,6 +588,7 @@ struct kvm_x86_ops { bool (*has_wbinvd_exit)(void); + void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz); void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset); void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2); diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2ce734c..f6d66c2 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -896,6 +896,38 @@ static u64 svm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc) return _tsc; } +static void svm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz) +{ + struct vcpu_svm *svm = to_svm(vcpu); + u64 ratio; + u64 khz; + + /* TSC scaling supported? */ + if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR)) + return; + + /* TSC-Scaling disabled or guest TSC same frequency as host TSC? 
*/ + if (user_tsc_khz == 0) { + vcpu-arch.virtual_tsc_khz = 0; + svm-tsc_ratio = TSC_RATIO_DEFAULT; + return; + } + + khz = user_tsc_khz; + + /* TSC scaling required - calculate ratio */ + ratio = khz 32; + do_div(ratio, tsc_khz); + + if (ratio == 0 || ratio TSC_RATIO_RSVD) { + WARN_ONCE(1, Invalid TSC ratio - virtual-tsc-khz=%u\n, + user_tsc_khz); + return; + } + vcpu-arch.virtual_tsc_khz = user_tsc_khz; + svm-tsc_ratio = ratio; +} + static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) { struct vcpu_svm *svm = to_svm(vcpu); @@ -4004,6 +4036,7 @@ static struct kvm_x86_ops svm_x86_ops = { .has_wbinvd_exit = svm_has_wbinvd_exit, + .set_tsc_khz = svm_set_tsc_khz, .write_tsc_offset = svm_write_tsc_offset, .adjust_tsc_offset = svm_adjust_tsc_offset, diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1bdb49d..0e5dfc6 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -1161,6 +1161,16 @@ static u64 guest_read_tsc(void) } /* + * Empty call-back. Needs to be implemented when VMX enables the SET_TSC_KHZ + * ioctl. In this case the call-back should update internal vmx state to make + * the changes effective. + */ +static void vmx_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz) +{ + /* Nothing to do here */ +} + +/* * writes 'offset' into guest's timestamp counter offset register */ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) @@ -4496,6 +4506,7 @@ static struct kvm_x86_ops vmx_x86_ops = { .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, + .set_tsc_khz = vmx_set_tsc_khz, .write_tsc_offset = vmx_write_tsc_offset, .adjust_tsc_offset = vmx_adjust_tsc_offset, -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/6] KVM: X86: Let kvm-clock report the right tsc frequency
This patch changes the kvm_guest_time_update function to use TSC frequency the guest actually has for updating its clock. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_host.h |6 +++--- arch/x86/kvm/x86.c | 25 +++-- 2 files changed, 18 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 35f81b1..0344b94 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -380,7 +380,10 @@ struct kvm_vcpu_arch { u64 last_kernel_ns; u64 last_tsc_nsec; u64 last_tsc_write; + u32 virtual_tsc_khz; bool tsc_catchup; + u32 tsc_catchup_mult; + s8 tsc_catchup_shift; bool nmi_pending; bool nmi_injected; @@ -450,9 +453,6 @@ struct kvm_arch { u64 last_tsc_nsec; u64 last_tsc_offset; u64 last_tsc_write; - u32 virtual_tsc_khz; - u32 virtual_tsc_mult; - s8 virtual_tsc_shift; struct kvm_xen_hvm_config xen_hvm_config; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1b8b16a..1e7af86 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -982,6 +982,14 @@ static inline int kvm_tsc_changes_freq(void) return ret; } +static u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu) +{ + if (vcpu-arch.virtual_tsc_khz) + return vcpu-arch.virtual_tsc_khz; + else + return __this_cpu_read(cpu_tsc_khz); +} + static inline u64 nsec_to_cycles(u64 nsec) { u64 ret; @@ -995,20 +1003,19 @@ static inline u64 nsec_to_cycles(u64 nsec) return ret; } -static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz) +static void kvm_init_tsc_catchup(struct kvm_vcpu *vcpu, u32 this_tsc_khz) { /* Compute a scale to convert nanoseconds in TSC cycles */ kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000, - kvm-arch.virtual_tsc_shift, - kvm-arch.virtual_tsc_mult); - kvm-arch.virtual_tsc_khz = this_tsc_khz; + vcpu-arch.tsc_catchup_shift, + vcpu-arch.tsc_catchup_mult); } static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns) { u64 tsc = 
pvclock_scale_delta(kernel_ns-vcpu-arch.last_tsc_nsec, - vcpu-kvm-arch.virtual_tsc_mult, - vcpu-kvm-arch.virtual_tsc_shift); + vcpu-arch.tsc_catchup_mult, + vcpu-arch.tsc_catchup_shift); tsc += vcpu-arch.last_tsc_write; return tsc; } @@ -1075,8 +1082,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) local_irq_save(flags); kvm_get_msr(v, MSR_IA32_TSC, tsc_timestamp); kernel_ns = get_kernel_ns(); - this_tsc_khz = __this_cpu_read(cpu_tsc_khz); - + this_tsc_khz = vcpu_tsc_khz(v); if (unlikely(this_tsc_khz == 0)) { local_irq_restore(flags); kvm_make_request(KVM_REQ_CLOCK_UPDATE, v); @@ -5955,8 +5961,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) } vcpu-arch.pio_data = page_address(page); - if (!kvm-arch.virtual_tsc_khz) - kvm_arch_set_tsc_khz(kvm, max_tsc_khz); + kvm_init_tsc_catchup(vcpu, max_tsc_khz); r = kvm_mmu_create(vcpu); if (r 0) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/6] KVM: X86: Make tsc_delta calculation a function of guest tsc
The calculation of the tsc_delta value to ensure a forward-going tsc for the guest is a function of the host-tsc. This works as long as the guest's tsc_khz is equal to the host's tsc_khz. With tsc-scaling hardware support this is no longer true and the tsc_delta needs to be calculated using guest_tsc values. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/kvm/x86.c | 9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 1e7af86..47dd6ed 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2126,8 +2126,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) kvm_x86_ops->vcpu_load(vcpu, cpu); if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { /* Make sure TSC doesn't go backwards */ - s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 : - native_read_tsc() - vcpu->arch.last_host_tsc; + s64 tsc_delta; + u64 tsc; + + kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc); + tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : + tsc - vcpu->arch.last_guest_tsc; + if (tsc_delta < 0) mark_tsc_unstable("KVM discovered backwards TSC"); if (check_tsc_unstable()) { -- 1.7.1
[PATCH 6/6] KVM: X86: Implement userspace interface to set virtual_tsc_khz
This patch implements two new vm-ioctls to get and set the virtual_tsc_khz if the machine supports tsc-scaling. Setting the tsc-frequency is only possible before userspace creates any vcpu. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- Documentation/kvm/api.txt | 23 +++ arch/x86/include/asm/kvm_host.h |7 +++ arch/x86/kvm/svm.c | 20 arch/x86/kvm/x86.c | 35 +++ include/linux/kvm.h |5 + 5 files changed, 90 insertions(+), 0 deletions(-) diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt index 9bef4e4..1b9eaa7 100644 --- a/Documentation/kvm/api.txt +++ b/Documentation/kvm/api.txt @@ -1263,6 +1263,29 @@ struct kvm_assigned_msix_entry { __u16 padding[3]; }; +4.54 KVM_SET_TSC_KHZ + +Capability: KVM_CAP_TSC_CONTROL +Architectures: x86 +Type: vcpu ioctl +Parameters: virtual tsc_khz +Returns: 0 on success, -1 on error + +Specifies the tsc frequency for the virtual machine. The unit of the +frequency is KHz. + +4.55 KVM_GET_TSC_KHZ + +Capability: KVM_CAP_GET_TSC_KHZ +Architectures: x86 +Type: vcpu ioctl +Parameters: none +Returns: virtual tsc-khz on success, negative value on error + +Returns the tsc frequency of the guest. The unit of the return value is +KHz. If the host has unstable tsc this ioctl returns -EIO instead as an +error. + 5. The kvm_run structure Application code obtains a pointer to the kvm_run structure by diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 7f48528..473a3be 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -632,6 +632,13 @@ u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn); extern bool tdp_enabled; +/* control of guest tsc rate supported? 
*/ +extern bool kvm_has_tsc_control; +/* minimum supported tsc_khz for guests */ +extern u32 kvm_min_guest_tsc_khz; +/* maximum supported tsc_khz for guests */ +extern u32 kvm_max_guest_tsc_khz; + enum emulation_result { EMULATE_DONE, /* no further processing */ EMULATE_DO_MMIO, /* kvm_run filled with mmio request */ diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 38a4bcc..a5c1b5b 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -64,6 +64,8 @@ MODULE_LICENSE(GPL); #define DEBUGCTL_RESERVED_BITS (~(0x3fULL)) #define TSC_RATIO_RSVD 0xff00ULL +#define TSC_RATIO_MIN 0x0001ULL +#define TSC_RATIO_MAX 0x00ffULL static bool erratum_383_found __read_mostly; @@ -197,6 +199,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm); static int nested_svm_vmexit(struct vcpu_svm *svm); static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, bool has_error_code, u32 error_code); +static u64 __scale_tsc(u64 ratio, u64 tsc); enum { VMCB_INTERCEPTS, /* Intercept vectors, TSC offset, @@ -807,6 +810,23 @@ static __init int svm_hardware_setup(void) if (boot_cpu_has(X86_FEATURE_FXSR_OPT)) kvm_enable_efer_bits(EFER_FFXSR); + if (boot_cpu_has(X86_FEATURE_TSCRATEMSR)) { + u64 max; + + kvm_has_tsc_control = true; + + /* +* Make sure the user can only configure tsc_khz values that +* fit into a signed integer. +* A min value is not calculated needed because it will always +* be 1 on all machines and a value of 0 is used to disable +* tsc-scaling for the vcpu. 
+*/ + max = min(0x7fffULL, __scale_tsc(tsc_khz, TSC_RATIO_MAX)); + + kvm_max_guest_tsc_khz = max; + } + if (nested) { printk(KERN_INFO kvm: Nested Virtualization enabled\n); kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2f0b552..5cc9a44 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -100,6 +100,11 @@ EXPORT_SYMBOL_GPL(kvm_x86_ops); int ignore_msrs = 0; module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR); +bool kvm_has_tsc_control; +EXPORT_SYMBOL_GPL(kvm_has_tsc_control); +u32 kvm_max_guest_tsc_khz; +EXPORT_SYMBOL_GPL(kvm_max_guest_tsc_khz); + #define KVM_NR_SHARED_MSRS 16 struct kvm_shared_msrs_global { @@ -1999,6 +2004,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_X86_ROBUST_SINGLESTEP: case KVM_CAP_XSAVE: case KVM_CAP_ASYNC_PF: + case KVM_CAP_GET_TSC_KHZ: r = 1; break; case KVM_CAP_COALESCED_MMIO: @@ -2025,6 +2031,9 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_XCRS: r = cpu_has_xsave; break; + case KVM_CAP_TSC_CONTROL: + r = kvm_has_tsc_control; +
[PATCH 0/6] TSC scaling support for KVM v3
Hi, this is the third round of my patches to support tsc-scaling in KVM. The changes to v2 address Avi's comments from yesterday. Besides that, the whole virtual_tsc_khz thing has been moved out of the vm into the vcpu data structure. The mult and shift parts were renamed to tsc_catchup_* because this is their actual use (and because the handling of virtual_tsc_khz has changed so that it made sense to separate them). Comments and feedback (or merging) appreciated :-) Regards, Joerg Diffstat: Documentation/kvm/api.txt | 23 arch/x86/include/asm/kvm_host.h | 16 - arch/x86/include/asm/msr-index.h | 1 + arch/x86/kvm/svm.c | 117 +- arch/x86/kvm/vmx.c | 17 ++ arch/x86/kvm/x86.c | 79 -- include/linux/kvm.h | 5 ++ 7 files changed, 237 insertions(+), 21 deletions(-) Shortlog: Joerg Roedel (6): KVM: SVM: Implement infrastructure for TSC_RATE_MSR KVM: X86: Let kvm-clock report the right tsc frequency KVM: X86: Make tsc_delta calculation a function of guest tsc KVM: X86: Implement call-back to propagate virtual_tsc_khz KVM: X86: Delegate tsc-offset calculation to architecture code KVM: X86: Implement userspace interface to set virtual_tsc_khz
Re: qemu-kvm crash with
Hi, The content of aiocb: (gdb) print *aiocb $1 = {common = {pool = 0x9aced0, bs = 0x1270230, cb = 0x45591f <multiwrite_cb>, opaque = 0x7f54b0034f60, next = 0x0}, aio_fildes = 16, {aio_iov = 0x7f54b006cd48, aio_ioctl_buf = 0x7f54b006cd48}, aio_niov = 17, aio_nbytes = 65024, ev_signo = 12, aio_offset = 1081344, node = {tqe_next = 0x0, tqe_prev = 0x9f10a0}, aio_type = 2, ret = -115, active = 1, next = 0x7f54b00409f0, async_context_id = 0} (gdb) print aiocb->aio_iov[0] $2 = {iov_base = 0x7f54a9f141f8, iov_len = 3592} (gdb) print aiocb->aio_iov[1] $3 = {iov_base = 0x7f54a27d5000, iov_len = 4096} (gdb) print aiocb->aio_iov[2] $4 = {iov_base = 0x7f54a30d6000, iov_len = 4096} (gdb) print aiocb->aio_iov[3] $5 = {iov_base = 0x7f5433a57000, iov_len = 4096} (gdb) print aiocb->aio_iov[5] $6 = {iov_base = 0x7f54a2fd9000, iov_len = 4096} (gdb) print aiocb->aio_iov[6] $7 = {iov_base = 0x7f54a275a000, iov_len = 4096} (gdb) print aiocb->aio_iov[7] $8 = {iov_base = 0x7f54a2fdb000, iov_len = 4096} (gdb) print aiocb->aio_iov[8] $9 = {iov_base = 0x7f54ab55c000, iov_len = 4096} (gdb) print aiocb->aio_iov[9] $10 = {iov_base = 0x7f543639d000, iov_len = 4096} (gdb) print aiocb->aio_iov[10] $11 = {iov_base = 0x7f543115e000, iov_len = 4096} (gdb) print aiocb->aio_iov[11] $12 = {iov_base = 0x7f54361df000, iov_len = 4096} (gdb) print aiocb->aio_iov[12] $13 = {iov_base = 0x7f54a962, iov_len = 4096} (gdb) print aiocb->aio_iov[13] $14 = {iov_base = 0x7f54a23a1000, iov_len = 4096} (gdb) print aiocb->aio_iov[14] $15 = {iov_base = 0x7f54ae122000, iov_len = 4096} (gdb) print aiocb->aio_iov[15] $16 = {iov_base = 0x7f54312a3000, iov_len = 4096} (gdb) print aiocb->aio_iov[16] $17 = {iov_base = 0x7f54a28a4000, iov_len = 503} (gdb) The one thing that seems odd is that the sum of the iov_len values is 65535, which is greater than aio_nbytes of 65024. Does this mean the code ends up writing past the end of buf?
/Conor
[PATCH 01/13] KVM: x86 emulator: add framework for instruction
From: Avi Kivity a...@redhat.com When running in guest mode, certain instructions can be intercepted by hardware. This also holds for nested guests running on emulated virtualization hardware, in particular instructions emulated by kvm itself. This patch adds a framework for intercepting instructions. If an instruction is marked for interception, and if we're running in guest mode, a callback is called to check whether an intercept is needed or not. The callback is called at three points in time: immediately after beginning execution, after checking privilge exceptions, and after checking memory exception. This suits the different interception points defined for different instructions and for the various virtualization instruction sets. In addition, a new X86EMUL_INTERCEPT is defined, which any callback or memory access may define, allowing the more complicated intercepts to be implemented in existing callbacks. Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_emulate.h | 20 arch/x86/kvm/emulate.c | 26 ++ arch/x86/kvm/x86.c |9 + 3 files changed, 55 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 0f52135..4b9efb7 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -14,6 +14,8 @@ #include asm/desc_defs.h struct x86_emulate_ctxt; +enum x86_intercept; +enum x86_intercept_stage; struct x86_exception { u8 vector; @@ -62,6 +64,7 @@ struct x86_exception { #define X86EMUL_RETRY_INSTR 3 /* retry the instruction for some reason */ #define X86EMUL_CMPXCHG_FAILED 4 /* cmpxchg did not see expected value */ #define X86EMUL_IO_NEEDED 5 /* IO is needed to complete emulation */ +#define X86EMUL_INTERCEPTED 6 /* Intercepted by nested VMCB/VMCS */ struct x86_emulate_ops { /* @@ -158,6 +161,9 @@ struct x86_emulate_ops { int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu); int 
(*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data); int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata); + int (*intercept)(struct x86_emulate_ctxt *ctxt, +enum x86_intercept intercept, +enum x86_intercept_stage stage); }; /* Type, address-of, and value of an instruction's operand. */ @@ -197,6 +203,7 @@ struct read_cache { struct decode_cache { u8 twobyte; u8 b; + u8 intercept; u8 lock_prefix; u8 rep_prefix; u8 op_bytes; @@ -238,6 +245,7 @@ struct x86_emulate_ctxt { /* interruptibility state, as a result of execution of STI or MOV SS */ int interruptibility; + bool guest_mode; /* guest running a nested guest */ bool perm_ok; /* do not check permissions if true */ bool only_vendor_specific_insn; @@ -259,6 +267,18 @@ struct x86_emulate_ctxt { #define X86EMUL_MODE_PROT32 4/* 32-bit protected mode. */ #define X86EMUL_MODE_PROT64 8/* 64-bit (long) mode.*/ +enum x86_intercept_stage { + x86_icpt_pre_except, + x86_icpt_post_except, + x86_icpt_post_memaccess, +}; + +enum x86_intercept { + x86_intercept_none, + + nr_x86_intercepts +}; + /* Host execution mode. 
*/ #if defined(CONFIG_X86_32) #define X86EMUL_MODE_HOST X86EMUL_MODE_PROT32 diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 14c5ad5..8c6af7e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -102,6 +102,7 @@ struct opcode { u32 flags; + u8 intercept; union { int (*execute)(struct x86_emulate_ctxt *ctxt); struct opcode *group; @@ -2326,10 +2327,13 @@ static int em_mov(struct x86_emulate_ctxt *ctxt) } #define D(_y) { .flags = (_y) } +#define DI(_y, _i) { .flags = (_y), .intercept = x86_intercept_##_i } #define ND(0) #define G(_f, _g) { .flags = ((_f) | Group), .u.group = (_g) } #define GD(_f, _g) { .flags = ((_f) | Group | GroupDual), .u.gdual = (_g) } #define I(_f, _e) { .flags = (_f), .u.execute = (_e) } +#define II(_f, _e, _i) \ + { .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i } #define D2bv(_f) D((_f) | ByteOp), D(_f) #define I2bv(_f, _e) I((_f) | ByteOp, _e), I(_f, _e) @@ -2745,6 +2749,7 @@ done_prefixes: } c-execute = opcode.u.execute; + c-intercept = opcode.intercept; /* Unrecognised? */ if (c-d == 0 || (c-d Undefined)) @@ -2979,12 +2984,26 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt) goto done; } + if (unlikely(ctxt-guest_mode) c-intercept) { + rc = ops-intercept(ctxt, c-intercept, + x86_icpt_pre_except); + if (rc !=
[PATCH 07/13] KVM: SVM: Add intercept checks for descriptor table accesses
This patch add intercept checks into the KVM instruction emulator to check for the 8 instructions that access the descriptor table addresses. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/kvm/emulate.c | 13 +++-- arch/x86/kvm/svm.c | 13 + 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 0719954..505348f 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2370,8 +2370,17 @@ static struct opcode group5[] = { D(SrcMem | ModRM | Stack), N, }; +static struct opcode group6[] = { + DI(ModRM,sldt), + DI(ModRM,str), + DI(ModRM | Priv, lldt), + DI(ModRM | Priv, ltr), + N, N, N, N, +}; + static struct group_dual group7 = { { - N, N, DI(ModRM | SrcMem | Priv, lgdt), DI(ModRM | SrcMem | Priv, lidt), + DI(ModRM | DstMem | Priv, sgdt), DI(ModRM | DstMem | Priv, sidt), + DI(ModRM | SrcMem | Priv, lgdt), DI(ModRM | SrcMem | Priv, lidt), DI(SrcNone | ModRM | DstMem | Mov, smsw), N, DI(SrcMem16 | ModRM | Mov | Priv, lmsw), DI(SrcMem | ModRM | ByteOp | Priv | NoAccess, invlpg), @@ -2502,7 +2511,7 @@ static struct opcode opcode_table[256] = { static struct opcode twobyte_table[256] = { /* 0x00 - 0x0F */ - N, GD(0, group7), N, N, + G(0, group6), GD(0, group7), N, N, N, D(ImplicitOps | VendorSpecific), DI(ImplicitOps | Priv, clts), N, DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N, N, D(ImplicitOps | ModRM), N, N, diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 25d7460..faa959e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3874,6 +3874,10 @@ static void svm_fpu_deactivate(struct kvm_vcpu *vcpu) #define POST_EX(exit) { .exit_code = (exit), \ .stage = x86_icpt_post_except, \ .valid = true } +#define POST_MEM(exit) { .exit_code = (exit), \ +.stage = x86_icpt_post_memaccess, \ +.valid = true } + static struct __x86_intercept { u32 exit_code; @@ -3887,9 +3891,18 @@ static struct __x86_intercept { [x86_intercept_smsw]= POST_EX(SVM_EXIT_READ_CR0), 
[x86_intercept_dr_read] = POST_EX(SVM_EXIT_READ_DR0), [x86_intercept_dr_write]= POST_EX(SVM_EXIT_WRITE_DR0), + [x86_intercept_sldt]= POST_MEM(SVM_EXIT_LDTR_READ), + [x86_intercept_str] = POST_MEM(SVM_EXIT_TR_READ), + [x86_intercept_lldt]= POST_MEM(SVM_EXIT_LDTR_WRITE), + [x86_intercept_ltr] = POST_MEM(SVM_EXIT_TR_WRITE), + [x86_intercept_sgdt]= POST_MEM(SVM_EXIT_GDTR_READ), + [x86_intercept_sidt]= POST_MEM(SVM_EXIT_IDTR_READ), + [x86_intercept_lgdt]= POST_MEM(SVM_EXIT_GDTR_WRITE), + [x86_intercept_lidt]= POST_MEM(SVM_EXIT_IDTR_WRITE), }; #undef POST_EX +#undef POST_MEM static int svm_check_intercept(struct kvm_vcpu *vcpu, struct x86_instruction_info *info, -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
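[Editor's note] The table-driven scheme this patch extends — `struct __x86_intercept` entries carrying an exit code, a pipeline stage, and a valid flag — can be sketched in isolation. The names and exit-code values below are illustrative placeholders, not kvm's real constants:

```c
#include <assert.h>
#include <stdint.h>

/* Pipeline stages at which an intercept may fire, mirroring the
 * x86_icpt_* stages in the patches. */
enum icpt_stage { ICPT_PRE_EXCEPT, ICPT_POST_EXCEPT, ICPT_POST_MEMACCESS };

/* Emulator-side intercept codes (illustrative subset). */
enum { ICPT_NONE, ICPT_SLDT, ICPT_LLDT, NR_ICPT };

struct icpt_entry {
    uint32_t exit_code;      /* exit code to report to the nested hypervisor */
    enum icpt_stage stage;   /* stage at which the check applies */
    int valid;               /* entry populated? */
};

/* Sparse map indexed by intercept code; unlisted entries stay zeroed,
 * so their valid flag is 0 and they never trigger. Exit-code values
 * here are placeholders, not the real SVM_EXIT_* numbers. */
static const struct icpt_entry icpt_map[NR_ICPT] = {
    [ICPT_SLDT] = { 0x68, ICPT_POST_MEMACCESS, 1 },
    [ICPT_LLDT] = { 0x6c, ICPT_POST_MEMACCESS, 1 },
};

/* Returns the exit code to inject, or 0 when emulation may continue. */
static uint32_t check_intercept(unsigned icpt, enum icpt_stage stage)
{
    if (icpt >= NR_ICPT)
        return 0;
    if (!icpt_map[icpt].valid || icpt_map[icpt].stage != stage)
        return 0;
    return icpt_map[icpt].exit_code;
}
```

Designated initializers keep the map sparse: any intercept not listed falls through to "continue", which is why only the instructions each patch cares about need entries.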
[PATCH 09/13] KVM: SVM: Add intercept checks for remaining group7 instructions
This patch implements the emulator intercept checks for the RDTSCP, MONITOR, and MWAIT instructions. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/kvm/emulate.c | 15 +-- arch/x86/kvm/svm.c |3 +++ 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index dc53806..0aaba1e 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2346,6 +2346,12 @@ static int em_mov(struct x86_emulate_ctxt *ctxt) D2bv(((_f) ~Lock) | DstAcc | SrcImm) +static struct opcode group7_rm1[] = { + DI(SrcNone | ModRM | Priv, monitor), + DI(SrcNone | ModRM | Priv, mwait), + N, N, N, N, N, N, +}; + static struct opcode group7_rm3[] = { DI(SrcNone | ModRM | Priv, vmrun), DI(SrcNone | ModRM | Priv, vmmcall), @@ -2357,6 +2363,11 @@ static struct opcode group7_rm3[] = { DI(SrcNone | ModRM | Priv, invlpga), }; +static struct opcode group7_rm7[] = { + N, + DI(SrcNone | ModRM, rdtscp), + N, N, N, N, N, N, +}; static struct opcode group1[] = { X7(D(Lock)), N }; @@ -2399,10 +2410,10 @@ static struct group_dual group7 = { { DI(SrcMem16 | ModRM | Mov | Priv, lmsw), DI(SrcMem | ModRM | ByteOp | Priv | NoAccess, invlpg), }, { - D(SrcNone | ModRM | Priv | VendorSpecific), N, + D(SrcNone | ModRM | Priv | VendorSpecific), EXT(0, group7_rm1), N, EXT(0, group7_rm3), DI(SrcNone | ModRM | DstMem | Mov, smsw), N, - DI(SrcMem16 | ModRM | Mov | Priv, lmsw), N, + DI(SrcMem16 | ModRM | Mov | Priv, lmsw), EXT(0, group7_rm7), } }; static struct opcode group8[] = { diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index dded390..958697e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3907,6 +3907,9 @@ static struct __x86_intercept { [x86_intercept_clgi]= POST_EX(SVM_EXIT_CLGI), [x86_intercept_skinit] = POST_EX(SVM_EXIT_SKINIT), [x86_intercept_invlpga] = POST_EX(SVM_EXIT_INVLPGA), + [x86_intercept_rdtscp] = POST_EX(SVM_EXIT_RDTSCP), + [x86_intercept_monitor] = POST_MEM(SVM_EXIT_MONITOR), + [x86_intercept_mwait] = 
POST_EX(SVM_EXIT_MWAIT), }; #undef POST_EX -- 1.7.1
[PATCH 11/13] KVM: SVM: Add intercept checks for one-byte instructions
This patch adds intercept checks for emulated one-byte instructions to the KVM instruction emulation path. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/kvm/emulate.c |4 ++-- arch/x86/kvm/svm.c | 14 ++ 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8947643..4c0939d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2484,7 +2484,7 @@ static struct opcode opcode_table[256] = { D(DstMem | SrcNone | ModRM | Mov), D(ModRM | SrcMem | NoAccess | DstReg), D(ImplicitOps | SrcMem16 | ModRM), G(0, group1A), /* 0x90 - 0x97 */ - X8(D(SrcAcc | DstReg)), + DI(SrcAcc | DstReg, pause), X7(D(SrcAcc | DstReg)), /* 0x98 - 0x9F */ D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd), I(SrcImmFAddr | No64, em_call_far), N, @@ -2526,7 +2526,7 @@ static struct opcode opcode_table[256] = { D(SrcImmFAddr | No64), D(SrcImmByte | ImplicitOps), D2bv(SrcNone | DstAcc), D2bv(SrcAcc | ImplicitOps), /* 0xF0 - 0xF7 */ - N, N, N, N, + N, DI(ImplicitOps, icebp), N, N, DI(ImplicitOps | Priv, hlt), D(ImplicitOps), G(ByteOp, group3), G(0, group3), /* 0xF8 - 0xFF */ diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index c2e90bb..847a3f9 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3922,6 +3922,13 @@ static struct __x86_intercept { [x86_intercept_rdpmc] = POST_EX(SVM_EXIT_RDPMC), [x86_intercept_cpuid] = PRE_EX(SVM_EXIT_CPUID), [x86_intercept_rsm] = PRE_EX(SVM_EXIT_RSM), + [x86_intercept_pause] = PRE_EX(SVM_EXIT_PAUSE), + [x86_intercept_pushf] = PRE_EX(SVM_EXIT_PUSHF), + [x86_intercept_popf] = PRE_EX(SVM_EXIT_POPF), + [x86_intercept_intn] = PRE_EX(SVM_EXIT_SWINT), + [x86_intercept_iret] = PRE_EX(SVM_EXIT_IRET), + [x86_intercept_icebp] = PRE_EX(SVM_EXIT_ICEBP), + [x86_intercept_hlt] = POST_EX(SVM_EXIT_HLT), }; #undef PRE_EX @@ -3990,6 +3997,13 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu, else vmcb->control.exit_info_1 = 0; break; + case SVM_EXIT_PAUSE: + /* +* We get this for 
NOP only, but PAUSE +* is encoded as REP NOP, so check the REP prefix here +*/ + if (info->rep_prefix != REPE_PREFIX) + goto out; default: break; } -- 1.7.1
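[Editor's note] The PAUSE special case above comes from instruction encoding: PAUSE shares opcode 0x90 with NOP and is distinguished only by the F3 (REP) prefix, so a decoder that maps 0x90 to "nop" must examine the REP prefix to spot PAUSE. A standalone sketch (the 0xF3 value is the architectural prefix byte; how kvm represents `rep_prefix` internally is not shown in this mail):

```c
#include <assert.h>
#include <stdint.h>

/* PAUSE is encoded as F3 90, i.e. REP NOP. Opcode 0x90 alone is a
 * plain NOP; only the F3 prefix turns it into PAUSE. */
static int is_pause(uint8_t opcode, uint8_t rep_prefix)
{
    return opcode == 0x90 && rep_prefix == 0xf3;
}
```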
[PATCH 04/13] KVM: X86: Add x86 callback for intercept check
This patch adds a callback into kvm_x86_ops so that svm and vmx code can do intercept checks on emulated instructions. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_host.h | 21 + arch/x86/kvm/svm.c |9 + arch/x86/kvm/x86.c | 20 +++- 3 files changed, 49 insertions(+), 1 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 35f81b1..7544964 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -504,6 +504,22 @@ struct kvm_vcpu_stat { u32 nmi_injections; }; +/* + * This struct is used to carry enough information from the instruction + * decoder to main KVM so that a decision can be made whether the + * instruction needs to be intercepted or not. + */ +struct x86_instruction_info { + u8 intercept; /* which intercept */ + u8 rep_prefix; /* rep prefix? */ + u8 modrm; /* index of register used */ + u64 src_val;/* value of source operand */ + u8 src_bytes; /* size of source operand */ + u8 dst_bytes; /* size of destination operand */ + u8 ad_bytes; /* size of src/dst address */ + u64 next_rip; /* rip following the instruction*/ +}; + struct kvm_x86_ops { int (*cpu_has_kvm_support)(void); /* __init */ int (*disabled_by_bios)(void); /* __init */ @@ -591,6 +607,11 @@ struct kvm_x86_ops { void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset); void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2); + + int (*check_intercept)(struct kvm_vcpu *vcpu, + struct x86_instruction_info *info, + enum x86_intercept_stage stage); + const struct trace_print_flags *exit_reasons_str; }; diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2a19322..b36df64 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3871,6 +3871,13 @@ static void svm_fpu_deactivate(struct kvm_vcpu *vcpu) update_cr0_intercept(svm); } +static int svm_check_intercept(struct kvm_vcpu *vcpu, + struct x86_instruction_info *info, + enum x86_intercept_stage stage) +{ + return 
X86EMUL_CONTINUE; +} + static struct kvm_x86_ops svm_x86_ops = { .cpu_has_kvm_support = has_svm, .disabled_by_bios = is_disabled, @@ -3956,6 +3963,8 @@ static struct kvm_x86_ops svm_x86_ops = { .adjust_tsc_offset = svm_adjust_tsc_offset, .set_tdp_cr3 = set_tdp_cr3, + + .check_intercept = svm_check_intercept, }; static int __init svm_init(void) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 90a41aa..bf72ec6 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4245,7 +4245,25 @@ static int emulator_intercept(struct x86_emulate_ctxt *ctxt, enum x86_intercept intercept, enum x86_intercept_stage stage) { - return X86EMUL_CONTINUE; + struct x86_instruction_info info = { + .intercept = intercept, + .rep_prefix = ctxt->decode.rep_prefix, + .modrm = ctxt->decode.modrm, + .src_val = ctxt->decode.src.val64, + .src_bytes = ctxt->decode.src.bytes, + .dst_bytes = ctxt->decode.dst.bytes, + .ad_bytes = ctxt->decode.ad_bytes, + .next_rip = ctxt->eip, + }; + + /* +* The callback only needs to be implemented if the architecture +* supports emulated guest-mode. This BUG_ON reminds the +* programmer that this callback needs to be implemented. +*/ + BUG_ON(kvm_x86_ops->check_intercept == NULL); + + return kvm_x86_ops->check_intercept(ctxt->vcpu, &info, stage); } static struct x86_emulate_ops emulate_ops = { -- 1.7.1
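[Editor's note] The shape of this hook — the emulator snapshots its decode state into a plain struct and forwards it through a function pointer the backend must provide — can be sketched as follows. Struct fields and names are illustrative, not kvm's exact definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EMUL_CONTINUE 0  /* stand-in for X86EMUL_CONTINUE */

/* Snapshot of decoder state handed to the backend (an illustrative
 * subset of struct x86_instruction_info). */
struct insn_info {
    uint8_t  intercept;
    uint8_t  rep_prefix;
    uint64_t next_rip;
};

struct backend_ops {
    /* Backend decides whether the decoded instruction is intercepted. */
    int (*check_intercept)(const struct insn_info *info, int stage);
};

static int stub_check(const struct insn_info *info, int stage)
{
    (void)info; (void)stage;
    return EMUL_CONTINUE;   /* nothing intercepted */
}

static int emulator_intercept(const struct backend_ops *ops,
                              const struct insn_info *info, int stage)
{
    /* Analogous to the BUG_ON in the patch: the hook is mandatory for
     * backends that support emulated guest mode. */
    assert(ops->check_intercept != NULL);
    return ops->check_intercept(info, stage);
}
```

Passing a value-snapshot rather than the live decode context keeps the backend decoupled from the emulator's internal structures, which is the point of introducing `struct x86_instruction_info` here.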
[PATCH 02/13] KVM: x86 emulator: add SVM intercepts
From: Avi Kivity a...@redhat.com Add intercept codes for instructions defined by SVM as interceptable. Signed-off-by: Avi Kivity a...@redhat.com Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_emulate.h | 35 +++ arch/x86/kvm/emulate.c | 24 +--- 2 files changed, 48 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 4b9efb7..277f189 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -275,6 +275,41 @@ enum x86_intercept_stage { enum x86_intercept { x86_intercept_none, + x86_intercept_lmsw, + x86_intercept_smsw, + x86_intercept_lidt, + x86_intercept_sidt, + x86_intercept_lgdt, + x86_intercept_sgdt, + x86_intercept_lldt, + x86_intercept_sldt, + x86_intercept_ltr, + x86_intercept_str, + x86_intercept_rdtsc, + x86_intercept_rdpmc, + x86_intercept_pushf, + x86_intercept_popf, + x86_intercept_cpuid, + x86_intercept_rsm, + x86_intercept_iret, + x86_intercept_intn, + x86_intercept_invd, + x86_intercept_pause, + x86_intercept_hlt, + x86_intercept_invlpg, + x86_intercept_invlpga, + x86_intercept_vmrun, + x86_intercept_vmload, + x86_intercept_vmsave, + x86_intercept_vmmcall, + x86_intercept_stgi, + x86_intercept_clgi, + x86_intercept_skinit, + x86_intercept_rdtscp, + x86_intercept_icebp, + x86_intercept_wbinvd, + x86_intercept_monitor, + x86_intercept_mwait, nr_x86_intercepts }; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 8c6af7e..cf5f396 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2371,15 +2371,15 @@ static struct opcode group5[] = { }; static struct group_dual group7 = { { - N, N, D(ModRM | SrcMem | Priv), D(ModRM | SrcMem | Priv), - D(SrcNone | ModRM | DstMem | Mov), N, - D(SrcMem16 | ModRM | Mov | Priv), - D(SrcMem | ModRM | ByteOp | Priv | NoAccess), + N, N, DI(ModRM | SrcMem | Priv, lgdt), DI(ModRM | SrcMem | Priv, lidt), + DI(SrcNone | ModRM | DstMem | Mov, smsw), N, + DI(SrcMem16 | 
ModRM | Mov | Priv, lmsw), + DI(SrcMem | ModRM | ByteOp | Priv | NoAccess, invlpg), }, { D(SrcNone | ModRM | Priv | VendorSpecific), N, N, D(SrcNone | ModRM | Priv | VendorSpecific), - D(SrcNone | ModRM | DstMem | Mov), N, - D(SrcMem16 | ModRM | Mov | Priv), N, + DI(SrcNone | ModRM | DstMem | Mov, smsw), N, + DI(SrcMem16 | ModRM | Mov | Priv, lmsw), N, } }; static struct opcode group8[] = { @@ -2454,7 +2454,7 @@ static struct opcode opcode_table[256] = { /* 0x98 - 0x9F */ D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd), I(SrcImmFAddr | No64, em_call_far), N, - D(ImplicitOps | Stack), D(ImplicitOps | Stack), N, N, + DI(ImplicitOps | Stack, pushf), DI(ImplicitOps | Stack, popf), N, N, /* 0xA0 - 0xA7 */ I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov), I2bv(DstMem | SrcAcc | Mov | MemAbs, em_mov), @@ -2477,7 +2477,8 @@ static struct opcode opcode_table[256] = { G(ByteOp, group11), G(0, group11), /* 0xC8 - 0xCF */ N, N, N, D(ImplicitOps | Stack), - D(ImplicitOps), D(SrcImmByte), D(ImplicitOps | No64), D(ImplicitOps), + D(ImplicitOps), DI(SrcImmByte, intn), + D(ImplicitOps | No64), DI(ImplicitOps, iret), /* 0xD0 - 0xD7 */ D2bv(DstMem | SrcOne | ModRM), D2bv(DstMem | ModRM), N, N, N, N, @@ -2492,7 +2493,8 @@ static struct opcode opcode_table[256] = { D2bv(SrcNone | DstAcc), D2bv(SrcAcc | ImplicitOps), /* 0xF0 - 0xF7 */ N, N, N, N, - D(ImplicitOps | Priv), D(ImplicitOps), G(ByteOp, group3), G(0, group3), + DI(ImplicitOps | Priv, hlt), D(ImplicitOps), + G(ByteOp, group3), G(0, group3), /* 0xF8 - 0xFF */ D(ImplicitOps), D(ImplicitOps), D(ImplicitOps), D(ImplicitOps), D(ImplicitOps), D(ImplicitOps), G(0, group4), G(0, group5), @@ -2502,7 +2504,7 @@ static struct opcode twobyte_table[256] = { /* 0x00 - 0x0F */ N, GD(0, group7), N, N, N, D(ImplicitOps | VendorSpecific), D(ImplicitOps | Priv), N, - D(ImplicitOps | Priv), D(ImplicitOps | Priv), N, N, + DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N, N, D(ImplicitOps | ModRM), N, N, /* 0x10 - 0x1F */ N, N, N, 
N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N, @@ -2512,7 +2514,7 @@ static struct opcode twobyte_table[256] = { N, N, N, N, N, N, N, N, N, N, N, N, /* 0x30 - 0x3F */ - D(ImplicitOps | Priv), I(ImplicitOps, em_rdtsc), + D(ImplicitOps | Priv), II(ImplicitOps, em_rdtsc, rdtsc),
[PATCH 05/13] KVM: SVM: Add intercept check for emulated cr accesses
This patch adds all necessary intercept checks for instructions that access the crX registers. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_emulate.h |3 + arch/x86/kvm/emulate.c |8 ++- arch/x86/kvm/svm.c | 80 +++- 3 files changed, 87 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 7960eeb..c1489e1 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -275,6 +275,9 @@ enum x86_intercept_stage { enum x86_intercept { x86_intercept_none, + x86_intercept_cr_read, + x86_intercept_cr_write, + x86_intercept_clts, x86_intercept_lmsw, x86_intercept_smsw, x86_intercept_lidt, diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 078acc4..384cfa2 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2503,14 +2503,16 @@ static struct opcode opcode_table[256] = { static struct opcode twobyte_table[256] = { /* 0x00 - 0x0F */ N, GD(0, group7), N, N, - N, D(ImplicitOps | VendorSpecific), D(ImplicitOps | Priv), N, + N, D(ImplicitOps | VendorSpecific), DI(ImplicitOps | Priv, clts), N, DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N, N, D(ImplicitOps | ModRM), N, N, /* 0x10 - 0x1F */ N, N, N, N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N, /* 0x20 - 0x2F */ - D(ModRM | DstMem | Priv | Op3264), D(ModRM | DstMem | Priv | Op3264), - D(ModRM | SrcMem | Priv | Op3264), D(ModRM | SrcMem | Priv | Op3264), + DI(ModRM | DstMem | Priv | Op3264, cr_read), + D(ModRM | DstMem | Priv | Op3264), + DI(ModRM | SrcMem | Priv | Op3264, cr_write), + D(ModRM | SrcMem | Priv | Op3264), N, N, N, N, N, N, N, N, N, N, N, N, /* 0x30 - 0x3F */ diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index b36df64..3b6992e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3871,11 +3871,89 @@ static void svm_fpu_deactivate(struct kvm_vcpu *vcpu) update_cr0_intercept(svm); } +#define POST_EX(exit) { 
.exit_code = (exit), \ + .stage = x86_icpt_post_except, \ + .valid = true } + +static struct __x86_intercept { + u32 exit_code; + enum x86_intercept_stage stage; + bool valid; +} x86_intercept_map[] = { + [x86_intercept_cr_read] = POST_EX(SVM_EXIT_READ_CR0), + [x86_intercept_cr_write] = POST_EX(SVM_EXIT_WRITE_CR0), + [x86_intercept_clts] = POST_EX(SVM_EXIT_WRITE_CR0), + [x86_intercept_lmsw] = POST_EX(SVM_EXIT_WRITE_CR0), + [x86_intercept_smsw] = POST_EX(SVM_EXIT_READ_CR0), +}; + +#undef POST_EX + static int svm_check_intercept(struct kvm_vcpu *vcpu, struct x86_instruction_info *info, enum x86_intercept_stage stage) { - return X86EMUL_CONTINUE; + struct vcpu_svm *svm = to_svm(vcpu); + int vmexit, ret = X86EMUL_CONTINUE; + struct __x86_intercept icpt_info; + struct vmcb *vmcb = svm->vmcb; + int reg; + + if (info->intercept >= ARRAY_SIZE(x86_intercept_map)) + goto out; + + icpt_info = x86_intercept_map[info->intercept]; + + if (!icpt_info.valid || stage != icpt_info.stage) + goto out; + + reg = (info->modrm >> 3) & 7; + + switch (icpt_info.exit_code) { + case SVM_EXIT_READ_CR0: + if (info->intercept == x86_intercept_cr_read) + icpt_info.exit_code += reg; + case SVM_EXIT_WRITE_CR0: { + unsigned long cr0, val; + u64 intercept; + + if (info->intercept == x86_intercept_cr_write) + icpt_info.exit_code += reg; + + if (icpt_info.exit_code != SVM_EXIT_WRITE_CR0) + break; + + intercept = svm->nested.intercept; + + if (!(intercept & (1ULL << INTERCEPT_SELECTIVE_CR0))) + break; + + cr0 = vcpu->arch.cr0 & ~SVM_CR0_SELECTIVE_MASK; + val = info->src_val & ~SVM_CR0_SELECTIVE_MASK; + + if (info->intercept == x86_intercept_lmsw) { + cr0 &= 0xfUL; + val &= 0xfUL; + } + + if (cr0 ^ val) + icpt_info.exit_code = SVM_EXIT_CR0_SEL_WRITE; + + break; + } + default: + break; + } + + vmcb->control.next_rip = info->next_rip; + vmcb->control.exit_code = icpt_info.exit_code; + vmexit = nested_svm_exit_handled(svm); + + ret = (vmexit == NESTED_EXIT_DONE) ? X86EMUL_INTERCEPTED + : X86EMUL_CONTINUE; + +out: +
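[Editor's note] The selective-CR0 decision in the patch above can be distilled: SVM's CR0_SEL_WRITE intercept fires only when a write changes bits other than TS and MP (`SVM_CR0_SELECTIVE_MASK`). A hedged sketch with the architectural bit positions (TS is CR0 bit 3, MP is bit 1):

```c
#include <assert.h>

#define CR0_MP (1ul << 1)
#define CR0_TS (1ul << 3)
/* Bits the selective CR0 intercept ignores, as in SVM_CR0_SELECTIVE_MASK. */
#define CR0_SEL_MASK (CR0_TS | CR0_MP)

/* Nonzero when a write of 'val' over the current 'cr0' alters any bit
 * outside TS/MP and therefore warrants a CR0_SEL_WRITE vmexit. For
 * LMSW, which only reaches the low four bits, callers should first
 * mask both values with 0xf, as the patch does. */
static int needs_sel_cr0_exit(unsigned long cr0, unsigned long val)
{
    return ((cr0 & ~CR0_SEL_MASK) ^ (val & ~CR0_SEL_MASK)) != 0;
}
```

This is why the patch XORs the two masked values: any surviving difference is by definition a non-TS/MP change.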
[PATCH 06/13] KVM: SVM: Add intercept check for accessing dr registers
This patch adds the intercept checks for instructions accessing the debug registers. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_emulate.h |2 ++ arch/x86/kvm/emulate.c |4 ++-- arch/x86/kvm/svm.c |6 ++ 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index c1489e1..db744c9 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -280,6 +280,8 @@ enum x86_intercept { x86_intercept_clts, x86_intercept_lmsw, x86_intercept_smsw, + x86_intercept_dr_read, + x86_intercept_dr_write, x86_intercept_lidt, x86_intercept_sidt, x86_intercept_lgdt, diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 384cfa2..0719954 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2510,9 +2510,9 @@ static struct opcode twobyte_table[256] = { N, N, N, N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N, /* 0x20 - 0x2F */ DI(ModRM | DstMem | Priv | Op3264, cr_read), - D(ModRM | DstMem | Priv | Op3264), + DI(ModRM | DstMem | Priv | Op3264, dr_read), DI(ModRM | SrcMem | Priv | Op3264, cr_write), - D(ModRM | SrcMem | Priv | Op3264), + DI(ModRM | SrcMem | Priv | Op3264, dr_write), N, N, N, N, N, N, N, N, N, N, N, N, /* 0x30 - 0x3F */ diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 3b6992e..25d7460 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3885,6 +3885,8 @@ static struct __x86_intercept { [x86_intercept_clts] = POST_EX(SVM_EXIT_WRITE_CR0), [x86_intercept_lmsw] = POST_EX(SVM_EXIT_WRITE_CR0), [x86_intercept_smsw] = POST_EX(SVM_EXIT_READ_CR0), + [x86_intercept_dr_read] = POST_EX(SVM_EXIT_READ_DR0), + [x86_intercept_dr_write] = POST_EX(SVM_EXIT_WRITE_DR0), }; #undef POST_EX @@ -3941,6 +3943,10 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu, break; } + case SVM_EXIT_READ_DR0: + case SVM_EXIT_WRITE_DR0: + icpt_info.exit_code += reg; + break; default: break; } -- 1.7.1
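[Editor's note] The `icpt_info.exit_code += reg` trick works because SVM's exit codes for DR (and CR) accesses occupy consecutive values, one per register, so the reg field of the ModRM byte can be added directly to the DR0 base. A sketch — the base values 0x020/0x030 follow the AMD-defined exit-code layout and should be treated as assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed SVM exit-code bases for debug-register accesses. */
enum { SVM_EXIT_READ_DR0 = 0x020, SVM_EXIT_WRITE_DR0 = 0x030 };

/* The reg field (bits 5:3 of ModRM) names the debug register, so the
 * per-register exit code is simply base + reg. */
static uint32_t dr_exit_code(uint32_t base, uint8_t modrm)
{
    uint8_t reg = (modrm >> 3) & 7;
    return base + reg;
}
```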
[PATCH 0/13] KVM: Make the instruction emulator aware of Nested Virtualization v2
Hi, this is version 2 of the patch-set to make the KVM instruction emulator aware of intercepted instructions. Noting the differences to v1 does not make a lot of sense, since this is basically a re-implementation and almost everything changed :-) The re-write was done on the basis of Avi's patches he sent after the last discussion. With these changes the implementation in the SVM code got a lot smaller and more generic (and easier to extend). The big switch is now only necessary for handling special cases. Comments and feedback are appreciated. Regards, Joerg Diffstat: arch/x86/include/asm/kvm_emulate.h | 67 + arch/x86/include/asm/kvm_host.h | 21 +++ arch/x86/kvm/emulate.c | 128 ++ arch/x86/kvm/svm.c | 264 +--- arch/x86/kvm/x86.c | 30 5 files changed, 432 insertions(+), 78 deletions(-) Shortlog: Avi Kivity (2): KVM: x86 emulator: add framework for instruction KVM: x86 emulator: add SVM intercepts Joerg Roedel (11): KVM: X86: Don't write-back cpu-state on X86EMUL_INTERCEPTED KVM: X86: Add x86 callback for intercept check KVM: SVM: Add intercept check for emulated cr accesses KVM: SVM: Add intercept check for accessing dr registers KVM: SVM: Add intercept checks for descriptor table accesses KVM: SVM: Add intercept checks for SVM instructions KVM: SVM: Add intercept checks for remaining group7 instructions KVM: SVM: Add intercept checks for remaining twobyte instructions KVM: SVM: Add intercept checks for one-byte instructions KVM: SVM: Add checks for IO instructions KVM: SVM: Remove nested sel_cr0_write handling code
[PATCH 12/13] KVM: SVM: Add checks for IO instructions
This patch adds code to check for IOIO intercepts on instructions decoded by the KVM instruction emulator. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_emulate.h |4 arch/x86/kvm/emulate.c | 10 ++ arch/x86/kvm/svm.c | 36 3 files changed, 46 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 41c0120..0b2e2de 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -317,6 +317,10 @@ enum x86_intercept { x86_intercept_mwait, x86_intercept_rdmsr, x86_intercept_wrmsr, + x86_intercept_in, + x86_intercept_ins, + x86_intercept_out, + x86_intercept_outs, nr_x86_intercepts }; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 4c0939d..879ce78 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2339,6 +2339,7 @@ static int em_mov(struct x86_emulate_ctxt *ctxt) { .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i } #define D2bv(_f) D((_f) | ByteOp), D(_f) +#define D2bvI(_f, _i) DI((_f) | ByteOp, _i), DI((_f), _i) #define I2bv(_f, _e) I((_f) | ByteOp, _e), I(_f, _e) #define D6ALU(_f) D2bv((_f) | DstMem | SrcReg | ModRM), \ @@ -2468,8 +2469,8 @@ static struct opcode opcode_table[256] = { I(DstReg | SrcMem | ModRM | Src2Imm, em_imul_3op), I(SrcImmByte | Mov | Stack, em_push), I(DstReg | SrcMem | ModRM | Src2ImmByte, em_imul_3op), - D2bv(DstDI | Mov | String), /* insb, insw/insd */ - D2bv(SrcSI | ImplicitOps | String), /* outsb, outsw/outsd */ + D2bvI(DstDI | Mov | String, ins), /* insb, insw/insd */ + D2bvI(SrcSI | ImplicitOps | String, outs), /* outsb, outsw/outsd */ /* 0x70 - 0x7F */ X16(D(SrcImmByte)), /* 0x80 - 0x87 */ @@ -2520,11 +2521,11 @@ static struct opcode opcode_table[256] = { N, N, N, N, N, N, N, N, /* 0xE0 - 0xE7 */ X4(D(SrcImmByte)), - D2bv(SrcImmUByte | DstAcc), D2bv(SrcAcc | DstImmUByte), + D2bvI(SrcImmUByte | DstAcc, in), D2bvI(SrcAcc | DstImmUByte, out), /* 0xE8 - 0xEF */ 
D(SrcImm | Stack), D(SrcImm | ImplicitOps), D(SrcImmFAddr | No64), D(SrcImmByte | ImplicitOps), - D2bv(SrcNone | DstAcc), D2bv(SrcAcc | ImplicitOps), + D2bvI(SrcNone | DstAcc, in), D2bvI(SrcAcc | ImplicitOps, out), /* 0xF0 - 0xF7 */ N, DI(ImplicitOps, icebp), N, N, DI(ImplicitOps | Priv, hlt), D(ImplicitOps), @@ -2609,6 +2610,7 @@ static struct opcode twobyte_table[256] = { #undef EXT #undef D2bv +#undef D2bvI #undef I2bv #undef D6ALU diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 847a3f9..1672e3c 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3929,6 +3929,10 @@ static struct __x86_intercept { [x86_intercept_iret] = PRE_EX(SVM_EXIT_IRET), [x86_intercept_icebp] = PRE_EX(SVM_EXIT_ICEBP), [x86_intercept_hlt] = POST_EX(SVM_EXIT_HLT), + [x86_intercept_in] = POST_EX(SVM_EXIT_IOIO), + [x86_intercept_ins] = POST_EX(SVM_EXIT_IOIO), + [x86_intercept_out] = POST_EX(SVM_EXIT_IOIO), + [x86_intercept_outs] = POST_EX(SVM_EXIT_IOIO), }; #undef PRE_EX @@ -4004,6 +4008,38 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu, */ if (info->rep_prefix != REPE_PREFIX) goto out; + case SVM_EXIT_IOIO: { + u64 exit_info; + u32 bytes; + + exit_info = (vcpu->arch.regs[VCPU_REGS_RDX] & 0xffff) << 16; + + if (info->intercept == x86_intercept_in || + info->intercept == x86_intercept_ins) { + exit_info |= SVM_IOIO_TYPE_MASK; + bytes = info->src_bytes; + } else { + bytes = info->dst_bytes; + } + + if (info->intercept == x86_intercept_outs || + info->intercept == x86_intercept_ins) + exit_info |= SVM_IOIO_STR_MASK; + + if (info->rep_prefix) + exit_info |= SVM_IOIO_REP_MASK; + + bytes = min(bytes, 4u); + + exit_info |= bytes << SVM_IOIO_SIZE_SHIFT; + + exit_info |= (u32)info->ad_bytes << (SVM_IOIO_ASIZE_SHIFT - 1); + + vmcb->control.exit_info_1 = exit_info; + vmcb->control.exit_info_2 = info->next_rip; + + break; + } default: break; } -- 1.7.1
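[Editor's note] The EXITINFO1 word assembled above follows the IOIO intercept layout from the AMD manual: port number in bits 31:16, direction in bit 0, string and REP flags, and the operand size starting at bit 4. A self-contained sketch — the shift values mirror kvm's `SVM_IOIO_*` constants as used in the patch and should be read as assumptions:

```c
#include <assert.h>
#include <stdint.h>

enum {
    IOIO_TYPE_IN    = 1u << 0,  /* 1 = IN/INS, 0 = OUT/OUTS */
    IOIO_STR        = 1u << 2,  /* string form (INS/OUTS) */
    IOIO_REP        = 1u << 3,  /* REP prefix present */
    IOIO_SIZE_SHIFT = 4,        /* operand size in bytes */
    IOIO_PORT_SHIFT = 16,       /* 16-bit port number */
};

/* Builds a minimal IOIO exit-info word the way the patch does:
 * flags in the low bits, clamped operand size, port in the top half. */
static uint64_t ioio_exit_info(uint16_t port, int in, int str, int rep,
                               uint32_t bytes)
{
    uint64_t info = (uint64_t)port << IOIO_PORT_SHIFT;

    if (in)
        info |= IOIO_TYPE_IN;
    if (str)
        info |= IOIO_STR;
    if (rep)
        info |= IOIO_REP;
    if (bytes > 4)
        bytes = 4;              /* clamp, as the patch does with min() */
    info |= (uint64_t)bytes << IOIO_SIZE_SHIFT;
    return info;
}
```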
[PATCH 10/13] KVM: SVM: Add intercept checks for remaining twobyte instructions
This patch adds intercepts checks for the remaining twobyte instructions to the KVM instruction emulator. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_emulate.h |2 ++ arch/x86/kvm/emulate.c |8 arch/x86/kvm/svm.c | 19 +++ 3 files changed, 25 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index db744c9..41c0120 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -315,6 +315,8 @@ enum x86_intercept { x86_intercept_wbinvd, x86_intercept_monitor, x86_intercept_mwait, + x86_intercept_rdmsr, + x86_intercept_wrmsr, nr_x86_intercepts }; diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 0aaba1e..8947643 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2550,8 +2550,8 @@ static struct opcode twobyte_table[256] = { N, N, N, N, N, N, N, N, N, N, N, N, /* 0x30 - 0x3F */ - D(ImplicitOps | Priv), II(ImplicitOps, em_rdtsc, rdtsc), - D(ImplicitOps | Priv), N, + DI(ImplicitOps | Priv, wrmsr), II(ImplicitOps, em_rdtsc, rdtsc), + DI(ImplicitOps | Priv, rdmsr), DI(ImplicitOps | Priv, rdpmc), D(ImplicitOps | VendorSpecific), D(ImplicitOps | Priv | VendorSpecific), N, N, N, N, N, N, N, N, N, N, @@ -2569,12 +2569,12 @@ static struct opcode twobyte_table[256] = { X16(D(ByteOp | DstMem | SrcNone | ModRM| Mov)), /* 0xA0 - 0xA7 */ D(ImplicitOps | Stack), D(ImplicitOps | Stack), - N, D(DstMem | SrcReg | ModRM | BitOp), + DI(ImplicitOps, cpuid), D(DstMem | SrcReg | ModRM | BitOp), D(DstMem | SrcReg | Src2ImmByte | ModRM), D(DstMem | SrcReg | Src2CL | ModRM), N, N, /* 0xA8 - 0xAF */ D(ImplicitOps | Stack), D(ImplicitOps | Stack), - N, D(DstMem | SrcReg | ModRM | BitOp | Lock), + DI(ImplicitOps, rsm), D(DstMem | SrcReg | ModRM | BitOp | Lock), D(DstMem | SrcReg | Src2ImmByte | ModRM), D(DstMem | SrcReg | Src2CL | ModRM), D(ModRM), I(DstReg | SrcMem | ModRM, em_imul), diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c 
index 958697e..c2e90bb 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3871,6 +3871,9 @@ static void svm_fpu_deactivate(struct kvm_vcpu *vcpu) update_cr0_intercept(svm); } +#define PRE_EX(exit) { .exit_code = (exit), \ + .stage = x86_icpt_pre_except, \ + .valid = true } #define POST_EX(exit) { .exit_code = (exit), \ .stage = x86_icpt_post_except, \ .valid = true } @@ -3910,8 +3913,18 @@ static struct __x86_intercept { [x86_intercept_rdtscp] = POST_EX(SVM_EXIT_RDTSCP), [x86_intercept_monitor] = POST_MEM(SVM_EXIT_MONITOR), [x86_intercept_mwait] = POST_EX(SVM_EXIT_MWAIT), + [x86_intercept_invlpg] = POST_EX(SVM_EXIT_INVLPG), + [x86_intercept_invd] = POST_EX(SVM_EXIT_INVD), + [x86_intercept_wbinvd] = POST_EX(SVM_EXIT_WBINVD), + [x86_intercept_wrmsr] = POST_EX(SVM_EXIT_MSR), + [x86_intercept_rdtsc] = POST_EX(SVM_EXIT_RDTSC), + [x86_intercept_rdmsr] = POST_EX(SVM_EXIT_MSR), + [x86_intercept_rdpmc] = POST_EX(SVM_EXIT_RDPMC), + [x86_intercept_cpuid] = PRE_EX(SVM_EXIT_CPUID), + [x86_intercept_rsm] = PRE_EX(SVM_EXIT_RSM), }; +#undef PRE_EX #undef POST_EX #undef POST_MEM @@ -3971,6 +3984,12 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu, case SVM_EXIT_WRITE_DR0: icpt_info.exit_code += reg; break; + case SVM_EXIT_MSR: + if (info->intercept == x86_intercept_wrmsr) + vmcb->control.exit_info_1 = 1; + else + vmcb->control.exit_info_1 = 0; + break; default: break; } -- 1.7.1
[PATCH 03/13] KVM: X86: Don't write-back cpu-state on X86EMUL_INTERCEPTED
This patch prevents the changed CPU state from being written back when the emulator detected that the instruction was intercepted by the guest. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/kvm/emulate.c |3 +++ arch/x86/kvm/x86.c |3 +++ 3 files changed, 7 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 277f189..7960eeb 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -325,6 +325,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len); #define EMULATION_FAILED -1 #define EMULATION_OK 0 #define EMULATION_RESTART 1 +#define EMULATION_INTERCEPTED 2 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt); int emulator_task_switch(struct x86_emulate_ctxt *ctxt, u16 tss_selector, int reason, diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index cf5f396..078acc4 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -3455,6 +3455,9 @@ writeback: done: if (rc == X86EMUL_PROPAGATE_FAULT) ctxt->have_exception = true; + if (rc == X86EMUL_INTERCEPTED) + return EMULATION_INTERCEPTED; + return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK; twobyte_insn: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2338309..90a41aa 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4458,6 +4458,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, restart: r = x86_emulate_insn(&vcpu->arch.emulate_ctxt); + if (r == EMULATION_INTERCEPTED) + return EMULATE_DONE; + if (r == EMULATION_FAILED) { if (reexecute_instruction(vcpu, cr2)) return EMULATE_DONE; -- 1.7.1
[PATCH 08/13] KVM: SVM: Add intercept checks for SVM instructions
This patch adds the necessary code changes in the instruction emulator and the extensions to svm.c to implement intercept checks for the SVM instructions.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/emulate.c | 23 ++-
 arch/x86/kvm/svm.c     |  8 ++++++++
 2 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 505348f..dc53806 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -75,6 +75,8 @@
 #define Stack       (1<<13)     /* Stack instruction (push/pop) */
 #define Group       (1<<14)     /* Bits 3:5 of modrm byte extend opcode */
 #define GroupDual   (1<<15)     /* Alternate decoding of mod == 3 */
+#define RMExt       (1<<16)     /* Opcode extension in ModRM r/m if mod == 3 */
+
 /* Misc flags */
 #define VendorSpecific (1<<22)  /* Vendor specific instruction */
 #define NoAccess    (1<<23)     /* Don't access memory (lea/invlpg/verr etc) */
@@ -2329,6 +2331,7 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
 #define D(_y) { .flags = (_y) }
 #define DI(_y, _i) { .flags = (_y), .intercept = x86_intercept_##_i }
 #define N    D(0)
+#define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
 #define G(_f, _g) { .flags = ((_f) | Group), .u.group = (_g) }
 #define GD(_f, _g) { .flags = ((_f) | Group | GroupDual), .u.gdual = (_g) }
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
@@ -2343,6 +2346,17 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
 	D2bv(((_f) & ~Lock) | DstAcc | SrcImm)
 
+static struct opcode group7_rm3[] = {
+	DI(SrcNone | ModRM | Priv, vmrun),
+	DI(SrcNone | ModRM | Priv, vmmcall),
+	DI(SrcNone | ModRM | Priv, vmload),
+	DI(SrcNone | ModRM | Priv, vmsave),
+	DI(SrcNone | ModRM | Priv, stgi),
+	DI(SrcNone | ModRM | Priv, clgi),
+	DI(SrcNone | ModRM | Priv, skinit),
+	DI(SrcNone | ModRM | Priv, invlpga),
+};
+
 static struct opcode group1[] = {
 	X7(D(Lock)), N
 };
@@ -2386,7 +2400,7 @@ static struct group_dual group7 = { {
 	DI(SrcMem | ModRM | ByteOp | Priv | NoAccess, invlpg),
 }, {
 	D(SrcNone | ModRM | Priv | VendorSpecific), N,
-	N, D(SrcNone | ModRM | Priv | VendorSpecific),
+	N, EXT(0, group7_rm3),
 	DI(SrcNone | ModRM | DstMem | Mov, smsw), N,
 	DI(SrcMem16 | ModRM | Mov | Priv, lmsw), N,
 } };
@@ -2581,6 +2595,7 @@ static struct opcode twobyte_table[256] = {
 #undef G
 #undef GD
 #undef I
+#undef EXT
 
 #undef D2bv
 #undef I2bv
@@ -2758,6 +2773,12 @@ done_prefixes:
 			opcode = g_mod3[goffset];
 		else
 			opcode = g_mod012[goffset];
+
+		if (opcode.flags & RMExt) {
+			goffset = c->modrm & 7;
+			opcode = opcode.u.group[goffset];
+		}
+
 		c->d |= opcode.flags;
 	}
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index faa959e..dded390 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3899,6 +3899,14 @@ static struct __x86_intercept {
 	[x86_intercept_sidt]	= POST_MEM(SVM_EXIT_IDTR_READ),
 	[x86_intercept_lgdt]	= POST_MEM(SVM_EXIT_GDTR_WRITE),
 	[x86_intercept_lidt]	= POST_MEM(SVM_EXIT_IDTR_WRITE),
+	[x86_intercept_vmrun]	= POST_EX(SVM_EXIT_VMRUN),
+	[x86_intercept_vmmcall]	= POST_EX(SVM_EXIT_VMMCALL),
+	[x86_intercept_vmload]	= POST_EX(SVM_EXIT_VMLOAD),
+	[x86_intercept_vmsave]	= POST_EX(SVM_EXIT_VMSAVE),
+	[x86_intercept_stgi]	= POST_EX(SVM_EXIT_STGI),
+	[x86_intercept_clgi]	= POST_EX(SVM_EXIT_CLGI),
+	[x86_intercept_skinit]	= POST_EX(SVM_EXIT_SKINIT),
+	[x86_intercept_invlpga]	= POST_EX(SVM_EXIT_INVLPGA),
 };
 
 #undef POST_EX
-- 
1.7.1
[PATCH 13/13] KVM: SVM: Remove nested sel_cr0_write handling code
This patch removes all the old code which handled the nested selective cr0 write intercepts. This code was only in place as a work-around until the instruction emulator is capable of doing the same. This is the case with this patch-set and so the code can be removed.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/svm.c | 78 +--
 1 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1672e3c..37c0060 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -93,14 +93,6 @@ struct nested_state {
 	/* A VMEXIT is required but not yet emulated */
 	bool exit_required;
 
-	/*
-	 * If we vmexit during an instruction emulation we need this to restore
-	 * the l1 guest rip after the emulation
-	 */
-	unsigned long vmexit_rip;
-	unsigned long vmexit_rsp;
-	unsigned long vmexit_rax;
-
 	/* cache for intercepts of the guest */
 	u32 intercept_cr;
 	u32 intercept_dr;
@@ -1365,31 +1357,6 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
-	if (is_guest_mode(vcpu)) {
-		/*
-		 * We are here because we run in nested mode, the host kvm
-		 * intercepts cr0 writes but the l1 hypervisor does not.
-		 * But the L1 hypervisor may intercept selective cr0 writes.
-		 * This needs to be checked here.
-		 */
-		unsigned long old, new;
-
-		/* Remove bits that would trigger a real cr0 write intercept */
-		old = vcpu->arch.cr0 & SVM_CR0_SELECTIVE_MASK;
-		new = cr0 & SVM_CR0_SELECTIVE_MASK;
-
-		if (old == new) {
-			/* cr0 write with ts and mp unchanged */
-			svm->vmcb->control.exit_code = SVM_EXIT_CR0_SEL_WRITE;
-			if (nested_svm_exit_handled(svm) == NESTED_EXIT_DONE) {
-				svm->nested.vmexit_rip = kvm_rip_read(vcpu);
-				svm->nested.vmexit_rsp = kvm_register_read(vcpu, VCPU_REGS_RSP);
-				svm->nested.vmexit_rax = kvm_register_read(vcpu, VCPU_REGS_RAX);
-				return;
-			}
-		}
-	}
-
 #ifdef CONFIG_X86_64
 	if (vcpu->arch.efer & EFER_LME) {
 		if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
@@ -2676,6 +2643,29 @@ static int emulate_on_interception(struct vcpu_svm *svm)
 	return emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE;
 }
 
+bool check_selective_cr0_intercepted(struct vcpu_svm *svm, unsigned long val)
+{
+	unsigned long cr0 = svm->vcpu.arch.cr0;
+	bool ret = false;
+	u64 intercept;
+
+	intercept = svm->nested.intercept;
+
+	if (!is_guest_mode(&svm->vcpu) ||
+	    (!(intercept & (1ULL << INTERCEPT_SELECTIVE_CR0))))
+		return false;
+
+	cr0 &= ~SVM_CR0_SELECTIVE_MASK;
+	val &= ~SVM_CR0_SELECTIVE_MASK;
+
+	if (cr0 ^ val) {
+		svm->vmcb->control.exit_code = SVM_EXIT_CR0_SEL_WRITE;
+		ret = (nested_svm_exit_handled(svm) == NESTED_EXIT_DONE);
+	}
+
+	return ret;
+}
+
 #define CR_VALID (1ULL << 63)
 
 static int cr_interception(struct vcpu_svm *svm)
@@ -2699,7 +2689,8 @@ static int cr_interception(struct vcpu_svm *svm)
 		val = kvm_register_read(&svm->vcpu, reg);
 		switch (cr) {
 		case 0:
-			err = kvm_set_cr0(&svm->vcpu, val);
+			if (!check_selective_cr0_intercepted(svm, val))
+				err = kvm_set_cr0(&svm->vcpu, val);
 			break;
 		case 3:
 			err = kvm_set_cr3(&svm->vcpu, val);
@@ -2744,23 +2735,6 @@ static int cr_interception(struct vcpu_svm *svm)
 	return 1;
 }
 
-static int cr0_write_interception(struct vcpu_svm *svm)
-{
-	struct kvm_vcpu *vcpu = &svm->vcpu;
-	int r;
-
-	r = cr_interception(svm);
-
-	if (svm->nested.vmexit_rip) {
-		kvm_register_write(vcpu, VCPU_REGS_RIP, svm->nested.vmexit_rip);
-		kvm_register_write(vcpu, VCPU_REGS_RSP, svm->nested.vmexit_rsp);
-		kvm_register_write(vcpu, VCPU_REGS_RAX, svm->nested.vmexit_rax);
-		svm->nested.vmexit_rip = 0;
-	}
-
-	return r;
-}
-
 static int dr_interception(struct vcpu_svm *svm)
 {
 	int reg, dr;
@@ -3048,7 +3022,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = {
 	[SVM_EXIT_READ_CR4]			= cr_interception,
 	[SVM_EXIT_READ_CR8]			= cr_interception,
 	[SVM_EXIT_CR0_SEL_WRITE]		= emulate_on_interception,
-	[SVM_EXIT_WRITE_CR0]			= cr0_write_interception,
+	[SVM_EXIT_WRITE_CR0]			= cr_interception,
2.6.38.1 general protection fault
I got this on a 2.6.38.1 system which (I think) had some problem accessing guest image on a btrfs filesystem. general protection fault: [#1] SMP last sysfs file: /sys/kernel/uevent_seqnum CPU 0 Modules linked in: ipt_MASQUERADE vhost_net kvm_intel kvm iptable_filter xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables bridge stp btrfs zlib_deflate crc32c libcrc32c coretemp f71882fg snd_pcm snd_timer snd soundcore i2c_i801 snd_page_alloc tpm_tis tpm tpm_bios pcspkr i7core_edac edac_core r8169 mii raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 ahci libahci sata_nv sata_sil sata_via 3w_9xxx 3w_ [last unloaded: scsi_wait_scan] Pid: 10199, comm: kvm Not tainted 2.6.38.1 #1 MSI MS-7522/MSI X58 Pro-E (MS-7522) RIP: 0010:[a02cae20] [a02cae20] kvm_unmap_rmapp+0x20/0x70 [kvm] RSP: 0018:880508ee9bf0 EFLAGS: 00010202 RAX: 8805d6b087f8 RBX: 8805b7b1 RCX: 0050 RDX: RSI: 8805d6b087f8 RDI: 8805b7b1 RBP: 880508ee9c10 R08: 8801061d4000 R09: c9001f19aff0 R10: 0030 R11: R12: R13: c9001f19aff8 R14: 0060 R15: 8801061d4000 FS: 7f7ca25d6730() GS:8800bf40() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 00462b10 CR3: 0003ac47f000 CR4: 26e0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process kvm (pid: 10199, threadinfo 880508ee8000, task 88001b5a5b00) Stack: ffcf 000220ff 0001 8801061d4050 880508ee9c80 a02c8a54 0030 a02cae00 7f7c80a2b000 8805b7b1 0001 Call Trace: [a02c8a54] kvm_handle_hva+0xb4/0x170 [kvm] [a02cae00] ? kvm_unmap_rmapp+0x0/0x70 [kvm] [a02c8b27] kvm_unmap_hva+0x17/0x20 [kvm] [a02b1e72] kvm_mmu_notifier_invalidate_range_start+0x62/0xb0 [kvm] [8113ea11] __mmu_notifier_invalidate_range_start+0x51/0x70 [8111e2c1] copy_page_range+0x3b1/0x460 [812c5628] ? rb_insert_color+0x98/0x140 [81060cdc] dup_mm+0x2fc/0x500 [810617fe] copy_process+0x8be/0x11b0 [81062165] do_fork+0x75/0x350 [81177bcd] ? mntput+0x1d/0x40 [8115b095] ? fput+0x1e5/0x270 [815aa7f5] ? _raw_spin_lock_irq+0x15/0x20 [81075141] ? 
sigprocmask+0x91/0x110 [81014ab8] sys_clone+0x28/0x30 [8100c3e3] stub_clone+0x13/0x20 [8100c0c2] ? system_call_fastpath+0x16/0x1b Code: 49 89 01 eb 91 66 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 83 ec 08 0f 1f 44 00 00 45 31 e4 48 89 fb 49 89 f5 eb 1d 0f 1f 00 f6 06 01 74 38 48 8b 15 a4 66 02 00 48 89 df 41 bc 01 00 00 00 RIP [a02cae20] kvm_unmap_rmapp+0x20/0x70 [kvm] RSP 880508ee9bf0 ---[ end trace 85201a339b7635fc ]--- -- Tomasz Chmielewski http://wpkg.org
Re: [PATCH 2/2] virtio_net: remove send completion interrupts and avoid TX queue overrun through packet drop
On Thu, 24 Mar 2011 10:46:49 -0700, Shirley Ma mashi...@us.ibm.com wrote: On Thu, 2011-03-24 at 16:28 +0200, Michael S. Tsirkin wrote: Several other things I am looking at, welcome cooperation: 1. It's probably a good idea to update avail index immediately instead of upon kick: for RX this might help parallelism with the host. Is it possible to use the same idea as for publishing last used idx to publish avail idx? Then we can save guest iowrite/exits. Yes, it should be symmetrical. Test independently of course, but the same logic applies. Thanks! Rusty.
Re: [PATCH 2/2] virtio_net: remove send completion interrupts and avoid TX queue overrun through packet drop
On Thu, 24 Mar 2011 16:28:22 +0200, Michael S. Tsirkin m...@redhat.com wrote: On Thu, Mar 24, 2011 at 11:00:53AM +1030, Rusty Russell wrote: With simply removing the notify here, it does help the case when TX overrun hits too often, for example for 1K message size, the single TCP_STREAM performance improved from 2.xGb/s to 4.xGb/s. OK, we'll be getting rid of the kick on full, so please delete that on all benchmarks. Now, does the capacity check before add_buf() still win anything? I can't see how unless we have some weird bug. Once we've sorted that out, we should look at the more radical change of publishing last_used and using that to intuit whether interrupts should be sent. If we're not careful with ordering and barriers that could introduce more bugs. Right. I am working on this, and trying to be careful. One thing I'm in doubt about: sometimes we just want to disable interrupts. Should we still use flags in that case? I thought that if we make the published index 0 to vq->num - 1, then a special value in the index field could disable interrupts completely. We could even reuse the space for the flags field to stick the index in. Too complex? Making the index free-running avoids the full or empty confusion, plus offers an extra debugging insight. I think that if they really want to disable interrupts, the flag should still work, and when the client accepts the publish last_idx feature they are accepting that interrupts may be omitted if they haven't updated last_idx yet. Anything else on the optimization agenda I've missed? Thanks, Rusty. Several other things I am looking at, welcome cooperation: 1. It's probably a good idea to update avail index immediately instead of upon kick: for RX this might help parallelism with the host. Yes, once we've done everything else, we should measure this. It makes sense. 2. Adding an API to add a single buffer instead of s/g, seems to help a bit.
This goes last, since it's kind of an ugly hack, but all internal to Linux if we decide it's a win. 3. For TX sometimes we free a single buffer, sometimes a ton of them, which might make the transmit latency vary. It's probably a good idea to limit this, maybe free the minimal number possible to keep the device going without stops, maybe free up to MAX_SKB_FRAGS. This kind of heuristic is going to be quite variable depending on circumstance, I think, so it's a lot of work to make sure we get it right. 4. If the ring is full, we now notify right after the first entry is consumed. For TX this is suboptimal, we should try delaying the interrupt on host. Lguest already does that: only sends an interrupt when it's run out of things to do. It does update the used ring, however, as it processes them. This seems sensible to me, but needs to be measured separately as well. More ideas, would be nice if someone can try them out: 1. We are allocating/freeing buffers for indirect descriptors. Use some kind of pool instead? And we could preformat part of the descriptor. We need some poolish mechanism for virtio_blk too; perhaps an allocation callback which both can use (virtio_blk to alloc from a pool, virtio_net to recycle?). Along similar lines to preformatting, we could actually try to prepend the skb_vnet_hdr to the vnet data, and use a single descriptor for the hdr and the first part of the packet. Though IIRC, qemu's virtio barfs if the first descriptor isn't just the hdr (barf...). 2. I didn't have time to work on virtio2 ideas presented at the kvm forum yet, any takers? I didn't even attend. But I think that virtio2 is moribund for the moment; there wasn't enough demand and it's clear that there are optimizations unexplored in virtio1. Cheers, Rusty.
KVM, iSCSI and High Availability
Hi. Over the last several days I've been reading, asking questions, and searching the Internet to find a viable HA stack for Ubuntu with KVM virtualization and shared iSCSI storage, and I'm nearly as confused as when I started. Basically I'm trying to build a KVM environment with an iSCSI SAN and I'm not quite sure what approach to use for storing the virtual guests. From what I understand, to get maximum speed I should install guests directly to iSCSI-exported raw devices instead of backing disk files. I'm not sure creating many small LUNs, one for each of the guests, is a good idea. Would it be better to create just one big LUN and then use LVM to divide it and assign one chunk to each of the guests? In the same setup I would also like to implement some kind of automatic failover, so that if one of the KVM hosts goes down I could automatically move its guests over to the other one, or just perform live migration and move one of the guests over to a different host with spare capacity. What would be the best approach to implement a solution like that? Thanks in advance. -- Marcin M. Jessa
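For reference, the one-big-LUN-plus-LVM approach usually looks something like the sketch below. Every name here is a hypothetical placeholder (the portal address, the IQN, the device node, the volume group and LV names) — adjust all of them for your SAN.

```shell
# Log in to the target; the big LUN then appears as a local block
# device (say /dev/sdb -- check dmesg / lsscsi for the real name).
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m node -T iqn.2011-03.example:storage.vmpool -p 192.168.1.10 --login

# Carve the LUN with LVM: one volume group, one logical volume per guest.
pvcreate /dev/sdb
vgcreate vmpool /dev/sdb
lvcreate -L 20G -n guest1 vmpool
lvcreate -L 20G -n guest2 vmpool

# Each guest then uses its LV as a raw block device, e.g.:
#   virt-install ... --disk path=/dev/vmpool/guest1,device=disk,format=raw
```

Note that for live migration both hosts must see the same LUN, which means clustered LVM (clvmd) or very careful coordination so the two hosts never activate or change LVM metadata concurrently — plain LVM on shared storage is not safe by itself.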
Re: KVM internal error. Suberror: 1 with ancient 2.4 kernel as guest
Jiri, Avi: I attached the patches I did for movq and movdqa emulation. Please note: (1) I only implemented those two. Other instructions like addq may follow the same approach. (2) I use the same guest_fx_image to hold values and fxsave/fxrstor to copy to/from registers. This is not very efficient, I admit. Any suggestions, let me know. Thanks! Wei Xu On 3/21/11 2:23 PM, Wei Xu we...@cisco.com wrote: Avi and Jiri: I implemented emulation of movq (64-bit) and movdqa (128-bit). If you guys still need it let me know and I can post it somewhere... Wei Xu On 8/31/10 9:30 AM, Avi Kivity a...@redhat.com wrote: On 08/31/2010 06:49 PM, Avi Kivity wrote: On 08/31/2010 05:32 PM, Jiri Kosina wrote: (qemu) x/5i $eip 0xc027a841: movq (%esi),%mm0 0xc027a844: movq 0x8(%esi),%mm1 0xc027a848: movq 0x10(%esi),%mm2 0xc027a84c: movq 0x18(%esi),%mm3 0xc027a850: movq %mm0,(%edx) === Is there any issue with emulating MMX? Yes. MMX is not currently emulated. If there's a command line option to disable the use of MMX you can try it, otherwise wait for it to be implemented (or implement it yourself). I'll try to do it for 2.6.37, but can't promise anything. You can also run qemu with -cpu qemu32,-mmx. That will expose a cpu without mmx support; hopefully the guest kernel will see that and avoid mmx instructions. mmx-kvm.patch Description: Binary data mmx-qemu.patch Description: Binary data
[GIT PULL] More power management updates for 2.6.39
Hi Linus,

Please pull additional power management updates for 2.6.39 from:

  git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git syscore

They make subsystems that x86 depends on use struct syscore_ops objects instead of sysdevs for core power management, which reduces the code size and kernel memory footprint a bit and simplifies the core suspend/resume and shutdown code paths.

 arch/x86/Kconfig                 |  1 +
 arch/x86/kernel/amd_iommu_init.c | 26 ++-
 arch/x86/kernel/apic/apic.c      | 33 -
 arch/x86/kernel/apic/io_apic.c   | 97 ++
 arch/x86/kernel/cpu/mcheck/mce.c | 21 +
 arch/x86/kernel/cpu/mtrr/main.c  | 10 ++--
 arch/x86/kernel/i8237.c          | 30 +++-
 arch/x86/kernel/i8259.c          | 33 -
 arch/x86/kernel/microcode_core.c | 34 ++
 arch/x86/kernel/pci-gart_64.c    | 32 +++--
 arch/x86/oprofile/nmi_int.c      | 44 +
 drivers/base/Kconfig             |  7 +++
 drivers/base/sys.c               |  3 +-
 drivers/cpufreq/cpufreq.c        | 66 ++
 drivers/pci/intel-iommu.c        | 38 ---
 include/linux/device.h           |  4 ++
 include/linux/pm.h               | 10 +++-
 include/linux/sysdev.h           |  7 ++-
 kernel/time/timekeeping.c        | 27 +++---
 virt/kvm/kvm_main.c              | 34 +++--
 20 files changed, 206 insertions(+), 351 deletions(-)

---

Rafael J. Wysocki (6):
      x86: Use syscore_ops instead of sysdev classes and sysdevs
      timekeeping: Use syscore_ops instead of sysdev class and sysdev
      PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev
      KVM: Use syscore_ops instead of sysdev class and sysdev
      cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)
      Introduce ARCH_NO_SYSDEV_OPS config option (v2)