Re: qemu-kvm crash with

2011-03-25 Thread Stefan Hajnoczi
On Thu, Mar 24, 2011 at 1:38 PM, Conor Murphy
conor_murphy_v...@hotmail.com wrote:
 #4  _int_free (av=<value optimized out>, p=0x7fa24c0009f0, have_lock=0) at
 malloc.c:4795
 #5  0x004a18fe in qemu_vfree (ptr=0x7fa24c000a00) at oslib-posix.c:76
 #6  0x0045af3d in handle_aiocb_rw (aiocb=0x7fa2dc034cd0) at
 posix-aio-compat.c:301

I don't see a way for a double-free to occur so I think something has
overwritten the memory preceding the allocated buffer.

In gdb you could inspect the aiocb structure to look at its aio_iov[],
aio_niov, and aio_nbytes fields.  They might be invalid or corrupted
somehow.

You could also dump out the memory before 0x7fa24c000a00, specifically
0x7fa24c0009f0, to see if you notice any pattern or printable
characters that give a clue as to what has corrupted the memory here.
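As an illustration of that failure mode (a stand-alone sketch, not qemu
code): glibc stores chunk metadata immediately in front of the pointer
malloc() returns, so a forward overflow from a neighbouring allocation
corrupts the header that _int_free() later validates:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *a = malloc(32);
    char *b = malloc(32);   /* b's chunk header sits just before b */

    memset(a, 'A', 64);     /* overflow a into b's chunk header */

    free(b);                /* glibc will typically abort inside
                             * _int_free(), as in the backtrace above */
    free(a);
    return 0;
}

Whether the two chunks end up adjacent depends on the allocator, but the
abort-in-free pattern is the same.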

Are you running qemu-kvm.git/master?

Stefan


Re: [PATCHv2] fix regression caused by e48672fa25e879f7ae21785c7efd187738139593

2011-03-25 Thread Zachary Amsden

On 03/09/2011 05:36 PM, Nikola Ciprich wrote:

commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved
kvm_request_guest_time_update(vcpu), breaking 32bit SMP guests using
kvm-clock. Fix this by moving the (new) clock update function to the
proper place.

Signed-off-by: Nikola Ciprich nikola.cipr...@linuxbox.cz
---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4c27144..ba3f76f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2101,8 +2101,8 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (check_tsc_unstable()) {
kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
vcpu->arch.tsc_catchup = 1;
-   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
}
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
if (vcpu->cpu != cpu)
kvm_migrate_timers(vcpu);
vcpu->cpu = cpu;




So something bothers me still about this bug.  What you did correctly 
restores the old behavior - but it shouldn't be fixing a bug.


The only reason you need to schedule an update for the KVM clock area is
if a new VCPU has been created, you have an unstable TSC, or something
changes the VM's kvmclock offset.


So this change could in fact be hiding an underlying bug - either an 
unstable TSC is not being properly reported, the KVM clock offset is 
being changed, we are missing a KVM clock update for secondary VCPUs - 
or something else we don't yet understand is going on.


Nikola, can you try the patch below, which reverts your change and 
attempts to fix other possible sources of the problem, and see if it 
still reproduces?


Thanks,

Zach
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58f517b..42618fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2127,8 +2127,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
if (check_tsc_unstable()) {
kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
vcpu->arch.tsc_catchup = 1;
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
}
-   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+   if (vcpu->cpu == -1)
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
if (vcpu->cpu != cpu)
kvm_migrate_timers(vcpu);
vcpu->cpu = cpu;
@@ -3534,6 +3536,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
struct kvm_clock_data user_ns;
u64 now_ns;
s64 delta;
+   struct kvm_vcpu *vcpu;
+   int i;
 
r = -EFAULT;
if (copy_from_user(user_ns, argp, sizeof(user_ns)))
@@ -3549,6 +3553,8 @@ long kvm_arch_vm_ioctl(struct file *filp,
delta = user_ns.clock - now_ns;
local_irq_enable();
kvm->arch.kvmclock_offset = delta;
+   kvm_for_each_vcpu(i, vcpu, kvm)
+   kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
break;
}
case KVM_GET_CLOCK: {


[PATCH 1/6] KVM: SVM: Implement infrastructure for TSC_RATE_MSR

2011-03-25 Thread Joerg Roedel
This patch enhances the kvm_amd module with functions to
support the TSC_RATE_MSR which can be used to set a given
tsc frequency for the guest vcpu.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/msr-index.h |1 +
 arch/x86/kvm/svm.c   |   54 +-
 2 files changed, 54 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index fd5a1f3..a7b3e40 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -114,6 +114,7 @@
complete list. */
 
 #define MSR_AMD64_PATCH_LEVEL  0x008b
+#define MSR_AMD64_TSC_RATIO0xc104
 #define MSR_AMD64_NB_CFG   0xc001001f
 #define MSR_AMD64_PATCH_LOADER 0xc0010020
 #define MSR_AMD64_OSVW_ID_LENGTH   0xc0010140
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2a19322..2ce734c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -63,6 +63,8 @@ MODULE_LICENSE("GPL");
 
 #define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
 
+#define TSC_RATIO_RSVD  0xffffff0000000000ULL
+
 static bool erratum_383_found __read_mostly;
 
 static const u32 host_save_user_msrs[] = {
@@ -144,8 +146,13 @@ struct vcpu_svm {
unsigned int3_injected;
unsigned long int3_rip;
u32 apf_reason;
+
+   u64  tsc_ratio;
 };
 
+static DEFINE_PER_CPU(u64, current_tsc_ratio);
+#define TSC_RATIO_DEFAULT  0x0100000000ULL
+
 #define MSR_INVALID0xU
 
 static struct svm_direct_access_msrs {
@@ -569,6 +576,10 @@ static int has_svm(void)
 
 static void svm_hardware_disable(void *garbage)
 {
+   /* Make sure we clean up behind us */
+   if (static_cpu_has(X86_FEATURE_TSCRATEMSR))
+   wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
+
cpu_svm_disable();
 }
 
@@ -610,6 +621,11 @@ static int svm_hardware_enable(void *garbage)
 
wrmsrl(MSR_VM_HSAVE_PA, page_to_pfn(sd->save_area) << PAGE_SHIFT);
 
+   if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+   wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT);
+   __get_cpu_var(current_tsc_ratio) = TSC_RATIO_DEFAULT;
+   }
+
svm_init_erratum_383();
 
return 0;
@@ -854,6 +870,32 @@ static void init_sys_seg(struct vmcb_seg *seg, uint32_t type)
seg->base = 0;
 }
 
+static u64 __scale_tsc(u64 ratio, u64 tsc)
+{
+   u64 mult, frac, _tsc;
+
+   mult  = ratio >> 32;
+   frac  = ratio & ((1ULL << 32) - 1);
+
+   _tsc  = tsc;
+   _tsc *= mult;
+   _tsc += (tsc >> 32) * frac;
+   _tsc += ((tsc & ((1ULL << 32) - 1)) * frac) >> 32;
+
+   return _tsc;
+}
+
+static u64 svm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+   u64 _tsc = tsc;
+
+   if (svm->tsc_ratio != TSC_RATIO_DEFAULT)
+   _tsc = __scale_tsc(svm->tsc_ratio, tsc);
+
+   return _tsc;
+}
+
 static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -1048,6 +1090,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
goto out;
}
 
+   svm->tsc_ratio = TSC_RATIO_DEFAULT;
+
err = kvm_vcpu_init(&svm->vcpu, kvm, id);
if (err)
goto free_svm;
@@ -1141,6 +1185,12 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 
for (i = 0; i < NR_HOST_SAVE_USER_MSRS; i++)
rdmsrl(host_save_user_msrs[i], svm->host_user_msrs[i]);
+
+   if (static_cpu_has(X86_FEATURE_TSCRATEMSR) &&
+   svm->tsc_ratio != __get_cpu_var(current_tsc_ratio)) {
+   __get_cpu_var(current_tsc_ratio) = svm->tsc_ratio;
+   wrmsrl(MSR_AMD64_TSC_RATIO, svm->tsc_ratio);
+   }
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
@@ -2813,7 +2863,9 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
case MSR_IA32_TSC: {
struct vmcb *vmcb = get_host_vmcb(svm);
 
-   *data = vmcb->control.tsc_offset + native_read_tsc();
+   *data = vmcb->control.tsc_offset +
+   svm_scale_tsc(vcpu, native_read_tsc());
+
break;
}
case MSR_STAR:
-- 
1.7.1
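For context on the arithmetic in __scale_tsc() above: the ratio programmed
into TSC_RATIO is an 8.32 fixed-point value, integer part in bits 39..32 and
fraction in bits 31..0, and the three partial products simply compute
tsc * ratio / 2^32 without overflowing 64 bits. A user-space sketch of the
same scaling (assuming that 8.32 format):

#include <stdint.h>
#include <stdio.h>

static uint64_t scale_tsc(uint64_t ratio, uint64_t tsc)
{
    uint64_t mult = ratio >> 32;            /* integer part */
    uint64_t frac = ratio & 0xffffffffULL;  /* fractional part */

    return tsc * mult
         + (tsc >> 32) * frac
         + (((tsc & 0xffffffffULL) * frac) >> 32);
}

int main(void)
{
    uint64_t half = 1ULL << 31;  /* ratio 0.5 in 8.32 fixed point */

    /* prints 500000: a guest TSC at half the host frequency */
    printf("%llu\n", (unsigned long long)scale_tsc(half, 1000000));
    return 0;
}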




[PATCH 5/6] KVM: X86: Delegate tsc-offset calculation to architecture code

2011-03-25 Thread Joerg Roedel
With TSC scaling in SVM the tsc-offset needs to be
calculated differently. This patch propagates this
calculation into the architecture specific modules so that
this complexity can be handled there.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_host.h |2 ++
 arch/x86/kvm/svm.c  |   10 ++
 arch/x86/kvm/vmx.c  |6 ++
 arch/x86/kvm/x86.c  |   10 +-
 4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9958dd8..7f48528 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -591,6 +591,8 @@ struct kvm_x86_ops {
void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz);
void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
+   u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc);
+
void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
const struct trace_print_flags *exit_reasons_str;
 };
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f6d66c2..38a4bcc 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -954,6 +954,15 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment)
mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 }
 
+static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+{
+   u64 tsc;
+
+   tsc = svm_scale_tsc(vcpu, native_read_tsc());
+
+   return target_tsc - tsc;
+}
+
 static void init_vmcb(struct vcpu_svm *svm)
 {
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -4039,6 +4048,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.set_tsc_khz = svm_set_tsc_khz,
.write_tsc_offset = svm_write_tsc_offset,
.adjust_tsc_offset = svm_adjust_tsc_offset,
+   .compute_tsc_offset = svm_compute_tsc_offset,
 
.set_tdp_cr3 = set_tdp_cr3,
 };
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e5dfc6..c4f077a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1184,6 +1184,11 @@ static void vmx_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment)
vmcs_write64(TSC_OFFSET, offset + adjustment);
 }
 
+static u64 vmx_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+{
+   return target_tsc - native_read_tsc();
+}
+
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -4509,6 +4514,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.set_tsc_khz = vmx_set_tsc_khz,
.write_tsc_offset = vmx_write_tsc_offset,
.adjust_tsc_offset = vmx_adjust_tsc_offset,
+   .compute_tsc_offset = vmx_compute_tsc_offset,
 
.set_tdp_cr3 = vmx_set_cr3,
 };
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 47dd6ed..2f0b552 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -990,7 +990,7 @@ static u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu)
return __this_cpu_read(cpu_tsc_khz);
 }
 
-static inline u64 nsec_to_cycles(u64 nsec)
+static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
 {
u64 ret;
 
@@ -998,7 +998,7 @@ static inline u64 nsec_to_cycles(u64 nsec)
if (kvm_tsc_changes_freq())
printk_once(KERN_WARNING
 "kvm: unreliable cycle conversion on adjustable rate TSC\n");
-   ret = nsec * __this_cpu_read(cpu_tsc_khz);
+   ret = nsec * vcpu_tsc_khz(vcpu);
do_div(ret, USEC_PER_SEC);
return ret;
 }
@@ -1028,7 +1028,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
s64 sdiff;
 
raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
-   offset = data - native_read_tsc();
+   offset = kvm_x86_ops->compute_tsc_offset(vcpu, data);
ns = get_kernel_ns();
elapsed = ns - kvm->arch.last_tsc_nsec;
sdiff = data - kvm->arch.last_tsc_write;
@@ -1044,13 +1044,13 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, u64 data)
 * In that case, for a reliable TSC, we can match TSC offsets,
 * or make a best guess using elapsed value.
 */
-   if (sdiff < nsec_to_cycles(5ULL * NSEC_PER_SEC) &&
+   if (sdiff < nsec_to_cycles(vcpu, 5ULL * NSEC_PER_SEC) &&
elapsed < 5ULL * NSEC_PER_SEC) {
if (!check_tsc_unstable()) {
offset = kvm->arch.last_tsc_offset;
pr_debug("kvm: matched tsc offset for %llu\n", data);
} else {
-   u64 delta = nsec_to_cycles(elapsed);
+   u64 delta = nsec_to_cycles(vcpu, elapsed);
offset += delta;
pr_debug("kvm: adjusted tsc offset by %llu\n", delta);
}
-- 
1.7.1
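The reason the offset computation has to move behind a call-back is visible
in the two implementations above: with scaling, the guest observes
guest_tsc = scale(host_tsc) + tsc_offset, so the offset that makes the guest
see target_tsc is offset = target_tsc - scale(host_tsc); without scaling
(VMX, or an SVM ratio of 1.0) this degenerates to the old
target_tsc - native_read_tsc().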




[PATCH 4/6] KVM: X86: Implement call-back to propagate virtual_tsc_khz

2011-03-25 Thread Joerg Roedel
This patch implements a call-back into the architecture code
to allow the propagation of changes to the virtual tsc_khz
of the vcpu.
On SVM it updates the tsc_ratio variable, on VMX it does
nothing.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_host.h |1 +
 arch/x86/kvm/svm.c  |   33 +
 arch/x86/kvm/vmx.c  |   11 +++
 3 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0344b94..9958dd8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -588,6 +588,7 @@ struct kvm_x86_ops {
 
bool (*has_wbinvd_exit)(void);
 
+   void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz);
void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2ce734c..f6d66c2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -896,6 +896,38 @@ static u64 svm_scale_tsc(struct kvm_vcpu *vcpu, u64 tsc)
return _tsc;
 }
 
+static void svm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
+{
+   struct vcpu_svm *svm = to_svm(vcpu);
+   u64 ratio;
+   u64 khz;
+
+   /* TSC scaling supported? */
+   if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR))
+   return;
+
+   /* TSC-Scaling disabled or guest TSC same frequency as host TSC? */
+   if (user_tsc_khz == 0) {
+   vcpu->arch.virtual_tsc_khz = 0;
+   svm->tsc_ratio = TSC_RATIO_DEFAULT;
+   return;
+   }
+
+   khz = user_tsc_khz;
+
+   /* TSC scaling required  - calculate ratio */
+   ratio = khz << 32;
+   do_div(ratio, tsc_khz);
+
+   if (ratio == 0 || ratio & TSC_RATIO_RSVD) {
+   WARN_ONCE(1, "Invalid TSC ratio - virtual-tsc-khz=%u\n",
+   user_tsc_khz);
+   return;
+   }
+   vcpu->arch.virtual_tsc_khz = user_tsc_khz;
+   svm->tsc_ratio = ratio;
+}
+
 static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
struct vcpu_svm *svm = to_svm(vcpu);
@@ -4004,6 +4036,7 @@ static struct kvm_x86_ops svm_x86_ops = {
 
.has_wbinvd_exit = svm_has_wbinvd_exit,
 
+   .set_tsc_khz = svm_set_tsc_khz,
.write_tsc_offset = svm_write_tsc_offset,
.adjust_tsc_offset = svm_adjust_tsc_offset,
 
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1bdb49d..0e5dfc6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1161,6 +1161,16 @@ static u64 guest_read_tsc(void)
 }
 
 /*
+ * Empty call-back. Needs to be implemented when VMX enables the SET_TSC_KHZ
+ * ioctl. In this case the call-back should update internal vmx state to make
+ * the changes effective.
+ */
+static void vmx_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz)
+{
+   /* Nothing to do here */
+}
+
+/*
  * writes 'offset' into guest's timestamp counter offset register
  */
 static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
@@ -4496,6 +4506,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
 
.has_wbinvd_exit = cpu_has_vmx_wbinvd_exit,
 
+   .set_tsc_khz = vmx_set_tsc_khz,
.write_tsc_offset = vmx_write_tsc_offset,
.adjust_tsc_offset = vmx_adjust_tsc_offset,
 
-- 
1.7.1
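As a worked example of the ratio calculation above: for a guest configured
to 2,000,000 kHz on a host whose tsc_khz is 2,600,000, the computation is
ratio = (2000000 << 32) / 2600000 = 0xc4ec4ec4, i.e. roughly 0.769 in 8.32
fixed point, so the guest TSC advances at about 76.9% of the host rate. Any
result with the reserved bits 63..40 set (an integer part of 256 or more)
is rejected by the TSC_RATIO_RSVD check.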




[PATCH 2/6] KVM: X86: Let kvm-clock report the right tsc frequency

2011-03-25 Thread Joerg Roedel
This patch changes the kvm_guest_time_update function to use the
TSC frequency the guest actually has for updating its clock.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_host.h |6 +++---
 arch/x86/kvm/x86.c  |   25 +++--
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35f81b1..0344b94 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -380,7 +380,10 @@ struct kvm_vcpu_arch {
u64 last_kernel_ns;
u64 last_tsc_nsec;
u64 last_tsc_write;
+   u32 virtual_tsc_khz;
bool tsc_catchup;
+   u32  tsc_catchup_mult;
+   s8   tsc_catchup_shift;
 
bool nmi_pending;
bool nmi_injected;
@@ -450,9 +453,6 @@ struct kvm_arch {
u64 last_tsc_nsec;
u64 last_tsc_offset;
u64 last_tsc_write;
-   u32 virtual_tsc_khz;
-   u32 virtual_tsc_mult;
-   s8 virtual_tsc_shift;
 
struct kvm_xen_hvm_config xen_hvm_config;
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b8b16a..1e7af86 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -982,6 +982,14 @@ static inline int kvm_tsc_changes_freq(void)
return ret;
 }
 
+static u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu)
+{
+   if (vcpu->arch.virtual_tsc_khz)
+   return vcpu->arch.virtual_tsc_khz;
+   else
+   return __this_cpu_read(cpu_tsc_khz);
+}
+
 static inline u64 nsec_to_cycles(u64 nsec)
 {
u64 ret;
@@ -995,20 +1003,19 @@ static inline u64 nsec_to_cycles(u64 nsec)
return ret;
 }
 
-static void kvm_arch_set_tsc_khz(struct kvm *kvm, u32 this_tsc_khz)
+static void kvm_init_tsc_catchup(struct kvm_vcpu *vcpu, u32 this_tsc_khz)
 {
/* Compute a scale to convert nanoseconds in TSC cycles */
kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000,
-  &kvm->arch.virtual_tsc_shift,
-  &kvm->arch.virtual_tsc_mult);
-   kvm->arch.virtual_tsc_khz = this_tsc_khz;
+  &vcpu->arch.tsc_catchup_shift,
+  &vcpu->arch.tsc_catchup_mult);
 }
 
 static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
 {
u64 tsc = pvclock_scale_delta(kernel_ns-vcpu->arch.last_tsc_nsec,
- vcpu->kvm->arch.virtual_tsc_mult,
- vcpu->kvm->arch.virtual_tsc_shift);
+ vcpu->arch.tsc_catchup_mult,
+ vcpu->arch.tsc_catchup_shift);
tsc += vcpu->arch.last_tsc_write;
return tsc;
 }
@@ -1075,8 +1082,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
local_irq_save(flags);
kvm_get_msr(v, MSR_IA32_TSC, &tsc_timestamp);
kernel_ns = get_kernel_ns();
-   this_tsc_khz = __this_cpu_read(cpu_tsc_khz);
-
+   this_tsc_khz = vcpu_tsc_khz(v);
if (unlikely(this_tsc_khz == 0)) {
local_irq_restore(flags);
kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
@@ -5955,8 +5961,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
}
vcpu-arch.pio_data = page_address(page);
 
-   if (!kvm->arch.virtual_tsc_khz)
-   kvm_arch_set_tsc_khz(kvm, max_tsc_khz);
+   kvm_init_tsc_catchup(vcpu, max_tsc_khz);
 
r = kvm_mmu_create(vcpu);
if (r < 0)
-- 
1.7.1




[PATCH 3/6] KVM: X86: Make tsc_delta calculation a function of guest tsc

2011-03-25 Thread Joerg Roedel
The calculation of the tsc_delta value to ensure a
forward-going tsc for the guest is a function of the
host-tsc. This works as long as the guest's tsc_khz is equal
to the host's tsc_khz. With tsc-scaling hardware support this
is no longer true and the tsc_delta needs to be calculated
using guest_tsc values.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/x86.c |9 +++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1e7af86..47dd6ed 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2126,8 +2126,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_x86_ops->vcpu_load(vcpu, cpu);
if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) {
/* Make sure TSC doesn't go backwards */
-   s64 tsc_delta = !vcpu->arch.last_host_tsc ? 0 :
-   native_read_tsc() - vcpu->arch.last_host_tsc;
+   s64 tsc_delta;
+   u64 tsc;
+
+   kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc);
+   tsc_delta = !vcpu->arch.last_guest_tsc ? 0 :
+tsc - vcpu->arch.last_guest_tsc;
+
if (tsc_delta < 0)
mark_tsc_unstable("KVM discovered backwards TSC");
if (check_tsc_unstable()) {
-- 
1.7.1




[PATCH 6/6] KVM: X86: Implement userspace interface to set virtual_tsc_khz

2011-03-25 Thread Joerg Roedel
This patch implements two new vm-ioctls to get and set the
virtual_tsc_khz if the machine supports tsc-scaling. Setting
the tsc-frequency is only possible before userspace creates
any vcpu.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 Documentation/kvm/api.txt   |   23 +++
 arch/x86/include/asm/kvm_host.h |7 +++
 arch/x86/kvm/svm.c  |   20 
 arch/x86/kvm/x86.c  |   35 +++
 include/linux/kvm.h |5 +
 5 files changed, 90 insertions(+), 0 deletions(-)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 9bef4e4..1b9eaa7 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -1263,6 +1263,29 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
 };
 
+4.54 KVM_SET_TSC_KHZ
+
+Capability: KVM_CAP_TSC_CONTROL
+Architectures: x86
+Type: vcpu ioctl
+Parameters: virtual tsc_khz
+Returns: 0 on success, -1 on error
+
+Specifies the tsc frequency for the virtual machine. The unit of the
+frequency is KHz.
+
+4.55 KVM_GET_TSC_KHZ
+
+Capability: KVM_CAP_GET_TSC_KHZ
+Architectures: x86
+Type: vcpu ioctl
+Parameters: none
+Returns: virtual tsc-khz on success, negative value on error
+
+Returns the tsc frequency of the guest. The unit of the return value is
+KHz. If the host has unstable tsc this ioctl returns -EIO instead as an
+error.
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7f48528..473a3be 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -632,6 +632,13 @@ u8 kvm_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn);
 
 extern bool tdp_enabled;
 
+/* control of guest tsc rate supported? */
+extern bool kvm_has_tsc_control;
+/* minimum supported tsc_khz for guests */
+extern u32  kvm_min_guest_tsc_khz;
+/* maximum supported tsc_khz for guests */
+extern u32  kvm_max_guest_tsc_khz;
+
 enum emulation_result {
EMULATE_DONE,   /* no further processing */
EMULATE_DO_MMIO,  /* kvm_run filled with mmio request */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 38a4bcc..a5c1b5b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -64,6 +64,8 @@ MODULE_LICENSE("GPL");
 #define DEBUGCTL_RESERVED_BITS (~(0x3fULL))
 
#define TSC_RATIO_RSVD  0xffffff0000000000ULL
+#define TSC_RATIO_MIN  0x0000000000000001ULL
+#define TSC_RATIO_MAX  0x000000ffffffffffULL
 
 static bool erratum_383_found __read_mostly;
 
@@ -197,6 +199,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm);
 static int nested_svm_vmexit(struct vcpu_svm *svm);
 static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
  bool has_error_code, u32 error_code);
+static u64 __scale_tsc(u64 ratio, u64 tsc);
 
 enum {
VMCB_INTERCEPTS, /* Intercept vectors, TSC offset,
@@ -807,6 +810,23 @@ static __init int svm_hardware_setup(void)
if (boot_cpu_has(X86_FEATURE_FXSR_OPT))
kvm_enable_efer_bits(EFER_FFXSR);
 
+   if (boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+   u64 max;
+
+   kvm_has_tsc_control = true;
+
+   /*
+* Make sure the user can only configure tsc_khz values that
+* fit into a signed integer.
+* A min value is not calculated because it will always
+* be 1 on all machines and a value of 0 is used to disable
+* tsc-scaling for the vcpu.
+*/
+   max = min(0x7fffffffULL, __scale_tsc(tsc_khz, TSC_RATIO_MAX));
+
+   kvm_max_guest_tsc_khz = max;
+   }
+
if (nested) {
printk(KERN_INFO "kvm: Nested Virtualization enabled\n");
kvm_enable_efer_bits(EFER_SVME | EFER_LMSLE);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2f0b552..5cc9a44 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -100,6 +100,11 @@ EXPORT_SYMBOL_GPL(kvm_x86_ops);
 int ignore_msrs = 0;
 module_param_named(ignore_msrs, ignore_msrs, bool, S_IRUGO | S_IWUSR);
 
+bool kvm_has_tsc_control;
+EXPORT_SYMBOL_GPL(kvm_has_tsc_control);
+u32  kvm_max_guest_tsc_khz;
+EXPORT_SYMBOL_GPL(kvm_max_guest_tsc_khz);
+
 #define KVM_NR_SHARED_MSRS 16
 
 struct kvm_shared_msrs_global {
@@ -1999,6 +2004,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
case KVM_CAP_ASYNC_PF:
+   case KVM_CAP_GET_TSC_KHZ:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
@@ -2025,6 +2031,9 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_XCRS:
r = cpu_has_xsave;
break;
+   case KVM_CAP_TSC_CONTROL:
+   r = kvm_has_tsc_control;
+   
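
A minimal sketch of how userspace might drive the two new ioctls, assuming
an existing vcpu file descriptor and a kernel that reports
KVM_CAP_TSC_CONTROL (illustrative only, names and error handling trimmed):

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdio.h>

static void set_guest_tsc_khz(int vcpu_fd, unsigned long khz)
{
    /* the frequency is passed directly as the ioctl argument */
    if (ioctl(vcpu_fd, KVM_SET_TSC_KHZ, khz) < 0)
        perror("KVM_SET_TSC_KHZ");

    /* read back the effective frequency; a failed ioctl returns -1
     * with errno set, e.g. EIO on a host with an unstable tsc */
    int cur = ioctl(vcpu_fd, KVM_GET_TSC_KHZ);
    if (cur < 0)
        perror("KVM_GET_TSC_KHZ");
    else
        printf("guest tsc: %d kHz\n", cur);
}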

[PATCH 0/6] TSC scaling support for KVM v3

2011-03-25 Thread Joerg Roedel
Hi,

this is the third round of my patches to support tsc-scaling in KVM. The
changes to v2 address Avi's comments from yesterday. Besides that, the
whole virtual_tsc_khz thing has been moved out of the vm into the vcpu
data structure. The mult and shift parts were renamed to tsc_catchup_*
because this is their actual use (and because the handling of
virtual_tsc_khz has changed so that it made sense to separate them).

Comments and feedback (or merging) appreciated :-)

Regards,

Joerg

Diffstat:

 Documentation/kvm/api.txt|   23 
 arch/x86/include/asm/kvm_host.h  |   16 -
 arch/x86/include/asm/msr-index.h |1 +
 arch/x86/kvm/svm.c   |  117 +-
 arch/x86/kvm/vmx.c   |   17 ++
 arch/x86/kvm/x86.c   |   79 --
 include/linux/kvm.h  |5 ++
 7 files changed, 237 insertions(+), 21 deletions(-)

Shortlog:

Joerg Roedel (6):
  KVM: SVM: Implement infrastructure for TSC_RATE_MSR
  KVM: X86: Let kvm-clock report the right tsc frequency
  KVM: X86: Make tsc_delta calculation a function of guest tsc
  KVM: X86: Implement call-back to propagate virtual_tsc_khz
  KVM: X86: Delegate tsc-offset calculation to architecture code
  KVM: X86: Implement userspace interface to set virtual_tsc_khz




Re: qemu-kvm crash with

2011-03-25 Thread Conor Murphy
Hi,

The content of aiocb

(gdb) print *aiocb
$1 = {common = {pool = 0x9aced0, bs = 0x1270230, cb = 0x45591f <multiwrite_cb>,
 opaque = 0x7f54b0034f60, next = 0x0}, aio_fildes = 16, 
 {aio_iov = 0x7f54b006cd48, aio_ioctl_buf = 0x7f54b006cd48}, 
 aio_niov = 17, 
  aio_nbytes = 65024, ev_signo = 12, aio_offset = 1081344, 
 node = {tqe_next = 0x0, tqe_prev = 0x9f10a0}, 
  aio_type = 2, ret = -115, active = 1, next = 0x7f54b00409f0, 
 async_context_id = 0}

(gdb) print aiocb->aio_iov[0]
$2 = {iov_base = 0x7f54a9f141f8, iov_len = 3592}
(gdb) print aiocb->aio_iov[1]
$3 = {iov_base = 0x7f54a27d5000, iov_len = 4096}
(gdb) print aiocb->aio_iov[2]
$4 = {iov_base = 0x7f54a30d6000, iov_len = 4096}
(gdb) print aiocb->aio_iov[3]
$5 = {iov_base = 0x7f5433a57000, iov_len = 4096}
(gdb) print aiocb->aio_iov[5]
$6 = {iov_base = 0x7f54a2fd9000, iov_len = 4096}
(gdb) print aiocb->aio_iov[6]
$7 = {iov_base = 0x7f54a275a000, iov_len = 4096}
(gdb) print aiocb->aio_iov[7]
$8 = {iov_base = 0x7f54a2fdb000, iov_len = 4096}
(gdb) print aiocb->aio_iov[8]
$9 = {iov_base = 0x7f54ab55c000, iov_len = 4096}
(gdb) print aiocb->aio_iov[9]
$10 = {iov_base = 0x7f543639d000, iov_len = 4096}
(gdb) print aiocb->aio_iov[10]
$11 = {iov_base = 0x7f543115e000, iov_len = 4096}
(gdb) print aiocb->aio_iov[11]
$12 = {iov_base = 0x7f54361df000, iov_len = 4096}
(gdb) print aiocb->aio_iov[12]
$13 = {iov_base = 0x7f54a962, iov_len = 4096}
(gdb) print aiocb->aio_iov[13]
$14 = {iov_base = 0x7f54a23a1000, iov_len = 4096}
(gdb) print aiocb->aio_iov[14]
$15 = {iov_base = 0x7f54ae122000, iov_len = 4096}
(gdb) print aiocb->aio_iov[15]
$16 = {iov_base = 0x7f54312a3000, iov_len = 4096}
(gdb) print aiocb->aio_iov[16]
$17 = {iov_base = 0x7f54a28a4000, iov_len = 503}
(gdb) 

The one thing that seems odd is that the sum of the iov_len values is 65535,
which is greater than aio_nbytes (65024).

Does this mean the code ends up writing past the end of buf?

/Conor
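
For reference, the mismatch Conor describes can be checked mechanically
with a helper along these lines (hypothetical, names invented for
illustration):

#include <sys/uio.h>
#include <stddef.h>

/* total bytes described by an iovec array, for comparison against
 * aiocb->aio_nbytes; the dump above sums to 65535 against an
 * aio_nbytes of 65024, a 511-byte discrepancy */
static size_t iov_total(const struct iovec *iov, int niov)
{
    size_t sum = 0;
    for (int i = 0; i < niov; i++)
        sum += iov[i].iov_len;
    return sum;
}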



[PATCH 01/13] KVM: x86 emulator: add framework for instruction

2011-03-25 Thread Joerg Roedel
From: Avi Kivity a...@redhat.com

When running in guest mode, certain instructions can be intercepted by
hardware.  This also holds for nested guests running on emulated
virtualization hardware, in particular instructions emulated by kvm
itself.

This patch adds a framework for intercepting instructions.  If an
instruction is marked for interception, and if we're running in guest
mode, a callback is called to check whether an intercept is needed or
not.  The callback is called at three points in time: immediately after
beginning execution, after checking privilege exceptions, and after
checking memory exceptions.  This suits the different interception points
defined for different instructions and for the various virtualization
instruction sets.

In addition, a new X86EMUL_INTERCEPT is defined, which any callback or
memory access may define, allowing the more complicated intercepts to be
implemented in existing callbacks.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_emulate.h |   20 
 arch/x86/kvm/emulate.c |   26 ++
 arch/x86/kvm/x86.c |9 +
 3 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 0f52135..4b9efb7 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -14,6 +14,8 @@
#include <asm/desc_defs.h>
 
 struct x86_emulate_ctxt;
+enum x86_intercept;
+enum x86_intercept_stage;
 
 struct x86_exception {
u8 vector;
@@ -62,6 +64,7 @@ struct x86_exception {
 #define X86EMUL_RETRY_INSTR 3 /* retry the instruction for some reason */
 #define X86EMUL_CMPXCHG_FAILED  4 /* cmpxchg did not see expected value */
 #define X86EMUL_IO_NEEDED   5 /* IO is needed to complete emulation */
+#define X86EMUL_INTERCEPTED 6 /* Intercepted by nested VMCB/VMCS */
 
 struct x86_emulate_ops {
/*
@@ -158,6 +161,9 @@ struct x86_emulate_ops {
int (*set_dr)(int dr, unsigned long value, struct kvm_vcpu *vcpu);
int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
+   int (*intercept)(struct x86_emulate_ctxt *ctxt,
+enum x86_intercept intercept,
+enum x86_intercept_stage stage);
 };
 
 /* Type, address-of, and value of an instruction's operand. */
@@ -197,6 +203,7 @@ struct read_cache {
 struct decode_cache {
u8 twobyte;
u8 b;
+   u8 intercept;
u8 lock_prefix;
u8 rep_prefix;
u8 op_bytes;
@@ -238,6 +245,7 @@ struct x86_emulate_ctxt {
/* interruptibility state, as a result of execution of STI or MOV SS */
int interruptibility;
 
+   bool guest_mode; /* guest running a nested guest */
bool perm_ok; /* do not check permissions if true */
bool only_vendor_specific_insn;
 
@@ -259,6 +267,18 @@ struct x86_emulate_ctxt {
 #define X86EMUL_MODE_PROT32   4/* 32-bit protected mode. */
 #define X86EMUL_MODE_PROT64   8/* 64-bit (long) mode.*/
 
+enum x86_intercept_stage {
+   x86_icpt_pre_except,
+   x86_icpt_post_except,
+   x86_icpt_post_memaccess,
+};
+
+enum x86_intercept {
+   x86_intercept_none,
+
+   nr_x86_intercepts
+};
+
 /* Host execution mode. */
 #if defined(CONFIG_X86_32)
 #define X86EMUL_MODE_HOST X86EMUL_MODE_PROT32
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 14c5ad5..8c6af7e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -102,6 +102,7 @@
 
 struct opcode {
u32 flags;
+   u8 intercept;
union {
int (*execute)(struct x86_emulate_ctxt *ctxt);
struct opcode *group;
@@ -2326,10 +2327,13 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
 }
 
 #define D(_y) { .flags = (_y) }
+#define DI(_y, _i) { .flags = (_y), .intercept = x86_intercept_##_i }
#define N    D(0)
 #define G(_f, _g) { .flags = ((_f) | Group), .u.group = (_g) }
 #define GD(_f, _g) { .flags = ((_f) | Group | GroupDual), .u.gdual = (_g) }
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
+#define II(_f, _e, _i) \
+   { .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i }
 
 #define D2bv(_f)  D((_f) | ByteOp), D(_f)
 #define I2bv(_f, _e)  I((_f) | ByteOp, _e), I(_f, _e)
@@ -2745,6 +2749,7 @@ done_prefixes:
}
 
c->execute = opcode.u.execute;
+   c->intercept = opcode.intercept;
 
/* Unrecognised? */
if (c->d == 0 || (c->d & Undefined))
@@ -2979,12 +2984,26 @@ x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
goto done;
}
 
+   if (unlikely(ctxt->guest_mode) && c->intercept) {
+   rc = ops->intercept(ctxt, c->intercept,
+   x86_icpt_pre_except);
+   if (rc != 
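
To make the three stages concrete, the control flow this patch introduces
in x86_emulate_insn() looks roughly like the following (a simplified
sketch, not the literal kernel code):

/* 1) right after decode, before any fault checks */
rc = ops->intercept(ctxt, c->intercept, x86_icpt_pre_except);

/* ... privilege checks (CPL etc.) may raise exceptions ... */

/* 2) after privilege exceptions have been checked */
rc = ops->intercept(ctxt, c->intercept, x86_icpt_post_except);

/* ... memory operands are fetched, possibly faulting ... */

/* 3) after memory-access exceptions have been checked */
rc = ops->intercept(ctxt, c->intercept, x86_icpt_post_memaccess);

At each checkpoint a return value other than X86EMUL_CONTINUE aborts
emulation, and each call is made only when ctxt->guest_mode is set and the
opcode carries an intercept tag.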

[PATCH 07/13] KVM: SVM: Add intercept checks for descriptor table accesses

2011-03-25 Thread Joerg Roedel
This patch adds intercept checks to the KVM instruction
emulator for the 8 instructions that access the
descriptor table addresses.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/emulate.c |   13 +++--
 arch/x86/kvm/svm.c |   13 +
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0719954..505348f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2370,8 +2370,17 @@ static struct opcode group5[] = {
D(SrcMem | ModRM | Stack), N,
 };
 
+static struct opcode group6[] = {
+   DI(ModRM,sldt),
+   DI(ModRM,str),
+   DI(ModRM | Priv, lldt),
+   DI(ModRM | Priv, ltr),
+   N, N, N, N,
+};
+
 static struct group_dual group7 = { {
-   N, N, DI(ModRM | SrcMem | Priv, lgdt), DI(ModRM | SrcMem | Priv, lidt),
+   DI(ModRM | DstMem | Priv, sgdt), DI(ModRM | DstMem | Priv, sidt),
+   DI(ModRM | SrcMem | Priv, lgdt), DI(ModRM | SrcMem | Priv, lidt),
DI(SrcNone | ModRM | DstMem | Mov, smsw), N,
DI(SrcMem16 | ModRM | Mov | Priv, lmsw),
DI(SrcMem | ModRM | ByteOp | Priv | NoAccess, invlpg),
@@ -2502,7 +2511,7 @@ static struct opcode opcode_table[256] = {
 
 static struct opcode twobyte_table[256] = {
/* 0x00 - 0x0F */
-   N, GD(0, group7), N, N,
+   G(0, group6), GD(0, group7), N, N,
N, D(ImplicitOps | VendorSpecific), DI(ImplicitOps | Priv, clts), N,
DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
N, D(ImplicitOps | ModRM), N, N,
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 25d7460..faa959e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3874,6 +3874,10 @@ static void svm_fpu_deactivate(struct kvm_vcpu *vcpu)
 #define POST_EX(exit) { .exit_code = (exit), \
.stage = x86_icpt_post_except, \
.valid = true }
+#define POST_MEM(exit) { .exit_code = (exit), \
+.stage = x86_icpt_post_memaccess, \
+.valid = true }
+
 
 static struct __x86_intercept {
u32 exit_code;
@@ -3887,9 +3891,18 @@ static struct __x86_intercept {
[x86_intercept_smsw]= POST_EX(SVM_EXIT_READ_CR0),
[x86_intercept_dr_read] = POST_EX(SVM_EXIT_READ_DR0),
[x86_intercept_dr_write]= POST_EX(SVM_EXIT_WRITE_DR0),
+   [x86_intercept_sldt]= POST_MEM(SVM_EXIT_LDTR_READ),
+   [x86_intercept_str] = POST_MEM(SVM_EXIT_TR_READ),
+   [x86_intercept_lldt]= POST_MEM(SVM_EXIT_LDTR_WRITE),
+   [x86_intercept_ltr] = POST_MEM(SVM_EXIT_TR_WRITE),
+   [x86_intercept_sgdt]= POST_MEM(SVM_EXIT_GDTR_READ),
+   [x86_intercept_sidt]= POST_MEM(SVM_EXIT_IDTR_READ),
+   [x86_intercept_lgdt]= POST_MEM(SVM_EXIT_GDTR_WRITE),
+   [x86_intercept_lidt]= POST_MEM(SVM_EXIT_IDTR_WRITE),
 };
 
 #undef POST_EX
+#undef POST_MEM
 
 static int svm_check_intercept(struct kvm_vcpu *vcpu,
   struct x86_instruction_info *info,
-- 
1.7.1




[PATCH 09/13] KVM: SVM: Add intercept checks for remaining group7 instructions

2011-03-25 Thread Joerg Roedel
This patch implements the emulator intercept checks for the
RDTSCP, MONITOR, and MWAIT instructions.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/emulate.c |   15 +--
 arch/x86/kvm/svm.c |3 +++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dc53806..0aaba1e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2346,6 +2346,12 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
D2bv(((_f) & ~Lock) | DstAcc | SrcImm)
 
 
+static struct opcode group7_rm1[] = {
+   DI(SrcNone | ModRM | Priv, monitor),
+   DI(SrcNone | ModRM | Priv, mwait),
+   N, N, N, N, N, N,
+};
+
 static struct opcode group7_rm3[] = {
DI(SrcNone | ModRM | Priv, vmrun),
DI(SrcNone | ModRM | Priv, vmmcall),
@@ -2357,6 +2363,11 @@ static struct opcode group7_rm3[] = {
DI(SrcNone | ModRM | Priv, invlpga),
 };
 
+static struct opcode group7_rm7[] = {
+   N,
+   DI(SrcNone | ModRM, rdtscp),
+   N, N, N, N, N, N,
+};
 static struct opcode group1[] = {
X7(D(Lock)), N
 };
@@ -2399,10 +2410,10 @@ static struct group_dual group7 = { {
DI(SrcMem16 | ModRM | Mov | Priv, lmsw),
DI(SrcMem | ModRM | ByteOp | Priv | NoAccess, invlpg),
 }, {
-   D(SrcNone | ModRM | Priv | VendorSpecific), N,
+   D(SrcNone | ModRM | Priv | VendorSpecific), EXT(0, group7_rm1),
N, EXT(0, group7_rm3),
DI(SrcNone | ModRM | DstMem | Mov, smsw), N,
-   DI(SrcMem16 | ModRM | Mov | Priv, lmsw), N,
+   DI(SrcMem16 | ModRM | Mov | Priv, lmsw), EXT(0, group7_rm7),
 } };
 
 static struct opcode group8[] = {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dded390..958697e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3907,6 +3907,9 @@ static struct __x86_intercept {
[x86_intercept_clgi]= POST_EX(SVM_EXIT_CLGI),
[x86_intercept_skinit]  = POST_EX(SVM_EXIT_SKINIT),
[x86_intercept_invlpga] = POST_EX(SVM_EXIT_INVLPGA),
+   [x86_intercept_rdtscp]  = POST_EX(SVM_EXIT_RDTSCP),
+   [x86_intercept_monitor] = POST_MEM(SVM_EXIT_MONITOR),
+   [x86_intercept_mwait]   = POST_EX(SVM_EXIT_MWAIT),
 };
 
 #undef POST_EX
-- 
1.7.1
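As a concrete decode example for the table plumbing above: RDTSCP encodes
as 0f 01 f9, so the ModRM byte 0xf9 selects the register half of the
group_dual (mod == 3), the group7_rm7 extension (reg == 7) and the
DI(SrcNone | ModRM, rdtscp) slot (rm == 1). A stand-alone sketch of the
field extraction:

#include <stdio.h>

int main(void)
{
    unsigned char modrm = 0xf9;        /* RDTSCP: 0f 01 f9 */

    unsigned mod = (modrm >> 6) & 3;   /* 3 -> register form */
    unsigned reg = (modrm >> 3) & 7;   /* 7 -> group7_rm7 */
    unsigned rm  = modrm & 7;          /* 1 -> rdtscp entry */

    printf("mod=%u reg=%u rm=%u\n", mod, reg, rm);
    return 0;
}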




[PATCH 11/13] KVM: SVM: Add intercept checks for one-byte instructions

2011-03-25 Thread Joerg Roedel
This patch adds intercept checks for emulated one-byte
instructions to the KVM instruction emulation path.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/emulate.c |4 ++--
 arch/x86/kvm/svm.c |   14 ++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8947643..4c0939d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2484,7 +2484,7 @@ static struct opcode opcode_table[256] = {
D(DstMem | SrcNone | ModRM | Mov), D(ModRM | SrcMem | NoAccess | DstReg),
D(ImplicitOps | SrcMem16 | ModRM), G(0, group1A),
/* 0x90 - 0x97 */
-   X8(D(SrcAcc | DstReg)),
+   DI(SrcAcc | DstReg, pause), X7(D(SrcAcc | DstReg)),
/* 0x98 - 0x9F */
D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
I(SrcImmFAddr | No64, em_call_far), N,
@@ -2526,7 +2526,7 @@ static struct opcode opcode_table[256] = {
D(SrcImmFAddr | No64), D(SrcImmByte | ImplicitOps),
D2bv(SrcNone | DstAcc), D2bv(SrcAcc | ImplicitOps),
/* 0xF0 - 0xF7 */
-   N, N, N, N,
+   N, DI(ImplicitOps, icebp), N, N,
DI(ImplicitOps | Priv, hlt), D(ImplicitOps),
G(ByteOp, group3), G(0, group3),
/* 0xF8 - 0xFF */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c2e90bb..847a3f9 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3922,6 +3922,13 @@ static struct __x86_intercept {
[x86_intercept_rdpmc]   = POST_EX(SVM_EXIT_RDPMC),
[x86_intercept_cpuid]   = PRE_EX(SVM_EXIT_CPUID),
[x86_intercept_rsm] = PRE_EX(SVM_EXIT_RSM),
+   [x86_intercept_pause]   = PRE_EX(SVM_EXIT_PAUSE),
+   [x86_intercept_pushf]   = PRE_EX(SVM_EXIT_PUSHF),
+   [x86_intercept_popf]= PRE_EX(SVM_EXIT_POPF),
+   [x86_intercept_intn]= PRE_EX(SVM_EXIT_SWINT),
+   [x86_intercept_iret]= PRE_EX(SVM_EXIT_IRET),
+   [x86_intercept_icebp]   = PRE_EX(SVM_EXIT_ICEBP),
+   [x86_intercept_hlt] = POST_EX(SVM_EXIT_HLT),
 };
 
 #undef PRE_EX
@@ -3990,6 +3997,13 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
else
vmcb-control.exit_info_1 = 0;
break;
+   case SVM_EXIT_PAUSE:
+   /*
+* We get this for NOP only, but pause
+* is rep nop, so check the rep prefix here
+*/
+   if (info->rep_prefix != REPE_PREFIX)
+   goto out;
default:
break;
}
-- 
1.7.1




[PATCH 04/13] KVM: X86: Add x86 callback for intercept check

2011-03-25 Thread Joerg Roedel
This patch adds a callback into kvm_x86_ops so that svm and
vmx code can do intercept checks on emulated instructions.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_host.h |   21 +
 arch/x86/kvm/svm.c  |9 +
 arch/x86/kvm/x86.c  |   20 +++-
 3 files changed, 49 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 35f81b1..7544964 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -504,6 +504,22 @@ struct kvm_vcpu_stat {
u32 nmi_injections;
 };
 
+/*
+ * This struct is used to carry enough information from the instruction
+ * decoder to main KVM so that a decision can be made whether the
+ * instruction needs to be intercepted or not.
+ */
+struct x86_instruction_info {
+   u8  intercept;  /* which intercept  */
+   u8  rep_prefix; /* rep prefix?  */
+   u8  modrm;  /* index of register used   */
+   u64 src_val;/* value of source operand  */
+   u8  src_bytes;  /* size of source operand   */
+   u8  dst_bytes;  /* size of destination operand  */
+   u8  ad_bytes;   /* size of src/dst address  */
+   u64 next_rip;   /* rip following the instruction*/
+};
+
 struct kvm_x86_ops {
int (*cpu_has_kvm_support)(void);  /* __init */
int (*disabled_by_bios)(void); /* __init */
@@ -591,6 +607,11 @@ struct kvm_x86_ops {
void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 
void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2);
+
+   int (*check_intercept)(struct kvm_vcpu *vcpu,
+  struct x86_instruction_info *info,
+  enum x86_intercept_stage stage);
+
const struct trace_print_flags *exit_reasons_str;
 };
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2a19322..b36df64 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3871,6 +3871,13 @@ static void svm_fpu_deactivate(struct kvm_vcpu *vcpu)
update_cr0_intercept(svm);
 }
 
+static int svm_check_intercept(struct kvm_vcpu *vcpu,
+  struct x86_instruction_info *info,
+  enum x86_intercept_stage stage)
+{
+   return X86EMUL_CONTINUE;
+}
+
 static struct kvm_x86_ops svm_x86_ops = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -3956,6 +3963,8 @@ static struct kvm_x86_ops svm_x86_ops = {
.adjust_tsc_offset = svm_adjust_tsc_offset,
 
.set_tdp_cr3 = set_tdp_cr3,
+
+   .check_intercept = svm_check_intercept,
 };
 
 static int __init svm_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 90a41aa..bf72ec6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4245,7 +4245,25 @@ static int emulator_intercept(struct x86_emulate_ctxt *ctxt,
  enum x86_intercept intercept,
  enum x86_intercept_stage stage)
 {
-   return X86EMUL_CONTINUE;
+   struct x86_instruction_info info = {
+   .intercept  = intercept,
+   .rep_prefix = ctxt->decode.rep_prefix,
+   .modrm  = ctxt->decode.modrm,
+   .src_val= ctxt->decode.src.val64,
+   .src_bytes  = ctxt->decode.src.bytes,
+   .dst_bytes  = ctxt->decode.dst.bytes,
+   .ad_bytes   = ctxt->decode.ad_bytes,
+   .next_rip   = ctxt->eip,
+   };
+
+   /*
+* The callback only needs to be implemented if the architecture
+* supports emulated guest-mode. This BUG_ON reminds the
+* programmer that this callback needs to be implemented.
+*/
+   BUG_ON(kvm_x86_ops->check_intercept == NULL);
+
+   return kvm_x86_ops->check_intercept(ctxt->vcpu, info, stage);
 }
 
 static struct x86_emulate_ops emulate_ops = {
-- 
1.7.1




[PATCH 02/13] KVM: x86 emulator: add SVM intercepts

2011-03-25 Thread Joerg Roedel
From: Avi Kivity a...@redhat.com

Add intercept codes for instructions defined by SVM as
interceptable.

Signed-off-by: Avi Kivity a...@redhat.com
Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_emulate.h |   35 +++
 arch/x86/kvm/emulate.c |   24 +---
 2 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 4b9efb7..277f189 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -275,6 +275,41 @@ enum x86_intercept_stage {
 
 enum x86_intercept {
x86_intercept_none,
+   x86_intercept_lmsw,
+   x86_intercept_smsw,
+   x86_intercept_lidt,
+   x86_intercept_sidt,
+   x86_intercept_lgdt,
+   x86_intercept_sgdt,
+   x86_intercept_lldt,
+   x86_intercept_sldt,
+   x86_intercept_ltr,
+   x86_intercept_str,
+   x86_intercept_rdtsc,
+   x86_intercept_rdpmc,
+   x86_intercept_pushf,
+   x86_intercept_popf,
+   x86_intercept_cpuid,
+   x86_intercept_rsm,
+   x86_intercept_iret,
+   x86_intercept_intn,
+   x86_intercept_invd,
+   x86_intercept_pause,
+   x86_intercept_hlt,
+   x86_intercept_invlpg,
+   x86_intercept_invlpga,
+   x86_intercept_vmrun,
+   x86_intercept_vmload,
+   x86_intercept_vmsave,
+   x86_intercept_vmmcall,
+   x86_intercept_stgi,
+   x86_intercept_clgi,
+   x86_intercept_skinit,
+   x86_intercept_rdtscp,
+   x86_intercept_icebp,
+   x86_intercept_wbinvd,
+   x86_intercept_monitor,
+   x86_intercept_mwait,
 
nr_x86_intercepts
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8c6af7e..cf5f396 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2371,15 +2371,15 @@ static struct opcode group5[] = {
 };
 
 static struct group_dual group7 = { {
-   N, N, D(ModRM | SrcMem | Priv), D(ModRM | SrcMem | Priv),
-   D(SrcNone | ModRM | DstMem | Mov), N,
-   D(SrcMem16 | ModRM | Mov | Priv),
-   D(SrcMem | ModRM | ByteOp | Priv | NoAccess),
+   N, N, DI(ModRM | SrcMem | Priv, lgdt), DI(ModRM | SrcMem | Priv, lidt),
+   DI(SrcNone | ModRM | DstMem | Mov, smsw), N,
+   DI(SrcMem16 | ModRM | Mov | Priv, lmsw),
+   DI(SrcMem | ModRM | ByteOp | Priv | NoAccess, invlpg),
 }, {
D(SrcNone | ModRM | Priv | VendorSpecific), N,
N, D(SrcNone | ModRM | Priv | VendorSpecific),
-   D(SrcNone | ModRM | DstMem | Mov), N,
-   D(SrcMem16 | ModRM | Mov | Priv), N,
+   DI(SrcNone | ModRM | DstMem | Mov, smsw), N,
+   DI(SrcMem16 | ModRM | Mov | Priv, lmsw), N,
 } };
 
 static struct opcode group8[] = {
@@ -2454,7 +2454,7 @@ static struct opcode opcode_table[256] = {
/* 0x98 - 0x9F */
D(DstAcc | SrcNone), I(ImplicitOps | SrcAcc, em_cwd),
I(SrcImmFAddr | No64, em_call_far), N,
-   D(ImplicitOps | Stack), D(ImplicitOps | Stack), N, N,
+   DI(ImplicitOps | Stack, pushf), DI(ImplicitOps | Stack, popf), N, N,
/* 0xA0 - 0xA7 */
I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov),
I2bv(DstMem | SrcAcc | Mov | MemAbs, em_mov),
@@ -2477,7 +2477,8 @@ static struct opcode opcode_table[256] = {
G(ByteOp, group11), G(0, group11),
/* 0xC8 - 0xCF */
N, N, N, D(ImplicitOps | Stack),
-   D(ImplicitOps), D(SrcImmByte), D(ImplicitOps | No64), D(ImplicitOps),
+   D(ImplicitOps), DI(SrcImmByte, intn),
+   D(ImplicitOps | No64), DI(ImplicitOps, iret),
/* 0xD0 - 0xD7 */
D2bv(DstMem | SrcOne | ModRM), D2bv(DstMem | ModRM),
N, N, N, N,
@@ -2492,7 +2493,8 @@ static struct opcode opcode_table[256] = {
D2bv(SrcNone | DstAcc), D2bv(SrcAcc | ImplicitOps),
/* 0xF0 - 0xF7 */
N, N, N, N,
-   D(ImplicitOps | Priv), D(ImplicitOps), G(ByteOp, group3), G(0, group3),
+   DI(ImplicitOps | Priv, hlt), D(ImplicitOps),
+   G(ByteOp, group3), G(0, group3),
/* 0xF8 - 0xFF */
D(ImplicitOps), D(ImplicitOps), D(ImplicitOps), D(ImplicitOps),
D(ImplicitOps), D(ImplicitOps), G(0, group4), G(0, group5),
@@ -2502,7 +2504,7 @@ static struct opcode twobyte_table[256] = {
/* 0x00 - 0x0F */
N, GD(0, group7), N, N,
N, D(ImplicitOps | VendorSpecific), D(ImplicitOps | Priv), N,
-   D(ImplicitOps | Priv), D(ImplicitOps | Priv), N, N,
+   DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
N, D(ImplicitOps | ModRM), N, N,
/* 0x10 - 0x1F */
N, N, N, N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N,
@@ -2512,7 +2514,7 @@ static struct opcode twobyte_table[256] = {
N, N, N, N,
N, N, N, N, N, N, N, N,
/* 0x30 - 0x3F */
-   D(ImplicitOps | Priv), I(ImplicitOps, em_rdtsc),
+   D(ImplicitOps | Priv), II(ImplicitOps, em_rdtsc, rdtsc),

[PATCH 05/13] KVM: SVM: Add intercept check for emulated cr accesses

2011-03-25 Thread Joerg Roedel
This patch adds all necessary intercept checks for
instructions that access the crX registers.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_emulate.h |3 +
 arch/x86/kvm/emulate.c |8 ++-
 arch/x86/kvm/svm.c |   80 +++-
 3 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 7960eeb..c1489e1 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -275,6 +275,9 @@ enum x86_intercept_stage {
 
 enum x86_intercept {
x86_intercept_none,
+   x86_intercept_cr_read,
+   x86_intercept_cr_write,
+   x86_intercept_clts,
x86_intercept_lmsw,
x86_intercept_smsw,
x86_intercept_lidt,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 078acc4..384cfa2 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2503,14 +2503,16 @@ static struct opcode opcode_table[256] = {
 static struct opcode twobyte_table[256] = {
/* 0x00 - 0x0F */
N, GD(0, group7), N, N,
-   N, D(ImplicitOps | VendorSpecific), D(ImplicitOps | Priv), N,
+   N, D(ImplicitOps | VendorSpecific), DI(ImplicitOps | Priv, clts), N,
DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
N, D(ImplicitOps | ModRM), N, N,
/* 0x10 - 0x1F */
N, N, N, N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N,
/* 0x20 - 0x2F */
-   D(ModRM | DstMem | Priv | Op3264), D(ModRM | DstMem | Priv | Op3264),
-   D(ModRM | SrcMem | Priv | Op3264), D(ModRM | SrcMem | Priv | Op3264),
+   DI(ModRM | DstMem | Priv | Op3264, cr_read),
+   D(ModRM | DstMem | Priv | Op3264),
+   DI(ModRM | SrcMem | Priv | Op3264, cr_write),
+   D(ModRM | SrcMem | Priv | Op3264),
N, N, N, N,
N, N, N, N, N, N, N, N,
/* 0x30 - 0x3F */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index b36df64..3b6992e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3871,11 +3871,89 @@ static void svm_fpu_deactivate(struct kvm_vcpu *vcpu)
update_cr0_intercept(svm);
 }
 
+#define POST_EX(exit) { .exit_code = (exit), \
+   .stage = x86_icpt_post_except, \
+   .valid = true }
+
+static struct __x86_intercept {
+   u32 exit_code;
+   enum x86_intercept_stage stage;
+   bool valid;
+} x86_intercept_map[] = {
+   [x86_intercept_cr_read] = POST_EX(SVM_EXIT_READ_CR0),
+   [x86_intercept_cr_write]= POST_EX(SVM_EXIT_WRITE_CR0),
+   [x86_intercept_clts]= POST_EX(SVM_EXIT_WRITE_CR0),
+   [x86_intercept_lmsw]= POST_EX(SVM_EXIT_WRITE_CR0),
+   [x86_intercept_smsw]= POST_EX(SVM_EXIT_READ_CR0),
+};
+
+#undef POST_EX
+
 static int svm_check_intercept(struct kvm_vcpu *vcpu,
   struct x86_instruction_info *info,
   enum x86_intercept_stage stage)
 {
-   return X86EMUL_CONTINUE;
+   struct vcpu_svm *svm = to_svm(vcpu);
+   int vmexit, ret = X86EMUL_CONTINUE;
+   struct __x86_intercept icpt_info;
+   struct vmcb *vmcb = svm->vmcb;
+   int reg;
+
+   if (info->intercept >= ARRAY_SIZE(x86_intercept_map))
+   goto out;
+
+   icpt_info = x86_intercept_map[info->intercept];
+
+   if (!icpt_info.valid || stage != icpt_info.stage)
+   goto out;
+
+   reg = (info->modrm >> 3) & 7;
+
+   switch (icpt_info.exit_code) {
+   case SVM_EXIT_READ_CR0:
+   if (info->intercept == x86_intercept_cr_read)
+   icpt_info.exit_code += reg;
+   case SVM_EXIT_WRITE_CR0: {
+   unsigned long cr0, val;
+   u64 intercept;
+
+   if (info->intercept == x86_intercept_cr_write)
+   icpt_info.exit_code += reg;
+
+   if (icpt_info.exit_code != SVM_EXIT_WRITE_CR0)
+   break;
+
+   intercept = svm->nested.intercept;
+
+   if (!(intercept & (1ULL << INTERCEPT_SELECTIVE_CR0)))
+   break;
+
+   cr0 = vcpu->arch.cr0 & ~SVM_CR0_SELECTIVE_MASK;
+   val = info->src_val & ~SVM_CR0_SELECTIVE_MASK;
+
+   if (info->intercept == x86_intercept_lmsw) {
+   cr0 &= 0xfUL;
+   val &= 0xfUL;
+   }
+
+   if (cr0 ^ val)
+   icpt_info.exit_code = SVM_EXIT_CR0_SEL_WRITE;
+
+   break;
+   }
+   default:
+   break;
+   }
+
+   vmcb->control.next_rip  = info->next_rip;
+   vmcb->control.exit_code = icpt_info.exit_code;
+   vmexit = nested_svm_exit_handled(svm);
+
+   ret = (vmexit == NESTED_EXIT_DONE) ? X86EMUL_INTERCEPTED
+  : X86EMUL_CONTINUE;
+
+out:
+   
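
To illustrate the exit-code arithmetic above: the SVM exit codes for CR
accesses are consecutive (SVM_EXIT_READ_CR0 + n is the exit code for CRn),
so for, say, a 'mov rax, cr4' the decoder hands over a ModRM byte whose reg
field is 4, and icpt_info.exit_code becomes SVM_EXIT_READ_CR0 + 4, i.e.
SVM_EXIT_READ_CR4, before nested_svm_exit_handled() consults the nested
intercept bitmap.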

[PATCH 06/13] KVM: SVM: Add intercept check for accessing dr registers

2011-03-25 Thread Joerg Roedel
This patch adds the intercept checks for instructions
accessing the debug registers.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_emulate.h |2 ++
 arch/x86/kvm/emulate.c |4 ++--
 arch/x86/kvm/svm.c |6 ++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index c1489e1..db744c9 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -280,6 +280,8 @@ enum x86_intercept {
x86_intercept_clts,
x86_intercept_lmsw,
x86_intercept_smsw,
+   x86_intercept_dr_read,
+   x86_intercept_dr_write,
x86_intercept_lidt,
x86_intercept_sidt,
x86_intercept_lgdt,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 384cfa2..0719954 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2510,9 +2510,9 @@ static struct opcode twobyte_table[256] = {
N, N, N, N, N, N, N, N, D(ImplicitOps | ModRM), N, N, N, N, N, N, N,
/* 0x20 - 0x2F */
DI(ModRM | DstMem | Priv | Op3264, cr_read),
-   D(ModRM | DstMem | Priv | Op3264),
+   DI(ModRM | DstMem | Priv | Op3264, dr_read),
DI(ModRM | SrcMem | Priv | Op3264, cr_write),
-   D(ModRM | SrcMem | Priv | Op3264),
+   DI(ModRM | SrcMem | Priv | Op3264, dr_write),
N, N, N, N,
N, N, N, N, N, N, N, N,
/* 0x30 - 0x3F */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3b6992e..25d7460 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3885,6 +3885,8 @@ static struct __x86_intercept {
[x86_intercept_clts]= POST_EX(SVM_EXIT_WRITE_CR0),
[x86_intercept_lmsw]= POST_EX(SVM_EXIT_WRITE_CR0),
[x86_intercept_smsw]= POST_EX(SVM_EXIT_READ_CR0),
+   [x86_intercept_dr_read] = POST_EX(SVM_EXIT_READ_DR0),
+   [x86_intercept_dr_write]= POST_EX(SVM_EXIT_WRITE_DR0),
 };
 
 #undef POST_EX
@@ -3941,6 +3943,10 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
 
break;
}
+   case SVM_EXIT_READ_DR0:
+   case SVM_EXIT_WRITE_DR0:
+   icpt_info.exit_code += reg;
+   break;
default:
break;
}
-- 
1.7.1




[PATCH 0/13] KVM: Make the instruction emulator aware of Nested Virtualization v2

2011-03-25 Thread Joerg Roedel
Hi,

this is version 2 of the patch-set to make the KVM instruction emulator
aware of intercepted instructions. Noting the differences to v1 does not
make a lot of sense since this is basically a re-implementation in which
almost everything changed :-)
The re-write was done on the basis of Avi's patches he sent after the
last discussion. With these changes the implementation in the SVM code
got a lot smaller and more generic (and easier to extend). The big
switch is now only necessary for handling special cases.

Comments and feedback are appreciated.

Regards,

Joerg

Diffstat:

 arch/x86/include/asm/kvm_emulate.h |   67 +
 arch/x86/include/asm/kvm_host.h|   21 +++
 arch/x86/kvm/emulate.c |  128 ++
 arch/x86/kvm/svm.c |  264 +---
 arch/x86/kvm/x86.c |   30 
 5 files changed, 432 insertions(+), 78 deletions(-)

Shortlog:

Avi Kivity (2):
  KVM: x86 emulator: add framework for instruction
  KVM: x86 emulator: add SVM intercepts

Joerg Roedel (11):
  KVM: X86: Don't write-back cpu-state on X86EMUL_INTERCEPTED
  KVM: X86: Add x86 callback for intercept check
  KVM: SVM: Add intercept check for emulated cr accesses
  KVM: SVM: Add intercept check for accessing dr registers
  KVM: SVM: Add intercept checks for descriptor table accesses
  KVM: SVM: Add intercept checks for SVM instructions
  KVM: SVM: Add intercept checks for remaining group7 instructions
  KVM: SVM: Add intercept checks for remaining twobyte instructions
  KVM: SVM: Add intercept checks for one-byte instructions
  KVM: SVM: Add checks for IO instructions
  KVM: SVM: Remove nested sel_cr0_write handling code




[PATCH 12/13] KVM: SVM: Add checks for IO instructions

2011-03-25 Thread Joerg Roedel
This patch adds code to check for IOIO intercepts on
instructions decoded by the KVM instruction emulator.
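
For reference, the exit_info_1 word assembled in the svm.c hunk below
follows the SVM IOIO intercept format; the layout implied by the
SVM_IOIO_* masks and shifts used in the diff (bit positions as I read
the AMD manual):

/*
 * exit_info_1 for SVM_EXIT_IOIO:
 *   bit  0      direction, 1 = IN              (SVM_IOIO_TYPE_MASK)
 *   bit  2      string instruction (INS/OUTS)  (SVM_IOIO_STR_MASK)
 *   bit  3      REP prefix present             (SVM_IOIO_REP_MASK)
 *   bits 4-6    operand size, one-hot: 1/2/4 bytes << SVM_IOIO_SIZE_SHIFT
 *   bits 7-9    address size, one-hot: 2/4/8 bytes << (SVM_IOIO_ASIZE_SHIFT - 1)
 *   bits 16-31  port number, taken from DX
 */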

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_emulate.h |4 
 arch/x86/kvm/emulate.c |   10 ++
 arch/x86/kvm/svm.c |   36 
 3 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 41c0120..0b2e2de 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -317,6 +317,10 @@ enum x86_intercept {
x86_intercept_mwait,
x86_intercept_rdmsr,
x86_intercept_wrmsr,
+   x86_intercept_in,
+   x86_intercept_ins,
+   x86_intercept_out,
+   x86_intercept_outs,
 
nr_x86_intercepts
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 4c0939d..879ce78 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2339,6 +2339,7 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
{ .flags = (_f), .u.execute = (_e), .intercept = x86_intercept_##_i }
 
 #define D2bv(_f)  D((_f) | ByteOp), D(_f)
+#define D2bvI(_f, _i) DI((_f) | ByteOp, _i), DI((_f), _i)
 #define I2bv(_f, _e)  I((_f) | ByteOp, _e), I(_f, _e)
 
 #define D6ALU(_f) D2bv((_f) | DstMem | SrcReg | ModRM),
\
@@ -2468,8 +2469,8 @@ static struct opcode opcode_table[256] = {
I(DstReg | SrcMem | ModRM | Src2Imm, em_imul_3op),
I(SrcImmByte | Mov | Stack, em_push),
I(DstReg | SrcMem | ModRM | Src2ImmByte, em_imul_3op),
-   D2bv(DstDI | Mov | String), /* insb, insw/insd */
-   D2bv(SrcSI | ImplicitOps | String), /* outsb, outsw/outsd */
+   D2bvI(DstDI | Mov | String, ins), /* insb, insw/insd */
+   D2bvI(SrcSI | ImplicitOps | String, outs), /* outsb, outsw/outsd */
/* 0x70 - 0x7F */
X16(D(SrcImmByte)),
/* 0x80 - 0x87 */
@@ -2520,11 +2521,11 @@ static struct opcode opcode_table[256] = {
N, N, N, N, N, N, N, N,
/* 0xE0 - 0xE7 */
X4(D(SrcImmByte)),
-   D2bv(SrcImmUByte | DstAcc), D2bv(SrcAcc | DstImmUByte),
+   D2bvI(SrcImmUByte | DstAcc, in), D2bvI(SrcAcc | DstImmUByte, out),
/* 0xE8 - 0xEF */
D(SrcImm | Stack), D(SrcImm | ImplicitOps),
D(SrcImmFAddr | No64), D(SrcImmByte | ImplicitOps),
-   D2bv(SrcNone | DstAcc), D2bv(SrcAcc | ImplicitOps),
+   D2bvI(SrcNone | DstAcc, in), D2bvI(SrcAcc | ImplicitOps, out),
/* 0xF0 - 0xF7 */
N, DI(ImplicitOps, icebp), N, N,
DI(ImplicitOps | Priv, hlt), D(ImplicitOps),
@@ -2609,6 +2610,7 @@ static struct opcode twobyte_table[256] = {
 #undef EXT
 
 #undef D2bv
+#undef D2bvI
 #undef I2bv
 #undef D6ALU
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 847a3f9..1672e3c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3929,6 +3929,10 @@ static struct __x86_intercept {
[x86_intercept_iret]= PRE_EX(SVM_EXIT_IRET),
[x86_intercept_icebp]   = PRE_EX(SVM_EXIT_ICEBP),
[x86_intercept_hlt] = POST_EX(SVM_EXIT_HLT),
+   [x86_intercept_in]  = POST_EX(SVM_EXIT_IOIO),
+   [x86_intercept_ins] = POST_EX(SVM_EXIT_IOIO),
+   [x86_intercept_out] = POST_EX(SVM_EXIT_IOIO),
+   [x86_intercept_outs]= POST_EX(SVM_EXIT_IOIO),
 };
 
 #undef PRE_EX
@@ -4004,6 +4008,38 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
 */
if (info->rep_prefix != REPE_PREFIX)
goto out;
+   case SVM_EXIT_IOIO: {
+   u64 exit_info;
+   u32 bytes;
+
+   exit_info = (vcpu->arch.regs[VCPU_REGS_RDX] & 0xffff) << 16;
+
+   if (info->intercept == x86_intercept_in ||
+   info->intercept == x86_intercept_ins) {
+   exit_info |= SVM_IOIO_TYPE_MASK;
+   bytes = info->src_bytes;
+   } else {
+   bytes = info->dst_bytes;
+   }
+
+   if (info->intercept == x86_intercept_outs ||
+   info->intercept == x86_intercept_ins)
+   exit_info |= SVM_IOIO_STR_MASK;
+
+   if (info->rep_prefix)
+   exit_info |= SVM_IOIO_REP_MASK;
+
+   bytes = min(bytes, 4u);
+
+   exit_info |= bytes << SVM_IOIO_SIZE_SHIFT;
+
+   exit_info |= (u32)info->ad_bytes << (SVM_IOIO_ASIZE_SHIFT - 1);
+
+   vmcb->control.exit_info_1 = exit_info;
+   vmcb->control.exit_info_2 = info->next_rip;
+
+   break;
+   }
default:
break;
}
-- 
1.7.1




[PATCH 10/13] KVM: SVM: Add intercept checks for remaining twobyte instructions

2011-03-25 Thread Joerg Roedel
This patch adds intercept checks for the remaining twobyte
instructions to the KVM instruction emulator.
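
Besides the new table entries, this patch introduces the PRE_EX stage:
CPUID and RSM must be checked before the emulator's exception checks,
while the other instructions keep the post-exception stage. The three
check points used by the series, sketched (the enum values follow the
x86_icpt_* names visible in the diff; the comments are my reading of
their semantics):

enum x86_intercept_stage_sketch {
	X86_ICPT_PRE_EXCEPT,	/* before privilege/exception checks */
	X86_ICPT_POST_EXCEPT,	/* after exception checks            */
	X86_ICPT_POST_MEMACCESS,/* after memory operands were read   */
};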

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_emulate.h |2 ++
 arch/x86/kvm/emulate.c |8 
 arch/x86/kvm/svm.c |   19 +++
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index db744c9..41c0120 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -315,6 +315,8 @@ enum x86_intercept {
x86_intercept_wbinvd,
x86_intercept_monitor,
x86_intercept_mwait,
+   x86_intercept_rdmsr,
+   x86_intercept_wrmsr,
 
nr_x86_intercepts
 };
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0aaba1e..8947643 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2550,8 +2550,8 @@ static struct opcode twobyte_table[256] = {
N, N, N, N,
N, N, N, N, N, N, N, N,
/* 0x30 - 0x3F */
-   D(ImplicitOps | Priv), II(ImplicitOps, em_rdtsc, rdtsc),
-   D(ImplicitOps | Priv), N,
+   DI(ImplicitOps | Priv, wrmsr), II(ImplicitOps, em_rdtsc, rdtsc),
+   DI(ImplicitOps | Priv, rdmsr), DI(ImplicitOps | Priv, rdpmc),
D(ImplicitOps | VendorSpecific), D(ImplicitOps | Priv | VendorSpecific),
N, N,
N, N, N, N, N, N, N, N,
@@ -2569,12 +2569,12 @@ static struct opcode twobyte_table[256] = {
X16(D(ByteOp | DstMem | SrcNone | ModRM | Mov)),
/* 0xA0 - 0xA7 */
D(ImplicitOps | Stack), D(ImplicitOps | Stack),
-   N, D(DstMem | SrcReg | ModRM | BitOp),
+   DI(ImplicitOps, cpuid), D(DstMem | SrcReg | ModRM | BitOp),
D(DstMem | SrcReg | Src2ImmByte | ModRM),
D(DstMem | SrcReg | Src2CL | ModRM), N, N,
/* 0xA8 - 0xAF */
D(ImplicitOps | Stack), D(ImplicitOps | Stack),
-   N, D(DstMem | SrcReg | ModRM | BitOp | Lock),
+   DI(ImplicitOps, rsm), D(DstMem | SrcReg | ModRM | BitOp | Lock),
D(DstMem | SrcReg | Src2ImmByte | ModRM),
D(DstMem | SrcReg | Src2CL | ModRM),
D(ModRM), I(DstReg | SrcMem | ModRM, em_imul),
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 958697e..c2e90bb 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3871,6 +3871,9 @@ static void svm_fpu_deactivate(struct kvm_vcpu *vcpu)
update_cr0_intercept(svm);
 }
 
+#define PRE_EX(exit)  { .exit_code = (exit), \
+   .stage = x86_icpt_pre_except, \
+   .valid = true }
 #define POST_EX(exit) { .exit_code = (exit), \
.stage = x86_icpt_post_except, \
.valid = true }
@@ -3910,8 +3913,18 @@ static struct __x86_intercept {
[x86_intercept_rdtscp]  = POST_EX(SVM_EXIT_RDTSCP),
[x86_intercept_monitor] = POST_MEM(SVM_EXIT_MONITOR),
[x86_intercept_mwait]   = POST_EX(SVM_EXIT_MWAIT),
+   [x86_intercept_invlpg]  = POST_EX(SVM_EXIT_INVLPG),
+   [x86_intercept_invd]= POST_EX(SVM_EXIT_INVD),
+   [x86_intercept_wbinvd]  = POST_EX(SVM_EXIT_WBINVD),
+   [x86_intercept_wrmsr]   = POST_EX(SVM_EXIT_MSR),
+   [x86_intercept_rdtsc]   = POST_EX(SVM_EXIT_RDTSC),
+   [x86_intercept_rdmsr]   = POST_EX(SVM_EXIT_MSR),
+   [x86_intercept_rdpmc]   = POST_EX(SVM_EXIT_RDPMC),
+   [x86_intercept_cpuid]   = PRE_EX(SVM_EXIT_CPUID),
+   [x86_intercept_rsm] = PRE_EX(SVM_EXIT_RSM),
 };
 
+#undef PRE_EX
 #undef POST_EX
 #undef POST_MEM
 
@@ -3971,6 +3984,12 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu,
case SVM_EXIT_WRITE_DR0:
icpt_info.exit_code += reg;
break;
+   case SVM_EXIT_MSR:
+   if (info->intercept == x86_intercept_wrmsr)
+   vmcb->control.exit_info_1 = 1;
+   else
+   vmcb->control.exit_info_1 = 0;
+   break;
default:
break;
}
-- 
1.7.1




[PATCH 03/13] KVM: X86: Don't write-back cpu-state on X86EMUL_INTERCEPTED

2011-03-25 Thread Joerg Roedel
This patch prevents the changed CPU state from being written
back when the emulator detects that the instruction was
intercepted by the guest.
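
The resulting control flow, sketched: once a check signals
X86EMUL_INTERCEPTED, the emulator unwinds without the writeback step and
x86.c treats the instruction as done, because the nested #VMEXIT has
taken over:

	r = x86_emulate_insn(&vcpu->arch.emulate_ctxt);
	if (r == EMULATION_INTERCEPTED)
		return EMULATE_DONE;	/* state intentionally not written back */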

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/include/asm/kvm_emulate.h |1 +
 arch/x86/kvm/emulate.c |3 +++
 arch/x86/kvm/x86.c |3 +++
 3 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index 277f189..7960eeb 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -325,6 +325,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void 
*insn, int insn_len);
 #define EMULATION_FAILED -1
 #define EMULATION_OK 0
 #define EMULATION_RESTART 1
+#define EMULATION_INTERCEPTED 2
 int x86_emulate_insn(struct x86_emulate_ctxt *ctxt);
 int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 u16 tss_selector, int reason,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index cf5f396..078acc4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3455,6 +3455,9 @@ writeback:
 done:
if (rc == X86EMUL_PROPAGATE_FAULT)
ctxt->have_exception = true;
+   if (rc == X86EMUL_INTERCEPTED)
+   return EMULATION_INTERCEPTED;
+
return (rc == X86EMUL_UNHANDLEABLE) ? EMULATION_FAILED : EMULATION_OK;
 
 twobyte_insn:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2338309..90a41aa 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4458,6 +4458,9 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 restart:
r = x86_emulate_insn(&vcpu->arch.emulate_ctxt);
 
+   if (r == EMULATION_INTERCEPTED)
+   return EMULATE_DONE;
+
if (r == EMULATION_FAILED) {
if (reexecute_instruction(vcpu, cr2))
return EMULATE_DONE;
-- 
1.7.1




[PATCH 08/13] KVM: SVM: Add intercept checks for SVM instructions

2011-03-25 Thread Joerg Roedel
This patch adds the necessary changes to the
instruction emulator and the extensions to svm.c to
implement intercept checks for the SVM instructions.
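
The decode extension works as in the sketch below (matching the emulate.c
hunk further down): for an RMExt group with ModRM.mod == 3, the ModRM.rm
bits select one of eight sub-opcodes, which is how 0f 01 /3 fans out into
the SVM instructions in group7_rm3:

	if (opcode.flags & RMExt) {
		goffset = c->modrm & 7;		  /* rm field, bits 2:0       */
		opcode = opcode.u.group[goffset]; /* e.g. group7_rm3[0] vmrun */
	}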

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/emulate.c |   23 ++-
 arch/x86/kvm/svm.c |8 
 2 files changed, 30 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 505348f..dc53806 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -75,6 +75,8 @@
 #define Stack   (1<<13) /* Stack instruction (push/pop) */
 #define Group   (1<<14) /* Bits 3:5 of modrm byte extend opcode */
 #define GroupDual   (1<<15) /* Alternate decoding of mod == 3 */
+#define RMExt   (1<<16) /* Opcode extension in ModRM r/m if mod == 3 */
+
 /* Misc flags */
 #define VendorSpecific (1<<22) /* Vendor specific instruction */
 #define NoAccess    (1<<23) /* Don't access memory (lea/invlpg/verr etc) */
@@ -2329,6 +2331,7 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
 #define D(_y) { .flags = (_y) }
 #define DI(_y, _i) { .flags = (_y), .intercept = x86_intercept_##_i }
 #define N    D(0)
+#define EXT(_f, _e) { .flags = ((_f) | RMExt), .u.group = (_e) }
 #define G(_f, _g) { .flags = ((_f) | Group), .u.group = (_g) }
 #define GD(_f, _g) { .flags = ((_f) | Group | GroupDual), .u.gdual = (_g) }
 #define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
@@ -2343,6 +2346,17 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
D2bv(((_f) & ~Lock) | DstAcc | SrcImm)
 
 
+static struct opcode group7_rm3[] = {
+   DI(SrcNone | ModRM | Priv, vmrun),
+   DI(SrcNone | ModRM | Priv, vmmcall),
+   DI(SrcNone | ModRM | Priv, vmload),
+   DI(SrcNone | ModRM | Priv, vmsave),
+   DI(SrcNone | ModRM | Priv, stgi),
+   DI(SrcNone | ModRM | Priv, clgi),
+   DI(SrcNone | ModRM | Priv, skinit),
+   DI(SrcNone | ModRM | Priv, invlpga),
+};
+
 static struct opcode group1[] = {
X7(D(Lock)), N
 };
@@ -2386,7 +2400,7 @@ static struct group_dual group7 = { {
DI(SrcMem | ModRM | ByteOp | Priv | NoAccess, invlpg),
 }, {
D(SrcNone | ModRM | Priv | VendorSpecific), N,
-   N, D(SrcNone | ModRM | Priv | VendorSpecific),
+   N, EXT(0, group7_rm3),
DI(SrcNone | ModRM | DstMem | Mov, smsw), N,
DI(SrcMem16 | ModRM | Mov | Priv, lmsw), N,
 } };
@@ -2581,6 +2595,7 @@ static struct opcode twobyte_table[256] = {
 #undef G
 #undef GD
 #undef I
+#undef EXT
 
 #undef D2bv
 #undef I2bv
@@ -2758,6 +2773,12 @@ done_prefixes:
opcode = g_mod3[goffset];
else
opcode = g_mod012[goffset];
+
+   if (opcode.flags & RMExt) {
+   goffset = c->modrm & 7;
+   opcode = opcode.u.group[goffset];
+   }
+
c->d |= opcode.flags;
}
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index faa959e..dded390 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3899,6 +3899,14 @@ static struct __x86_intercept {
[x86_intercept_sidt]= POST_MEM(SVM_EXIT_IDTR_READ),
[x86_intercept_lgdt]= POST_MEM(SVM_EXIT_GDTR_WRITE),
[x86_intercept_lidt]= POST_MEM(SVM_EXIT_IDTR_WRITE),
+   [x86_intercept_vmrun]   = POST_EX(SVM_EXIT_VMRUN),
+   [x86_intercept_vmmcall] = POST_EX(SVM_EXIT_VMMCALL),
+   [x86_intercept_vmload]  = POST_EX(SVM_EXIT_VMLOAD),
+   [x86_intercept_vmsave]  = POST_EX(SVM_EXIT_VMSAVE),
+   [x86_intercept_stgi]= POST_EX(SVM_EXIT_STGI),
+   [x86_intercept_clgi]= POST_EX(SVM_EXIT_CLGI),
+   [x86_intercept_skinit]  = POST_EX(SVM_EXIT_SKINIT),
+   [x86_intercept_invlpga] = POST_EX(SVM_EXIT_INVLPGA),
 };
 
 #undef POST_EX
-- 
1.7.1




[PATCH 13/13] KVM: SVM: Remove nested sel_cr0_write handling code

2011-03-25 Thread Joerg Roedel
This patch removes all the old code which handled the nested
selective cr0 write intercepts. This code was only in place
as a work-around until the instruction emulator was capable
of doing the same. That is the case with this patch-set, so
the code can be removed.
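
For context: the selective CR0 write intercept fires only when a
MOV-to-CR0 changes bits other than CR0.TS and CR0.MP (the two bits in
SVM_CR0_SELECTIVE_MASK), which is exactly what the new
check_selective_cr0_intercepted() below tests; as a sketch:

	cr0 &= ~SVM_CR0_SELECTIVE_MASK;	/* ignore TS and MP         */
	val &= ~SVM_CR0_SELECTIVE_MASK;
	if (cr0 ^ val)			/* anything else changed... */
		/* ...inject SVM_EXIT_CR0_SEL_WRITE to L1 */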

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 arch/x86/kvm/svm.c |   78 +--
 1 files changed, 26 insertions(+), 52 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1672e3c..37c0060 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -93,14 +93,6 @@ struct nested_state {
/* A VMEXIT is required but not yet emulated */
bool exit_required;
 
-   /*
-* If we vmexit during an instruction emulation we need this to restore
-* the l1 guest rip after the emulation
-*/
-   unsigned long vmexit_rip;
-   unsigned long vmexit_rsp;
-   unsigned long vmexit_rax;
-
/* cache for intercepts of the guest */
u32 intercept_cr;
u32 intercept_dr;
@@ -1365,31 +1357,6 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned 
long cr0)
 {
struct vcpu_svm *svm = to_svm(vcpu);
 
-   if (is_guest_mode(vcpu)) {
-   /*
-* We are here because we run in nested mode, the host kvm
-* intercepts cr0 writes but the l1 hypervisor does not.
-* But the L1 hypervisor may intercept selective cr0 writes.
-* This needs to be checked here.
-*/
-   unsigned long old, new;
-
-   /* Remove bits that would trigger a real cr0 write intercept */
-   old = vcpu->arch.cr0 & SVM_CR0_SELECTIVE_MASK;
-   new = cr0 & SVM_CR0_SELECTIVE_MASK;
-
-   if (old == new) {
-   /* cr0 write with ts and mp unchanged */
-   svm->vmcb->control.exit_code = SVM_EXIT_CR0_SEL_WRITE;
-   if (nested_svm_exit_handled(svm) == NESTED_EXIT_DONE) {
-   svm->nested.vmexit_rip = kvm_rip_read(vcpu);
-   svm->nested.vmexit_rsp = 
kvm_register_read(vcpu, VCPU_REGS_RSP);
-   svm->nested.vmexit_rax = 
kvm_register_read(vcpu, VCPU_REGS_RAX);
-   return;
-   }
-   }
-   }
-
 #ifdef CONFIG_X86_64
if (vcpu->arch.efer & EFER_LME) {
if (!is_paging(vcpu) && (cr0 & X86_CR0_PG)) {
@@ -2676,6 +2643,29 @@ static int emulate_on_interception(struct vcpu_svm *svm)
return emulate_instruction(&svm->vcpu, 0) == EMULATE_DONE;
 }
 
+bool check_selective_cr0_intercepted(struct vcpu_svm *svm, unsigned long val)
+{
+   unsigned long cr0 = svm->vcpu.arch.cr0;
+   bool ret = false;
+   u64 intercept;
+
+   intercept = svm->nested.intercept;
+
+   if (!is_guest_mode(&svm->vcpu) ||
+   (!(intercept & (1ULL << INTERCEPT_SELECTIVE_CR0))))
+   return false;
+
+   cr0 &= ~SVM_CR0_SELECTIVE_MASK;
+   val &= ~SVM_CR0_SELECTIVE_MASK;
+
+   if (cr0 ^ val) {
+   svm->vmcb->control.exit_code = SVM_EXIT_CR0_SEL_WRITE;
+   ret = (nested_svm_exit_handled(svm) == NESTED_EXIT_DONE);
+   }
+
+   return ret;
+}
+
 #define CR_VALID (1ULL << 63)
 
 static int cr_interception(struct vcpu_svm *svm)
@@ -2699,7 +2689,8 @@ static int cr_interception(struct vcpu_svm *svm)
val = kvm_register_read(&svm->vcpu, reg);
switch (cr) {
case 0:
-   err = kvm_set_cr0(&svm->vcpu, val);
+   if (!check_selective_cr0_intercepted(svm, val))
+   err = kvm_set_cr0(&svm->vcpu, val);
break;
case 3:
err = kvm_set_cr3(&svm->vcpu, val);
@@ -2744,23 +2735,6 @@ static int cr_interception(struct vcpu_svm *svm)
return 1;
 }
 
-static int cr0_write_interception(struct vcpu_svm *svm)
-{
-   struct kvm_vcpu *vcpu = &svm->vcpu;
-   int r;
-
-   r = cr_interception(svm);
-
-   if (svm->nested.vmexit_rip) {
-   kvm_register_write(vcpu, VCPU_REGS_RIP, svm->nested.vmexit_rip);
-   kvm_register_write(vcpu, VCPU_REGS_RSP, svm->nested.vmexit_rsp);
-   kvm_register_write(vcpu, VCPU_REGS_RAX, svm->nested.vmexit_rax);
-   svm->nested.vmexit_rip = 0;
-   }
-
-   return r;
-}
-
 static int dr_interception(struct vcpu_svm *svm)
 {
int reg, dr;
@@ -3048,7 +3022,7 @@ static int (*svm_exit_handlers[])(struct vcpu_svm *svm) = 
{
[SVM_EXIT_READ_CR4] = cr_interception,
[SVM_EXIT_READ_CR8] = cr_interception,
[SVM_EXIT_CR0_SEL_WRITE]= emulate_on_interception,
-   [SVM_EXIT_WRITE_CR0]= cr0_write_interception,
+   [SVM_EXIT_WRITE_CR0]= cr_interception,

2.6.38.1 general protection fault

2011-03-25 Thread Tomasz Chmielewski
I got this on a 2.6.38.1 system which (I think) had some problem accessing 
guest image on a btrfs filesystem.


general protection fault:  [#1] SMP 
last sysfs file: /sys/kernel/uevent_seqnum
CPU 0 
Modules linked in: ipt_MASQUERADE vhost_net kvm_intel kvm iptable_filter 
xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 
ip_tables x_tables bridge stp btrfs zlib_deflate crc32c libcrc32c coretemp 
f71882fg snd_pcm snd_timer snd soundcore i2c_i801 snd_page_alloc tpm_tis tpm 
tpm_bios pcspkr i7core_edac edac_core r8169 mii raid10 raid456 async_pq 
async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 ahci 
libahci sata_nv sata_sil sata_via 3w_9xxx 3w_ [last unloaded: 
scsi_wait_scan]

Pid: 10199, comm: kvm Not tainted 2.6.38.1 #1 MSI MS-7522/MSI X58 Pro-E 
(MS-7522)
RIP: 0010:[a02cae20]  [a02cae20] kvm_unmap_rmapp+0x20/0x70 
[kvm]
RSP: 0018:880508ee9bf0  EFLAGS: 00010202
RAX: 8805d6b087f8 RBX: 8805b7b1 RCX: 0050
RDX:  RSI: 8805d6b087f8 RDI: 8805b7b1
RBP: 880508ee9c10 R08: 8801061d4000 R09: c9001f19aff0
R10: 0030 R11:  R12: 
R13: c9001f19aff8 R14: 0060 R15: 8801061d4000
FS:  7f7ca25d6730() GS:8800bf40() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 00462b10 CR3: 0003ac47f000 CR4: 26e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process kvm (pid: 10199, threadinfo 880508ee8000, task 88001b5a5b00)
Stack:
 ffcf 000220ff 0001 8801061d4050
 880508ee9c80 a02c8a54 0030 a02cae00
  7f7c80a2b000 8805b7b1 0001
Call Trace:
 [a02c8a54] kvm_handle_hva+0xb4/0x170 [kvm]
 [a02cae00] ? kvm_unmap_rmapp+0x0/0x70 [kvm]
 [a02c8b27] kvm_unmap_hva+0x17/0x20 [kvm]
 [a02b1e72] kvm_mmu_notifier_invalidate_range_start+0x62/0xb0 [kvm]
 [8113ea11] __mmu_notifier_invalidate_range_start+0x51/0x70
 [8111e2c1] copy_page_range+0x3b1/0x460
 [812c5628] ? rb_insert_color+0x98/0x140
 [81060cdc] dup_mm+0x2fc/0x500
 [810617fe] copy_process+0x8be/0x11b0
 [81062165] do_fork+0x75/0x350
 [81177bcd] ? mntput+0x1d/0x40
 [8115b095] ? fput+0x1e5/0x270
 [815aa7f5] ? _raw_spin_lock_irq+0x15/0x20
 [81075141] ? sigprocmask+0x91/0x110
 [81014ab8] sys_clone+0x28/0x30
 [8100c3e3] stub_clone+0x13/0x20
 [8100c0c2] ? system_call_fastpath+0x16/0x1b
Code: 49 89 01 eb 91 66 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 83 ec 08 
0f 1f 44 00 00 45 31 e4 48 89 fb 49 89 f5 eb 1d 0f 1f 00 f6 06 01 74 38 48 8b 
15 a4 66 02 00 48 89 df 41 bc 01 00 00 00 
RIP  [a02cae20] kvm_unmap_rmapp+0x20/0x70 [kvm]
 RSP 880508ee9bf0
---[ end trace 85201a339b7635fc ]---



-- 
Tomasz Chmielewski
http://wpkg.org


Re: [PATCH 2/2] virtio_net: remove send completion interrupts and avoid TX queue overrun through packet drop

2011-03-25 Thread Rusty Russell
On Thu, 24 Mar 2011 10:46:49 -0700, Shirley Ma mashi...@us.ibm.com wrote:
 On Thu, 2011-03-24 at 16:28 +0200, Michael S. Tsirkin wrote:
  Several other things I am looking at, welcome cooperation:
  1. It's probably a good idea to update avail index
 immediately instead of upon kick: for RX
 this might help parallelism with the host.
 Is that possible to use the same idea for publishing last used idx to
 publish avail idx? Then we can save guest iowrite/exits.

Yes, it should be symmetrical.  Test independently of course, but the
same logic applies.

Thanks!
Rusty.


Re: [PATCH 2/2] virtio_net: remove send completion interrupts and avoid TX queue overrun through packet drop

2011-03-25 Thread Rusty Russell
On Thu, 24 Mar 2011 16:28:22 +0200, Michael S. Tsirkin m...@redhat.com 
wrote:
 On Thu, Mar 24, 2011 at 11:00:53AM +1030, Rusty Russell wrote:
   With simply removing the notify here, it does help the case when TX
   overrun hits too often, for example for 1K message size, the single
   TCP_STREAM performance improved from 2.xGb/s to 4.xGb/s.
  
  OK, we'll be getting rid of the kick on full, so please delete that on
  all benchmarks.
  
  Now, does the capacity check before add_buf() still win anything?  I
  can't see how unless we have some weird bug.
  
  Once we've sorted that out, we should look at the more radical change
  of publishing last_used and using that to intuit whether interrupts
  should be sent.  If we're not careful with ordering and barriers that
  could introduce more bugs.
 
 Right. I am working on this, and trying to be careful.
 One thing I'm in doubt about: sometimes we just want to
 disable interrupts. Should still use flags in that case?
 I thought that if we make the published index 0 to vq->num - 1,
 then a special value in the index field could disable
 interrupts completely. We could even reuse the space
 for the flags field to stick the index in. Too complex?

Making the index free-running avoids the full-or-empty confusion, plus
offers an extra debugging insight.

I think that if they really want to disable interrupts, the flag should
still work, and when the client accepts the publish last_idx feature
they are accepting that interrupts may be omitted if they haven't
updated last_idx yet.
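
A sketch of the suppression test this enables, with free-running 16-bit
indices (names here are illustrative, not the eventual virtio API):
interrupt only if the index the other side published shows it has not
yet consumed past the previously signalled point:

static bool need_event(u16 event_idx, u16 new_idx, u16 old_idx)
{
	/* unsigned wraparound keeps this correct for free-running indices */
	return (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old_idx);
}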

  Anything else on the optimization agenda I've missed?
  
  Thanks,
  Rusty.
 
 Several other things I am looking at, welcome cooperation:
 1. It's probably a good idea to update avail index
immediately instead of upon kick: for RX
this might help parallelism with the host.

Yes, once we've done everything else, we should measure this.  It makes
sense.

 2. Adding an API to add a single buffer instead of s/g,
seems to help a bit.

This goes last, since it's kind of an ugly hack, but all internal to
Linux if we decide it's a win.

 3. For TX sometimes we free a single buffer, sometimes
a ton of them, which might make the transmit latency
vary. It's probably a good idea to limit this,
maybe free the minimal number possible to keep the device
going without stops, maybe free up to MAX_SKB_FRAGS.

This kind of heuristic is going to be quite variable depending on
circumstance, I think, so it's a lot of work to make sure we get it
right.

 4. If the ring is full, we now notify right after
the first entry is consumed. For TX this is suboptimal,
we should try delaying the interrupt on host.

Lguest already does that: only sends an interrupt when it's run out of
things to do.  It does update the used ring, however, as it processes
them.

This seems sensible to me, but needs to be measured separately as well.

 More ideas, would be nice if someone can try them out:
 1. We are allocating/freeing buffers for indirect descriptors.
Use some kind of pool instead?
And we could preformat part of the descriptor.

We need some poolish mechanism for virtio_blk too; perhaps an allocation
callback which both can use (virtio_blk to alloc from a pool, virtio_net
to recycle?).

Along similar lines to preformatting, we could actually try to prepend
the skb_vnet_hdr to the vnet data, and use a single descriptor for the
hdr and the first part of the packet.

Though IIRC, qemu's virtio barfs if the first descriptor isn't just the
hdr (barf...).

 2. I didn't have time to work on virtio2 ideas presented
at the kvm forum yet, any takers?

I didn't even attend.  But I think that virtio2 is moribund for the
moment; there wasn't enough demand and it's clear that there are
optimizations unexplored in virtio1.

Cheers,
Rusty.


KVM, iSCSI and High Availability

2011-03-25 Thread Marcin M. Jessa

Hi.

Over the last several days I've been reading, asking questions, 
searching the Internet to find a viable HA stack for Ubuntu with KVM 
virtualization and shared iSCSI storage. And I'm nearly as confused as 
when I started.


Basically I'm trying to build a KVM environment with an iSCSI SAN and I'm 
not quite sure what approach to use for storing the virtual guests.
From what I understand, to get maximum speed I should install the guests 
directly onto iSCSI-exported raw devices instead of file-backed disks.
I'm not sure creating many small LUNs, one for each of the guests, is a 
good idea.
Would it be better to create just one big LUN and then use LVM to divide 
it, assigning one logical volume to each guest?
In the same setup I would also like to implement some kind of automatic 
failover, so that if one of the KVM hosts goes down I could automatically 
move its guests over to the other one, or just perform live migration to 
move a guest to a different host with spare capacity.

What would be the best approach to implement a solution like that?

Thanks in advance.


--

Marcin M. Jessa


Re: KVM internal error. Suberror: 1 with ancient 2.4 kernel as guest

2011-03-25 Thread Wei Xu
Jiri & Avi:

I attached the patches I did for movq and movdqa emulation. Please note:
(1) I only implemented those two. Other instructions like addq could be
handled the same way.
(2) I use the same guest_fx_image to hold the value and fxsave/fxrstor to
copy to/from the registers. This is not very efficient, I admit.
If you have any suggestions, let me know.
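
For readers without the attachments (they appear below as binary data),
the layout the description relies on, sketched: guest_fx_image is a
512-byte FXSAVE area, and MM0..MM7 alias the low 64 bits of the ST0..ST7
slots at offset 32, 16 bytes apart. read_mm() here is an illustrative
helper, not code from the patch:

#include <stdint.h>
#include <string.h>

struct fxsave_image {
	uint8_t header[32];	/* FCW/FSW/FTW/FOP/FIP/FDP/MXCSR...  */
	uint8_t st_mm[8][16];	/* ST0-7 / MM0-7, 16 bytes per slot  */
	uint8_t rest[352];	/* XMM registers and reserved area   */
} __attribute__((aligned(16)));

static uint64_t read_mm(const struct fxsave_image *img, int n)
{
	uint64_t v;
	memcpy(&v, img->st_mm[n], sizeof(v));	/* low 64 bits = MMn */
	return v;
}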

Thanks!
Wei Xu


On 3/21/11 2:23 PM, Wei Xu we...@cisco.com wrote:

 Avi and Jiri:
 
 I implemented emulation of movq(64bit) and movdqa(128 bit). If you guys still
 need it let me know and I can post somewhere...
 
 Wei Xu
 
 
 On 8/31/10 9:30 AM, Avi Kivity a...@redhat.com wrote:
 
 
   On 08/31/2010 06:49 PM, Avi Kivity wrote:
  On 08/31/2010 05:32 PM, Jiri Kosina wrote:
 (qemu) x/5i $eip
 0xc027a841:  movq   (%esi),%mm0
 0xc027a844:  movq   0x8(%esi),%mm1
 0xc027a848:  movq   0x10(%esi),%mm2
 0xc027a84c:  movq   0x18(%esi),%mm3
 0xc027a850:  movq   %mm0,(%edx)
 ===
 
 Is there any issue with emulating MMX?
 
 
 Yes.  MMX is not currently emulated.
 
 If there's a command line option to disable the use of MMX you can try
 it, otherwise wait for it to be implemented (or implement it
 yourself).  I'll try to do it for 2.6.37, but can't promise anything.
 
 You can also run qemu with -cpu qemu32,-mmx.  That will expose a cpu
 without mmx support; hopefully the guest kernel will see that and avoid
 mmx instructions.



mmx-kvm.patch
Description: Binary data


mmx-qemu.patch
Description: Binary data


[GIT PULL] More power management updates for 2.6.39

2011-03-25 Thread Rafael J. Wysocki
Hi Linus,

Please pull additional power management updates for 2.6.39 from:

git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6.git syscore

They make subsystems that x86 depends on use struct syscore_ops objects instead
of sysdevs for core power management, which reduces the code size and kernel
memory footprint a bit and simplifies the core suspend/resume and shutdown
code paths.
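
The pattern the conversions share, sketched below (interface as in
include/linux/syscore_ops.h; the foo_* names are placeholders): a
subsystem registers plain suspend/resume callbacks that run once on one
CPU with interrupts disabled, instead of carrying a sysdev class, driver
and device object:

#include <linux/syscore_ops.h>

static int foo_suspend(void)
{
	/* save hardware state; called late in suspend, IRQs off */
	return 0;
}

static void foo_resume(void)
{
	/* restore what foo_suspend() saved */
}

static struct syscore_ops foo_syscore_ops = {
	.suspend = foo_suspend,
	.resume  = foo_resume,
};

static int __init foo_init(void)
{
	register_syscore_ops(&foo_syscore_ops);
	return 0;
}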


 arch/x86/Kconfig |1 +
 arch/x86/kernel/amd_iommu_init.c |   26 ++
 arch/x86/kernel/apic/apic.c  |   33 -
 arch/x86/kernel/apic/io_apic.c   |   97 ++
 arch/x86/kernel/cpu/mcheck/mce.c |   21 +
 arch/x86/kernel/cpu/mtrr/main.c  |   10 ++--
 arch/x86/kernel/i8237.c  |   30 +++-
 arch/x86/kernel/i8259.c  |   33 -
 arch/x86/kernel/microcode_core.c |   34 ++
 arch/x86/kernel/pci-gart_64.c|   32 +++--
 arch/x86/oprofile/nmi_int.c  |   44 +
 drivers/base/Kconfig |7 +++
 drivers/base/sys.c   |3 +-
 drivers/cpufreq/cpufreq.c|   66 ++
 drivers/pci/intel-iommu.c|   38 ---
 include/linux/device.h   |4 ++
 include/linux/pm.h   |   10 +++-
 include/linux/sysdev.h   |7 ++-
 kernel/time/timekeeping.c|   27 +++---
 virt/kvm/kvm_main.c  |   34 +++--
 20 files changed, 206 insertions(+), 351 deletions(-)

---

Rafael J. Wysocki (6):
  x86: Use syscore_ops instead of sysdev classes and sysdevs
  timekeeping: Use syscore_ops instead of sysdev class and sysdev
  PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev
  KVM: Use syscore_ops instead of sysdev class and sysdev
  cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)
  Introduce ARCH_NO_SYSDEV_OPS config option (v2)
