Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-28 Thread Julian Stecklina
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 01/28/2014 04:16 PM, Konrad Rzeszutek Wilk wrote:
 At least 4.8 and probably older compilers compile this as
 intended. The point is that the standard does not guarantee the
 indented behavior, i.e. the code is wrong.
 
 Perhaps I misunderstood Jan's response but it sounded to me like 
 that the compiler was not adhering to the standard?

Compilers are free to generate code that breaks the current clock code
while staying within the standards. If the compiler ignores the
standard, all bets are off anyway...

 
 I can refer to this lwn article: 
 https://lwn.net/Articles/508991/
 
 The whole point of ACCESS_ONCE is to avoid time bombs like that.
 There are lots of place where ACCESS_ONCE is used in the kernel:
 
 http://lxr.free-electrons.com/ident?i=ACCESS_ONCE
 
 See for example the check_zero function here: 
 http://lxr.free-electrons.com/source/arch/x86/kernel/kvm.c#L559
 
 
 In other words, you don't have a sample of 'bad' compiler code.

Nope.

Julian

-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)

iEUEARECAAYFAlLn1ZgACgkQ2EtjUdW3H9n/DgCSAmrVCyxrs42aFFB3Ug+aw4kN
7wCgmEO4CbOI3BkOXKSorY91td9u7H8=
=fgrl
-END PGP SIGNATURE-
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-27 Thread Julian Stecklina
On 01/24/2014 07:08 PM, Konrad Rzeszutek Wilk wrote:
 On Thu, Jan 16, 2014 at 03:13:44PM +0100, Julian Stecklina wrote:
 The paravirtualized clock used in KVM and Xen uses a version field to
 allow the guest to see when the shared data structure is inconsistent.
 The code reads the version field twice (before and after the data
 structure is copied) and checks whether they haven't
 changed and that the lowest bit is not set. As the second access is not
 synchronized, the compiler could generate code that accesses version
 more than two times and you end up with inconsistent data.
 
 Could you paste in the code that the 'bad' compiler generates
 vs the compiler that generate 'good' code please?

At least 4.8 and probably older compilers compile this as intended. The
point is that the standard does not guarantee the indented behavior,
i.e. the code is wrong.

I can refer to this lwn article:
https://lwn.net/Articles/508991/

The whole point of ACCESS_ONCE is to avoid time bombs like that. There
are lots of place where ACCESS_ONCE is used in the kernel:

http://lxr.free-electrons.com/ident?i=ACCESS_ONCE

See for example the check_zero function here:
http://lxr.free-electrons.com/source/arch/x86/kernel/kvm.c#L559

Julian

 

 An example using pvclock_get_time_values:

 host starts updating data, sets src-version to 1
 guest reads src-version (1) and stores it into dst-version.
 guest copies inconsistent data
 guest reads src-version (1) and computes xor with dst-version.
 host finishes updating data and sets src-version to 2
 guest reads src-version (2) and checks whether lower bit is not set.
 while loop exits with inconsistent data!

 AFAICS the compiler is allowed to optimize the given code this way.

 Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
 ---
  arch/x86/kernel/pvclock.c | 10 +++---
  1 file changed, 7 insertions(+), 3 deletions(-)

 diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
 index 42eb330..f62b41c 100644
 --- a/arch/x86/kernel/pvclock.c
 +++ b/arch/x86/kernel/pvclock.c
 @@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct 
 pvclock_shadow_time *shadow)
  static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
  struct pvclock_vcpu_time_info *src)
  {
 +u32 nversion;
 +
  do {
  dst-version = src-version;
  rmb();  /* fetch version before data */
 @@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct 
 pvclock_shadow_time *dst,
  dst-tsc_shift = src-tsc_shift;
  dst-flags = src-flags;
  rmb();  /* test version after fetching data */
 -} while ((src-version  1) || (dst-version != src-version));
 +nversion = ACCESS_ONCE(src-version);
 +} while ((nversion  1) || (dst-version != nversion));
  
  return dst-version;
  }
 @@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
 *wall_clock,
  struct pvclock_vcpu_time_info *vcpu_time,
  struct timespec *ts)
  {
 -u32 version;
 +u32 version, nversion;
  u64 delta;
  struct timespec now;
  
 @@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
 *wall_clock,
  now.tv_sec  = wall_clock-sec;
  now.tv_nsec = wall_clock-nsec;
  rmb();  /* fetch time before checking version */
 -} while ((wall_clock-version  1) || (version != wall_clock-version));
 +nversion = ACCESS_ONCE(wall_clock-version);
 +} while ((nversion  1) || (version != nversion));
  
  delta = pvclock_clocksource_read(vcpu_time);/* time since system 
 boot */
  delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
 -- 
 1.8.4.2


 ___
 Xen-devel mailing list
 xen-de...@lists.xen.org
 http://lists.xen.org/xen-devel




signature.asc
Description: OpenPGP digital signature


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-17 Thread Julian Stecklina
On 01/17/2014 10:41 AM, Jan Beulich wrote:
 The half about converting local variable accesses
 back to memory reads (i.e. eliding the local variable), however,
 is only a theoretical issue afaict: If a compiler really did this, I
 think there'd be far more places where this would hurt.

It happens rarely, but it does happen. Not fixing those issues is
inviting trouble with new compiler generations. And these issues are
terribly hard to debug.

Julian



signature.asc
Description: OpenPGP digital signature


[PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-16 Thread Julian Stecklina
The paravirtualized clock used in KVM and Xen uses a version field to
allow the guest to see when the shared data structure is inconsistent.
The code reads the version field twice (before and after the data
structure is copied) and checks whether they haven't
changed and that the lowest bit is not set. As the second access is not
synchronized, the compiler could generate code that accesses version
more than two times and you end up with inconsistent data.

An example using pvclock_get_time_values:

host starts updating data, sets src-version to 1
guest reads src-version (1) and stores it into dst-version.
guest copies inconsistent data
guest reads src-version (1) and computes xor with dst-version.
host finishes updating data and sets src-version to 2
guest reads src-version (2) and checks whether lower bit is not set.
while loop exits with inconsistent data!

AFAICS the compiler is allowed to optimize the given code this way.

Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
---
 arch/x86/kernel/pvclock.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..f62b41c 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -55,6 +55,8 @@ static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time 
*shadow)
 static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst,
struct pvclock_vcpu_time_info *src)
 {
+   u32 nversion;
+
do {
dst-version = src-version;
rmb();  /* fetch version before data */
@@ -64,7 +66,8 @@ static unsigned pvclock_get_time_values(struct 
pvclock_shadow_time *dst,
dst-tsc_shift = src-tsc_shift;
dst-flags = src-flags;
rmb();  /* test version after fetching data */
-   } while ((src-version  1) || (dst-version != src-version));
+   nversion = ACCESS_ONCE(src-version);
+   } while ((nversion  1) || (dst-version != nversion));
 
return dst-version;
 }
@@ -135,7 +138,7 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
*wall_clock,
struct pvclock_vcpu_time_info *vcpu_time,
struct timespec *ts)
 {
-   u32 version;
+   u32 version, nversion;
u64 delta;
struct timespec now;
 
@@ -146,7 +149,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock 
*wall_clock,
now.tv_sec  = wall_clock-sec;
now.tv_nsec = wall_clock-nsec;
rmb();  /* fetch time before checking version */
-   } while ((wall_clock-version  1) || (version != wall_clock-version));
+   nversion = ACCESS_ONCE(wall_clock-version);
+   } while ((nversion  1) || (version != nversion));
 
delta = pvclock_clocksource_read(vcpu_time);/* time since system 
boot */
delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
-- 
1.8.4.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] [PATCH] KVM, XEN: Fix potential race in pvclock code

2014-01-16 Thread Julian Stecklina
On 01/16/2014 04:04 PM, Jan Beulich wrote:
 I don't think so - this would only be an issue if the conditions used
 | instead of ||. || implies a sequence point between evaluating the
 left and right sides, and the standard says: The presence of a
 sequence point between the evaluation of expressions A and B
 implies that every value computation and side effect associated
 with A is sequenced before every value computation and side
 effect associated with B.

This only applies to single-threaded code. Multithreaded code must be
data-race free for that to be true. See

https://lwn.net/Articles/508991/

 And even if there was a problem (i.e. my interpretation of the
 above being incorrect), I don't think you'd need ACCESS_ONCE()
 here: The same local variable can't have two different values in
 two different use sites when there was no intermediate
 assignment to it.

Same comment as above.

Julian



signature.asc
Description: OpenPGP digital signature


[PATCH v4] KVM: x86: Make register state after reset conform to specification

2012-12-05 Thread Julian Stecklina
VMX behaves now as SVM wrt to FPU initialization. Code has been moved to
generic code path. General-purpose registers are now cleared on reset and
INIT.  SVM code properly initializes EDX.

Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
---
 arch/x86/kvm/cpuid.c |  1 +
 arch/x86/kvm/svm.c   | 14 ++
 arch/x86/kvm/vmx.c   |  7 ---
 arch/x86/kvm/x86.c   | 10 ++
 4 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 52f6166..a20ecb5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -661,6 +661,7 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, 
u32 *ecx, u32 *edx)
} else
*eax = *ebx = *ecx = *edx = 0;
 }
+EXPORT_SYMBOL_GPL(kvm_cpuid);
 
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dcb79527..d29d3cd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -20,6 +20,7 @@
 #include mmu.h
 #include kvm_cache_regs.h
 #include x86.h
+#include cpuid.h
 
 #include linux/module.h
 #include linux/mod_devicetable.h
@@ -1193,6 +1194,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   u32 dummy;
+   u32 eax = 1;
 
init_vmcb(svm);
 
@@ -1201,8 +1204,9 @@ static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
svm-vmcb-save.cs.base = svm-vcpu.arch.sipi_vector  12;
svm-vmcb-save.cs.selector = svm-vcpu.arch.sipi_vector  8;
}
-   vcpu-arch.regs_avail = ~0;
-   vcpu-arch.regs_dirty = ~0;
+
+   kvm_cpuid(vcpu, eax, dummy, dummy, dummy);
+   kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
 
return 0;
 }
@@ -1259,10 +1263,6 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm-asid_generation = 0;
init_vmcb(svm);
 
-   err = fx_init(svm-vcpu);
-   if (err)
-   goto free_page4;
-
svm-vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
@@ -1271,8 +1271,6 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
 
return svm-vcpu;
 
-free_page4:
-   __free_page(hsave_page);
 free_page3:
__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page2:
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2fd2046..4ea3cb3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3934,8 +3934,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
u64 msr;
int ret;
 
-   vcpu-arch.regs_avail = ~((1  VCPU_REGS_RIP) | (1  VCPU_REGS_RSP));
-
vmx-rmode.vm86_active = 0;
 
vmx-soft_vnmi_blocked = 0;
@@ -3947,10 +3945,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
msr |= MSR_IA32_APICBASE_BSP;
kvm_set_apic_base(vmx-vcpu, msr);
 
-   ret = fx_init(vmx-vcpu);
-   if (ret != 0)
-   goto out;
-
vmx_segment_cache_clear(vmx);
 
seg_setup(VCPU_SREG_CS);
@@ -3991,7 +3985,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_rip_write(vcpu, 0xfff0);
else
kvm_rip_write(vcpu, 0);
-   kvm_register_write(vcpu, VCPU_REGS_RSP, 0);
 
vmcs_writel(GUEST_GDTR_BASE, 0);
vmcs_write32(GUEST_GDTR_LIMIT, 0x);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bdaf29..57c76e8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6461,6 +6461,10 @@ static int kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 
kvm_pmu_reset(vcpu);
 
+   memset(vcpu-arch.regs, 0, sizeof(vcpu-arch.regs));
+   vcpu-arch.regs_avail = ~0;
+   vcpu-arch.regs_dirty = ~0;
+
return kvm_x86_ops-vcpu_reset(vcpu);
 }
 
@@ -6629,11 +6633,17 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
if (!zalloc_cpumask_var(vcpu-arch.wbinvd_dirty_mask, GFP_KERNEL))
goto fail_free_mce_banks;
 
+   r = fx_init(vcpu);
+   if (r)
+   goto fail_free_wbinvd_dirty_mask;
+
vcpu-arch.ia32_tsc_adjust_msr = 0x0;
kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
 
return 0;
+fail_free_wbinvd_dirty_mask:
+   free_cpumask_var(vcpu-arch.wbinvd_dirty_mask);
 fail_free_mce_banks:
kfree(vcpu-arch.mce_banks);
 fail_free_lapic:
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5] KVM: x86: Make register state after reset conform to specification

2012-12-05 Thread Julian Stecklina
VMX behaves now as SVM wrt to FPU initialization. Code has been moved to
generic code path. General-purpose registers are now cleared on reset and
INIT.  SVM code properly initializes EDX.

Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
---
 arch/x86/kvm/cpuid.c |  1 +
 arch/x86/kvm/svm.c   | 14 ++
 arch/x86/kvm/vmx.c   |  8 
 arch/x86/kvm/x86.c   | 10 ++
 4 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 52f6166..a20ecb5 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -661,6 +661,7 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, 
u32 *ecx, u32 *edx)
} else
*eax = *ebx = *ecx = *edx = 0;
 }
+EXPORT_SYMBOL_GPL(kvm_cpuid);
 
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dcb79527..d29d3cd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -20,6 +20,7 @@
 #include mmu.h
 #include kvm_cache_regs.h
 #include x86.h
+#include cpuid.h
 
 #include linux/module.h
 #include linux/mod_devicetable.h
@@ -1193,6 +1194,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   u32 dummy;
+   u32 eax = 1;
 
init_vmcb(svm);
 
@@ -1201,8 +1204,9 @@ static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
svm-vmcb-save.cs.base = svm-vcpu.arch.sipi_vector  12;
svm-vmcb-save.cs.selector = svm-vcpu.arch.sipi_vector  8;
}
-   vcpu-arch.regs_avail = ~0;
-   vcpu-arch.regs_dirty = ~0;
+
+   kvm_cpuid(vcpu, eax, dummy, dummy, dummy);
+   kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
 
return 0;
 }
@@ -1259,10 +1263,6 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
svm-asid_generation = 0;
init_vmcb(svm);
 
-   err = fx_init(svm-vcpu);
-   if (err)
-   goto free_page4;
-
svm-vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
@@ -1271,8 +1271,6 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
 
return svm-vcpu;
 
-free_page4:
-   __free_page(hsave_page);
 free_page3:
__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page2:
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2fd2046..6adbad6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3934,8 +3934,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
u64 msr;
int ret;
 
-   vcpu-arch.regs_avail = ~((1  VCPU_REGS_RIP) | (1  VCPU_REGS_RSP));
-
vmx-rmode.vm86_active = 0;
 
vmx-soft_vnmi_blocked = 0;
@@ -3947,10 +3945,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
msr |= MSR_IA32_APICBASE_BSP;
kvm_set_apic_base(vmx-vcpu, msr);
 
-   ret = fx_init(vmx-vcpu);
-   if (ret != 0)
-   goto out;
-
vmx_segment_cache_clear(vmx);
 
seg_setup(VCPU_SREG_CS);
@@ -3991,7 +3985,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_rip_write(vcpu, 0xfff0);
else
kvm_rip_write(vcpu, 0);
-   kvm_register_write(vcpu, VCPU_REGS_RSP, 0);
 
vmcs_writel(GUEST_GDTR_BASE, 0);
vmcs_write32(GUEST_GDTR_LIMIT, 0x);
@@ -4041,7 +4034,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
/* HACK: Don't enable emulation on guest boot/reset */
vmx-emulation_required = 0;
 
-out:
return ret;
 }
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3bdaf29..57c76e8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6461,6 +6461,10 @@ static int kvm_vcpu_reset(struct kvm_vcpu *vcpu)
 
kvm_pmu_reset(vcpu);
 
+   memset(vcpu-arch.regs, 0, sizeof(vcpu-arch.regs));
+   vcpu-arch.regs_avail = ~0;
+   vcpu-arch.regs_dirty = ~0;
+
return kvm_x86_ops-vcpu_reset(vcpu);
 }
 
@@ -6629,11 +6633,17 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
if (!zalloc_cpumask_var(vcpu-arch.wbinvd_dirty_mask, GFP_KERNEL))
goto fail_free_mce_banks;
 
+   r = fx_init(vcpu);
+   if (r)
+   goto fail_free_wbinvd_dirty_mask;
+
vcpu-arch.ia32_tsc_adjust_msr = 0x0;
kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
 
return 0;
+fail_free_wbinvd_dirty_mask:
+   free_cpumask_var(vcpu-arch.wbinvd_dirty_mask);
 fail_free_mce_banks:
kfree(vcpu-arch.mce_banks);
 fail_free_lapic:
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] KVM: x86: Make register state after reset conform to specification

2012-12-05 Thread Julian Stecklina
Thus spake Gleb Natapov g...@redhat.com:

 -ret = fx_init(vmx-vcpu);
 -if (ret != 0)
 -goto out;
 -
 Label out is now unused. Compiler complains.

5th time's the charm. ;) Patch is updated.

Julian
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] KVM: x86: Make register state after reset conform to specification

2012-12-04 Thread Julian Stecklina
VMX behaves now as SVM wrt to FPU initialization. Code has been moved to
generic code path. General-purpose registers are now cleared on reset and
INIT.  SVM code properly initializes EDX.

Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
---
 arch/x86/kvm/cpuid.c |  1 +
 arch/x86/kvm/svm.c   | 14 ++
 arch/x86/kvm/vmx.c   |  7 ---
 arch/x86/kvm/x86.c   |  8 
 4 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0595f13..aa468c2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -659,6 +659,7 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, 
u32 *ecx, u32 *edx)
} else
*eax = *ebx = *ecx = *edx = 0;
 }
+EXPORT_SYMBOL_GPL(kvm_cpuid);
 
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index baead95..6c50121 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -20,6 +20,7 @@
 #include mmu.h
 #include kvm_cache_regs.h
 #include x86.h
+#include cpuid.h
 
 #include linux/module.h
 #include linux/mod_devicetable.h
@@ -1190,6 +1191,8 @@ static void init_vmcb(struct vcpu_svm *svm)
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   u32 dummy;
+   u32 eax = 1;
 
init_vmcb(svm);
 
@@ -1198,8 +1201,9 @@ static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
svm-vmcb-save.cs.base = svm-vcpu.arch.sipi_vector  12;
svm-vmcb-save.cs.selector = svm-vcpu.arch.sipi_vector  8;
}
-   vcpu-arch.regs_avail = ~0;
-   vcpu-arch.regs_dirty = ~0;
+
+   kvm_cpuid(vcpu, eax, dummy, dummy, dummy);
+   kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
 
return 0;
 }
@@ -1257,10 +1261,6 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
init_vmcb(svm);
kvm_write_tsc(svm-vcpu, 0);
 
-   err = fx_init(svm-vcpu);
-   if (err)
-   goto free_page4;
-
svm-vcpu.arch.apic_base = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(svm-vcpu))
svm-vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
@@ -1269,8 +1269,6 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
 
return svm-vcpu;
 
-free_page4:
-   __free_page(hsave_page);
 free_page3:
__free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page2:
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ff66a3b..85cecfe 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3942,8 +3942,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
u64 msr;
int ret;
 
-   vcpu-arch.regs_avail = ~((1  VCPU_REGS_RIP) | (1  VCPU_REGS_RSP));
-
vmx-rmode.vm86_active = 0;
 
vmx-soft_vnmi_blocked = 0;
@@ -3955,10 +3953,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
msr |= MSR_IA32_APICBASE_BSP;
kvm_set_apic_base(vmx-vcpu, msr);
 
-   ret = fx_init(vmx-vcpu);
-   if (ret != 0)
-   goto out;
-
vmx_segment_cache_clear(vmx);
 
seg_setup(VCPU_SREG_CS);
@@ -3999,7 +3993,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_rip_write(vcpu, 0xfff0);
else
kvm_rip_write(vcpu, 0);
-   kvm_register_write(vcpu, VCPU_REGS_RSP, 0);
 
vmcs_writel(GUEST_DR7, 0x400);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2966c84..2e031bf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6066,6 +6066,10 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 
kvm_pmu_reset(vcpu);
 
+   memset(vcpu-arch.regs, 0, sizeof(vcpu-arch.regs));
+   vcpu-arch.regs_avail = ~0;
+   vcpu-arch.regs_dirty = ~0;
+
return kvm_x86_ops-vcpu_reset(vcpu);
 }
 
@@ -6229,6 +6233,10 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
if (!zalloc_cpumask_var(vcpu-arch.wbinvd_dirty_mask, GFP_KERNEL))
goto fail_free_mce_banks;
 
+   r = fx_init(vcpu);
+   if (r)
+   goto fail_free_mce_banks;
+
kvm_async_pf_hash_reset(vcpu);
kvm_pmu_init(vcpu);
 
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM VMX: register state after reset violates spec

2012-12-03 Thread Julian Stecklina
Thus spake Gleb Natapov g...@redhat.com:

 On Thu, Nov 29, 2012 at 03:07:38PM +0100, Julian Stecklina wrote:
 Hello,
 
 we have noticed that at least on 3.6.8 with VMX after a VCPU has been
 reset via the INIT-SIPI-SIPI sequence its register state violates
 Intel's specification.
[...]
 Shouldn't vmx_vcpu_reset actively clear those registers? And from a
 quick glance at the SVM code the problem might exist there, too.
 
 It should, so why not move the fix to kvm_vcpu_reset() so it will work
 for both. Also what about R8-R15? Intel SDM says nothing about them in
 the section you mention, but in Volume 1 section 3.4.1.1 is says:
[...]
 I take it that they are undefined on the first transition to 64-bit mode
 too. AMD spec says that they should be zeroed on reset, so lets do that.
 Also SVM does not set EDX to correct value on reset.

I'll post a revised patch later today.

Julian
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] KVM: x86: Make register state after reset conform to specification

2012-12-03 Thread Julian Stecklina
Floating point initialization is moved to kvm_arch_vcpu_init. Added general 
purpose
register clearing to the same function. SVM code now properly initializes
EDX.

Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
---
 arch/x86/kvm/cpuid.c |  1 +
 arch/x86/kvm/svm.c   |  7 +--
 arch/x86/kvm/vmx.c   |  7 +--
 arch/x86/kvm/x86.c   | 12 +++-
 4 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0595f13..aa468c2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -659,6 +659,7 @@ void kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx, 
u32 *ecx, u32 *edx)
} else
*eax = *ebx = *ecx = *edx = 0;
 }
+EXPORT_SYMBOL_GPL(kvm_cpuid);
 
 void kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index baead95..642213a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -20,6 +20,7 @@
 #include mmu.h
 #include kvm_cache_regs.h
 #include x86.h
+#include cpuid.h
 
 #include linux/module.h
 #include linux/mod_devicetable.h
@@ -1190,6 +1191,7 @@ static void init_vmcb(struct vcpu_svm *svm)
 static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
 {
struct vcpu_svm *svm = to_svm(vcpu);
+   u32 dummy, eax;
 
init_vmcb(svm);
 
@@ -1198,8 +1200,9 @@ static int svm_vcpu_reset(struct kvm_vcpu *vcpu)
svm-vmcb-save.cs.base = svm-vcpu.arch.sipi_vector  12;
svm-vmcb-save.cs.selector = svm-vcpu.arch.sipi_vector  8;
}
-   vcpu-arch.regs_avail = ~0;
-   vcpu-arch.regs_dirty = ~0;
+
+   kvm_cpuid(vcpu, eax, dummy, dummy, dummy);
+   kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
 
return 0;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ff66a3b..363fb03 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3942,7 +3942,7 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
u64 msr;
int ret;
 
-   vcpu-arch.regs_avail = ~((1  VCPU_REGS_RIP) | (1  VCPU_REGS_RSP));
+   vcpu-arch.regs_avail = ~(1  VCPU_REGS_RIP);
 
vmx-rmode.vm86_active = 0;
 
@@ -3955,10 +3955,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
msr |= MSR_IA32_APICBASE_BSP;
kvm_set_apic_base(vmx-vcpu, msr);
 
-   ret = fx_init(vmx-vcpu);
-   if (ret != 0)
-   goto out;
-
vmx_segment_cache_clear(vmx);
 
seg_setup(VCPU_SREG_CS);
@@ -3999,7 +3995,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_rip_write(vcpu, 0xfff0);
else
kvm_rip_write(vcpu, 0);
-   kvm_register_write(vcpu, VCPU_REGS_RSP, 0);
 
vmcs_writel(GUEST_DR7, 0x400);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2966c84..9640ea5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6045,6 +6045,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 {
+   int ret;
+
atomic_set(vcpu-arch.nmi_queued, 0);
vcpu-arch.nmi_pending = 0;
vcpu-arch.nmi_injected = false;
@@ -6066,7 +6068,15 @@ int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
 
kvm_pmu_reset(vcpu);
 
-   return kvm_x86_ops-vcpu_reset(vcpu);
+   memset(vcpu-arch.regs, 0, sizeof(vcpu-arch.regs));
+   vcpu-arch.regs_avail = ~0;
+   vcpu-arch.regs_dirty = ~0;
+
+   if ((ret = fx_init(vcpu)) != 0) goto out;
+
+   ret = kvm_x86_ops-vcpu_reset(vcpu);
+out:
+   return ret;
 }
 
 int kvm_arch_hardware_enable(void *garbage)
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM VMX: register state after reset violates spec

2012-12-03 Thread Julian Stecklina
Thus spake Gleb Natapov g...@redhat.com:

 It should, so why not move the fix to kvm_vcpu_reset() so it will work
 for both. Also what about R8-R15? Intel SDM says nothing about them in
 the section you mention, but in Volume 1 section 3.4.1.1 is says:
[...]
 I take it that they are undefined on the first transition to 64-bit mode
 too. AMD spec says that they should be zeroed on reset, so lets do that.
 Also SVM does not set EDX to correct value on reset. It should be:

I have posted a new version of the patch taking your suggestions into
account. The VMX version is working for me. I could not test it on AMD
hardware, though.

Julian
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM VMX: register state after reset violates spec

2012-11-29 Thread Julian Stecklina
Hello,

we have noticed that at least on 3.6.8 with VMX after a VCPU has been
reset via the INIT-SIPI-SIPI sequence its register state violates
Intel's specification.

Specifically for our case we see at the end of vmx_vcpu_reset the
following vcpu state:

regs_avail=ffef regs_dirty=00010010
EIP= EAX=06e8 EBX=0001 ECX=8001 EDX=0600
ESI=d238 EDI= EBP= ESP=

although EAX, EBX, ECX, ESI, EDI, EBP, ESP should _all_ be zero. See
http://download.intel.com/products/processor/manual/253668.pdf section
9.1.1 (page 9-2).

Shouldn't vmx_vcpu_reset actively clear those registers? And from a
quick glance at the SVM code the problem might exist there, too.

A workaround is to use qemu-kvm with -kvm-no-irqchip.

Julian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Networking latency - what to expect?

2012-11-29 Thread Julian Stecklina
Thus spake David Mohr damaili...@mcbf.net:

 * vm-vm (same host)22k

This number is in the same ballpark as what I am seeing on pretty much
the same hardware.

AFAICS, there is little you can do to the current virtio-virtio code
path that would make this substantially faster.

Julian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM VMX: Make register state after reset conform to specification.

2012-11-29 Thread Julian Stecklina
Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
---
 arch/x86/kvm/vmx.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ff66a3b..ec5a3b3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3948,7 +3948,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx-soft_vnmi_blocked = 0;
 
-   vmx-vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vmx-vcpu, 0);
msr = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(vmx-vcpu))
@@ -3999,6 +3998,14 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_rip_write(vcpu, 0xfff0);
else
kvm_rip_write(vcpu, 0);
+
+   kvm_register_write(vcpu, VCPU_REGS_RAX, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RBX, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RCX, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RDX, get_rdx_init_val());
+   kvm_register_write(vcpu, VCPU_REGS_RSI, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RDI, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RBP, 0);
kvm_register_write(vcpu, VCPU_REGS_RSP, 0);
 
vmcs_writel(GUEST_DR7, 0x400);
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM VMX: Make register state after reset conform to specification.

2012-11-29 Thread Julian Stecklina
Signed-off-by: Julian Stecklina jstec...@os.inf.tu-dresden.de
---
 arch/x86/kvm/vmx.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ff66a3b..ec5a3b3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3948,7 +3948,6 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
 
vmx-soft_vnmi_blocked = 0;
 
-   vmx-vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(vmx-vcpu, 0);
msr = 0xfee0 | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(vmx-vcpu))
@@ -3999,6 +3998,14 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
kvm_rip_write(vcpu, 0xfff0);
else
kvm_rip_write(vcpu, 0);
+
+   kvm_register_write(vcpu, VCPU_REGS_RAX, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RBX, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RCX, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RDX, get_rdx_init_val());
+   kvm_register_write(vcpu, VCPU_REGS_RSI, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RDI, 0);
+   kvm_register_write(vcpu, VCPU_REGS_RBP, 0);
kvm_register_write(vcpu, VCPU_REGS_RSP, 0);
 
vmcs_writel(GUEST_DR7, 0x400);
-- 
1.7.11.7

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Networking latency - what to expect?

2012-11-29 Thread Julian Stecklina
Thus spake David Mohr damaili...@mcbf.net:

 On 2012-11-29 07:48, Julian Stecklina wrote:
 Thus spake David Mohr damaili...@mcbf.net:

 * vm-vm (same host)22k

 This number is in the same ballpark as what I am seeing on pretty
 much
 the same hardware.

 AFAICS, there is little you can do to the current virtio-virtio code
 path that would make this substantially faster.

 Thanks for the feedback. Considering that it's better than the
 hardware network performance my main issue is actually the latency of
 communication between VMs on different hosts:
 * vm-vm (diff. hosts)   7k

 It is obvious that there is a lot more going on compared to same host
 communication, but only ~30% of the performance when the network
 hardware should not be slowing it down (too) much?

You are probably better of using SR-IOV NICs with PCI passthrough in
this case.

Maybe someone can comment on whether virtual interrupt delivery and
posted interrupts[1] are already usable. The first one should help in
either the virtio and SR-IOV scenarios, the latter only applies to
SR-IOV (and PCI passthrough in general).

Julian

[1] http://www.spinics.net/lists/kvm/msg82762.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: Enable VMX-related bits in MSR_IA32_FEATURE_CONTROL.

2012-03-06 Thread Julian Stecklina
The spec (Vol 3C, Chapter 34.1) regarding the IA32_FEATURE_CONTROL MSR says 
Therefore the lock bit must be set after configuring support for Intel 
Virtualization Technology and prior to transferring control to an option ROM or 
the OS. and regarding bit 2: This bit enables VMX for system executive that 
do not require SMX.

Signed-off-by: Julian Stecklina j...@alien8.de
---
 arch/x86/kvm/vmx.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4ea7678..aef1e5b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2007,7 +2007,12 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 
msr_index, u64 *pdata)
 
switch (msr_index) {
case MSR_IA32_FEATURE_CONTROL:
-   *pdata = 0;
+/*
+ * If nested VMX is enabled, set the lock bit (bit 0)
+ * and the Enable VMX outside SMX bit (bit 2) in the
+ * FEATURE_CONTROL MSR.
+ */
+   *pdata = nested_vmx_allowed(vcpu) ? 0x5 : 0;
break;
case MSR_IA32_VMX_BASIC:
/*
-- 
1.7.9.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Enable VMX-related bits in MSR_IA32_FEATURE_CONTROL.

2012-03-06 Thread Julian Stecklina
Thus spake Avi Kivity a...@redhat.com:

 On 03/06/2012 05:02 PM, Julian Stecklina wrote:
 The spec (Vol 3C, Chapter 34.1) regarding the IA32_FEATURE_CONTROL MSR says 
 Therefore the lock bit must be set after configuring support for Intel 
 Virtualization Technology and prior to transferring control to an option ROM 
 or the OS. and regarding bit 2: This bit enables VMX for system executive 
 that do not require SMX.

 Signed-off-by: Julian Stecklina j...@alien8.de
 ---
  arch/x86/kvm/vmx.c |7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

 diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
 index 4ea7678..aef1e5b 100644
 --- a/arch/x86/kvm/vmx.c
 +++ b/arch/x86/kvm/vmx.c
 @@ -2007,7 +2007,12 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 
 msr_index, u64 *pdata)
  
  switch (msr_index) {
  case MSR_IA32_FEATURE_CONTROL:
 -*pdata = 0;
 +/*
 + * If nested VMX is enabled, set the lock bit (bit 0)
 + * and the Enable VMX outside SMX bit (bit 2) in the
 + * FEATURE_CONTROL MSR.
 + */
 +*pdata = nested_vmx_allowed(vcpu) ? 0x5 : 0;
  break;
  case MSR_IA32_VMX_BASIC:
  /*

 The way I read it, it should be done by the guest, not the host.

It should be done by the BIOS, before it hands off control to the OS
kernel. The question is whether you want to emulate this MSR at this
level or just set it to sane values when nested VMX should be enabled
and be happy. :)

Regards, Julian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Enable VMX-related bits in MSR_IA32_FEATURE_CONTROL.

2012-03-06 Thread Julian Stecklina
Am Dienstag, den 06.03.2012, 17:47 +0200 schrieb Nadav Har'El:
 On Tue, Mar 06, 2012, Avi Kivity wrote about Re: [PATCH] KVM: Enable 
 VMX-related bits in MSR_IA32_FEATURE_CONTROL.:
 case MSR_IA32_FEATURE_CONTROL:
   - *pdata = 0;
   +/*
   + * If nested VMX is enabled, set the lock bit (bit 0)
   + * and the Enable VMX outside SMX bit (bit 2) in the
   + * FEATURE_CONTROL MSR.
   + */
   + *pdata = nested_vmx_allowed(vcpu) ? 0x5 : 0;
 
 0x5 can be written as FEATURE_CONTROL_LOCKED |
   FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX

Nice. Didn't know those constants. Next time I'll try harder to find
those. :)

 
 break;
 case MSR_IA32_VMX_BASIC:
 /*
  
  The way I read it, it should be done by the guest, not the host.
 
 This is also how I understand it. Check out KVM's own hardware_enable()
 to see how a guest does turn on these bits before using VMXON - it
 doesn't need to rely on the BIOS to have done it earlier (the BIOS, can,
 however, prevent the guest from doing this by setting only the lock bit).

After looking through the code (vmx_disabled_by_bios), it seems that KVM
doesn't bother with FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX if
FEATURE_CONTROL_LOCKED is not set. It seems like our kernel should do
the same. Sorry for the noise.

 What is true, however, is that the existing code is probably incomplete
 in three ways (see section 20.7 of the SDM):
 
  1. It always returns 0 for this MSR, even if the guest previously set it
 to something else. 
 
  2. handle_vmon() does not check the previous setting of this MSR.
 If the guest (or its BIOS) doesn't set both FEATURE_CONTROL_LOCKED
 and FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX, it should get a
 #GP on an attempt to VMXON. This will allow L1's BIOS to disable
 nested VMX if it wishes (not that I think this is a very useful
 usecase...).
 
  3. vmx_set_vmx_msr to MSR_IA32_FEATURE_CONTROL should throw a #GP if
 FEATURE_CONTROL_LOCKED is on.
 
 I'll create a patch to do this shortly.

This is greatly appreciated!

Regards, Julian



signature.asc
Description: This is a digitally signed message part


Re: [PATCH] KVM: Don't mistreat edge-triggered INIT IPI as INIT de-assert. (LAPIC)

2012-01-13 Thread Julian Stecklina
Am Freitag, den 13.01.2012, 08:52 -0200 schrieb Marcelo Tosatti:
 On Thu, Jan 12, 2012 at 06:07:51PM +0100, Julian Stecklina wrote:
  Am Freitag, den 23.12.2011, 08:40 -0200 schrieb Marcelo Tosatti:
   On Mon, Dec 19, 2011 at 02:14:27AM +0100, Julian Stecklina wrote:
If the guest programs an IPI with level=0 (de-assert) and trig_mode=0 
(edge),
it is erroneously treated as INIT de-assert and ignored, but to quote 
the
spec: For this delivery mode [INIT de-assert], the level flag must be 
set to
0 and trigger mode flag to 1.
   
   Yes, the implementation ignores INIT de-assert. Quoting the spec:
   
   (INIT Level De-assert) (Not supported in the Pentium 4 and Intel Xeon
   processors.)
   
   Your patch below is not improving the implementation to be closer to the
   spec: it'll trigger the INIT state initialization with trig_mode == 0
   (which is not in accordance with your spec quote above).
  
  After reading the spec again and consulting with the guy who wrote the
  code triggering this, it seems the whole if (level) in the code path
  below is superfluous. 
 
 No. Look at whats inside if (level): the mp_state assignment is the
 internal implementation of delivers an INIT request to the target
 processor.
 
 According to the spec, the INIT level de-assert 
 
 Sends a synchronization message to all the local APICs in the system
 to set their arbitration IDs (stored in their Arb ID registers) to the
 values of their APIC IDs (see Section 10.7, “System and APIC Bus
 Arbitration”).
 
 So if you remove the if (level) check, INIT de-assert will be emulated
 as INIT!

Newer processors don't support INIT level de-assert and will interpret
this as INIT. Without the if (level) check, KVM would behave in the
same way, thus not breaking code that actually runs on real processors.

For processors that still supported INIT level de-assert: If you look
into older specs (243192), you read:

101 (INIT) ... INIT is treated as an edge triggered interrupt even if
programmed otherwise.

101 (INIT Level De-assert) The trigger mode must also be set to 1 and
level mode to 0.

This means that if you don't set trigger mode to 1, you will get an INIT
instead of INIT level de-assert. This is where the current code in KVM
is wrong. So with my original patch, KVM would behave like the old spec
mandates (check for trigger mode). With the if (level) check removed,
it would behave like recent processors. Either way, the current code is
bogus.

Regards, Julian


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] KVM: Don't mistreat edge-triggered INIT IPI as INIT de-assert. (LAPIC)

2012-01-12 Thread Julian Stecklina
Am Freitag, den 23.12.2011, 08:40 -0200 schrieb Marcelo Tosatti:
 On Mon, Dec 19, 2011 at 02:14:27AM +0100, Julian Stecklina wrote:
  If the guest programs an IPI with level=0 (de-assert) and trig_mode=0 
  (edge),
  it is erroneously treated as INIT de-assert and ignored, but to quote the
  spec: For this delivery mode [INIT de-assert], the level flag must be set 
  to
  0 and trigger mode flag to 1.
 
 Yes, the implementation ignores INIT de-assert. Quoting the spec:
 
 (INIT Level De-assert) (Not supported in the Pentium 4 and Intel Xeon
 processors.)
 
 Your patch below is not improving the implementation to be closer to the
 spec: it'll trigger the INIT state initialization with trig_mode == 0
 (which is not in accordance with your spec quote above).

After reading the spec again and consulting with the guy who wrote the
code triggering this, it seems the whole if (level) in the code path
below is superfluous. Regarding level my spec says:

This flag has no meaning in Pentium 4 and Intel Xeon processors, and
will always be issued as a 1.

Judging from Section 10.2 this means every CPU where the LAPICs
communicate over the system bus (everything after and including the P4).
Thus the if can be removed, since its condition is always true
(regardless of what the programmer actually programmed into the level
field).

If there is no objection, I'll submit a revised patch.

  Signed-off-by: Julian Stecklina j...@alien8.de
  ---
   arch/x86/kvm/lapic.c |2 +-
   1 files changed, 1 insertions(+), 1 deletions(-)
  
  diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
  index a7f3e65..260770d 100644
  --- a/arch/x86/kvm/lapic.c
  +++ b/arch/x86/kvm/lapic.c
  @@ -433,7 +433,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, 
  int delivery_mode,
  break;
   
  case APIC_DM_INIT:
  -   if (level) {
  +   if (!trig_mode || level) {
  result = 1;
  vcpu-arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
  kvm_make_request(KVM_REQ_EVENT, vcpu);
  -- 
  1.7.7.4
  

Regards, Julian



signature.asc
Description: This is a digitally signed message part


Re: [PATCH] KVM: Don't mistreat edge-triggered INIT IPI as INIT de-assert. (LAPIC)

2011-12-23 Thread Julian Stecklina
On Fr, 2011-12-23 at 08:40 -0200, Marcelo Tosatti wrote:
 On Mon, Dec 19, 2011 at 02:14:27AM +0100, Julian Stecklina wrote:
  If the guest programs an IPI with level=0 (de-assert) and trig_mode=0 
  (edge),
  it is erroneously treated as INIT de-assert and ignored, but to quote the
  spec: For this delivery mode [INIT de-assert], the level flag must be set 
  to
  0 and trigger mode flag to 1.
 
 Yes, the implementation ignores INIT de-assert. Quoting the spec:
 
 (INIT Level De-assert) (Not supported in the Pentium 4 and Intel Xeon
 processors.)
 
 Your patch below is not improving the implementation to be closer to the
 spec: it'll trigger the INIT state initialization with trig_mode == 0
 (which is not in accordance with your spec quote above).

I think our code that triggers this does weird things. Let me check
that. Until then ignore that patch and sorry for the noise. :) At least
it seems as if real hardware is interpreting the spec in a slightly
different way...

Regards, Julian



signature.asc
Description: This is a digitally signed message part


[PATCH] KVM: Don't mistreat edge-triggered INIT IPI as INIT de-assert. (LAPIC)

2011-12-18 Thread Julian Stecklina
If the guest programs an IPI with level=0 (de-assert) and trig_mode=0 (edge),
it is erroneously treated as INIT de-assert and ignored, but to quote the
spec: For this delivery mode [INIT de-assert], the level flag must be set to
0 and trigger mode flag to 1.

Signed-off-by: Julian Stecklina j...@alien8.de
---
 arch/x86/kvm/lapic.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a7f3e65..260770d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -433,7 +433,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int 
delivery_mode,
break;
 
case APIC_DM_INIT:
-   if (level) {
+   if (!trig_mode || level) {
result = 1;
vcpu-arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
kvm_make_request(KVM_REQ_EVENT, vcpu);
-- 
1.7.7.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Allow aligned byte and word writes to IOAPIC registers.

2011-11-23 Thread Julian Stecklina
On Mi, 2011-11-23 at 12:47 +0200, Avi Kivity wrote:
 On 11/22/2011 06:09 PM, Julian Stecklina wrote:
  This fixes byte accesses to IOAPIC_REG_SELECT as mandated by at least the
  ICH10 and Intel Series 5 chipset specs. It also makes ioapic_mmio_write
  consistent with ioapic_mmio_read, which also allows byte and word accesses.
 
 
 Your patch indents with spaces, while Linux uses tabs for indents.

True. I'll post a revised version in a minute. I think this is my second
patch to Linux in total, so I am slowly getting there... Thanks for the
patience.

Regards, Julian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2] KVM: Allow aligned byte and word writes to IOAPIC registers.

2011-11-23 Thread Julian Stecklina
This fixes byte accesses to IOAPIC_REG_SELECT as mandated by at least the
ICH10 and Intel Series 5 chipset specs. It also makes ioapic_mmio_write
consistent with ioapic_mmio_read, which also allows byte and word accesses.

Signed-off-by: Julian Stecklina j...@alien8.de
---
 virt/kvm/ioapic.c |   15 ---
 1 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 3eed61e..71e2253 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -332,9 +332,18 @@ static int ioapic_mmio_write(struct kvm_io_device *this, 
gpa_t addr, int len,
 (void*)addr, len, val);
ASSERT(!(addr  0xf));  /* check alignment */
 
-   if (len == 4 || len == 8)
+   switch (len) {
+   case 8:
+   case 4:
data = *(u32 *) val;
-   else {
+   break;
+   case 2:
+   data = *(u16 *) val;
+   break;
+   case 1:
+   data = *(u8  *) val;
+   break;
+   default:
printk(KERN_WARNING ioapic: Unsupported size %d\n, len);
return 0;
}
@@ -343,7 +352,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, 
gpa_t addr, int len,
spin_lock(ioapic-lock);
switch (addr) {
case IOAPIC_REG_SELECT:
-   ioapic-ioregsel = data;
+   ioapic-ioregsel = data  0xFF; /* 8-bit register */
break;
 
case IOAPIC_REG_WINDOW:
-- 
1.7.7.3

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


IOAPIC doesn't handle byte writes

2011-11-22 Thread Julian Stecklina
Hello,

KVM emulates an IOAPIC that doesn't handle byte writes to its
IOAPIC_REG_SELECT register, although for example the ICH10 spec[1]
clearly states that this is an 8-bit register. See
http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-10-family-datasheet.pdf
 Table 13-4 on page 433.

The code in question is:

http://git.kernel.org/?p=virt/kvm/kvm.git;a=blob;f=virt/kvm/ioapic.c;h=3eed61eb48675a63dd1f31b0095217ab6bc5f646;hb=HEAD#l323

This breaks IOAPIC code in OSes that adhere to the spec.

I've created a small testcase[1]:

$ qemu-kvm -serial stdio -kernel ioapic
[26303.961804] ioapic: Unsupported size 1
IOAPIC ID  
[26303.970466] ioapic: Unsupported size 1
IOAPIC VER 
Done
qemu: terminating on signal 2
$ qemu-kvm  -no-kvm-irqchip -serial stdio -kernel ioapic 
IOAPIC ID  
IOAPIC VER 00170011
Done
qemu: terminating on signal 2

Expected behavior is that the IOAPIC register is not read as zero with
KVM irqchip emulation.

I would file a bug, but the kernel bugzilla seems to be down at the
moment.

Regards, Julian

[1] http://os.inf.tu-dresden.de/~jsteckli/tmp/ioapic



signature.asc
Description: This is a digitally signed message part


Re: IOAPIC doesn't handle byte writes

2011-11-22 Thread Julian Stecklina
Hello,

Avi Kivity wrote:
 Care to post a patch instead?

Sure. Never hacked KVM, though. Is there a particular reason why the
void *val argument to ioapic_mmio_read/_write is only dereferenced when
ioapic-lock is not held?

Regards, Julian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Allow aligned byte and word writes to IOAPIC registers.

2011-11-22 Thread Julian Stecklina
This fixes byte accesses to IOAPIC_REG_SELECT as mandated by at least the
ICH10 and Intel Series 5 chipset specs. It also makes ioapic_mmio_write
consistent with ioapic_mmio_read, which also allows byte and word accesses.

Signed-off-by: Julian Stecklina j...@alien8.de
---
 virt/kvm/ioapic.c |   17 +
 1 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 3eed61e..e94ef6ba 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -332,9 +332,18 @@ static int ioapic_mmio_write(struct kvm_io_device *this, 
gpa_t addr, int len,
 (void*)addr, len, val);
ASSERT(!(addr  0xf));  /* check alignment */
 
-   if (len == 4 || len == 8)
-   data = *(u32 *) val;
-   else {
+switch (len) {
+case 8:
+case 4:
+data = *(u32 *) val;
+break;
+case 2:
+data = *(u16 *) val;
+break;
+case 1:
+data = *(u8  *) val;
+break;
+default:
printk(KERN_WARNING ioapic: Unsupported size %d\n, len);
return 0;
}
@@ -343,7 +352,7 @@ static int ioapic_mmio_write(struct kvm_io_device *this, 
gpa_t addr, int len,
spin_lock(ioapic-lock);
switch (addr) {
case IOAPIC_REG_SELECT:
-   ioapic-ioregsel = data;
+   ioapic-ioregsel = data  0xFF; /* 8-bit register */
break;
 
case IOAPIC_REG_WINDOW:
-- 
1.7.7.3


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/15] Add NMI injection support to SVM.

2009-04-19 Thread Julian Stecklina
Gleb Natapov g...@redhat.com writes:

 On Sun, Apr 19, 2009 at 04:43:12PM +0300, Avi Kivity wrote:
 Gleb Natapov wrote:
 On Sun, Apr 19, 2009 at 04:40:51PM +0300, Avi Kivity wrote:
   
 Gleb Natapov wrote:
 
 It's OK as a first step towards correct NMI emulation. Additionally, you
 could enable the IRQ window interception in case the is an NMI pending.
 The resulting behavior should then much like the VNMI mask emulation for
 vmx.

 
 Yeah, but the question is if IRQ windows is already opened will exit
 happens before or after IRET.
 
 You mean if the NMI handler enabled interrupts?

 
 Yes.

   

 Then the guest deserves whatever it gets...

 I suspect windows may do this since it uses NMI for task switching.

Could you elaborate on that? How/why does it use NMIs for task
switching?

Regards,
-- 
Julian Stecklina

The day Microsoft makes something that doesn't suck is probably the day
they start making vacuum cleaners - Ernst Jan Plugge

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] Fix task switching.

2009-03-30 Thread Julian Stecklina
Gleb Natapov g...@redhat.com writes:

 On Mon, Mar 30, 2009 at 06:35:05PM +0200, Jan Kiszka wrote:
  Haven't tried. I wrote my own tests for task switching. How can I check it?
  
 
 There is a test case attached to Julian's sourceforge-reported bug:
 
 https://sourceforge.net/tracker/?func=detailatid=893831aid=2681442group_id=180599
 
 Works for me.

Then the patches should be fine (at least for me *g*).

Regards,
-- 
Julian Stecklina

The day Microsoft makes something that doesn't suck is probably the day
they start making vacuum cleaners - Ernst Jan Plugge

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: Improvements for task switching

2009-03-23 Thread Julian Stecklina
Kohl, Bernhard (NSN - DE/Munich) bernhard.k...@nsn.com writes:

 Jan Kiszka Wrote:
[...]
 OK, after the discussion has finished, I will submit separate patches.

Is there any progress on this? I've been using this patch for several
days now with no ill effects.

The patch fixes Bug 2681442 for me:
https://sourceforge.net/tracker/?func=detailatid=893831aid=2681442group_id=180599

Regards,
-- 
Julian Stecklina

The day Microsoft makes something that doesn't suck is probably the day
they start making vacuum cleaners - Ernst Jan Plugge

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html