Re: Error in freeing hugepages with preemption enabled

2013-11-29 Thread Alexander Graf

On 29.11.2013, at 05:38, Bharat Bhushan bharat.bhus...@freescale.com wrote:

> Hi Alex,
>
> I am running a KVM guest on a host kernel that has CONFIG_PREEMPT enabled.
> With allocated pages things seem to work fine, but when I use hugepages for
> the guest I see the prints below when I quit qemu.
>
> (qemu) QEMU waiting for connection on: telnet:0.0.0.0:,server
> qemu-system-ppc64: pci_add_option_rom: failed to find romfile efi-virtio.rom
> q
> debug_smp_processor_id: 15 callbacks suppressed
> BUG: using smp_processor_id() in preemptible [] code: qemu-system-ppc/2504
> caller is .free_hugepd_range+0xb0/0x21c
> CPU: 1 PID: 2504 Comm: qemu-system-ppc Not tainted 3.12.0-rc3-07733-gabf4907 #175
> Call Trace:
> [c000fb433400] [c0007d38] .show_stack+0x7c/0x1cc (unreliable)
> [c000fb4334d0] [c05e8ce0] .dump_stack+0x9c/0xf4
> [c000fb433560] [c02de5ec] .debug_smp_processor_id+0x108/0x11c
> [c000fb4335f0] [c0025e10] .free_hugepd_range+0xb0/0x21c
> [c000fb433680] [c00265bc] .hugetlb_free_pgd_range+0x2c8/0x3b0
> [c000fb4337a0] [c00e428c] .free_pgtables+0x14c/0x158
> [c000fb433840] [c00ef320] .exit_mmap+0xec/0x194
> [c000fb433960] [c004d780] .mmput+0x64/0x124
> [c000fb4339e0] [c0051f40] .do_exit+0x29c/0x9c8
> [c000fb433ae0] [c00527c8] .do_group_exit+0x50/0xc4
> [c000fb433b70] [c00606a0] .get_signal_to_deliver+0x21c/0x5d8
> [c000fb433c70] [c0009b08] .do_signal+0x54/0x278
> [c000fb433db0] [c0009e50] .do_notify_resume+0x64/0x78
> [c000fb433e30] [cb44] .ret_from_except_lite+0x70/0x74
>
>
> This means that free_hugepd_range() must be called with preemption enabled.

with preemption disabled.

> I tried the change below and it seems to work fine (I do not have expertise
> in this area, so I am not sure this is the correct way)

Not sure - the scope looks odd to me. Let's ask Andrea - I'm sure he knows what 
to do :).


Alex

 
> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index d67db4b..6bf8459 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -563,8 +563,10 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
>                   */
>                  next = addr + (1 << hugepd_shift(*(hugepd_t *)pmd));
>  #endif
> +                preempt_disable();
>                  free_hugepd_range(tlb, (hugepd_t *)pmd, PMD_SHIFT,
>                                    addr, next, floor, ceiling);
> +                preempt_enable();
>          } while (addr = next, addr != end);
>
>          start &= PUD_MASK;
>
>
> Thanks
> -Bharat
>
> --
> To unsubscribe from this list: send the line unsubscribe kvm-ppc in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Bug

2013-11-29 Thread Otártics András

Hi,
   I think I found a bug that I do not want to post on any public 
bugtrackers of KVM.

  Please let me know a mail to write to.

Thank you in advance!
Bests,
András




Re: Bug

2013-11-29 Thread Gleb Natapov
On Fri, Nov 29, 2013 at 12:28:06PM +0100, Otártics András wrote:
> Hi,
> I think I found a bug that I do not want to post on any public
> bugtrackers of KVM.
> Please let me know a mail to write to.
>
Look up the KVM maintainers in the MAINTAINERS file in the root of the Linux
source tree and email them.

--
Gleb.


[PATCH 0/2] Intel MPX support at Qemu side

2013-11-29 Thread Liu, Jinsong
Intel has released a new version of the Intel Architecture Instruction Set
Extensions Programming Reference, adding new features such as AVX-512 and
MPX. Refer to
http://download-software.intel.com/sites/default/files/319433-015.pdf

These 2 patches are preparatory patches on the QEMU side to support the Intel
MPX feature.
PATCH 1/2 fixes a minor bug in how cpuid leaf 0x0d is parsed;
PATCH 2/2 exposes cpuid leaf (0xd, 3) and (0xd, 4) to the guest, fixes ebx, and
re-calculates ecx of cpuid leaf (0xd, 0).

Thanks,
Jinsong



[PATCH 1/2] target-i386: fix cpuid leaf 0x0d

2013-11-29 Thread Liu, Jinsong
From e4b58c7bafc4d9f913a572a1b1cfee91c92f1637 Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Fri, 22 Nov 2013 00:24:16 +0800
Subject: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d

Fix cpuid leaf 0x0d which incorrectly parsed eax and ebx.

Signed-off-by: Liu Jinsong jinsong@intel.com
---
 target-i386/cpu.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 864c80e..544b57f 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -335,7 +335,7 @@ typedef struct ExtSaveArea {

 static const ExtSaveArea ext_save_areas[] = {
     [2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX,
-            .offset = 0x100, .size = 0x240 },
+            .offset = 0x240, .size = 0x100 },
 };

 const char *get_register_name_32(unsigned int reg)
@@ -2225,8 +2225,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             const ExtSaveArea *esa = &ext_save_areas[count];
             if ((env->features[esa->feature] & esa->bits) == esa->bits &&
                 (kvm_mask & (1 << count)) != 0) {
-                *eax = esa->offset;
-                *ebx = esa->size;
+                *eax = esa->size;
+                *ebx = esa->offset;
             }
         }
         break;
--
1.7.1


[PATCH 2/2] target-i386: Intel MPX support

2013-11-29 Thread Liu, Jinsong
From aac033473bc88befe39a9add99820c0a7118ac90 Mon Sep 17 00:00:00 2001
From: root root@ljs.(none)
Date: Fri, 22 Nov 2013 00:24:35 +0800
Subject: [PATCH 2/2] target-i386: Intel MPX support

Expose cpuid leaf (0xd, 3) and (0xd, 4) to guest.
Fix ebx and re-calculate ecx of cpuid leaf (0xd, 0).

Signed-off-by: Liu Jinsong jinsong@intel.com
---
 target-i386/cpu.c |   34 ++
 target-i386/cpu.h |1 +
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 544b57f..7d04f28 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -330,12 +330,12 @@ X86RegisterInfo32 x86_reg_info_32[CPU_NB_REGS32] = {

 typedef struct ExtSaveArea {
     uint32_t feature, bits;
-    uint32_t offset, size;
 } ExtSaveArea;

 static const ExtSaveArea ext_save_areas[] = {
-    [2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX,
-            .offset = 0x240, .size = 0x100 },
+    [2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX },
+    [3] = { .feature = FEAT_7_0_EBX, .bits = CPUID_7_0_EBX_MPX },
+    [4] = { .feature = FEAT_7_0_EBX, .bits = CPUID_7_0_EBX_MPX },
 };

 const char *get_register_name_32(unsigned int reg)
@@ -2204,9 +2204,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             ((uint64_t)kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EDX) << 32);

         if (count == 0) {
-            *ecx = 0x240;
+            *ebx = *ecx = 0x240;
             for (i = 2; i < ARRAY_SIZE(ext_save_areas); i++) {
+                uint32_t offset, size;
                 const ExtSaveArea *esa = &ext_save_areas[i];
+
                 if ((env->features[esa->feature] & esa->bits) == esa->bits &&
                     (kvm_mask & (1 << i)) != 0) {
                     if (i < 32) {
@@ -2214,19 +2216,35 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                     } else {
                         *edx |= 1 << (i - 32);
                     }
-                    *ecx = MAX(*ecx, esa->offset + esa->size);
+
+                    size = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EAX);
+                    offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
+                    *ecx = MAX(*ecx, offset + size);
+
+                    /*
+                     * EBX here just in order to
+                     * 1. keep compatible with old qemu version, take AVX
+                     *    into account;
+                     * 2. keep compatible with old kernel version. Currently
+                     *    KVM has bug when expose cpuid 0xd to guest (include
+                     *    static value when guest booting and dynamic value
+                     *    when guest enables XCR0 features. EBX here can
+                     *    co-work with old buggy and new updated KVM, keep
+                     *    same value independent to CPU and kernel version.
+                     */
+                    if (i == 2)
+                        *ebx = MAX(*ebx, offset + size);
                 }
             }
             *eax |= kvm_mask & (XSTATE_FP | XSTATE_SSE);
-            *ebx = *ecx;
         } else if (count == 1) {
             *eax = kvm_arch_get_supported_cpuid(s, 0xd, 1, R_EAX);
         } else if (count < ARRAY_SIZE(ext_save_areas)) {
             const ExtSaveArea *esa = &ext_save_areas[count];
             if ((env->features[esa->feature] & esa->bits) == esa->bits &&
                 (kvm_mask & (1 << count)) != 0) {
-                *eax = esa->size;
-                *ebx = esa->offset;
+                *eax = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EAX);
+                *ebx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EBX);
             }
         }
         break;
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index ea373e8..9a838d1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -545,6 +545,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EBX_ERMS     (1 << 9)
 #define CPUID_7_0_EBX_INVPCID  (1 << 10)
 #define CPUID_7_0_EBX_RTM      (1 << 11)
+#define CPUID_7_0_EBX_MPX      (1 << 14)
 #define CPUID_7_0_EBX_RDSEED   (1 << 18)
 #define CPUID_7_0_EBX_ADX      (1 << 19)
 #define CPUID_7_0_EBX_SMAP     (1 << 20)
--
1.7.1


Re: [PATCH 2/2] target-i386: Intel MPX support

2013-11-29 Thread Paolo Bonzini
On 29/11/2013 14:17, Liu, Jinsong wrote:
> From aac033473bc88befe39a9add99820c0a7118ac90 Mon Sep 17 00:00:00 2001
> From: root root@ljs.(none)
> Date: Fri, 22 Nov 2013 00:24:35 +0800
> Subject: [PATCH 2/2] target-i386: Intel MPX support
>
> Expose cpuid leaf (0xd, 3) and (0xd, 4) to guest.
> Fix ebx and re-calculate ecx of cpuid leaf (0xd, 0).

There is no reason to get the size and offset from the host.  Peter
Anvin confirmed that the sizes and offsets will never change (as should
be the case for migration to work across different CPU versions).  In
fact, the size and offset are documented for every XSAVE feature except
MPX in the copy I have of the Intel documentation.

Please get the size and offset from the documentation, if it has been
updated, or from a real host, and hardcode them in QEMU.
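
As a rough illustration of the hardcoding approach requested above, the sketch below keeps each component's offset and size in a static table and derives the CPUID(0xD, 0) ECX value as the largest offset + size over the enabled components. Only the AVX entry (offset 0x240, size 0x100) and the 0x240 base come from this thread; the names `SaveArea`, `save_areas` and `xsave_area_size` are hypothetical and are not QEMU's. The MPX entries are left out on purpose because their values are not settled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: where one state component lives in the XSAVE area. */
typedef struct SaveArea {
    uint32_t offset; /* byte offset from the start of the XSAVE area */
    uint32_t size;   /* size of the component in bytes */
} SaveArea;

/* Indexed by XCR0 bit number. Only AVX (bit 2) is filled in; the values
 * are the ones quoted in patch 1/2 of this thread. */
static const SaveArea save_areas[] = {
    [2] = { .offset = 0x240, .size = 0x100 },
};

#define N_SAVE_AREAS (sizeof(save_areas) / sizeof(save_areas[0]))

/* Return the XSAVE area size needed for the components in enabled_mask.
 * 0x240 covers the legacy region plus the XSAVE header, as in the patch. */
static uint32_t xsave_area_size(uint64_t enabled_mask)
{
    uint32_t total = 0x240;

    for (size_t i = 2; i < N_SAVE_AREAS; i++) {
        if (enabled_mask & (1ULL << i)) {
            uint32_t end = save_areas[i].offset + save_areas[i].size;
            if (end > total) {
                total = end;
            }
        }
    }
    return total;
}
```

With only x87 and SSE enabled the function returns the 0x240 base; enabling AVX as well extends it to the end of the AVX component.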

Paolo



Re: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d

2013-11-29 Thread Paolo Bonzini
On 29/11/2013 14:15, Liu, Jinsong wrote:
> From e4b58c7bafc4d9f913a572a1b1cfee91c92f1637 Mon Sep 17 00:00:00 2001
> From: Liu Jinsong jinsong@intel.com
> Date: Fri, 22 Nov 2013 00:24:16 +0800
> Subject: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d
>
> Fix cpuid leaf 0x0d which incorrectly parsed eax and ebx.

There is no visible change, right (the two hunks cancel each other out)?
Since you will have to post a v2, please make this explicit in the
commit message.
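
To make the "cancel each other out" point concrete, here is a small sketch, assuming the values shown in the patch. Before the patch the table stored offset and size swapped and the CPUID code read them swapped again; after the patch both are the right way round. The names `Area`, `cpuid_before` and `cpuid_after` are made up for this illustration.

```c
#include <stdint.h>

typedef struct Area {
    uint32_t offset, size;
} Area;

/* Before the patch: fields are initialized swapped, and read swapped. */
static void cpuid_before(uint32_t *eax, uint32_t *ebx)
{
    static const Area avx = { .offset = 0x100, .size = 0x240 };

    *eax = avx.offset;
    *ebx = avx.size;
}

/* After the patch: fields hold what their names say, EAX = size, EBX = offset. */
static void cpuid_after(uint32_t *eax, uint32_t *ebx)
{
    static const Area avx = { .offset = 0x240, .size = 0x100 };

    *eax = avx.size;
    *ebx = avx.offset;
}
```

Both functions report the same register values, so a guest sees no difference; the patch only makes the field names match their contents.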

Thanks,

Paolo




[PATCH 0/4] Intel MPX support for KVM

2013-11-29 Thread Liu, Jinsong
Intel has released a new version of the Intel Architecture Instruction Set
Extensions Programming Reference, adding new features such as AVX-512 and
MPX. Refer to
http://download-software.intel.com/sites/default/files/319433-015.pdf

These patches add Intel MPX support to KVM.
PATCH 1/4 adds some MPX definitions;
PATCH 2/4 re-calculates cpuid(0xd,0) EBX;
PATCH 3/4 enables Intel Memory Protection Extension for the guest;
PATCH 4/4 handles Intel MPX in the vmx and msr code;

These patches are based on my ex-colleague Xudong's work; I am now helping him
push them.

Thanks,
Jinsong


[PATCH 1/4] X86: Intel MPX definiation

2013-11-29 Thread Liu, Jinsong
From 3a1a011100b38a275d8c95468c12c483e316bb15 Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Fri, 29 Nov 2013 01:27:00 +0800
Subject: [PATCH 1/4] X86: Intel MPX definiation

Signed-off-by: Xudong Hao xudong@intel.com
Reviewed-by: Liu Jinsong jinsong@intel.com
---
 arch/x86/include/asm/cpufeature.h |2 ++
 arch/x86/include/asm/xsave.h  |5 -
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 89270b4..1b00b01 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -216,6 +216,7 @@
 #define X86_FEATURE_ERMS       (9*32+ 9) /* Enhanced REP MOVSB/STOSB */
 #define X86_FEATURE_INVPCID    (9*32+10) /* Invalidate Processor Context ID */
 #define X86_FEATURE_RTM        (9*32+11) /* Restricted Transactional Memory */
+#define X86_FEATURE_MPX        (9*32+14) /* Memory Protection Extension */
 #define X86_FEATURE_RDSEED     (9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX        (9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP       (9*32+20) /* Supervisor Mode Access Prevention */
@@ -330,6 +331,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_perfctr_l2     boot_cpu_has(X86_FEATURE_PERFCTR_L2)
 #define cpu_has_cx8            boot_cpu_has(X86_FEATURE_CX8)
 #define cpu_has_cx16           boot_cpu_has(X86_FEATURE_CX16)
+#define cpu_has_mpx            boot_cpu_has(X86_FEATURE_MPX)
 #define cpu_has_eager_fpu      boot_cpu_has(X86_FEATURE_EAGER_FPU)
 #define cpu_has_topoext        boot_cpu_has(X86_FEATURE_TOPOEXT)
 
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 0415cda..d3e3ea5 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -9,6 +9,8 @@
 #define XSTATE_FP      0x1
 #define XSTATE_SSE     0x2
 #define XSTATE_YMM     0x4
+#define XSTATE_BNDREGS 0x8
+#define XSTATE_BNDCSR  0x10
 
 #define XSTATE_FPSSE   (XSTATE_FP | XSTATE_SSE)
 
@@ -23,7 +25,8 @@
 /*
  * These are the features that the OS can handle currently.
  */
-#define XCNTXT_MASK    (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_MASK    (XSTATE_FP | XSTATE_SSE | XSTATE_YMM | \
+                        XSTATE_BNDREGS | XSTATE_BNDCSR)
 
 #ifdef CONFIG_X86_64
 #define REX_PREFIX     "0x48, "
-- 
1.7.1


0001-X86-Intel-MPX-definiation.patch
Description: 0001-X86-Intel-MPX-definiation.patch


[PATCH 2/4] KVM/X86: Fix xsave cpuid exposing bug

2013-11-29 Thread Liu, Jinsong
From b060be65e466291c91963e58c4880ec614d0b294 Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Fri, 29 Nov 2013 01:27:53 +0800
Subject: [PATCH 2/4] KVM/X86: Fix xsave cpuid exposing bug

EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable.
Bit 63 of XCR0 is reserved for future expansion.

Signed-off-by: Liu Jinsong jinsong@intel.com
---
 arch/x86/include/asm/xsave.h |2 ++
 arch/x86/kvm/cpuid.c |4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index d3e3ea5..6120e74 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -13,6 +13,8 @@
 #define XSTATE_BNDCSR  0x10
 
 #define XSTATE_FPSSE   (XSTATE_FP | XSTATE_SSE)
+/* Bit 63 of XCR0 is reserved for future expansion */
+#define XSTATE_EXTEND_MASK     (~(XSTATE_FPSSE | (1 << 63)))
 
 #define FXSAVE_SIZE    512
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c697625..a8ce117 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -28,7 +28,7 @@ static u32 xstate_required_size(u64 xstate_bv)
        int feature_bit = 0;
        u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
 
-       xstate_bv &= ~XSTATE_FPSSE;
+       xstate_bv &= XSTATE_EXTEND_MASK;
        while (xstate_bv) {
                if (xstate_bv & 0x1) {
                        u32 eax, ebx, ecx, edx;
@@ -74,7 +74,7 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
                vcpu->arch.guest_supported_xcr0 =
                        (best->eax | ((u64)best->edx << 32)) &
                        host_xcr0 & KVM_SUPPORTED_XCR0;
-               vcpu->arch.guest_xstate_size =
+               vcpu->arch.guest_xstate_size = best->ebx =
                        xstate_required_size(vcpu->arch.guest_supported_xcr0);
        }
 
-- 
1.7.1


0002-KVM-X86-Fix-xsave-cpuid-exposing-bug.patch
Description: 0002-KVM-X86-Fix-xsave-cpuid-exposing-bug.patch
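
The commit message of patch 2/4 says that EBX of cpuid(0xD, 0) depends on which XCR0 features are enabled, and that bit 63 is reserved. A user-space sketch of that size walk might look like the following. It is not the kernel code: `fake_cpuid_0xd` is a made-up stand-in for the real CPUID instruction, filled only with the AVX values quoted elsewhere in this thread, and the mask is spelled with `1ULL << 63` as an assumption of this sketch, because shifting a plain `int` by 63 is undefined in C.

```c
#include <stdint.h>

#define XSTATE_FP    0x1ULL
#define XSTATE_SSE   0x2ULL
#define XSTATE_FPSSE (XSTATE_FP | XSTATE_SSE)
/* Everything except x87/SSE and the reserved bit 63. */
#define XSTATE_EXTEND_MASK (~(XSTATE_FPSSE | (1ULL << 63)))

/* Stand-in for CPUID(0xD, n): EAX reports the size, EBX the offset. */
struct component {
    uint32_t size;
    uint32_t offset;
};

static const struct component fake_cpuid_0xd[64] = {
    [2] = { .size = 0x100, .offset = 0x240 }, /* AVX */
};

/* Walk the enabled extended features and return the XSAVE size they need. */
static uint32_t required_size(uint64_t xstate_bv)
{
    uint32_t ret = 0x240; /* legacy region (512) + XSAVE header (64) */
    int bit = 0;

    xstate_bv &= XSTATE_EXTEND_MASK;
    while (xstate_bv) {
        if (xstate_bv & 1) {
            uint32_t end = fake_cpuid_0xd[bit].offset + fake_cpuid_0xd[bit].size;
            if (end > ret) {
                ret = end;
            }
        }
        xstate_bv >>= 1;
        bit++;
    }
    return ret;
}
```

Because the result depends on the bits passed in, calling it with the currently enabled XCR0 value rather than the supported set gives a size that changes as the guest enables features.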


[PATCH 3/4] KVM/X86: Enable Intel MPX for guest

2013-11-29 Thread Liu, Jinsong
From 11ae33723027c7b8e53a8c109f127800d7f0ad6e Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Fri, 29 Nov 2013 01:28:19 +0800
Subject: [PATCH 3/4] KVM/X86: Enable Intel MPX for guest

Enable Intel Memory Protection Extension for guest.

Signed-off-by: Xudong Hao xudong@intel.com
Reviewed-by: Liu Jinsong jinsong@intel.com
---
 arch/x86/kvm/cpuid.c |4 ++--
 arch/x86/kvm/x86.c   |   14 --
 arch/x86/kvm/x86.h   |3 ++-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a8ce117..e30d4ce 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -75,7 +75,7 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
                        (best->eax | ((u64)best->edx << 32)) &
                        host_xcr0 & KVM_SUPPORTED_XCR0;
                vcpu->arch.guest_xstate_size = best->ebx =
-                       xstate_required_size(vcpu->arch.guest_supported_xcr0);
+                       xstate_required_size(vcpu->arch.xcr0);
        }
 
        kvm_pmu_cpuid_update(vcpu);
@@ -303,7 +303,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
        /* cpuid 7.0.ebx */
        const u32 kvm_supported_word9_x86_features =
                F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
-               F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
+               F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | F(MPX);
 
        /* all calls to cpuid_count() should be made on the same cpu */
        get_cpu();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21ef1ba..6e38698 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -576,13 +576,13 @@ static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
 
 int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
 {
-       u64 xcr0;
+       u64 xcr0 = xcr;
+       u64 old_xcr0 = vcpu->arch.xcr0;
        u64 valid_bits;
 
        /* Only support XCR_XFEATURE_ENABLED_MASK(xcr0) now  */
        if (index != XCR_XFEATURE_ENABLED_MASK)
                return 1;
-       xcr0 = xcr;
        if (!(xcr0 & XSTATE_FP))
                return 1;
        if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE))
@@ -597,8 +597,15 @@ int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
        if (xcr0 & ~valid_bits)
                return 1;
 
+       if ((!(xcr0 & XSTATE_BNDREGS)) != (!(xcr0 & XSTATE_BNDCSR)))
+               return 1;
+
        kvm_put_guest_xcr0(vcpu);
        vcpu->arch.xcr0 = xcr0;
+
+       if ((xcr0 ^ old_xcr0) & XSTATE_EXTEND_MASK)
+               kvm_update_cpuid(vcpu);
+
        return 0;
 }
 
@@ -5960,6 +5967,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
        preempt_disable();
 
        kvm_x86_ops->prepare_guest_switch(vcpu);
+       if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) &&
+           (vcpu->arch.xcr0 & (u64)(XSTATE_BNDREGS | XSTATE_BNDCSR)))
+               kvm_x86_ops->fpu_activate(vcpu);
        if (vcpu->fpu_active)
                kvm_load_guest_fpu(vcpu);
        kvm_load_guest_xcr0(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 587fb9e..985e40e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -122,7 +122,8 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
        gva_t addr, void *val, unsigned int bytes,
        struct x86_exception *exception);
 
-#define KVM_SUPPORTED_XCR0     (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define KVM_SUPPORTED_XCR0     (XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
+                               | XSTATE_BNDREGS | XSTATE_BNDCSR)
 extern u64 host_xcr0;
 
 extern struct static_key kvm_no_apic_vcpu;
-- 
1.7.1


0003-KVM-X86-Enable-Intel-MPX-for-guest.patch
Description: 0003-KVM-X86-Enable-Intel-MPX-for-guest.patch
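
Patch 3/4 adds a check to `__kvm_set_xcr` that rejects an XCR0 value in which only one of the two MPX bits is set. The sketch below isolates that consistency rule together with the two older checks visible in the same hunk. It is a simplified user-space illustration: the function name `xcr0_is_consistent` is invented here, and the `valid_bits` check from the real function is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in patch 1/4 of this series. */
#define XSTATE_FP      0x1ULL
#define XSTATE_SSE     0x2ULL
#define XSTATE_YMM     0x4ULL
#define XSTATE_BNDREGS 0x8ULL
#define XSTATE_BNDCSR  0x10ULL

/* Return true if xcr0 passes the consistency rules shown in the patch. */
static bool xcr0_is_consistent(uint64_t xcr0)
{
    /* x87 state must always be enabled. */
    if (!(xcr0 & XSTATE_FP)) {
        return false;
    }
    /* AVX state requires SSE state. */
    if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE)) {
        return false;
    }
    /* The two MPX bits must be set or cleared together. Applying `!`
     * to each test normalizes it to 0 or 1 before the comparison. */
    if ((!(xcr0 & XSTATE_BNDREGS)) != (!(xcr0 & XSTATE_BNDCSR))) {
        return false;
    }
    return true;
}
```

For example, `XSTATE_FP | XSTATE_BNDREGS` is rejected, while `XSTATE_FP | XSTATE_BNDREGS | XSTATE_BNDCSR` is accepted.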


[PATCH 4/4] KVM/X86: Intel MPX vmx and msr handle

2013-11-29 Thread Liu, Jinsong
From 7532bdffe9f74db65f6eff733cb227a66bef932e Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Sat, 30 Nov 2013 00:27:02 +0800
Subject: [PATCH 4/4] KVM/X86: Intel MPX vmx and msr handle

Signed-off-by: Xudong Hao xudong@intel.com
Reviewed-by: Liu Jinsong jinsong@intel.com
---
 arch/x86/include/asm/vmx.h|2 ++
 arch/x86/include/uapi/asm/msr-index.h |1 +
 arch/x86/kvm/vmx.c|   13 +++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 966502d..1bf4681 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -85,6 +85,7 @@
 #define VM_EXIT_SAVE_IA32_EFER  0x0010
 #define VM_EXIT_LOAD_IA32_EFER  0x0020
 #define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER   0x0040
+#define VM_EXIT_CLEAR_BNDCFGS   0x0080
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR  0x00036dff
 
@@ -95,6 +96,7 @@
 #define VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL 0x2000
 #define VM_ENTRY_LOAD_IA32_PAT 0x4000
 #define VM_ENTRY_LOAD_IA32_EFER 0x8000
+#define VM_ENTRY_LOAD_BNDCFGS   0x0001
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR 0x11ff
 
diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index 37813b5..2a418c4 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -294,6 +294,7 @@
 #define MSR_SMI_COUNT                  0x0034
 #define MSR_IA32_FEATURE_CONTROL       0x003a
 #define MSR_IA32_TSC_ADJUST            0x003b
+#define MSR_IA32_BNDCFGS               0x0d90
 
 #define FEATURE_CONTROL_LOCKED                         (1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX       (1<<1)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b2fe1c2..aa23edf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -439,6 +439,7 @@ struct vcpu_vmx {
 #endif
                int           gs_ldt_reload_needed;
                int           fs_reload_needed;
+               u64           msr_host_bndcfgs;
        } host_state;
        struct {
                int vm86_active;
@@ -1647,6 +1648,8 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
        if (is_long_mode(&vmx->vcpu))
                wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
 #endif
+       if (cpu_has_mpx)
+               rdmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
        for (i = 0; i < vmx->save_nmsrs; ++i)
                kvm_set_shared_msr(vmx->guest_msrs[i].index,
                                   vmx->guest_msrs[i].data,
@@ -1684,6 +1687,8 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 #ifdef CONFIG_X86_64
        wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 #endif
+       if (cpu_has_mpx)
+               wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
        /*
         * If the FPU is not active (through the host task or
         * the guest vcpu), then restore the cr0.TS bit.
@@ -2800,7 +2805,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
        min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
 #endif
        opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
-               VM_EXIT_ACK_INTR_ON_EXIT;
+               VM_EXIT_ACK_INTR_ON_EXIT | VM_EXIT_CLEAR_BNDCFGS;
        if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
                                &_vmexit_control) < 0)
                return -EIO;
@@ -2817,7 +2822,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
                _pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
 
        min = 0;
-       opt = VM_ENTRY_LOAD_IA32_PAT;
+       opt = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
        if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
                                &_vmentry_control) < 0)
                return -EIO;
@@ -8636,6 +8641,10 @@ static int __init vmx_init(void)
        vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
        vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
        vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
+       if ((vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS) &&
+           (vmcs_config.vmexit_ctrl & VM_EXIT_CLEAR_BNDCFGS))
+               vmx_disable_intercept_for_msr(MSR_IA32_BNDCFGS, true);
+
        memcpy(vmx_msr_bitmap_legacy_x2apic,
                        vmx_msr_bitmap_legacy, PAGE_SIZE);
        memcpy(vmx_msr_bitmap_longmode_x2apic,
-- 
1.7.1


0004-KVM-X86-Intel-MPX-vmx-and-msr-handle.patch
Description: 0004-KVM-X86-Intel-MPX-vmx-and-msr-handle.patch


Re: [PATCH 3/4] KVM/X86: Enable Intel MPX for guest

2013-11-29 Thread Paolo Bonzini
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index a8ce117..e30d4ce 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -75,7 +75,7 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
>                         (best->eax | ((u64)best->edx << 32)) &
>                         host_xcr0 & KVM_SUPPORTED_XCR0;
>                 vcpu->arch.guest_xstate_size = best->ebx =
> -                       xstate_required_size(vcpu->arch.guest_supported_xcr0);
> +                       xstate_required_size(vcpu->arch.xcr0);
>         }
>
>         kvm_pmu_cpuid_update(vcpu);
> ...
>         kvm_put_guest_xcr0(vcpu);
>         vcpu->arch.xcr0 = xcr0;
> +
> +       if ((xcr0 ^ old_xcr0) & XSTATE_EXTEND_MASK)
> +               kvm_update_cpuid(vcpu);
> +
>         return 0;
>  }

These hunks should be part of the previous patch.

> @@ -5960,6 +5967,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>         preempt_disable();
>
>         kvm_x86_ops->prepare_guest_switch(vcpu);
> +       if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) &&

Shouldn't be necessary, setting xcr0 fails unless OSXSAVE=1.

> +           (vcpu->arch.xcr0 & (u64)(XSTATE_BNDREGS | XSTATE_BNDCSR)))
> +               kvm_x86_ops->fpu_activate(vcpu);

Can you explain this?

>         if (vcpu->fpu_active)
>                 kvm_load_guest_fpu(vcpu);
>         kvm_load_guest_xcr0(vcpu);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 587fb9e..985e40e 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -122,7 +122,8 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
>         gva_t addr, void *val, unsigned int bytes,
>         struct x86_exception *exception);
>
> -#define KVM_SUPPORTED_XCR0     (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
> +#define KVM_SUPPORTED_XCR0     (XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
> +                               | XSTATE_BNDREGS | XSTATE_BNDCSR)
>  extern u64 host_xcr0;
>
>  extern struct static_key kvm_no_apic_vcpu;
>

Otherwise looks straightforward.

Paolo


RE: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d

2013-11-29 Thread Liu, Jinsong
Paolo Bonzini wrote:
> On 29/11/2013 14:15, Liu, Jinsong wrote:
>> From e4b58c7bafc4d9f913a572a1b1cfee91c92f1637 Mon Sep 17 00:00:00 2001
>> From: Liu Jinsong jinsong@intel.com
>> Date: Fri, 22 Nov 2013 00:24:16 +0800
>> Subject: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d
>>
>> Fix cpuid leaf 0x0d which incorrectly parsed eax and ebx.
>
> There is no visible change right (the two hunks cancel each other)?
> Since you will have to post a v2, please make this explicit in the
> commit message.
>

OK, I will add an explicit commit message, or drop this patch if needed.

Thanks,
Jinsong

 



RE: [PATCH 2/2] target-i386: Intel MPX support

2013-11-29 Thread Liu, Jinsong
Paolo Bonzini wrote:
 Il 29/11/2013 14:17, Liu, Jinsong ha scritto:
 From aac033473bc88befe39a9add99820c0a7118ac90 Mon Sep 17 00:00:00
 2001 
 From: root root@ljs.(none)
 Date: Fri, 22 Nov 2013 00:24:35 +0800
 Subject: [PATCH 2/2] target-i386: Intel MPX support
 
 Expose cpuid leaf (0xd, 3) and (0xd, 4) to guest.
 Fix ebx and re-calculate ecx of cpuid leaf (0xd, 0).
 
 There is no reason to get the size and offset from the host.  Peter
 Anvin confirmed that the sizes and offsets will never change (as
 should be the case for migration to work across different CPU
 versions).  In fact, the size and offset is documented for every
 XSAVE feature except MPX in the copy I have of the Intel
 documentation. 

If the sizes and offsets will never change, what's the bad effect of getting 
them from host?

 
 Please get the size and offset from the documentation, if it has been
 updated, or from a real host, and hardcode them in QEMU.
 

Hmm, the problem is that the values I was given do not match what a real test shows :(
For example, I was told XSTATE_BNDCSR_SIZE is 0x40, but a real test shows it is 
0x10.

Maybe getting them from real h/w is no worse than hardcoding them?

Thanks
Jinsong

 
 Signed-off-by: Liu Jinsong jinsong@intel.com
 ---
  target-i386/cpu.c |   34 ++
  target-i386/cpu.h |    1 +
  2 files changed, 27 insertions(+), 8 deletions(-)
 
 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 544b57f..7d04f28 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -330,12 +330,12 @@ X86RegisterInfo32
 x86_reg_info_32[CPU_NB_REGS32] = { 
 
  typedef struct ExtSaveArea {
  uint32_t feature, bits;
 -uint32_t offset, size;
  } ExtSaveArea;
 
  static const ExtSaveArea ext_save_areas[] = {
 -[2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX,
 -.offset = 0x240, .size = 0x100 },
 +[2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX },
 +[3] = { .feature = FEAT_7_0_EBX, .bits = CPUID_7_0_EBX_MPX },
 +[4] = { .feature = FEAT_7_0_EBX, .bits = CPUID_7_0_EBX_MPX }, 
 }; 
 
  const char *get_register_name_32(unsigned int reg)
 @@ -2204,9 +2204,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                  ((uint64_t)kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EDX) << 32);
 
          if (count == 0) {
 -            *ecx = 0x240;
 +            *ebx = *ecx = 0x240;
              for (i = 2; i < ARRAY_SIZE(ext_save_areas); i++) {
 +                uint32_t offset, size;
                  const ExtSaveArea *esa = &ext_save_areas[i];
 +
                  if ((env->features[esa->feature] & esa->bits) == esa->bits &&
                      (kvm_mask & (1 << i)) != 0) {
                      if (i < 32) {
 @@ -2214,19 +2216,35 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                      } else {
                          *edx |= 1 << (i - 32);
                      }
 -                    *ecx = MAX(*ecx, esa->offset + esa->size);
 +
 +                    size = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EAX);
 +                    offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
 +                    *ecx = MAX(*ecx, offset + size);
 +
 +                    /*
 +                     * EBX here just in order to
 +                     * 1. keep compatible with old qemu version, take AVX
 +                     *    into account;
 +                     * 2. keep compatible with old kernel version. Currently
 +                     *    KVM has bug when expose cpuid 0xd to guest (include
 +                     *    static value when guest booting and dynamic value
 +                     *    when guest enables XCR0 features. EBX here can
 +                     *    co-work with old buggy and new updated KVM, keep
 +                     *    same value independent to CPU and kernel version.
 +                     */
 +                    if (i == 2)
 +                        *ebx = MAX(*ebx, offset + size);
                  }
              }
              *eax |= kvm_mask & (XSTATE_FP | XSTATE_SSE);
 -            *ebx = *ecx;
          } else if (count == 1) {
              *eax = kvm_arch_get_supported_cpuid(s, 0xd, 1, R_EAX);
          } else if (count < ARRAY_SIZE(ext_save_areas)) {
              const ExtSaveArea *esa = &ext_save_areas[count];
              if ((env->features[esa->feature] & esa->bits) == esa->bits &&
                  (kvm_mask & (1 << count)) != 0) {
 -                *eax = esa->size;
 -                *ebx = esa->offset;
 +                *eax = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EAX);
 +                *ebx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EBX);
              }
          }
          break;
 diff --git a/target-i386/cpu.h b/target-i386/cpu.h
 index ea373e8..9a838d1 100644
 --- a/target-i386/cpu.h
 +++ b/target-i386/cpu.h
 @@ -545,6 +545,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
  #define CPUID_7_0_EBX_ERMS     (1 << 9)
  #define CPUID_7_0_EBX_INVPCID  (1 << 10)
  #define CPUID_7_0_EBX_RTM      (1 << 11)
 +#define 

RE: [PATCH 3/4] KVM/X86: Enable Intel MPX for guest

2013-11-29 Thread Liu, Jinsong
Paolo Bonzini wrote:
 diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
 index a8ce117..e30d4ce 100644
 --- a/arch/x86/kvm/cpuid.c
 +++ b/arch/x86/kvm/cpuid.c
 @@ -75,7 +75,7 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
              (best->eax | ((u64)best->edx << 32)) &
              host_xcr0 & KVM_SUPPORTED_XCR0;
          vcpu->arch.guest_xstate_size = best->ebx =
 -            xstate_required_size(vcpu->arch.guest_supported_xcr0);
 +            xstate_required_size(vcpu->arch.xcr0);
      }
 
      kvm_pmu_cpuid_update(vcpu);
 ...
      kvm_put_guest_xcr0(vcpu);
      vcpu->arch.xcr0 = xcr0;
 +
 +    if ((xcr0 ^ old_xcr0) & XSTATE_EXTEND_MASK)
 +        kvm_update_cpuid(vcpu);
 +
  return 0;
  }
 
 These hunks should be part of the previous patch.
 
 @@ -5960,6 +5967,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
      preempt_disable();
 
      kvm_x86_ops->prepare_guest_switch(vcpu);
 +    if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) &&
 
 Shouldn't be necessary, setting xcr0 fails unless OSXSAVE=1.
 
 +        (vcpu->arch.xcr0 & (u64)(XSTATE_BNDREGS | XSTATE_BNDCSR)))
 +        kvm_x86_ops->fpu_activate(vcpu);
 
 Can you explain this?

No, in fact I am wondering about it myself, but since it had been tested, I 
didn't change this code.
I will double check and drop it if needed (or maybe Xudong can elaborate?)

Thanks,
Jinsong

 
  if (vcpu->fpu_active)
  kvm_load_guest_fpu(vcpu);
  kvm_load_guest_xcr0(vcpu);
 diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
 index 587fb9e..985e40e 100644
 --- a/arch/x86/kvm/x86.h
 +++ b/arch/x86/kvm/x86.h
 @@ -122,7 +122,8 @@ int kvm_write_guest_virt_system(struct
  x86_emulate_ctxt *ctxt, gva_t addr, void *val, unsigned int bytes,
  struct x86_exception *exception);
 
 -#define KVM_SUPPORTED_XCR0  (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
 +#define KVM_SUPPORTED_XCR0  (XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
 +| XSTATE_BNDREGS | XSTATE_BNDCSR)
  extern u64 host_xcr0;
 
  extern struct static_key kvm_no_apic_vcpu;
 
 
 Otherwise looks straightforward.

Thanks, will update per your comments.



Re: [PATCH 4/4] KVM/X86: Intel MPX vmx and msr handle

2013-11-29 Thread Paolo Bonzini
Il 29/11/2013 14:44, Liu, Jinsong ha scritto:
 From 7532bdffe9f74db65f6eff733cb227a66bef932e Mon Sep 17 00:00:00 2001
 From: Liu Jinsong jinsong@intel.com
 Date: Sat, 30 Nov 2013 00:27:02 +0800
 Subject: [PATCH 4/4] KVM/X86: Intel MPX vmx and msr handle
 
 Signed-off-by: Xudong Hao xudong@intel.com
 Reviewed-by: Liu Jinsong jinsong@intel.com

This should be a Signed-off-by since you are posting the patch, not
Xudong Hao (same for patches 1 and 3).

I think this patch should go before the previous one.

Also see below.

 + if (cpu_has_mpx)
 + rdmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
   for (i = 0; i < vmx->save_nmsrs; ++i)
   kvm_set_shared_msr(vmx->guest_msrs[i].index,
  vmx->guest_msrs[i].data,
 @@ -1684,6 +1687,8 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
  #ifdef CONFIG_X86_64
   wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
  #endif
 + if (cpu_has_mpx)
 + wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);

This should be if (vmx->host_state.msr_host_bndcfgs), so that no WRMSR
is done if host_bndcfgs == 0 (which includes the case of !cpu_has_mpx).
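
The suggestion above can be sketched as a small user-space model. This is only an illustration, not the vmx.c code: fake_wrmsrl(), fake_bndcfgs and restore_host_bndcfgs() are hypothetical stand-ins, and the sketch assumes the MSR already reads zero after VM exit (which is what VM_EXIT_CLEAR_BNDCFGS in the patch is meant to provide).

```c
#include <stdint.h>

/* Hypothetical model state: the "MSR" and a counter of simulated writes. */
static uint64_t fake_bndcfgs;
static unsigned int wrmsr_count;

/* Stand-in for wrmsrl(MSR_IA32_BNDCFGS, val); not the real kernel helper. */
static void fake_wrmsrl(uint64_t val)
{
    fake_bndcfgs = val;
    wrmsr_count++;
}

/*
 * Restore the host value only when it is nonzero. A saved value of 0
 * covers both "host has no MPX" (nothing was ever read) and "host left
 * BNDCFGS at 0", so the write is skipped in both cases.
 */
static void restore_host_bndcfgs(uint64_t host_bndcfgs)
{
    if (host_bndcfgs)
        fake_wrmsrl(host_bndcfgs);
}
```

With this shape the separate cpu_has_mpx test on the restore path is not needed, since the saved value stays 0 on hosts where the MSR was never read.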

 @@ -2800,7 +2805,7 @@ static __init int setup_vmcs_config(struct vmcs_config 
 *vmcs_conf)
   min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
  #endif
   opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
 - VM_EXIT_ACK_INTR_ON_EXIT;
 + VM_EXIT_ACK_INTR_ON_EXIT | VM_EXIT_CLEAR_BNDCFGS;
 - opt = VM_ENTRY_LOAD_IA32_PAT;
 + opt = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
   if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
   &_vmentry_control) < 0)
   return -EIO;
 @@ -8636,6 +8641,10 @@ static int __init vmx_init(void)
   vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
   vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
   vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
 + if ((vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS) &&
 + (vmcs_config.vmexit_ctrl & VM_EXIT_CLEAR_BNDCFGS))
 + vmx_disable_intercept_for_msr(MSR_IA32_BNDCFGS, true);

Why only disable it in that case?  If the two bits are guaranteed to be present
for if (cpu_has_mpx), please use if (cpu_has_mpx) here or perhaps make it
unconditional.  If the two bits might not be there, you need to emulate them
using add_atomic_switch_msr.

Thanks,

Paolo

   memcpy(vmx_msr_bitmap_legacy_x2apic,
   vmx_msr_bitmap_legacy, PAGE_SIZE);
   memcpy(vmx_msr_bitmap_longmode_x2apic,
 



Re: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d

2013-11-29 Thread Paolo Bonzini
Il 29/11/2013 15:46, Liu, Jinsong ha scritto:
 Paolo Bonzini wrote:
 Il 29/11/2013 14:15, Liu, Jinsong ha scritto:
 From e4b58c7bafc4d9f913a572a1b1cfee91c92f1637 Mon Sep 17 00:00:00
 2001 
 From: Liu Jinsong jinsong@intel.com
 Date: Fri, 22 Nov 2013 00:24:16 +0800
 Subject: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d

 Fix cpuid leaf 0x0d which incorrectly parsed eax and ebx.

 There is no visible change right (the two hunks cancel each other)?
 Since you will have to post a v2, please make this explicit in the
 commit message.

 
 OK, will add explicit commit message, or, drop this patch if needed.

The patch is correct, so keep it please.  However, mention in the commit
message that the CPUID values were valid even before this patch.
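
To make the "two hunks cancel each other" point concrete, here is a minimal sketch. The struct and function names are hypothetical stand-ins for the QEMU code; only the constants 0x100 and 0x240 and the EAX/EBX assignments come from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for ExtSaveArea, reduced to the two fields involved. */
struct save_area {
    uint32_t offset;
    uint32_t size;
};

/* Before the patch: the initializer has the two values under the wrong names. */
static const struct save_area avx_before = { .offset = 0x100, .size = 0x240 };

/* After the patch: the field names match the values. */
static const struct save_area avx_after = { .offset = 0x240, .size = 0x100 };

/* Before the patch: EAX is read from .offset and EBX from .size. */
static void cpuid_0xd_2_before(uint32_t *eax, uint32_t *ebx)
{
    *eax = avx_before.offset;
    *ebx = avx_before.size;
}

/* After the patch: EAX is read from .size and EBX from .offset. */
static void cpuid_0xd_2_after(uint32_t *eax, uint32_t *ebx)
{
    *eax = avx_after.size;
    *ebx = avx_after.offset;
}
```

Both versions report EAX = 0x100 and EBX = 0x240, so the guest-visible CPUID values do not change; the patch only makes the field names agree with their contents.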

Also, the QEMU side needs support for transferring the state in and out
of KVM (kvm_put_xsave, kvm_get_xsave).  On top of this you can add
migration support using a new subsection of vmstate_cpu.

Thanks!

Paolo

 Thanks,
 Jinsong
 

 Signed-off-by: Liu Jinsong jinsong@intel.com
 ---
  target-i386/cpu.c |6 +++---
  1 files changed, 3 insertions(+), 3 deletions(-)

 diff --git a/target-i386/cpu.c b/target-i386/cpu.c
 index 864c80e..544b57f 100644
 --- a/target-i386/cpu.c
 +++ b/target-i386/cpu.c
 @@ -335,7 +335,7 @@ typedef struct ExtSaveArea {

  static const ExtSaveArea ext_save_areas[] = {
  [2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX,
 -.offset = 0x100, .size = 0x240 },
 +.offset = 0x240, .size = 0x100 },
  };

  const char *get_register_name_32(unsigned int reg)
 @@ -2225,8 +2225,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
              const ExtSaveArea *esa = &ext_save_areas[count];
              if ((env->features[esa->feature] & esa->bits) == esa->bits &&
                  (kvm_mask & (1 << count)) != 0) {
 -                *eax = esa->offset;
 -                *ebx = esa->size;
 +                *eax = esa->size;
 +                *ebx = esa->offset;
              }
          }
          break;
 --
 1.7.1
 



Re: [PATCH 2/2] target-i386: Intel MPX support

2013-11-29 Thread Paolo Bonzini
Il 29/11/2013 15:50, Liu, Jinsong ha scritto:
  There is no reason to get the size and offset from the host.  Peter
  Anvin confirmed that the sizes and offsets will never change (as
  should be the case for migration to work across different CPU
  versions).  In fact, the size and offset is documented for every
  XSAVE feature except MPX in the copy I have of the Intel
  documentation. 
 
 If the sizes and offsets will never change, what's the bad effect of getting 
 them from host?

In case TCG gets AVX/MPX support later, you will not be able to get
CPUID values from the host.  The leaf 0xd code was written so that it
would work for both KVM and TCG.

When QEMU got AVX support, we decided not to treat XSAVE data as opaque
blobs, and instead unmarshal data out of it into the CPUX86State
struct and back.  This is again useful for TCG, but it also makes for
easier interpretation of migration state.  You will have to rely on
precise sizes and offsets in the marshaling/unmarshaling code of
kvm_get_xsave/kvm_put_xsave, so it is not a big problem to have them
here as well.
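
As a rough sketch of what the hardcoded approach looks like, the required XSAVE area size can be computed from a static table without asking the host, so the same code would work under TCG. The names below (xsave_component, components, xsave_required_size) are hypothetical; the only values taken from this thread are the 0x240 base and the AVX entry (offset 0x240, size 0x100). MPX entries are left out on purpose, since their sizes are still being discussed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: where a state component lives in the XSAVE area. */
struct xsave_component {
    uint32_t offset;
    uint32_t size;
};

/* Legacy region plus XSAVE header, the starting value used in the patch. */
#define XSAVE_BASE_SIZE 0x240u

/* Indexed by XCR0 bit number; only AVX (bit 2) is filled in. */
static const struct xsave_component components[] = {
    [2] = { .offset = 0x240, .size = 0x100 },
};

/* Size needed to hold every component enabled in enabled_mask. */
static uint32_t xsave_required_size(uint64_t enabled_mask)
{
    uint32_t total = XSAVE_BASE_SIZE;
    size_t i;

    for (i = 2; i < sizeof(components) / sizeof(components[0]); i++) {
        uint32_t end;

        if (!(enabled_mask & (1ULL << i)))
            continue;
        end = components[i].offset + components[i].size;
        if (end > total)
            total = end;
    }
    return total;
}
```

With only x87 and SSE enabled (mask 0x3) this yields 0x240; adding AVX (mask 0x7) yields 0x340.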

Paolo


[PATCH] KVM/X86: vpmu migration, make perf_event associated with vcpu thread

2013-11-29 Thread Wang Hui
After applying Paolo's patch, vpmu's data was migrated correctly.
https://patchwork.kernel.org/patch/2850813/

However, when I wrote a test module that programs IA32_PMC1 to count unhalted
CPU cycles, the value of IA32_PMC1 stopped increasing after migration. I found
that the perf_event is recreated after migration, but at that point current is
QEMU's main thread, which never enters non-root mode, so the perf_event count
never increases.

I tried using the pid field in struct kvm_vcpu to get the vcpu thread's
task_struct, but when the perf_event is created after migration, pid points to
QEMU's main thread rather than the vcpu thread, because vcpu_load switches the
pid. I don't understand this very well: the vcpu is created in
qemu_kvm_cpu_thread_fn, which is the vcpu thread, so using the pid of current
should be enough. Why is the switch needed?

Maybe I am totally wrong, so I left that code unchanged and added an extra tid
field that holds the vcpu thread's pid. This tid is used to get the vcpu
thread's task_struct when the perf_event is created.

Thanks
Wang Hui

Signed-off-by: Wang Hui john.wang...@huawei.com
---
 arch/x86/kvm/pmu.c   | 8 +++-
 arch/x86/kvm/x86.c   | 6 ++
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c  | 1 +
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 5c4f631..676227e 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -173,6 +173,8 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
.exclude_kernel = exclude_kernel,
.config = config,
};
+   struct task_struct *task = NULL;
+
if (in_tx)
attr.config |= HSW_IN_TX;
if (in_tx_cp)
@@ -180,7 +182,11 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 
type,

attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);

-   event = perf_event_create_kernel_counter(&attr, -1, current,
+   if (pmc->vcpu)
+   task = pid_task(pmc->vcpu->tid, PIDTYPE_PID);
+   if (!task)
+   task = current;
+   event = perf_event_create_kernel_counter(&attr, -1, task,
 intr ? kvm_perf_overflow_intr :
 kvm_perf_overflow, pmc);
if (IS_ERR(event)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21ef1ba..f1f0e8e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6704,11 +6704,17 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 {
int r;
+   struct pid *cpu_thread_pid;

vcpu->arch.mtrr_state.have_fixed = 1;
r = vcpu_load(vcpu);
if (r)
return r;
+
+   cpu_thread_pid = get_task_pid(current, PIDTYPE_PID);
+   rcu_assign_pointer(vcpu->tid, cpu_thread_pid);
+   synchronize_rcu();
+
kvm_vcpu_reset(vcpu);
kvm_mmu_setup(vcpu);
vcpu_put(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9523d2a..ad7af9d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -235,6 +235,7 @@ struct kvm_vcpu {
int guest_fpu_loaded, guest_xcr0_loaded;
wait_queue_head_t wq;
struct pid *pid;
+   struct pid *tid;
int sigset_active;
sigset_t sigset;
struct kvm_vcpu_stat stat;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a0aa84b..80bcce5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -242,6 +242,7 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_init);

 void kvm_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
+   put_pid(vcpu->tid);
put_pid(vcpu->pid);
kvm_arch_vcpu_uninit(vcpu);
free_page((unsigned long)vcpu->run);


RE: [RFC] create a single workqueue for each vm to update vm irq routing table

2013-11-29 Thread Zhanghaoyu (A)
On Tue, Nov 26, 2013 at 06:14:27PM +0200, Gleb Natapov wrote:
 On Tue, Nov 26, 2013 at 06:05:37PM +0200, Michael S. Tsirkin wrote:
  On Tue, Nov 26, 2013 at 02:56:10PM +0200, Gleb Natapov wrote:
   On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:
Il 26/11/2013 13:40, Zhanghaoyu (A) ha scritto:
 When the guest sets irq smp_affinity, a VMEXIT occurs and the vcpu 
 thread returns from the hypervisor to QEMU via the ioctl. The vcpu 
 thread then asks the hypervisor to update the irq routing table. 
 kvm_set_irq_routing calls synchronize_rcu, so the current vcpu thread 
 is blocked for a long time waiting for the RCU grace period. During 
 this period the vcpu cannot provide service to the VM, so 
 interrupts delivered to this vcpu cannot be handled in time, and the 
 apps running on this vcpu cannot be serviced either.
 This is unacceptable in some real-time scenarios, e.g. telecom. 
 
 So, I want to create a single workqueue for each VM to 
 perform the RCU synchronization for the irq routing table 
 asynchronously, and let the vcpu thread return and VMENTRY to service 
 the VM immediately, with no need to block waiting for the RCU grace period.
 I have implemented a rough patch and tested it in our telecom 
 environment; the above problem disappeared.

I don't think a workqueue is even needed.  You just need to use 
call_rcu to free old after releasing kvm->irq_lock.

What do you think?

   It should be rate limited somehow. Since it is guest-triggerable, a 
   guest may cause the host to allocate a lot of memory this way.
  
  The checks in __call_rcu(), should handle this I think.  These keep 
  a per-CPU counter, which can be adjusted via rcutree.blimit, which 
  defaults to taking evasive action if more than 10K callbacks are 
  waiting on a given CPU.
  
  
 Documentation/RCU/checklist.txt has:
 
 An especially important property of the synchronize_rcu()
 primitive is that it automatically self-limits: if grace periods
 are delayed for whatever reason, then the synchronize_rcu()
 primitive will correspondingly delay updates.  In contrast,
 code using call_rcu() should explicitly limit update rate in
 cases where grace periods are delayed, as failing to do so can
 result in excessive realtime latencies or even OOM conditions.

I just asked Paul what this means.

My understanding is as follows:
The synchronous grace-period API synchronize_rcu() blocks the calling thread, 
which prevents that thread from generating a large number of RCU updates in 
quick succession. This is the self-limiting behavior described above in 
Documentation/RCU/checklist.txt, and it avoids memory exhaustion. The 
asynchronous API call_rcu() does not limit the update rate, so the caller has 
to rate limit explicitly.
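
One possible shape for such an explicit limit, as a user-space model only: updates are queued asynchronously until a backlog threshold is reached, and then the caller falls back to a blocking wait. Everything here is hypothetical (MAX_PENDING, deferred_free_queue, queue_update); no real RCU API is called, and the "wait until the backlog has drained" step is only simulated by resetting a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backlog limit; a real limit would need tuning. */
#define MAX_PENDING 4

/* Model of one VM's outstanding deferred frees of old routing tables. */
struct deferred_free_queue {
    size_t pending;     /* old tables queued but not yet freed */
    size_t sync_waits;  /* how often the caller had to block */
};

/*
 * Queue one routing-table update.
 * Returns true if the free was deferred (asynchronous path), false if the
 * backlog was full and the caller had to block until it drained.
 */
static bool queue_update(struct deferred_free_queue *q)
{
    if (q->pending >= MAX_PENDING) {
        /* Stand-in for a blocking wait until all queued frees have run. */
        q->pending = 0;
        q->sync_waits++;
        return false;
    }
    q->pending++;
    return true;
}

/* Model of a grace period ending: every queued free has now run. */
static void grace_period_elapsed(struct deferred_free_queue *q)
{
    q->pending = 0;
}
```

In this model a guest that changes affinity rarely always takes the asynchronous path, while a guest that floods updates is slowed to the blocking rate once the backlog fills, which bounds the memory held by old tables.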

Thanks,
Zhang Haoyu

 --
  Gleb.


Re: Error in frreing hugepages with preemption enabled

2013-11-29 Thread Alexander Graf

On 29.11.2013, at 05:38, Bharat Bhushan bharat.bhus...@freescale.com wrote:

 Hi Alex,
 
 I am running a KVM guest on a host kernel with CONFIG_PREEMPT enabled. With 
 allocated pages things seem to work fine, but when I use hugepages for the 
 guest I see the prints below when quitting qemu.
 
 (qemu) QEMU waiting for connection on: telnet:0.0.0.0:,server
 qemu-system-ppc64: pci_add_option_rom: failed to find romfile efi-virtio.rom
 q
 debug_smp_processor_id: 15 callbacks suppressed
 BUG: using smp_processor_id() in preemptible [] code: 
 qemu-system-ppc/2504
 caller is .free_hugepd_range+0xb0/0x21c
 CPU: 1 PID: 2504 Comm: qemu-system-ppc Not tainted 3.12.0-rc3-07733-gabf4907 
 #175
 Call Trace:
 [c000fb433400] [c0007d38] .show_stack+0x7c/0x1cc (unreliable)
 [c000fb4334d0] [c05e8ce0] .dump_stack+0x9c/0xf4
 [c000fb433560] [c02de5ec] .debug_smp_processor_id+0x108/0x11c
 [c000fb4335f0] [c0025e10] .free_hugepd_range+0xb0/0x21c
 [c000fb433680] [c00265bc] .hugetlb_free_pgd_range+0x2c8/0x3b0
 [c000fb4337a0] [c00e428c] .free_pgtables+0x14c/0x158
 [c000fb433840] [c00ef320] .exit_mmap+0xec/0x194
 [c000fb433960] [c004d780] .mmput+0x64/0x124
 [c000fb4339e0] [c0051f40] .do_exit+0x29c/0x9c8
 [c000fb433ae0] [c00527c8] .do_group_exit+0x50/0xc4
 [c000fb433b70] [c00606a0] .get_signal_to_deliver+0x21c/0x5d8
 [c000fb433c70] [c0009b08] .do_signal+0x54/0x278
 [c000fb433db0] [c0009e50] .do_notify_resume+0x64/0x78
 [c000fb433e30] [cb44] .ret_from_except_lite+0x70/0x74
 
 
 This means that free_hugepd_range() must be called with preemption enabled.

with preemption disabled.

 I tried the change below and it seems to work fine (I have no expertise 
 in this area, so I am not sure this is the correct way)

Not sure - the scope looks odd to me. Let's ask Andrea - I'm sure he knows what 
to do :).


Alex

 
 diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
 index d67db4b..6bf8459 100644
 --- a/arch/powerpc/mm/hugetlbpage.c
 +++ b/arch/powerpc/mm/hugetlbpage.c
 @@ -563,8 +563,10 @@ static void hugetlb_free_pmd_range(struct mmu_gather 
 *tlb, pud_t *pud,
 */
next = addr + (1 << hugepd_shift(*(hugepd_t *)pmd));
 #endif
 +   preempt_disable();
free_hugepd_range(tlb, (hugepd_t *)pmd, PMD_SHIFT,
  addr, next, floor, ceiling);
 +   preempt_enable();
} while (addr = next, addr != end);
 
start &= PUD_MASK;
 
 
 Thanks
 -Bharat
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm-ppc in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
