Re: Error in freeing hugepages with preemption enabled
On 29.11.2013, at 05:38, Bharat Bhushan bharat.bhus...@freescale.com wrote:

> Hi Alex,
>
> I am running a KVM guest on a host kernel that has CONFIG_PREEMPT enabled. With normally allocated pages things seem to work fine, but when I use hugepages for the guest I see the prints below when I quit qemu.
>
> (qemu) QEMU waiting for connection on: telnet:0.0.0.0:,server
> qemu-system-ppc64: pci_add_option_rom: failed to find romfile efi-virtio.rom
> q
> debug_smp_processor_id: 15 callbacks suppressed
> BUG: using smp_processor_id() in preemptible [] code: qemu-system-ppc/2504
> caller is .free_hugepd_range+0xb0/0x21c
> CPU: 1 PID: 2504 Comm: qemu-system-ppc Not tainted 3.12.0-rc3-07733-gabf4907 #175
> Call Trace:
> [c000fb433400] [c0007d38] .show_stack+0x7c/0x1cc (unreliable)
> [c000fb4334d0] [c05e8ce0] .dump_stack+0x9c/0xf4
> [c000fb433560] [c02de5ec] .debug_smp_processor_id+0x108/0x11c
> [c000fb4335f0] [c0025e10] .free_hugepd_range+0xb0/0x21c
> [c000fb433680] [c00265bc] .hugetlb_free_pgd_range+0x2c8/0x3b0
> [c000fb4337a0] [c00e428c] .free_pgtables+0x14c/0x158
> [c000fb433840] [c00ef320] .exit_mmap+0xec/0x194
> [c000fb433960] [c004d780] .mmput+0x64/0x124
> [c000fb4339e0] [c0051f40] .do_exit+0x29c/0x9c8
> [c000fb433ae0] [c00527c8] .do_group_exit+0x50/0xc4
> [c000fb433b70] [c00606a0] .get_signal_to_deliver+0x21c/0x5d8
> [c000fb433c70] [c0009b08] .do_signal+0x54/0x278
> [c000fb433db0] [c0009e50] .do_notify_resume+0x64/0x78
> [c000fb433e30] [cb44] .ret_from_except_lite+0x70/0x74
>
> This means that free_hugepd_range() must be called with preemption enabled.

with preemption disabled.

> I tried the change below and it seems to work fine (I do not have expertise in this area, so I am not sure this is the correct way).

Not sure - the scope looks odd to me. Let's ask Andrea - I'm sure he knows what to do :).

Alex

> diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
> index d67db4b..6bf8459 100644
> --- a/arch/powerpc/mm/hugetlbpage.c
> +++ b/arch/powerpc/mm/hugetlbpage.c
> @@ -563,8 +563,10 @@ static void hugetlb_free_pmd_range(struct mmu_gather *tlb, pud_t *pud,
>                   */
>                  next = addr + (1 << hugepd_shift(*(hugepd_t *)pmd));
>  #endif
> +                preempt_disable();
>                  free_hugepd_range(tlb, (hugepd_t *)pmd, PMD_SHIFT,
>                                    addr, next, floor, ceiling);
> +                preempt_enable();
>          } while (addr = next, addr != end);
>
>          start &= PUD_MASK;
>
> Thanks
> -Bharat

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
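The warning in the trace above fires because a per-CPU lookup was reached while the task could still be preempted. The following is a toy userspace model, not kernel code, of that check and of one possible reading of the scope concern: bracketing only the per-CPU access rather than the whole call. Every name in it is a hypothetical stand-in; `preempt_depth` merely imitates the kernel's preempt count.

```c
/* Userspace stand-ins for the kernel primitives; all hypothetical. */
static int preempt_depth;   /* imitates preempt_count() */
static int warnings;        /* counts would-be "BUG: using smp_processor_id()" reports */

static void preempt_disable(void) { preempt_depth++; }
static void preempt_enable(void)  { preempt_depth--; }

/* Imitates the debug check: complain when called while preemptible. */
static int checked_processor_id(void)
{
    if (preempt_depth == 0)
        warnings++;
    return 0; /* single pretend CPU */
}

/* Imitates a per-CPU access somewhere under free_hugepd_range(). */
static void touch_percpu_data(void)
{
    (void)checked_processor_id();
}

/* Caller that leaves preemption enabled: the check fires once. */
static int free_unprotected(void)
{
    warnings = 0;
    touch_percpu_data();
    return warnings;
}

/* Caller that brackets only the per-CPU access: the check stays quiet. */
static int free_protected(void)
{
    warnings = 0;
    preempt_disable();
    touch_percpu_data();
    preempt_enable();
    return warnings;
}
```

Whether the real fix belongs at the call site or inside the function that touches the per-CPU data is exactly the open question in this thread; the model only shows that the check depends on the preempt count at the moment of the access.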
Bug
Hi,

I think I found a bug that I do not want to post on any public KVM bug tracker. Please let me know an email address to write to.

Thank you in advance!

Best,
András
Re: Bug
On Fri, Nov 29, 2013 at 12:28:06PM +0100, Otártics András wrote:
> Hi,
>
> I think I found a bug that I do not want to post on any public bugtrackers of KVM. Please let me know a mail to write to.

Look up the KVM maintainers in the MAINTAINERS file in the root of the Linux source tree and email them.

--
Gleb.
[PATCH 0/2] Intel MPX support at Qemu side
Intel has released a new version of the Intel Architecture Instruction Set Extensions Programming Reference, adding new features such as AVX-512 and MPX. Refer to
http://download-software.intel.com/sites/default/files/319433-015.pdf

These 2 patches are preparatory patches on the QEMU side to support the Intel MPX feature.

PATCH 1/2 fixes a minor bug in how cpuid leaf 0x0d is parsed;
PATCH 2/2 exposes cpuid leaves (0xd, 3) and (0xd, 4) to the guest, fixes ebx, and re-calculates ecx of cpuid leaf (0xd, 0).

Thanks,
Jinsong
[PATCH 1/2] target-i386: fix cpuid leaf 0x0d
From e4b58c7bafc4d9f913a572a1b1cfee91c92f1637 Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Fri, 22 Nov 2013 00:24:16 +0800
Subject: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d

Fix cpuid leaf 0x0d which incorrectly parsed eax and ebx.

Signed-off-by: Liu Jinsong jinsong@intel.com
---
 target-i386/cpu.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 864c80e..544b57f 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -335,7 +335,7 @@ typedef struct ExtSaveArea {
 
 static const ExtSaveArea ext_save_areas[] = {
     [2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX,
-            .offset = 0x100, .size = 0x240 },
+            .offset = 0x240, .size = 0x100 },
 };
 
 const char *get_register_name_32(unsigned int reg)
@@ -2225,8 +2225,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             const ExtSaveArea *esa = &ext_save_areas[count];
             if ((env->features[esa->feature] & esa->bits) == esa->bits &&
                 (kvm_mask & (1 << count)) != 0) {
-                *eax = esa->offset;
-                *ebx = esa->size;
+                *eax = esa->size;
+                *ebx = esa->offset;
             }
         }
         break;
--
1.7.1
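As a rough check of what this patch changes for the guest, here is a minimal sketch that models the AVX entry before and after the patch. The `SaveArea` and `CpuidOut` types and both function names are hypothetical; only the constants 0x100 and 0x240 and the register assignments come from the diff.

```c
#include <stdint.h>

/* Hypothetical miniature of the table entry and of the two output registers. */
typedef struct { uint32_t offset, size; } SaveArea;
typedef struct { uint32_t eax, ebx; } CpuidOut;

/* Before the patch: the fields hold each other's values and are read swapped. */
static CpuidOut leaf_0xd_2_before(void)
{
    const SaveArea avx = { .offset = 0x100, .size = 0x240 };
    return (CpuidOut){ .eax = avx.offset, .ebx = avx.size };
}

/* After the patch: the fields are labelled correctly and read straight. */
static CpuidOut leaf_0xd_2_after(void)
{
    const SaveArea avx = { .offset = 0x240, .size = 0x100 };
    return (CpuidOut){ .eax = avx.size, .ebx = avx.offset };
}
```

In both versions sub-leaf 2 reports EAX = 0x100 and EBX = 0x240, and `offset + size` is 0x340 either way, so the patch renames fields without changing what the guest sees.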
[PATCH 2/2] target-i386: Intel MPX support
From aac033473bc88befe39a9add99820c0a7118ac90 Mon Sep 17 00:00:00 2001
From: root root@ljs.(none)
Date: Fri, 22 Nov 2013 00:24:35 +0800
Subject: [PATCH 2/2] target-i386: Intel MPX support

Expose cpuid leaf (0xd, 3) and (0xd, 4) to guest.
Fix ebx and re-calculate ecx of cpuid leaf (0xd, 0).

Signed-off-by: Liu Jinsong jinsong@intel.com
---
 target-i386/cpu.c |   34 ++++++++++++++++++++++++++--------
 target-i386/cpu.h |    1 +
 2 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 544b57f..7d04f28 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -330,12 +330,12 @@ X86RegisterInfo32 x86_reg_info_32[CPU_NB_REGS32] = {
 
 typedef struct ExtSaveArea {
     uint32_t feature, bits;
-    uint32_t offset, size;
 } ExtSaveArea;
 
 static const ExtSaveArea ext_save_areas[] = {
-    [2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX,
-            .offset = 0x240, .size = 0x100 },
+    [2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX },
+    [3] = { .feature = FEAT_7_0_EBX, .bits = CPUID_7_0_EBX_MPX },
+    [4] = { .feature = FEAT_7_0_EBX, .bits = CPUID_7_0_EBX_MPX },
 };
 
 const char *get_register_name_32(unsigned int reg)
@@ -2204,9 +2204,11 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                 ((uint64_t)kvm_arch_get_supported_cpuid(s, 0xd, 0, R_EDX) << 32);
 
         if (count == 0) {
-            *ecx = 0x240;
+            *ebx = *ecx = 0x240;
             for (i = 2; i < ARRAY_SIZE(ext_save_areas); i++) {
+                uint32_t offset, size;
                 const ExtSaveArea *esa = &ext_save_areas[i];
+
                 if ((env->features[esa->feature] & esa->bits) == esa->bits &&
                     (kvm_mask & (1 << i)) != 0) {
                     if (i < 32) {
@@ -2214,19 +2216,35 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
                     } else {
                         *edx |= 1 << (i - 32);
                     }
-                    *ecx = MAX(*ecx, esa->offset + esa->size);
+
+                    size = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EAX);
+                    offset = kvm_arch_get_supported_cpuid(s, 0xd, i, R_EBX);
+                    *ecx = MAX(*ecx, offset + size);
+
+                    /*
+                     * EBX here just in order to
+                     * 1. keep compatible with old qemu version, take AVX
+                     *    into account;
+                     * 2. keep compatible with old kernel version. Currently
+                     *    KVM has bug when expose cpuid 0xd to guest (include
+                     *    static value when guest booting and dynamic value
+                     *    when guest enables XCR0 features. EBX here can
+                     *    co-work with old buggy and new updated KVM, keep
+                     *    same value independent to CPU and kernel version.
+                     */
+                    if (i == 2)
+                        *ebx = MAX(*ebx, offset + size);
                 }
             }
             *eax |= kvm_mask & (XSTATE_FP | XSTATE_SSE);
-            *ebx = *ecx;
         } else if (count == 1) {
             *eax = kvm_arch_get_supported_cpuid(s, 0xd, 1, R_EAX);
         } else if (count < ARRAY_SIZE(ext_save_areas)) {
             const ExtSaveArea *esa = &ext_save_areas[count];
             if ((env->features[esa->feature] & esa->bits) == esa->bits &&
                 (kvm_mask & (1 << count)) != 0) {
-                *eax = esa->size;
-                *ebx = esa->offset;
+                *eax = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EAX);
+                *ebx = kvm_arch_get_supported_cpuid(s, 0xd, count, R_EBX);
             }
         }
         break;
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index ea373e8..9a838d1 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -545,6 +545,7 @@ typedef uint32_t FeatureWordArray[FEATURE_WORDS];
 #define CPUID_7_0_EBX_ERMS     (1 << 9)
 #define CPUID_7_0_EBX_INVPCID  (1 << 10)
 #define CPUID_7_0_EBX_RTM      (1 << 11)
+#define CPUID_7_0_EBX_MPX      (1 << 14)
 #define CPUID_7_0_EBX_RDSEED   (1 << 18)
 #define CPUID_7_0_EBX_ADX      (1 << 19)
 #define CPUID_7_0_EBX_SMAP     (1 << 20)
--
1.7.1
Re: [PATCH 2/2] target-i386: Intel MPX support
On 29/11/2013 14:17, Liu, Jinsong wrote:
> From aac033473bc88befe39a9add99820c0a7118ac90 Mon Sep 17 00:00:00 2001
> From: root root@ljs.(none)
> Date: Fri, 22 Nov 2013 00:24:35 +0800
> Subject: [PATCH 2/2] target-i386: Intel MPX support
>
> Expose cpuid leaf (0xd, 3) and (0xd, 4) to guest.
> Fix ebx and re-calculate ecx of cpuid leaf (0xd, 0).

There is no reason to get the size and offset from the host. Peter Anvin confirmed that the sizes and offsets will never change (as should be the case for migration to work across different CPU versions). In fact, the size and offset are documented for every XSAVE feature except MPX in the copy I have of the Intel documentation.

Please get the size and offset from the documentation, if it has been updated, or from a real host, and hardcode them in QEMU.

Paolo
Re: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d
On 29/11/2013 14:15, Liu, Jinsong wrote:
> From e4b58c7bafc4d9f913a572a1b1cfee91c92f1637 Mon Sep 17 00:00:00 2001
> From: Liu Jinsong jinsong@intel.com
> Date: Fri, 22 Nov 2013 00:24:16 +0800
> Subject: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d
>
> Fix cpuid leaf 0x0d which incorrectly parsed eax and ebx.

There is no visible change, right (the two hunks cancel each other)? Since you will have to post a v2, please make this explicit in the commit message.

Thanks,

Paolo
[PATCH 0/4] Intel MPX support for KVM
Intel has released a new version of the Intel Architecture Instruction Set Extensions Programming Reference, adding new features such as AVX-512 and MPX. Refer to
http://download-software.intel.com/sites/default/files/319433-015.pdf

These patches add Intel MPX support to KVM.

PATCH 1/4 adds some MPX definitions;
PATCH 2/4 re-calculates cpuid(0xd,0) EBX;
PATCH 3/4 enables Intel Memory Protection Extension for the guest;
PATCH 4/4 adds the Intel MPX vmx and msr handling;

These patches are based on the work of my ex-colleague Xudong; I am now helping him push them.

Thanks,
Jinsong
[PATCH 1/4] X86: Intel MPX definiation
From 3a1a011100b38a275d8c95468c12c483e316bb15 Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Fri, 29 Nov 2013 01:27:00 +0800
Subject: [PATCH 1/4] X86: Intel MPX definiation

Signed-off-by: Xudong Hao xudong@intel.com
Reviewed-by: Liu Jinsong jinsong@intel.com
---
 arch/x86/include/asm/cpufeature.h |    2 ++
 arch/x86/include/asm/xsave.h      |    5 ++++-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 89270b4..1b00b01 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -216,6 +216,7 @@
 #define X86_FEATURE_ERMS        (9*32+ 9) /* Enhanced REP MOVSB/STOSB */
 #define X86_FEATURE_INVPCID     (9*32+10) /* Invalidate Processor Context ID */
 #define X86_FEATURE_RTM         (9*32+11) /* Restricted Transactional Memory */
+#define X86_FEATURE_MPX         (9*32+14) /* Memory Protection Extension */
 #define X86_FEATURE_RDSEED      (9*32+18) /* The RDSEED instruction */
 #define X86_FEATURE_ADX         (9*32+19) /* The ADCX and ADOX instructions */
 #define X86_FEATURE_SMAP        (9*32+20) /* Supervisor Mode Access Prevention */
@@ -330,6 +331,7 @@ extern const char * const x86_power_flags[32];
 #define cpu_has_perfctr_l2      boot_cpu_has(X86_FEATURE_PERFCTR_L2)
 #define cpu_has_cx8             boot_cpu_has(X86_FEATURE_CX8)
 #define cpu_has_cx16            boot_cpu_has(X86_FEATURE_CX16)
+#define cpu_has_mpx             boot_cpu_has(X86_FEATURE_MPX)
 #define cpu_has_eager_fpu       boot_cpu_has(X86_FEATURE_EAGER_FPU)
 #define cpu_has_topoext         boot_cpu_has(X86_FEATURE_TOPOEXT)
 
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 0415cda..d3e3ea5 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -9,6 +9,8 @@
 #define XSTATE_FP       0x1
 #define XSTATE_SSE      0x2
 #define XSTATE_YMM      0x4
+#define XSTATE_BNDREGS  0x8
+#define XSTATE_BNDCSR   0x10
 
 #define XSTATE_FPSSE    (XSTATE_FP | XSTATE_SSE)
 
@@ -23,7 +25,8 @@
 /*
  * These are the features that the OS can handle currently.
  */
-#define XCNTXT_MASK     (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define XCNTXT_MASK     (XSTATE_FP | XSTATE_SSE | XSTATE_YMM | \
+                         XSTATE_BNDREGS | XSTATE_BNDCSR)
 
 #ifdef CONFIG_X86_64
 #define REX_PREFIX      "0x48, "
--
1.7.1
[PATCH 2/4] KVM/X86: Fix xsave cpuid exposing bug
From b060be65e466291c91963e58c4880ec614d0b294 Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Fri, 29 Nov 2013 01:27:53 +0800
Subject: [PATCH 2/4] KVM/X86: Fix xsave cpuid exposing bug

EBX of cpuid(0xD, 0) is dynamic per XCR0 features enable/disable.
Bit 63 of XCR0 is reserved for future expansion.

Signed-off-by: Liu Jinsong jinsong@intel.com
---
 arch/x86/include/asm/xsave.h |    2 ++
 arch/x86/kvm/cpuid.c         |    4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index d3e3ea5..6120e74 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -13,6 +13,8 @@
 #define XSTATE_BNDCSR   0x10
 
 #define XSTATE_FPSSE    (XSTATE_FP | XSTATE_SSE)
+/* Bit 63 of XCR0 is reserved for future expansion */
+#define XSTATE_EXTEND_MASK      (~(XSTATE_FPSSE | (1 << 63)))
 
 #define FXSAVE_SIZE     512
 
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index c697625..a8ce117 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -28,7 +28,7 @@ static u32 xstate_required_size(u64 xstate_bv)
         int feature_bit = 0;
         u32 ret = XSAVE_HDR_SIZE + XSAVE_HDR_OFFSET;
 
-        xstate_bv &= ~XSTATE_FPSSE;
+        xstate_bv &= XSTATE_EXTEND_MASK;
         while (xstate_bv) {
                 if (xstate_bv & 0x1) {
                         u32 eax, ebx, ecx, edx;
@@ -74,7 +74,7 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
                 vcpu->arch.guest_supported_xcr0 =
                         (best->eax | ((u64)best->edx << 32)) &
                         host_xcr0 & KVM_SUPPORTED_XCR0;
-                vcpu->arch.guest_xstate_size =
+                vcpu->arch.guest_xstate_size = best->ebx =
                         xstate_required_size(vcpu->arch.guest_supported_xcr0);
         }
 
--
1.7.1
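To make the effect of the new mask concrete, here is a minimal userspace sketch of the size computation, assuming a table lookup in place of the real CPUID instruction. `fake_cpuid_0xd` and `required_size` are hypothetical names; the AVX size and offset are the ones used in the QEMU patch earlier in this archive, and the mask is written with `1ULL` so the shift is well defined for a 64-bit value.

```c
#include <stdint.h>

#define XSTATE_FP          0x1ULL
#define XSTATE_SSE         0x2ULL
#define XSTATE_FPSSE       (XSTATE_FP | XSTATE_SSE)
/* Excludes the legacy FP/SSE bits and the reserved bit 63. */
#define XSTATE_EXTEND_MASK (~(XSTATE_FPSSE | (1ULL << 63)))

/* Hypothetical stand-in for cpuid_count(0xD, n): size in *eax, offset in *ebx. */
static void fake_cpuid_0xd(int component, uint32_t *eax, uint32_t *ebx)
{
    *eax = 0;
    *ebx = 0;
    if (component == 2) {   /* AVX state */
        *eax = 0x100;
        *ebx = 0x240;
    }
}

/* Mirrors the loop shape of xstate_required_size() in the patch. */
static uint32_t required_size(uint64_t xstate_bv)
{
    uint32_t ret = 0x240;   /* 512-byte legacy area plus 64-byte header */
    int feature_bit = 0;

    xstate_bv &= XSTATE_EXTEND_MASK;
    while (xstate_bv) {
        if (xstate_bv & 0x1) {
            uint32_t eax, ebx;

            fake_cpuid_0xd(feature_bit, &eax, &ebx);
            if (eax + ebx > ret)
                ret = eax + ebx;
        }
        xstate_bv >>= 1;
        feature_bit++;
    }
    return ret;
}
```

With only FP and SSE enabled the result stays at 0x240; enabling AVX raises it to 0x340; setting bit 63 changes nothing because the mask drops it before the loop runs.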
[PATCH 3/4] KVM/X86: Enable Intel MPX for guest
From 11ae33723027c7b8e53a8c109f127800d7f0ad6e Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Fri, 29 Nov 2013 01:28:19 +0800
Subject: [PATCH 3/4] KVM/X86: Enable Intel MPX for guest

Enable Intel Memory Protection Extension for guest.

Signed-off-by: Xudong Hao xudong@intel.com
Reviewed-by: Liu Jinsong jinsong@intel.com
---
 arch/x86/kvm/cpuid.c |    4 ++--
 arch/x86/kvm/x86.c   |   14 ++++++++++++--
 arch/x86/kvm/x86.h   |    3 ++-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index a8ce117..e30d4ce 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -75,7 +75,7 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
                         (best->eax | ((u64)best->edx << 32)) &
                         host_xcr0 & KVM_SUPPORTED_XCR0;
                 vcpu->arch.guest_xstate_size = best->ebx =
-                        xstate_required_size(vcpu->arch.guest_supported_xcr0);
+                        xstate_required_size(vcpu->arch.xcr0);
         }
 
         kvm_pmu_cpuid_update(vcpu);
@@ -303,7 +303,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
         /* cpuid 7.0.ebx */
         const u32 kvm_supported_word9_x86_features =
                 F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) |
-                F(BMI2) | F(ERMS) | f_invpcid | F(RTM);
+                F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | F(MPX);
 
         /* all calls to cpuid_count() should be made on the same cpu */
         get_cpu();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21ef1ba..6e38698 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -576,13 +576,13 @@ static void kvm_put_guest_xcr0(struct kvm_vcpu *vcpu)
 
 int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
 {
-        u64 xcr0;
+        u64 xcr0 = xcr;
+        u64 old_xcr0 = vcpu->arch.xcr0;
         u64 valid_bits;
 
         /* Only support XCR_XFEATURE_ENABLED_MASK(xcr0) now */
         if (index != XCR_XFEATURE_ENABLED_MASK)
                 return 1;
-        xcr0 = xcr;
         if (!(xcr0 & XSTATE_FP))
                 return 1;
         if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE))
@@ -597,8 +597,15 @@ int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
         if (xcr0 & ~valid_bits)
                 return 1;
 
+        if ((!(xcr0 & XSTATE_BNDREGS)) != (!(xcr0 & XSTATE_BNDCSR)))
+                return 1;
+
         kvm_put_guest_xcr0(vcpu);
         vcpu->arch.xcr0 = xcr0;
+
+        if ((xcr0 ^ old_xcr0) & XSTATE_EXTEND_MASK)
+                kvm_update_cpuid(vcpu);
+
         return 0;
 }
 
@@ -5960,6 +5967,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
         preempt_disable();
 
         kvm_x86_ops->prepare_guest_switch(vcpu);
+        if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) &&
+            (vcpu->arch.xcr0 & (u64)(XSTATE_BNDREGS | XSTATE_BNDCSR)))
+                kvm_x86_ops->fpu_activate(vcpu);
         if (vcpu->fpu_active)
                 kvm_load_guest_fpu(vcpu);
         kvm_load_guest_xcr0(vcpu);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 587fb9e..985e40e 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -122,7 +122,8 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
         gva_t addr, void *val, unsigned int bytes,
         struct x86_exception *exception);
 
-#define KVM_SUPPORTED_XCR0      (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
+#define KVM_SUPPORTED_XCR0      (XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
+                                | XSTATE_BNDREGS | XSTATE_BNDCSR)
 extern u64 host_xcr0;
 
 extern struct static_key kvm_no_apic_vcpu;
--
1.7.1
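The XCR0 checks visible in the `__kvm_set_xcr()` hunks can be read as a small predicate. The sketch below restates only the conditions that appear in the diff, as a standalone function; `xcr0_is_valid` is a hypothetical name, and `valid_bits` is passed in because the code that computes it lies outside the quoted hunks.

```c
#include <stdbool.h>
#include <stdint.h>

#define XSTATE_FP      0x1ULL
#define XSTATE_SSE     0x2ULL
#define XSTATE_YMM     0x4ULL
#define XSTATE_BNDREGS 0x8ULL
#define XSTATE_BNDCSR  0x10ULL

/* Returns true when the diff's checks would accept the value. */
static bool xcr0_is_valid(uint64_t xcr0, uint64_t valid_bits)
{
    if (!(xcr0 & XSTATE_FP))
        return false;                 /* x87 state must always be enabled */
    if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE))
        return false;                 /* AVX state requires SSE state */
    if (xcr0 & ~valid_bits)
        return false;                 /* no bits outside the supported set */
    if (!(xcr0 & XSTATE_BNDREGS) != !(xcr0 & XSTATE_BNDCSR))
        return false;                 /* the two MPX bits go together */
    return true;
}
```

For example, with `valid_bits` set to 0x1f, the value 0x1b (FP, SSE and both MPX components) passes, while 0x0b (BNDREGS without BNDCSR) is rejected.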
[PATCH 4/4] KVM/X86: Intel MPX vmx and msr handle
From 7532bdffe9f74db65f6eff733cb227a66bef932e Mon Sep 17 00:00:00 2001
From: Liu Jinsong jinsong@intel.com
Date: Sat, 30 Nov 2013 00:27:02 +0800
Subject: [PATCH 4/4] KVM/X86: Intel MPX vmx and msr handle

Signed-off-by: Xudong Hao xudong@intel.com
Reviewed-by: Liu Jinsong jinsong@intel.com
---
 arch/x86/include/asm/vmx.h            |    2 ++
 arch/x86/include/uapi/asm/msr-index.h |    1 +
 arch/x86/kvm/vmx.c                    |   13 +++++++++++--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 966502d..1bf4681 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -85,6 +85,7 @@
 #define VM_EXIT_SAVE_IA32_EFER                  0x00100000
 #define VM_EXIT_LOAD_IA32_EFER                  0x00200000
 #define VM_EXIT_SAVE_VMX_PREEMPTION_TIMER       0x00400000
+#define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR       0x00036dff
 
@@ -95,6 +96,7 @@
 #define VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL     0x00002000
 #define VM_ENTRY_LOAD_IA32_PAT                  0x00004000
 #define VM_ENTRY_LOAD_IA32_EFER                 0x00008000
+#define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR      0x000011ff
 
diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index 37813b5..2a418c4 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -294,6 +294,7 @@
 #define MSR_SMI_COUNT                   0x00000034
 #define MSR_IA32_FEATURE_CONTROL        0x0000003a
 #define MSR_IA32_TSC_ADJUST             0x0000003b
+#define MSR_IA32_BNDCFGS                0x00000d90
 
 #define FEATURE_CONTROL_LOCKED                          (1<<0)
 #define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX        (1<<1)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b2fe1c2..aa23edf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -439,6 +439,7 @@ struct vcpu_vmx {
 #endif
                 int           gs_ldt_reload_needed;
                 int           fs_reload_needed;
+                u64           msr_host_bndcfgs;
         } host_state;
         struct {
                 int vm86_active;
@@ -1647,6 +1648,8 @@ static void vmx_save_host_state(struct kvm_vcpu *vcpu)
         if (is_long_mode(&vmx->vcpu))
                 wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_guest_kernel_gs_base);
 #endif
+        if (cpu_has_mpx)
+                rdmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
         for (i = 0; i < vmx->save_nmsrs; ++i)
                 kvm_set_shared_msr(vmx->guest_msrs[i].index,
                                    vmx->guest_msrs[i].data,
@@ -1684,6 +1687,8 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
 #ifdef CONFIG_X86_64
         wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
 #endif
+        if (cpu_has_mpx)
+                wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
         /*
          * If the FPU is not active (through the host task or
          * the guest vcpu), then restore the cr0.TS bit.
@@ -2800,7 +2805,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
         min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
 #endif
         opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
-                VM_EXIT_ACK_INTR_ON_EXIT;
+                VM_EXIT_ACK_INTR_ON_EXIT | VM_EXIT_CLEAR_BNDCFGS;
         if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
                                 &_vmexit_control) < 0)
                 return -EIO;
@@ -2817,7 +2822,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
                 _pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
 
         min = 0;
-        opt = VM_ENTRY_LOAD_IA32_PAT;
+        opt = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
         if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
                                 &_vmentry_control) < 0)
                 return -EIO;
@@ -8636,6 +8641,10 @@ static int __init vmx_init(void)
         vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
         vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
         vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
+        if ((vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS) &&
+            (vmcs_config.vmexit_ctrl & VM_EXIT_CLEAR_BNDCFGS))
+                vmx_disable_intercept_for_msr(MSR_IA32_BNDCFGS, true);
+
         memcpy(vmx_msr_bitmap_legacy_x2apic,
                         vmx_msr_bitmap_legacy, PAGE_SIZE);
         memcpy(vmx_msr_bitmap_longmode_x2apic,
--
1.7.1
Re: [PATCH 3/4] KVM/X86: Enable Intel MPX for guest
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index a8ce117..e30d4ce 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -75,7 +75,7 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
>                          (best->eax | ((u64)best->edx << 32)) &
>                          host_xcr0 & KVM_SUPPORTED_XCR0;
>                  vcpu->arch.guest_xstate_size = best->ebx =
> -                        xstate_required_size(vcpu->arch.guest_supported_xcr0);
> +                        xstate_required_size(vcpu->arch.xcr0);
>          }
>
>          kvm_pmu_cpuid_update(vcpu);

...

>          kvm_put_guest_xcr0(vcpu);
>          vcpu->arch.xcr0 = xcr0;
> +
> +        if ((xcr0 ^ old_xcr0) & XSTATE_EXTEND_MASK)
> +                kvm_update_cpuid(vcpu);
> +
>          return 0;
>  }

These hunks should be part of the previous patch.

> @@ -5960,6 +5967,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>          preempt_disable();
>
>          kvm_x86_ops->prepare_guest_switch(vcpu);
> +        if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) &&

Shouldn't be necessary, setting xcr0 fails unless OSXSAVE=1.

> +            (vcpu->arch.xcr0 & (u64)(XSTATE_BNDREGS | XSTATE_BNDCSR)))
> +                kvm_x86_ops->fpu_activate(vcpu);

Can you explain this?

>          if (vcpu->fpu_active)
>                  kvm_load_guest_fpu(vcpu);
>          kvm_load_guest_xcr0(vcpu);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 587fb9e..985e40e 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -122,7 +122,8 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
>          gva_t addr, void *val, unsigned int bytes,
>          struct x86_exception *exception);
>
> -#define KVM_SUPPORTED_XCR0      (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
> +#define KVM_SUPPORTED_XCR0      (XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
> +                                | XSTATE_BNDREGS | XSTATE_BNDCSR)
>  extern u64 host_xcr0;
>
>  extern struct static_key kvm_no_apic_vcpu;

Otherwise looks straightforward.

Paolo
RE: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d
Paolo Bonzini wrote:
> On 29/11/2013 14:15, Liu, Jinsong wrote:
>> From e4b58c7bafc4d9f913a572a1b1cfee91c92f1637 Mon Sep 17 00:00:00 2001
>> From: Liu Jinsong jinsong@intel.com
>> Date: Fri, 22 Nov 2013 00:24:16 +0800
>> Subject: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d
>>
>> Fix cpuid leaf 0x0d which incorrectly parsed eax and ebx.
>
> There is no visible change right (the two hunks cancel each other)? Since you will have to post a v2, please make this explicit in the commit message.

OK, I will make the commit message explicit, or drop this patch if needed.

Thanks,
Jinsong
RE: [PATCH 2/2] target-i386: Intel MPX support
Paolo Bonzini wrote:
> On 29/11/2013 14:17, Liu, Jinsong wrote:
>> From aac033473bc88befe39a9add99820c0a7118ac90 Mon Sep 17 00:00:00 2001
>> From: root root@ljs.(none)
>> Date: Fri, 22 Nov 2013 00:24:35 +0800
>> Subject: [PATCH 2/2] target-i386: Intel MPX support
>>
>> Expose cpuid leaf (0xd, 3) and (0xd, 4) to guest.
>> Fix ebx and re-calculate ecx of cpuid leaf (0xd, 0).
>
> There is no reason to get the size and offset from the host. Peter Anvin confirmed that the sizes and offsets will never change (as should be the case for migration to work across different CPU versions). In fact, the size and offset is documented for every XSAVE feature except MPX in the copy I have of the Intel documentation.

If the sizes and offsets will never change, what is the downside of getting them from the host?

> Please get the size and offset from the documentation, if it has been updated, or from a real host, and hardcode them in QEMU.

Hmm, the problem is that what I was told does not match what a real test shows :( For example, I was told XSTATE_BNDCSR_SIZE is 0x40, but a real test shows it is 0x10. Maybe getting the values from real hardware is no worse than hardcoding them?

Thanks
Jinsong
RE: [PATCH 3/4] KVM/X86: Enable Intel MPX for guest
Paolo Bonzini wrote:
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index a8ce117..e30d4ce 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -75,7 +75,7 @@ void kvm_update_cpuid(struct kvm_vcpu *vcpu)
>>  			(best->eax | ((u64)best->edx << 32)) &
>>  			host_xcr0 & KVM_SUPPORTED_XCR0;
>>  		vcpu->arch.guest_xstate_size = best->ebx =
>> -			xstate_required_size(vcpu->arch.guest_supported_xcr0);
>> +			xstate_required_size(vcpu->arch.xcr0);
>>  	}
>>
>>  	kvm_pmu_cpuid_update(vcpu);
>
> ...
>
>>  	kvm_put_guest_xcr0(vcpu);
>>  	vcpu->arch.xcr0 = xcr0;
>> +
>> +	if ((xcr0 ^ old_xcr0) & XSTATE_EXTEND_MASK)
>> +		kvm_update_cpuid(vcpu);
>> +
>>  	return 0;
>>  }
>
> These hunks should be part of the previous patch.
>
>> @@ -5960,6 +5967,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>  	preempt_disable();
>>
>>  	kvm_x86_ops->prepare_guest_switch(vcpu);
>> +	if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) &&
>
> Shouldn't be necessary, setting xcr0 fails unless OSXSAVE=1.
>
>> +	    (vcpu->arch.xcr0 & (u64)(XSTATE_BNDREGS | XSTATE_BNDCSR)))
>> +		kvm_x86_ops->fpu_activate(vcpu);
>
> Can you explain this?

No; in fact I am also wondering about it, but since it has been tested I did not change this code. I will double-check and drop it if needed (or maybe Xudong can elaborate?)

Thanks,
Jinsong

>>  	if (vcpu->fpu_active)
>>  		kvm_load_guest_fpu(vcpu);
>>  	kvm_load_guest_xcr0(vcpu);
>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>> index 587fb9e..985e40e 100644
>> --- a/arch/x86/kvm/x86.h
>> +++ b/arch/x86/kvm/x86.h
>> @@ -122,7 +122,8 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
>>  	gva_t addr, void *val, unsigned int bytes,
>>  	struct x86_exception *exception);
>>
>> -#define KVM_SUPPORTED_XCR0	(XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
>> +#define KVM_SUPPORTED_XCR0	(XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
>> +				| XSTATE_BNDREGS | XSTATE_BNDCSR)
>>  extern u64 host_xcr0;
>>
>>  extern struct static_key kvm_no_apic_vcpu;
>
> Otherwise looks straightforward.

Thanks, will update per your comments.
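For readers following the XCR0 discussion, here is a rough userspace sketch of the kind of check that decides whether a guest-requested XCR0 value is acceptable once the two MPX bits are added to the supported mask. The helper name xcr0_is_valid and the macro SUPPORTED_XCR0 are made up for illustration and are not KVM's code; the bit positions follow the leaf numbering used in this thread (components 3 and 4 for MPX), and the "both MPX bits or neither" rule is the architectural XSETBV rule as I understand it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits; the component number is the bit position. */
#define XSTATE_FP      (1ULL << 0)
#define XSTATE_SSE     (1ULL << 1)
#define XSTATE_YMM     (1ULL << 2)
#define XSTATE_BNDREGS (1ULL << 3)
#define XSTATE_BNDCSR  (1ULL << 4)

/* Same set of bits as the KVM_SUPPORTED_XCR0 definition in the quoted hunk. */
#define SUPPORTED_XCR0 (XSTATE_FP | XSTATE_SSE | XSTATE_YMM | \
                        XSTATE_BNDREGS | XSTATE_BNDCSR)

/* Hypothetical helper: may the guest load this value into XCR0? */
static bool xcr0_is_valid(uint64_t xcr0, uint64_t host_xcr0)
{
    uint64_t allowed = host_xcr0 & SUPPORTED_XCR0;
    uint64_t mpx = xcr0 & (XSTATE_BNDREGS | XSTATE_BNDCSR);

    if (!(xcr0 & XSTATE_FP))
        return false;                 /* x87 state must always be enabled */
    if ((xcr0 & XSTATE_YMM) && !(xcr0 & XSTATE_SSE))
        return false;                 /* AVX state requires SSE state */
    if (mpx != 0 && mpx != (XSTATE_BNDREGS | XSTATE_BNDCSR))
        return false;                 /* MPX bits must be set together */
    return (xcr0 & ~allowed) == 0;    /* nothing beyond host and KVM support */
}

/* Usage example: a few values checked against an MPX and a non-MPX host. */
static void xcr0_example(void)
{
    assert(xcr0_is_valid(0x1f, 0x1f));   /* FP|SSE|YMM|BNDREGS|BNDCSR */
    assert(xcr0_is_valid(0x07, 0x07));   /* no MPX requested */
    assert(!xcr0_is_valid(0x1f, 0x07));  /* host lacks MPX */
    assert(!xcr0_is_valid(0x0f, 0x1f));  /* BNDREGS without BNDCSR */
}
```

Masking with the host value first keeps the check independent of which bits a later patch adds to the supported set.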
Re: [PATCH 4/4] KVM/X86: Intel MPX vmx and msr handle
On 29/11/2013 14:44, Liu, Jinsong wrote:
> From 7532bdffe9f74db65f6eff733cb227a66bef932e Mon Sep 17 00:00:00 2001
> From: Liu Jinsong jinsong@intel.com
> Date: Sat, 30 Nov 2013 00:27:02 +0800
> Subject: [PATCH 4/4] KVM/X86: Intel MPX vmx and msr handle
>
> Signed-off-by: Xudong Hao xudong@intel.com
> Reviewed-by: Liu Jinsong jinsong@intel.com

This should be a Signed-off-by since you are posting the patch, not Xudong Hao (same for patches 1 and 3).

I think this patch should go before the previous one. Also see below.

> +	if (cpu_has_mpx)
> +		rdmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);
>  	for (i = 0; i < vmx->save_nmsrs; ++i)
>  		kvm_set_shared_msr(vmx->guest_msrs[i].index,
>  				   vmx->guest_msrs[i].data,
> @@ -1684,6 +1687,8 @@ static void __vmx_load_host_state(struct vcpu_vmx *vmx)
>  #ifdef CONFIG_X86_64
>  	wrmsrl(MSR_KERNEL_GS_BASE, vmx->msr_host_kernel_gs_base);
>  #endif
> +	if (cpu_has_mpx)
> +		wrmsrl(MSR_IA32_BNDCFGS, vmx->host_state.msr_host_bndcfgs);

This should be if (vmx->host_state.msr_host_bndcfgs), so that no WRMSR is done if host_bndcfgs == 0 (which includes the case of !cpu_has_mpx).

> @@ -2800,7 +2805,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>  	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>  #endif
>  	opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT |
> -		VM_EXIT_ACK_INTR_ON_EXIT;
> +		VM_EXIT_ACK_INTR_ON_EXIT | VM_EXIT_CLEAR_BNDCFGS;
>
> -	opt = VM_ENTRY_LOAD_IA32_PAT;
> +	opt = VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS;
>  	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
>  				&_vmentry_control) < 0)
>  		return -EIO;
> @@ -8636,6 +8641,10 @@ static int __init vmx_init(void)
>  	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_CS, false);
>  	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_ESP, false);
>  	vmx_disable_intercept_for_msr(MSR_IA32_SYSENTER_EIP, false);
> +	if ((vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_BNDCFGS) &&
> +	    (vmcs_config.vmexit_ctrl & VM_EXIT_CLEAR_BNDCFGS))
> +		vmx_disable_intercept_for_msr(MSR_IA32_BNDCFGS, true);

Why only disable it in that case? If the two bits are guaranteed to be present for if (cpu_has_mpx), please use if (cpu_has_mpx) here or perhaps make it unconditional. If the two bits might not be there, you need to emulate them using add_atomic_switch_msr.

Thanks,

Paolo

>  	memcpy(vmx_msr_bitmap_legacy_x2apic,
>  	       vmx_msr_bitmap_legacy, PAGE_SIZE);
>  	memcpy(vmx_msr_bitmap_longmode_x2apic,
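The point about skipping the WRMSR when the saved host value is zero can be modelled in plain userspace C. This is only a toy model under stated assumptions: fake_bndcfgs stands in for the real MSR, guest_run_and_exit models a VM exit with VM_EXIT_CLEAR_BNDCFGS active (so the register reads 0 on return to the host), and every function name here is hypothetical rather than taken from vmx.c.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for MSR_IA32_BNDCFGS; real code would use rdmsrl()/wrmsrl(). */
static uint64_t fake_bndcfgs;
static unsigned wrmsr_count;

struct host_state {
    uint64_t msr_host_bndcfgs;   /* stays 0 when the CPU has no MPX */
};

/* Save the host value before entering the guest. */
static void save_host_state(struct host_state *hs, int cpu_has_mpx)
{
    hs->msr_host_bndcfgs = cpu_has_mpx ? fake_bndcfgs : 0;
}

/* The guest writes its own value; the VM exit then clears the register. */
static void guest_run_and_exit(uint64_t guest_value)
{
    fake_bndcfgs = guest_value;
    fake_bndcfgs = 0;            /* effect of VM_EXIT_CLEAR_BNDCFGS */
}

/* Restore the host value, skipping the write when it was already 0. */
static void load_host_state(const struct host_state *hs)
{
    if (hs->msr_host_bndcfgs) {
        fake_bndcfgs = hs->msr_host_bndcfgs;
        wrmsr_count++;
    }
}

/* Usage example: a host that does not use MPX never pays for a WRMSR. */
static void bndcfgs_example(void)
{
    struct host_state hs = { 0 };

    fake_bndcfgs = 0;
    wrmsr_count = 0;
    save_host_state(&hs, 1);
    guest_run_and_exit(0x1001);
    load_host_state(&hs);
    assert(wrmsr_count == 0 && fake_bndcfgs == 0);

    fake_bndcfgs = 0x2001;       /* host now has a non-zero configuration */
    save_host_state(&hs, 1);
    guest_run_and_exit(0x1001);
    load_host_state(&hs);
    assert(wrmsr_count == 1 && fake_bndcfgs == 0x2001);
}
```

Testing the saved value rather than cpu_has_mpx covers both cases with one branch, which is the simplification suggested in the review.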
Re: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d
On 29/11/2013 15:46, Liu, Jinsong wrote:
> Paolo Bonzini wrote:
>> On 29/11/2013 14:15, Liu, Jinsong wrote:
>>> From e4b58c7bafc4d9f913a572a1b1cfee91c92f1637 Mon Sep 17 00:00:00 2001
>>> From: Liu Jinsong jinsong@intel.com
>>> Date: Fri, 22 Nov 2013 00:24:16 +0800
>>> Subject: [PATCH 1/2] target-i386: fix cpuid leaf 0x0d
>>>
>>> Fix cpuid leaf 0x0d which incorrectly parsed eax and ebx.
>>
>> There is no visible change, right (the two hunks cancel each other)?
>> Since you will have to post a v2, please make this explicit in the
>> commit message.
>
> OK, will add an explicit commit message, or drop this patch if needed.

The patch is correct, so keep it please. However, mention in the commit message that the CPUID values were valid even before this patch.

>> Also, the QEMU side needs support for transferring the state in and out
>> of KVM (kvm_put_xsave, kvm_get_xsave). On top of this you can add
>> migration support using a new subsection of vmstate_cpu.
>>
>> Thanks!
>>
>> Paolo
>
> Thanks,
> Jinsong
>
>>> Signed-off-by: Liu Jinsong jinsong@intel.com
>>> ---
>>>  target-i386/cpu.c |6 +++---
>>>  1 files changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
>>> index 864c80e..544b57f 100644
>>> --- a/target-i386/cpu.c
>>> +++ b/target-i386/cpu.c
>>> @@ -335,7 +335,7 @@ typedef struct ExtSaveArea {
>>>
>>>  static const ExtSaveArea ext_save_areas[] = {
>>>      [2] = { .feature = FEAT_1_ECX, .bits = CPUID_EXT_AVX,
>>> -            .offset = 0x100, .size = 0x240 },
>>> +            .offset = 0x240, .size = 0x100 },
>>>  };
>>>
>>>  const char *get_register_name_32(unsigned int reg)
>>> @@ -2225,8 +2225,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>>>              const ExtSaveArea *esa = &ext_save_areas[count];
>>>              if ((env->features[esa->feature] & esa->bits) == esa->bits &&
>>>                  (kvm_mask & (1 << count)) != 0) {
>>> -                *eax = esa->offset;
>>> -                *ebx = esa->size;
>>> +                *eax = esa->size;
>>> +                *ebx = esa->offset;
>>>              }
>>>          }
>>>          break;
>>> --
>>> 1.7.1
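To see why the two hunks cancel each other, it may help to run both versions side by side. The sketch below is a standalone illustration, not QEMU code: the struct is trimmed to the two fields that matter, and the function names cpuid_0xd_old and cpuid_0xd_new are invented here. The numeric values are the ones in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed-down version of the ExtSaveArea entry from the patch. */
typedef struct ExtSaveArea {
    uint32_t offset, size;
} ExtSaveArea;

/* Before the patch: the table fields were swapped ... */
static const ExtSaveArea avx_old = { .offset = 0x100, .size = 0x240 };
/* ... and after the patch they carry the intended meaning. */
static const ExtSaveArea avx_new = { .offset = 0x240, .size = 0x100 };

/* Old code: EAX took the (mislabelled) offset, EBX the size. */
static void cpuid_0xd_old(const ExtSaveArea *esa, uint32_t *eax, uint32_t *ebx)
{
    *eax = esa->offset;
    *ebx = esa->size;
}

/* New code: EAX reports the size, EBX the offset of the component. */
static void cpuid_0xd_new(const ExtSaveArea *esa, uint32_t *eax, uint32_t *ebx)
{
    *eax = esa->size;
    *ebx = esa->offset;
}

/* Usage example: both versions give the guest the same register values. */
static void cpuid_swap_example(void)
{
    uint32_t eax_old, ebx_old, eax_new, ebx_new;

    cpuid_0xd_old(&avx_old, &eax_old, &ebx_old);
    cpuid_0xd_new(&avx_new, &eax_new, &ebx_new);

    assert(eax_old == eax_new && eax_new == 0x100);  /* size of AVX state */
    assert(ebx_old == ebx_new && ebx_new == 0x240);  /* offset of AVX state */
}
```

The guest-visible values are identical, which is why the commit message should say the CPUID output was already valid.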
Re: [PATCH 2/2] target-i386: Intel MPX support
On 29/11/2013 15:50, Liu, Jinsong wrote:
>> There is no reason to get the size and offset from the host. Peter
>> Anvin confirmed that the sizes and offsets will never change (as should
>> be the case for migration to work across different CPU versions). In
>> fact, the size and offset are documented for every XSAVE feature except
>> MPX in the copy I have of the Intel documentation.
>
> If the sizes and offsets will never change, what's the bad effect of
> getting them from host?

In case TCG gets AVX/MPX support later, you will not be able to get CPUID values from the host. The leaf 0xd code was written so that it would work for both KVM and TCG.

When QEMU got AVX support, we decided not to treat XSAVE data as opaque blobs, and instead unmarshal data out of it into the CPUX86State struct and back. This is again useful for TCG, but it also makes for easier interpretation of migration state. You will have to rely on precise sizes and offsets in the marshaling/unmarshaling code of kvm_get_xsave/kvm_put_xsave, so it is not a big problem to have them here as well.

Paolo
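As a small illustration of the hardcoded-table approach argued for above, the following sketch computes the XSAVE area size that leaf (0xd, 0) would report from a static table, with no host query. It is an assumption-laden toy: the type XSaveComponent and the function xsave_required_size are names invented here, only the AVX entry is filled in (0x240 and 0x100, as in patch 1/2), and the MPX entries are deliberately left out because the thread has not settled their values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct XSaveComponent {
    uint32_t offset, size;
} XSaveComponent;

/* Legacy region plus XSAVE header; the patch starts ECX at this value. */
#define XSAVE_BASE_SIZE 0x240

/* Indexed by XCR0 bit number; only AVX (component 2) is filled in. */
static const XSaveComponent components[] = {
    [2] = { .offset = 0x240, .size = 0x100 },
};

/* Size needed for the components enabled in xcr0: max(offset + size). */
static uint32_t xsave_required_size(uint64_t xcr0)
{
    uint32_t end = XSAVE_BASE_SIZE;
    size_t n = sizeof(components) / sizeof(components[0]);

    for (size_t i = 2; i < n; i++) {
        if ((xcr0 & (1ULL << i)) && components[i].size != 0) {
            uint32_t comp_end = components[i].offset + components[i].size;
            if (comp_end > end)
                end = comp_end;
        }
    }
    return end;
}

/* Usage example: x87+SSE only, then with AVX enabled as well. */
static void xsave_size_example(void)
{
    assert(xsave_required_size(0x3) == 0x240);
    assert(xsave_required_size(0x7) == 0x340);
}
```

Because the table is static, the same function works whether the values are consumed by KVM-backed or TCG-backed CPUID code, which is the portability argument made in the reply.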
[PATCH] KVM/X86: vpmu migration, make perf_event associated with vcpu thread
After applying Paolo's patch, the vPMU data was migrated correctly.
https://patchwork.kernel.org/patch/2850813/

But when I wrote a test module that makes IA32_PMC1 count unhalted CPU cycles, the value of IA32_PMC1 never increased again after migration. I found that the perf_event was created correctly after migration, but at creation time current is QEMU's main thread, which never enters non-root mode, so the perf_event count never increases.

I tried using the pid in struct kvm_vcpu to get the vcpu thread's task_struct, but when the perf_event is created after migration, pid points to QEMU's main thread rather than the vcpu thread, because of the pid switching in vcpu_load. I don't understand this very well: I think the vcpu is created in qemu_kvm_cpu_thread_fn, which is the vcpu thread, so using the pid of current should be enough. Why is the switch needed?

Maybe I am totally wrong, so I kept that code unchanged, added an extra tid to hold the vcpu thread's pid, and use this tid to get the vcpu thread's task_struct when creating the perf_event.

Thanks
Wang Hui

Signed-off-by: Wang Hui john.wang...@huawei.com
---
 arch/x86/kvm/pmu.c | 8 +++-
 arch/x86/kvm/x86.c | 6 ++
 include/linux/kvm_host.h | 1 +
 virt/kvm/kvm_main.c | 1 +
 4 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 5c4f631..676227e 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -173,6 +173,8 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		.exclude_kernel = exclude_kernel,
 		.config = config,
 	};
+	struct task_struct *task = NULL;
+
 	if (in_tx)
 		attr.config |= HSW_IN_TX;
 	if (in_tx_cp)
@@ -180,7 +182,11 @@ static void reprogram_counter(struct kvm_pmc *pmc, u32 type,

 	attr.sample_period = (-pmc->counter) & pmc_bitmask(pmc);

-	event = perf_event_create_kernel_counter(&attr, -1, current,
+	if (pmc->vcpu)
+		task = pid_task(pmc->vcpu->tid, PIDTYPE_PID);
+	if (!task)
+		task = current;
+	event = perf_event_create_kernel_counter(&attr, -1, task,
 						 intr ? kvm_perf_overflow_intr :
 						 kvm_perf_overflow, pmc);
 	if (IS_ERR(event)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 21ef1ba..f1f0e8e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6704,11 +6704,17 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 {
 	int r;
+	struct pid *cpu_thread_pid;

 	vcpu->arch.mtrr_state.have_fixed = 1;
 	r = vcpu_load(vcpu);
 	if (r)
 		return r;
+
+	cpu_thread_pid = get_task_pid(current, PIDTYPE_PID);
+	rcu_assign_pointer(vcpu->tid, cpu_thread_pid);
+	synchronize_rcu();
+
 	kvm_vcpu_reset(vcpu);
 	kvm_mmu_setup(vcpu);
 	vcpu_put(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9523d2a..ad7af9d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -235,6 +235,7 @@ struct kvm_vcpu {
 	int guest_fpu_loaded, guest_xcr0_loaded;
 	wait_queue_head_t wq;
 	struct pid *pid;
+	struct pid *tid;
 	int sigset_active;
 	sigset_t sigset;
 	struct kvm_vcpu_stat stat;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a0aa84b..80bcce5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -242,6 +242,7 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_init);

 void kvm_vcpu_uninit(struct kvm_vcpu *vcpu)
 {
+	put_pid(vcpu->tid);
 	put_pid(vcpu->pid);
 	kvm_arch_vcpu_uninit(vcpu);
 	free_page((unsigned long)vcpu->run);
RE: [RFC] create a single workqueue for each vm to update vm irq routing table
> On Tue, Nov 26, 2013 at 06:14:27PM +0200, Gleb Natapov wrote:
>> On Tue, Nov 26, 2013 at 06:05:37PM +0200, Michael S. Tsirkin wrote:
>>> On Tue, Nov 26, 2013 at 02:56:10PM +0200, Gleb Natapov wrote:
>>>> On Tue, Nov 26, 2013 at 01:47:03PM +0100, Paolo Bonzini wrote:
>>>>> On 26/11/2013 13:40, Zhanghaoyu (A) wrote:
>>>>>> When the guest sets irq smp_affinity, a VMEXIT occurs and the vcpu
>>>>>> thread returns from the hypervisor to QEMU via the ioctl. The vcpu
>>>>>> thread then asks the hypervisor to update the irq routing table.
>>>>>> kvm_set_irq_routing calls synchronize_rcu, so the vcpu thread blocks
>>>>>> for a long time waiting for the RCU grace period. During this period
>>>>>> the vcpu cannot service the VM, so interrupts delivered to this vcpu
>>>>>> cannot be handled in time and the apps running on it cannot be
>>>>>> serviced either. This is unacceptable in some real-time scenarios,
>>>>>> e.g. telecom.
>>>>>>
>>>>>> So I want to create a single workqueue for each VM to perform the RCU
>>>>>> synchronization for the irq routing table asynchronously, and let the
>>>>>> vcpu thread return and VMENTRY to service the VM immediately, with no
>>>>>> need to block waiting for the RCU grace period. I have implemented a
>>>>>> raw patch and tested it in our telecom environment; the above problem
>>>>>> disappeared.
>>>>>
>>>>> I don't think a workqueue is even needed. You just need to use call_rcu
>>>>> to free old after releasing kvm->irq_lock.
>>>>>
>>>>> What do you think?
>>>>
>>>> It should be rate limited somehow. Since it is guest triggerable, a
>>>> guest may cause the host to allocate a lot of memory this way.
>>>
>>> The checks in __call_rcu() should handle this, I think. These keep a
>>> per-CPU counter, which can be adjusted via rcutree.blimit, which
>>> defaults to taking evasive action if more than 10K callbacks are
>>> waiting on a given CPU.
>>
>> Documentation/RCU/checklist.txt has:
>>
>>   An especially important property of the synchronize_rcu() primitive
>>   is that it automatically self-limits: if grace periods are delayed
>>   for whatever reason, then the synchronize_rcu() primitive will
>>   correspondingly delay updates. In contrast, code using call_rcu()
>>   should explicitly limit update rate in cases where grace periods
>>   are delayed, as failing to do so can result in excessive realtime
>>   latencies or even OOM conditions.
>
> I just asked Paul what this means.

My understanding is as follows: the synchronous grace-period API synchronize_rcu() prevents the current thread from generating a large number of subsequent RCU updates, which is the self-limiting behaviour described in Documentation/RCU/checklist.txt above, and so avoids memory exhaustion. The asynchronous API call_rcu() cannot limit the update rate, so the caller needs to rate-limit explicitly.

Thanks,
Zhang Haoyu

>> --
>> Gleb.
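The difference between self-limiting and explicitly limited updates can be sketched with a toy userspace model. Nothing below is the kernel RCU API: MAX_PENDING, defer_free_limited and grace_period_elapsed are hypothetical names, and a "grace period" is reduced to a function call that runs the queued callbacks. The sketch only shows one possible shape of explicit rate limiting: queue callbacks asynchronously while the backlog is small, and fall back to waiting once it reaches a cap.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 4            /* hypothetical cap on queued callbacks */

struct deferred {
    void (*fn)(void *);
    void *arg;
};

static struct deferred pending[MAX_PENDING];
static size_t n_pending;
static unsigned sync_waits;      /* how often the caller had to block */
static unsigned freed;           /* callbacks that have run so far */

/* Example callback: stands in for freeing an old routing table. */
static void count_free(void *arg)
{
    (void)arg;
    freed++;
}

/* Model of a grace period ending: every queued callback runs. */
static void grace_period_elapsed(void)
{
    for (size_t i = 0; i < n_pending; i++)
        pending[i].fn(pending[i].arg);
    n_pending = 0;
}

/* Queue like call_rcu(), but wait like synchronize_rcu() when full. */
static void defer_free_limited(void (*fn)(void *), void *arg)
{
    if (n_pending == MAX_PENDING) {
        sync_waits++;            /* updater is throttled here */
        grace_period_elapsed();
    }
    assert(n_pending < MAX_PENDING);
    pending[n_pending++] = (struct deferred){ fn, arg };
}

/* Usage example: five updates arrive while grace periods are delayed. */
static void rate_limit_example(void)
{
    n_pending = 0;
    sync_waits = 0;
    freed = 0;

    for (int i = 0; i < 5; i++)
        defer_free_limited(count_free, NULL);

    assert(sync_waits == 1);     /* the fifth update had to wait once */
    assert(freed == 4);
    assert(n_pending == 1);
}
```

With the cap in place a guest that triggers updates in a loop can hold at most MAX_PENDING old objects alive, which addresses the memory-growth concern raised in the thread at the cost of occasionally blocking the updater.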