Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v3)
On Fri, Dec 03, 2010 at 05:38:06PM -0600, Anthony Liguori wrote:
> On 12/03/2010 05:32 PM, Joerg Roedel wrote:
> > On Fri, Dec 03, 2010 at 04:39:22PM -0600, Anthony Liguori wrote:
> > > +	if (yield_on_hlt)
> > > +		min |= CPU_BASED_HLT_EXITING;
> >
> > This approach won't work out on AMD because in HLT the CPU may enter
> > C1e. In C1e the local APIC timer interrupt is not delivered anymore,
> > and when this is the current timer in use the CPU may miss timer ticks
> > or never come out of HLT again. The guest has no chance to work around
> > this as the Linux idle routine does.
>
> And this doesn't break old software on bare metal?

Yes it does. In fact, this behavior is documented as Erratum 400 for AMD
CPUs. Linux has had a workaround for it for quite some time. You can have
a look at the c1e_idle routine for details.

C1e can also be disabled by the OS. But there are BIOSes which re-enable
it in SMI, so there is a chance that it gets re-enabled without a vmexit.

	Joerg

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
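For readers who have not looked at c1e_idle, the check it depends on can be sketched roughly as below. This is a standalone illustration, not the kernel routine: the MSR number and mask are the values I recall from the Linux x86 headers of that period and should be treated as assumptions, and c1e_enabled(), pick_idle_timer() and the enum are hypothetical names invented for this sketch. The point is only that the workaround is decided by the host from an MSR value, which a guest executing HLT natively cannot rely on if an SMI handler flips the bits behind its back.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values, recalled from the Linux x86 MSR definitions of the time.
 * Check them against the kernel tree before relying on them.
 */
#define MSR_K8_INT_PENDING_MSG   0xc0010055u
#define K8_INTP_C1E_ACTIVE_MASK  0x18000000u  /* bits 27 and 28 */

/* Hypothetical helper: does the low word of the MSR show C1E enabled? */
bool c1e_enabled(uint32_t int_pending_msg_lo)
{
    return (int_pending_msg_lo & K8_INTP_C1E_ACTIVE_MASK) != 0;
}

/* Hypothetical policy: which timer an idle routine should depend on. */
enum idle_timer {
    IDLE_TIMER_LAPIC,      /* local APIC timer keeps ticking in HLT */
    IDLE_TIMER_BROADCAST   /* local APIC timer may stop, use another source */
};

enum idle_timer pick_idle_timer(uint32_t int_pending_msg_lo)
{
    /* With C1E active, the local APIC timer cannot be trusted to wake us. */
    return c1e_enabled(int_pending_msg_lo) ? IDLE_TIMER_BROADCAST
                                           : IDLE_TIMER_LAPIC;
}
```

In the kernel the value would come from reading the MSR on the current CPU; here it is passed in as a plain integer so the sketch compiles in user space.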
[PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v3)
In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling. There are many approaches to achieve
this, but the most direct is to simply avoid trapping the HLT instruction,
which lets the guest directly execute the instruction, putting the processor
to sleep.

Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all. A similar option is
possible for AMD but I don't have easy access to AMD test hardware.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
---
v3 -> v2
 - Clear HLT activity state on exception injection to fix issue with async PF

v1 -> v2
 - Rename parameter to yield_on_hlt
 - Remove __read_mostly

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 42d9590..9642c22 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -297,6 +297,12 @@ enum vmcs_field {
 #define GUEST_INTR_STATE_SMI		0x0004
 #define GUEST_INTR_STATE_NMI		0x0008
 
+/* GUEST_ACTIVITY_STATE flags */
+#define GUEST_ACTIVITY_ACTIVE		0
+#define GUEST_ACTIVITY_HLT		1
+#define GUEST_ACTIVITY_SHUTDOWN		2
+#define GUEST_ACTIVITY_WAIT_SIPI	3
+
 /*
  * Exit Qualifications for MOV for Control Register Access
  */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index caa967e..e8e64cb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -69,6 +69,9 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 static int __read_mostly vmm_exclusive = 1;
 module_param(vmm_exclusive, bool, S_IRUGO);
 
+static int yield_on_hlt = 1;
+module_param(yield_on_hlt, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST				\
 	(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK						\
@@ -1016,6 +1019,10 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
+	/* Cannot inject an exception if guest activity state is HLT */
+	if (vmcs_read32(GUEST_ACTIVITY_STATE) == GUEST_ACTIVITY_HLT)
+		vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
+
 	if (has_error_code) {
 		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
@@ -1419,7 +1426,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 				&_pin_based_exec_control) < 0)
 		return -EIO;
 
-	min = CPU_BASED_HLT_EXITING |
+	min =
 #ifdef CONFIG_X86_64
 	      CPU_BASED_CR8_LOAD_EXITING |
 	      CPU_BASED_CR8_STORE_EXITING |
@@ -1432,6 +1439,10 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_MWAIT_EXITING |
 	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING;
+
+	if (yield_on_hlt)
+		min |= CPU_BASED_HLT_EXITING;
+
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
-- 
1.7.0.4
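The two behavioral changes in the patch can be modeled in user space roughly as follows. This is a simplified sketch under stated assumptions, not the kernel code: struct fake_vmcs, build_min_controls() and prepare_exception_injection() are hypothetical stand-ins for the VMCS accessors and for the relevant parts of setup_vmcs_config() and vmx_queue_exception(), only a subset of the required control bits is shown, and the bit values are the ones I recall from asm/vmx.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values (subset), as recalled from arch/x86/include/asm/vmx.h. */
#define CPU_BASED_HLT_EXITING     0x00000080u
#define CPU_BASED_INVLPG_EXITING  0x00000200u
#define CPU_BASED_MWAIT_EXITING   0x00000400u

/* Activity states as added by the patch. */
#define GUEST_ACTIVITY_ACTIVE     0u
#define GUEST_ACTIVITY_HLT        1u
#define GUEST_ACTIVITY_SHUTDOWN   2u
#define GUEST_ACTIVITY_WAIT_SIPI  3u

/* Hypothetical stand-in for the one VMCS field this sketch touches. */
struct fake_vmcs {
    uint32_t guest_activity_state;
};

/*
 * Models the setup_vmcs_config() change: HLT exiting is no longer
 * unconditionally required; it is requested only when yield_on_hlt is set.
 */
uint32_t build_min_controls(bool yield_on_hlt)
{
    uint32_t min = CPU_BASED_MWAIT_EXITING | CPU_BASED_INVLPG_EXITING;

    if (yield_on_hlt)
        min |= CPU_BASED_HLT_EXITING;
    return min;
}

/*
 * Models the vmx_queue_exception() change: a vCPU that halted without
 * exiting is recorded as being in the HLT activity state, so move it back
 * to ACTIVE before injecting an exception. Other states are left alone.
 */
void prepare_exception_injection(struct fake_vmcs *vmcs)
{
    if (vmcs->guest_activity_state == GUEST_ACTIVITY_HLT)
        vmcs->guest_activity_state = GUEST_ACTIVITY_ACTIVE;
}
```

With yield_on_hlt left at its default of 1 the resulting mask still contains CPU_BASED_HLT_EXITING, so existing behavior is unchanged unless the parameter is explicitly cleared at module load time.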
Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v3)
On Fri, Dec 03, 2010 at 04:39:22PM -0600, Anthony Liguori wrote:
> +	if (yield_on_hlt)
> +		min |= CPU_BASED_HLT_EXITING;

This approach won't work out on AMD because in HLT the CPU may enter C1e.
In C1e the local APIC timer interrupt is not delivered anymore, and when
this is the current timer in use the CPU may miss timer ticks or never
come out of HLT again. The guest has no chance to work around this as the
Linux idle routine does.

If you really want active idling of a guest, it should idle in the
hypervisor, where it can work around such problems.

	Joerg
Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v3)
On 12/03/2010 05:32 PM, Joerg Roedel wrote:
> On Fri, Dec 03, 2010 at 04:39:22PM -0600, Anthony Liguori wrote:
> > +	if (yield_on_hlt)
> > +		min |= CPU_BASED_HLT_EXITING;
>
> This approach won't work out on AMD because in HLT the CPU may enter C1e.
> In C1e the local APIC timer interrupt is not delivered anymore, and when
> this is the current timer in use the CPU may miss timer ticks or never
> come out of HLT again. The guest has no chance to work around this as the
> Linux idle routine does.

And this doesn't break old software on bare metal?

Regards,

Anthony Liguori

> If you really want active idling of a guest, it should idle in the
> hypervisor, where it can work around such problems.
>
> 	Joerg