Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v3)

2010-12-04 Thread Joerg Roedel
On Fri, Dec 03, 2010 at 05:38:06PM -0600, Anthony Liguori wrote:
> On 12/03/2010 05:32 PM, Joerg Roedel wrote:
> > On Fri, Dec 03, 2010 at 04:39:22PM -0600, Anthony Liguori wrote:
> > > +	if (yield_on_hlt)
> > > +		min |= CPU_BASED_HLT_EXITING;
> >
> > This approach won't work out on AMD because in HLT the CPU may enter
> > C1e. In C1e the local APIC timer interrupt is no longer delivered, and
> > when this is the timer currently in use the CPU may miss timer ticks or
> > never come out of HLT again. The guest has no chance to work around
> > this as the Linux idle routine does.
>
> And this doesn't break old software on bare metal?

Yes it does. In fact, this behavior is documented as Erratum 400 for AMD
CPUs. Linux has had a workaround for it for quite some time. You can have a
look at the c1e_idle routine for details.
C1e can also be disabled by the OS. But there are BIOSes which re-enable
it in SMI. So there is a chance that it gets re-enabled without a
vmexit.

Joerg
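
For readers following the Erratum 400 reference, here is a minimal sketch of the kind of check the Linux workaround is built on. It assumes the MSR number and bit mask as they appear in Linux's msr-index.h (MSR_K8_INT_PENDING_MSG and K8_INTP_C1E_ACTIVE_MASK, quoted from memory and worth verifying against the tree). The helper names and the enum are hypothetical and not taken from the kernel; the MSR value is passed in as a plain integer because reading the MSR itself is a privileged operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match Linux's msr-index.h; verify before relying on them. */
#define MSR_K8_INT_PENDING_MSG   0xc0010055u
#define K8_INTP_C1E_ACTIVE_MASK  0x18000000u

/* Hypothetical result type: which timer the idle path can rely on. */
enum idle_timer_mode {
	IDLE_LAPIC_TIMER_OK,      /* C1E off: local APIC timer keeps ticking */
	IDLE_USE_BROADCAST_TIMER  /* C1E on: local APIC timer may stop in HLT */
};

/* True if either C1E enable bit is set in the given MSR value. */
bool c1e_enabled(uint64_t int_pending_msg)
{
	return (int_pending_msg & K8_INTP_C1E_ACTIVE_MASK) != 0;
}

/*
 * Hypothetical helper: decide, right before idling, which timer to trust.
 * The check has to be repeated on every idle entry because, as noted in
 * the thread, a BIOS may re-enable C1E from SMI behind the OS's back.
 */
enum idle_timer_mode pick_idle_timer(uint64_t int_pending_msg)
{
	return c1e_enabled(int_pending_msg) ? IDLE_USE_BROADCAST_TIMER
					    : IDLE_LAPIC_TIMER_OK;
}

/* Small usage example with made-up MSR values. */
void c1e_example(void)
{
	assert(!c1e_enabled(0));
	assert(c1e_enabled(0x08000000u));
	assert(c1e_enabled(0x10000000u));
	assert(pick_idle_timer(0x18000000u) == IDLE_USE_BROADCAST_TIMER);
}
```

A guest that executes HLT natively cannot perform this check-and-fallback reliably, which is the point made above: the MSR state can change without a vmexit.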

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v3)

2010-12-03 Thread Anthony Liguori
In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling.  There are many approaches to achieve
this but the most direct is to simply avoid trapping the HLT instruction which
lets the guest directly execute the instruction putting the processor to sleep.

Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all.  A similar option is possible
for AMD but I don't have easy access to AMD test hardware.

Signed-off-by: Anthony Liguori aligu...@us.ibm.com
---
v3 -> v2
 - Clear HLT activity state on exception injection to fix issue with async PF

v1 -> v2
 - Rename parameter to yield_on_hlt
 - Remove __read_mostly

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 42d9590..9642c22 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -297,6 +297,12 @@ enum vmcs_field {
 #define GUEST_INTR_STATE_SMI   0x0004
 #define GUEST_INTR_STATE_NMI   0x0008
 
+/* GUEST_ACTIVITY_STATE flags */
+#define GUEST_ACTIVITY_ACTIVE  0
+#define GUEST_ACTIVITY_HLT 1
+#define GUEST_ACTIVITY_SHUTDOWN	2
+#define GUEST_ACTIVITY_WAIT_SIPI   3
+
 /*
  * Exit Qualifications for MOV for Control Register Access
  */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index caa967e..e8e64cb 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -69,6 +69,9 @@ module_param(emulate_invalid_guest_state, bool, S_IRUGO);
 static int __read_mostly vmm_exclusive = 1;
 module_param(vmm_exclusive, bool, S_IRUGO);
 
+static int yield_on_hlt = 1;
+module_param(yield_on_hlt, bool, S_IRUGO);
+
 #define KVM_GUEST_CR0_MASK_UNRESTRICTED_GUEST  \
 	(X86_CR0_WP | X86_CR0_NE | X86_CR0_NW | X86_CR0_CD)
 #define KVM_GUEST_CR0_MASK \
@@ -1016,6 +1019,10 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	u32 intr_info = nr | INTR_INFO_VALID_MASK;
 
+	/* Cannot inject an exception if guest activity state is HLT */
+	if (vmcs_read32(GUEST_ACTIVITY_STATE) == GUEST_ACTIVITY_HLT)
+		vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
+
 	if (has_error_code) {
 		vmcs_write32(VM_ENTRY_EXCEPTION_ERROR_CODE, error_code);
 		intr_info |= INTR_INFO_DELIVER_CODE_MASK;
@@ -1419,7 +1426,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 				&_pin_based_exec_control) < 0)
 		return -EIO;
 
-	min = CPU_BASED_HLT_EXITING |
+	min =
 #ifdef CONFIG_X86_64
 	      CPU_BASED_CR8_LOAD_EXITING |
 	      CPU_BASED_CR8_STORE_EXITING |
@@ -1432,6 +1439,10 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
 	      CPU_BASED_MWAIT_EXITING |
 	      CPU_BASED_MONITOR_EXITING |
 	      CPU_BASED_INVLPG_EXITING;
+
+	if (yield_on_hlt)
+		min |= CPU_BASED_HLT_EXITING;
+
 	opt = CPU_BASED_TPR_SHADOW |
 	      CPU_BASED_USE_MSR_BITMAPS |
 	      CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
-- 
1.7.0.4
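
The two behavioral changes in the patch can be modeled outside the kernel. The following is only a sketch under stated assumptions: the bit values are the VMX processor-based control encodings as recalled from vmx.h and the Intel SDM, the required set is reduced to three bits for brevity, and build_min_controls() and prepare_exception_injection() are hypothetical stand-ins for the edited parts of setup_vmcs_config() and vmx_queue_exception(). A plain integer takes the place of the VMCS field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings (vmx.h / Intel SDM); verify before relying on them. */
#define CPU_BASED_HLT_EXITING     0x00000080u
#define CPU_BASED_INVLPG_EXITING  0x00000200u
#define CPU_BASED_MWAIT_EXITING   0x00000400u

/* Activity state values as introduced by the patch. */
#define GUEST_ACTIVITY_ACTIVE 0u
#define GUEST_ACTIVITY_HLT    1u

/*
 * Hypothetical model of the "min" mask: HLT exiting becomes a required
 * control only when yield_on_hlt is set. With yield_on_hlt == 0 the guest
 * executes HLT natively and the physical CPU idles for the rest of the
 * guest's time slice.
 */
uint32_t build_min_controls(bool yield_on_hlt)
{
	uint32_t min = CPU_BASED_MWAIT_EXITING | CPU_BASED_INVLPG_EXITING;

	if (yield_on_hlt)
		min |= CPU_BASED_HLT_EXITING;
	return min;
}

/*
 * Hypothetical model of the v3 fix: once HLT no longer exits, a vcpu can
 * be in the HLT activity state when an exception (for example an async
 * page fault) is queued, so the state is reset to ACTIVE first.
 */
void prepare_exception_injection(uint32_t *activity_state)
{
	assert(activity_state != NULL);
	if (*activity_state == GUEST_ACTIVITY_HLT)
		*activity_state = GUEST_ACTIVITY_ACTIVE;
}
```

Calling build_min_controls(false) yields a mask without the HLT bit, and prepare_exception_injection() leaves any state other than HLT untouched.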



Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v3)

2010-12-03 Thread Joerg Roedel
On Fri, Dec 03, 2010 at 04:39:22PM -0600, Anthony Liguori wrote:
> +	if (yield_on_hlt)
> +		min |= CPU_BASED_HLT_EXITING;

This approach won't work out on AMD because in HLT the CPU may enter
C1e. In C1e the local APIC timer interrupt is no longer delivered, and
when this is the timer currently in use the CPU may miss timer ticks or
never come out of HLT again. The guest has no chance to work around
this as the Linux idle routine does.
If you really want active idling of a guest, it should idle in the
hypervisor where it can work around such problems.

Joerg



Re: [PATCH] kvm-vmx: add module parameter to avoid trapping HLT instructions (v3)

2010-12-03 Thread Anthony Liguori

On 12/03/2010 05:32 PM, Joerg Roedel wrote:
> On Fri, Dec 03, 2010 at 04:39:22PM -0600, Anthony Liguori wrote:
> > +	if (yield_on_hlt)
> > +		min |= CPU_BASED_HLT_EXITING;
>
> This approach won't work out on AMD because in HLT the CPU may enter
> C1e. In C1e the local APIC timer interrupt is no longer delivered, and
> when this is the timer currently in use the CPU may miss timer ticks or
> never come out of HLT again. The guest has no chance to work around
> this as the Linux idle routine does.

And this doesn't break old software on bare metal?

Regards,

Anthony Liguori

> If you really want active idling of a guest, it should idle in the
> hypervisor where it can work around such problems.
>
> Joerg
