Re: [PATCH 3/24] Implement VMXON and VMXOFF

2010-06-16 Thread Nadav Har'El
On Tue, Jun 15, 2010, Marcelo Tosatti wrote about "Re: [PATCH 3/24] Implement
VMXON and VMXOFF":
> > +   if (!(vcpu->arch.cr4 & X86_CR4_VMXE) ||
> > +       !(vcpu->arch.cr0 & X86_CR0_PE) ||
>
> kvm_read_cr0_bits, kvm_read_cr4_bits.

Thanks. I'll change that.

-- 
Nadav Har'El|Wednesday, Jun 16 2010, 4 Tammuz 5770
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |In Fortran, God is real unless declared
http://nadav.harel.org.il   |an integer.


Re: [PATCH 3/24] Implement VMXON and VMXOFF

2010-06-16 Thread Nadav Har'El
Hi,

On Mon, Jun 14, 2010, Avi Kivity wrote about "Re: [PATCH 3/24] Implement VMXON
and VMXOFF":
> On 06/13/2010 03:24 PM, Nadav Har'El wrote:
> > This patch allows a guest to use the VMXON and VMXOFF instructions, and
> > emulates them accordingly. Basically this amounts to checking some
> > prerequisites, and then remembering whether the guest has enabled or
> > disabled VMX operation.
>
> Should probably reorder with next patch.

I can't do this if I want the code to compile after each patch, because the
next patch (controlling when setting cr4.VMXE can be set) needs to check
whether VMXON was done.

> Please (here and elsewhere) use the standard kernel style for multiline
> comments - start with /* on a line by itself.

Sure, sorry about that. I guess I need to (re)read the Linux coding style
document.
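
For reference, the convention being referred to (from Documentation/CodingStyle;
a small illustrative sketch, not code from the patch):

	/*
	 * The preferred kernel style for multi-line comments: the opening
	 * marker sits on a line by itself, each following line starts
	 * with " * ", and the closing marker is also on its own line.
	 */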

> > +   vmx->nested.vmxon = 1;
>
> = true

I'll change that. I learned C more than a decade before the advent of
stdbool.h, so in my mind, 1 has always been, and still is, the right and
only way to write true... But of course it doesn't mean I need to inflict
my old style on everybody else ;-)

> Need to block INIT signals in the local apic as well (fine for a
> separate patch).

I've been looking into how I might best go about achieving this.

The APIC_DM_INIT handler is in lapic.c, which is not aware of VMX or
(obviously) nested VMX. So I need to add some sort of generic "block INIT"
flag which that code will check. Is this the sort of fix you had in mind?
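
Concretely, such a flag might look roughly like this (block_init is a
hypothetical name and placement, just to illustrate the idea):

	/* in struct kvm_vcpu_arch: */
	bool block_init;	/* set by vmx.c on VMXON, cleared on VMXOFF */

	/* in __apic_accept_irq() in lapic.c: */
	case APIC_DM_INIT:
		if (vcpu->arch.block_init)
			break;	/* INIT blocked while the guest is in VMX operation */
		/* ... existing INIT delivery code ... */
		break;

That way lapic.c itself never needs to know anything about VMX.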

A different change could be to write a handler for exit reason 3, which we
get if there's a real INIT signal in the host; if we get exit reason 3 from
L2, we need to exit to L1 to handle it, while if we get exit reason 3 from
L1 that has done VMXON, we simply need to do nothing (according to the spec).

So I'm not sure which of these two things is the one we really need. What kind
of scenario did you have in mind where this INIT business is relevant?



Here is the patch with your above comments fixed *except* the INIT thing:

---
Subject: [PATCH 3/24] Implement VMXON and VMXOFF

This patch allows a guest to use the VMXON and VMXOFF instructions, and
emulates them accordingly. Basically this amounts to checking some
prerequisites, and then remembering whether the guest has enabled or disabled
VMX operation.

Signed-off-by: Nadav Har'El n...@il.ibm.com
---
--- .before/arch/x86/kvm/vmx.c  2010-06-16 13:20:19.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2010-06-16 13:20:19.0 +0300
@@ -125,6 +125,17 @@ struct shared_msr_entry {
u64 mask;
 };
 
+/*
+ * The nested_vmx structure is part of vcpu_vmx, and holds information we need
+ * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example,
+ * the current VMCS set by L1, a list of the VMCSs used to run the active
+ * L2 guests on the hardware, and more.
+ */
+struct nested_vmx {
+   /* Has the level1 guest done vmxon? */
+   bool vmxon;
+};
+
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
struct list_head  local_vcpus_link;
@@ -176,6 +187,9 @@ struct vcpu_vmx {
u32 exit_reason;
 
bool rdtscp_enabled;
+
+   /* Support for guest hypervisors (nested VMX) */
+   struct nested_vmx nested;
 };
 
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -3361,6 +3375,90 @@ static int handle_vmx_insn(struct kvm_vc
return 1;
 }
 
+/*
+ * Emulate the VMXON instruction.
+ * Currently, we just remember that VMX is active, and do not save or even
+ * inspect the argument to VMXON (the so-called VMXON pointer) because we
+ * do not currently need to store anything in that guest-allocated memory
+ * region. Consequently, VMCLEAR and VMPTRLD also do not verify that their
+ * argument is different from the VMXON pointer (which the spec says they do).
+ */
+static int handle_vmon(struct kvm_vcpu *vcpu)
+{
+   struct kvm_segment cs;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   /* The Intel VMX Instruction Reference lists a bunch of bits that
+* are prerequisite to running VMXON, most notably CR4.VMXE must be
+* set to 1. Otherwise, we should fail with #UD. We test these now:
+*/
+   if (!nested ||
+   !kvm_read_cr4_bits(vcpu, X86_CR4_VMXE) ||
+   !kvm_read_cr0_bits(vcpu, X86_CR0_PE) ||
+   (vmx_get_rflags(vcpu) & X86_EFLAGS_VM)) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+
+   vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+   if (is_long_mode(vcpu) && !cs.l) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+
+   if (vmx_get_cpl(vcpu)) {
+   kvm_inject_gp(vcpu, 0);
+   return 1;
+   }
+
+   vmx->nested.vmxon = true;
+
+   skip_emulated_instruction(vcpu);
+   return 1;
+}
+
+/*
+ * Intel's VMX Instruction Reference specifies a common set of prerequisites
+ * for running VMX instructions (except VMXON, whose prerequisites are
+ * slightly

Re: [PATCH 3/24] Implement VMXON and VMXOFF

2010-06-16 Thread Avi Kivity

On 06/16/2010 02:14 PM, Nadav Har'El wrote:
> Hi,
>
> On Mon, Jun 14, 2010, Avi Kivity wrote about "Re: [PATCH 3/24] Implement
> VMXON and VMXOFF":
> > On 06/13/2010 03:24 PM, Nadav Har'El wrote:
> > > This patch allows a guest to use the VMXON and VMXOFF instructions, and
> > > emulates them accordingly. Basically this amounts to checking some
> > > prerequisites, and then remembering whether the guest has enabled or
> > > disabled VMX operation.
> >
> > Should probably reorder with next patch.
>
> I can't do this if I want the code to compile after each patch, because the
> next patch (controlling when setting cr4.VMXE can be set) needs to check
> whether VMXON was done.


You can have this patch add the vmxon check.  But it doesn't matter too 
much, you can keep the current order.
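
For concreteness, "the vmxon check" would be something along these lines in
whatever path ends up validating CR4 writes (hypothetical placement; the real
check only appears in the next patch of the series):

	/* CR4.VMXE cannot be cleared while the guest is in VMX operation (#GP) */
	if (to_vmx(vcpu)->nested.vmxon && !(cr4 & X86_CR4_VMXE)) {
		kvm_inject_gp(vcpu, 0);
		return;
	}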



> > Need to block INIT signals in the local apic as well (fine for a
> > separate patch).
>
> I've been looking into how I might best go about achieving this.
>
> The APIC_DM_INIT handler is in lapic.c, which is not aware of VMX or
> (obviously) nested VMX. So I need to add some sort of generic "block INIT"
> flag which that code will check. Is this the sort of fix you had in mind?


It's not enough to block INIT, there is also exit reason 3, "INIT
signal".  So you need to call x86.c code from the lapic, which needs to
call a kvm_x86_ops hook which lets vmx.c decide whether the INIT needs to
be intercepted or not, and what to do with it (ignore in root mode, exit
in non-root mode).


Note the check needs to be done in vcpu context, not during delivery as 
it is done now.  So we probably need a KVM_REQ_INIT bit in 
vcpu->requests, which we can check during guest entry where we know if 
we're in root or non-root mode.
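
Roughly, the plumbing being described (KVM_REQ_INIT and the init_blocked hook
are hypothetical names, nothing like this exists in the tree yet):

	/* new member in struct kvm_x86_ops, implemented by vmx.c: */
	bool (*init_blocked)(struct kvm_vcpu *vcpu);

	/* lapic.c: do not act on INIT during delivery, just flag it */
	case APIC_DM_INIT:
		set_bit(KVM_REQ_INIT, &vcpu->requests);
		kvm_vcpu_kick(vcpu);
		break;

	/* vcpu_enter_guest() in x86.c, where root/non-root mode is known: */
	if (test_and_clear_bit(KVM_REQ_INIT, &vcpu->requests)) {
		if (!kvm_x86_ops->init_blocked(vcpu)) {
			/* not in VMX operation: normal INIT handling */
		} else {
			/* L1 in root mode: drop the INIT; in non-root mode
			 * vmx.c would instead emulate an exit to L1
			 * (exit reason 3, "INIT signal") */
		}
	}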


Pretty complicated and esoteric.  We can defer this now while we work 
out more immediate issues, but it needs to be addressed.


--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/24] Implement VMXON and VMXOFF

2010-06-15 Thread Marcelo Tosatti
On Sun, Jun 13, 2010 at 03:24:06PM +0300, Nadav Har'El wrote:
 This patch allows a guest to use the VMXON and VMXOFF instructions, and
 emulates them accordingly. Basically this amounts to checking some
 prerequisites, and then remembering whether the guest has enabled or disabled
 VMX operation.
 
 Signed-off-by: Nadav Har'El n...@il.ibm.com
 ---
 --- .before/arch/x86/kvm/vmx.c2010-06-13 15:01:28.0 +0300
 +++ .after/arch/x86/kvm/vmx.c 2010-06-13 15:01:28.0 +0300
 @@ -117,6 +117,16 @@ struct shared_msr_entry {
   u64 mask;
  };
  
 +/* The nested_vmx structure is part of vcpu_vmx, and holds information we need
 + * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example,
 + * the current VMCS set by L1, a list of the VMCSs used to run the active
 + * L2 guests on the hardware, and more.
 + */
 +struct nested_vmx {
 + /* Has the level1 guest done vmxon? */
 + bool vmxon;
 +};
 +
  struct vcpu_vmx {
   struct kvm_vcpu   vcpu;
   struct list_head  local_vcpus_link;
 @@ -168,6 +178,9 @@ struct vcpu_vmx {
   u32 exit_reason;
  
   bool rdtscp_enabled;
 +
 + /* Support for guest hypervisors (nested VMX) */
 + struct nested_vmx nested;
  };
  
  static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
 @@ -3353,6 +3366,93 @@ static int handle_vmx_insn(struct kvm_vc
   return 1;
  }
  
 +/* Emulate the VMXON instruction.
 + * Currently, we just remember that VMX is active, and do not save or even
 + * inspect the argument to VMXON (the so-called VMXON pointer) because we
 + * do not currently need to store anything in that guest-allocated memory
 + * region. Consequently, VMCLEAR and VMPTRLD also do not verify that their
 + * argument is different from the VMXON pointer (which the spec says they do).
 + */
 +static int handle_vmon(struct kvm_vcpu *vcpu)
 +{
 + struct kvm_segment cs;
 + struct vcpu_vmx *vmx = to_vmx(vcpu);
 +
 + /* The Intel VMX Instruction Reference lists a bunch of bits that
 +  * are prerequisite to running VMXON, most notably CR4.VMXE must be
 +  * set to 1. Otherwise, we should fail with #UD. We test these now:
 +  */
 + if (!nested) {
 + kvm_queue_exception(vcpu, UD_VECTOR);
 + return 1;
 + }
 +
 + if (!(vcpu->arch.cr4 & X86_CR4_VMXE) ||
 +     !(vcpu->arch.cr0 & X86_CR0_PE) ||

kvm_read_cr0_bits, kvm_read_cr4_bits.



Re: [PATCH 3/24] Implement VMXON and VMXOFF

2010-06-14 Thread Avi Kivity

On 06/13/2010 03:24 PM, Nadav Har'El wrote:
> This patch allows a guest to use the VMXON and VMXOFF instructions, and
> emulates them accordingly. Basically this amounts to checking some
> prerequisites, and then remembering whether the guest has enabled or disabled
> VMX operation.

Should probably reorder with next patch.


> +/* The nested_vmx structure is part of vcpu_vmx, and holds information we need
> + * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example,
> + * the current VMCS set by L1, a list of the VMCSs used to run the active
> + * L2 guests on the hardware, and more.
> + */

Please (here and elsewhere) use the standard kernel style for multiline
comments - start with /* on a line by itself.




> +/* Emulate the VMXON instruction.
> + * Currently, we just remember that VMX is active, and do not save or even
> + * inspect the argument to VMXON (the so-called VMXON pointer) because we
> + * do not currently need to store anything in that guest-allocated memory
> + * region. Consequently, VMCLEAR and VMPTRLD also do not verify that their
> + * argument is different from the VMXON pointer (which the spec says they do).
> + */
> +static int handle_vmon(struct kvm_vcpu *vcpu)
> +{
> +   struct kvm_segment cs;
> +   struct vcpu_vmx *vmx = to_vmx(vcpu);
> +
> +   /* The Intel VMX Instruction Reference lists a bunch of bits that
> +    * are prerequisite to running VMXON, most notably CR4.VMXE must be
> +    * set to 1. Otherwise, we should fail with #UD. We test these now:
> +    */
> +   if (!nested) {
> +           kvm_queue_exception(vcpu, UD_VECTOR);
> +           return 1;
> +   }
> +
> +   if (!(vcpu->arch.cr4 & X86_CR4_VMXE) ||
> +       !(vcpu->arch.cr0 & X86_CR0_PE) ||
> +       (vmx_get_rflags(vcpu) & X86_EFLAGS_VM)) {
> +           kvm_queue_exception(vcpu, UD_VECTOR);
> +           return 1;
> +   }
> +
> +   vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
> +   if (is_long_mode(vcpu) && !cs.l) {
> +           kvm_queue_exception(vcpu, UD_VECTOR);
> +           return 1;
> +   }
> +
> +   if (vmx_get_cpl(vcpu)) {
> +           kvm_inject_gp(vcpu, 0);
> +           return 1;
> +   }
> +
> +   vmx->nested.vmxon = 1;

= true


> +
> +   skip_emulated_instruction(vcpu);
> +   return 1;
> +}

Need to block INIT signals in the local apic as well (fine for a
separate patch).


--
error compiling committee.c: too many arguments to function



[PATCH 3/24] Implement VMXON and VMXOFF

2010-06-13 Thread Nadav Har'El
This patch allows a guest to use the VMXON and VMXOFF instructions, and
emulates them accordingly. Basically this amounts to checking some
prerequisites, and then remembering whether the guest has enabled or disabled
VMX operation.

Signed-off-by: Nadav Har'El n...@il.ibm.com
---
--- .before/arch/x86/kvm/vmx.c  2010-06-13 15:01:28.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2010-06-13 15:01:28.0 +0300
@@ -117,6 +117,16 @@ struct shared_msr_entry {
u64 mask;
 };
 
+/* The nested_vmx structure is part of vcpu_vmx, and holds information we need
+ * for correct emulation of VMX (i.e., nested VMX) on this vcpu. For example,
+ * the current VMCS set by L1, a list of the VMCSs used to run the active
+ * L2 guests on the hardware, and more.
+ */
+struct nested_vmx {
+   /* Has the level1 guest done vmxon? */
+   bool vmxon;
+};
+
 struct vcpu_vmx {
struct kvm_vcpu   vcpu;
struct list_head  local_vcpus_link;
@@ -168,6 +178,9 @@ struct vcpu_vmx {
u32 exit_reason;
 
bool rdtscp_enabled;
+
+   /* Support for guest hypervisors (nested VMX) */
+   struct nested_vmx nested;
 };
 
 static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -3353,6 +3366,93 @@ static int handle_vmx_insn(struct kvm_vc
return 1;
 }
 
+/* Emulate the VMXON instruction.
+ * Currently, we just remember that VMX is active, and do not save or even
+ * inspect the argument to VMXON (the so-called VMXON pointer) because we
+ * do not currently need to store anything in that guest-allocated memory
+ * region. Consequently, VMCLEAR and VMPTRLD also do not verify that their
+ * argument is different from the VMXON pointer (which the spec says they do).
+ */
+static int handle_vmon(struct kvm_vcpu *vcpu)
+{
+   struct kvm_segment cs;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   /* The Intel VMX Instruction Reference lists a bunch of bits that
+* are prerequisite to running VMXON, most notably CR4.VMXE must be
+* set to 1. Otherwise, we should fail with #UD. We test these now:
+*/
+   if (!nested) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+
+   if (!(vcpu->arch.cr4 & X86_CR4_VMXE) ||
+   !(vcpu->arch.cr0 & X86_CR0_PE) ||
+   (vmx_get_rflags(vcpu) & X86_EFLAGS_VM)) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+
+   vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+   if (is_long_mode(vcpu) && !cs.l) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 1;
+   }
+
+   if (vmx_get_cpl(vcpu)) {
+   kvm_inject_gp(vcpu, 0);
+   return 1;
+   }
+
+   vmx->nested.vmxon = 1;
+
+   skip_emulated_instruction(vcpu);
+   return 1;
+}
+
+/*
+ * Intel's VMX Instruction Reference specifies a common set of prerequisites
+ * for running VMX instructions (except VMXON, whose prerequisites are
+ * slightly different). It also specifies what exception to inject otherwise.
+ */
+static int nested_vmx_check_permission(struct kvm_vcpu *vcpu)
+{
+   struct kvm_segment cs;
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   if (!vmx->nested.vmxon) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 0;
+   }
+
+   vmx_get_segment(vcpu, &cs, VCPU_SREG_CS);
+   if ((vmx_get_rflags(vcpu) & X86_EFLAGS_VM) ||
+   (is_long_mode(vcpu) && !cs.l)) {
+   kvm_queue_exception(vcpu, UD_VECTOR);
+   return 0;
+   }
+
+   if (vmx_get_cpl(vcpu)) {
+   kvm_inject_gp(vcpu, 0);
+   return 0;
+   }
+
+   return 1;
+}
+
+/* Emulate the VMXOFF instruction */
+static int handle_vmoff(struct kvm_vcpu *vcpu)
+{
+   if (!nested_vmx_check_permission(vcpu))
+   return 1;
+
+   to_vmx(vcpu)->nested.vmxon = 0;
+
+   skip_emulated_instruction(vcpu);
+   return 1;
+}
+
 static int handle_invlpg(struct kvm_vcpu *vcpu)
 {
unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
@@ -3642,8 +3742,8 @@ static int (*kvm_vmx_exit_handlers[])(st
[EXIT_REASON_VMREAD]  = handle_vmx_insn,
[EXIT_REASON_VMRESUME]= handle_vmx_insn,
[EXIT_REASON_VMWRITE] = handle_vmx_insn,
-   [EXIT_REASON_VMOFF]   = handle_vmx_insn,
-   [EXIT_REASON_VMON]= handle_vmx_insn,
+   [EXIT_REASON_VMOFF]   = handle_vmoff,
+   [EXIT_REASON_VMON]= handle_vmon,
[EXIT_REASON_TPR_BELOW_THRESHOLD] = handle_tpr_below_threshold,
[EXIT_REASON_APIC_ACCESS] = handle_apic_access,
[EXIT_REASON_WBINVD]  = handle_wbinvd,