Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-27 Thread Paolo Bonzini
On 27/04/2018 17:19, Jim Mattson wrote:
> 
> If the default treatment of SMIs and SMM (see Section 34.14) is
> active, the VMX-preemption timer counts across an SMI to VMX non-root
> operation, subsequent execution in SMM, and the return from SMM via
> the RSM instruction. However, the timer can cause a VM exit only from
> VMX non-root operation. If the timer expires during SMI, in SMM, or
> during RSM, a timer-induced VM exit occurs immediately after RSM with
> its normal priority unless it is blocked based on activity state
> (Section 25.2).
> 
> So, there's no loophole here that allows us to reset the VMX
> preemption timer when restoring nested state.

Or when an SMI occurs.  So the expiration TSC of the preemption timer
should be stored into an "artificial" field of the vmcs12 at vmentry
time and later reused.
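
A minimal sketch of that idea, under stated assumptions: the deadline is computed once at emulated VM-entry, and the remaining timer value is recomputed from it after an SMI or a restore instead of reloading the original value. The helper names and the `rate_shift` parameter are hypothetical, not existing KVM code; `rate_shift` stands for the TSC-to-timer rate reported in bits 4:0 of IA32_VMX_MISC.

```c
#include <stdint.h>

/*
 * Hypothetical helper: absolute TSC deadline for a preemption timer
 * value loaded at VM-entry. The timer ticks once every 2^rate_shift
 * TSC cycles.
 */
static uint64_t preemption_deadline_tsc(uint64_t tsc_at_entry,
                                        uint32_t timer_value,
                                        unsigned int rate_shift)
{
    return tsc_at_entry + ((uint64_t)timer_value << rate_shift);
}

/*
 * Hypothetical helper: timer value to program when resuming L2, derived
 * from the stored deadline. Returns 0 if the deadline has already
 * passed, so the timer-induced VM exit is delivered right away.
 */
static uint32_t preemption_timer_remaining(uint64_t deadline_tsc,
                                           uint64_t tsc_now,
                                           unsigned int rate_shift)
{
    uint64_t remaining;

    if (tsc_now >= deadline_tsc)
        return 0;

    remaining = (deadline_tsc - tsc_now) >> rate_shift;
    return remaining > UINT32_MAX ? UINT32_MAX : (uint32_t)remaining;
}
```

Storing the deadline rather than the countdown means time spent in SMM or in flight during migration is still charged to the timer, which matches the SDM text quoted above.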

vmx->nested.smm.guest_mode should also be saved...

Paolo

> As a follow-on change, we should probably fix this.



Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-27 Thread Jim Mattson
On Fri, Apr 27, 2018 at 3:03 AM, Paolo Bonzini wrote:
> On 27/04/2018 00:28, Jim Mattson wrote:
>> The other thing that comes to mind is that there are some new fields
>> in the VMCS12 since I first implemented this. One potentially
>> troublesome field is the VMX preemption timer. If the current timer
>> value is not saved on VM-exit, then it won't be stashed in the shadow
>> VMCS12 by sync_vmcs12. Post-migration, the timer will be reset to its
>> original value.
>>
>> Do we care? Is this any different from what happens on real hardware
>> when there's an SMI? According to the SDM, this appears to be exactly
>> what happens when the dual-monitor treatment of SMIs and SMM is
>> active, but it's not clear what happens with the default treatment of
>> SMIs and SMM.
>
> I think it should be the same, because the preemption timer countdown is
> not part of the VMX-critical state.
>
> Paolo

Section 25.5.1 of the SDM says:

If the default treatment of SMIs and SMM (see Section 34.14) is
active, the VMX-preemption timer counts across an SMI to VMX non-root
operation, subsequent execution in SMM, and the return from SMM via
the RSM instruction. However, the timer can cause a VM exit only from
VMX non-root operation. If the timer expires during SMI, in SMM, or
during RSM, a timer-induced VM exit occurs immediately after RSM with
its normal priority unless it is blocked based on activity state
(Section 25.2).

So, there's no loophole here that allows us to reset the VMX
preemption timer when restoring nested state.

As a follow-on change, we should probably fix this.


Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-27 Thread Paolo Bonzini
On 27/04/2018 00:28, Jim Mattson wrote:
> The other thing that comes to mind is that there are some new fields
> in the VMCS12 since I first implemented this. One potentially
> troublesome field is the VMX preemption timer. If the current timer
> value is not saved on VM-exit, then it won't be stashed in the shadow
> VMCS12 by sync_vmcs12. Post-migration, the timer will be reset to its
> original value.
> 
> Do we care? Is this any different from what happens on real hardware
> when there's an SMI? According to the SDM, this appears to be exactly
> what happens when the dual-monitor treatment of SMIs and SMM is
> active, but it's not clear what happens with the default treatment of
> SMIs and SMM.

I think it should be the same, because the preemption timer countdown is
not part of the VMX-critical state.

Paolo


Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-26 Thread Jim Mattson
I'll send out a patch to deal with nested_run_pending.

The other thing that comes to mind is that there are some new fields
in the VMCS12 since I first implemented this. One potentially
troublesome field is the VMX preemption timer. If the current timer
value is not saved on VM-exit, then it won't be stashed in the shadow
VMCS12 by sync_vmcs12. Post-migration, the timer will be reset to its
original value.

Do we care? Is this any different from what happens on real hardware
when there's an SMI? According to the SDM, this appears to be exactly
what happens when the dual-monitor treatment of SMIs and SMM is
active, but it's not clear what happens with the default treatment of
SMIs and SMM.

On Mon, Apr 16, 2018 at 10:15 AM, Raslan, KarimAllah wrote:
> On Mon, 2018-04-16 at 09:22 -0700, Jim Mattson wrote:
>> On Thu, Apr 12, 2018 at 8:12 AM, KarimAllah Ahmed wrote:
>>
>> >
>> > v2 -> v3:
>> > - Remove the forced VMExit from L2 after reading the kvm_state. The actual
>> >   problem is solved.
>> > - Rebase again!
>> > - Set nested_run_pending during restore (not sure if it makes sense yet or
>> >   not).
>>
>> This doesn't actually make sense. Nested_run_pending should only be
>> set between L1 doing a VMLAUNCH/VMRESUME and the first instruction
>> executing in L2. That is extremely unlikely at a restore point.
>
> Yeah, I am afraid I put very little thought into it as I was focused
> on the TSC issue :)
>
> Will handle it properly in next version.
>
>>
>> To deal with nested_run_pending and nested save/restore,
>> nested_run_pending should be set to 1 before calling
>> enter_vmx_non_root_mode, as it was prior to commit 7af40ad37b3f. That
>> means that it has to be cleared when emulating VM-entry to the halted
>> state (prior to calling kvm_vcpu_halt). And all of the from_vmentry
>> arguments that Paolo added when rebasing commit cf8b84f48a59 should be
>> removed, so that nested_run_pending is propagated correctly during a
>> restore.
>>
>> It should be possible to eliminate this strange little wart, but I
>> haven't looked deeply into it.
>>
> Amazon Development Center Germany GmbH
> Berlin - Dresden - Aachen
> main office: Krausenstr. 38, 10117 Berlin
> Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
> Ust-ID: DE289237879
> Eingetragen am Amtsgericht Charlottenburg HRB 149173 B


Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-16 Thread Raslan, KarimAllah
On Mon, 2018-04-16 at 09:22 -0700, Jim Mattson wrote:
> On Thu, Apr 12, 2018 at 8:12 AM, KarimAllah Ahmed wrote:
> 
> > 
> > v2 -> v3:
> > - Remove the forced VMExit from L2 after reading the kvm_state. The actual
> >   problem is solved.
> > - Rebase again!
> > - Set nested_run_pending during restore (not sure if it makes sense yet or
> >   not).
> 
> This doesn't actually make sense. Nested_run_pending should only be
> set between L1 doing a VMLAUNCH/VMRESUME and the first instruction
> executing in L2. That is extremely unlikely at a restore point.

Yeah, I am afraid I put very little thought into it as I was focused
on the TSC issue :)

Will handle it properly in next version.

> 
> To deal with nested_run_pending and nested save/restore,
> nested_run_pending should be set to 1 before calling
> enter_vmx_non_root_mode, as it was prior to commit 7af40ad37b3f. That
> means that it has to be cleared when emulating VM-entry to the halted
> state (prior to calling kvm_vcpu_halt). And all of the from_vmentry
> arguments that Paolo added when rebasing commit cf8b84f48a59 should be
> removed, so that nested_run_pending is propagated correctly during a
> restore.
> 
> It should be possible to eliminate this strange little wart, but I
> haven't looked deeply into it.
> 


Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-16 Thread Jim Mattson
On Thu, Apr 12, 2018 at 8:12 AM, KarimAllah Ahmed wrote:

> v2 -> v3:
> - Remove the forced VMExit from L2 after reading the kvm_state. The actual
>   problem is solved.
> - Rebase again!
> - Set nested_run_pending during restore (not sure if it makes sense yet or
>   not).

This doesn't actually make sense. Nested_run_pending should only be
set between L1 doing a VMLAUNCH/VMRESUME and the first instruction
executing in L2. That is extremely unlikely at a restore point.

To deal with nested_run_pending and nested save/restore,
nested_run_pending should be set to 1 before calling
enter_vmx_non_root_mode, as it was prior to commit 7af40ad37b3f. That
means that it has to be cleared when emulating VM-entry to the halted
state (prior to calling kvm_vcpu_halt). And all of the from_vmentry
arguments that Paolo added when rebasing commit cf8b84f48a59 should be
removed, so that nested_run_pending is propagated correctly during a
restore.

It should be possible to eliminate this strange little wart, but I
haven't looked deeply into it.
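
The lifetime of nested_run_pending described above can be modelled with a small sketch. This is a toy model under stated assumptions, not the vmx.c code: the struct and every `toy_` function are hypothetical, and the comments note which real function each step stands in for.

```c
#include <stdbool.h>

/* Toy stand-in for the per-vCPU nested state discussed in this thread. */
struct toy_nested_state {
    bool nested_run_pending; /* L1 did VMLAUNCH/VMRESUME, L2 has not run yet */
    bool guest_mode;         /* vCPU is in L2 */
    bool halted;             /* VM-entry went to the HLT activity state */
};

/* Emulate VM-entry: set the flag before "entering non-root mode". */
static void toy_emulate_vmentry(struct toy_nested_state *s, bool entry_to_hlt)
{
    s->nested_run_pending = true;
    s->guest_mode = true;            /* stands in for enter_vmx_non_root_mode() */

    if (entry_to_hlt) {
        /* No L2 instruction will run, so clear the flag before halting. */
        s->nested_run_pending = false;
        s->halted = true;            /* stands in for kvm_vcpu_halt() */
    }
}

/* The first instruction executed in L2 ends the pending window. */
static void toy_l2_instruction_retired(struct toy_nested_state *s)
{
    s->nested_run_pending = false;
}

/* Restore propagates whatever was saved, rather than forcing the flag. */
static void toy_restore(struct toy_nested_state *dst,
                        const struct toy_nested_state *saved)
{
    *dst = *saved;
}
```

In this model the flag is only ever true in the narrow window between the emulated VM-entry and the first L2 instruction, so a restore that unconditionally sets it would usually be wrong.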


Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-14 Thread Raslan, KarimAllah
On Sat, 2018-04-14 at 15:56 +, Raslan, KarimAllah wrote:
> On Thu, 2018-04-12 at 17:12 +0200, KarimAllah Ahmed wrote:
> > 
> > From: Jim Mattson 
> > 
> > For nested virtualization L0 KVM is managing a bit of state for L2 guests,
> > this state can not be captured through the currently available IOCTLs. In
> > fact the state captured through all of these IOCTLs is usually a mix of L1
> > and L2 state. It is also dependent on whether the L2 guest was running at
> > the moment when the process was interrupted to save its state.
> > 
> > With this capability, there are two new vcpu ioctls: KVM_GET_VMX_STATE and
> > KVM_SET_VMX_STATE. These can be used for saving and restoring a VM that is
> > in VMX operation.
> > 
> > Cc: Paolo Bonzini 
> > Cc: Radim Krčmář 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: H. Peter Anvin 
> > Cc: x...@kernel.org
> > Cc: k...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Signed-off-by: Jim Mattson 
> > [karahmed@ - rename structs and functions and make them ready for AMD and
> >              address previous comments.
> >            - rebase & a bit of refactoring.
> >            - Merge 7/8 and 8/8 into one patch.
> >            - Force a VMExit from L2 after reading the kvm_state to avoid
> >              mixed state between L1 and L2 on resurrecting the instance. ]
> > Signed-off-by: KarimAllah Ahmed 
> > ---
> > v2 -> v3:
> > - Remove the forced VMExit from L2 after reading the kvm_state. The actual
> >   problem is solved.
> > - Rebase again!
> > - Set nested_run_pending during restore (not sure if it makes sense yet or
> >   not).
> > - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative is
> >   to switch everything to u64)
> > 
> > v1 -> v2:
> > - Rename structs and functions and make them ready for AMD and address
> >   previous comments.
> > - Rebase & a bit of refactoring.
> > - Merge 7/8 and 8/8 into one patch.
> > - Force a VMExit from L2 after reading the kvm_state to avoid mixed state
> >   between L1 and L2 on resurrecting the instance.
> > ---
> >  Documentation/virtual/kvm/api.txt |  47 ++
> >  arch/x86/include/asm/kvm_host.h   |   7 ++
> >  arch/x86/include/uapi/asm/kvm.h   |  38
> >  arch/x86/kvm/vmx.c                | 177 +-
> >  arch/x86/kvm/x86.c                |  21 +
> >  include/linux/kvm_host.h          |   2 +-
> >  include/uapi/linux/kvm.h          |   5 ++
> >  7 files changed, 292 insertions(+), 5 deletions(-)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 1c7958b..c51d5d3 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -3548,6 +3548,53 @@ Returns: 0 on success,
> > -ENOENT on deassign if the conn_id isn't registered
> > -EEXIST on assign if the conn_id is already registered
> >  
> > +4.114 KVM_GET_STATE
> > +
> > +Capability: KVM_CAP_STATE
> > +Architectures: x86
> > +Type: vcpu ioctl
> > +Parameters: struct kvm_state (in/out)
> > +Returns: 0 on success, -1 on error
> > +Errors:
> > +  E2BIG: the data size exceeds the value of 'size' specified by
> > + the user (the size required will be written into size).
> > +
> > +struct kvm_state {
> > +        __u16 flags;
> > +        __u16 format;
> > +        __u32 size;
> > +        union {
> > +                struct kvm_vmx_state vmx;
> > +                struct kvm_svm_state svm;
> > +                __u8 pad[120];
> > +        };
> > +        __u8 data[0];
> > +};
> > +
> > +This ioctl copies the vcpu's kvm_state struct from the kernel to userspace.
> > +
> > +4.115 KVM_SET_STATE
> > +
> > +Capability: KVM_CAP_STATE
> > +Architectures: x86
> > +Type: vcpu ioctl
> > +Parameters: struct kvm_state (in)
> > +Returns: 0 on success, -1 on error
> > +
> > +struct kvm_state {
> > +        __u16 flags;
> > +        __u16 format;
> > +        __u32 size;
> > +        union {
> > +                struct kvm_vmx_state vmx;
> > +                struct kvm_svm_state svm;
> > +                __u8 pad[120];
> > +        };
> > +        __u8 data[0];
> > +};
> > +
> > +This copies the vcpu's kvm_state struct from userspace to the kernel.
> > +>>> 13a7c9e... kvm: nVMX: Introduce KVM_CAP_STATE
> >  
> >  5. The kvm_run structure
> >  
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 9fa4f57..ad2116a 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -75,6 +75,7 @@
> >  #define KVM_REQ_HV_EXIT            KVM_ARCH_REQ(21)
> >  #define KVM_REQ_HV_STIMER          KVM_ARCH_REQ(22)
> >  #define KVM_REQ_LOAD_EOI_EXITMAP   KVM_ARCH_REQ(23)
> > +#define KVM_REQ_GET_VMCS12_PAGES   KVM_ARCH_REQ(24)
> >  
> >  #define CR0_RESERVED_BITS   \
> > 

Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-14 Thread Raslan, KarimAllah
On Thu, 2018-04-12 at 17:12 +0200, KarimAllah Ahmed wrote:
> From: Jim Mattson 
> 
> For nested virtualization L0 KVM is managing a bit of state for L2 guests,
> this state can not be captured through the currently available IOCTLs. In
> fact the state captured through all of these IOCTLs is usually a mix of L1
> and L2 state. It is also dependent on whether the L2 guest was running at
> the moment when the process was interrupted to save its state.
> 
> With this capability, there are two new vcpu ioctls: KVM_GET_VMX_STATE and
> KVM_SET_VMX_STATE. These can be used for saving and restoring a VM that is
> in VMX operation.
> 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: H. Peter Anvin 
> Cc: x...@kernel.org
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Jim Mattson 
> [karahmed@ - rename structs and functions and make them ready for AMD and
>              address previous comments.
>            - rebase & a bit of refactoring.
>            - Merge 7/8 and 8/8 into one patch.
>            - Force a VMExit from L2 after reading the kvm_state to avoid
>              mixed state between L1 and L2 on resurrecting the instance. ]
> Signed-off-by: KarimAllah Ahmed 
> ---
> v2 -> v3:
> - Remove the forced VMExit from L2 after reading the kvm_state. The actual
>   problem is solved.
> - Rebase again!
> - Set nested_run_pending during restore (not sure if it makes sense yet or
>   not).
> - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative is
>   to switch everything to u64)
> 
> v1 -> v2:
> - Rename structs and functions and make them ready for AMD and address
>   previous comments.
> - Rebase & a bit of refactoring.
> - Merge 7/8 and 8/8 into one patch.
> - Force a VMExit from L2 after reading the kvm_state to avoid mixed state
>   between L1 and L2 on resurrecting the instance.
> ---
>  Documentation/virtual/kvm/api.txt |  47 ++
>  arch/x86/include/asm/kvm_host.h   |   7 ++
>  arch/x86/include/uapi/asm/kvm.h   |  38
>  arch/x86/kvm/vmx.c                | 177 +-
>  arch/x86/kvm/x86.c                |  21 +
>  include/linux/kvm_host.h          |   2 +-
>  include/uapi/linux/kvm.h          |   5 ++
>  7 files changed, 292 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 1c7958b..c51d5d3 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -3548,6 +3548,53 @@ Returns: 0 on success,
>   -ENOENT on deassign if the conn_id isn't registered
>   -EEXIST on assign if the conn_id is already registered
>  
> +4.114 KVM_GET_STATE
> +
> +Capability: KVM_CAP_STATE
> +Architectures: x86
> +Type: vcpu ioctl
> +Parameters: struct kvm_state (in/out)
> +Returns: 0 on success, -1 on error
> +Errors:
> +  E2BIG: the data size exceeds the value of 'size' specified by
> + the user (the size required will be written into size).
> +
> +struct kvm_state {
> +        __u16 flags;
> +        __u16 format;
> +        __u32 size;
> +        union {
> +                struct kvm_vmx_state vmx;
> +                struct kvm_svm_state svm;
> +                __u8 pad[120];
> +        };
> +        __u8 data[0];
> +};
> +
> +This ioctl copies the vcpu's kvm_state struct from the kernel to userspace.
> +
> +4.115 KVM_SET_STATE
> +
> +Capability: KVM_CAP_STATE
> +Architectures: x86
> +Type: vcpu ioctl
> +Parameters: struct kvm_state (in)
> +Returns: 0 on success, -1 on error
> +
> +struct kvm_state {
> +        __u16 flags;
> +        __u16 format;
> +        __u32 size;
> +        union {
> +                struct kvm_vmx_state vmx;
> +                struct kvm_svm_state svm;
> +                __u8 pad[120];
> +        };
> +        __u8 data[0];
> +};
> +
> +This copies the vcpu's kvm_state struct from userspace to the kernel.
> +>>> 13a7c9e... kvm: nVMX: Introduce KVM_CAP_STATE
>  
>  5. The kvm_run structure
>  
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 9fa4f57..ad2116a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -75,6 +75,7 @@
>  #define KVM_REQ_HV_EXIT            KVM_ARCH_REQ(21)
>  #define KVM_REQ_HV_STIMER          KVM_ARCH_REQ(22)
>  #define KVM_REQ_LOAD_EOI_EXITMAP   KVM_ARCH_REQ(23)
> +#define KVM_REQ_GET_VMCS12_PAGES   KVM_ARCH_REQ(24)
>  
>  #define CR0_RESERVED_BITS   \
>   (~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
> @@ -1084,6 +1085,12 @@ struct kvm_x86_ops {
>  
>   void (*setup_mce)(struct kvm_vcpu *vcpu);
>  
> + int (*get_state)(struct kvm_vcpu *vcpu,
> +  struct kvm_state __user 

Re: [PATCH 2/2] kvm: nVMX: Introduce KVM_CAP_STATE

2018-04-12 Thread Paolo Bonzini
On 12/04/2018 17:12, KarimAllah Ahmed wrote:
> From: Jim Mattson 
> 
> For nested virtualization L0 KVM is managing a bit of state for L2 guests,
> this state can not be captured through the currently available IOCTLs. In
> fact the state captured through all of these IOCTLs is usually a mix of L1
> and L2 state. It is also dependent on whether the L2 guest was running at
> the moment when the process was interrupted to save its state.
> 
> With this capability, there are two new vcpu ioctls: KVM_GET_VMX_STATE and
> KVM_SET_VMX_STATE. These can be used for saving and restoring a VM that is
> in VMX operation.
> 
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: H. Peter Anvin 
> Cc: x...@kernel.org
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Jim Mattson 
> [karahmed@ - rename structs and functions and make them ready for AMD and
>              address previous comments.
>            - rebase & a bit of refactoring.
>            - Merge 7/8 and 8/8 into one patch.
>            - Force a VMExit from L2 after reading the kvm_state to avoid
>              mixed state between L1 and L2 on resurrecting the instance. ]
> Signed-off-by: KarimAllah Ahmed 
> ---
> v2 -> v3:
> - Remove the forced VMExit from L2 after reading the kvm_state. The actual
>   problem is solved.
> - Rebase again!
> - Set nested_run_pending during restore (not sure if it makes sense yet or
>   not).
> - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative is
>   to switch everything to u64)

You still have to rename everything to KVM_{CAP,GET,SET}_NESTED_STATE
(and vmx_{get,set}_nested_state) though. :)

Paolo
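
For reference, the E2BIG size negotiation documented for KVM_GET_STATE in the quoted api.txt hunk could be driven from userspace roughly as follows. This is a sketch under stated assumptions: `toy_get_state()` is a hypothetical stand-in for the real ioctl so the example is self-contained, the vmx/svm union is represented only by its 120-byte pad, and `size` is assumed to cover the header plus the trailing data.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header layout taken from the quoted api.txt text (union shown as pad). */
struct toy_kvm_state {
    uint16_t flags;
    uint16_t format;
    uint32_t size;      /* in: buffer size, out: size required or used */
    uint8_t pad[120];
    uint8_t data[];
};

/* Hypothetical stand-in for ioctl(vcpu_fd, KVM_GET_STATE, state). */
static int toy_get_state(struct toy_kvm_state *state,
                         const void *payload, uint32_t payload_len)
{
    uint32_t needed = (uint32_t)sizeof(*state) + payload_len;

    if (state->size < needed) {
        state->size = needed;   /* report the required size to the caller */
        errno = E2BIG;
        return -1;
    }
    state->size = needed;
    if (payload_len)
        memcpy(state->data, payload, payload_len);
    return 0;
}

/* Try with a header-sized buffer first, then grow once on E2BIG. */
static struct toy_kvm_state *toy_fetch_state(const void *payload,
                                             uint32_t payload_len)
{
    struct toy_kvm_state *state = calloc(1, sizeof(*state));
    struct toy_kvm_state *bigger;

    if (!state)
        return NULL;
    state->size = sizeof(*state);

    if (toy_get_state(state, payload, payload_len) == 0)
        return state;
    if (errno != E2BIG) {
        free(state);
        return NULL;
    }

    /* state->size now holds the size the "kernel" asked for. */
    bigger = realloc(state, state->size);
    if (!bigger) {
        free(state);
        return NULL;
    }
    if (toy_get_state(bigger, payload, payload_len) != 0) {
        free(bigger);
        return NULL;
    }
    return bigger;
}
```

Probing with a small buffer and retrying keeps the caller independent of how large the nested state actually is, which is the point of writing the required size back on E2BIG.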