Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint

2021-01-21 Thread Maxim Levitsky
On Thu, 2021-01-14 at 16:14 -0800, Sean Christopherson wrote:
> On Thu, Jan 14, 2021, Maxim Levitsky wrote:
> > This is very helpful for debugging nested VMX issues.
> > 
> > Signed-off-by: Maxim Levitsky 
> > ---
> >  arch/x86/kvm/trace.h  | 30 ++
> >  arch/x86/kvm/vmx/nested.c |  6 ++
> >  arch/x86/kvm/x86.c|  1 +
> >  3 files changed, 37 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> > index 2de30c20bc264..663d1b1d8bf64 100644
> > --- a/arch/x86/kvm/trace.h
> > +++ b/arch/x86/kvm/trace.h
> > @@ -554,6 +554,36 @@ TRACE_EVENT(kvm_nested_vmrun,
> > __entry->npt ? "on" : "off")
> >  );
> >  
> > +
> > +/*
> > + * Tracepoint for nested VMLAUNCH/VMRESUME
> 
> VM-Enter, as below.

Will do

> 
> > + */
> > +TRACE_EVENT(kvm_nested_vmlaunch_resume,
> 
> s/vmlaunc_resume/vmenter, both for consistency with other code and so that it
> can sanely be reused by SVM.  IMO, trace_kvm_entry is wrong :-).
SVM already has trace_kvm_nested_vmrun and it contains some SVM specific
stuff that doesn't exist on VMX and vise versa.
So I do want to keep these trace points separate.


> 
> > +   TP_PROTO(__u64 rip, __u64 vmcs, __u64 nested_rip,
> > +__u32 entry_intr_info),
> > +   TP_ARGS(rip, vmcs, nested_rip, entry_intr_info),
> > +
> > +   TP_STRUCT__entry(
> > +   __field(__u64,  rip )
> > +   __field(__u64,  vmcs)
> > +   __field(__u64,  nested_rip  )
> > +   __field(__u32,  entry_intr_info )
> > +   ),
> > +
> > +   TP_fast_assign(
> > +   __entry->rip= rip;
> > +   __entry->vmcs   = vmcs;
> > +   __entry->nested_rip = nested_rip;
> > +   __entry->entry_intr_info= entry_intr_info;
> > +   ),
> > +
> > +   TP_printk("rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx "
> > + "entry_intr_info: 0x%08x",
> > +   __entry->rip, __entry->vmcs, __entry->nested_rip,
> > +   __entry->entry_intr_info)
> > +);
> > +
> > +
> >  TRACE_EVENT(kvm_nested_intercepts,
> > TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
> >  __u32 intercept1, __u32 intercept2, __u32 intercept3),
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 776688f9d1017..cd51b66480d52 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -3327,6 +3327,12 @@ enum nvmx_vmentry_status 
> > nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
> > !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
> > vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
> >  
> > +   trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> 
> Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
> as is the case for the "true" nested VM-Enter path.

> 
> > +vmx->nested.current_vmptr,
> > +vmcs12->guest_rip,
> > +vmcs12->vm_entry_intr_info_field);
> 
> The placement is a bit funky.  I assume you put it here so that calls from
> vmx_set_nested_state() also get traced.  But, that also means
> vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
> some nested VM-Enters that VM-Fail will get traced, but others will not.
> 
> Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
> especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> migration case.
> 
> Not sure what would be a good answer.
> 
> > +
> > +
> > /*
> >  * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
> >  * nested early checks are disabled.  In the event of a "late" VM-Fail,
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a480804ae27a3..7c6e94e32100e 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -11562,6 +11562,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
> > +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmlaunch_resume);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
> >  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
> > -- 
> > 2.26.2
> > 




Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint

2021-01-21 Thread Maxim Levitsky
On Fri, 2021-01-15 at 08:30 -0800, Sean Christopherson wrote:
> On Fri, Jan 15, 2021, Paolo Bonzini wrote:
> > On 15/01/21 01:14, Sean Christopherson wrote:
> > > > +   trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> > > Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, 
> > > not L1
> > > as is the case for the "true" nested VM-Enter path.
> > 
> > It will be the previous RIP---might as well be 0xfff0 depending on what
> > userspace does.  I don't think you can do much better than that, using
> > vmcs12->host_rip would be confusing in the SMM case.
> > 
> > > > +vmx->nested.current_vmptr,
> > > > +vmcs12->guest_rip,
> > > > +
> > > > vmcs12->vm_entry_intr_info_field);
> > > The placement is a bit funky.  I assume you put it here so that calls from
> > > vmx_set_nested_state() also get traced.  But, that also means
> > > vmx_pre_leave_smm() will get traced, and it also creates some weirdness 
> > > where
> > > some nested VM-Enters that VM-Fail will get traced, but others will not.
> > > 
> > > Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be 
> > > confusing,
> > > especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> > > migration case.
> > 
> > Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes
> > sense so I'm not worried about that.
> 
> Ideally there would something in the tracepoint to differentiate the various
> cases.  Not that the RSM/migration cases will pop up often, but I think it's 
> an
> easily solved problem that could avoid confusion.
> 
> What if we captured vmx->nested.smm.guest_mode and from_vmentry, and 
> explicitly
> record what triggered the entry?
> 
>   TP_printk("from: %s rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx 
> intr_info: 0x%08x",
> __entry->vmenter ? "VM-Enter" : __entry->smm ? "RSM" : 
> "SET_STATE",
> __entry->rip, __entry->vmcs, __entry->nested_rip,
> __entry->entry_intr_info

I think that this is a good idea, but should be done in a separate patch.

> 
> Side topic, can we have an "official" ruling on whether KVM tracepoints should
> use colons and/or commas? And probably same question for whether or not to
> prepend zeros.  E.g. kvm_entry has "vcpu %u, rip 0x%lx" versus "rip: 0x%016llx
> vmcs: 0x%016llx".  It bugs me that we're so inconsistent.
> 

As I said the kvm tracing has a lot of things that can be imporoved, 
and as it is often the only way to figure out complex bugs as these I had to 
deal with recently,
I will do more improvements in this area as time permits.

Best regards,
Maxim Levitsky





Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint

2021-01-21 Thread Maxim Levitsky
On Fri, 2021-01-15 at 14:48 +0100, Paolo Bonzini wrote:
> On 15/01/21 01:14, Sean Christopherson wrote:
> > > + trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> > Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not 
> > L1
> > as is the case for the "true" nested VM-Enter path.

Actually in this case, the initial RIP of 0xfff0 will be printed
which isn't that bad.

A tracepoint in nested state load function would be very nice to add
to mark this explicitly. I'll do this later.

> 
> It will be the previous RIP---might as well be 0xfff0 depending on 
> what userspace does.  I don't think you can do much better than that, 
> using vmcs12->host_rip would be confusing in the SMM case.
> 
> > > +  vmx->nested.current_vmptr,
> > > +  vmcs12->guest_rip,
> > > +  vmcs12->vm_entry_intr_info_field);
> > The placement is a bit funky.  I assume you put it here so that calls from
> > vmx_set_nested_state() also get traced.  But, that also means
> > vmx_pre_leave_smm() will get traced, and it also creates some weirdness 
> > where
> > some nested VM-Enters that VM-Fail will get traced, but others will not.
> > 
> > Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be 
> > confusing,
> > especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> > migration case.
> 
> Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes 
> sense so I'm not worried about that.
> 
> Paolo
> 

I agree with that and indeed this was my intention.

In fact I will change the svm's tracepoint to behave the same way
in the next patch series (I'll move it to enter_svm_guest_mode).

(When I wrote this patch I somehow thought that this is what SVM already does).

Best regards,
Maxim Levitsky





Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint

2021-01-15 Thread Sean Christopherson
On Fri, Jan 15, 2021, Paolo Bonzini wrote:
> On 15/01/21 01:14, Sean Christopherson wrote:
> > > + trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
> > Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not 
> > L1
> > as is the case for the "true" nested VM-Enter path.
> 
> It will be the previous RIP---might as well be 0xfff0 depending on what
> userspace does.  I don't think you can do much better than that, using
> vmcs12->host_rip would be confusing in the SMM case.
> 
> > > +  vmx->nested.current_vmptr,
> > > +  vmcs12->guest_rip,
> > > +  vmcs12->vm_entry_intr_info_field);
> > The placement is a bit funky.  I assume you put it here so that calls from
> > vmx_set_nested_state() also get traced.  But, that also means
> > vmx_pre_leave_smm() will get traced, and it also creates some weirdness 
> > where
> > some nested VM-Enters that VM-Fail will get traced, but others will not.
> > 
> > Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be 
> > confusing,
> > especially if the debugger looks up the RIP and sees RSM.  Ditto for the
> > migration case.
> 
> Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes
> sense so I'm not worried about that.

Ideally there would something in the tracepoint to differentiate the various
cases.  Not that the RSM/migration cases will pop up often, but I think it's an
easily solved problem that could avoid confusion.

What if we captured vmx->nested.smm.guest_mode and from_vmentry, and explicitly
record what triggered the entry?

TP_printk("from: %s rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx 
intr_info: 0x%08x",
  __entry->vmenter ? "VM-Enter" : __entry->smm ? "RSM" : 
"SET_STATE",
  __entry->rip, __entry->vmcs, __entry->nested_rip,
  __entry->entry_intr_info

Side topic, can we have an "official" ruling on whether KVM tracepoints should
use colons and/or commas? And probably same question for whether or not to
prepend zeros.  E.g. kvm_entry has "vcpu %u, rip 0x%lx" versus "rip: 0x%016llx
vmcs: 0x%016llx".  It bugs me that we're so inconsistent.


Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint

2021-01-15 Thread Paolo Bonzini

On 15/01/21 01:14, Sean Christopherson wrote:

+   trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),

Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
as is the case for the "true" nested VM-Enter path.


It will be the previous RIP---might as well be 0xfff0 depending on 
what userspace does.  I don't think you can do much better than that, 
using vmcs12->host_rip would be confusing in the SMM case.



+vmx->nested.current_vmptr,
+vmcs12->guest_rip,
+vmcs12->vm_entry_intr_info_field);

The placement is a bit funky.  I assume you put it here so that calls from
vmx_set_nested_state() also get traced.  But, that also means
vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
some nested VM-Enters that VM-Fail will get traced, but others will not.

Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
especially if the debugger looks up the RIP and sees RSM.  Ditto for the
migration case.


Actually tracing vmx_pre_leave_smm() is good, and pointing to RSM makes 
sense so I'm not worried about that.


Paolo



Re: [PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint

2021-01-14 Thread Sean Christopherson
On Thu, Jan 14, 2021, Maxim Levitsky wrote:
> This is very helpful for debugging nested VMX issues.
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  arch/x86/kvm/trace.h  | 30 ++
>  arch/x86/kvm/vmx/nested.c |  6 ++
>  arch/x86/kvm/x86.c|  1 +
>  3 files changed, 37 insertions(+)
> 
> diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
> index 2de30c20bc264..663d1b1d8bf64 100644
> --- a/arch/x86/kvm/trace.h
> +++ b/arch/x86/kvm/trace.h
> @@ -554,6 +554,36 @@ TRACE_EVENT(kvm_nested_vmrun,
>   __entry->npt ? "on" : "off")
>  );
>  
> +
> +/*
> + * Tracepoint for nested VMLAUNCH/VMRESUME

VM-Enter, as below.

> + */
> +TRACE_EVENT(kvm_nested_vmlaunch_resume,

s/vmlaunc_resume/vmenter, both for consistency with other code and so that it
can sanely be reused by SVM.  IMO, trace_kvm_entry is wrong :-).

> + TP_PROTO(__u64 rip, __u64 vmcs, __u64 nested_rip,
> +  __u32 entry_intr_info),
> + TP_ARGS(rip, vmcs, nested_rip, entry_intr_info),
> +
> + TP_STRUCT__entry(
> + __field(__u64,  rip )
> + __field(__u64,  vmcs)
> + __field(__u64,  nested_rip  )
> + __field(__u32,  entry_intr_info )
> + ),
> +
> + TP_fast_assign(
> + __entry->rip= rip;
> + __entry->vmcs   = vmcs;
> + __entry->nested_rip = nested_rip;
> + __entry->entry_intr_info= entry_intr_info;
> + ),
> +
> + TP_printk("rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx "
> +   "entry_intr_info: 0x%08x",
> + __entry->rip, __entry->vmcs, __entry->nested_rip,
> + __entry->entry_intr_info)
> +);
> +
> +
>  TRACE_EVENT(kvm_nested_intercepts,
>   TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
>__u32 intercept1, __u32 intercept2, __u32 intercept3),
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 776688f9d1017..cd51b66480d52 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3327,6 +3327,12 @@ enum nvmx_vmentry_status 
> nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
>   !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
>   vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
>  
> + trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),

Hmm, won't this RIP be wrong for the migration case?  I.e. it'll be L2, not L1
as is the case for the "true" nested VM-Enter path.

> +  vmx->nested.current_vmptr,
> +  vmcs12->guest_rip,
> +  vmcs12->vm_entry_intr_info_field);

The placement is a bit funky.  I assume you put it here so that calls from
vmx_set_nested_state() also get traced.  But, that also means
vmx_pre_leave_smm() will get traced, and it also creates some weirdness where
some nested VM-Enters that VM-Fail will get traced, but others will not.

Tracing vmx_pre_leave_smm() isn't necessarily bad, but it could be confusing,
especially if the debugger looks up the RIP and sees RSM.  Ditto for the
migration case.

Not sure what would be a good answer.

> +
> +
>   /*
>* Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
>* nested early checks are disabled.  In the event of a "late" VM-Fail,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a480804ae27a3..7c6e94e32100e 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11562,6 +11562,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmlaunch_resume);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
>  EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
> -- 
> 2.26.2
> 


[PATCH v2 2/3] KVM: nVMX: add kvm_nested_vmlaunch_resume tracepoint

2021-01-14 Thread Maxim Levitsky
This is very helpful for debugging nested VMX issues.

Signed-off-by: Maxim Levitsky 
---
 arch/x86/kvm/trace.h  | 30 ++
 arch/x86/kvm/vmx/nested.c |  6 ++
 arch/x86/kvm/x86.c|  1 +
 3 files changed, 37 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 2de30c20bc264..663d1b1d8bf64 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -554,6 +554,36 @@ TRACE_EVENT(kvm_nested_vmrun,
__entry->npt ? "on" : "off")
 );
 
+
+/*
+ * Tracepoint for nested VMLAUNCH/VMRESUME
+ */
+TRACE_EVENT(kvm_nested_vmlaunch_resume,
+   TP_PROTO(__u64 rip, __u64 vmcs, __u64 nested_rip,
+__u32 entry_intr_info),
+   TP_ARGS(rip, vmcs, nested_rip, entry_intr_info),
+
+   TP_STRUCT__entry(
+   __field(__u64,  rip )
+   __field(__u64,  vmcs)
+   __field(__u64,  nested_rip  )
+   __field(__u32,  entry_intr_info )
+   ),
+
+   TP_fast_assign(
+   __entry->rip= rip;
+   __entry->vmcs   = vmcs;
+   __entry->nested_rip = nested_rip;
+   __entry->entry_intr_info= entry_intr_info;
+   ),
+
+   TP_printk("rip: 0x%016llx vmcs: 0x%016llx nrip: 0x%016llx "
+ "entry_intr_info: 0x%08x",
+   __entry->rip, __entry->vmcs, __entry->nested_rip,
+   __entry->entry_intr_info)
+);
+
+
 TRACE_EVENT(kvm_nested_intercepts,
TP_PROTO(__u16 cr_read, __u16 cr_write, __u32 exceptions,
 __u32 intercept1, __u32 intercept2, __u32 intercept3),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 776688f9d1017..cd51b66480d52 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3327,6 +3327,12 @@ enum nvmx_vmentry_status 
nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);
 
+   trace_kvm_nested_vmlaunch_resume(kvm_rip_read(vcpu),
+vmx->nested.current_vmptr,
+vmcs12->guest_rip,
+vmcs12->vm_entry_intr_info_field);
+
+
/*
 * Overwrite vmcs01.GUEST_CR3 with L1's CR3 if EPT is disabled *and*
 * nested early checks are disabled.  In the event of a "late" VM-Fail,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a480804ae27a3..7c6e94e32100e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11562,6 +11562,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
+EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmlaunch_resume);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmrun);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
-- 
2.26.2