RE: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-27 Thread Wang, Wei W
On Thursday, September 20, 2018 8:32 PM, Peter Zijlstra wrote:
> On Thu, Sep 20, 2018 at 06:05:58PM +0800, Wei Wang wrote:
> How do you deal with that situation, where the LBR is in use by a host event?

I guess you were referring to this use case: "perf record -b 
qemu-system-x86_64..", where the qemu process also needs lbr.

I also discussed with Andi about this in v1, where we switch the lbr state on 
VMX transitions. That could achieve the above, but adds too much overhead to 
VMX transitions, which are frequent. Considering the above usage is not common 
(it's more common to just have the task in the guest use the lbr feature), we 
decided to not take that usage into account, and switch lbr state only on the 
vCPU switching.

Best,
Wei


RE: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-27 Thread Wang, Wei W
On Thursday, September 20, 2018 8:32 PM, Peter Zijlstra wrote:
> On Thu, Sep 20, 2018 at 06:05:58PM +0800, Wei Wang wrote:
> How do you deal with that situation, where the LBR is in use by a host event?

I guess you were referring to this use case: "perf record -b 
qemu-system-x86_64..", where the qemu process also needs lbr.

I also discussed with Andi about this in v1, where we switch the lbr state on 
VMX transitions. That could achieve the above, but adds too much overhead to 
VMX transitions, which are frequent. Considering the above usage is not common 
(it's more common to just have the task in the guest use the lbr feature), we 
decided to not take that usage into account, and switch lbr state only on the 
vCPU switching.

Best,
Wei


RE: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-27 Thread Wang, Wei W
On Friday, September 21, 2018 12:24 AM, Peter Zijlstra wrote:
> On Thu, Sep 20, 2018 at 08:30:35AM -0700, Andi Kleen wrote:
> > > +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu) {
> > > + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > > + struct perf_event *event;
> > > + struct perf_event_attr attr = {
> > > + .type = PERF_TYPE_RAW,
> > > + .size = sizeof(attr),
> > > + .pinned = true,
> > > + .exclude_host = true,
> > > + .sample_type = PERF_SAMPLE_BRANCH_STACK,
> > > + .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK
> |
> > > +   PERF_SAMPLE_BRANCH_USER |
> > > +   PERF_SAMPLE_BRANCH_KERNEL,
> >
> > I think that will allocate an extra perfmon counter, right?
> 
> I throught the same too, but I think the exclude_host/guest, whichever is the
> right one makes that work for regular counters too.

Sorry for being late.

I'm not sure if exclude_host/guest would be suitable, for example, if the guest 
wants to use a perf counter, host will create a perf event with 
"exclude_host=true" to have the counter not count in host. And 
"exclude_guest=true" is a flag to the perf core that the counter should not 
count when the guest runs.

What would you think if we add a new flag (e.g. .force_no_counters) to the perf 
core to indicate not allocating a perf counter?

> That code is a wee bit magical and I didn't take the time to reverse engineer
> that. It most certainly needs a comment.

No problem. I will add more comments in the next version.

Best,
Wei



RE: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-27 Thread Wang, Wei W
On Friday, September 21, 2018 12:24 AM, Peter Zijlstra wrote:
> On Thu, Sep 20, 2018 at 08:30:35AM -0700, Andi Kleen wrote:
> > > +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu) {
> > > + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > > + struct perf_event *event;
> > > + struct perf_event_attr attr = {
> > > + .type = PERF_TYPE_RAW,
> > > + .size = sizeof(attr),
> > > + .pinned = true,
> > > + .exclude_host = true,
> > > + .sample_type = PERF_SAMPLE_BRANCH_STACK,
> > > + .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK
> |
> > > +   PERF_SAMPLE_BRANCH_USER |
> > > +   PERF_SAMPLE_BRANCH_KERNEL,
> >
> > I think that will allocate an extra perfmon counter, right?
> 
> I throught the same too, but I think the exclude_host/guest, whichever is the
> right one makes that work for regular counters too.

Sorry for being late.

I'm not sure if exclude_host/guest would be suitable, for example, if the guest 
wants to use a perf counter, host will create a perf event with 
"exclude_host=true" to have the counter not count in host. And 
"exclude_guest=true" is a flag to the perf core that the counter should not 
count when the guest runs.

What would you think if we add a new flag (e.g. .force_no_counters) to the perf 
core to indicate not allocating a perf counter?

> That code is a wee bit magical and I didn't take the time to reverse engineer
> that. It most certainly needs a comment.

No problem. I will add more comments in the next version.

Best,
Wei



RE: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-27 Thread Wang, Wei W
On Thursday, September 20, 2018 11:31 PM, Andi Kleen wrote:
> > +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu) {
> > +   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +   struct perf_event *event;
> > +   struct perf_event_attr attr = {
> > +   .type = PERF_TYPE_RAW,
> > +   .size = sizeof(attr),
> > +   .pinned = true,
> > +   .exclude_host = true,
> > +   .sample_type = PERF_SAMPLE_BRANCH_STACK,
> > +   .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK
> |
> > + PERF_SAMPLE_BRANCH_USER |
> > + PERF_SAMPLE_BRANCH_KERNEL,


Sorry for my late response (I was digging into some of the details).


> I think that will allocate an extra perfmon counter, right?
> 
> So if the guest wants to use all perfmon counters we would start to multiplex,
> which would be bad

Right. The host side counter allocation is not necessary.


> Would need to fix perf core to not allocate a perfmon counter in this case, if
> no period and no event count is requested.
> 

If no period (i.e. .sample_period = 0), the current code still uses one counter 
with the period set to the max period.
If we change that to do no counter allocation, would that break any of the 
existing usages (I didn't find the reason why the existing code not avoid that 
allocation for a non-sampling event)?

Best,
Wei


RE: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-27 Thread Wang, Wei W
On Thursday, September 20, 2018 11:31 PM, Andi Kleen wrote:
> > +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu) {
> > +   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +   struct perf_event *event;
> > +   struct perf_event_attr attr = {
> > +   .type = PERF_TYPE_RAW,
> > +   .size = sizeof(attr),
> > +   .pinned = true,
> > +   .exclude_host = true,
> > +   .sample_type = PERF_SAMPLE_BRANCH_STACK,
> > +   .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK
> |
> > + PERF_SAMPLE_BRANCH_USER |
> > + PERF_SAMPLE_BRANCH_KERNEL,


Sorry for my late response (I was digging into some of the details).


> I think that will allocate an extra perfmon counter, right?
> 
> So if the guest wants to use all perfmon counters we would start to multiplex,
> which would be bad

Right. The host side counter allocation is not necessary.


> Would need to fix perf core to not allocate a perfmon counter in this case, if
> no period and no event count is requested.
> 

If no period (i.e. .sample_period = 0), the current code still uses one counter 
with the period set to the max period.
If we change that to do no counter allocation, would that break any of the 
existing usages (I didn't find the reason why the existing code not avoid that 
allocation for a non-sampling event)?

Best,
Wei


Re: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-20 Thread Peter Zijlstra
On Thu, Sep 20, 2018 at 08:30:35AM -0700, Andi Kleen wrote:
> > +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> > +{
> > +   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +   struct perf_event *event;
> > +   struct perf_event_attr attr = {
> > +   .type = PERF_TYPE_RAW,
> > +   .size = sizeof(attr),
> > +   .pinned = true,
> > +   .exclude_host = true,
> > +   .sample_type = PERF_SAMPLE_BRANCH_STACK,
> > +   .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> > + PERF_SAMPLE_BRANCH_USER |
> > + PERF_SAMPLE_BRANCH_KERNEL,
> 
> I think that will allocate an extra perfmon counter, right? 

I throught the same too, but I think the exclude_host/guest, whichever
is the right one makes that work for regular counters too.

That code is a wee bit magical and I didn't take the time to reverse
engineer that. It most certainly needs a comment.


Re: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-20 Thread Peter Zijlstra
On Thu, Sep 20, 2018 at 08:30:35AM -0700, Andi Kleen wrote:
> > +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> > +{
> > +   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> > +   struct perf_event *event;
> > +   struct perf_event_attr attr = {
> > +   .type = PERF_TYPE_RAW,
> > +   .size = sizeof(attr),
> > +   .pinned = true,
> > +   .exclude_host = true,
> > +   .sample_type = PERF_SAMPLE_BRANCH_STACK,
> > +   .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> > + PERF_SAMPLE_BRANCH_USER |
> > + PERF_SAMPLE_BRANCH_KERNEL,
> 
> I think that will allocate an extra perfmon counter, right? 

I throught the same too, but I think the exclude_host/guest, whichever
is the right one makes that work for regular counters too.

That code is a wee bit magical and I didn't take the time to reverse
engineer that. It most certainly needs a comment.


Re: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-20 Thread Andi Kleen
> +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> + struct perf_event *event;
> + struct perf_event_attr attr = {
> + .type = PERF_TYPE_RAW,
> + .size = sizeof(attr),
> + .pinned = true,
> + .exclude_host = true,
> + .sample_type = PERF_SAMPLE_BRANCH_STACK,
> + .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> +   PERF_SAMPLE_BRANCH_USER |
> +   PERF_SAMPLE_BRANCH_KERNEL,

I think that will allocate an extra perfmon counter, right? 

So if the guest wants to use all perfmon counters we would start to 
multiplex, which would be bad

Would need to fix perf core to not allocate a perfmon counter in
this case, if no period and no event count is requested.

-Andi


Re: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-20 Thread Andi Kleen
> +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> + struct perf_event *event;
> + struct perf_event_attr attr = {
> + .type = PERF_TYPE_RAW,
> + .size = sizeof(attr),
> + .pinned = true,
> + .exclude_host = true,
> + .sample_type = PERF_SAMPLE_BRANCH_STACK,
> + .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> +   PERF_SAMPLE_BRANCH_USER |
> +   PERF_SAMPLE_BRANCH_KERNEL,

I think that will allocate an extra perfmon counter, right? 

So if the guest wants to use all perfmon counters we would start to 
multiplex, which would be bad

Would need to fix perf core to not allocate a perfmon counter in
this case, if no period and no event count is requested.

-Andi


Re: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-20 Thread Peter Zijlstra
On Thu, Sep 20, 2018 at 06:05:58PM +0800, Wei Wang wrote:

> diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
> index 5ab4a36..97a29d7 100644
> --- a/arch/x86/kvm/pmu_intel.c
> +++ b/arch/x86/kvm/pmu_intel.c
> @@ -342,6 +342,47 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>   pmu->global_ovf_ctrl = 0;
>  }
>  
> +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> + struct perf_event *event;
> + struct perf_event_attr attr = {
> + .type = PERF_TYPE_RAW,
> + .size = sizeof(attr),

Bit yuck to imply .config = 0, like that.

> + .pinned = true,

Note that this can still result in failing to schedule the event, you
create a pinned task event, but a pinned cpu event takes precedence and
can claim the LBR.

How do you deal with that situation, where the LBR is in use by a host
event?

> + .exclude_host = true,

Didn't you mean to use .exclude_guest ? You don't want this thing to run
when the guest is running, right?

But I still don't like this hack much, what happens if userspace sets
that bit? You seems to have forgotten to combine it with PF_VCPU or
whatever is the appropriate bit to check.

Similarly, how does the existing exclude_{gues,host} crud work?

> + .sample_type = PERF_SAMPLE_BRANCH_STACK,

What's the point of setting .sample_type if !.sample_period ?

> + .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> +   PERF_SAMPLE_BRANCH_USER |
> +   PERF_SAMPLE_BRANCH_KERNEL,
> + };

I think this function wants a comment to explain wth its doing and why,
because if someone stumbles over this code in a few months nobody will
remember those things.

> +
> + if (pmu->guest_lbr_event)
> + return 0;
> +
> + event = perf_event_create_kernel_counter(, -1, current, NULL,
> +  NULL);
> + if (IS_ERR(event)) {
> + pr_err("%s: failed %ld\n", __func__, PTR_ERR(event));
> + return -ENOENT;
> + }
> + pmu->guest_lbr_event = event;
> +
> + return 0;
> +}


Re: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-20 Thread Peter Zijlstra
On Thu, Sep 20, 2018 at 06:05:58PM +0800, Wei Wang wrote:

> diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
> index 5ab4a36..97a29d7 100644
> --- a/arch/x86/kvm/pmu_intel.c
> +++ b/arch/x86/kvm/pmu_intel.c
> @@ -342,6 +342,47 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>   pmu->global_ovf_ctrl = 0;
>  }
>  
> +int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
> + struct perf_event *event;
> + struct perf_event_attr attr = {
> + .type = PERF_TYPE_RAW,
> + .size = sizeof(attr),

Bit yuck to imply .config = 0, like that.

> + .pinned = true,

Note that this can still result in failing to schedule the event, you
create a pinned task event, but a pinned cpu event takes precedence and
can claim the LBR.

How do you deal with that situation, where the LBR is in use by a host
event?

> + .exclude_host = true,

Didn't you mean to use .exclude_guest ? You don't want this thing to run
when the guest is running, right?

But I still don't like this hack much, what happens if userspace sets
that bit? You seems to have forgotten to combine it with PF_VCPU or
whatever is the appropriate bit to check.

Similarly, how does the existing exclude_{gues,host} crud work?

> + .sample_type = PERF_SAMPLE_BRANCH_STACK,

What's the point of setting .sample_type if !.sample_period ?

> + .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
> +   PERF_SAMPLE_BRANCH_USER |
> +   PERF_SAMPLE_BRANCH_KERNEL,
> + };

I think this function wants a comment to explain wth its doing and why,
because if someone stumbles over this code in a few months nobody will
remember those things.

> +
> + if (pmu->guest_lbr_event)
> + return 0;
> +
> + event = perf_event_create_kernel_counter(, -1, current, NULL,
> +  NULL);
> + if (IS_ERR(event)) {
> + pr_err("%s: failed %ld\n", __func__, PTR_ERR(event));
> + return -ENOENT;
> + }
> + pmu->guest_lbr_event = event;
> +
> + return 0;
> +}


Re: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-20 Thread Peter Zijlstra
On Thu, Sep 20, 2018 at 06:05:58PM +0800, Wei Wang wrote:
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 1562863..a91fdef 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -223,6 +223,7 @@ struct cpu_hw_events {
>*/
>   u64 intel_ctrl_guest_mask;
>   u64 intel_ctrl_host_mask;
> + boolguest_lbr;
>   struct perf_guest_switch_msrguest_switch_msrs[X86_PMC_IDX_MAX];
>  
>   /*

Please, no bools in aggregate types.

Either use a bitfield or u8, there's more LBR related bools, convert and
group the lot to avoid holes.


Re: [PATCH v3 4/5] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack

2018-09-20 Thread Peter Zijlstra
On Thu, Sep 20, 2018 at 06:05:58PM +0800, Wei Wang wrote:
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 1562863..a91fdef 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -223,6 +223,7 @@ struct cpu_hw_events {
>*/
>   u64 intel_ctrl_guest_mask;
>   u64 intel_ctrl_host_mask;
> + boolguest_lbr;
>   struct perf_guest_switch_msrguest_switch_msrs[X86_PMC_IDX_MAX];
>  
>   /*

Please, no bools in aggregate types.

Either use a bitfield or u8, there's more LBR related bools, convert and
group the lot to avoid holes.