On Mon, Nov 23, 2015 at 6:35 AM, Boris Ostrovsky <boris.ostrov...@oracle.com
> wrote:

> (+ Dietmar)
>
> On 11/20/2015 07:21 PM, Brendan Gregg wrote:
>
>> G'Day,
>>
>> The vpmu feature of Xen is incredibly useful for performance analysis,
>> however, it's currently all counters or nothing. In secure environments,
>> there can be hesitation to enable access to all PMCs (there are hundreds of
>> them). I've included a prototype patch that introduces two new restricted
>> vpmu modes:
>>
>> vpmu=ipc: As the most restricted minimum set. This enables cycles,
>> reference cycles, and instructions only. This is enough to calculate
>> instructions per cycle (IPC).
>>
>> vpm=arch: This enables the 7 pre-defined architectural events as listed
>> in cpuid, and in Table 18-1 of the Intel software developer's manual, vol
>> 3B.
>>
>> There can be a third mode added later on, with a larger set (including
>> micro-ops PMCs).
>>
>
> These new features would need a corresponding change in Linux for PV
> guests (or for dom0 to change feature set globally). But before that
> do_xenpmu_op()'s XENPMU_feature_set clause will have to be updated to deal
> with new modes.
>

Ok, thanks. For now this is HVM, and I can EINVAL in do_xenpmu_op() for the
new modes, like with BTS. (Do you know if the later Linux changes would be
more than a feature version of pmu_mode_store/pmu_mode_show?)


>
>
>> I've included the short patch below for Xen 4.6.0, which provides these
>> modes (it also fixes a minor copy-and-paste error with
>> core2_get_fixed_pmc_count(), which I believe was accessing the wrong
>> register). I am not a veteran Xen programmer, so please feel free to edit
>> or rewrite this patch. In case this email messes it up, it's also on:
>> https://github.com/brendangregg/Misc/blob/master/xen/xen-4.6.0-vpmu-filter.diff
>>
>> I've shown testing of four modes (off, on, ipc, arch) here:
>> https://gist.github.com/brendangregg/b7318c0f49bf906dc8df
>>
>> For example, here is Linux perf running in a PVHVM guest with the new
>> vpmu=ipc mode:
>>
>> root@vm0hvm:~# perf stat -d ./noploop
>>
>>  Performance counter stats for './noploop':
>>
>>        1511.326375 task-clock (msec)         #    0.999 CPUs utilized
>>                 24 context-switches          #    0.016 K/sec
>>                  0 cpu-migrations            #    0.000 K/sec
>>                113 page-faults               #    0.075 K/sec
>>      5,028,638,883 cycles                    #    3.327 GHz
>>                  0 stalled-cycles-frontend   #    0.00% frontend cycles
>> idle
>>                  0 stalled-cycles-backend    #    0.00% backend  cycles
>> idle
>>     20,043,427,933 instructions              #    3.99  insns per cycle
>>                  0 branches                  #    0.000 K/sec
>>                  0 branch-misses             #    0.00% of all branches
>>                  0 L1-dcache-loads           #    0.000 K/sec
>>                  0 L1-dcache-load-misses     #    0.00% of all L1-dcache
>> hits
>>                  0 LLC-loads                 #    0.000 K/sec
>>    <not supported> LLC-load-misses:HG
>>
>> Note that IPC is shown ("insns per cycle"), but other counters are not.
>>
>>
>> ---patch---
>> diff -ur xen-4.6.0-clean/docs/misc/xen-command-line.markdown
>> xen-4.6.0-brendan/docs/misc/xen-command-line.markdown
>> --- xen-4.6.0-clean/docs/misc/xen-command-line.markdown2015-10-05
>> 07:33:39.000000000 -0700
>> +++ xen-4.6.0-brendan/docs/misc/xen-command-line.markdown2015-11-20
>> 15:29:05.663781176 -0800
>> @@ -1444,7 +1444,7 @@
>>  flushes on VM entry and exit, increasing performance.
>>  ### vpmu
>> -> `= ( bts )`
>> +> `= ( <boolean> | bts | ipc | arch )`
>>  > Default: `off`
>> @@ -1460,6 +1460,15 @@
>>  If 'vpmu=bts' is specified the virtualisation of the Branch Trace Store
>> (BTS)
>>  feature is switched on on Intel processors supporting this feature.
>> +vpmu=ipc enables performance monitoring, but restricts the counters to
>> the
>> +most minimum set possible: instructions, cycles, and reference cycles.
>> These
>> +can be used to calculate instructions per cycle (IPC).
>> +
>> +vpmu=arch enables performance monitoring, but restricts the counters to
>> the
>> +pre-defined architectural events only. These are exposed by cpuid, and
>> listed
>> +in Table 18-1 from the Intel 64 and IA-32 Architectures Software
>> Developer's
>> +Manual, Volume 3B, System Programming Guide, Part 2.
>> +
>>  Note that if **watchdog** option is also specified vpmu will be turned
>> off.
>>  *Warning:*
>> diff -ur xen-4.6.0-clean/xen/arch/x86/cpu/vpmu.c
>> xen-4.6.0-brendan/xen/arch/x86/cpu/vpmu.c
>> --- xen-4.6.0-clean/xen/arch/x86/cpu/vpmu.c2015-10-05 07:33:39.000000000
>> -0700
>> +++ xen-4.6.0-brendan/xen/arch/x86/cpu/vpmu.c2015-11-20
>> 15:29:50.847781176 -0800
>>
>> @@ -43,9 +43,11 @@
>>  CHECK_pmu_params;
>>  /*
>> - * "vpmu" :     vpmu generally enabled
>> - * "vpmu=off" : vpmu generally disabled
>> - * "vpmu=bts" : vpmu enabled and Intel BTS feature switched on.
>> + * "vpmu" :     vpmu generally enabled (all counters)
>> + * "vpmu=off"  : vpmu generally disabled
>> + * "vpmu=bts"  : vpmu enabled and Intel BTS feature switched on.
>> + * "vpmu=ipc"  : vpmu enabled for IPC counters only (most restrictive)
>> + * "vpmu=arch" : vpmu enabled for predef arch counters only (restrictive)
>>   */
>>  static unsigned int __read_mostly opt_vpmu_enabled;
>>  unsigned int __read_mostly vpmu_mode = XENPMU_MODE_OFF;
>> @@ -67,6 +69,10 @@
>>      default:
>>          if ( !strcmp(s, "bts") )
>>              vpmu_features |= XENPMU_FEATURE_INTEL_BTS;
>> +        else if ( !strcmp(s, "ipc") )
>> +            vpmu_features |= XENPMU_FEATURE_IPC_ONLY;
>> +        else if ( !strcmp(s, "arch") )
>> +            vpmu_features |= XENPMU_FEATURE_ARCH_ONLY;
>>          else if ( *s )
>>          {
>>              printk("VPMU: unknown flag: %s - vpmu disabled!\n", s);
>> diff -ur xen-4.6.0-clean/xen/arch/x86/cpu/vpmu_intel.c
>> xen-4.6.0-brendan/xen/arch/x86/cpu/vpmu_intel.c
>> --- xen-4.6.0-clean/xen/arch/x86/cpu/vpmu_intel.c2015-10-05
>> 07:33:39.000000000 -0700
>> +++ xen-4.6.0-brendan/xen/arch/x86/cpu/vpmu_intel.c2015-11-20
>> 15:29:42.571781176 -0800
>> @@ -166,10 +166,10 @@
>>   */
>>  static int core2_get_fixed_pmc_count(void)
>>  {
>> -    u32 eax;
>> +    u32 edx;
>> -    eax = cpuid_eax(0xa);
>> -    return MASK_EXTR(eax, PMU_FIXED_NR_MASK);
>> +    edx = cpuid_edx(0xa);
>> +    return MASK_EXTR(edx, PMU_FIXED_NR_MASK);
>>  }
>>
>
> This would need to be made into a separate patch since it fixes a bug.


Ok, thanks.


>
>
>  /* edx bits 5-12: Bit width of fixed-function performance counters  */
>> @@ -652,12 +652,52 @@
>>          tmp = msr - MSR_P6_EVNTSEL(0);
>>          if ( tmp >= 0 && tmp < arch_pmc_cnt )
>>          {
>> +            int umaskevent, blocked = 0;
>>
>
> Should be uint64_t and bool_t.


Ok, thanks.


>
>
>              struct xen_pmu_cntr_pair *xen_pmu_cntr_pair =
>>                  vpmu_reg_pointer(core2_vpmu_cxt, arch_counters);
>>              if ( msr_content & ARCH_CTRL_MASK )
>>                  return -EINVAL;
>> +            /* PMC filters */
>> +            umaskevent = msr_content & MSR_IA32_CMT_EVTSEL_UE_MASK;
>>
>
>
> I don't see this mask defined anywhere. (I assume it's 0xffffffff).
>

Ah, sorry, will be there in v2 (I'm switching to git send-email, thanks
Jan). It's 0x0000ffff.


>
> Also, if either of those two flags is set we probably want to block
> MSR_IA32_DS_AREA and MSR_IA32_PEBS_ENABLE accesses as well.
>

Ok, yes, good idea.


>
>
> +            if ( vpmu_features & XENPMU_FEATURE_IPC_ONLY ||
>> +                 vpmu_features & XENPMU_FEATURE_ARCH_ONLY )
>> +            {
>> +                blocked = 1;
>> +                switch ( umaskevent )
>> +                {
>> +                /*
>> +                 * See Table 18-1 from the Intel 64 and IA-32
>> Architectures Software
>> +                 * Developer's Manual, Volume 3B, System Programming
>> Guide, Part 2.
>> +                 */
>> +                case 0x003c:/* unhalted core cycles */
>> +                case 0x013c:/* unhalted ref cycles */
>> +                case 0x00c0:/* instruction retired */
>> +                    blocked = 0;
>> +                default:
>> +                    break;
>> +                }
>> +            }
>> +
>> +            if ( vpmu_features & XENPMU_FEATURE_ARCH_ONLY )
>> +            {
>> +                /* additional counters beyond IPC only; blocked already
>> set */
>> +                switch ( umaskevent )
>> +                {
>> +                case 0x4f2e:/* LLC reference */
>> +                case 0x412e:/* LLC misses */
>> +                case 0x00c4:/* branch instruction retired */
>> +                case 0x00c5:/* branch */
>> +                    blocked = 0;
>> +                default:
>> +                    break;
>> +               }
>> +            }
>> +
>> +            if ( blocked )
>> +                return -EINVAL;
>> +
>>              if ( has_hvm_container_vcpu(v) )
>>  vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
>> &core2_vpmu_cxt->global_ctrl);
>> diff -ur xen-4.6.0-clean/xen/include/public/pmu.h
>> xen-4.6.0-brendan/xen/include/public/pmu.h
>> --- xen-4.6.0-clean/xen/include/public/pmu.h2015-10-05 07:33:39.000000000
>> -0700
>> +++ xen-4.6.0-brendan/xen/include/public/pmu.h2015-11-20
>> 15:30:08.887781176 -0800
>> @@ -84,9 +84,17 @@
>>  /*
>>   * PMU features:
>> - * - XENPMU_FEATURE_INTEL_BTS: Intel BTS support (ignored on AMD)
>> + * - XENPMU_FEATURE_INTEL_BTS:  Intel BTS support (ignored on AMD)
>> + * - XENPMU_FEATURE_IPC_ONLY:   Restrict PMC to the most minimum set
>> possible.
>> + *                              Instructions, cycles, and ref cycles.
>> Can be
>> + *                              used to calculate instructions-per-cycle
>> (IPC).
>> + * - XENPMU_FEATURE_ARCH_ONLY:  Restrict PMCs to the Intel pre-defined
>> + *                              architecteral events exposed by cpuid and
>> + *                              listed in Table 18-1 of the developer's
>> manual.
>>
>
> Needs "(ignored on AMD)"


Ok, thanks.

Brendan


>
>
> -boris
>
>
>
>   */
>> -#define XENPMU_FEATURE_INTEL_BTS  1
>> +#define XENPMU_FEATURE_INTEL_BTS  (1<<0)
>> +#define XENPMU_FEATURE_IPC_ONLY   (1<<1)
>> +#define XENPMU_FEATURE_ARCH_ONLY  (1<<2)
>>  /*
>>   * Shared PMU data between hypervisor and PV(H) domains.
>> ---patch---
>>
>>
>> Brendan
>>
>> --
>> Brendan Gregg, Senior Performance Architect, Netflix
>>
>
>


-- 
Brendan Gregg, Senior Performance Architect, Netflix
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Reply via email to