Hi,

As discussed earlier on this list, I have been working on the
next-generation libpfm: a version that will handle both
perfmon and PCL, and that will also make it much simpler
for tool writers to enable advanced features.

There will be a new event naming scheme. All features of
an event or counter will be controlled in the event name
specification.
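
To illustrate the idea, here is a minimal sketch of how a colon-separated
event string of the form used later in this mail (for instance
DISPATCHED_FPU:OPS_ADD:i=1:e=1:c=2) could be split into an event name,
unit masks and key=value attributes. This is not libpfm code: struct
event_spec, split_event() and MAX_PARTS are hypothetical names made up for
the example, and the real library may parse and validate differently.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_PARTS 16   /* hypothetical limit, chosen for the example */

/* Hypothetical container for a split event string (not a libpfm type). */
struct event_spec {
    char        buf[128];               /* private copy of the input   */
    const char *name;                   /* e.g. DISPATCHED_FPU         */
    const char *umasks[MAX_PARTS];      /* parts without '=' (OPS_ADD) */
    int         n_umasks;
    const char *attr_names[MAX_PARTS];  /* parts with '=' (i, e, c)    */
    long        attr_vals[MAX_PARTS];
    int         n_attrs;
};

/* Split "NAME:PART:key=val:..." on ':'. Returns 0 on success, -1 on error. */
static int split_event(const char *str, struct event_spec *s)
{
    char *p, *eq, *next;

    if (strlen(str) >= sizeof(s->buf))
        return -1;
    memset(s, 0, sizeof(*s));
    strcpy(s->buf, str);

    /* The first component is the event name. */
    p = s->buf;
    next = strchr(p, ':');
    if (next)
        *next++ = '\0';
    if (*p == '\0')
        return -1;
    s->name = p;

    /* Remaining components are unit masks or key=value attributes. */
    while (next) {
        p = next;
        next = strchr(p, ':');
        if (next)
            *next++ = '\0';

        eq = strchr(p, '=');
        if (eq) {
            if (s->n_attrs == MAX_PARTS)
                return -1;
            *eq = '\0';
            s->attr_names[s->n_attrs] = p;
            s->attr_vals[s->n_attrs] = strtol(eq + 1, NULL, 0);
            s->n_attrs++;
        } else {
            if (s->n_umasks == MAX_PARTS)
                return -1;
            s->umasks[s->n_umasks++] = p;
        }
    }
    return 0;
}
```

A caller would pass the string and an uninitialized struct event_spec, then
look up each unit mask and attribute name in the per-event table.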

The PCL support covers both the PCL generic HW & SW
events and the usual raw PMU events. There is a dedicated
PCL call to encode the key fields of struct perf_counter_attr:

   int pfm_get_pcl_event_encoding(const char *str, struct perf_counter_attr *hw);

You can also retrieve just the raw event encoding:

   int pfm_get_event_encoding(const char *str, uint64_t *codes, int *count, int *plm);

Here are some examples (console output) on AMD64 and Intel Core.

$ showeventinfo | head -20
PMU model: AMD64 (Family 10h RevB, Barcelona)
#-----------------------------
Name     : DISPATCHED_FPU
Desc     : Dispatched FPU Operations
Code     : 0x0
Counters : [ 0 1 2 3 ]
Attr-00 : 0x01 : [OPS_ADD] : Add pipe ops excluding load ops and SSE move ops
Attr-01 : 0x02 : [OPS_MULTIPLY] : Multiply pipe ops excluding load ops and SSE move ops
Attr-02 : 0x04 : [OPS_STORE] : Store pipe ops excluding load ops and SSE move ops
Attr-03 : 0x08 : [OPS_ADD_PIPE_LOAD_OPS] : Add pipe load ops and SSE move ops
Attr-04 : 0x10 : [OPS_MULTIPLY_PIPE_LOAD_OPS] : Multiply pipe load ops and SSE move ops
Attr-05 : 0x20 : [OPS_STORE_PIPE_LOAD_OPS] : Store pipe load ops and SSE move ops
Attr-06 : 0x3f : [ALL] : All sub-events selected
Attr-07 : 0x07 : [i] : invert (0 or 1)
Attr-08 : 0x08 : [e] : edge level (0 or 1)
Attr-09 : 0x09 : [c] : counter-mask=[0-255]
Attr-10 : 0x0a : [u] : measure at priv level 1, 2, 3 (0 or 1)
Attr-11 : 0x0b : [k] : measure at priv level 0 (0 or 1)
Attr-12 : 0x0c : [g] : measure at guest level (0 or 1)
Attr-13 : 0x0d : [h] : measure at hypervisor level (0 or 1)


Notice that the new attributes are now merged with the regular unit masks.
To enable invert + edge + counter-mask on this event for OPS_ADD, you
simply need to pass:
       DISPATCHED_FPU:OPS_ADD:i=1:e=1:c=2

This counts every cycle in which fewer than 2 FPU add ops are
dispatched. The key added value for tools is that there is no need to
pass AMD64-specific structures to enable AMD-specific features. As proof,
here is the libpfm self example running on top of PCL:

$ self DISPATCHED_FPU:OPS_ADD:i=1:e=1:c=2
[0x2d40100 event_sel=0x0 event_sel2=0x0 umask=0x1 os=0 usr=0 en=1 int=1 inv=1 edge=1 cnt_mask=2 guest=0 host=0]DISPATCHED_FPU
[type=4 val=0x2d40100 e_u=0 e_k=0 e_hv=0 plm=0x0] DISPATCHED_FPU:OPS_ADD:i=1:e=1:c=2
                   0 DISPATCHED_FPU:OPS_ADD:i=1:e=1:c=2

The output line starting with [type=4 shows the PCL encoding for this event.
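
The raw value 0x2d40100 in the output above can be reproduced by hand from
the decoded fields. The sketch below packs them using the AMD64
performance event-select register layout as I understand it from AMD's
documentation (event select in bits 0-7 and 32-35, unit mask in bits 8-15,
usr 16, os 17, edge 18, int 20, en 22, inv 23, counter mask in bits 24-31,
guest 40, host 41). struct amd64_evtsel and amd64_encode() are hypothetical
names for this example, not libpfm symbols.

```c
#include <stdint.h>

/* Hypothetical field container mirroring the decoded output of "self". */
struct amd64_evtsel {
    unsigned event;     /* 12-bit event select (event_sel2:event_sel) */
    unsigned umask;     /* 8-bit unit mask                            */
    unsigned usr, os;   /* count at user / kernel level               */
    unsigned edge;      /* edge detect                                */
    unsigned intr;      /* interrupt on overflow ("int" in the output)*/
    unsigned en;        /* counter enable                             */
    unsigned inv;       /* invert the counter-mask comparison         */
    unsigned cnt_mask;  /* 8-bit counter mask                         */
    unsigned guest, host;
};

/* Pack the fields into a 64-bit event-select value (assumed bit layout). */
static uint64_t amd64_encode(const struct amd64_evtsel *e)
{
    uint64_t v = 0;

    v |= (uint64_t)(e->event & 0xff);               /* event_sel, low 8 bits */
    v |= (uint64_t)(e->umask & 0xff) << 8;
    v |= (uint64_t)(e->usr & 1) << 16;
    v |= (uint64_t)(e->os & 1) << 17;
    v |= (uint64_t)(e->edge & 1) << 18;
    v |= (uint64_t)(e->intr & 1) << 20;
    v |= (uint64_t)(e->en & 1) << 22;
    v |= (uint64_t)(e->inv & 1) << 23;
    v |= (uint64_t)(e->cnt_mask & 0xff) << 24;
    v |= (uint64_t)((e->event >> 8) & 0xf) << 32;   /* event_sel2, high 4 bits */
    v |= (uint64_t)(e->guest & 1) << 40;
    v |= (uint64_t)(e->host & 1) << 41;
    return v;
}
```

With event=0x0, umask=0x1, en=1, int=1, inv=1, edge=1 and cnt_mask=2 (all
other fields zero), the function returns 0x2d40100, which matches the val
field printed by self.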

As I said, PCL events are automatically added if PCL is detected on the host:
$ showeventinfo
...
#-----------------------------
Name     : PERF_COUNT_CPU_CYCLES
Desc     : PERF_COUNT_CPU_CYCLES
Code     : 0x0
Counters : [ ]
Attr-00 : 0x00 : [u] : measure at priv level 1, 2, 3, (0 or 1)
Attr-01 : 0x01 : [k] : measure at priv level 0 (0 or 1)
Attr-02 : 0x02 : [hv] : measure at hypervisor level (0 or 1)
#-----------------------------
Name     : PERF_COUNT_INSTRUCTIONS
Desc     : PERF_COUNT_INSTRUCTIONS
Code     : 0x1
Counters : [ ]
Attr-00 : 0x00 : [u] : measure at priv level 1, 2, 3, (0 or 1)
Attr-01 : 0x01 : [k] : measure at priv level 0 (0 or 1)
Attr-02 : 0x02 : [hv] : measure at hypervisor level (0 or 1)
...
Name     : PERF_COUNT_CONTEXT_SWITCHES
Desc     : PERF_COUNT_CONTEXT_SWITCHES
Code     : 0x100000003
Counters : [ ]
Attr-00 : 0x00 : [u] : measure at priv level 1, 2, 3, (0 or 1)
Attr-01 : 0x01 : [k] : measure at priv level 0 (0 or 1)
Attr-02 : 0x02 : [hv] : measure at hypervisor level (0 or 1)
#-----------------------------
Name     : PERF_COUNT_CPU_MIGRATIONS
Desc     : PERF_COUNT_CPU_MIGRATIONS
Code     : 0x100000004
Counters : [ ]
Attr-00 : 0x00 : [u] : measure at priv level 1, 2, 3, (0 or 1)
Attr-01 : 0x01 : [k] : measure at priv level 0 (0 or 1)
Attr-02 : 0x02 : [hv] : measure at hypervisor level (0 or 1)

Likewise, you can measure those with the same unmodified program:
$ self perf_count_context_switches
[type=1 val=0x3 e_u=0 e_k=0 e_hv=0 plm=0x0] perf_count_context_switches
                1002 perf_count_context_switches
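
Comparing the listing with this output, the Code value 0x100000003 shown
for PERF_COUNT_CONTEXT_SWITCHES appears to carry the PCL event type in its
upper 32 bits (type=1) and the event id in its lower 32 bits (val=0x3).
The sketch below splits a code that way. The layout is inferred from the
output in this mail rather than from the library source, and the helper
names are hypothetical.

```c
#include <stdint.h>

/* Upper 32 bits: PCL event type (assumed layout, inferred from the output). */
static uint32_t pcl_code_type(uint64_t code)
{
    return (uint32_t)(code >> 32);
}

/* Lower 32 bits: event id within that type (the "val" printed by self). */
static uint64_t pcl_code_config(uint64_t code)
{
    return code & 0xffffffffULL;
}
```

Under that assumption, 0x100000004 (PERF_COUNT_CPU_MIGRATIONS) splits into
type 1 and id 4, while 0x1 (PERF_COUNT_INSTRUCTIONS) splits into type 0 and
id 1.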

On AMD64 Family 10h, IBS will be enabled using the same mechanism.

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
