Hi Phil,

On Thu, Jun 16, 2016 at 2:50 AM, Philip Mucci <p...@minimalmetrics.com>
wrote:

> Greetings Stephane!
>
> Thanks for the note. More info/output below.
>
> First showing all the qualifiers for a PMU event.
>
> $ ./examples/showevtinfo -O perf_ext power8::
> .
> .
> .
> IDX : 228589659
> PMU name : power8 (POWER8)
> Name     : PM_CYC
> Equiv : None
> Flags    : None
> Desc     : Cycles .
> Code     : 0x1e
> Modif-00 : 0x00 : perf_event : [u] : monitor at user level (boolean)
> Modif-01 : 0x01 : perf_event : [k] : monitor at kernel level (boolean)
> Modif-02 : 0x02 : perf_event : [h] : monitor at hypervisor level (boolean)
> Modif-03 : 0x03 : perf_event : [period] : sampling period (integer)
> Modif-04 : 0x04 : perf_event : [freq] : sampling frequency (Hz) (integer)
> Modif-05 : 0x06 : perf_event : [excl] : exclusive access (boolean)
> Modif-06 : 0x07 : perf_event : [mg] : monitor guest execution (boolean)
> Modif-07 : 0x08 : perf_event : [mh] : monitor host execution (boolean)
> Modif-08 : 0x09 : perf_event : [cpu] : CPU to program (integer)
> Modif-09 : 0x0a : perf_event : [pinned] : pin event to counters (boolean)
>
> Looking at the standard PERF_CYCLES event:
>
> IDX : 106954753
> PMU name : perf (perf_events generic PMU)
> Name     : CYCLES
> Equiv : PERF_COUNT_HW_CPU_CYCLES
> Flags    : None
> Desc     : PERF_COUNT_HW_CPU_CYCLES
> Code     : 0x0
> Modif-00 : 0x03 : perf_event : [period] : sampling period (integer)
> Modif-01 : 0x04 : perf_event : [freq] : sampling frequency (Hz) (integer)
> Modif-02 : 0x05 : perf_event : [precise] : precise ip (integer)
> Modif-03 : 0x06 : perf_event : [excl] : exclusive access (boolean)
> Modif-04 : 0x07 : perf_event : [mg] : monitor guest execution (boolean)
> Modif-05 : 0x08 : perf_event : [mh] : monitor host execution (boolean)
> Modif-06 : 0x09 : perf_event : [cpu] : CPU to program (integer)
> Modif-07 : 0x0a : perf_event : [pinned] : pin event to counters (boolean)
>
> But strangely, some of them do! I don’t know why SW_PAGE_FAULTS would be
> supported in kernel mode but I digress...
>
> IDX : 106954780
> PMU name : perf (perf_events generic PMU)
> Name     : PERF_COUNT_SW_PAGE_FAULTS
> Equiv : None
> Flags    : None
> Desc     : PERF_COUNT_SW_PAGE_FAULTS
> Code     : 0x2
> Modif-00 : 0x00 : perf_event : [u] : monitor at user level (boolean)
> Modif-01 : 0x01 : perf_event : [k] : monitor at kernel level (boolean)
> Modif-02 : 0x03 : perf_event : [period] : sampling period (integer)
> Modif-03 : 0x04 : perf_event : [freq] : sampling frequency (Hz) (integer)
> Modif-04 : 0x06 : perf_event : [excl] : exclusive access (boolean)
> Modif-05 : 0x07 : perf_event : [mg] : monitor guest execution (boolean)
> Modif-06 : 0x08 : perf_event : [mh] : monitor host execution (boolean)
> Modif-07 : 0x09 : perf_event : [cpu] : CPU to program (integer)
> Modif-08 : 0x0a : perf_event : [pinned] : pin event to counters (boolean)
>
> Next let’s see the PMU event PM_CYC. Indeed, it seems to not be passing
> the qualifiers to the perf subsystem.
>
> pjmucci@white24:~/libpfm4$ LIBPFM_VERBOSE=1 perf_examples/task -e
> PM_CYC:k ls
> PERF[type=4 config=0x1e config1=0x0 excl=0 e_u=0 e_k=0 e_hv=0 e_host=0
> e_gu=0 period=0 freq=0 precise=0 pinned=0] PM_CYC:k
> config.mk  debian  examples  lib   Makefile python  rules.mk
> COPYING    docs    include   libpfm.spec  perf_examples  README  tests
>
>            3,042,502 PM_CYC:k (0.00% scaling, ena=791,638, run=791,638)
>
> pjmucci@white24:~/libpfm4$ LIBPFM_VERBOSE=1 perf_examples/task -e
> PM_CYC:u ls
> PERF[type=4 config=0x1e config1=0x0 excl=0 e_u=0 e_k=0 e_hv=0 e_host=0
> e_gu=0 period=0 freq=0 precise=0 pinned=0] PM_CYC:u
> config.mk  debian  examples  lib   Makefile python  rules.mk
> COPYING    docs    include   libpfm.spec  perf_examples  README  tests
>
>            3,048,441 PM_CYC:u (0.00% scaling, ena=793,210, run=793,210)
>
>
>
>
I found the problem. It is not in the pfmlib_perf_event.c code. Everything
is behaving as expected there.
The issue is in the pfmlib_power*.c modules. Each module must define the
priv levels supported by the hardware in the .supported_plm bitmask field.
For instance, if you set power8_support.supported_plm = PFM_PLM0 | PFM_PLM3,
then your cmdline works:

LIBPFM_VERBOSE=1 perf_examples/task -e power8::PM_CYC:k:freq=100 ls
PERF[type=4 config=0x1e config1=0x0 excl=0 e_u=1 e_k=0 e_hv=0 e_host=0
e_gu=1 period=100 freq=1 precise=0 pinned=0] power8::PM_CYC:k:freq=100
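
To make the change concrete, here is a minimal sketch of what setting the mask amounts to. It is an illustration only: struct pmu_desc and pmu_supports_plm() are hypothetical stand-ins (the real descriptor in pfmlib_power*.c has many more fields), and the PFM_PLM* values are redefined locally and should be checked against include/perfmon/pfmlib.h.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values (assumption); the real ones are in include/perfmon/pfmlib.h. */
#define PFM_PLM0 0x01 /* kernel level */
#define PFM_PLM3 0x08 /* user level */

/* Hypothetical, trimmed-down PMU descriptor: only what this sketch needs. */
struct pmu_desc {
    const char *name;
    int supported_plm; /* bitmask of priv levels the hardware can filter on */
};

/* Before the fix: the mask is left at zero, so no priv level is advertised. */
const struct pmu_desc power8_before = { .name = "power8" };

/* After the fix: kernel and user filtering are advertised. */
const struct pmu_desc power8_after = {
    .name = "power8",
    .supported_plm = PFM_PLM0 | PFM_PLM3,
};

/* Returns 1 when every level in plm is advertised by the PMU, else 0. */
int pmu_supports_plm(const struct pmu_desc *pmu, int plm)
{
    assert(pmu != NULL);
    return plm != 0 && (pmu->supported_plm & plm) == plm;
}
```

With power8_before, a request for PFM_PLM0 is reported as unsupported; with power8_after, both PFM_PLM0 and PFM_PLM3 are accepted.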

So what's needed is for some Power* expert to check what each PMU since
Power4 supports in terms of priv level mask and then add the corresponding
.supported_plm mask definitions. Should be fairly simple.

Why is this needed?
Because, although perf_events can override your settings, libpfm4 is trying
to be cautious and does not let you set a perf_event priv level which it
knows the hardware does not support.
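
A rough model of that gating, as a sketch rather than the actual libpfm4 code: struct excl_bits and plm_to_exclude() are hypothetical names, only the user and kernel bits are modeled (hypervisor and guest bits are left out), and the PFM_PLM* values are local stand-ins for the ones in include/perfmon/pfmlib.h.

```c
#include <assert.h>

/* Stand-in values (assumption); the real ones are in include/perfmon/pfmlib.h. */
#define PFM_PLM0 0x01 /* kernel level */
#define PFM_PLM3 0x08 /* user level */

/* Hypothetical subset of perf_event_attr: just the two exclusion bits. */
struct excl_bits {
    unsigned exclude_user;
    unsigned exclude_kernel;
};

/*
 * Translate a requested priv mask into exclusion bits.
 * If the PMU advertises nothing (supported_plm == 0), the request is
 * dropped and both bits stay 0, as in the PM_CYC:k and PM_CYC:u runs.
 */
struct excl_bits plm_to_exclude(int supported_plm, int requested_plm)
{
    struct excl_bits out = { 0, 0 };
    int plm;

    /* This sketch only understands user and kernel levels. */
    assert((requested_plm & ~(PFM_PLM0 | PFM_PLM3)) == 0);

    plm = requested_plm & supported_plm;
    if (plm == 0)
        return out; /* nothing the hardware is known to support */

    out.exclude_user = !(plm & PFM_PLM3);
    out.exclude_kernel = !(plm & PFM_PLM0);
    return out;
}
```

Under these assumptions, plm_to_exclude(0, PFM_PLM0) leaves both bits at 0 (the e_u=0 e_k=0 seen above), while plm_to_exclude(PFM_PLM0 | PFM_PLM3, PFM_PLM0) gives exclude_user=1 and exclude_kernel=0, matching the e_u=1 e_k=0 line from the fixed run.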

> Trying this same command with a ‘perf’ event (cycles):
>
> pjmucci@white24:~/libpfm4$ LIBPFM_VERBOSE=1 perf_examples/task -e
> cycles:u ls
> task: event cycles:u: invalid event attribute
>
> No surprise there since the result of the first… It seems like there is
> just some boilerplate perf code missing here, nothing specific to the PMU
> AFAICT.
>
> Lastly, I’d like to point out something confusing in the output you sent me
> from your run of showevtinfo on the IVB. One run you have (without -O
> perf_ext) shows this:
>
> PMU name : ivb (Intel Ivy Bridge)
> Name     : L1D
>
> Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
>
> Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
> Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1)
> (boolean)
> Modif-03 : 0x03 : PMU : [i] : invert (boolean)
> Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
> Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
>
> But the next run
>
> $ showevtinfo -O perf_ext
> IDX      : 155189273
> PMU name : ivb (Intel Ivy Bridge)
> Name     : L1D
> Umask-00 : 0x01 : PMU : [REPLACEMENT] : [default] : Number of cache lines
> brought into the L1D cache
> Modif-00 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1)
> (boolean)
> Modif-01 : 0x03 : PMU : [i] : invert (boolean)
> Modif-02 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
> Modif-03 : 0x05 : PMU : [t] : measure any thread (boolean)
> Modif-04 : 0x00 : perf_event : [u] : monitor at user level (boolean)
> Modif-05 : 0x01 : perf_event : [k] : monitor at kernel level (boolean)
> Modif-06 : 0x03 : perf_event : [period] : sampling period (integer)
> Modif-07 : 0x04 : perf_event : [freq] : sampling frequency (Hz) (integer)
> Modif-08 : 0x06 : perf_event : [excl] : exclusive access (boolean)
> Modif-09 : 0x07 : perf_event : [mg] : monitor guest execution (boolean)
> Modif-10 : 0x08 : perf_event : [mh] : monitor host execution (boolean)
> Modif-11 : 0x09 : perf_event : [cpu] : CPU to program (integer)
> Modif-12 : 0x0a : perf_event : [pinned] : pin event to counters (boolean)
>
>
> Why would the output be different when we’ve just asked for more detail? u
> and k both live in the PMU and in perf_event. I might suggest this is
> confusing…(or at least it is to me). Granted I understand those priv bits
> live in the PMC’s on x86, but they still need to be programmed through
> perf. Does the first output just show what can be programmed through the
> PMC’s?
>
Good question!

This is because perf_events will always override whatever priv level bits
you set in the attr->config field. As such, it has override capabilities.
Therefore, libpfm4 recognizes this and, if you are encoding for perf_events,
it removes u and k from the attributes offered by the hw and moves them to
the attributes offered by software. It is just to avoid bad surprises and it
reflects how the system behaves. Had to write some code to handle this well!
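
The effect on the attribute listing can be sketched as follows. This is an illustration of the idea, not the code in libpfm4: struct attr, is_priv_attr() and relocate_priv_attrs() are hypothetical names, and only the u and k modifiers are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum attr_src { SRC_PMU, SRC_PERF_EVENT };

struct attr {
    const char *name;
    enum attr_src src; /* who offers the attribute: hardware or perf_event */
};

/* Hypothetical helper: is this one of the priv level modifiers? */
int is_priv_attr(const char *name)
{
    return strcmp(name, "u") == 0 || strcmp(name, "k") == 0;
}

/*
 * Build the list seen when encoding for perf_events: hardware attributes
 * minus u/k first, then u/k offered again as perf_event attributes.
 * out must have room for n entries; returns the number written.
 */
size_t relocate_priv_attrs(const struct attr *hw, size_t n, struct attr *out)
{
    static const char *const priv[] = { "u", "k" };
    size_t count = 0;

    assert(hw != NULL && out != NULL);

    /* Pass 1: keep every hardware attribute that is not u or k. */
    for (size_t i = 0; i < n; i++)
        if (!is_priv_attr(hw[i].name))
            out[count++] = hw[i];

    /* Pass 2: append u then k, now owned by perf_event. */
    for (size_t p = 0; p < 2; p++)
        for (size_t i = 0; i < n; i++)
            if (strcmp(hw[i].name, priv[p]) == 0) {
                out[count].name = priv[p];
                out[count].src = SRC_PERF_EVENT;
                count++;
            }

    return count;
}
```

Fed the Ivy Bridge list k, u, e, i, c, t (all from the PMU), this yields e, i, c, t from the PMU followed by u and k from perf_event, which is the ordering in the -O perf_ext output quoted above.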



> In any case, for the Power8, can we assume that the PMC itself cannot set
> the bits (i.e. we don’t know what they are) and since perf can, we just
> leverage the standard perf_event boilerplate code for other platforms to
> allow this to work? I’m thinking the perf_event_encode @ line 161
>
Correct.


So someone from IBM or yourself just needs to send me a patch for
pfmlib_power*.c which sets up the .supported_plm bitmask based on what each
PMU actually supports, which I am guessing would be PFM_PLM0 | PFM_PLM3 at
the minimum. The PFM_PLM* mask definitions are in include/perfmon/pfmlib.h.
Let me know if you need more info.

Thanks for tracking this down.
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
