On Wed, Jun 15, 2016 at 1:58 AM, Philip Mucci <mu...@icl.utk.edu> wrote:
> Hi Stephane (and Carl Love and others @ IBM)
>
> We’ve been doing some testing on Power8@Sandia and noticed that none of
> the perf qualifiers are valid on Power8 for the internal perf_pmu events
> and they don’t work (but are accepted) on the power8_pmu. However, these
> qualifiers do work with the distro’s perf, and thus something seems to be
> missing. I’ve pulled the latest libpfm just to be sure.
>
> First notice the uhk qualifiers are missing from the generic perf events…
>
> pjmucci@white26:~/libpfm4/lib$ LIBPFM_DEBUG=2 ../perf_examples/self
> cycles:u=1
> pfmlib_common.c (pfmlib_init_os.787): OS layer No OS (raw PMU) activated
> pfmlib_common.c (pfmlib_init_os.787): OS layer perf_event activated
> pfmlib_common.c (pfmlib_init_os.787): OS layer perf_event extended
> activated
> pfmlib_common.c (pfmlib_init_os.790): default OS layer: perf_event
> pfmlib_common.c (pfmlib_init_pmus.712): trying POWER4
> pfmlib_common.c (pfmlib_init_pmus.712): trying PPC970
> pfmlib_common.c (pfmlib_init_pmus.712): trying PPC970MP
> pfmlib_common.c (pfmlib_init_pmus.712): trying POWER5
> pfmlib_common.c (pfmlib_init_pmus.712): trying POWER5+
> pfmlib_common.c (pfmlib_init_pmus.712): trying POWER6
> pfmlib_common.c (pfmlib_init_pmus.712): trying POWER7
> pfmlib_common.c (pfmlib_init_pmus.712): trying POWER8
> pfmlib_common.c (pfmlib_pmu_activate.654): activated POWER8
> pfmlib_common.c (pfmlib_init_pmus.712): trying IBM Power Torrent PMU
> pfmlib_common.c (pfmlib_init_pmus.712): trying POWERPC_NEST_MCS_RD_BW
> pfmlib_common.c (pfmlib_pmu_activate.654): activated POWERPC_NEST_MCS_RD_BW
> pfmlib_common.c (pfmlib_init_pmus.712): trying POWERPC_NEST_MCS_WR_BW
> pfmlib_common.c (pfmlib_pmu_activate.654): activated POWERPC_NEST_MCS_WR_BW
> pfmlib_common.c (pfmlib_init_pmus.712): trying perf_events generic PMU
> pfmlib_perf_event_pmu.c (pfm_perf_pmu_supported_plm.133): guessing plm
> from power8 PMU plm=0x0
> pfmlib_common.c (pfmlib_pmu_activate.654): activated perf_events generic
> PMU
> pfmlib_common.c (pfmlib_init_pmus.712): trying perf_events raw PMU
> pfmlib_common.c (pfmlib_pmu_activate.654): activated perf_events raw PMU
> pfmlib_common.c (pfmlib_init_pmus.765): 5 PMU detected out of 13 supported
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 0 0 3 2 3 period
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 0 1 3 2 4 freq
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 0 2 3 2 5 precise
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 0 3 2 2 6 excl
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 0 4 2 2 7 mg
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 0 5 2 2 8 mh
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 0 6 3 2 9 cpu
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 0 7 2 2 10 pinned
> pfmlib_common.c (pfmlib_parse_event_attr.963): cannot find attribute u
> self: event cycles:u=1: invalid event attribute
> self: cannot setup events
>
> Ok, no dice. We can see the missing qualifiers. Let’s check if the kernel
> can do it...
>
> pjmucci@white26:~/libpfm4/lib$ perf stat -e cycles:u sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 372,007 cycles:u
>
>
> 1.002781228 seconds time elapsed
>
> pjmucci@white26:~/libpfm4/lib$ perf stat -e cycles:k sleep 1
>
> Performance counter stats for 'sleep 1':
>
> 1,476,676 cycles:k
>
>
> 1.001160681 seconds time elapsed
>
> pjmucci@white26:~/libpfm4/lib$
>
> These look good! So there’s one bug… Now let’s try libpfm’s power8_pmu
> event, PM_RUN_CYC:u=1
>
> pjmucci@white26:~/libpfm4/lib$ LIBPFM_DEBUG=2 ../perf_examples/self
> PM_RUN_CYC:u=1
> <clipped>
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 0 2 2 0 u
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 1 2 2 1 k
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 2 2 2 2 h
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 3 3 2 3 period
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 4 3 2 4 freq
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 5 2 2 6 excl
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 6 2 2 7 mg
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 7 2 2 8 mh
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 8 3 2 9 cpu
> pfmlib_common.c (pfmlib_build_event_pattrs.1114): 917 9 2 2 10 pinned
> pfmlib_common.c (pfmlib_parse_event.1294): 917 0 0 u
> INITIAL: 4,605 PM_RUN_CYC:u=1 (0.00% scaling, raw=4,605,
> ena=2,100, run=2,100)
> Final counts:
> FINAL: 38,666,160,882 PM_RUN_CYC:u=1 (0.00% scaling,
> raw=38,666,160,882, ena=9,999,885,066, run=9,999,885,066)
>
> Aha! It took the qualifier… but we don’t know if the number is
> decent... Let’s try measuring kernel cycles.
>
> pjmucci@white26:~/libpfm4/lib$ LIBPFM_DEBUG=2 ../perf_examples/self
> PM_RUN_CYC:u=0:k=1
> <clipped>
> pfmlib_common.c (pfmlib_parse_event.1294): 917 0 0 u
> pfmlib_common.c (pfmlib_parse_event.1294): 917 1 1 k
> INITIAL: 4,277 PM_RUN_CYC:u=0:k=1 (0.00% scaling,
> raw=4,277, ena=1,876, run=1,876)
> Final counts:
> FINAL: 38,666,014,163 PM_RUN_CYC:u=0:k=1 (0.00% scaling,
> raw=38,666,014,163, ena=9,999,846,334, run=9,999,846,334)
>
> And just for completeness… Just k=1...
>
> pfmlib_common.c (pfmlib_parse_event.1294): 917 0 1 k
> INITIAL: 4,922 PM_RUN_CYC:k=1 (0.00% scaling, raw=4,922,
> ena=2,218, run=2,218)
> Final counts:
> FINAL: 38,666,064,958 PM_RUN_CYC:k=1 (0.00% scaling,
> raw=38,666,064,958, ena=9,999,862,602, run=9,999,862,602)
>
> So, there seem to be a few problems.
>
> 1) qualifiers for the perf_pmu (ukh) are not supported
> 2) qualifiers for the power8_pmu (ukh) are supported but not working/being
> passed to the kernel
>
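A quick arithmetic check on the numbers quoted above makes the second problem concrete. This is only a sketch: the counts are copied from the runs in this thread, and the helper name relative_spread is made up for illustration. The two tools measured different workloads (self versus sleep 1), so the only meaningful comparison is across modifiers within one tool.

```python
# FINAL counts copied from the three libpfm4 "self" runs quoted above.
libpfm_final = {
    "PM_RUN_CYC:u=1": 38_666_160_882,
    "PM_RUN_CYC:u=0:k=1": 38_666_014_163,
    "PM_RUN_CYC:k=1": 38_666_064_958,
}

# Counts copied from the two "perf stat ... sleep 1" runs quoted above.
perf_stat = {
    "cycles:u": 372_007,
    "cycles:k": 1_476_676,
}


def relative_spread(counts):
    """Return (max - min) / max for a collection of counts (hypothetical helper)."""
    values = list(counts)
    return (max(values) - min(values)) / max(values)


libpfm_spread = relative_spread(libpfm_final.values())
perf_spread = relative_spread(perf_stat.values())

print(f"libpfm4 runs differ by {libpfm_spread:.6%}")
print(f"perf stat runs differ by {perf_spread:.1%}")
```

The three libpfm4 runs agree to well under 0.001%, while the perf stat user and kernel counts differ by roughly 75%. That is what one would expect if the u and k settings reach the kernel from perf but not from the libpfm4 power8 path.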
The first problem I see is that the power8 support does not list hardware priv
level filters. Yet if the kernel can do it, the counters must support it.
You have to distinguish the perf_events modifiers from the hw modifiers. A
perf_event modifier is, for instance, freq= or period=.
By default, examples/showevtinfo shows only the hw modifiers, and they
are marked as 'PMU', for instance:
IDX : 155189273
PMU name : ivb (Intel Ivy Bridge)
Name : L1D
Equiv : None
Flags : None
Desc : L1D cache
Code : 0x51
Umask-00 : 0x01 : PMU : [REPLACEMENT] : [default] : Number of cache lines brought into the L1D cache
Modif-00 : 0x00 : PMU : [k] : monitor at priv level 0 (boolean)
Modif-01 : 0x01 : PMU : [u] : monitor at priv level 1, 2, 3 (boolean)
Modif-02 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-03 : 0x03 : PMU : [i] : invert (boolean)
Modif-04 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-05 : 0x05 : PMU : [t] : measure any thread (boolean)
Here k, u, e, i, c, t are hw modifiers.
To print hw + perf use:
$ showevtinfo -O perf_ext
IDX : 155189273
PMU name : ivb (Intel Ivy Bridge)
Name : L1D
Equiv : None
Flags : None
Desc : L1D cache
Code : 0x51
Umask-00 : 0x01 : PMU : [REPLACEMENT] : [default] : Number of cache lines brought into the L1D cache
Modif-00 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-01 : 0x03 : PMU : [i] : invert (boolean)
Modif-02 : 0x04 : PMU : [c] : counter-mask in range [0-255] (integer)
Modif-03 : 0x05 : PMU : [t] : measure any thread (boolean)
Modif-04 : 0x00 : perf_event : [u] : monitor at user level (boolean)
Modif-05 : 0x01 : perf_event : [k] : monitor at kernel level (boolean)
Modif-06 : 0x03 : perf_event : [period] : sampling period (integer)
Modif-07 : 0x04 : perf_event : [freq] : sampling frequency (Hz) (integer)
Modif-08 : 0x06 : perf_event : [excl] : exclusive access (boolean)
Modif-09 : 0x07 : perf_event : [mg] : monitor guest execution (boolean)
Modif-10 : 0x08 : perf_event : [mh] : monitor host execution (boolean)
Modif-11 : 0x09 : perf_event : [cpu] : CPU to program (integer)
Modif-12 : 0x0a : perf_event : [pinned] : pin event to counters (boolean)
Notice the new modifiers; they are marked 'perf_event'. Also notice that
u, k, mg, mh are controlled by perf_event and not hw.
This is because libpfm4 can make the distinction (and produce the encodings) for
different OS targets or raw (i.e., whatever the hardware offers).
In the case of the perf_event kernel interface, the perf_event u, k modifiers
override the hw settings.
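To make the override concrete, here is a small Python model of how u/k/h modifiers could end up as the exclude_user, exclude_kernel and exclude_hv bits of perf_event_attr. Those three field names are real perf_event_attr fields; everything else is an assumption for illustration. The mask values mirror the PFM_PLM* constants as I recall them from pfmlib.h, the function names parse_plm and exclude_bits are invented, and the fallback to a default mask is a simplification, not a copy of the library code.

```python
# Privilege-level mask bits. ASSUMPTION: these mirror PFM_PLM0, PFM_PLM3 and
# PFM_PLMH from pfmlib.h; check the header before relying on the values.
PLM_KERNEL = 0x01
PLM_USER = 0x08
PLM_HYPERVISOR = 0x10

MODIFIER_BITS = {"k": PLM_KERNEL, "u": PLM_USER, "h": PLM_HYPERVISOR}


def parse_plm(event, default_plm):
    """Return the privilege mask requested by the u/k/h modifiers in `event`.

    Falls back to `default_plm` when no u/k/h modifier is enabled
    (a simplification made for this sketch).
    """
    _name, *modifiers = event.split(":")
    plm = 0
    for modifier in modifiers:
        key, _, value = modifier.partition("=")
        enabled = value in ("", "1")  # a bare ":u" is treated as u=1
        if key in MODIFIER_BITS and enabled:
            plm |= MODIFIER_BITS[key]
    return plm or default_plm


def exclude_bits(plm):
    """Translate a privilege mask into perf_event_attr-style exclude flags."""
    return {
        "exclude_user": int(not (plm & PLM_USER)),
        "exclude_kernel": int(not (plm & PLM_KERNEL)),
        "exclude_hv": int(not (plm & PLM_HYPERVISOR)),
    }


default = PLM_KERNEL | PLM_USER
for event in ("PM_CYC:u", "PM_RUN_CYC:u=0:k=1", "PM_CYC"):
    print(event, exclude_bits(parse_plm(event, default)))
```

If the encoding reaches the kernel correctly, a user-only event should have exclude_kernel and exclude_hv set and exclude_user clear. Those are the fields worth checking in the verbose output.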
Going back to power8::
$ showevtinfo -O perf_ext power8::
IDX : 228589659
PMU name : power8 (POWER8)
Name : PM_CYC
Equiv : None
Flags : None
Desc : Cycles .
Code : 0x1e
Modif-00 : 0x00 : perf_event : [u] : monitor at user level (boolean)
Modif-01 : 0x01 : perf_event : [k] : monitor at kernel level (boolean)
Modif-02 : 0x02 : perf_event : [h] : monitor at hypervisor level (boolean)
Modif-03 : 0x03 : perf_event : [period] : sampling period (integer)
Modif-04 : 0x04 : perf_event : [freq] : sampling frequency (Hz) (integer)
Modif-05 : 0x06 : perf_event : [excl] : exclusive access (boolean)
Modif-06 : 0x07 : perf_event : [mg] : monitor guest execution (boolean)
Modif-07 : 0x08 : perf_event : [mh] : monitor host execution (boolean)
Modif-08 : 0x09 : perf_event : [cpu] : CPU to program (integer)
Modif-09 : 0x0a : perf_event : [pinned] : pin event to counters (boolean)
You see the support for the perf_event modifiers.
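When comparing several PMUs it can help to group the modifiers by who provides them. The following is a minimal sketch that parses showevtinfo text in the format shown above; the regular expression and the function name modifiers_by_source are my own and only assume the "Modif-NN : code : source : [name] : description" layout seen in this mail.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Modif-00 : 0x00 : perf_event : [u] : monitor at user level (boolean)
MODIF_RE = re.compile(
    r"^Modif-\d+\s*:\s*0x[0-9a-fA-F]+\s*:\s*(\S+)\s*:\s*\[(\w+)\]"
)


def modifiers_by_source(text):
    """Group modifier names by their source column ('PMU' or 'perf_event')."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = MODIF_RE.match(line.strip())
        if match:
            source, name = match.groups()
            groups[source].append(name)
    return dict(groups)


# A few lines copied from the outputs shown in this mail.
SAMPLE = """\
Modif-00 : 0x02 : PMU : [e] : edge level (may require counter-mask >= 1) (boolean)
Modif-04 : 0x00 : perf_event : [u] : monitor at user level (boolean)
Modif-05 : 0x01 : perf_event : [k] : monitor at kernel level (boolean)
Modif-06 : 0x03 : perf_event : [period] : sampling period (integer)
"""

print(modifiers_by_source(SAMPLE))
```

For the power8 listing above, every modifier would land under perf_event and none under PMU, which matches the observation that the power8 support lists no hardware priv level filters.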
The next step is to verify what libpfm4 encodes in the perf_event_attr when you
say PM_CYC:u.
For this, you need to enable VERBOSE and not DEBUG:
$ LIBPFM_VERBOSE=1 examples/task -e PM_CYC:u ls
> I looked through the code, but it was not intuitive how this all gets
> handled.
>
This is normally handled in pfmlib_perf_event.c: pfmlib_perf_event_encode()
> Thoughts?
>
Try what I suggest and report back to me.
> Thanks
>
> P.S. I can’t figure out how these qualifiers are supposed to work in
> check_events, when we always say the OS_encoding is PFM_OS_NONE...
>
>
check_events only encodes for raw hardware and not for the perf_event OS layer; that's
why it is not in the perf_examples subdir. So it will not work with the u, k
modifiers, because IBM did not provide support for raw encodings of these
modifiers, even though I am sure they exist. That could be fixed in a
patch.
Hope this helps.
>
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and
> traffic
> patterns at an interface-level. Reveals which users, apps, and protocols
> are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity
> planning
> reports.
> http://pubads.g.doubleclick.net/gampad/clk?id=1444514421&iu=/41014381
> _______________________________________________
> perfmon2-devel mailing list
> perfmon2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>
>