Re: [perfmon2] Monitoring core and uncore events in the same testrun.

Rob Fowler Wed, 23 Sep 2009 20:38:18 -0700

I've routinely done a similar thing on Barcelonas.

Given the lack of dedicated uncore counters on the AMD, a system-wide session
started by a process pinned to one core measures 4 events in the shared
resources (L3 cache, memory controller, HT).  A second session constrained to
threads that run on the other three cores measures a set of on-core events.


The semantics of uncore events are such that trying to associate these events
to a specific process/thread is of limited use and generates more confusion than
useful information.  For example, suppose I count L3 misses in a particular
region of one thread, either using calipers or sampling.  The counter will
be incremented by any miss across the chip, but only while that thread
is running.  The count (or number of samples) attributed to any section
of the code thus will have no relation to the behavior of the code in that
section.  The count seen by any thread is also only a lower bound on the total
number of events that occurred.  In the AMD implementation, Perfmon will let
one look at L3 misses in all the threads/processes in an application.  In that
case, each time there is a miss, all of the counters in the active threads are
incremented.  Thus, the maximum number of misses seen by any thread is still a
lower bound on the global number.  If at least one thread in an application is
active from start to finish, then the sum of the counts across all T threads is
a very weak upper bound that is also at most T times the global count.  To
reiterate, not every event is seen, many events are seen multiple times, and
there's at best a very weak link between events and code elements.
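
To make the bounds above concrete, here is a toy simulation in Python.  It is
only a sketch under stated assumptions, not a model of any real PMU: the
scheduling is random, the function name and parameters are made up for
illustration, and "at least one thread is active from start to finish" is read
as "at every event, at least one application thread is running".

```python
import random

def simulate(num_threads=4, num_events=10_000, p_running=0.6, seed=1):
    """Toy model: one chip-wide event stream, and per-thread counters that
    only tick while their thread is scheduled."""
    rng = random.Random(seed)
    per_thread = [0] * num_threads
    for _ in range(num_events):
        # Which threads happen to be running when this chip-wide event occurs?
        running = [rng.random() < p_running for _ in range(num_threads)]
        if not any(running):
            # Assumption: at least one application thread is always running.
            running[rng.randrange(num_threads)] = True
        # Every running thread's counter sees the same event.
        for t, is_running in enumerate(running):
            if is_running:
                per_thread[t] += 1
    return num_events, per_thread

total, counts = simulate()
# max over threads <= global count <= sum over threads <= T * global count
assert max(counts) <= total <= sum(counts) <= len(counts) * total
print("global:", total, "per-thread:", counts, "sum:", sum(counts))
```

The gap between the largest per-thread count and the sum is the window in which
the true global count can lie, which is why neither number is very useful on
its own.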

A high event count in the shared resources explains bad performance only if
the event rate is high enough to be a bottleneck, i.e., the resource is fully
utilized, and the thread you are measuring is being delayed because it is
waiting on the resource.  Establishing this requires quantitative accuracy.
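
As a rough illustration of the first half of that test, the sketch below turns
a system-wide count into a utilization figure.  Everything here is an
assumption for the sake of the example: the function names, the 0.9 threshold,
and above all the peak rate, which would have to come from a benchmark or
vendor data for the resource in question.  It says nothing about whether the
measured thread is actually stalled on the resource.

```python
def utilization(event_count, interval_s, peak_events_per_s):
    """Fraction of a shared resource's peak event rate that was observed
    over the measurement interval."""
    if interval_s <= 0 or peak_events_per_s <= 0:
        raise ValueError("interval and peak rate must be positive")
    return (event_count / interval_s) / peak_events_per_s

def is_candidate_bottleneck(event_count, interval_s, peak_events_per_s,
                            threshold=0.9):
    """True when the observed rate is close to the (assumed) peak rate.
    The threshold is arbitrary and only for illustration."""
    return utilization(event_count, interval_s, peak_events_per_s) >= threshold

# Hypothetical numbers: 95 events in 1 s against an assumed peak of 100/s.
print(utilization(95, 1.0, 100))
print(is_candidate_bottleneck(95, 1.0, 100))
```

This only works with the full system-wide count; a per-thread count that
misses events would understate the utilization.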

On the Intels, I'd like to see the option of creating a system-wide "uncore"
session from one core (e.g., from a process pinned to a socket or a core on a
socket), and independently creating other sessions that use on-core
events/counters.  It would be gravy if the "uncore" session could allow other
threads on the socket to examine, but not write or control the uncore
counters.
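
For the tool-side approach discussed in the quoted thread below (keep uncore
events only on the first monitored cpu of each socket), a minimal sketch in
Python might look like the following.  This is not pfmon code.  The function
name, the cpu-to-socket map, and the rule that an uncore event is one whose
name starts with "UNC_" are all assumptions made for illustration; the event
names are the ones from Gary's example.

```python
def plan_events(cpu_to_socket, events,
                is_uncore=lambda name: name.startswith("UNC_")):
    """Return {cpu: [events]} where uncore events are kept only on the
    first (lowest-numbered) monitored cpu of each socket."""
    seen_sockets = set()
    plan = {}
    for cpu in sorted(cpu_to_socket):
        socket = cpu_to_socket[cpu]
        first_on_socket = socket not in seen_sockets
        seen_sockets.add(socket)
        # Core events go everywhere; uncore events only to the first cpu.
        plan[cpu] = [e for e in events if first_on_socket or not is_uncore(e)]
    return plan

# Hypothetical topology: cpus 0-1 on socket 0, cpus 2-3 on socket 1.
cpu_to_socket = {0: 0, 1: 0, 2: 1, 3: 1}
events = ["UNC_LLC_MISS:READ", "UNHALTED_CORE_CYCLES",
          "INSTRUCTIONS_RETIRED", "FP_COMP_OPS_EXE:SSE_FP"]

for cpu, evs in plan_events(cpu_to_socket, events).items():
    print(f"CPU{cpu}: {','.join(evs)}")
```

Because the filtering happens before any session is created, a cpu that does
not carry the uncore event simply has no entry for it, so there is no
ambiguous zero to interpret.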

Dan Terpstra wrote:
> Interestingly, there's a guy at ZIH Dresden who implemented a PAPI-C
> component specifically to measure events on the uncore. He never tried to
> measure both per-thread and uncore at the same time, and I doubt that it
> would work, but found it intriguing that he was able to get reasonable data.
> Like the dancing dog, it's not how well he dances, but that he dances at
> all...
> - d
> 
>> -----Original Message-----
>> From: stephane eranian [mailto:eran...@googlemail.com]
>> Sent: Wednesday, September 23, 2009 5:12 AM
>> To: gary.m...@bull.com
>> Cc: perfmon2-devel@lists.sourceforge.net
>> Subject: Re: [perfmon2] Monitoring core and uncore events in the same
>> testrun.
>>
>> Gary,
>>
>> Sorry for the delay.
>>
>> The reason there is a restriction with uncore PMU is because it is shared
>> by all cores on the socket. Given the model used by perfmon, i.e., events
>> are assigned to counters in user space, the kernel needs to enforce some
>> access control to ensure no two sessions try to use the same resource,
>> here uncore registers.
>>
>> The current implementation uses a coarse-grain access control policy:
>>    - only system-wide sessions can access uncore PMU
>>    - the first session to access uncore PMU, grabs it all
>>
>> The core and uncore PMU do not share any resource except the interrupt
>> vector. Theoretically we could allow distinct uncore and core sessions.
>>
>> Some people have also argued that allowing uncore access to per-thread
>> sessions may also be beneficial. The reason being that you'd want to know
>> what is going on around you. It could be hinting at what you are
>> experiencing in your core. I believe this is similar to what you are trying
>> to do with your measurement. I think this is a perfectly good reason to do
>> this.
>>
>> Going back to your example of a system-wide session, I think it would be
>> easier to add enough smarts to the tool to suppress uncore events on all
>> but the first cpu of each socket given the list of monitored cpus (either
>> all or --cpu-list). I think adding this to pfmon may not be so trivial
>> because of internal data structures, but it is doable.
>>
>> The alternative has some problems because you would not return an error
>> when the uncore registers are written. Thus applications would not be able
>> to tell apart whether a zero value on read is because no event occurred or
>> because the event was suppressed.
>>
>> Another alternative would be to consider uncore sessions as a third
>> kind of session distinct from system-wide. We would allow uncore sessions
>> when there are per-thread and system-wide sessions. Uncore sessions would
>> only support uncore events, of course. You would need a distinct pfmon
>> session for them.
>>
>>
>> On Fri, Sep 18, 2009 at 8:41 PM,  <gary.m...@bull.com> wrote:
>>> Stephane
>>>
>>> We would like to be able to collect both core and uncore counters with
>>> pfmon during the same test run.  This works (if you are careful) as shown
>>> below:
>>>
>>> [kirk] (hpctk) test_cases> pfmon --system-wide -u -k --cpu-list 0,1 \
>>>     -e UNC_LLC_MISS:READ,UNHALTED_CORE_CYCLES,INSTRUCTIONS_RETIRED,FP_COMP_OPS_EXE:SSE_FP \
>>>     ./LoopTest
>>>
>>> .... application dribble ....
>>>
>>> CPU0                         12080 UNC_LLC_MISS:READ
>>> CPU0                         26709 UNHALTED_CORE_CYCLES
>>> CPU0                          9766 INSTRUCTIONS_RETIRED
>>> CPU0                             0 FP_COMP_OPS_EXE:SSE_FP
>>> CPU1                           197 UNC_LLC_MISS:READ
>>> CPU1                         29020 UNHALTED_CORE_CYCLES
>>> CPU1                         10715 INSTRUCTIONS_RETIRED
>>> CPU1                             0 FP_COMP_OPS_EXE:SSE_FP
>>>
>>> But our system also has cpu cores 2-15 which can not be included in the
>>> cpu list because they share the same cpu socket as 0 or 1 so the uncore
>>> event causes a problem creating the perfmon session on behalf of those cpu
>>> cores.
>>>
>>> Would it be possible for pfmon to detect when multiple cpu cores on the
>>> same socket are included in the cpu list, and then only put the uncore
>>> events in the event list used when creating a session to the first cpu
>>> core on that socket?  Then sessions to other cpu cores that share the same
>>> socket would contain only the core events so that perfmon would allow
>>> sessions to all the cores.
>>>
>>> One other possible approach I considered is to leave pfmon alone and
>>> change perfmon to just remove the uncore event from the event list when
>>> the session is created to the second cpu core on the same socket.  This
>>> could possibly be done where the error is currently being detected and
>>> then allow the session to be created with a subset of the events (minus
>>> all uncore events) requested by the caller.
>>>
>>> If either of these approaches could be implemented it would make it
>>> possible for us to get all the data we need in a single test run (and that
>>> makes sure the data is consistent and complete).
>>>
>>> Just interested in your thoughts.
>>> Gary
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Come build with us! The BlackBerry® Developer Conference in SF, CA
>>> is the only developer event you need to attend this year. Jumpstart your
>>> developing skills, take BlackBerry mobile applications to market and stay
>>> ahead of the curve. Join us from November 9-12, 2009. Register now!
>>> http://p.sf.net/sfu/devconf
>>> _______________________________________________
>>> perfmon2-devel mailing list
>>> perfmon2-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>>>
> 
> 

-- 
Robert J. Fowler
Chief Domain Scientist, HPC
Renaissance Computing Institute
The University of North Carolina at Chapel Hill
100 Europa Dr, Suite 540
Chapel Hill, NC 27517
V: 919.445.9670
F: 919 445.9669
r...@renci.org

