Gary,
Here is a better way of doing this. The libpfm4 perf_event layer already
has provision for the cpu to program; the attached patch activates it.
With this, you can simply do:
unhalted_core_cycles:cpu=2
The cpu index is returned by the encoding call in arg.cpu (-1 if the
modifier is not set in the event string). It is up to you to use it.
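To make the string syntax concrete, here is a rough sketch of what the modifier carries. It is not the libpfm4 parser: event_string_cpu() is a hypothetical standalone helper that only mirrors the conventions of the patch (value -1 when cpu= is absent, values at or above INT_MAX rejected).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libpfm4: scan a colon-separated event
 * string for a "cpu=N" modifier.
 * Returns 0 on success and stores N in *cpu, or -1 in *cpu when the
 * modifier is absent. Returns -1 if the value is malformed or too large.
 */
int event_string_cpu(const char *str, int *cpu)
{
    assert(str != NULL && cpu != NULL);
    *cpu = -1; /* same "not set" convention as pfm_perf_encode_arg_t.cpu */

    /* modifiers follow the event name, each one introduced by ':' */
    for (const char *p = strchr(str, ':'); p != NULL; p = strchr(p, ':')) {
        p++; /* step over the ':' */
        if (strncmp(p, "cpu=", 4) != 0)
            continue;

        const char *val = p + 4;
        if (*val < '0' || *val > '9')
            return -1; /* value must start with a digit */

        char *end;
        errno = 0;
        unsigned long long ival = strtoull(val, &end, 10);
        if (errno != 0 || (*end != '\0' && *end != ':'))
            return -1; /* overflow or trailing garbage */
        if (ival >= INT_MAX)
            return -1; /* same bound the patch applies */

        *cpu = (int)ival;
    }
    return 0;
}
```

With "unhalted_core_cycles:cpu=2" the helper stores 2; with no cpu= modifier it stores -1. In real use the string goes to the libpfm4 encoding call and the value comes back in arg.cpu, so no parsing is needed on the caller side.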
Please try the attached patch on top of the libpfm4 git tree and let me
know if it helps solve your problem.
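As for using the returned index, one possible policy on the caller side is sketched below. The struct and function names are hypothetical, and the policy itself is an assumption: an explicit cpu= is treated as a request for a system-wide measurement on that CPU, anything else as a measurement of the calling thread.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical pair of arguments a caller would hand to perf_event_open(). */
struct open_target {
    pid_t pid; /* -1 = all processes, 0 = calling thread */
    int   cpu; /* -1 = any CPU, >= 0 = that CPU only */
};

/*
 * Assumed policy, one of several possible: an explicit cpu= request means
 * a system-wide measurement on that CPU; otherwise measure the calling
 * thread wherever it runs.
 */
struct open_target choose_open_target(int encoded_cpu)
{
    assert(encoded_cpu >= -1);

    struct open_target t;
    if (encoded_cpu >= 0) {
        t.pid = -1;          /* system-wide */
        t.cpu = encoded_cpu; /* pinned to the requested CPU */
    } else {
        t.pid = 0;           /* calling thread */
        t.cpu = -1;          /* on whichever CPU it runs */
    }
    return t;
}
```

The result would then be passed along as perf_event_open(&attr, t.pid, t.cpu, -1, 0). Uncore events always need the system-wide form, since they are system-wide only and the CPU number stands in for the socket.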
On Thu, Apr 17, 2014 at 8:57 PM, Gary Mohr <gary.m...@bull.com> wrote:
> Hi Stephane,
>
>
>
> Thanks for the reply.
>
>
>
> I understand that uncore events are system wide and that the cpu number
> being passed is done to allow the kernel to identify which package should
> be used for an uncore event. Just as a note, I am running on a Redhat
> kernel which includes backported uncore support. In this kernel there are
> no /sys/devices/uncore_xxx/cpumask files, but uncore counters work just
> fine. Apparently these files were added after the initial uncore logic was
> put in.
>
>
>
> Papi provides a way to set a cpu number to be used for all events in an
> event set. In order to use it a papi application must call PAPI_set_opt to
> attach a cpu to the event set. When using this interface however, it is
> only possible to count events on one uncore package at a time. This is
> because the attached cpu is used for all events in an event set and papi
> only allows one event set per component to be active at any given time.
>
>
>
> If the cpu can be specified as a mask for uncore events, it allows the
> user to count uncore events on multiple packages at the same time. I hope
> this explains why converting from the current way papi sets a cpu number
> (API call) to the new approach (event mask) is worthwhile. The other
> advantage of using an event mask to specify the core number is that it
> automatically extends to all existing papi applications the ability to use
> uncore events without having to change a single line of code in any of
> them. Papi also has a couple of other cases where moving an event
> attribute's scope from the event set to an individual event would be
> desirable.
>
>
>
> So the question is really about the best way to extend the set of
> supported event masks to include some that contain information to be
> processed by papi. It is true that this could be done without making
> changes to libpfm4 but it is my belief that doing so would make the papi
> code much more complicated and one of them (adding new event string
> delimiters) may even introduce problems in existing papi applications.
>
>
>
> So it was my feeling that if libpfm4 could be enhanced to provide an
> additional interface, it would make this much easier to handle in papi.
> One of my requirements for this change was that all existing calls to
> libpfm4 must continue to work exactly as they already do. I did not want to
> have any adverse impact on any other possible users of libpfm4. But I also
> feel that if there is something libpfm4 can do to help papi evolve into a
> better product, it is reasonable to modify libpfm4 (as long as it does not
> break any existing libpfm4 features).
>
>
>
> I implemented the new interface I had in mind and changed the papi uncore
> component to use it. When I got it working, it provided what I hoped it
> would. An unchanged existing papi application (papi_command_line) was able
> to use uncore events and count events on multiple packages at the same
> time. Furthermore the core component in papi still uses the old interface
> to libpfm4 and it also still works correctly.
>
>
>
> This implementation makes fairly small changes in both libpfm4 and papi to
> make this work. Since all existing interfaces in libpfm4 were preserved, I
> do not see any risk to libpfm4.
>
>
>
> Phil’s response to your message below gave a pretty good overview of what
> the changes did. I hope that my explanations above help you to understand
> why I chose to take this approach.
>
>
>
> I am willing to go back and do it a different way if you are not happy
> with these changes. But I still feel this is the best solution and would
> like you to understand my thought process and the changes before making
> that choice. So if you have any more questions with the approach or the
> code, I will be happy to answer them.
>
>
>
> Thanks again for your time.
>
> Gary
>
>
>
>
>
>
>
> *From:* Stephane Eranian [mailto:eran...@googlemail.com]
> *Sent:* Thursday, April 17, 2014 8:00 AM
> *To:* Gary Mohr
> *Cc:* perfmon2-devel
>
> *Subject:* Re: [perfmon2] FW: Proposed enhancement to libpfm4.
>
>
>
> Gary,
>
>
>
>
>
> I am trying to understand the underpinning here. You are saying there is
> no way to pass a CPU to the PAPI call to pin an uncore event to a
> particular socket.
>
>
>
> First, uncore events are system-wide only events. This is why you need to
> pass a CPU number (as a substitute for a socket number). Second, the
> kernel always exports a list of CPUs to monitor for each uncore PMU. It is
> located in /sys/devices/uncore_xxx/cpumask.
>
>
>
> I don't really like the libpfm4 changes you are proposing. They do not
> make sense to me because you are trying to work around a limitation of
> PAPI by modifying libpfm4.
>
>
>
> My understanding is that PAPI is not designed to handle system-wide
> events. System-wide events require a CPU number. So why not extend PAPI to
> handle this instead so it would work with or without libpfm4? I understand
> it would break existing tools, but then those tools are not ready to cope
> with CPU or socket-level measurements, maybe.
>
>
>
>
>
> On Thu, Apr 10, 2014 at 5:40 PM, Gary Mohr <gary.m...@bull.com> wrote:
>
> Also send this to the perfmon mailing list.
>
>
>
> *From:* Gary Mohr
> *Sent:* Wednesday, April 09, 2014 4:56 PM
> *To:* Stephane Eranian
> *Cc:* Vince Weaver; Philip Mucci; Heike McCraw; Michel Brown
> *Subject:* Proposed enhancement to libpfm4.
>
>
>
> Hi Stephane,
>
>
>
> There has been quite a bit of discussion in the PAPI community lately
> regarding ways to make the PAPI uncore component useful to existing PAPI
> applications.
>
>
>
> A short description of the problem:
>
>
>
> The kernel requires a cpu number to be provided on the open when setting
> up to count uncore events (used by kernel to pick the package/socket to
> count).
>
> PAPI currently provides a way to set a cpu number but it requires a call
> to PAPI_set_opt which existing papi applications that are currently used
> with core events almost never use.
>
> This means that existing PAPI applications cannot use uncore events
> without coding changes.
>
>
>
> Possible solution:
>
>
>
> Change the uncore event string to include information to specify the core
> number that should be passed to the kernel for this event.
>
> PAPI applications normally get the event to use from a user or config
> file, so they would have access to uncore events if the user just adds a
> little extra information to the event string.
>
>
>
> Two approaches were considered:
>
>
>
> 1 -- The event name could be extended to include a package component.
> This would result in the event names being replicated once for each package
> on the system.
>
> 2 -- A new event mask could be added to provide the number of the core
> which should be passed to the kernel for the event.
>
>
>
> Since the SNBEP system already has 315 uncore events, replicating them for
> each package could lead to over 1200 different event names. The current
> list output for uncore events on this system produces 6,000+ lines of
> output. Replicating each event could drive that to about 24,000 lines of
> output. This makes the first approach less than desirable.
>
>
>
> A new mask for the uncore events could be added to identify which core
> number should be passed to the kernel. But this information is needed by
> PAPI and does not end up in the attribute structure built by libpfm4 and
> passed to the kernel by PAPI. This means that we would be introducing a
> mask that should be processed by PAPI and not libpfm4. The new mask
> approach would have no effect on the number of events and little or no
> effect on the list output. So it seemed to be the preferred approach.
>
>
>
> In addition during these discussions, it was felt that a small number of
> other PAPI attributes could also be handled with PAPI specific event masks
> rather than through independent API calls (as is required today). This
> encouraged looking for a general solution.
>
>
>
> Two different approaches for adding a mask have been considered:
>
>
>
> 1 -- Modify PAPI to prescan the event strings to remove and process the
> new mask.
>
> 2 -- Enhance libpfm4 to allow event strings which contain masks it does
> not know about.
>
>
>
> The first approach probably can be done but there is some concern that if
> PAPI prescreens and removes some of the event masks, it may remove masks
> that would have been meaningful to libpfm4. This would be undesirable but
> could be avoided with careful PAPI mask names.
>
>
>
> The idea behind the second approach is to add a feature to libpfm4 which
> would allow PAPI to pass an event string which contains some masks which
> libpfm4 may not understand. When this is done, libpfm4 would be able to
> return a table to the caller which contains the masks libpfm4 did not
> recognize. When using this new feature, libpfm4 would not consider an
> unknown mask as an error. It would just return unprocessed masks to the
> caller and let the caller decide if those masks were valid. This provides
> PAPI with an easy way to extend the set of event masks an application can
> use. Of course when this new feature is not being used, libpfm4 would
> continue to behave exactly as it has in the past.
>
>
>
> I spent some time adding this feature to libpfm4 and now have it working.
> The end result is that I can now use papi_command_line to count uncore
> events without any changes to the application.
>
>
>
> A high level summary of what I did to libpfm4:
>
>
>
> I created two new libpfm4 functions which provide the same service as two
> existing functions but accept an additional calling argument. The
> additional calling argument is a pointer to a table where libpfm4 can store
> any unprocessed masks. The new functions are pfm_find_event_mask and
> pfm_get_os_event_encoding_mask. The current function names also still
> exist and just call the new functions passing a NULL pointer for the
> unprocessed masks table. Then the code in these new functions was changed
> to handle the case where it finds an unrecognized mask so it now behaves as
> described above.
>
>
>
> Attached you will find a patch file that contains the libpfm4 changes that
> I made (code is always more interesting than descriptions).
>
>
>
> I am hoping to persuade you that this code is worth putting into libpfm4
> but in either case, I am interested in your views on the topic.
>
>
>
> There are still a few things in these patches that I think should be
> changed to make it more robust but if you are in agreement with this
> approach, I will gladly adjust it to meet expectations.
>
>
>
> I hope I did not bore you too much with details but I thought some of the
> background to explain why something in this area is needed was important.
>
>
>
> Thanks
>
> Gary
>
>
>
>
>
>
>
> _______________________________________________
> perfmon2-devel mailing list
> perfmon2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>
>
>
From 31cacd606cd933aa97bb728db87efdd70447b78c Mon Sep 17 00:00:00 2001
From: Stephane Eranian <eran...@gmail.com>
Date: Fri, 18 Apr 2014 14:11:36 +0200
Subject: [PATCH] activate perf_event_ext cpu= modifier
Enable cpu=X perf_events modifier on events.
The cpu number is returned in the pfm_perf_encode_arg.cpu
field. If modifier not set in event string, then field has
value -1.
Signed-off-by: Stephane Eranian <eran...@gmail.com>
---
include/perfmon/pfmlib_perf_event.h | 2 +-
lib/pfmlib_perf_event.c | 11 +++++++++++
lib/pfmlib_perf_event_priv.h | 2 ++
perf_examples/perf_util.c | 3 ++-
perf_examples/perf_util.h | 1 +
5 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/include/perfmon/pfmlib_perf_event.h b/include/perfmon/pfmlib_perf_event.h
index d4620f2..8b3dae2 100644
--- a/include/perfmon/pfmlib_perf_event.h
+++ b/include/perfmon/pfmlib_perf_event.h
@@ -37,7 +37,7 @@ typedef struct {
char **fstr; /* out/in: fully qualified event string */
size_t size; /* sizeof struct */
int idx; /* out: opaque event identifier */
- int cpu; /* out: cpu to program */
+ int cpu; /* out: cpu to program, -1 = not set */
int flags; /* out: perf_event_open() flags */
int pad0; /* explicit 64-bit mode padding */
} pfm_perf_encode_arg_t;
diff --git a/lib/pfmlib_perf_event.c b/lib/pfmlib_perf_event.c
index 4458d98..dbc5dd9 100644
--- a/lib/pfmlib_perf_event.c
+++ b/lib/pfmlib_perf_event.c
@@ -25,6 +25,7 @@
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
+#include <limits.h>
#include <perfmon/pfmlib_perf_event.h>
#include "pfmlib_priv.h"
@@ -67,6 +68,7 @@ static const pfmlib_attr_desc_t perf_event_ext_mods[]={
PFM_ATTR_B("excl", "exclusive access"), /* exclusive PMU access */
PFM_ATTR_B("mg", "monitor guest execution"), /* monitor guest level */
PFM_ATTR_B("mh", "monitor host execution"), /* monitor host level */
+ PFM_ATTR_I("cpu", "CPU to program"), /* CPU to program */
PFM_ATTR_NULL /* end-marker to avoid exporting number of entries */
};
@@ -84,6 +86,7 @@ pfmlib_perf_event_encode(void *this, const char *str, int dfl_plm, void *data)
uint64_t ival;
int has_plm = 0, has_vmx_plm = 0;
int i, plm = 0, ret, vmx_plm = 0;
+ int cpu = -1;
sz = pfmlib_check_struct(uarg, uarg->size, PFM_PERF_ENCODE_ABI0, sz);
if (!sz)
@@ -203,6 +206,11 @@ pfmlib_perf_event_encode(void *this, const char *str, int dfl_plm, void *data)
vmx_plm |= PFM_PLM0;
has_vmx_plm = 1;
break;
+ case PERF_ATTR_CPU:
+ if (ival >= INT_MAX)
+ return PFM_ERR_ATTR_VAL;
+ cpu = (int)ival;
+ break;
}
}
/*
@@ -251,6 +259,9 @@ pfmlib_perf_event_encode(void *this, const char *str, int dfl_plm, void *data)
*/
arg.idx = pfmlib_pidx2idx(e.pmu, e.event);
+ /* propagate cpu */
+ arg.cpu = cpu;
+
/* propagate our changes, that overwrites attr->size */
memcpy(uarg->attr, attr, asz);
diff --git a/lib/pfmlib_perf_event_priv.h b/lib/pfmlib_perf_event_priv.h
index 0063c77..ee2afc7 100644
--- a/lib/pfmlib_perf_event_priv.h
+++ b/lib/pfmlib_perf_event_priv.h
@@ -35,6 +35,7 @@
#define PERF_ATTR_EX 6 /* exclusive event */
#define PERF_ATTR_MG 7 /* monitor guest execution */
#define PERF_ATTR_MH 8 /* monitor host execution */
+#define PERF_ATTR_CPU 9 /* CPU to program */
#define _PERF_ATTR_U (1 << PERF_ATTR_U)
#define _PERF_ATTR_K (1 << PERF_ATTR_K)
@@ -45,6 +46,7 @@
#define _PERF_ATTR_EX (1 << PERF_ATTR_EX)
#define _PERF_ATTR_MG (1 << PERF_ATTR_MG)
#define _PERF_ATTR_MH (1 << PERF_ATTR_MH)
+#define _PERF_ATTR_CPU (1 << PERF_ATTR_CPU)
#define PERF_PLM_ALL (PFM_PLM0|PFM_PLM3|PFM_PLMH)
diff --git a/perf_examples/perf_util.c b/perf_examples/perf_util.c
index a5635d5..5a5d761 100644
--- a/perf_examples/perf_util.c
+++ b/perf_examples/perf_util.c
@@ -83,7 +83,7 @@ perf_setup_argv_events(const char **argv, perf_event_desc_t **fds, int *num_fds)
}
/* ABI compatibility, set before calling libpfm */
fd[num].hw.size = sizeof(fd[num].hw);
-
+
memset(&arg, 0, sizeof(arg));
arg.attr = &fd[num].hw;
arg.fstr = &fd[num].fstr; /* fd[].fstr is NULL */
@@ -97,6 +97,7 @@ perf_setup_argv_events(const char **argv, perf_event_desc_t **fds, int *num_fds)
fd[num].name = strdup(*argv);
fd[num].group_leader = group_leader;
fd[num].idx = arg.idx;
+ fd[num].cpu = arg.cpu;
num++;
argv++;
diff --git a/perf_examples/perf_util.h b/perf_examples/perf_util.h
index 04704bd..4571500 100644
--- a/perf_examples/perf_util.h
+++ b/perf_examples/perf_util.h
@@ -41,6 +41,7 @@ typedef struct {
int fd;
int max_fds;
int idx; /* opaque libpfm event identifier */
+ int cpu; /* cpu to program */
char *fstr; /* fstr from library, must be freed */
} perf_event_desc_t;
--
1.7.9.5