On Tue, 9 Sep 2014, Gary Mohr wrote:
> --- ls output removed ---
>
> Performance counter stats for '/bin/ls':
>
> 5,625 uncore_cbox_0/event=0x35,umask=0xa/                    [26.27%]
> <not supported> uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/
>
> 0.002038929 seconds time elapsed
>
>
> So this behaved similarly to PAPI/libpfm4. The first event returned a count
> and the second event got an error.
> Just for fun, I used the same events in the opposite order:
>
>
> perf stat -a -e
> \{"uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/","uncore_cbox_0/event=0x35,umask=0xa/"\}
> /bin/ls
>
> --- ls output removed ---
>
> Performance counter stats for '/bin/ls':
>
> <not counted> uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/
> <not supported> uncore_cbox_0/event=0x35,umask=0xa/
>
> 0.002003219 seconds time elapsed
>
>
> This caused both events to report an error. This seems to me like a kernel
> problem. I also tried using each event by itself and they both returned
> counts. With PAPI/libpfm4 I believe that this test will return a count for
> the first event and an error on the second.
> You implied that the { }'s may influence whether or how events are grouped.
> So I tried the command again in the original order without the { } characters
> and got this:
>
>
> perf stat -a -e
> "uncore_cbox_0/event=0x35,umask=0xa/","uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/"
> /bin/ls
>
> --- ls output removed ---
>
> Performance counter stats for '/bin/ls':
>
> 57,288 uncore_cbox_0/event=0x35,umask=0xa/                    [18.05%]
> 158,292 uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/   [ 3.07%]
>
> 0.001963151 seconds time elapsed
>
>
> Both events give a count. I have never seen this result with PAPI/libpfm4
> but I have never tried them with grouping enabled when calling the kernel.
>
> In PAPI we turned grouping off so that the kernel would allow us to use
> events from different uncore PMUs at the same time. I can try turning it
> back on and running these two events to see what happens. If they work,
> maybe a better solution is to try a hybrid form of grouping. We could create
> a different group for each uncore PMU and put all the events associated with
> a given PMU into that PMU's group. We would then call the kernel once for
> each group rather than once for each event as we are doing now.
>
> Any idea if the kernel will let us play the game this way ??
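For concreteness, here is a rough sketch of what the per-PMU grouping described
above could look like with raw perf_event_open() calls.  It assumes the usual
Intel uncore config packing (event in config:0-7, umask in config:8-15, per the
files under /sys/bus/event_source/devices/uncore_cbox_0/format), omits the
filter_nid field, and has only minimal error handling:

/*
 * Rough sketch: one group per uncore PMU via raw perf_event_open().
 * Assumes the usual Intel uncore config packing (event in config:0-7,
 * umask in config:8-15); check the sysfs format files to be sure.
 * filter_nid (which would go into config1 per the format files) is
 * omitted and error handling is minimal.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
                           int cpu, int group_fd, unsigned long flags)
{
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Dynamic PMUs get their type number from sysfs, not from a header. */
static int read_pmu_type(const char *pmu)
{
        char path[256];
        int type = -1;
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/bus/event_source/devices/%s/type", pmu);
        f = fopen(path, "r");
        if (!f || fscanf(f, "%d", &type) != 1)
                type = -1;
        if (f)
                fclose(f);
        return type;
}

int main(void)
{
        struct perf_event_attr attr;
        uint64_t count[2];
        int type, leader, member;

        type = read_pmu_type("uncore_cbox_0");
        if (type < 0)
                return 1;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = type;
        attr.config = 0x35 | (0x0a << 8);   /* event=0x35,umask=0xa  */
        attr.disabled = 1;                  /* leader starts stopped */

        /* uncore events are system-wide: pid = -1, a specific cpu */
        leader = perf_event_open(&attr, -1, 0, -1, 0);
        if (leader < 0) { perror("leader"); return 1; }

        attr.config = 0x35 | (0x4a << 8);   /* event=0x35,umask=0x4a */
        attr.disabled = 0;                  /* member follows leader */
        member = perf_event_open(&attr, -1, 0, leader, 0);
        if (member < 0) { perror("member"); return 1; }

        ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
        /* ... run the workload of interest here ... */
        ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

        read(leader, &count[0], sizeof(count[0]));
        read(member, &count[1], sizeof(count[1]));
        printf("%llu %llu\n",
               (unsigned long long)count[0], (unsigned long long)count[1]);
        return 0;
}

The point is just that the second event is opened with group_fd set to the
leader's fd, so the whole group gets scheduled onto the cbox together; opening
system-wide events like this typically needs root or a relaxed
perf_event_paranoid setting.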
Interesting, I'll have to run some more tests on my Sandybridge-EP
machine.

What kernel are you running again?  I'm testing on a machine running 3.14,
so possibly there were scheduling bugs with older kernels that were fixed
at some point.

When running both with and without {} I get something like:

Performance counter stats for 'system wide':

      606 uncore_cbox_0/event=0x35,umask=0xa/                    [99.61%]
      247 uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/

      0.000851895 seconds time elapsed

Which makes it look like it's multiplexing the events in some sort of way
I'm not really following, maybe to avoid a scheduling issue.
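The bracketed percentages are the scaling perf stat applies when an event was
only scheduled on the hardware part of the time (time_running vs time_enabled).
The same information is visible from a raw perf_event_open() counter by setting
PERF_FORMAT_TOTAL_TIME_ENABLED/RUNNING in read_format; a small self-contained
example (using an ordinary core event just to keep it short, the same
read_format fields apply to uncore fds):

/*
 * Small illustration of the scaling information behind the bracketed
 * percentages: ask for time_enabled/time_running along with the count.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
        struct perf_event_attr attr;
        struct { uint64_t value, enabled, running; } rv;
        volatile int i;
        int fd;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HARDWARE;
        attr.config = PERF_COUNT_HW_INSTRUCTIONS;
        attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
                           PERF_FORMAT_TOTAL_TIME_RUNNING;
        attr.disabled = 1;

        fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }

        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        for (i = 0; i < 1000000; i++)   /* stand-in workload */
                ;
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        read(fd, &rv, sizeof(rv));
        printf("count=%llu  counted %.2f%% of enabled time\n",
               (unsigned long long)rv.value,
               rv.enabled ? 100.0 * rv.running / rv.enabled : 0.0);
        return 0;
}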
Vince