On Tue, 9 Sep 2014, Gary Mohr wrote:

> --- ls output removed ---
>
>  Performance counter stats for '/bin/ls':
>
>              5,625 uncore_cbox_0/event=0x35,umask=0xa/                  [26.27%]
>    <not supported> uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/
>
>        0.002038929 seconds time elapsed
>
> So this behaved similarly to PAPI/libpfm4.  The first event returned a
> count and the second event got an error.
>
> Just for fun, I used the same events in the opposite order:
>
>   perf stat -a -e \{"uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/","uncore_cbox_0/event=0x35,umask=0xa/"\} /bin/ls
>
> --- ls output removed ---
>
>  Performance counter stats for '/bin/ls':
>
>      <not counted> uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/
>    <not supported> uncore_cbox_0/event=0x35,umask=0xa/
>
>        0.002003219 seconds time elapsed
>
> This caused both events to report an error.  This seems to me like a
> kernel problem.  I also tried using each event by itself and they both
> returned counts.  With PAPI/libpfm4 I believe that this test will return
> a count for the first event and an error on the second.
>
> You implied that the { }'s may influence whether or how events are
> grouped.  So I tried the command again in the original order without the
> { } characters and got this:
>
>   perf stat -a -e "uncore_cbox_0/event=0x35,umask=0xa/","uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/" /bin/ls
>
> --- ls output removed ---
>
>  Performance counter stats for '/bin/ls':
>
>             57,288 uncore_cbox_0/event=0x35,umask=0xa/                  [18.05%]
>            158,292 uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/  [ 3.07%]
>
>        0.001963151 seconds time elapsed
>
> Both events give a count.  I have never seen this result with
> PAPI/libpfm4, but I have never tried them with grouping enabled when
> calling the kernel.
>
> In PAPI we turned grouping off so that the kernel would allow us to use
> events from different uncore PMUs at the same time.  I can try turning
> it back on and running these two events to see what happens.  If they
> work, maybe a better solution is to try a hybrid form of grouping.  We
> could create a different group for each uncore PMU and put all the
> events associated with a given PMU into that PMU's group.  We would then
> call the kernel once for each group rather than once for each event as
> we are doing now.
>
> Any idea if the kernel will let us play the game this way?
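For reference, the per-PMU grouping you describe would look roughly like the
sketch below with raw perf_event_open() calls.  This is only a sketch: the
sysfs lookup is the standard way to get a dynamic PMU's type, but the config
encodings and the choice of cpu are placeholders, not values I've verified,
and the filter_nid bits would additionally have to go into attr.config1 per
the PMU's format directory.

/*
 * Sketch: one perf_event group per uncore box.  The first event opened on
 * a box becomes the group leader; later events for the same box pass the
 * leader's fd as group_fd.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
                           int cpu, int group_fd, unsigned long flags)
{
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Dynamic PMU type, e.g. /sys/bus/event_source/devices/uncore_cbox_0/type */
static int read_pmu_type(const char *pmu_name)
{
        char path[256];
        int type = -1;
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/bus/event_source/devices/%s/type", pmu_name);
        f = fopen(path, "r");
        if (!f)
                return -1;
        if (fscanf(f, "%d", &type) != 1)
                type = -1;
        fclose(f);
        return type;
}

/* group_fd == -1 creates a new group (leader); otherwise join that group. */
static int open_uncore_event(int pmu_type, unsigned long long config,
                             int cpu, int group_fd)
{
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = pmu_type;
        attr.config = config;
        attr.disabled = (group_fd == -1);  /* only the leader starts disabled */

        /* uncore events are system-wide: pid = -1, one cpu per socket */
        return perf_event_open(&attr, -1, cpu, group_fd, 0);
}

int main(void)
{
        int type = read_pmu_type("uncore_cbox_0");
        int leader, member;

        if (type < 0) {
                fprintf(stderr, "cannot find uncore_cbox_0\n");
                return 1;
        }

        /* event=0x35,umask=0xa -> 0xa35 is a placeholder encoding */
        leader = open_uncore_event(type, 0xa35, 0, -1);
        if (leader < 0) {
                perror("perf_event_open (leader)");
                return 1;
        }
        /* second event for the same box joins the leader's group */
        member = open_uncore_event(type, 0x4a35, 0, leader);

        printf("leader fd=%d member fd=%d\n", leader, member);
        return 0;
}

As far as I know the kernel only schedules a group on a single PMU, so one
group per box is as much grouping as you can get; whether that avoids the
<not counted>/<not supported> failures above is the part that would need
testing.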
Interesting, I'll have to run some more tests on my Sandybridge-EP machine.
What kernel are you running again?  I'm testing on a machine running 3.14,
so possibly there were scheduling bugs in older kernels that have since been
fixed.

When running both with and without {} I get something like:

 Performance counter stats for 'system wide':

               606 uncore_cbox_0/event=0x35,umask=0xa/                  [99.61%]
               247 uncore_cbox_0/event=0x35,umask=0x4a,filter_nid=0x1/

       0.000851895 seconds time elapsed

That makes it look like it's multiplexing the events in some way I'm not
really following, maybe to avoid a scheduling issue.

Vince
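P.S. The bracketed percentages in that output are just the fraction of time
each event was actually on a counter once multiplexing kicks in.  A consumer
like PAPI can get the same information by requesting the enabled/running
times in read_format and scaling the raw count.  The helper below is a
hypothetical sketch, but the fields and flags are the standard ones from
perf_event_open(2):

#include <stdint.h>
#include <unistd.h>
#include <linux/perf_event.h>

/* Layout returned by read() when the event was opened with
 *   attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
 *                      PERF_FORMAT_TOTAL_TIME_RUNNING;
 * (and without PERF_FORMAT_GROUP). */
struct scaled_read {
        uint64_t value;         /* raw count */
        uint64_t time_enabled;  /* ns the event was enabled */
        uint64_t time_running;  /* ns it was actually on a counter */
};

static int read_scaled_count(int fd, double *scaled, double *ratio)
{
        struct scaled_read r;

        if (read(fd, &r, sizeof(r)) != sizeof(r))
                return -1;

        if (r.time_running == 0) {
                /* never scheduled: this is what shows up as <not counted> */
                *scaled = 0.0;
                *ratio = 0.0;
                return 0;
        }

        /* e.g. 0.2627 corresponds to the "[26.27%]" perf stat prints */
        *ratio = (double)r.time_running / (double)r.time_enabled;
        *scaled = (double)r.value *
                  (double)r.time_enabled / (double)r.time_running;
        return 0;
}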