Kenneth,

Let me check on this with Intel. Based on the existing documentation, there
does not appear to be a bug in either the Core 2 or the Core i7 event tables.

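As a quick sanity check on the raw codes used in the perfex commands quoted below, here is a minimal Python sketch that splits each value into its fields. It assumes perfex passes the value through as an IA32_PERFEVTSELx image (bits 0-7 event select, bits 8-15 unit mask, bit 16 USR, bit 17 OS, bit 22 EN); the function name decode_evtsel is purely illustrative and is not part of perfctr or pfmon.

```python
def decode_evtsel(raw):
    """Split a raw IA32_PERFEVTSELx-style value into its main fields.

    Assumed layout: bits 0-7 event select, bits 8-15 unit mask,
    bit 16 USR, bit 17 OS, bit 22 EN.
    """
    return {
        "event": raw & 0xFF,
        "umask": (raw >> 8) & 0xFF,
        "usr": bool(raw & (1 << 16)),
        "os": bool(raw & (1 << 17)),
        "enable": bool(raw & (1 << 22)),
    }


# The two raw codes from the perfex command lines in the quoted message.
for name, raw in (("Core 2", 0x410081), ("Core i7", 0x410280)):
    f = decode_evtsel(raw)
    print(f"{name}: event=0x{f['event']:02x} umask=0x{f['umask']:02x} "
          f"usr={f['usr']} os={f['os']} enable={f['enable']}")
```

Under that assumed layout, both codes select user-mode counting only, with event 81h and no unit mask on the Core 2 and event 80h with mask 02h on the Core i7, which matches the description in the message.
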
On Tue, Jun 30, 2009 at 2:45 PM, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
> Hello,
>
> Just now, Stijn (in CC), a colleague of mine, and I have been seeing
> some weird counts on a Core i7 machine for the SPEC CPU2000 and CPU2006
> workloads, more specifically for the L1 instruction cache misses.
>
> Comparing the counts on Core i7 with those obtained on a Core 2, Stijn
> noticed unexpected differences, i.e. large overcounts for the Core i7.
> This is strange, because the L1 instruction caches on both types of
> processors are equally big (32k), and the more recent Core i7 has
> additional features such as a victim cache and a stream buffer cache.
> So, the counts should be (slightly?) lower instead of higher...
>
> I'm using the perfex tool that comes with the perfctr kernel patch on
> both systems, and also the pfmon tool on the Core i7 system to validate
> the counts. On the Core 2, I'm using the L1I_MISSES event (event code
> 81h), on the Core i7 I'm using the L1I.MISSES event (event code 80h with
> mask 02h).
> More specifically:
>
> *) Core 2:
>
> perfex -e 0x410081 ./gcc 200.i -o 200.s
>
> *) Core i7:
>
> perfex -e 0x410280 ./gcc 200.i -o 200.s
> and
> pfmon -e L1I:MISSES ./gcc 200.i -o 200.s
>
> One example is CPU2000's gcc with the 200.i reference input.
>
> On the Core 2 we counted ~76M (million) L1-I misses. Also counting
> the cycles during which the instruction decoder is stalled due to the
> misses leads to an estimate of roughly 19 cycles of penalty for each
> L1-I miss, which makes perfect sense, because the latency of the L2
> cache is about 19 cycles.
>
> On the Core i7 system we counted ~292M L1-I misses, thus a lot more
> than on the Core 2 system with the same L1-I cache size. Also counting
> cycles during which the decoder is stalled yields a penalty of ~2.1
> cycles/miss, a surprisingly low number because the L2 cache latency is
> significantly higher.
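
To make the arithmetic above explicit, here is a small Python sketch that uses only the approximate figures quoted in this message. The names are illustrative, and the stall-cycle totals are back-computed as misses times penalty, not measured values.

```python
def stall_cycles(misses, penalty_per_miss):
    """Back-compute total decoder stall cycles from the per-miss estimate."""
    return misses * penalty_per_miss


# Approximate figures quoted above for CPU2000 gcc.
core2 = {"misses": 76e6, "penalty": 19.0}
corei7 = {"misses": 292e6, "penalty": 2.1}

core2_stalls = stall_cycles(core2["misses"], core2["penalty"])
corei7_stalls = stall_cycles(corei7["misses"], corei7["penalty"])
miss_ratio = corei7["misses"] / core2["misses"]

print(f"Core 2  implied stall cycles: {core2_stalls:.3e}")
print(f"Core i7 implied stall cycles: {corei7_stalls:.3e}")
print(f"Reported miss ratio (Core i7 / Core 2): {miss_ratio:.2f}")
```

If these numbers are taken at face value, the Core i7 reports several times as many misses as the Core 2 but fewer implied stall cycles in total, which is another way of stating why the ~2.1 cycles/miss figure looks too low for real L1-I misses.
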
>
> So, our conclusion is that the L1-I misses event on the Core i7 isn't
> counting what is claimed. The documentation says that the L1I.MISSES
> event also includes streaming buffer and victim cache misses, but to our
> knowledge those are only looked at if the request already misses the
> L1-I cache. And it says explicitly that every L1-I miss is only counted
> once...
>
> Does anyone have suggestions on what we might be seeing here? Is it
> a problem with the event, or are we misinterpreting what the event is
> actually counting?
>
> Any comments/suggestions are highly appreciated...
>
> greetings,
>
> Kenneth
> --
>
> Kenneth Hoste
> Paris research group - ELIS - Ghent University, Belgium
> email: kenneth.ho...@elis.ugent.be
> website: http://www.elis.ugent.be/~kehoste
> blog: http://boegel.kejo.be
>
> ------------------------------------------------------------------------------
> _______________________________________________
> perfmon2-devel mailing list
> perfmon2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel