Re: [perfmon2] weird counts for instruction cache misses on Core i7 (vs Core 2)

stephane eranian Tue, 30 Jun 2009 22:04:09 -0700

Kenneth,

Let me check on this with Intel.
There does not appear to be a bug in either the Core 2 or the Core i7 event
tables, based on the existing documentation.



On Tue, Jun 30, 2009 at 2:45 PM, Kenneth Hoste<kenneth.ho...@ugent.be> wrote:
> Hello,
>
> Just now, Stijn (in CC), a colleague of mine, and I have been seeing some
> weird counts on a Core i7 machine for the SPEC CPU2000 and CPU2006 workloads,
> more specifically for the L1 instruction cache misses.
>
> Comparing the counts on Core i7 with those obtained on a Core 2, Stijn noticed
> unexpected differences, i.e. large overcounts for the Core i7. This is strange,
> because the L1 instruction caches on both types of processors are equally big
> (32k), and the more recent Core i7 has additional features such as a victim
> cache and a stream buffer cache.
> So, the counts should be (slightly?) lower instead of higher...
>
> I'm using the perfex tool that comes with the perfctr kernel patch on both
> systems, and also the pfmon tool on the Core i7 system to validate the counts.
> On the Core 2, I'm using the L1I_MISSES event (event code 81h); on the Core i7,
> I'm using the L1I.MISSES event (event code 80h with mask 02h).
> More specifically:
>
> *) Core 2:
>
> perfex -e 0x410081 ./gcc 200.i -o 200.s
>
> *) Core i7:
>
> perfex -e 0x410280 ./gcc 200.i -o 200.s
> and
> pfmon -e L1I:MISSES ./gcc 200.i -o 200.s
>
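For reference, a minimal sketch of how the two raw perfex values quoted above break down into fields. It assumes perfex hands the -e value to the hardware as the raw IA32_PERFEVTSELx contents (an assumption about perfex, not something stated in this thread); the bit positions follow the architectural event-select layout (event select in bits 0-7, unit mask in bits 8-15, USR in bit 16, OS in bit 17, EN in bit 22). The function name decode_evtsel is hypothetical.

```python
def decode_evtsel(value):
    """Split a raw event-select value into its architectural fields.

    Assumption: the value is laid out like IA32_PERFEVTSELx.
    """
    return {
        "event": value & 0xFF,             # bits 0-7: event select
        "umask": (value >> 8) & 0xFF,      # bits 8-15: unit mask
        "usr": bool(value & (1 << 16)),    # bit 16: count in user mode
        "os": bool(value & (1 << 17)),     # bit 17: count in kernel mode
        "enable": bool(value & (1 << 22)), # bit 22: counter enabled
    }


if __name__ == "__main__":
    # Raw values taken from the perfex command lines in the quoted message.
    for name, raw in (("Core 2", 0x410081), ("Core i7", 0x410280)):
        f = decode_evtsel(raw)
        print(f"{name}: event=0x{f['event']:02x} umask=0x{f['umask']:02x} "
              f"usr={f['usr']} os={f['os']} enable={f['enable']}")
```

Under that assumption, both command lines select user-mode-only counting with the counter enabled, and differ only in the event code and unit mask, matching the 81h and 80h/02h encodings named in the message.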
> One example is CPU2000's gcc with the 200.i reference input.
>
> On the Core 2 we counted ~76M (million) L1-I misses. Also counting the cycles
> during which the instruction decoder is stalled due to these misses leads to
> an estimate of roughly 19 cycles of penalty per L1-I miss, which makes perfect
> sense, because the latency of the L2 cache is about 19 cycles.
>
> On the Core i7 system we counted ~292M L1-I misses, thus a lot more than on
> the Core 2 system with the same L1-I cache size. Also counting the cycles
> during which the decoder is stalled yields a penalty of ~2.1 cycles/miss,
> a surprisingly low number because the L2 cache latency is significantly higher.
>
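The arithmetic behind the two per-miss penalties can be sketched as follows. The miss counts and penalties are the rounded figures from the quoted message; the stall-cycle totals are back-computed from them (misses times penalty) and are not measured values. The names Run and implied_stall_cycles are hypothetical.

```python
from collections import namedtuple

# Rounded figures quoted in the message; nothing here is newly measured.
Run = namedtuple("Run", ["name", "l1i_misses", "penalty_cycles_per_miss"])

core2 = Run("Core 2", 76_000_000, 19.0)
corei7 = Run("Core i7", 292_000_000, 2.1)


def implied_stall_cycles(run):
    """Decoder stall cycles implied by penalty = stall_cycles / misses."""
    return run.l1i_misses * run.penalty_cycles_per_miss


miss_ratio = corei7.l1i_misses / core2.l1i_misses
stall_ratio = implied_stall_cycles(corei7) / implied_stall_cycles(core2)

if __name__ == "__main__":
    for run in (core2, corei7):
        print(f"{run.name}: ~{implied_stall_cycles(run) / 1e6:.0f}M implied stall cycles")
    print(f"miss count ratio (Core i7 / Core 2):    {miss_ratio:.2f}")
    print(f"stall cycle ratio (Core i7 / Core 2):   {stall_ratio:.2f}")
```

With these rounded inputs, the Core i7 reports several times more misses while the implied stall cycles are lower than on the Core 2. That combination is what makes the reported miss count look suspicious rather than the stall count.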
> So, our conclusion is that the L1-I misses event on the Core i7 isn't counting
> what is claimed. The documentation says that the L1I.MISSES event also includes
> streaming buffer and victim cache misses, but to our knowledge those are only
> looked at if the request already misses the L1-I cache. And it says explicitly
> that every L1-I miss is only counted once...
>
> Does anyone have suggestions on what we might be seeing here? Is it
> a problem with the event, or are we misinterpreting what the event is
> actually counting?
>
> Any comments/suggestions are highly appreciated...
>
> greetings,
>
> Kenneth
> --
>
> Kenneth Hoste
> Paris research group - ELIS - Ghent University, Belgium
> email: kenneth.ho...@elis.ugent.be
> website: http://www.elis.ugent.be/~kehoste
> blog: http://boegel.kejo.be
>
>
> ------------------------------------------------------------------------------
> _______________________________________________
> perfmon2-devel mailing list
> perfmon2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>
