Ken,

On Tue, Oct 20, 2009 at 4:06 PM, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
> Hello,
>
> We got helpful replies on related problems a few months ago,
> so I hope someone here is able to help us out with this mystery too.
>
> We are comparing micro-operation counts between three generations
> of Intel processors, and are unable to make sense of them.
> I've attached a barplot graph to this mail, showing micro-ops per
> instruction rates for SPEC CPU2006 on the three different processor
> generations.
>
> The counts were obtained using the perfex tool that comes
> with the perfctr kernel patch, and using the following events:
> *) Intel Pentium 4: uops_retired (perfex -e )
> *) Intel Core 2: UOPS_RETIRED.ANY (perfex -e 0x410FC2)
> *) Intel Core i7: UOPS_RETIRED.ANY (perfex -e 0x4101C2)
>
> The thing we are unable to explain is that the micro-ops per instruction
> rate rises significantly when comparing Core i7 (Nehalem architecture)
> to Core 2 (Core architecture), even though micro-op fusion is reported
> to be improved in the more recent Core i7 processors.
>
For Nehalem, things are a bit more complicated. Here is
what the documentation says:

Event C2H, Umask 01H: UOPS_RETIRED.ANY
Counts the number of micro-ops retired (macro-fused=1,
micro-fused=2, others=1; maximum count of 8 per cycle).
Most instructions are composed of one or two micro-ops.
Some instructions are decoded into longer sequences, such as
repeat instructions, floating point transcendental
instructions, and assists.

Since this event counts each micro-fused uop as 2, you need to
subtract the number of micro-fused uops to get the fused count.
I think there is another event for this.
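
As a rough sketch of that correction (not a verified recipe): the
function below subtracts a micro-fused count from UOPS_RETIRED.ANY
before dividing by the instruction count. The function name, its
parameters, and the sample numbers are all made up for illustration,
and it assumes you have a separate counter that reports the number of
micro-fused uops.

```python
def uops_per_instruction(uops_retired_any, instructions, micro_fused=0):
    """Return uops per instruction after removing the extra count
    that each micro-fused uop contributes.

    Assumption: every micro-fused uop was counted as 2, so subtracting
    the micro-fused count once leaves it counted as 1.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    if micro_fused < 0 or micro_fused > uops_retired_any:
        raise ValueError("micro_fused must be between 0 and uops_retired_any")
    return (uops_retired_any - micro_fused) / instructions


# Made-up counts, for illustration only (not measured data).
raw_rate = uops_per_instruction(1300, 1000)
adjusted_rate = uops_per_instruction(1300, 1000, micro_fused=200)
print(f"raw: {raw_rate:.2f}  adjusted: {adjusted_rate:.2f}")
```

If the adjusted rate on Core i7 lands close to the Core 2 rate, the
difference is most likely a counting convention rather than a change
in the decoded uop stream.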

That said, do you have a micro-benchmark that demonstrates
this behavior? That would help figure out what is going on.

I am not aware of an erratum for this event on Nehalem.
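
Before suspecting the hardware, it may be worth double-checking that
the raw codes select what you intend. Here is a small sketch that
splits the two raw values from your mail into fields, assuming perfex
passes them through unchanged as an event-select register value in the
architectural layout from the Intel manuals (event select in bits 0-7,
unit mask in bits 8-15, USR in bit 16, OS in bit 17, EN in bit 22,
CMASK in bits 24-31). The helper name decode_evtsel is my own.

```python
def decode_evtsel(raw):
    """Split a raw event-select value into its main fields.

    Assumes the architectural IA32_PERFEVTSELx bit layout.
    """
    return {
        "event": raw & 0xFF,               # bits 0-7: event select
        "umask": (raw >> 8) & 0xFF,        # bits 8-15: unit mask
        "usr": bool(raw & (1 << 16)),      # count in user mode
        "os": bool(raw & (1 << 17)),       # count in kernel mode
        "enable": bool(raw & (1 << 22)),   # counter enabled
        "cmask": (raw >> 24) & 0xFF,       # bits 24-31: counter mask
    }


# The two raw codes quoted in the original mail.
for cpu, raw in (("Core 2", 0x410FC2), ("Core i7", 0x4101C2)):
    f = decode_evtsel(raw)
    print(f"{cpu}: event=0x{f['event']:02X} umask=0x{f['umask']:02X} "
          f"usr={f['usr']} os={f['os']} enable={f['enable']}")
```

Under that layout both codes select event 0xC2 with user-mode counting
enabled and kernel-mode counting off. Only the unit mask differs: 0x0F
on Core 2 and 0x01 on Core i7, the latter matching the C2H/01H entry
quoted above.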

> Same goes for the significant drop when comparing Pentium 4 to Core 2:
> the uops/instr. rate drops from 1.41 on average on Pentium 4 to just 1.08
> on Core 2 (Core i7: 1.32 on average).
>
> This suggests that something might be wrong with some of the micro-ops
> counts we are getting through these specific events, most probably
> with the Core 2 counts being too low.
>
> I'd like to stress that the instruction counts are not to blame here; these
> match within a 1% range across the three different processor generations.
> Of course we're using the exact same binaries on all three systems.
>
> Has anyone noticed similar issues when comparing micro-op counts
> across processor families?
> Is anyone aware of bugs in the performance counters that might explain
> these numbers?
> Is there any way we can get some feedback from Intel, to make sure
> we are doing everything correctly, and that maybe we're stumbling into
> a known processor bug or some such?
>
> greetings,
>
> Kenneth
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry(R) Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay
> ahead of the curve. Join us from November 9 - 12, 2009. Register now!
> http://p.sf.net/sfu/devconference
> _______________________________________________
> perfmon2-devel mailing list
> perfmon2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
>
>

