On Oct 21, 2009, at 1:56 PM, stephane eranian wrote:

> On Wed, Oct 21, 2009 at 9:46 AM, Kenneth Hoste  
> <kenneth.ho...@ugent.be> wrote:
>> Hi Stephane,
>>
>> Thanks for your quick reply. Some clarification and further questions
>> below...
>>
>> On Oct 21, 2009, at 8:41 AM, stephane eranian wrote:
>>
>>> Ken,
>>>
>>> On Tue, Oct 20, 2009 at 4:06 PM, Kenneth Hoste
>>> <kenneth.ho...@ugent.be> wrote:
>>>>
>>>> The thing we are unable to explain is that the micro-ops per instruction
>>>> rate rises significantly when going from Core 2 (Core architecture) to
>>>> Core i7 (Nehalem architecture), even though micro-op fusion is reported
>>>> to be improved in the more recent Core i7 processors.
>>>>
>>> For Nehalem, things are a bit more complicated. Here is
>>> what the documentation says:
>>>
>>> Event C2H, umask 01H: UOPS_RETIRED.ANY
>>> Counts the number of micro-ops retired (macro-fused=1,
>>> micro-fused=2, others=1; maximum count of 8 per cycle).
>>> Most instructions are composed of one or two micro-ops.
>>> Some instructions are decoded into longer sequences, such as
>>> repeat instructions, floating point transcendental
>>> instructions, and assists.
>>>
>>> You need to subtract the number of uops micro-fused. I think
>>> there is another event for this.
>>
>> It's unclear to me why we would need to subtract the number of uops
>> micro-fused...
>> Could you elaborate on this?
>
> My mistake, I think the count is correct in that it gives you the number
> of uops that would have retired without fusion. 2 fused micro-ops =
> increment of 2.
>
> As for Core 2, there are some errata but that's for instructions retired.
> Did you try breaking down uops_retired:any into its various components
> to see how they add up?

We think you're mentioning something important here...

It seems Core 2 might be counting uops *after* fusion, i.e. counting
multiple uops that got fused as one single uop.

The Intel documentation for Core i7 suggests that this is not the case
on Nehalem, i.e. uops fused through micro-fusion are counted as 2 uops
(while macro-fused uops are counted as 1), like you mentioned above.
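
To make the difference concrete, here is a small sketch of the two counting
conventions. The Core i7 increments are the ones quoted from the event
description above; the Core 2 table is only our hypothesis (counting after
fusion), not something we found documented. The names and the example stream
are made up for illustration.

```python
# Increment per retired uop kind, as quoted for UOPS_RETIRED.ANY on Core i7
# (macro-fused=1, micro-fused=2, others=1).
NEHALEM_INCREMENT = {"macro_fused": 1, "micro_fused": 2, "other": 1}

# Assumption: Core 2 counts *after* fusion, so any fused uop counts once.
# This is the hypothesis from this thread, not a documented fact.
CORE2_HYPOTHETICAL_INCREMENT = {"macro_fused": 1, "micro_fused": 1, "other": 1}


def count_retired(stream, increment):
    """Sum the counter increments for a sequence of retired uop kinds."""
    return sum(increment[kind] for kind in stream)


# Made-up stream: 6 plain uops, 3 micro-fused uops, 1 macro-fused uop.
stream = ["other"] * 6 + ["micro_fused"] * 3 + ["macro_fused"] * 1

nehalem_count = count_retired(stream, NEHALEM_INCREMENT)
core2_count = count_retired(stream, CORE2_HYPOTHETICAL_INCREMENT)
print(nehalem_count, core2_count)
```

Under these assumptions the same retired stream yields a higher count on
Core i7 than on Core 2, which would match the direction of what we observe.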

We missed this point somehow in your first reply, sorry about that.

Thus, in order to obtain sensible numbers, it does indeed seem that
we need to figure out the number of micro-fused uops in Core i7, and
subtract that from the UOPS_RETIRED.ANY count we have now...
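
A minimal sketch of that correction, assuming we could somehow obtain the
number of retired micro-fused uops. `micro_fused_retired` is a hypothetical
input here, since we have not found an event that provides it; the function
names are ours as well.

```python
def fused_domain_uops(uops_retired_any, micro_fused_retired):
    """Recount so that each micro-fused uop contributes 1 instead of 2.

    uops_retired_any:    raw UOPS_RETIRED.ANY count (micro-fused counted as 2)
    micro_fused_retired: hypothetical count of retired micro-fused uops
    """
    if micro_fused_retired < 0 or 2 * micro_fused_retired > uops_retired_any:
        raise ValueError("micro-fused count is inconsistent with the total")
    # Each micro-fused uop was counted twice, so remove one count per uop.
    return uops_retired_any - micro_fused_retired


def uops_per_instruction(uops, instructions_retired):
    """Ratio of retired uops to retired instructions."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return uops / instructions_retired


# Made-up numbers, for illustration only.
corrected = fused_domain_uops(uops_retired_any=13, micro_fused_retired=3)
print(corrected, uops_per_instruction(corrected, 8))
```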

However, there seems to be no event for directly counting retired micro-fused
uops (while there is one for macro-fused uops, UOPS_RETIRED.MACRO_FUSED).

Are we missing something? Should we be able to get counts for retired
micro-fused uops? If so, which event are you referring to? If not, any ideas
on how we can obtain uops_retired counts on Core i7 where micro-fused
uops are counted as 1?

Thanks for your help,

Kenneth

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel