Ken,

Sorry for the late reply.

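Thanks for this valuable information. This looks like a decent
approach. However, it seems you may be factoring in uops issued on the
wrong speculative path: the UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY ratio
is measured over all issued uops, including those that are squashed and
never retire. I am guessing that the impact of this depends on the
workload.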
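To illustrate the wrong-path concern, here is a minimal Python sketch of
your estimate on made-up counts (not measurements). It assumes that
UOPS_ISSUED.ANY counts a micro-fused uop as 2, just like UOPS_RETIRED.ANY,
that UOPS_ISSUED.FUSED counts each micro-fused uop once, and that
wrong-path uops are issued but never retire. The UopMix class, the
simulate() helper and all the numbers are hypothetical.

```python
# Toy model of the Core i7 retired-uop estimate from this thread.
# Hypothetical assumptions, for illustration only (not measured behavior):
#   * UOPS_RETIRED.ANY and UOPS_ISSUED.ANY count a micro-fused uop as 2
#     and a macro-fused uop as 1.
#   * UOPS_ISSUED.FUSED counts each micro-fused uop once.
#   * Wrong-path uops are issued but never retire.
from dataclasses import dataclass


@dataclass
class UopMix:
    """Hypothetical uop breakdown for one code region."""
    normal: int  # non-fused uops
    macro: int   # macro-fused uops
    micro: int   # micro-fused uops

    def any_count(self) -> int:
        # What a *.ANY event would report under the assumed counting rule.
        return self.normal + self.macro + 2 * self.micro

    def true_count(self) -> int:
        # The metric we want: every uop type counted once.
        return self.normal + self.macro + self.micro


def estimate_retired(retired_any: int, issued_any: int, issued_fused: int) -> float:
    """UOPS_RETIRED.ANY * (1 - UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY),
    rearranged as ANY * (ISSUED.ANY - FUSED) / ISSUED.ANY to limit rounding."""
    if issued_any <= 0:
        raise ValueError("UOPS_ISSUED.ANY must be positive")
    return retired_any * (issued_any - issued_fused) / issued_any


def simulate(retired: UopMix, wrong_path: UopMix) -> tuple[float, int]:
    """Return (estimate, exact count) for retired uops plus squashed wrong-path uops."""
    retired_any = retired.any_count()
    issued_any = retired_any + wrong_path.any_count()  # issue sees both paths
    issued_fused = retired.micro + wrong_path.micro
    return estimate_retired(retired_any, issued_any, issued_fused), retired.true_count()


if __name__ == "__main__":
    good = UopMix(normal=600, macro=100, micro=150)
    print(simulate(good, UopMix(0, 0, 0)))                       # prints (850.0, 850)
    print(simulate(good, UopMix(normal=200, macro=0, micro=0)))  # prints (875.0, 850)
```

With no wrong-path uops the estimate is exact. With 200 unfused
wrong-path uops the fused share drops from 15% to 12.5%, so the estimate
overshoots by 25. A wrong path richer in micro-fused uops would push it
the other way, which is why I expect the impact to depend on the workload.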



On Wed, Nov 4, 2009 at 6:13 PM, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
> Hi Stephane,
>
> On Oct 23, 2009, at 2:18 PM, stephane eranian wrote:
>
>> Ken,
>>
>> On Fri, Oct 23, 2009 at 1:41 PM, Kenneth Hoste <kenneth.ho...@ugent.be>
>> wrote:
>>>
>>> The Intel documentation for Core i7 suggests that this is not the case
>>> on Nehalem, i.e. uops fused through micro-fusion are counted as 2 uops
>>> (while macro-fused uops are counted as 1), like you mentioned above.
>>>
>>> We missed this point somehow in your first reply, sorry about that.
>>>
>>> Thus, in order to obtain sensible numbers, it does indeed seem that
>>> we need to figure out the number of micro-fused uops in Core i7, and
>>> subtract that from the UOPS_RETIRED.ANY count we have now...
>>>
>>> It seems like there is no event for directly counting retired
>>> micro-fused uops, however (while there is one for
>>> UOPS_RETIRED.MACRO_FUSED).
>>>
>>
>>> Are we missing something? Should we be able to get counts for retired
>>> micro-fused uops? If so, which event are you referring to? If not, any
>>> ideas on how we can obtain uops_retired counts on Core i7 where
>>> micro-fused uops are counted as 1?
>>>
>> It does not seem possible because you have 2 unknowns in the equation:
>>
>> uops_retired.any = uops_retired.normal + uops_retired.macro +
>> 2*uops_retired.micro
>>
>> You are missing micro and normal (macro is available from
>> UOPS_RETIRED.MACRO_FUSED).
>> I thought you could maybe do it by comparing what goes in with what
>> comes out, but that would only give you the wasted uops (on the wrong
>> speculative path).
>>
>> I will look into that some more.
>> There is some information of interest in the Intel Optimization guide
>> (Appendix B2).
>>
>> http://www.intel.com/Assets/PDF/manual/248966.pdf
>
> We ended up using a pragmatic approach in order to obtain
> sensible retired uop counts on Core i7, which are comparable
> to Core 2 retired uop counts.
>
> As stated above, the problem is that micro-fused uops are counted as 2 in
> the UOPS_RETIRED.ANY event (as opposed to the Core 2 event), and that
> there is no way to obtain the retired micro-fused uops count separately.
>
> We ended up estimating the number of micro-fused uops by counting the
> number of issued uops, using UOPS_ISSUED.ANY and UOPS_ISSUED.FUSED.
> We found that UOPS_ISSUED.FUSED counts only micro-fused uops, each
> counted once.
>
> That way, we can estimate the number of micro-fused uops in
> UOPS_RETIRED.ANY from the UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY ratio,
> and correct UOPS_RETIRED.ANY accordingly.
>
> In short, we can use the following as an estimate for retired uops where
> each type of uop (non-fused, macro-fused, micro-fused) is counted as one:
>
> UOPS_RETIRED.ANY * (1 - UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY)
>
>
> I hope that makes sense....
>
> greetings,
>
> Kenneth

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel