On Nov 17, 2009, at 2:57 AM, stephane eranian wrote:

> Ken,
> 
> Sorry for late reply.
> 
> Thanks for this valuable information. This looks like a decent
> approach. However it seems you may factor in uops issued on the
> wrong speculative path. I am guessing that the impact of this
> depends on the workload.

Yes, the uops issued count does include uops on speculative paths.
That's why the ratio obtained is only an estimate of the actual ratio in the
retired uops count.
It's not unlikely that the estimate is pretty close, though...

K.

> 
> 
> 
> On Wed, Nov 4, 2009 at 6:13 PM, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
>> Hi Stephane,
>> 
>> On Oct 23, 2009, at 2:18 PM, stephane eranian wrote:
>> 
>>> Ken,
>>> 
>>> On Fri, Oct 23, 2009 at 1:41 PM, Kenneth Hoste <kenneth.ho...@ugent.be>
>>> wrote:
>>>> 
>>>> The Intel documentation for Core i7 suggests that this is not the case
>>>> on Nehalem, i.e. uops fused through micro-fusion are counted as 2 uops
>>>> (while macro-fused uops are counted as 1), like you mentioned above.
>>>> 
>>>> We missed this point somehow in your first reply, sorry about that.
>>>> 
>>>> Thus, in order to obtain sensible numbers, it does indeed seem that
>>>> we need to figure out the number of micro-fused uops in Core i7, and
>>>> subtract that from the UOPS_RETIRED.ANY count we have now...
>>>> 
>>>> It seems like there is no event for directly counting retired micro-fused
>>>> uops, however (while there is for UOPS_RETIRED.MACRO_FUSED).
>>>> 
>>> 
>>>> Are we missing something? Should we be able to get counts for retired
>>>> micro-fused uops? If so, which event are you referring to? If not, any
>>>> ideas on how we can obtain uops_retired counts on Core i7 where
>>>> micro-fused uops are counted as 1?
>>>> 
>>> It does not seem possible because you have 2 unknowns in the equation:
>>> 
>>> uops_retired.any = uops_retired.normal + uops_retired.macro +
>>> 2*uops_retired.micro.
>>> 
>>> You are missing micro and normal.
>>> I thought you could maybe do it by comparing what goes in with what comes
>>> out, but that would only give you the wasted uops (on the wrong speculative
>>> path).
>>> 
>>> I will think on that some more.
>>> There is some information of interest in the Intel Optimization guide
>>> (Appendix B2).
>>> 
>>> http://www.intel.com/Assets/PDF/manual/248966.pdf
>> 
>> We ended up using a pragmatic approach in order to obtain
>> sensible retired uop counts on Core i7, which are comparable
>> to Core2 retired uop counts.
>> 
>> As stated above, the problem is that micro-fused uops are counted as 2 in
>> the UOPS_RETIRED.ANY event (as opposed to the Core 2 event), and that there
>> is no way to obtain the retired micro-fused uops count separately.
>> 
>> We ended up estimating the number of micro-fused uops by counting the
>> number of issued uops, using UOPS_ISSUED.ANY and UOPS_ISSUED.FUSED.
>> We figured out that UOPS_ISSUED.FUSED counts each micro-fused uop
>> as one.
>> 
>> That way, we can estimate the number of micro-fused uops in UOPS_RETIRED.ANY
>> by using the UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY ratio, and thus
>> correct the UOPS_RETIRED.ANY count accordingly.
>> 
>> In short, we can use the following as an estimate for retired uops where
>> each type of uop (non-fused, macro-fused, micro-fused) is counted as one:
>> 
>> UOPS_RETIRED.ANY * (1 - UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY)
>> 
>> 
>> I hope that makes sense....
>> 
>> greetings,
>> 
>> Kenneth
>> 
>> 
>> 
>> 
>> 
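As a rough illustration of the estimate quoted above, the correction can be written as a small helper. This is only a sketch: the function name, its parameter names, and the counter values in the example are made up for illustration and do not come from perfmon2 or from any measurement. The three inputs are assumed to be raw totals for UOPS_RETIRED.ANY, UOPS_ISSUED.ANY and UOPS_ISSUED.FUSED collected over the same run.

```python
def estimate_retired_uops(retired_any, issued_any, issued_fused):
    """Estimate retired uops with each micro-fused uop counted once.

    Implements the formula from the thread:
        UOPS_RETIRED.ANY * (1 - UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY)

    The issued counts include uops on wrong speculative paths, so the
    result is an estimate, not an exact retired count.
    """
    if issued_any <= 0:
        raise ValueError("UOPS_ISSUED.ANY must be positive")
    if not 0 <= issued_fused <= issued_any:
        raise ValueError("UOPS_ISSUED.FUSED must lie in [0, UOPS_ISSUED.ANY]")

    # Fraction of the issued count attributed to micro-fusion.
    fused_ratio = issued_fused / issued_any

    # Apply the same fraction to the retired count.
    return retired_any * (1.0 - fused_ratio)


# Hypothetical counter values, for illustration only.
estimate = estimate_retired_uops(
    retired_any=1_000_000,
    issued_any=1_200_000,
    issued_fused=300_000,
)
print(estimate)
```

How far the estimate drifts from the true value depends on how many issued uops sit on mispredicted paths, which is workload dependent, as noted in the replies.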


_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel