Ken,

Sorry for the late reply.
Thanks for this valuable information. This looks like a decent approach.
However, it seems you may factor in uops issued on the wrong speculative
path. I am guessing that the impact of this depends on the workload.

On Wed, Nov 4, 2009 at 6:13 PM, Kenneth Hoste <kenneth.ho...@ugent.be> wrote:
> Hi Stephane,
>
> On Oct 23, 2009, at 2:18 PM, stephane eranian wrote:
>
>> Ken,
>>
>> On Fri, Oct 23, 2009 at 1:41 PM, Kenneth Hoste <kenneth.ho...@ugent.be>
>> wrote:
>>>
>>> The Intel documentation for Core i7 suggests that this is not the case
>>> on Nehalem, i.e. uops fused through micro-fusion are counted as 2 uops
>>> (while macro-fused uops are counted as 1), like you mentioned above.
>>>
>>> We missed this point somehow in your first reply, sorry about that.
>>>
>>> Thus, in order to obtain sensible numbers, it does indeed seem that
>>> we need to figure out the number of micro-fused uops in Core i7, and
>>> subtract that from the UOPS_RETIRED.ANY count we have now...
>>>
>>> It seems like there is no event for directly counting retired
>>> micro-fused uops, however (while there is one for
>>> UOPS_RETIRED.MACRO_FUSED).
>>>
>>
>>> Are we missing something? Should we be able to get counts for retired
>>> micro-fused uops? If so, which event are you referring to? If not, any
>>> ideas on how we can obtain uops_retired counts on Core i7 where
>>> micro-fused uops are counted as 1?
>>>
>> It does not seem possible because you have 2 unknowns in the equation:
>>
>>   uops_retired.any = uops_retired.normal + uops_retired.macro +
>>                      2*uops_retired.micro
>>
>> You are missing micro and normal.
>> I thought you could maybe do it by comparing what goes in with what
>> comes out, but that would only give you the wasted uops (on the wrong
>> speculative path).
>>
>> I will think about that some more.
>> There is some information of interest in the Intel Optimization guide
>> (Appendix B2).
>>
>> http://www.intel.com/Assets/PDF/manual/248966.pdf
>
> We ended up using a pragmatic approach in order to obtain
> sensible retired uop counts on Core i7, which are comparable
> to Core 2 retired uop counts.
>
> As stated above, the problem is that micro-fused uops are counted as 2 in
> the UOPS_RETIRED.ANY event (as opposed to the Core 2 event), and that
> there is no way to obtain the retired micro-fused uop count separately.
>
> We ended up estimating the number of micro-fused uops by counting the
> number of issued uops, using UOPS_ISSUED.ANY and UOPS_ISSUED.FUSED.
> We figured out that UOPS_ISSUED.FUSED only counts micro-fused uops
> as one.
>
> That way, we can estimate the number of micro-fused uops in
> UOPS_RETIRED.ANY by using the UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY ratio,
> and thus correct UOPS_RETIRED.ANY accordingly.
>
> In short, we can use the following as an estimate for retired uops where
> each type of uop (non-fused, macro-fused, micro-fused) is counted as one:
>
>   UOPS_RETIRED.ANY * (1 - UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY)
>
> I hope that makes sense...
>
> greetings,
>
> Kenneth

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
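For readers of the archive, here is a minimal Python sketch of the correction Kenneth describes in the quoted message. It rests on assumptions that this thread does not confirm: that UOPS_ISSUED.ANY counts a micro-fused uop as two (like UOPS_RETIRED.ANY), that UOPS_ISSUED.FUSED counts it as one, and that the issued uop mix matches the retired mix. Under those assumptions, with N non-fused or macro-fused uops and M micro-fused uops, the raw count is N + 2M and the ratio is M / (N + 2M), so the formula returns exactly N + M. The function names and all counts below are hypothetical and made up for illustration; the second case shows how uops issued on the wrong speculative path can skew the ratio.

```python
# Sketch of the estimate discussed in this thread:
#   UOPS_RETIRED.ANY * (1 - UOPS_ISSUED.FUSED / UOPS_ISSUED.ANY)
# Assumptions (not confirmed here): UOPS_ISSUED.ANY counts a micro-fused
# uop as two, UOPS_ISSUED.FUSED counts it as one, and the issued uop mix
# is representative of the retired mix.


def estimate_retired_uops(retired_any: int, issued_any: int, issued_fused: int) -> float:
    """Estimate retired uops with each micro-fused uop counted once."""
    if issued_any <= 0:
        raise ValueError("issued_any must be positive")
    if not 0 <= issued_fused <= issued_any:
        raise ValueError("issued_fused must lie in [0, issued_any]")
    # R * (1 - F/A) rewritten as R * (A - F) / A: same value, fewer roundings.
    return retired_any * (issued_any - issued_fused) / issued_any


def synthetic_counts(normal: int, micro: int) -> tuple:
    """Hypothetical helper: raw (ANY, FUSED) counts for `normal` non-fused or
    macro-fused uops plus `micro` micro-fused uops, where each micro-fused
    uop counts as two in ANY and as one in FUSED."""
    return normal + 2 * micro, micro


if __name__ == "__main__":
    # Made-up workload: 600 normal uops and 200 micro-fused uops retired,
    # so the "true" count with fused uops counted once is 800.
    retired_any, _ = synthetic_counts(600, 200)  # 1000

    # Case 1: no wrong-path uops, issued mix equals retired mix.
    issued_any, issued_fused = synthetic_counts(600, 200)
    print(estimate_retired_uops(retired_any, issued_any, issued_fused))  # 800.0

    # Case 2: 250 extra non-fused uops issued on a mispredicted path.
    # They dilute the fused ratio, so the estimate overshoots 800.
    wrong_any, wrong_fused = synthetic_counts(250, 0)
    print(estimate_retired_uops(retired_any,
                                issued_any + wrong_any,
                                issued_fused + wrong_fused))  # 840.0
```

If wrong-path uops happen to have the same fused ratio as retired ones, the estimate is unaffected; the error grows with the difference between the two ratios, which fits the guess that the impact is workload dependent.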