Ken,

Have you tried with INSTR_RETIRED instead of INSTR_COMPLETED?

On Wed, Jan 03, 2007 at 01:02:47PM +0100, Kenneth Hoste wrote:
> 
> On 03 Jan 2007, at 12:05, Stephane Eranian wrote:
> 
> >Did you try other events? Also PIN only counts user level  
> >instructions.
> >Did you make sure your measurements were set up to count only at  
> >user level?
> >
> 
> I haven't tried other events yet, because those are a lot harder to  
> check. I've set the counters to count only user-level instructions  
> (setting it to kernel yields a much higher count).
> 
> I've been experimenting with this, and it seems the P4 counters just  
> aren't reporting the correct number of instructions. Some  
> instructions seem to be counted double, or even more.
> 
> The highest difference for the SPEC CPU2000 benchmarks is observed  
> for mesa:
> 
> PIN: 291,680,398,079
> perfex @ Pentium4: 298,575,250,612
> 
> That's a difference of about 6.9 * 10^9 instructions, or 2.36% of the total  
> execution, which is _huge_ . My needs, as I've explained before, only  
> allow a difference of 100,000 instructions, so this is a real  
> showstopper for me. Clearly, perfmon is not to blame here, nor is  
> perfctr. Something is just wrong with the hardware implementation the  
> way I see it...
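
For reference, the arithmetic behind the mesa figures quoted above can be
checked with a few lines of Python. This is only a sketch: the two counts
and the 100,000-instruction tolerance are copied from the message, while
the names (PIN_COUNT, P4_COUNT, TOLERANCE, discrepancy) are made up for
illustration and come from neither perfex nor PIN.

```python
# Instruction counts for mesa, copied from the quoted message.
PIN_COUNT = 291_680_398_079   # reported by PIN
P4_COUNT = 298_575_250_612    # reported by perfex on the Pentium 4
TOLERANCE = 100_000           # largest difference Kenneth can accept


def discrepancy(reference, measured):
    """Return (absolute difference, difference as a percentage of reference)."""
    diff = measured - reference
    return diff, 100.0 * diff / reference


diff, pct = discrepancy(PIN_COUNT, P4_COUNT)
print(f"absolute difference: {diff:,} instructions")
print(f"relative to PIN:     {pct:.2f}%")
print(f"within tolerance:    {diff <= TOLERANCE}")
```

The percentage is taken relative to the PIN count, which is what the
quoted 2.36% appears to be based on.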
> 
> As a note for future users of instr_retired on Intel Pentium 4 with  
> any tool (perfctr, perfex, PAPI, perfmon, ...): be very careful with  
> the results you're getting, because it appears some instructions  
> cause multiple increases of the counters, which leads to misleading  
> results. The settings I'm using (with perfex) are:
> 
> perfex -e 0x00039000/[EMAIL PROTECTED] <benchmark>
> 
> for instr_completed (which yields a zero count for me):
> 
> perfex -e 0x00039000/[EMAIL PROTECTED] <benchmark>
> 
> Any additional comments on this are welcome, but I won't be losing  
> any more time over this. The counts on AMD machines are looking a lot  
> better, so I'll just go with AMD.
> 
> >Also, have you tried system-wide mode, just to verify that there  
> >is nothing wrong with the PMU context switch code?
> >
> 
> Yep, that yields higher counts. Kernel-only yields the difference  
> between system-wide and user-level.
> 
> >As for instr_completed, I have never been able to measure it  
> >correctly.
> >There may be unpublished constraints on this event which libpfm  
> >does not
> >know about.
> 
> I've tried using perfex, using the correct settings, but I'm getting  
> a zero count every time I try (on two different machines). No idea  
> what's causing this. There are some undocumented issues for sure...
> 
> greetings,
> 
> Kenneth
> 
> -- 
> 
> Statistics are like a bikini. What they reveal is suggestive, but  
> what they conceal is vital (Aaron Levenstein)
> 
> Kenneth Hoste
> ELIS - Ghent University
> [EMAIL PROTECTED]
> http://www.elis.ugent.be/~kehoste
> 
> 
> _______________________________________________
> perfmon mailing list
> [email protected]
> http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/

-- 

-Stephane