Ken,

Have you tried INSTR_RETIRED instead of INSTR_COMPLETED?

On Wed, Jan 03, 2007 at 01:02:47PM +0100, Kenneth Hoste wrote:
>
> On 03 Jan 2007, at 12:05, Stephane Eranian wrote:
>
> >Did you try other events? Also PIN only counts user level
> >instructions.
> >did you make sure your measurements were set up to count only at
> >user level?
> >
>
> I haven't tried other events yet, because those are a lot harder to
> check. I've set the counters to count only user-level instructions
> (setting it to kernel yields a much higher count).
>
> I've been experimenting with this, and it seems the P4 counters just
> aren't reporting the correct number of instructions. Some
> instructions seem to be counted double, or even more.
>
> The highest difference for the SPEC CPU2000 benchmarks is observed
> for mesa:
>
>     PIN:                 291,680,398,079
>     perfex @ Pentium4:   298,575,250,612
>
> That's a difference of about 6.9 * 10^9 instructions, or 2.36% of the
> total execution, which is _huge_. My needs, as I've explained before,
> only allow a difference of 100,000 instructions, so this is a real
> showstopper for me. Clearly, perfmon is not to blame here, nor is
> perfctr. Something is just wrong with the hardware implementation the
> way I see it...
>
> As a note for future users of instr_retired on Intel Pentium 4 with
> any tool (perfctr, perfex, PAPI, perfmon, ...): be very careful with
> the results you're getting, because it appears some instructions
> cause multiple increases of the counters, which leads to misleading
> results. The settings I'm using (with perfex) are:
>
>     perfex -e 0x00039000/[EMAIL PROTECTED] <benchmark>
>
> for instr_completed (which yields a zero count for me):
>
>     perfex -e 0x00039000/[EMAIL PROTECTED] <benchmark>
>
> Any additional comments on this are welcome, but I won't be losing
> any more time over this. The counts on AMD machines are looking a lot
> better, so I'll just go with AMD.
>
> >Also have you tried in system-wide mode, just to verify that there
> >is nothing wrong with the PMU context switch code.
> >
>
> Yep, that yields higher counts. Kernel-only yields the difference
> between system-wide and user-level.
>
> >As for instr_completed, I have never been able to measure it
> >correctly.
> >There may be unpublished constraints on this event which libpfm
> >does not know about.
>
> I've tried using perfex, using the correct settings, but I'm getting
> a zero count every time I try (on two different machines). No idea
> what's causing this. There are some undocumented issues for sure....
>
> greetings,
>
> Kenneth
>
> --
>
> Statistics are like a bikini. What they reveal is suggestive, but
> what they conceal is vital (Aaron Levenstein)
>
> Kenneth Hoste
> ELIS - Ghent University
> [EMAIL PROTECTED]
> http://www.elis.ugent.be/~kehoste
>
>
> _______________________________________________
> perfmon mailing list
> [email protected]
> http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/

-- 
-Stephane
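
For anyone re-checking the mesa numbers quoted in this thread, the arithmetic can be reproduced with a few lines of Python. This is only a sketch: the two counts and the 100,000-instruction tolerance are taken from the message above, while the function and variable names are made up for illustration and do not come from perfex, PIN, or perfmon.

```python
# Instruction counts for mesa (SPEC CPU2000), as quoted in the thread.
pin_count = 291_680_398_079      # PIN, user-level instructions
perfex_count = 298_575_250_612   # perfex on Pentium 4, user-level only

# Tolerance stated in the thread (hypothetical name).
MAX_ALLOWED_DIFF = 100_000


def discrepancy(reference, measured):
    """Return (absolute difference, difference as a percentage of reference)."""
    diff = measured - reference
    return diff, 100.0 * diff / reference


diff, pct = discrepancy(pin_count, perfex_count)
print(f"absolute difference: {diff:,} instructions")
print(f"relative difference: {pct:.2f}% of the PIN count")
print(f"within tolerance:    {abs(diff) <= MAX_ALLOWED_DIFF}")
```

The percentage is taken relative to the PIN count, which is treated here as the reference value.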
