Hi,

I realize that this is an old thread, but it seems quite relevant to me now... My apologies if I shouldn't be replying to old threads...

On 24 Aug 2006, at 21:57, Stephane Eranian wrote:

Kevin,

On Thu, Aug 24, 2006 at 01:23:47PM -0500, Kevin Corry wrote:
On Thu August 24 2006 10:44 am, Stephane Eranian wrote:
On Thu, Aug 24, 2006 at 10:10:17AM -0500, Kevin Corry wrote:
I've definitely been able to count things on P4 with pfmon.

That's excellent. No offense meant by my comment; I think I was still under
the impression it was very preliminary (from your own comments)
and that it was not counting yet.

Obviously, I was wrong, my apologies.

Not a problem. You're right that I did mention that the support is not complete. There are features of the P4 PMU that I still want to add support for: event filtering, event tagging, and event cascading. But the code that's there now is usable for basic event counting.

Ok, so you are saying that basic counting (and sampling) should work fine then.
That is already quite an accomplishment given the complexity of the P4.

As you may have seen in another thread, counts on P4 don't seem right.
I've counted instructions for SPEC CPU2000 using PIN (a dynamic instrumentation tool) and compared them with papiex (latest version, together with the latest perfctr patch) on a Fedora Core 4 Linux / Intel Pentium 4 machine (see attachment).

Attachment: SpecCPU2000_full_exec_uarch-indep_x86_gcc_4.1.1_O2_PIN_instrCounts.csv


For some benchmarks the counts are quite similar, but for others they are way off. Mesa is a nice example of this: papiex counts almost 7*10^9 more instructions than PIN does, i.e. 2.5% more! The PIN counts have been validated using another instrumentation tool (DIOTA), and against counts obtained with perfex on an AMD Athlon machine. Even for mesa, those counts didn't differ by more than 300,000 instructions.
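To make the per-benchmark comparison reproducible, the relative difference can be computed as 100 * (papiex - PIN) / PIN. For mesa, the two figures above (almost 7*10^9 extra instructions, 2.5% more) imply a PIN count of roughly 7*10^9 / 0.025 = 2.8*10^11 instructions. The following is a minimal sketch, assuming both tools' counts have already been loaded into dictionaries keyed by benchmark name; the helper names rel_diff_pct and compare_counts are hypothetical, and nothing is assumed about the layout of the attached CSV.

```python
def rel_diff_pct(measured: int, reference: int) -> float:
    """Percent by which `measured` exceeds `reference` (negative if below)."""
    if reference == 0:
        raise ValueError("reference count must be non-zero")
    return 100.0 * (measured - reference) / reference


def compare_counts(reference: dict, measured: dict) -> dict:
    """For every benchmark present in both mappings, return
    (absolute difference, relative difference in percent)."""
    result = {}
    # Only benchmarks measured by both tools can be compared.
    for bench in sorted(reference.keys() & measured.keys()):
        ref, meas = reference[bench], measured[bench]
        result[bench] = (meas - ref, rel_diff_pct(meas, ref))
    return result
```

For instance, with hypothetical round counts of 2.8*10^11 (PIN) and 2.87*10^11 (papiex) for mesa, compare_counts reports an absolute difference of 7*10^9 and a relative difference of 2.5%. Sorting the result by absolute difference is a quick way to spot outliers such as mesa.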

So my conclusion is that something must be going horribly wrong with the HPCs on the Pentium 4. Maybe the instr_completed event (see below) is actually what we want, but trying it also yields a zero count for me, as it did in the test quoted below. Has anyone figured out yet why instr_completed isn't working? The documentation mentions it will only work on model 3 or model 4 Pentium 4s, but my /proc/cpuinfo tells me my machine is a model 3, so it _should_ work...
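A quick way to double-check the CPU family and model outside of pfmon is to parse /proc/cpuinfo. This is a minimal sketch, assuming the usual "cpu family" and "model" lines of Linux's /proc/cpuinfo; Pentium 4 (NetBurst) processors report cpu family 15, and the model 3/4 restriction is taken from the documentation mentioned above. The function names are hypothetical.

```python
def cpu_models(cpuinfo_text: str) -> list:
    """Return a (cpu family, model) pair for each processor block
    found in the text of /proc/cpuinfo."""
    models = []
    family = -1  # unknown until a "cpu family" line is seen
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processors
        key = key.strip()
        if key == "cpu family":
            family = int(value)
        elif key == "model":  # exact match, so "model name" is skipped
            models.append((family, int(value)))
    return models


def supports_instr_completed(family: int, model: int) -> bool:
    """instr_completed is documented only for Pentium 4 (family 15) models 3 and 4."""
    return family == 15 and model in (3, 4)
```

Usage: with open("/proc/cpuinfo") as f: cpu_models(f.read()) returns one pair per logical CPU, each of which can be passed to supports_instr_completed.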

Any ideas or comments on this are highly appreciated.

greetings,

Kenneth


$ pfmon -iinstr_completed
Name     : instr_completed
Code     : 0x7
Counters : [ 6 7 8 15 16 17 ]
Desc     : Instructions that have completed and retired during a clock cycle
Umask    : 0x01 : [NBOGUS] : Non-bogus instructions.
Umask    : 0x02 : [BOGUS] : Bogus instructions.

Does anyone get something meaningful out of this one?

I've noticed this as well, actually.

The IA-32 Developer's Manual, Appendix A lists all the events and related info.
It has this to say about instr_retired and instr_completed:

instr_retired: This event counts instructions that are retired during a clock cycle. Mask bits specify bogus or non-bogus (and whether they are tagged using the front-end tagging mechanism).

instr_completed: This event counts instructions that have completed and retired during a clock cycle. Mask bits specify whether the instruction is bogus or non-bogus. This metric differs from instr_retired, since it counts instructions completed, rather than the number of times that instructions started.

I am wondering whether instr_completed only counts instructions that execute and retire in one cycle, as opposed to instructions that take longer. Yet I doubt that a program such as date contains none of those. We need to verify that instr_completed does not use a counter (CTR, ESCR, CCCR) that is not used by the events that do seem to return valid data. There may be some bugs in the kernel in this case.

On the surface, it sounds like these should provide similar counts. But here's
the output I get for my simple test:

[EMAIL PROTECTED] /home/corry]$ pfmon -u -k -e instr_retired:NBOGUSNTAG \
dd if=/dev/sda of=/dev/null bs=1M count=100
100+0 records in
100+0 records out
41078793 instr_retired

[EMAIL PROTECTED] /home/corry]$ pfmon -u -k -e instr_completed:NBOGUS \
dd if=/dev/sda of=/dev/null bs=1M count=100
100+0 records in
100+0 records out
0 instr_completed

I don't really know how to explain this. The table in
libpfm/lib/pentium4_events.h looks like it has the correct data for
instr_completed. When I get some time I'll run the above example through a debugger and make sure the correct values are getting passed to the correct PMC registers in the kernel.
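For that check, it may help to compute by hand what the ESCR for instr_completed:NBOGUS should contain and compare it with what actually reaches the kernel. The sketch below assumes the Pentium 4 ESCR bit layout from the IA-32 manual (event select in bits 30:25, event mask in bits 24:9, T0_OS/T0_USR in bits 3/2, T1_OS/T1_USR in bits 1/0 on Hyper-Threading-capable parts) and uses the event code 0x7 and umask 0x01 from the pfmon -i output above. It only models the ESCR; the matching CCCR (ESCR select, enable bit) and the counter choice are not covered. The helper names are hypothetical.

```python
# Pentium 4 ESCR bit layout (assumed from the IA-32 manual; verify before use).
ESCR_EVENT_SHIFT = 25       # bits 30:25, event select (6 bits)
ESCR_EVENTMASK_SHIFT = 9    # bits 24:9, event mask / umask (16 bits)
ESCR_T0_OS = 1 << 3         # count on logical thread 0 at CPL 0
ESCR_T0_USR = 1 << 2        # count on logical thread 0 at CPL > 0
ESCR_T1_OS = 1 << 1         # count on logical thread 1 at CPL 0
ESCR_T1_USR = 1 << 0        # count on logical thread 1 at CPL > 0


def encode_escr(event: int, umask: int, user: bool = True,
                kernel: bool = True, thread: int = 0) -> int:
    """Build the expected ESCR value for one event (hypothetical helper)."""
    if not 0 <= event < (1 << 6):
        raise ValueError("event select is a 6-bit field")
    if not 0 <= umask < (1 << 16):
        raise ValueError("event mask is a 16-bit field")
    if thread not in (0, 1):
        raise ValueError("thread must be 0 or 1")
    usr, os_ = (ESCR_T0_USR, ESCR_T0_OS) if thread == 0 else (ESCR_T1_USR, ESCR_T1_OS)
    value = (event << ESCR_EVENT_SHIFT) | (umask << ESCR_EVENTMASK_SHIFT)
    if user:
        value |= usr
    if kernel:
        value |= os_
    return value


def decode_escr(value: int) -> dict:
    """Split an ESCR value back into its fields, e.g. to check a register dump."""
    return {
        "event": (value >> ESCR_EVENT_SHIFT) & 0x3F,
        "umask": (value >> ESCR_EVENTMASK_SHIFT) & 0xFFFF,
        "t0_os": bool(value & ESCR_T0_OS),
        "t0_usr": bool(value & ESCR_T0_USR),
        "t1_os": bool(value & ESCR_T1_OS),
        "t1_usr": bool(value & ESCR_T1_USR),
    }
```

With user and kernel counting on thread 0 (roughly what pfmon -u -k asks for), encode_escr(0x7, 0x01) gives 0x0E00020C; if the value seen in the kernel differs, decode_escr shows which field is off.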

<cut out a part here, irrelevant>
--

-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/

--

Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital (Aaron Levenstein)

Kenneth Hoste
ELIS - Ghent University
[EMAIL PROTECTED]
http://www.elis.ugent.be/~kehoste

