Kevin,

On Thu, Aug 24, 2006 at 01:23:47PM -0500, Kevin Corry wrote:
> On Thu August 24 2006 10:44 am, Stephane Eranian wrote:
> > On Thu, Aug 24, 2006 at 10:10:17AM -0500, Kevin Corry wrote:
> > > I've definitely been able to count things on P4 with pfmon.
> >
> > That's excellent. No offense for the comment, I think I was still under
> > the impression it was very preliminary (from your own comments)
> > and that it was not counting yet.
> >
> > Obviously, I was wrong, my apologies.
> 
> Not a problem. You're right that I did mention that the support is not 
> complete - there are features of the P4 PMU that I still want to add support 
> for: event filtering, event tagging, and event cascading. But the code that's 
> there now is useable for basic event counting.
> 
Ok, so you are saying that basic counting (and sampling) should work fine then.
That is already quite an accomplishment given the complexity of the P4.

> > $ pfmon -iinstr_completed
> > Name     : instr_completed
> > Code     : 0x7
> > Counters : [ 6 7 8 15 16 17 ]
> > Desc     : Instructions that have completed and retired during a clock cycle
> > Umask    : 0x01 : [NBOGUS] : Non-bogus instructions.
> > Umask    : 0x02 : [BOGUS] : Bogus instructions.
> >
> > Does anyone get something meaningful out of this one?
> 
> I've noticed this as well, actually.
> 
> The IA32 Developers Manual, Appendix A lists all the events and related info. 
> It has this to say about instr_retired and instr_completed:
> 
> instr_retired: This event counts instructions that are retired during a clock 
> cycle. Mask bits specify bogus or non-bogus (and whether they are tagged 
> using the front-end tagging mechanism).
> 
> instr_completed: This event counts instructions that have completed and 
> retired during a clock cycle. Mask bits specify whether the instruction is 
> bogus or non-bogus. This metric differs from instr_retired, since it counts 
> instructions completed, rather than the number of times that instructions 
> started.
> 
I am wondering whether instr_completed counts only instructions that execute and
retire in 1 cycle, as opposed to instructions that take more. Yet I doubt that we
have none of those in a program such as date, for instance. We need to verify that
instr_completed does not use a register (CTR, ESCR, CCCR) that is not used by the
events that do seem to return valid data. There may be some bugs in the kernel in
this case.
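
As a starting point for that check, here is a minimal sketch that lists the
counters one event may use that a second event cannot. It only looks at the
counter lists printed by `pfmon -i` earlier in this thread; the ESCR and CCCR
assignments are not covered, and the function names are hypothetical, not part
of libpfm or pfmon.

```python
def counter_mask(counters):
    """Return a bitmask with bit i set for each counter index i."""
    mask = 0
    for index in counters:
        mask |= 1 << index
    return mask


def counters_unique_to(suspect, known_good):
    """Return, sorted, the counter indices in `suspect` absent from `known_good`."""
    diff = counter_mask(suspect) & ~counter_mask(known_good)
    return [i for i in range(diff.bit_length()) if diff & (1 << i)]


# Counter lists copied from the `pfmon -i` output quoted in this thread.
INSTR_COMPLETED = [6, 7, 8, 15, 16, 17]
GLOBAL_POWER_EVENTS = [0, 1, 9, 10]

if __name__ == "__main__":
    print(counters_unique_to(INSTR_COMPLETED, GLOBAL_POWER_EVENTS))
```

The two lists do not overlap at all, so comparing against an event that is
known to count correctly on the same counter group would be more telling.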

> On the surface, it sounds like these should provide similar counts. But 
> here's the output I get for my simple test:
> 
> [EMAIL PROTECTED] /home/corry]$ pfmon -u -k -e instr_retired:NBOGUSNTAG \
> dd if=/dev/sda of=/dev/null bs=1M count=100
> 100+0 records in
> 100+0 records out
> 41078793 instr_retired
> 
> [EMAIL PROTECTED] /home/corry]$ pfmon -u -k -e instr_completed:NBOGUS \
> dd if=/dev/sda of=/dev/null bs=1M count=100
> 100+0 records in
> 100+0 records out
> 0 instr_completed
> 
> I don't really know how to explain this. The table in 
> libpfm/lib/pentium4_events.h looks like it has the correct data for 
> instr_completed. When I get some time I'll run the above example through a 
> debugger and make sure the correct values are getting passed to the correct 
> PMC registers in the kernel.
> 
> > > [EMAIL PROTECTED] /home/corry]$ pfmon -i global_power_events
> > > Name   : global_power_events
> > > Code   : 0x13
> > > counter: [ 0 1 9 10 ]
> > > Unit-mask 0: RUNNING
> > > Desc   : Counts the time during which a processor is not stopped.
> >
> > In the rewritten AMD K8 event table, I have used the rule that if an event
> > only has one unit mask, then we implement it without a unit mask, i.e., it
> > is collapsed with the event name. It makes it much easier to use. On AMD
> > K8, I have encoded the umask value starting at bit position 8. On P4, you
> > probably need to use another trick, though.
> 
> Yeah, I had thought about that recently. I'll look into fixing this up. And 
> since I don't think any of the P4 events actually count anything unless at 
> least one mask is specified, it might not be a bad idea to designate a 
> "default" mask if none is provided. What do you think?

The latest release of libpfm does not let you get through pfm_dispatch_events()
if you are missing a unit mask when the event has at least one. You are right
that an alternative would be to pick one, but then how would the tool know which
one was chosen, given our interface?
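
One possible shape for this, as a rough sketch only: the resolver picks the
mask by default only when the event has exactly one, and always hands the
chosen name back so the tool can print it. This is not the libpfm interface;
`resolve_umask` and `encode_k8_style` are hypothetical names. The bit-8 packing
mirrors what I described for the AMD K8 table and is not meant for the P4.

```python
UMASK_SHIFT = 8  # unit mask starts at bit 8, as in the K8 table (assumption for P4)


def resolve_umask(umasks, requested=None):
    """Return (name, value, defaulted) for the unit mask to program.

    `umasks` maps mask names to values. If no mask is requested and the event
    has exactly one, that one is chosen and `defaulted` is True, so the caller
    can report the choice. With several masks and none requested, refuse.
    """
    if requested is None:
        if len(umasks) != 1:
            raise ValueError("a unit mask must be specified for this event")
        name = next(iter(umasks))
        return name, umasks[name], True
    if requested not in umasks:
        raise ValueError("unknown unit mask: " + requested)
    return requested, umasks[requested], False


def encode_k8_style(code, umask_value):
    """Pack the event code with the unit mask starting at bit position 8."""
    return code | (umask_value << UMASK_SHIFT)


if __name__ == "__main__":
    # Event code and mask values taken from the instr_completed listing above.
    masks = {"NBOGUS": 0x01, "BOGUS": 0x02}
    name, value, defaulted = resolve_umask(masks, "NBOGUS")
    print(name, hex(encode_k8_style(0x7, value)), defaulted)
```

Returning the name costs nothing for events with several masks, and it keeps
the tool's output unambiguous when the mask was picked for the user.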

-- 

-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
