Philip J. Mucci wrote:
Hi Will,

Good show on the list of tests. When I spent a week with Stefane about a
month back, we rooted out 4 or 5 bugs just from getting a full port of
PAPI running.
In regards to FP_OPS, I'm afraid to say that these are very poorly
classified on x86 type processors (except the recent architected PMU
models)...This also goes for x86_64 variants...depending on whether one
uses SSE or x87, packed or unpacked, single or double, changes the
counts greatly. On AMD64, there's no exact way to measure fpops, as
various combinations of mov are also counted in the FP pipe.

We've had this problem for years in PAPI; my recommendation is to make a
small code module for each processor that designates a particular PFM
event and a piece of inlined code that can generate that event.
Hope this saves you the years of headaches it's given us.

Phil

Phil, thanks for the insights.

What exactly is counted by the floating-point hardware and what the compiler generates can vary quite a bit. This can present a problem if specific counts are expected. The situation could be improved somewhat by controlling the options the compiler uses to generate the test code, to make sure that regular x86 floating point is generated, but I can see this still being a problem.
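
As a rough sketch of what such a controlled test kernel could look like (the function names are made up for illustration, and the flags in the comment are GCC's x86 options as I understand them), the loops below perform a known number of additions through a volatile accumulator so the compiler cannot fold them away. What the counter actually reports for them would still depend on the event and the processor.

```c
/*
 * Hypothetical test kernels: each performs exactly n floating-point
 * additions. The volatile accumulator forces a load, an add and a
 * store on every iteration, so the additions survive optimization.
 *
 * Assumed GCC invocations for comparing code generation on x86:
 *   gcc -O2 -m32 -mfpmath=387 kernel.c         (x87 instructions)
 *   gcc -O2 -msse2 -mfpmath=sse kernel.c       (scalar SSE instructions)
 */

/* n double-precision additions; result is exact for the small n used in tests. */
static double fp_add_double(long n)
{
    volatile double acc = 0.0;
    for (long i = 0; i < n; i++)
        acc = acc + 1.0;
    return acc;
}

/* n single-precision additions; exact only while n stays below 2^24. */
static float fp_add_float(long n)
{
    volatile float acc = 0.0f;
    for (long i = 0; i < n; i++)
        acc = acc + 1.0f;
    return acc;
}
```

A test harness would start the counter, call one of these with a fixed n, stop the counter, and then compare the reported count against n with whatever tolerance the event turns out to need.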

Because performance monitoring events vary so much across performance monitoring hardware, the oprofile tests (checked into oprofile's oprofile-tests module) just pick a small set of events and make sure that oprofile is collecting samples. The events were picked to be ones that would be triggered on the machine. The oprofile tests do not attempt to check every possible event available on the processor. They look up the events to test based on the processor, but they do not run a particular program to exercise those events. A small code module and a selected event for each processor is certainly a possibility for perfmon testing, with some boilerplate that is common to all the tests and processors.
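
One possible shape for that per-processor module plus shared boilerplate is a small table. This is only a sketch: the family strings and event names below are placeholders, not real PFM event names, and the workloads are deliberately trivial.

```c
#include <stddef.h>
#include <string.h>

/* One entry per processor: the event to program and code meant to trigger it. */
struct pmu_test {
    const char *cpu_family;    /* placeholder key identifying the processor */
    const char *event_name;    /* placeholder; real names come from the PMU documentation */
    long (*workload)(long n);  /* code expected to make the event fire */
};

/* Burns cycles with n dependent integer increments. */
static long spin_workload(long n)
{
    volatile long x = 0;
    for (long i = 0; i < n; i++)
        x = x + 1;
    return x;
}

/* Issues n byte stores into a small static buffer. */
static long store_workload(long n)
{
    static volatile char buf[4096];
    for (long i = 0; i < n; i++)
        buf[i % 4096] = (char)i;
    return n;
}

static const struct pmu_test pmu_tests[] = {
    { "example-family-a", "EXAMPLE_CYCLES_EVENT", spin_workload },
    { "example-family-b", "EXAMPLE_STORE_EVENT",  store_workload },
};

/* Shared boilerplate: find the entry for a processor, or NULL if none exists. */
static const struct pmu_test *find_pmu_test(const char *cpu_family)
{
    for (size_t i = 0; i < sizeof pmu_tests / sizeof pmu_tests[0]; i++)
        if (strcmp(pmu_tests[i].cpu_family, cpu_family) == 0)
            return &pmu_tests[i];
    return NULL;
}
```

Keeping the lookup separate from the entries means that supporting a new processor only adds a row and, if needed, a workload function. A NULL result gives the harness a natural way to skip processors that have no test yet.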

It looks like the following things influence the testing: processor implementation, hardware configuration, and compiler.

Processor implementation will affect what events are available and what tests should be run to exercise them.

Hardware configuration matters as well. There are some events that monitor the external bus on a machine. On a single-core machine the bus traffic may come only from that processor. Some of the event masks allow the processor's performance hardware to count only other bus traffic. Also, events like interrupts may not be triggered by the hardware, or may be routed to another processor.

For perfmon patches in the Linux kernel it would be reasonable to assume that GCC is available on the machines. Even with GCC there are different versions and configurations that may or may not use SSE or other flavors of floating point.
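
A test could at least record which flavor it was built with. The sketch below relies on GCC's predefined macros __SSE_MATH__, __SSE2_MATH__ and __VERSION__. The helper names are invented, and treating "x86 without SSE math" as x87 is my assumption.

```c
#include <string.h>

/* Reports the floating-point code generation selected at compile time. */
static const char *fp_math_flavor(void)
{
#if defined(__SSE2_MATH__)
    return "sse2";      /* SSE used for both single and double precision */
#elif defined(__SSE_MATH__)
    return "sse";       /* SSE used for single precision only */
#elif defined(__i386__) || defined(__x86_64__)
    return "x87";       /* assumption: no SSE math on x86 means x87 */
#else
    return "non-x86";
#endif
}

/* Returns the compiler's version string, if the compiler provides one. */
static const char *compiler_version(void)
{
#ifdef __VERSION__
    return __VERSION__;
#else
    return "unknown";
#endif
}
```

Logging both strings next to each measured count would make it easier to tell a real perfmon problem from a difference in generated code.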

-Will


_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
