Dan Terpstra wrote:
Carl -
Based on your description below, it sounds like the trace buffer *does* make
the counters wider, but at a cost. You reduce the interrupt frequency by a
factor of roughly 10^3 (2^10 = 1024) and pay the price by summing the 1024 values from
the trace into a 64-bit virtual counter. 1024 additions are probably a lot more
efficient than 1024 interrupts. Consider adding 1023 '1's: the result is
exactly 10 bits wide. Consider adding 1023 '65535's: the result is exactly
26 bits wide. That's 10 extra bits of dynamic range, and a factor of 10^3 fewer interrupts.
You're right that sampling would still be restricted to the actual size of
the physical counter, but that's the same restriction as before. Seems to me
this could make virtualization of 16-bit counters *less* expensive.
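A minimal sketch of the arithmetic above, assuming each trace entry holds one 16-bit count and that software simply folds the entries into a 64-bit virtual counter. The names fold_trace, COUNTER_BITS, TRACE_ENTRIES and MASK64 are hypothetical and not part of perfmon2 or any hardware interface.

```python
# Hypothetical sizes from the discussion: a 16-bit physical counter and a
# trace buffer holding 1024 entries before it has to be serviced.
COUNTER_BITS = 16
TRACE_ENTRIES = 1024
COUNTER_MAX = (1 << COUNTER_BITS) - 1  # 65535
MASK64 = (1 << 64) - 1                 # emulate a 64-bit virtual counter


def fold_trace(virtual_count, trace):
    """Add each 16-bit trace entry into a 64-bit virtual counter.

    One addition per entry stands in for one overflow interrupt per entry.
    """
    for entry in trace:
        if not 0 <= entry <= COUNTER_MAX:
            raise ValueError("trace entry wider than the physical counter")
        virtual_count = (virtual_count + entry) & MASK64
    return virtual_count


small = fold_trace(0, [1] * 1023)
large = fold_trace(0, [COUNTER_MAX] * 1023)
worst = fold_trace(0, [COUNTER_MAX] * TRACE_ENTRIES)  # full buffer of maxima
print(small, small.bit_length())  # 1023 10
print(large, large.bit_length())  # 67042305 26
print(worst, worst.bit_length())  # 67107840 26
```

Even a completely full buffer of 65535s stays within 26 bits, so the extra 10 bits of range cost additions rather than interrupts.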
I'm probably missing other hardware details that make this approach
impractical, but on the surface it could work.
BTW, glad to hear about the debugger stuff.
- dan
Wouldn't this make the operation of reading the performance counter more
expensive? Currently, perfmon2 has to paste together the value accumulated from
interrupts and the current counter value, then check that the counter hasn't
rolled over (updating the interrupt-accumulated value) partway through, since the
operation isn't atomic. With the trace buffer scheme, the read would also have to
scan through the buffer. This could still be less overhead than taking all those
interrupts, but the code would have to be careful to make sure that scanning the
trace buffer is faster than the rate at which the hardware can add entries to the
buffer. Is there just one buffer shared between all the counters? If so, the
trace buffer scan will need to determine which counter each entry is for. What
happens to the counter when the trace buffer service interrupt is triggered? Can
it take more samples, or does the counter freeze? If it loses counts when the
buffer fills, that wouldn't be very useful.
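A rough sketch of the read path being asked about, under hypothetical assumptions: one trace buffer shared by all counters, each entry tagged with the counter it came from, and hardware that records a counter's value into the buffer and then resets that counter. All names (TraceEntry, VirtualCounters, drain, read, read_hw) are illustrative, not perfmon2's API. The retry check plays the same role as the rollover check mentioned above.

```python
from typing import Callable, List, NamedTuple

MASK64 = (1 << 64) - 1  # emulate a 64-bit virtual counter


class TraceEntry(NamedTuple):
    counter_id: int  # which counter wrote this entry (the buffer is shared)
    value: int       # count recorded just before the hardware reset the counter


class VirtualCounters:
    """Hypothetical software state for counters backed by one shared trace buffer."""

    def __init__(self, num_counters: int) -> None:
        self.base = [0] * num_counters  # totals already folded in from the trace
        self.consumed = 0               # number of trace entries already folded in
        self.generation = 0             # bumped every time drain() runs

    def drain(self, trace: List[TraceEntry]) -> None:
        """Service path: fold new entries into per-counter totals by counter id."""
        for entry in trace[self.consumed:]:
            cid = entry.counter_id
            self.base[cid] = (self.base[cid] + entry.value) & MASK64
        self.consumed = len(trace)
        self.generation += 1

    def read(self, counter_id: int, trace: List[TraceEntry],
             read_hw: Callable[[int], int]) -> int:
        """Return folded total + unfolded entries for this counter + live count.

        The scan and the hardware read are not atomic, so retry if drain()
        ran or the hardware appended an entry while we were reading.
        """
        while True:
            gen, start, seen = self.generation, self.consumed, len(trace)
            total = self.base[counter_id]
            # Demultiplex: skip entries that belong to other counters.
            for entry in trace[start:seen]:
                if entry.counter_id == counter_id:
                    total += entry.value
            total += read_hw(counter_id)
            if gen == self.generation and seen == len(trace):
                return total & MASK64
```

If the hardware appends entries faster than read() can scan them, the loop never settles, which is exactly the scan-rate concern raised above. The sketch also assumes the counter keeps counting while the buffer is serviced; if it freezes instead, those counts are simply lost.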
-Will
_______________________________________________
perfmon mailing list
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/