Kevin Corry wrote:

1) Perfmon seems to have an implicit assumption that a PMU's counters are a fixed width. Specifically, the "pfm_pmu_config" structure has a "counter_width" field that applies to the whole PMU. However, Cell provides four 32-bit counters, and each of those can independently be configured as two 16-bit counters. So I'm curious if it will be possible to support this capability within Perfmon, especially regarding the 64-bit counter virtualization. Do you know of any other platforms that have variable-width counters?

One of the problems OProfile has is that some people set the interval between samples too small, which causes too much overhead because the interrupt routine is called each time the counter overflows. Perfmon accumulates the overflows of the performance counter. Having 16-bit counters would imply an interrupt every 65536 events for that counter. Are 16-bit counters going to present too much overhead?
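To make the accumulation concrete, here is a rough sketch of how a narrow hardware counter could be widened to 64 bits in software. The struct and function names are made up for illustration and are not Perfmon's actual interface; it also assumes the counter counts up from zero and raises one interrupt per wrap.

```c
#include <stdint.h>

/* Hypothetical software view of one narrow hardware counter
 * (illustrative only, not a Perfmon data structure). */
struct virt_counter {
    unsigned width;      /* hardware counter width in bits: 16 or 32 */
    uint64_t soft_count; /* events accumulated from past overflows */
};

/* Would be called from the overflow interrupt handler:
 * one wrap of a width-bit counter represents 2^width events. */
static void virt_counter_overflow(struct virt_counter *vc)
{
    vc->soft_count += (uint64_t)1 << vc->width;
}

/* Full 64-bit value: accumulated overflows plus the live hardware value,
 * masked to the counter's width. */
static uint64_t virt_counter_read(const struct virt_counter *vc,
                                  uint32_t hw_value)
{
    uint64_t mask = ((uint64_t)1 << vc->width) - 1;
    return vc->soft_count + (hw_value & mask);
}
```

With width set to 16, every call to virt_counter_overflow() stands for only 65536 events, which is where the interrupt overhead concern comes from.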

I can see why one might want to use the 16-bit counters if it doubles the number of events available. However, going from four 32-bit counters to eight 16-bit counters could cause a lot more interrupts: double the number of interrupt sources, with each one interrupting at 2^16 times the original frequency.
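The arithmetic behind that estimate can be written down as a small sketch. The helper names are hypothetical, and it assumes a steady event rate and counters that start from zero with no preloaded sampling period.

```c
#include <stdint.h>

/* Events between overflow interrupts for a counter of the given width,
 * assuming it counts up from zero. */
static uint64_t events_per_overflow(unsigned width_bits)
{
    return (uint64_t)1 << width_bits;
}

/* Overflow interrupts per second at a steady event rate. */
static double overflow_irqs_per_sec(double events_per_sec,
                                    unsigned width_bits)
{
    return events_per_sec / (double)events_per_overflow(width_bits);
}
```

At an assumed rate of 1e9 events per second, a 16-bit counter would overflow roughly 15,000 times per second, while a 32-bit counter would overflow about once every four seconds.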

2) Each Cell processor includes SMT support (simultaneous multi-threading, basically the same as Intel hyperthreading), but only has one PMU. Unfortunately, the PMU can't be shared between two threads running simultaneously on the same CPU (unlike Intel Pentium4, which splits up the counters between the two threads when HT is enabled). Are there any mechanisms yet to prevent scheduling two threads that both want to use the PMU on the same physical CPU? This problem will also exist on POWER4/5/6.

The Pentium 4 doesn't split the counters between logical processors. The OProfile driver splits them to make the configuration easier to do. Could a similar trick be done in the Cell driver, or would the registers span logical processors? Or does a single configuration register span multiple counters? Some of the events that the P4 has cannot be attributed to a particular logical processor; for things like floating point operations, the counter cannot tell which logical processor the event came from. This makes it very difficult to count on a per-process basis. Does the Cell processor have a similar design "feature"?

-Will
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
