Will, Kevin,

On Sun, Mar 25, 2007 at 12:17:17PM -0400, William Cohen wrote:
>
> >1) Perfmon seems to have an implicit assumption that a PMU's counters are
> >a fixed width. Specifically, the "pfm_pmu_config" structure has
> >a "counter_width" field that applies to the whole PMU. However, Cell
> >provides four 32-bit counters, and each of those can independently be
> >configured as two 16-bit counters. So I'm curious if it will be possible
> >to support this capability within Perfmon, especially regarding the 64-bit
> >counter virtualization. Do you know of any other platforms that have
> >variable-width counters?
>
> One of the problems that OProfile has is some people setting the interval
> between samples to be too small, causing too much overhead because the
> interrupt routine is called each time the counter overflows. Perfmon
> accumulates the overflow of the performance counter. Having 16-bit counters
> would imply an interrupt every 65536 events for that counter. Are 16-bit
> counters going to present too much overhead?
>

16 bits is very small for a counter, though how much it hurts depends on
the clock speed and on how frequently the measured events occur. I think
32-bit is more reasonable. You have overhead when you are sampling, but
also when you are counting, because of the 64-bit emulation. Note that
when sampling, perfmon records a sample only when the 64-bit counter
overflows, not when the 32-bit or 16-bit counter overflows, so you can
have a sampling period larger than what the hardware can provide.

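To make the 64-bit emulation concrete, here is a toy Python model of it.
This is only a sketch, not perfmon code: the class name VirtualCounter,
its fields, and the re-arm logic are assumptions made for illustration.
It loads the counter with minus the sampling period, takes one interrupt
per hardware overflow, and records a sample only on 64-bit overflow.

```python
MASK64 = (1 << 64) - 1


class VirtualCounter:
    """Toy model: 64-bit software counter over a narrow hardware counter."""

    def __init__(self, hw_width, period):
        self.hw_mask = (1 << hw_width) - 1
        self.period = period   # sampling period in events; may exceed 2**hw_width
        self.interrupts = 0    # hardware overflow interrupts taken
        self.samples = 0       # 64-bit overflows, i.e. samples recorded
        self._arm()

    def _arm(self):
        # Load the 64-bit value -period: the low bits go to the hardware
        # counter, the remaining upper bits are kept in software.
        value = (-self.period) & MASK64
        self.hw = value & self.hw_mask
        self.soft = value - self.hw

    def count(self, events):
        """Feed `events` occurrences of the measured event into the counter."""
        while events > 0:
            to_overflow = (self.hw_mask + 1) - self.hw
            if events < to_overflow:
                self.hw += events
                return
            # The hardware counter wraps: one interrupt, carry into software.
            events -= to_overflow
            self.hw = 0
            self.interrupts += 1
            self.soft += self.hw_mask + 1
            if self.soft > MASK64:
                # The 64-bit counter overflowed: record a sample and re-arm.
                self.samples += 1
                self._arm()


if __name__ == "__main__":
    for width in (16, 32):
        c = VirtualCounter(hw_width=width, period=1_000_000)
        c.count(3_000_000)
        print(f"{width}-bit hardware counter: "
              f"{c.interrupts} interrupts, {c.samples} samples")
```

In this model, a period of 1,000,000 events costs 16 interrupts per sample
with a 16-bit hardware counter and one interrupt per sample with a 32-bit
counter, while the number of samples is the same in both cases.
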
> I can see why one might want to use the 16-bit counters if it doubles the
> number of events available. However, going from 4 32-bit counters to 8
> 16-bit counters is going to possibly cause a lot more interrupts. Double
> the number of interrupt sources and each interrupt at 2^16 times the
> original frequency.

I would go with 4 32-bit counters. Yet I wonder whether there are event
restrictions that would make this difficult to hardcode, for instance if
event A can only be measured on counter C, which is only "visible" in the
8-counter configuration.

>
> >2) Each Cell processor includes SMT support (symmetric multi-threading,
> >basically the same as Intel hyperthreading), but only has one PMU.
> >Unfortunately, the PMU can't be shared between two threads running
> >simultaneously on the same CPU (unlike Intel Pentium4, which splits up the
> >counters between the two threads when HT is enabled). Are there any
> >mechanisms yet to prevent scheduling two threads that both want to use the
> >PMU on the same physical CPU? This problem will also exist on POWER4/5/6.
>
> The Pentium 4 doesn't split the counters between logical processors. The
> OProfile driver splits them to make the configuration easier to do. Could a
> similar trick be done in the Cell driver, would the registers span logical
> processors? Or does a single configuration register span multiple counters?
> Some of the events that the P4 has cannot be attributed to a particular
> logical processor. Thus for things like floating point operations the
> counter cannot attribute the event to a particular logical processor. This
> makes it very difficult to count on a per-process basis. Does the Cell
> processor have a similar design "feature"?
>

In perfmon, the 18 Pentium 4 counters are split in half when HT is on.
Applications only see the first 9 counters. Under the covers, the counters
are installed on either the first half for thread 0 or the second half for
thread 1.

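The split can be pictured with a small Python sketch. It is not the
perfmon implementation: the function name physical_counter and its
parameters are hypothetical, and exposing all 18 counters when HT is off
is an assumption drawn from "split in half when HT is on".

```python
P4_COUNTERS = 18                 # total counters, as stated above
HT_VISIBLE = P4_COUNTERS // 2    # counters an application sees with HT on


def physical_counter(virtual_index, ht_thread, ht_enabled=True):
    """Map the counter index an application uses to a physical counter.

    Hypothetical helper: thread 0 gets the first half of the counters,
    thread 1 the second half. With HT off, indices map one to one
    (an assumption, see the text above).
    """
    if not ht_enabled:
        if not 0 <= virtual_index < P4_COUNTERS:
            raise ValueError("counter index out of range")
        return virtual_index
    if ht_thread not in (0, 1):
        raise ValueError("ht_thread must be 0 or 1")
    if not 0 <= virtual_index < HT_VISIBLE:
        raise ValueError("only the first 9 counters are visible with HT on")
    return virtual_index + ht_thread * HT_VISIBLE


if __name__ == "__main__":
    for thread in (0, 1):
        mapping = [physical_counter(i, thread) for i in range(HT_VISIBLE)]
        print(f"thread {thread}: virtual 0..8 -> physical {mapping}")
```

With this layout, both threads program "counter 3", but thread 0 ends up
on physical counter 3 and thread 1 on physical counter 12, so they never
collide.
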
--
-Stephane
