On Sun March 25 2007 11:17 am, William Cohen wrote:
> Kevin Corry wrote:
> > 1) Perfmon seems to have an implicit assumption that a PMU's counters are
> > a fixed width. Specifically, the "pfm_pmu_config" structure has
> > a "counter_width" field that applies to the whole PMU. However, Cell
> > provides four 32-bit counters, and each of those can independently be
> > configured as two 16-bit counters. So I'm curious if it will be possible
> > to support this capability within Perfmon, especially regarding the
> > 64-bit counter virtualization. Do you know of any other platforms that
> > have variable-width counters?
>
> One of the problems that OProfile has is some people setting the interval
> between samples to be too small, causing too much overhead because the
> interrupt routine is called each time the counter overflows. Perfmon
> accumulates the overflow of the performance counter. Having 16-bit counters
> would imply an interrupt every 65536 events for that counter. Are 16-bit
> counters going to present too much overhead?
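
To put rough numbers on the overflow question above, here is a minimal
sketch of 64-bit virtualization with a per-counter width instead of one
PMU-wide "counter_width". The struct and function names are hypothetical
and are not taken from the Perfmon sources; the sketch also assumes the
counter starts at zero and interrupts only when it wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-counter state; illustrative names, not Perfmon's. */
struct virt_counter {
    unsigned width;     /* hardware width in bits, e.g. 16 or 32 on Cell */
    uint64_t soft_high; /* events accumulated in software on each overflow */
};

/* Events between two overflow interrupts for a counter of this width. */
uint64_t overflow_period(unsigned width)
{
    assert(width >= 1 && width <= 32);
    return (uint64_t)1 << width;
}

/* Would be called from the (hypothetical) overflow interrupt handler. */
void on_overflow(struct virt_counter *c)
{
    c->soft_high += overflow_period(c->width);
}

/* 64-bit virtual value: software part plus the live hardware bits. */
uint64_t read_virtual(const struct virt_counter *c, uint32_t hw_value)
{
    uint64_t mask = overflow_period(c->width) - 1;
    return c->soft_high + (hw_value & mask);
}

/* Whole overflow interrupts per second at a given event rate. */
uint64_t interrupts_per_second(unsigned width, uint64_t events_per_second)
{
    return events_per_second / overflow_period(width);
}
```

With an assumed rate of 3.2 billion events per second (an illustrative
figure, for example one event per cycle), interrupts_per_second(16, ...)
comes to roughly 48,800 interrupts per second, while the same event on a
32-bit counter overflows less than once per second.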

It certainly could. That's definitely one of my concerns with making the
16-bit counters available and allowing Perfmon to do 64-bit virtualization
with them.

> > 2) Each Cell processor includes SMT support (simultaneous multi-threading,
> > basically the same as Intel hyperthreading), but only has one PMU.
> > Unfortunately, the PMU can't be shared between two threads running
> > simultaneously on the same CPU (unlike Intel Pentium4, which splits up
> > the counters between the two threads when HT is enabled). Are there any
> > mechanisms yet to prevent scheduling two threads that both want to use
> > the PMU on the same physical CPU? This problem will also exist on
> > POWER4/5/6.
>
> The Pentium 4 doesn't split the counters between logical processors. The
> OProfile driver splits them to make the configuration easier to do.

Ok. I had thought that the split was part of the PMU design, but maybe
that's just how Perfmon has decided to implement it.

> Could a
> similar trick be done in the Cell driver, would the registers span logical
> processors? Or does a single configuration register span multiple counters?

Each of the eight counters has a control register, but there are also a
number of "global" control registers that are "shared" by all the counters.
I'll have to go through each one and think about whether it could be
"shared" between two different processes on the two logical CPUs, but my
initial impression is that it won't work very well. At the very least, the
hardware sampling feature (much like PEBS on P4) would definitely only be
useable by one of the logical CPUs, since there's only one trace-buffer and
one interval-timer.

> Some of the events that the P4 has cannot be attributed to a particular
> logical processor. Thus for things like floating point operations the
> counter cannot attribute the event to a particular logical processor. This
> makes it very difficult to count on a per process basis.
> Does the Cell processor have a similar design "feature"?

The events on Cell are grouped together by logical units within the full
processor. The events for the PPU are almost all specific to one logical
CPU. But all the other groups (PPU Storage Subsystem, the SPUs, the Memory
Flow Controllers, the Element Interconnect Bus, etc.) don't seem to
distinguish between logical CPUs. For instance, the Cell PMU would not be
able to know which logical CPU is associated with an event that occurred on
one of the eight SPUs. So, yes, we have similar problems on Cell as we do
on P4.

--
Kevin Corry
[EMAIL PROTECTED]
http://www.ibm.com/linux/

_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
