Will, Kevin,

On Sun, Mar 25, 2007 at 12:17:17PM -0400, William Cohen wrote:
> 
> >1) Perfmon seems to have an implicit assumption that a PMU's counters are 
> >a fixed width. Specifically, the "pfm_pmu_config" structure has 
> >a "counter_width" field that applies to the whole PMU. However, Cell 
> >provides four 32-bit counters, and each of those can independently be 
> >configured as two 16-bit counters. So I'm curious if it will be possible 
> >to support this capability within Perfmon, especially regarding the 64-bit 
> >counter virtualization. Do you know of any other platforms that have 
> >variable-width counters?
> 
> One of the problems that OProfile has is some people setting the interval 
> between samples too small, causing too much overhead because the 
> interrupt routine is called each time the counter overflows. Perfmon 
> accumulates the overflow of the performance counter. Having 16-bit counters 
> would imply an interrupt every 65536 events for that counter. Are 16-bit 
> counters going to present too much overhead?
> 
16 bits is very small for a counter, though it depends on the clock speed and
the frequency of occurrence of the measured events. I think 32-bit is more
reasonable. You have overhead when you are sampling but also when you are
counting, because of the 64-bit emulation. Note that when sampling, perfmon will
record a sample only when the 64-bit counter overflows, not when the 32-bit
or 16-bit counter overflows, so you can have a sampling period bigger than
what the hardware can provide.
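
To illustrate the 64-bit emulation, here is a minimal simulation sketch. It is
not the actual perfmon code: the struct and the vc_arm()/vc_count() helpers are
hypothetical names, and the hardware counter is simulated in software. The idea
is that the virtual counter starts at minus the sampling period, the hardware
holds the low bits, and each hardware overflow carries into the software bits.
A sample is recorded only when the full 64-bit value wraps.

```c
#include <stdint.h>

/* Hypothetical model of one virtualized counter (names are illustrative). */
struct virt_counter {
    unsigned width;          /* hardware counter width in bits (16 or 32) */
    uint64_t soft;           /* upper bits, maintained in software */
    uint64_t hw;             /* simulated hardware value, always < 2^width */
    uint64_t hw_interrupts;  /* number of hardware overflow interrupts */
    uint64_t samples;        /* number of 64-bit overflows (samples) */
};

static uint64_t hw_mask(const struct virt_counter *c)
{
    return (UINT64_C(1) << c->width) - 1;
}

/* Arm the counter so the 64-bit value wraps after `period` events. */
static void vc_arm(struct virt_counter *c, unsigned width, uint64_t period)
{
    uint64_t start = (uint64_t)0 - period;   /* 2^64 - period */

    c->width = width;
    c->hw = start & hw_mask(c);              /* low bits go to the hardware */
    c->soft = start & ~hw_mask(c);           /* the rest stays in software */
    c->hw_interrupts = 0;
    c->samples = 0;
}

/* Simulate `n` occurrences of the measured event. */
static void vc_count(struct virt_counter *c, uint64_t n)
{
    while (n > 0) {
        uint64_t room = hw_mask(c) + 1 - c->hw;  /* events until hw overflow */

        if (n < room) {
            c->hw += n;
            return;
        }
        n -= room;
        c->hw = 0;
        c->hw_interrupts++;              /* overhead is paid here every time */
        c->soft += hw_mask(c) + 1;       /* carry into the software bits */
        if (c->soft == 0)                /* the 64-bit value wrapped */
            c->samples++;
    }
}
```

With a 16-bit width and a period of 100000 events, this model takes two
hardware interrupts but records a single sample. With a 32-bit width, the same
period costs one interrupt.
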

> I can see why one might want to use the 16-bit counters if it doubles the 
> number of events available. However, going from 4 32-bit counters to 8 
> 16-bit counters could cause a lot more interrupts: double the number of 
> interrupt sources, with each one interrupting at 2^16 times the original 
> frequency.
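
To put a number on the 2^16 factor, here is a small arithmetic sketch. The
function name is hypothetical and it assumes a free-running counter that
overflows once every 2^width events, nothing specific to Cell.

```c
#include <stdint.h>

/*
 * Overflow interrupts per second for one free-running counter of
 * `width_bits` bits (assumed < 64) counting `events_per_sec` events/s.
 */
static double overflow_irq_rate(double events_per_sec, unsigned width_bits)
{
    return events_per_sec / (double)(UINT64_C(1) << width_bits);
}
```

For the same event rate, overflow_irq_rate(rate, 16) is 65536 times
overflow_irq_rate(rate, 32).
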

I would go with 4 32-bit counters. Yet I am wondering whether there are
event restrictions that would make this difficult to hardcode. For instance,
event A might only be measurable on counter C, which is only "visible" in the
8-counter configuration.

> 
> >2) Each Cell processor includes SMT support (simultaneous multi-threading, 
> >basically the same as Intel hyperthreading), but only has one PMU. 
> >Unfortunately, the PMU can't be shared between two threads running 
> >simultaneously on the same CPU (unlike Intel Pentium4, which splits up the 
> >counters between the two threads when HT is enabled). Are there any 
> >mechanisms yet to prevent scheduling two threads that both want to use the 
> >PMU on the same physical CPU? This problem will also exist on POWER4/5/6.
> 
> The Pentium 4 doesn't split the counters between logical processors. The 
> OProfile driver splits them to make the configuration easier to do. Could a 
> similar trick be done in the Cell driver, or would the registers span logical 
> processors? Or does a single configuration register span multiple counters? 
> Some of the events that the P4 has cannot be attributed to a particular 
> logical processor. Thus for things like floating point operations the 
> counter cannot attribute the event to a particular logical processor. This 
> makes it very difficult to count on a per-process basis. Does the Cell 
> processor have a similar design "feature"?
> 
In perfmon, the 18 Pentium 4 counters are split in half when HT is on.
Applications only see the first 9 counters. Under the covers, the counters are
installed on either the first half for thread 0 or the second half for thread 1.
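
As a rough sketch of that mapping (the macro and function names are
hypothetical, not the actual perfmon P4 code), the translation from the
counter index an application sees to the physical counter could look like:

```c
#define P4_NUM_COUNTERS        18
#define P4_COUNTERS_PER_THREAD (P4_NUM_COUNTERS / 2)

/*
 * Map the counter index seen by the application (0..8) to a physical
 * counter index, given the HT thread (0 or 1) it runs on.
 * Returns -1 if either argument is out of range.
 */
static int p4_phys_counter(int ht_thread, int virt_idx)
{
    if (ht_thread < 0 || ht_thread > 1)
        return -1;
    if (virt_idx < 0 || virt_idx >= P4_COUNTERS_PER_THREAD)
        return -1;
    /* thread 0 uses the first half, thread 1 the second half */
    return ht_thread * P4_COUNTERS_PER_THREAD + virt_idx;
}
```
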

-- 
-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
