Hi folks,
I think it should not be too much work to put the counter-width field
within the description table. With a flag, higher-level perfmon code can
simply skip consulting this field and use a default. I think having both
16-bit and 32-bit counters would be useful on the Cell, the 16-bit ones
specifically because one can allocate one counter per VPE. This
functionality is needed for more than the Cell, e.g. when supporting
off-chip counters that differ from those of the core.
Of course, we should certainly encourage folks to support at least 32
bits per counter...
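Here is a minimal C sketch of that idea. It assumes a hypothetical per-counter descriptor (`struct counter_desc`) with a hypothetical `PFM_CNT_HAS_WIDTH` flag. These names are illustrative and are not perfmon's actual data structures. A counter without the flag falls back to the PMU-wide default, so code that doesn't care about per-counter widths can keep working:

```c
#include <stddef.h>

/* Hypothetical flag: this counter overrides the PMU-wide default width. */
#define PFM_CNT_HAS_WIDTH 0x1u

/* Hypothetical per-counter entry in a PMU description table. */
struct counter_desc {
    const char *name;
    unsigned int flags;  /* PFM_CNT_HAS_WIDTH if 'width' is valid */
    unsigned int width;  /* counter width in bits */
};

/* Hypothetical PMU description: a default width plus the counter table. */
struct pmu_desc {
    unsigned int default_width;
    const struct counter_desc *counters;
    size_t num_counters;
};

/* Width of counter 'idx'. If the flag is not set, ignore the entry's
 * width field and use the PMU-wide default. */
static unsigned int counter_width(const struct pmu_desc *pmu, size_t idx)
{
    const struct counter_desc *c = &pmu->counters[idx];
    if (c->flags & PFM_CNT_HAS_WIDTH)
        return c->width;
    return pmu->default_width;
}

/* Example: one counter at the default width, one split into two 16-bit halves. */
static const struct counter_desc example_counters[] = {
    { "ctr0",    0,                 0  },  /* uses the default width */
    { "ctr1_hi", PFM_CNT_HAS_WIDTH, 16 },
    { "ctr1_lo", PFM_CNT_HAS_WIDTH, 16 },
};

static const struct pmu_desc example_pmu = {
    32, example_counters, sizeof example_counters / sizeof example_counters[0]
};
```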
With perfmon2, I believe I saw a mode in the code that splits the
counters between logical processors if requested... This should
probably be a module load-time option.
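A rough sketch of such a static split follows. It assumes every counter can be assigned to a hardware thread independently and that the split is fixed once at load time. `split_counters` and `counters_for_thread()` are hypothetical names and do not come from the perfmon2 implementation:

```c
#include <stdbool.h>

/* Contiguous block of counters owned by one hardware thread. */
struct counter_range {
    unsigned int first;  /* index of the first counter owned */
    unsigned int count;  /* number of counters owned */
};

/* Hypothetical load-time setting. In a kernel module this would be fixed
 * when the module is loaded rather than changed at runtime. */
static bool split_counters = true;

/* Counters available to hardware thread 'thread' (0 <= thread <
 * threads_per_core; the caller must guarantee this). When splitting,
 * leftover counters (num_counters % threads_per_core) are left unused. */
static struct counter_range counters_for_thread(unsigned int num_counters,
                                                unsigned int threads_per_core,
                                                unsigned int thread)
{
    struct counter_range r;
    if (!split_counters || threads_per_core <= 1) {
        /* No split: the thread sees every counter. */
        r.first = 0;
        r.count = num_counters;
        return r;
    }
    unsigned int per_thread = num_counters / threads_per_core;
    r.first = thread * per_thread;
    r.count = per_thread;
    return r;
}
```

With eight counters and two threads, thread 1 gets counters 4 through 7. With the option off, each thread sees all eight. In that case something else must stop the two threads from using the PMU at the same time.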
Phil
On Mar 25, 2007, at 6:17 PM, William Cohen wrote:
Kevin Corry wrote:
1) Perfmon seems to have an implicit assumption that a PMU's
counters are a fixed width. Specifically, the "pfm_pmu_config"
structure has a "counter_width" field that applies to the whole
PMU. However, Cell provides four 32-bit counters, and each of
those can independently be configured as two 16-bit counters. So
I'm curious if it will be possible to support this capability
within Perfmon, especially regarding the 64-bit counter
virtualization. Do you know of any other platforms that have
variable-width counters?
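To make the virtualization concern concrete, here is a simplified C sketch of 64-bit counter virtualization in which the hardware width is stored per counter rather than per PMU. It assumes that each hardware wrap raises an overflow interrupt and that the handler adds 2^width to a software accumulator. `struct virt_counter` and its helpers are hypothetical and are not perfmon's code:

```c
#include <stdint.h>

/* Software state for one virtualized counter (hypothetical layout). */
struct virt_counter {
    unsigned int width;  /* hardware width in bits, 1..63 (e.g. 16 or 32) */
    uint64_t soft;       /* overflow contributions accumulated in software */
};

/* Mask selecting the bits the hardware counter actually implements. */
static uint64_t width_mask(unsigned int width)
{
    return (UINT64_C(1) << width) - 1;
}

/* Called from the overflow interrupt: the hardware counter wrapped once,
 * i.e. 2^width events went by since the previous wrap. */
static void virt_counter_overflow(struct virt_counter *vc)
{
    vc->soft += width_mask(vc->width) + 1;
}

/* Full 64-bit value = software accumulator + current hardware bits. */
static uint64_t virt_counter_read(const struct virt_counter *vc, uint64_t hw)
{
    return vc->soft + (hw & width_mask(vc->width));
}
```

The only width-dependent step is the amount added per wrap. A single PMU-wide `counter_width` would add the wrong amount for the 16-bit halves.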
One of the problems that OProfile has is people setting the
interval between samples too small, causing too much overhead
because the interrupt routine is called each time the counter
overflows. Perfmon accumulates the overflows of the performance
counter. Having 16-bit counters would imply an interrupt every
65536 events for that counter. Are 16-bit counters going to present
too much overhead?
I can see why one might want to use the 16-bit counters if it
doubles the number of events available. However, going from four
32-bit counters to eight 16-bit counters could cause a lot
more interrupts: twice the number of interrupt sources, each
interrupting at 2^16 times the original frequency.
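The arithmetic behind that estimate can be shown in a small C sketch. It assumes free-running counters that all count at the same event rate and raise one interrupt per wrap. The function names are hypothetical:

```c
#include <stdint.h>

/* Overflow interrupts per second for one free-running counter of the
 * given width (1..63) counting 'events_per_sec' events. A w-bit counter
 * wraps once every 2^w events. */
static double overflow_irqs_per_sec(double events_per_sec, unsigned int width)
{
    return events_per_sec / (double)(UINT64_C(1) << width);
}

/* Total interrupt rate when 'num_counters' such counters run at once. */
static double total_irqs_per_sec(unsigned int num_counters,
                                 double events_per_sec, unsigned int width)
{
    return num_counters * overflow_irqs_per_sec(events_per_sec, width);
}
```

For any event rate, eight 16-bit counters produce (8 × 2^-16) / (4 × 2^-32) = 2^17 = 131072 times as many overflow interrupts as four 32-bit counters. At an illustrative 10^9 events/s per counter, that is roughly 122,000 interrupts/s versus fewer than one.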
2) Each Cell processor includes SMT support (simultaneous
multithreading, basically the same as Intel Hyper-Threading), but only
has one PMU. Unfortunately, the PMU can't be shared between two
threads running simultaneously on the same CPU (unlike the Intel
Pentium 4, which splits up the counters between the two threads
when HT is enabled). Are there any mechanisms yet to prevent
scheduling two threads that both want to use the PMU on the same
physical CPU? This problem will also exist on POWER4/5/6.
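Here is one possible mechanism, sketched in C11 under stated assumptions. Each physical core records which task owns its single PMU, and a task on the sibling hardware thread is refused until the owner releases it. This is hypothetical bookkeeping that a scheduler or context-switch hook could consult. It is not an existing perfmon interface, and the names are illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PMU_FREE (-1)

/* One shared PMU per physical core (hypothetical bookkeeping). */
struct core_pmu {
    atomic_int owner;  /* id of the task using the PMU, or PMU_FREE */
};

static void core_pmu_init(struct core_pmu *p)
{
    atomic_init(&p->owner, PMU_FREE);
}

/* Try to give 'task' exclusive use of the core's PMU. Returns false if a
 * different task (e.g. one on the sibling hardware thread) owns it. */
static bool core_pmu_claim(struct core_pmu *p, int task)
{
    int expected = PMU_FREE;
    if (atomic_compare_exchange_strong(&p->owner, &expected, task))
        return true;
    /* On failure 'expected' holds the current owner. */
    return expected == task;
}

/* Release the PMU, but only if 'task' is the current owner. */
static void core_pmu_release(struct core_pmu *p, int task)
{
    int expected = task;
    atomic_compare_exchange_strong(&p->owner, &expected, PMU_FREE);
}
```

A failed claim only detects the conflict. Actually avoiding it would still require the scheduler to not co-schedule the two tasks on sibling threads.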
The Pentium 4 doesn't split the counters between logical processors.
The OProfile driver splits them to make the configuration easier.
Could a similar trick be done in the Cell driver, or would the
registers span logical processors? Or does a single configuration
register span multiple counters? Some of the events that the P4
has cannot be attributed to a particular logical processor; for
things like floating-point operations, the counter cannot
attribute the event to a particular logical processor. This makes
it very difficult to count on a per-process basis. Does the Cell
processor have a similar design "feature"?
-Will
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/