Carl -
Based on your description below, it sounds like the trace buffer *does* make
the counters wider, but at a cost. You reduce the interrupt frequency by a
factor of 2^10 (roughly 10^3) and pay the price by summing the 1024 values
from the trace into a 64-bit virtual counter. 1024 adds are probably a lot
more efficient than 1024 interrupts. Consider adding 1023 '1's: the result is
exactly 10 bits wide. Consider adding 1023 '65535's: the result is exactly
26 bits wide. That's 10 extra bits of dynamic range, and roughly 10^3 fewer
interrupts.
You're right that sampling would still be restricted to the actual size of
the physical counter, but that's the same restriction as before. Seems to me
this could make virtualization of 16-bit counters *less* expensive.
I'm probably missing other hardware details that make this approach
impractical, but on the surface it could work.
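A minimal C sketch of the summing described above, mainly to check the arithmetic. It assumes, hypothetically, 16-bit counters and that each trace entry holds the count accumulated during one interval (the premise of summing the entries); `trace_sum`, `bit_width` and `TRACE_ENTRIES` are illustrative names, not Cell or perfmon APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_ENTRIES 1024 /* trace array size mentioned in the thread */

/* Hypothetical: fold the per-interval counts stored in the trace array
 * into a software-maintained 64-bit virtual counter. */
static uint64_t trace_sum(const uint16_t *trace, size_t n)
{
    assert(n <= TRACE_ENTRIES);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += trace[i]; /* one add per entry instead of one interrupt */
    return total;
}

/* Number of bits needed to represent v (0 for v == 0). */
static unsigned bit_width(uint64_t v)
{
    unsigned bits = 0;
    while (v) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```

Summing all 1024 entries of 65535 gives 67,107,840, which still fits in 26 bits (2^26 = 67,108,864), so the 10-bit gain also holds for a completely full trace array.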

BTW, glad to hear about the debugger stuff.

- dan

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:perfmon-
> [EMAIL PROTECTED] On Behalf Of Carl Love
> Sent: Tuesday, March 27, 2007 4:56 PM
> To: Dan Terpstra; [EMAIL PROTECTED]
> Subject: RE: [perfmon] Cell port for Perfmon
> 
> Dan:
> 
> The trace buffer itself doesn't allow you to make the counters wider.
> It gives you a place to store the counts periodically.  This is used to
> create a histogram of how the counts accumulated.  As you said, you
> could then sum the stored values from the trace buffer in a 64-bit
> variable to present a count that was more than 32 or 16 bits.  I suspect
> the overhead of trying to do this would make the virtual 64-bit counter
> support expensive.  Also, it would not be possible to generate
> interrupts when the counters were full.  This is needed for sampling.  I
> haven't thought this through completely, but I think trying to do virtual
> counters with the trace buffer would not be very clean or efficient.
> 
> The debugger they talk about is a hardware-level debug facility.
> Basically, the debug bus that routes the performance counter signals
> from the islands to the performance counters can also route internal
> hardware signals to a debug port, where you then connect a hardware
> debugger (logic analyzer).  This debugger is independent of software
> debuggers such as GDB.  At any given time you can enable either the
> performance counters or the hardware debug facility, but not both.
> 
>              Carl Love
> 
> 
> On Tue, 2007-03-27 at 15:36 -0400, Dan Terpstra wrote:
> > Sorry for being so late to this conversation, and for being naïve about
> > the Cell implementation. My reading of the May 2006 BE Handbook suggested
> > that counter values are (or can be) automatically stored to the
> > 1024-entry trace array on interval timer timeout, and that an interrupt
> > can be generated when the trace array is full. Could this feature be used
> > to increase the effective width of the counters by 10 (2^10 = 1024) bits?
> > This could reduce interrupt handling significantly, but would require
> > summing the values across the trace array.
> > Also, there are repeated warnings that the counter logic and the debug
> > logic share the same hardware. Does this imply that the debugger dies if
> > the counters are in use? Or that the debugger stomps on Perfmon? Will it
> > be possible to use debuggers and Perfmon simultaneously?
> > Curiosity killed the cat...
> > - d
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:perfmon-
> > > [EMAIL PROTECTED] On Behalf Of Carl Love
> > > Sent: Monday, March 26, 2007 8:33 PM
> > > To: [EMAIL PROTECTED]
> > > Cc: [EMAIL PROTECTED]; William Cohen; Carl Love; Kevin Corry;
> > > Philip Mucci
> > > Subject: Re: [perfmon] Cell port for Perfmon
> > >
> > > Stephane:
> > >
> > > Right, sorry, I put the wrong variable in my message.  I think the key
> > > thing is that the Cell register holding the mask of overflowed counters
> > > is read and stored in set->povfl_pmds.  The hardware automatically
> > > clears the bits in the Cell register as a side effect of reading it.
> > > povfl_pmds is then used in the overflow handler to process all of the
> > > registers.
> > >
> > > If we had to read the Cell register in a loop to check whether pmd
> > > register i had overflowed, we would have a problem: the overflow bits
> > > would already have been cleared when the first pmd register was
> > > processed.  So I think the architecture of the code will work given the
> > > underlying hardware design.  On Cell, we will not have to do anything
> > > to clear the interrupt mask.
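The read-to-clear behavior Carl describes can be modelled in a few lines of C. This is a hypothetical simulation (`struct rtc_reg` and both handler functions are made up for illustration; they are not the Cell driver or perfmon2 code): the first handler reads the register once and walks the snapshot, the role set->povfl_pmds plays, while the second re-reads it for every counter and so loses every bit after the first read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a read-to-clear overflow status register. */
struct rtc_reg {
    uint32_t bits;
};

/* Reading returns the pending bits and clears them as a side effect. */
static uint32_t rtc_read(struct rtc_reg *r)
{
    uint32_t v = r->bits;
    r->bits = 0;
    return v;
}

/* Correct pattern: read once, keep the snapshot, then walk it.
 * Returns the number of overflowed counters processed. */
static unsigned handle_snapshot(struct rtc_reg *r, unsigned ncounters)
{
    assert(ncounters <= 32);
    uint32_t povfl = rtc_read(r);
    unsigned handled = 0;
    for (unsigned i = 0; i < ncounters; i++)
        if (povfl & (UINT32_C(1) << i))
            handled++; /* process counter i here */
    return handled;
}

/* Broken pattern: re-read the register for every counter.  The first
 * read clears all bits, so later counters are never seen. */
static unsigned handle_reread(struct rtc_reg *r, unsigned ncounters)
{
    assert(ncounters <= 32);
    unsigned handled = 0;
    for (unsigned i = 0; i < ncounters; i++)
        if (rtc_read(r) & (UINT32_C(1) << i))
            handled++;
    return handled;
}
```

With two counters pending (bits 0 and 2), the snapshot handler processes both, while the re-reading handler sees only counter 0.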
> > >
> > >          Carl Love
> > >
> > >
> > > On Mon, 2007-03-26 at 15:19 -0800, Stephane Eranian wrote:
> > > > Carl,
> > > >
> > > > On Mon, Mar 26, 2007 at 04:05:15PM -0800, Carl Love wrote:
> > > > > If I read the overflow code correctly, the mask of the registers that
> > > > > overflowed is stored in set->reset_pmds before the overflow handler is
> > > > > called.  Then the overflow handler processes all of the registers in a
> > > > > loop.  It then determines whether there were any 64-bit counter
> > > > > overflows or whether the overflow was simply an overflow of the
> > > > > smaller HW counter register.  From what I see so far, it seems like
> > > > > the Cell interrupt enable/overflow reporting should work ok within the
> > > > > perfmon2 code structure.
> > > > >
> > > > Not quite. Upon entering the interrupt handler, the PMU is frozen
> > > > and a bitmask of overflowed counters is constructed in an arch-specific
> > > > fashion. On IA-64 (like Cell), it's just a matter of reading a control
> > > > register. On i386, there is no overflow mask; you need to inspect all
> > > > used counters and check their values. The collected information is
> > > > stored in set->povfl_pmds and set->npend_ovfls. The worst processor is
> > > > the P4, because to freeze the PMU you need to clear the control
> > > > register, which also holds the overflow bit (OVF).
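For the i386 case, where the PMU supplies no overflow mask, one common way to build one in software is to check each used counter's top bit, assuming (as on x86) that counters are loaded with negative values so the top bit stays set until the counter wraps past zero. This is a hypothetical sketch: `build_ovfl_mask` is an invented name, and the counter array stands in for reading the real counter registers.

```c
#include <assert.h>
#include <stdint.h>

/* Build an overflow bitmask by inspecting counter values.  Assumption:
 * each used counter was programmed with a negative value, so its top
 * bit (bit width-1) is set until the counter wraps past zero. */
static uint64_t build_ovfl_mask(const uint64_t *counters, uint64_t used,
                                unsigned nctrs, unsigned width)
{
    assert(width >= 1 && width <= 64);
    assert(nctrs <= 64);
    uint64_t top_bit = UINT64_C(1) << (width - 1);
    uint64_t povfl = 0;
    for (unsigned i = 0; i < nctrs; i++) {
        if (!(used & (UINT64_C(1) << i)))
            continue;                    /* counter not in use: skip */
        if (!(counters[i] & top_bit))
            povfl |= UINT64_C(1) << i;   /* top bit clear: it wrapped */
    }
    return povfl;
}
```

The cost Stephane points out is visible here: every used counter must be read on every interrupt, whereas on IA-64 or Cell a single register read yields the mask.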
> > > >
> > > > The interrupt handler then scans povfl_pmds to update the 64-bit
> > > > software-maintained counter values. If it detects a 64-bit overflow,
> > > > it records a sample and/or notifies user level. Otherwise,
> > > > execution resumes.
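A minimal sketch of how a 64-bit software counter can sit on top of a narrower hardware counter: the hardware holds the low `width` bits, software holds the rest, and each hardware wrap adds one full hardware period to the software part. `struct vpmd` and its helpers are hypothetical names for illustration; this is not perfmon2's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software-maintained 64-bit counter over a width-bit
 * hardware counter. */
struct vpmd {
    uint64_t sw_value; /* upper bits; low `width` bits always zero */
    unsigned width;    /* hardware counter width, 1..63 in this sketch */
};

static uint64_t vpmd_mask(const struct vpmd *p)
{
    assert(p->width >= 1 && p->width <= 63);
    return (UINT64_C(1) << p->width) - 1;
}

/* Split a 64-bit value (e.g. -period for sampling) between software and
 * hardware; returns the value to load into the hardware counter. */
static uint64_t vpmd_write(struct vpmd *p, uint64_t value)
{
    uint64_t mask = vpmd_mask(p);
    p->sw_value = value & ~mask;
    return value & mask;
}

/* Called for each counter found in povfl_pmds.  Returns true when the
 * 64-bit value itself overflowed, i.e. when a sample should be
 * recorded and/or user level notified. */
static bool vpmd_hw_overflow(struct vpmd *p)
{
    uint64_t old_val = p->sw_value;
    p->sw_value += vpmd_mask(p) + 1; /* one full hardware period */
    return p->sw_value < old_val;    /* unsigned wrap: 64-bit overflow */
}

/* Full 64-bit count given the current hardware counter contents. */
static uint64_t vpmd_read(const struct vpmd *p, uint64_t hw)
{
    return p->sw_value + (hw & vpmd_mask(p));
}
```

Loading the 64-bit value with -period makes the 64-bit overflow, and therefore the sample, happen after exactly `period` events; every earlier hardware wrap only updates the software part and execution resumes.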
> > > >
> > > > Upon leaving the interrupt handler, the PMU is unfrozen unless
> > > > the sampling buffer became full, in which case (with the default
> > > > format) monitoring remains stopped.
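The exit rule described above (unfreeze unless the sampling buffer filled up) can be sketched as follows. The buffer layout, the tiny capacity and the function name are assumptions made for illustration; a real default-format buffer stores much richer sample records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SMPL_CAPACITY 4 /* hypothetical; kept tiny for illustration */

/* Hypothetical stand-in for a default-format sampling buffer. */
struct smpl_state {
    uint64_t samples[SMPL_CAPACITY]; /* here: index of the counter that fired */
    size_t count;
    bool frozen;
};

/* Freeze the PMU, record one sample per counter in `ovfl64` (counters
 * whose 64-bit value overflowed) until the buffer is full, then
 * unfreeze only if there is still room.  Returns true if monitoring
 * resumes. */
static bool pmu_interrupt(struct smpl_state *s, uint64_t ovfl64)
{
    assert(s->count <= SMPL_CAPACITY);
    s->frozen = true;                     /* PMU frozen on entry */
    for (unsigned i = 0; i < 64 && s->count < SMPL_CAPACITY; i++)
        if (ovfl64 & (UINT64_C(1) << i))
            s->samples[s->count++] = i;
    if (s->count < SMPL_CAPACITY)
        s->frozen = false;                /* room left: resume */
    return !s->frozen;                    /* full: monitoring stays stopped */
}
```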
> > > >
> > > > > On Mon, 2007-03-26 at 17:45 -0500, Kevin Corry wrote:
> > > > > > Hi Stephane,
> > > > > >
> > > > > > On Mon March 26 2007 5:30 pm, Stephane Eranian wrote:
> > > > > > > > > I think it should not be too much work to put the field
> > > > > > > > > within the description table. With a flag, high-level
> > > > > > > > > perfmon can just skip consulting this field and go with a
> > > > > > > > > default.
> > > > > > > >
> > > > > > > > Yeah, I had similar thoughts about how to support multiple
> > > > > > > > counter sizes. It should be relatively easy to add a
> > > > > > > > counter_size field to the pfm_pmd structure and consult that in
> > > > > > > > the overflow handling code.
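One hypothetical shape for the per-counter width discussed above: a description-table entry carries a counter_size field plus a flag, and the generic code falls back to a default width when the flag is clear. `struct pmd_desc`, `PFM_REG_HAS_WIDTH` and `pmd_ovfl_mask` are invented names for this sketch, not the perfmon2 definitions.

```c
#include <assert.h>
#include <stdint.h>

#define PFM_REG_HAS_WIDTH 0x1u /* hypothetical flag name */

/* Hypothetical per-register description entry; counter_size is only
 * consulted when the flag is set. */
struct pmd_desc {
    const char *name;
    unsigned flags;
    unsigned counter_size; /* bits; valid only with PFM_REG_HAS_WIDTH */
};

/* Width mask used when reading/writing the counter and when deciding
 * whether it overflowed. */
static uint64_t pmd_ovfl_mask(const struct pmd_desc *d, unsigned default_width)
{
    unsigned w = (d->flags & PFM_REG_HAS_WIDTH) ? d->counter_size
                                                : default_width;
    assert(w >= 1 && w <= 64);
    return w == 64 ? UINT64_MAX : (UINT64_C(1) << w) - 1;
}
```

Keeping the field optional means architectures with a single counter width never set the flag and behave as they do today.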
> > > > > > >
> > > > > > > Yes, that is one place where the mask is used. But it is also used
> > > > > > > when we write and read PMD registers (counters). I don't know how
> > > > > > > this works on Cell, but on x86, you need to set the upper bits of a
> > > > > > > counter for it to trigger the PMU interrupt on overflow. For that
> > > > > > > you also need to apply the counter width mask. The mask may also be
> > > > > > > used to determine which counter overflowed, unless Cell already
> > > > > > > provides a bitmask for that.
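The x86 point about setting the upper bits can be illustrated with a small simulation: to get an interrupt after `period` events, a width-bit counter is loaded with -period masked to its width, so it wraps to zero on exactly the `period`-th event. The function names are hypothetical, and the simulation just increments a masked value instead of touching hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering the low `width` bits of a hardware counter. */
static uint64_t width_mask(unsigned width)
{
    assert(width >= 1 && width <= 63);
    return (UINT64_C(1) << width) - 1;
}

/* Value to load so the counter wraps after exactly `period` events.
 * Negating sets the upper bits; masking keeps the value within the
 * counter's implemented width. */
static uint64_t load_value_for_period(uint64_t period, unsigned width)
{
    uint64_t mask = width_mask(width);
    assert(period >= 1 && period - 1 <= mask);
    return (UINT64_C(0) - period) & mask;
}

/* Simulate counting: how many events until a counter starting at
 * `start` wraps back to zero (where the overflow interrupt fires). */
static uint64_t events_until_wrap(uint64_t start, unsigned width)
{
    uint64_t mask = width_mask(width);
    uint64_t value = start & mask;
    uint64_t events = 0;
    do {
        value = (value + 1) & mask;
        events++;
    } while (value != 0);
    return events;
}
```

For a 16-bit counter and a period of 100, the load value is 65436, and the counter wraps after exactly 100 events.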
> > > > > >
> > > > > > Interesting, I didn't realize that. I had only worked with the
> > > > > > Pentium 4 previously, and it has the counter-overflow bit and the
> > > > > > interrupt-enable bit in the per-counter control registers (CCCRs).
> > > > > >
> > > > > > On Cell, there is one global pm_status control register that is used
> > > > > > to enable interrupts for each counter and to determine which counters
> > > > > > overflowed (along with some status bits related to the hardware
> > > > > > sampling feature).
> > > > > >
> > > > > > Hmmm... now that I take another glance at the Cell PMU docs, I see
> > > > > > that reading the pm_status register clears all the status bits and
> > > > > > resets the pending interrupts. This means that the overflow handler
> > > > > > may have to handle the overflow of multiple counters in one run (in
> > > > > > addition to dealing with hardware-sampling interrupts). I haven't
> > > > > > gone through Perfmon2's overflow interrupt handling enough to know
> > > > > > whether this will cause any problems. Any thoughts?
> > > > > >
> > > > > > Thanks,
> > > > >
> > > > > _______________________________________________
> > > > > perfmon mailing list
> > > > > [email protected]
> > > > > http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/
> > > >