> Counting requires interrupt to virtualize the counters to 64-bit. They are
> 48 bits if I recall.
>
Yes, they're 48-bit, but why do you need an interrupt to virtualize them? On a
multi-tasking OS, you can just do it at context switch. I still think it will
be important to count in both domains at once...
- d
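
A minimal sketch of the context-switch approach, assuming the hardware
counter is 48 bits wide (as recalled in this thread) and cannot wrap more than
once between two samples. At one event per cycle on a 3 GHz clock, a 48-bit
counter takes roughly a day to wrap. The class and method names are
hypothetical and are not part of perfmon2 or PAPI.

```python
COUNTER_BITS = 48                       # hardware width, as recalled in the thread
COUNTER_MASK = (1 << COUNTER_BITS) - 1
U64_MASK = (1 << 64) - 1


class VirtualCounter:
    """Software 64-bit view of a narrower hardware counter (hypothetical helper)."""

    def __init__(self, raw_start=0):
        self.last_raw = raw_start & COUNTER_MASK  # last hardware value sampled
        self.total = 0                            # accumulated 64-bit count

    def sample(self, raw_now):
        """Fold the current hardware value into the total, e.g. at context switch."""
        raw_now &= COUNTER_MASK
        # Modular subtraction yields the correct delta across at most one wrap.
        delta = (raw_now - self.last_raw) & COUNTER_MASK
        self.total = (self.total + delta) & U64_MASK
        self.last_raw = raw_now
        return self.total
```

Calling sample() at every context switch keeps the total correct without an
overflow interrupt, provided the assumption about wrap frequency holds. If a
task can stay scheduled long enough for the counter to wrap twice, an overflow
interrupt (or a periodic timer) is still needed.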

> -----Original Message-----
> From: stephane eranian [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2008 12:54 PM
> To: Dan Terpstra
> Cc: perfmon2-devel
> Subject: Re: [perfmon2] Intel Core i7 specs available.
> 
> Dan,
> 
> On Tue, Dec 2, 2008 at 5:10 PM, Dan Terpstra <[EMAIL PROTECTED]>
> wrote:
> >
> > Are you referring to the Uncore Address/Opcode Match stuff (18.17.2.3)? I
> > saw that, but wasn't quite sure how to use it. I didn't see anything in the
> > PEBS stuff that looked like Data EAR. Or is this part of the Load Latency
> > stuff that's described in (18.17.1.2)? Looks like part of the latency stuff
> > includes a Data Address.
> >>
> 
> No, I was indeed referring to offcore (which is different from uncore).
> Yes, that's the load latency PEBS I was talking about. It does give
> you the cache miss information similar to Itanium D-EAR, you get
> instr and data addresses, latency, source of the data in addition
> to the machine state which is quite nice.
> 
> >> You missed one thing, however: the offcore_response feature. That one
> >> is tricky because it uses a register that is shared per core (if I recall).
> >> Perfmon handles offcore_response similarly to what is going on with
> >> AMD northbridge events.
> >> It enforces some form of mutual exclusion.
> >>
> > Yes, the off-core response stuff can be coded into any of the generic
> > registers on any core, but it shares a single common configuration register.
> > Exclusion logic for this guy could be fun. It looks like this takes the
> > place of the SELF/ANY modifiers used in earlier Core architectures for
> > events that probed shared cache?
> >>
> Well, yes this is tricky. The current code does the following:
>    - only one system-wide session per physical core (each physical
>      core has 2 threads)
>    - only one per-thread session across the entire system (otherwise
>      you have problems in case of migration).
> 
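
The two rules listed above can be sketched as simple first-come, first-served
bookkeeping. This is only an illustration under stated assumptions, not the
perfmon2 implementation: the class and method names are hypothetical, and the
thread does not say how a per-thread session interacts with a system-wide
session on the same core, so that case is not modeled.

```python
class OffcoreArbiter:
    """Hypothetical bookkeeping for the two exclusion rules quoted above."""

    def __init__(self):
        self.system_wide_owner = {}   # physical core id -> owning session id
        self.per_thread_owner = None  # the single per-thread session, if any

    def reserve_system_wide(self, core_id, session_id):
        """Rule 1: at most one system-wide session per physical core."""
        if core_id in self.system_wide_owner:
            return False
        self.system_wide_owner[core_id] = session_id
        return True

    def reserve_per_thread(self, session_id):
        """Rule 2: at most one per-thread session across the entire system."""
        if self.per_thread_owner is not None:
            return False
        self.per_thread_owner = session_id
        return True

    def release(self, session_id):
        """Drop every reservation held by the given session."""
        if self.per_thread_owner == session_id:
            self.per_thread_owner = None
        self.system_wide_owner = {
            core: owner
            for core, owner in self.system_wide_owner.items()
            if owner != session_id
        }
```

A second request for the same physical core, or a second per-thread request,
is refused until the current owner releases its reservation.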
> > Who owns the system-wide session? First-come first-served? Can it be any
> > thread or must it be a specific core? And if you restrict access to counting
> > (calipers), couldn't you do per-thread access without worrying about
> > overflow?
> >>
> For uncore, the first system-wide session that asks for it gets it.
> It can come from any core/thread on the socket.
> 
> >> > I'm not sure that uncore counters should be restricted to system-wide
> >> > counting only; I think it could be quite useful, as Phil described for
> >> > SiCortex, to measure "what's happening to this shared resource while I'm
> >> > active". That's not unlike Component PAPI measuring network activity on a
> >>
> Counting requires interrupt to virtualize the counters to 64-bit. They are
> 48 bits if I recall.
> 
> 
> >> No, this is not yet supported. I think on x86, this is not that far off.
> >>
> > Could you do first-person monitoring in a parent thread and spawn a daughter
> > thread to measure uncore stuff? Or maybe even fork a new process?
> >>
> No, this is currently restricted to system-wide sessions only.

