> Counting requires interrupt to virtualize the counters to 64-bit. They are 48 bits if I recall.
>
Yes, they're 48-bit, but why do you need an interrupt to virtualize them? With a multi-tasking OS, you just do it at context switch.
I still think it will be important to count in both domains at once...
- d
> -----Original Message-----
> From: stephane eranian [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2008 12:54 PM
> To: Dan Terpstra
> Cc: perfmon2-devel
> Subject: Re: [perfmon2] Intel Core i7 specs available.
>
> Dan,
>
> On Tue, Dec 2, 2008 at 5:10 PM, Dan Terpstra <[EMAIL PROTECTED]> wrote:
> >
> > Are you referring to the Uncore Address/Opcode Match stuff (18.17.2.3)? I saw that, but wasn't quite sure how to use it. I didn't see anything in the PEBS stuff that looked like Data EAR. Or is this part of the Load Latency stuff that's described in (18.17.1.2)? Looks like part of the latency stuff includes a Data Address.
> >>
> No, I was indeed referring to offcore (which is different from uncore). Yes, that's the load latency PEBS I was talking about. It does give you the cache miss information similar to Itanium D-EAR: you get instr and data addresses, latency, and source of the data, in addition to the machine state, which is quite nice.
>
> >> You missed one thing, however, the offcore_response feature. That one is tricky because it uses a register that is shared per core (if I recall).
> >> Perfmon handles offcore_response similarly to what is going on with AMD northbridge event.
> >> It enforces some form of mutual exclusion.
> >>
> > Yes, the off-core response stuff can be coded into any of the generic registers on any core, but it shares a single common configuration register. Exclusion logic for this guy could be fun. It looks like this takes the place of the SELF/ANY modifiers used in earlier Core architectures for events that probed shared cache?
> >>
> Well, yes this is tricky. The current code does the following:
> - only one system-wide session per physical core (each physical core has 2 threads)
> - only one per-thread session across the entire system (otherwise you have problems in case of migration).
>
> > Who owns the system-wide session? First-come first-served? Can it be any thread or must it be a specific core? And if you restrict access to counting (calipers), couldn't you do per-thread access without worrying about overflow?
> >>
> For uncore, the first system-wide session which asks for it gets it.
> It can be coming from any core/thread on the socket.
>
> >> > I'm not sure that uncore counters should be restricted to system-wide counting only; I think it could be quite useful, as Phil described for SiCortex, to measure "what's happening to this shared resource while I'm active". That's not unlike Component PAPI measuring network activity on a
> >> >
> Counting requires interrupt to virtualize the counters to 64-bit. They are 48 bits if I recall.
>
> >
> >> No, this is not yet supported. I think on x86, this is not that far off.
> >>
> > Could you do first-person monitoring in a parent thread and spawn a daughter thread to measure uncore stuff? Or maybe even fork a new process?
> >>
> No, this is currently restricted to system-wide sessions only.

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel