Great! Thanks for the confirmation.
On Tue, Nov 3, 2009 at 10:22 AM, stephane eranian <eran...@googlemail.com> wrote:
> On Tue, Nov 3, 2009 at 4:07 PM, Carole Wu <cwu...@gmail.com> wrote:
> > Would it make sense to use --ld-lat-threshold=50 to collect the "true"
> > (more plausible) L3 misses?
> >
> Yes, it would capture the misses that most likely could not be overlapped
> with execution, i.e., the low-hanging fruit.
>
> >
> > --Carole
> >
> > On Tue, Nov 3, 2009 at 3:51 AM, stephane eranian <eran...@googlemail.com> wrote:
> >>
> >> On Tue, Nov 3, 2009 at 4:10 AM, Carole Wu <cwu...@gmail.com> wrote:
> >> > Hello,
> >> > In my PEBS-LL trace, collected with
> >> >   pfmon --smpl-module=pebs-ll -e MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD \
> >> >     --ld-lat-threshold=4 --long-smpl-periods=2000 --smpl-compact --with-header
> >> > I am seeing samples with latency 0x4 whose data source information
> >> > field says they are L3 misses.
> >> > However, on the Nehalem machine the latency is about 4 cycles for the
> >> > L1 I&D caches, about 11 cycles for the L2 cache, about 40 cycles for
> >> > the L3 cache, and >100 cycles when going off-chip.
> >> > Is this because the latency in the trace is "scaled" somehow?
> >>
> >> I have seen similar samples myself, but they should represent a tiny
> >> fraction of all samples. It is not clear where they come from: either a
> >> bug or some corner-case situation. I would ignore them if possible.
> >>
> >> > Thanks in advance for your help.
> >> > Carole
> >> >
> >> >
> >> > On Wed, Oct 28, 2009 at 3:36 AM, stephane eranian <eran...@googlemail.com> wrote:
> >> >>
> >> >> Hi,
> >> >>
> >> >> I am happy to report that I have now uploaded all the code necessary
> >> >> to use PEBS on Intel Core, Atom, and Nehalem. That includes PEBS-LL on
> >> >> Nehalem, which is used to sample where cache misses occur.
> >> >>
> >> >> What you need:
> >> >> - latest libpfm sources from CVS
> >> >>
> >> >> - latest pfmon sources from CVS
> >> >>
> >> >> - perfmon2 2.6.30 from GIT
> >> >>
> >> >> git clone
> >> >> git://git.kernel.org/pub/scm/linux/kernel/git/eranian/linux-2.6.git
> >> >> Make sure you enable 'Unified PEBS'.
> >> >>
> >> >>
> >> >> This kernel includes a unified PEBS sampling format which supports
> >> >> Netburst, Core, Atom, and Nehalem. You must insert the module
> >> >> perfmon_pebs_smpl (or compile in the code).
> >> >>
> >> >> Next, to use PEBS, you can simply do:
> >> >>
> >> >> pfmon --smpl-module=pebs --smpl-compact --with-header \
> >> >>   -einst_retired:any_p --long-smpl-periods=2400000 ...
> >> >>
> >> >> Not all events support PEBS. In --smpl-compact mode, each line
> >> >> contains a PEBS sample.
> >> >>
> >> >> To collect cache misses on Nehalem, you can do:
> >> >>
> >> >> pfmon --smpl-module=pebs-ll -e MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD \
> >> >>   --ld-lat-threshold=4 --long-smpl-periods=2000 --smpl-compact \
> >> >>   --with-header ...
> >> >>
> >> >> You must use the MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD event to
> >> >> activate this HW feature.
> >> >>
> >> >> Each line contains a PEBS record, including the cache miss
> >> >> information. The ld-lat parameter is the minimum threshold for the
> >> >> miss latency: only misses with latency >= threshold are captured. It
> >> >> must be at least 4, since 4 cycles is the L1D hit latency. For each
> >> >> captured miss, you get an instruction addr, data addr, miss latency,
> >> >> and the source of the data (where it came from; refer to the Intel
> >> >> documentation).
> >> >> It is important to understand that the instruction addr does NOT point
> >> >> to the load instruction but ALWAYS to the next dynamic instruction,
> >> >> i.e., the whole state is recorded at retirement of the load.
> >> >>
> >> >>
> >> >> On Mon, Oct 5, 2009 at 1:55 AM, Carole Wu <cwu...@gmail.com> wrote:
> >> >> > Hello,
> >> >> >
> >> >> > I'd like to collect information about my workload, running on
> >> >> > Nehalem, using PEBS, so I use the following command.
> >> >> >
> >> >> >>> pfmon -e MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD
> >> >> >>> --ld-lat-threshold=1 --long-smpl-periods=2000
> >> >> >>> --short-smpl-periods=200 ./mcf_base inp.in
> >> >> >>> load latency threshold not yet supported
> >> >> > However, the response seems to suggest that my machine does not
> >> >> > currently support PEBS? Is that true, or am I not setting the
> >> >> > parameters correctly?
> >> >> >
> >> >> > Any help is greatly appreciated.
> >> >> >
> >> >> > Carole
> >> >> >
> >> >> >
> >> >> >
> ------------------------------------------------------------------------------
> >> >> > Come build with us! The BlackBerry® Developer Conference in SF,
> >> >> > CA
> >> >> > is the only developer event you need to attend this year. Jumpstart
> >> >> > your
> >> >> > developing skills, take BlackBerry mobile applications to market
> and
> >> >> > stay
> >> >> > ahead of the curve. Join us from November 9-12, 2009. Register
> >> >> > now!
> >> >> > http://p.sf.net/sfu/devconf
> >> >> > _______________________________________________
> >> >> > perfmon2-devel mailing list
> >> >> > perfmon2-devel@lists.sourceforge.net
> >> >> > https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
> >> >> >
> >> >> >
> >> >
> >> >
> >
> >
>
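For anyone post-processing such a trace, here is a minimal Python sketch of the two ideas discussed in the thread: keeping only samples at or above a latency threshold (what --ld-lat-threshold does in hardware), and setting aside the odd samples that are flagged as L3 misses but report a latency below the L3 hit latency. It does not parse pfmon output. The Sample tuple, its field names, and the is_l3_miss flag are hypothetical and assume the records were already decoded (the data source encoding is in the Intel documentation). The cycle counts are the approximate Nehalem figures quoted above, not measured values.

```python
from collections import namedtuple

# Hypothetical, already-decoded PEBS-LL record. These field names are
# assumptions for illustration and are not pfmon's output format.
Sample = namedtuple("Sample", ["inst_addr", "data_addr", "latency", "is_l3_miss"])

# Approximate Nehalem latencies in cycles, as quoted in the thread.
L1_HIT_CYCLES = 4
L2_HIT_CYCLES = 11
L3_HIT_CYCLES = 40
OFFCHIP_CYCLES = 100

# The thread states that the load latency threshold must be at least 4.
MIN_LD_LAT_THRESHOLD = 4


def level_for_latency(latency):
    """Guess which level served a load, judging by latency alone."""
    if latency > OFFCHIP_CYCLES:
        return "offchip"
    if latency >= L3_HIT_CYCLES:
        return "L3"
    if latency >= L2_HIT_CYCLES:
        return "L2"
    return "L1"


def keep_above_threshold(samples, threshold):
    """Software equivalent of the threshold: keep latency >= threshold."""
    if threshold < MIN_LD_LAT_THRESHOLD:
        raise ValueError("threshold must be at least %d" % MIN_LD_LAT_THRESHOLD)
    return [s for s in samples if s.latency >= threshold]


def split_suspicious(samples):
    """Separate samples flagged as L3 misses whose latency is below the
    L3 hit latency, i.e. the inconsistent samples reported above."""
    plausible, suspicious = [], []
    for s in samples:
        if s.is_l3_miss and s.latency < L3_HIT_CYCLES:
            suspicious.append(s)
        else:
            plausible.append(s)
    return plausible, suspicious


if __name__ == "__main__":
    # Made-up records, for illustration only.
    demo = [
        Sample(0x400100, 0x7F00, 0x4, True),
        Sample(0x400200, 0x7F40, 180, True),
        Sample(0x400300, 0x7F80, 12, False),
    ]
    plausible, suspicious = split_suspicious(demo)
    print(len(plausible), "plausible,", len(suspicious), "suspicious")
    for s in keep_above_threshold(demo, 50):
        print(hex(s.inst_addr), s.latency, level_for_latency(s.latency))
```

Filtering in software like this is only a convenience for traces already collected with a low threshold; raising --ld-lat-threshold at collection time, as suggested above, avoids recording the short-latency samples in the first place.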