Got it. Thanks very much for the explanation.

--Carole

On Sun, Nov 1, 2009 at 7:06 AM, stephane eranian <eran...@googlemail.com> wrote:

> On Sun, Nov 1, 2009 at 4:37 AM, Carole Wu <cwu...@gmail.com> wrote:
> > Hello,
> >
> > pfmon --smpl-module=pebs-ll -e MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD \
> >          --ld-lat-threshold=4 --long-smpl-periods=2000 --smpl-compact \
> >          --with-header ...
> >
> > This should generate a trace of samples, where each sample represents the
> > 2000th long-latency (>= 4 cycles) memory operation. 2000 x (the number of
> > samples in the trace) should equal the number of LAST_LEVEL_CACHE_MISSES
> > collected with perfmon2's "counting" capability, e.g. pfmon -e
> > LAST_LEVEL_CACHE_MISSES. Is this right?
>
> No.
>
> First of all, this is not tracing but statistical sampling. You are
> going to lose some misses.
> There are shadowing effects. To complete a sample, you need the latency.
> Until it is collected, no other load can be sampled, even if it exceeds
> the threshold, i.e., PEBS can only track one load at a time. Also, when
> the PEBS buffer fills up, monitoring stops but execution continues, so
> you miss some more loads there.
>
> Note that this discrepancy is not specific to PEBS; you get the same
> behavior with AMD IBS or Itanium D-EAR. But the idea is that if you run
> for long enough, you will eventually get enough representative samples
> to approximate a trace.
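
To illustrate why 2000 x (number of samples) can fall short of the counted
total, here is a minimal Python sketch of only the buffer-full effect
described above. It is a toy model, not how the hardware or perfmon2 works
internally: the function name, the buffer size, and the number of events
lost while the buffer is drained are all made-up assumptions chosen for
illustration. Shadowing is not modeled.

```python
def simulate_sampling(total_events, period, buffer_entries, events_lost_per_drain):
    """Toy model: return the number of samples recorded.

    total_events          -- qualifying events that really occur (what a
                             plain counter would report)
    period                -- sampling period (one sample every `period` events)
    buffer_entries        -- hypothetical number of samples the buffer holds
    events_lost_per_drain -- hypothetical number of events that go by,
                             unmonitored, each time the full buffer is drained
    """
    samples = 0
    countdown = period   # events left until the next sample
    in_buffer = 0        # samples currently held in the buffer
    blind = 0            # events still to be skipped while monitoring is stopped

    for _ in range(total_events):
        if blind > 0:
            # Monitoring is stopped: the event happens but is not seen.
            blind -= 1
            continue
        countdown -= 1
        if countdown == 0:
            samples += 1
            countdown = period
            in_buffer += 1
            if in_buffer == buffer_entries:
                # Buffer full: stop monitoring while it is processed.
                in_buffer = 0
                blind = events_lost_per_drain
    return samples


if __name__ == "__main__":
    total, period = 1_000_000, 2000
    for lost in (0, 5000):
        n = simulate_sampling(total, period, buffer_entries=10,
                              events_lost_per_drain=lost)
        print(f"lost per drain={lost}: samples={n}, "
              f"samples x period={n * period}, counted={total}")
```

With no events lost, samples x period matches the counted total in this
model; as soon as some events go by while monitoring is stopped, the product
underestimates it. The real gap depends on the workload and the hardware.
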
>
>
> > I am seeing mismatching numbers for counts collected with PEBS and the
> > counting feature in perfmon2 for my Nehalem machine.
> > Your help is appreciated. Thanks,
> > Carole
> > On Wed, Oct 28, 2009 at 4:36 AM, stephane eranian <eran...@googlemail.com>
> > wrote:
> >>
> >> Hi,
> >>
> >> I am happy to report that I have now uploaded all the code necessary to
> >> use PEBS on Intel Core, Atom, and Nehalem. That includes PEBS-LL on
> >> Nehalem, which is used to sample where cache misses occur.
> >>
> >> What you need:
> >>    - latest libpfm sources from CVS
> >>
> >>    - latest pfmon sources from CVS
> >>
> >>    - perfmon2 2.6.30 from GIT
> >>
> >>      git clone git://git.kernel.org/pub/scm/linux/kernel/git/eranian/linux-2.6.git
> >>      Make sure you enable 'Unified PEBS'
> >>
> >>
> >> This kernel includes a unified PEBS sampling format which supports
> >> Netburst, Core, Atom, and Nehalem. You must insert the module
> >> perfmon_pebs_smpl (or compile in the code).
> >>
> >> Next, to use PEBS, you can simply do:
> >>
> >>   pfmon --smpl-module=pebs --smpl-compact --with-header \
> >>             -einst_retired:any_p --long-smpl-period=2400000 ...
> >>
> >>   Not all events support PEBS. In --smpl-compact mode, each line
> >>   contains a PEBS sample.
> >>
> >> To collect cache misses on Nehalem, you can do:
> >>
> >> pfmon --smpl-module=pebs-ll -e MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD \
> >>          --ld-lat-threshold=4 --long-smpl-periods=2000 --smpl-compact \
> >>          --with-header ...
> >>
> >>  You must use the MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD event to
> >>  activate this HW feature.
> >>
> >> Each line contains a PEBS record, including the cache miss
> >> information. The --ld-lat-threshold parameter is the minimal threshold
> >> for the miss latency. Only misses with latency >= threshold are
> >> captured. The threshold must be at least 4; 4 cycles is the L1D hit
> >> latency. For each captured miss, you get an instruction address, a data
> >> address, the miss latency, and the source of the data (where it came
> >> from; refer to the Intel documentation).
> >> It is important to understand that the instruction address does NOT
> >> point to the load instruction but ALWAYS to the next dynamic
> >> instruction, i.e., the whole state is recorded at retirement of the load.
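
As a small illustration of the threshold rule above, the following Python
sketch assembles the PEBS-LL command line quoted in this thread and rejects
thresholds below 4 before pfmon is ever invoked. The helper name and its
parameters are hypothetical; only the pfmon options themselves come from the
message, and the workload (./mcf_base inp.in) is the one from the original
question.

```python
import shlex

# Per the message above: 4 cycles is the L1D hit latency, so the
# threshold must be at least 4.
MIN_LD_LAT_THRESHOLD = 4


def build_pebs_ll_command(workload, threshold=4, period=2000):
    """Return the pfmon argument list for PEBS-LL sampling of `workload`.

    This is a hypothetical convenience helper, not part of pfmon.
    `workload` is the program to measure plus its arguments, as a list.
    """
    if threshold < MIN_LD_LAT_THRESHOLD:
        raise ValueError(
            f"ld-lat threshold must be >= {MIN_LD_LAT_THRESHOLD}, got {threshold}"
        )
    if period <= 0:
        raise ValueError(f"sampling period must be positive, got {period}")
    return [
        "pfmon",
        "--smpl-module=pebs-ll",
        "-e", "MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD",
        f"--ld-lat-threshold={threshold}",
        f"--long-smpl-periods={period}",
        "--smpl-compact",
        "--with-header",
        *workload,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_pebs_ll_command(["./mcf_base", "inp.in"])))
```

The list form can be handed to subprocess.run() unchanged on a machine where
pfmon and the perfmon2 kernel are installed.
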
> >>
> >>
> >> On Mon, Oct 5, 2009 at 1:55 AM, Carole Wu <cwu...@gmail.com> wrote:
> >> > Hello,
> >> >
> >> > I'd like to collect information about my workload, running on Nehalem,
> >> > using PEBS, so I use the following command.
> >> >
> >> >>> pfmon -e MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD --ld-lat-threshold=1
> >> >>> --long-smpl-periods=2000 --short-smpl-periods=200 ./mcf_base inp.in
> >> >>> load latency threshold not yet supported
> >> > However, the response seems to suggest that my machine does not
> >> > currently support PEBS? Is it true, or am I not setting parameters
> >> > correctly?
> >> >
> >> > Any help is greatly appreciated.
> >> >
> >> > Carole
> >> >
> >> >
> >> > ------------------------------------------------------------------------------
> >> > Come build with us! The BlackBerry(R) Developer Conference in SF, CA
> >> > is the only developer event you need to attend this year. Jumpstart your
> >> > developing skills, take BlackBerry mobile applications to market and
> >> > stay ahead of the curve. Join us from November 9-12, 2009. Register now!
> >> > http://p.sf.net/sfu/devconf
> >> > _______________________________________________
> >> > perfmon2-devel mailing list
> >> > perfmon2-devel@lists.sourceforge.net
> >> > https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
> >> >
> >> >
> >
> >
>
