Hi,

What you're observing is a known side effect of interrupt-based sampling: skid. Let me explain.
What the kernel captures is the address of the instruction executing at the time the PMU interrupt is delivered. That instruction may be far away from the instruction that caused the counter to overflow, i.e., in your case, the branch that retired. The PMU takes some variable amount of time to raise the interrupt after an overflow, and execution continues during that time. This imprecision cannot be corrected by software, only by hardware. That's why you have Intel PEBS, for instance.

Why does it work with cycles, then? That's because you are looking at stalls, i.e., locations where execution does not make forward progress. The interrupted instruction is then usually the stalled one, so the skid appears to have vanished.

In summary, except for cycles, you should not expect profiles to point at the instructions that generated the occurrences of the sampling event.

Hope this helps.

On Wed, Mar 16, 2011 at 10:13 AM, 陳韋任 <che...@iis.sinica.edu.tw> wrote:
> Hi, all
>
> I do not sure some events description and their result. Take
> event "BRANCH_INSTRUCTIONS_RETIRED" for example. `showevtinfo`
> says this event counts the retirement of the last micro-op of
> a branch instruction.
>
> When I sample an application with event "BRANCH_INSTRUCTIONS_RETIRED",
> I will expect only the branch instructions got hit. Here is my
> command,
>
> $ perf record -c 10000 -e `evt2raw BRANCH_INSTRUCTIONS_RETIRED` ./bzip2_base.i386-m32-gcc434 input.combined 1
>
> But `perf annotate` shows me that my expectation is wrong. For
> example,
>
> $ perf annotate
> ------------------------------------------------
>  Percent |      Source code & Disassembly of bzip2_base.i386-m32-gcc434
> ------------------------------------------------
>     3.65 :        401d65:       89 14 8b      mov    %edx,(%rbx,%rcx,4)
>     8.27 :        401d68:       74 2a         je     401d94 <fallbackSort+0x1d4>
>     0.00 :        401d6a:       41 89 c0      mov    %eax,%r8d
>
> The "mov" instruction should not be a branch instruction, but it
> still has a percentage. Can anyone help me explain this?
>
> Thanks!
>
> Regards,
> chenwj
>
> --
> Wei-Ren Chen (陳韋任)
> Parallel Processing Lab, Institute of Information Science,
> Academia Sinica, Taiwan (R.O.C.)
> Tel:886-2-2788-3799 #1667
>
> ------------------------------------------------------------------------------
> Colocation vs. Managed Hosting
> A question and answer guide to determining the best fit
> for your organization - today and in the future.
> http://p.sf.net/sfu/internap-sfd2d
> _______________________________________________
> perfmon2-devel mailing list
> perfmon2-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
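
For what it's worth, the skid effect described in the reply can be reproduced with a toy simulation. The sketch below is not how perf or the PMU is implemented; it is a rough model under stated assumptions: a made-up eight-instruction loop (PROGRAM), a counter that overflows every `period` retired branches, and an interrupt that arrives a random 0..max_skid instructions later. All names here are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical loop body; only "je" and "jmp" count as branches.
# Assumption: the loop is executed as a straight line that wraps around.
PROGRAM = ["mov", "add", "mov", "cmp", "je", "mov", "add", "jmp"]
BRANCHES = {"je", "jmp"}


def simulate(period, max_skid, n_instructions, seed=0):
    """Return a Counter mapping instruction index -> number of samples.

    A sample is requested every `period` retired branches, but it is
    recorded only after a random delay of 0..max_skid further
    instructions, which is a crude stand-in for interrupt latency.
    """
    rng = random.Random(seed)
    hits = Counter()
    counter = 0      # retired branches since the last overflow
    pending = None   # instructions left before the interrupt lands

    for i in range(n_instructions):
        pc = i % len(PROGRAM)

        if PROGRAM[pc] in BRANCHES:
            counter += 1
            if counter >= period:
                counter = 0
                if pending is None:
                    pending = rng.randint(0, max_skid)

        if pending is not None:
            if pending == 0:
                hits[pc] += 1   # the sample is attributed to this pc
                pending = None
            else:
                pending -= 1

    return hits


def report(title, hits):
    """Print the share of samples attributed to each instruction."""
    total = sum(hits.values())
    print(title)
    for pc, op in enumerate(PROGRAM):
        share = 100.0 * hits[pc] / total if total else 0.0
        print(f"  {share:6.2f} : {pc}: {op}")


if __name__ == "__main__":
    report("no skid (idealized precise sampling)",
           simulate(period=7, max_skid=0, n_instructions=100_000))
    report("skid of up to 3 instructions",
           simulate(period=7, max_skid=3, n_instructions=100_000))
```

With max_skid=0 every sample lands on a branch, which is roughly what a precise mechanism such as PEBS aims for. With a non-zero skid, part of the samples move to the instructions that follow the branch, much like the "mov" in the annotate output quoted above.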