On Mon, 22 Mar 2010, stephane eranian wrote:
> I think that's a plausible explanation. It would be interesting to verify
> whether the same behavior also exists on AMD processors.
same program on a Phenom machine:
venchi:~% pfmon -e retired_instructions,retired_branch_instructions ./ten_billion
10000001156 RETIRED_INSTRUCTIONS
4999991152 RETIRED_BRANCH_INSTRUCTIONS
(retired_instructions is exactly expected + 1156)
same program run on a Pentium D:
domori:~% pfmon -e branch_retired:mmnp:mmnm:mmtp:mmtm,instr_retired:nbogusntag -- ./ten_billion
4999990579 BRANCH_RETIRED:MMNP:MMNM:MMTP:MMTM
10000000000 INSTR_RETIRED:NBOGUSNTAG
(instr_retired on Pentium 4 machines does not overcount due to
interrupts. instr_completed does; however, it gives a 0 count if
you try to pair it with the branch_retired count.)
domori:~% pfmon -e instr_completed:nbogus -- ./ten_billion
10000000553 INSTR_COMPLETED:NBOGUS
A Pentium III, however, does not match the behavior seen on the others:
spruengli:~% perf stat -e branches:u,instructions:u -- ./ten_billion32
Performance counter stats for './ten_billion32':
5000067815 branches
10000019454 instructions
18.549943960 seconds time elapsed
The overcount for branches (67815) is about 3.5x that for instructions (19454).
And just for completeness, the same program, modified to run in 32-bit mode,
on a 200MHz Pentium Pro machine gives results similar to the PIII:
ancient:~% pfmon -e inst_retired,br_inst_retired ./ten_billion32
10000013896 INST_RETIRED
5000031685 BR_INST_RETIRED
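To make the comparison easier to eyeball, here is a small sketch that subtracts the expected counts from the numbers reported above. It assumes the benchmark is meant to retire exactly 10,000,000,000 instructions and 5,000,000,000 branches; the branch figure is my assumption based on the counts, and the names (MEASURED, deltas) are just for illustration.

```python
# Assumed expected counts for ten_billion (the branch figure is an assumption).
EXPECTED_INSTRUCTIONS = 10_000_000_000
EXPECTED_BRANCHES = 5_000_000_000

# (instructions, branches) copied from the runs reported above.
MEASURED = {
    "Phenom": (10_000_001_156, 4_999_991_152),
    "Pentium D (instr_retired)": (10_000_000_000, 4_999_990_579),
    "Pentium III": (10_000_019_454, 5_000_067_815),
    "Pentium Pro": (10_000_013_896, 5_000_031_685),
}


def deltas(instructions, branches):
    """Return (instruction delta, branch delta) relative to the expected counts."""
    return instructions - EXPECTED_INSTRUCTIONS, branches - EXPECTED_BRANCHES


if __name__ == "__main__":
    for machine, counts in MEASURED.items():
        d_instr, d_branch = deltas(*counts)
        print(f"{machine:28s} instr {d_instr:+8d}  branch {d_branch:+8d}")
```

Under that assumption the Phenom and Pentium D come in below the expected branch count, while the PIII and Pentium Pro come in above it.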
Vince
[email protected]