Dean,
On Tue, Jun 06, 2006 at 12:53:28PM -0500, Dean Nelson wrote:
> The following fortran program when run via pfmon on a SGI Altix (ia64) running
> Linux 2.6.16 gets various results when measuring the number of floating point
> operations. When given ni=10 and nj=100, one would expect the result to be
> 1000, but I've seen the number range from 247 to 1001.
>
> Am I doing something wrong? Or should I be getting consistent results
> of 1000?
>
If your goal is to compute FLOPS, this is not the way to do it.
> pfmon --system-wide --smpl-outfile=/tmp/sample.out.4572 --smpl-entries=100000
> -u -k --short-smpl-periods=1 --smpl-module=compact-ia64
> --events=FP_OPS_RETIRED --relative --cpu-list=0-3 ./nfpo.x
A sampling period of 1 is unrealistic, especially given the tight loop you have.
To measure flops, you simply want to count, not sample, in this case. The
program does not run long enough with input 10,100,1 to measure anything
significant when sampling.
Keep in mind that pfmon and the underlying kernel infrastructure also support
per-thread mode, which in this case would give you something even more precise.
Here are a few examples where I have hardcoded the values 10,100,1 (the compiler
did some optimization at -O2):
$ pfmon --us-c --show-time -ecpu_cycles,fp_ops_retired ./nfpo.x
ni= 10 nj= 100 nfp= 1000
a= 1000.000 b= 1.000000 c= 0.0000000E+00 d= 0.0000000E+00 e= 0.0000000E+00
786,176 CPU_CYCLES
16,386 FP_OPS_RETIRED
real 0h00m00.234s user 0h00m00.000s sys 0h00m00.104s
The theoretical value of FP_OPS_RETIRED for this program is 1000 (10x100
fadd instructions in doflop). The count above covers the whole program, not
just doflop. You can verify the theoretical value by constraining monitoring
to a particular function or between two points:
To count the number of FP_OPS_RETIRED in doflop():
$ pfmon --us-c --irange=doflop_ -efp_ops_retired ./nfpo.x
ni= 10 nj= 100 nfp= 1000
a= 1000.000 b= 1.000000 c= 0.0000000E+00 d= 0.0000000E+00 e= 0.0000000E+00
1,000 FP_OPS_RETIRED
In your main routine (MAIN__):
$ pfmon --us-c --irange=MAIN__ -efp_ops_retired ./nfpo.x
ni= 10 nj= 100 nfp= 1000
a= 1000.000 b= 1.000000 c= 0.0000000E+00 d= 0.0000000E+00 e= 0.0000000E+00
0 FP_OPS_RETIRED
To start when entering a function foo() and stop when leaving it, i.e., to
measure foo() and all its children (here foo() is MAIN__), you can do:
$ pfmon --us-c --trigger-code-start=MAIN__ --trigger-code-stop=MAIN__ \
    -efp_ops_retired ./nfpo.x
ni= 10 nj= 100 nfp= 1000
a= 1000.000 b= 1.000000 c= 0.0000000E+00 d= 0.0000000E+00 e= 0.0000000E+00
1,746 FP_OPS_RETIRED
Note that this 'trigger' form uses breakpoints to delimit start/stop. As such,
it can be used with any event, unlike --irange. The downside is that there is
more overhead. The triggers fire only once; to repeat them you need to specify
--trigger-code-repeat.
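As a rough cross-check, the counts from the runs above can be combined to see
where the floating point operations are retired. This is only a sketch: the
numbers come from separate runs, so the subtraction is approximate, and the
dictionary keys are names made up for illustration.

```python
# FP_OPS_RETIRED counts copied from the pfmon runs above.
# The key names are hypothetical labels, not pfmon output fields.
counts = {
    "whole_program": 16386,      # no range restriction
    "main_and_children": 1746,   # --trigger-code-start/stop=MAIN__
    "main_only": 0,              # --irange=MAIN__
    "doflop_only": 1000,         # --irange=doflop_
}

# FP ops retired outside MAIN__ and everything it calls.
outside_main = counts["whole_program"] - counts["main_and_children"]

# FP ops retired in callees of MAIN__ other than doflop.
other_callees = (counts["main_and_children"]
                 - counts["main_only"]
                 - counts["doflop_only"])

print("outside MAIN__ and its children:", outside_main)
print("callees of MAIN__ other than doflop:", other_callees)
print("doflop:", counts["doflop_only"])
```

Only the doflop figure matches the 10x100 fadd instructions; the rest is
retired elsewhere in the process.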
If you want to look at flops over time, then you can sample, but you need to
run for much longer. I modified the program to run for 26s, and here is the
pfmon command I used:
$ pfmon -ecpu_cycles,fp_ops_retired --reset-non-smpl \
    --long-smpl-periods=1500000000 --overflow-block nfpo.x
I sample on cycles, here 1 sample/s on this 1.5GHz Madison, so I will get about
26 samples. The --overflow-block option tells the kernel to block nfpo.x when
the sampling buffer fills up. That will not happen in this case.
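The sampling period used above is just the clock rate times the desired
interval between samples. A minimal sketch of that arithmetic, where the
function names are made up for illustration and the process is assumed to be
running on the CPU the whole time:

```python
def sampling_period(clock_hz, seconds_between_samples):
    """Number of CPU_CYCLES between two samples."""
    return int(clock_hz * seconds_between_samples)

def expected_samples(run_seconds, seconds_between_samples):
    """Approximate number of samples for a run of the given length."""
    return int(run_seconds // seconds_between_samples)

# 1.5GHz Madison, one sample per second, 26s run (values from this mail).
period = sampling_period(1.5e9, 1.0)
samples = expected_samples(26, 1.0)

print("--long-smpl-periods=%d" % period)
print("expected samples: about %d" % samples)
```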
Each sample contains information about where nfpo was, but also the value of
FP_OPS_RETIRED at that time. To avoid having to compute the delta of
FP_OPS_RETIRED yourself, you can use --reset-non-smpl, which resets
FP_OPS_RETIRED after each sample, thereby giving you directly the number of
FP_OPS_RETIRED between samples, i.e., in 1 second. And you get:
entry 0 PID:14663 TID:14663 CPU:1 STAMP:0x5212c93c7474 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001657bc66
entry 1 PID:14663 TID:14663 CPU:1 STAMP:0x521322c79872 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b21d
entry 2 PID:14663 TID:14663 CPU:1 STAMP:0x52137c5044e0 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b185
entry 3 PID:14663 TID:14663 CPU:1 STAMP:0x5213d5dfa5be IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b03f
entry 4 PID:14663 TID:14663 CPU:1 STAMP:0x52142f671194 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b07f
entry 5 PID:14663 TID:14663 CPU:1 STAMP:0x521488efa308 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b08f
entry 6 PID:14663 TID:14663 CPU:1 STAMP:0x5214e2770581 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b08f
entry 7 PID:14663 TID:14663 CPU:1 STAMP:0x52153bff60bc IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b069
entry 8 PID:14663 TID:14663 CPU:1 STAMP:0x52159586c24b IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b08b
entry 9 PID:14663 TID:14663 CPU:1 STAMP:0x5215ef0ef7fb IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
...
entry 22 PID:14663 TID:14663 CPU:1 STAMP:0x521a7af20495 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b01e
entry 23 PID:14663 TID:14663 CPU:1 STAMP:0x521ad479f980 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b02b
entry 24 PID:14663 TID:14663 CPU:1 STAMP:0x521b2e014080 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b023
entry 25 PID:14663 TID:14663 CPU:1 STAMP:0x521b8789207b IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b025
FP_OPS_RETIRED per sample is pretty much constant, around 0x000000001659b08f
(374,976,655 in decimal, i.e., about 375 million per second).
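To turn the hexadecimal PMD5 values into something easier to read, a short
script can extract them from the sample dump and average them. This is a
sketch that assumes the text layout shown above (a "PMD5 : 0x..." line per
entry); the function name and the embedded excerpt are for illustration only.

```python
import re

# Three entries copied verbatim from the sample output above.
SAMPLE_DUMP = """\
entry 1 PID:14663 TID:14663 CPU:1 STAMP:0x521322c79872 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b21d
entry 2 PID:14663 TID:14663 CPU:1 STAMP:0x52137c5044e0 IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b185
entry 3 PID:14663 TID:14663 CPU:1 STAMP:0x5213d5dfa5be IIP:0x4000000000002e60
OVFL:4 LAST_VAL:1500000000 SET:0
PMD5 : 0x000000001659b03f
"""

def pmd_values(dump, reg="PMD5"):
    """Return the value of the given PMD register for each sample, as ints."""
    pattern = re.compile(
        r"^\s*%s\s*:\s*(0x[0-9a-fA-F]+)" % re.escape(reg), re.MULTILINE
    )
    return [int(m.group(1), 16) for m in pattern.finditer(dump)]

values = pmd_values(SAMPLE_DUMP)
mean_fp_ops = sum(values) / len(values)

# Each sample spans 1,500,000,000 cycles (the sampling period used above).
cycles_per_sample = 1500000000
fp_ops_per_cycle = mean_fp_ops / cycles_per_sample

print("samples parsed:", len(values))
print("mean FP_OPS_RETIRED per sample: %.0f" % mean_fp_ops)
print("FP ops per cycle: %.3f" % fp_ops_per_cycle)
```

With a sampling period of one second's worth of cycles, the mean per sample
is also the number of floating point operations per second.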
Hope this helps.
> \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
>
> cat << EOF > nfpo.f
> program nfpo
> integer :: ni, nj, nfp
> real :: a, b, c, d, e
>
> print *, 'enter ni, nj, b'
> read *, ni, nj, b
>
> do i = 1, ni
> call doflop( nj, a, b, c, d, e )
> end do
>
> nfp = ni * nj
> print *, ' ni=', ni, ' nj=', nj, ' nfp=', nfp
> print *, ' a=', a, ' b=', b, ' c=', c, ' d=', d, ' e=', e
>
> end program
>
> subroutine doflop( nj, a, b, c, d, e)
> real :: a, b, c, d, e
>
> do i = 1, nj
> a = a + b
> end do
>
> end subroutine
> EOF
>
> ifort -O0 -g -free -o ./nfpo.x ./nfpo.f
> nm ./nfpo.x | sort > nfpo.x.map
>
> # I ran the following on a four processor system, you may need to change
> # '--cpu-list=0-3' to reflect the system you run this on.
>
>
> # when asked to " enter ni, nj, b" give it: 10,100,1
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/