Hi Stephane,
OK, first off, try it with doubles. That convert operation probably
happens 'in the FP pipe', and therefore it is counted.
Since few or none of the HPC people we've worked with use single
precision, we haven't received reports like this. But it would make
sense for that convert to be done in the FP pipes...
Are your events the same as ours, as far as register encodings go?
Our test cases produce the expected values...but we don't have a test
case like the one below.
Phil
On Tue, 2006-05-23 at 05:29 -0700, Stephane Eranian wrote:
> Phil,
>
> On Tue, May 23, 2006 at 11:43:08AM +0200, Philip Mucci wrote:
> > Hi Stephane,
> >
> > It sure can...in a number of ways...But I believe the SSE/SSE2 counting
> > isn't as accurate as one might like...It was the Athlon which AMD blew
> > it on...no FP counter!
> >
> > However, you have 3 choices.
> > PNE_OPT_FP_ADD_PIPE
> > PNE_OPT_FP_MULT_PIPE,
> > PNE_OPT_FP_MULT_AND_ADD_PIPE,
> >
> > Event 0x100, 0x200 and 0x300
>
> Well, there are things I don't understand here. Let's take
> this simple program:
>
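> #include <sys/types.h>
> #include <stdio.h>
> #include <stdlib.h>	/* strtoul */
>
> int main(int argc, char **argv)
> {
> 	unsigned long i, n;
> 	float f = 4;
>
> 	/* iteration count comes from the command line */
> 	n = strtoul(argv[1], NULL, 0);
> 	for (i = 0; i < n; i++) {
> 		f += 1.9;	/* 1.9 is a double constant, so the add is done in double */
> 	}
> 	printf("f=%g\n", f);
> 	return 0;
> }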
> Compiled with: cc float.c -o float -O3 -mtune=opteron -mcpu=opteron
>
> The loop generates the following code:
> 400530: cvtss2sd (%rsp),%xmm0
> 400535: dec %rax
> 400538: addsd %xmm1,%xmm0
> 40053c: cvtsd2ss %xmm0,%xmm2
> 400540: movss %xmm2,(%rsp)
> 400545: jne 400530 <main+0x30>
>
> With pfmon, I do 100,000,000 iterations:
> $ pfmon --trigger-code-start=main --trigger-code-stop=main --us-c -u \
>     -e cpu_clk_unhalted,retired_instructions,DISPATCHED_FPU_OPS_ADD,DISPATCHED_FPU_OPS_MULTIPLY \
>     float 100000000
> 2,308,816,413 CPU_CLK_UNHALTED
> 600,006,705 RETIRED_INSTRUCTIONS
> 150,002,866 DISPATCHED_FPU_OPS_ADD
> 50,000,979 DISPATCHED_FPU_OPS_MULTIPLY
>
> I don't understand where those MULTIPLY ops come from. There are also
> 50,000,000 extra ADD ops.
>
> In contrast, if I use double (instead of float) and compile the same way, I
> get the following code:
> 400530: movlpd (%rsp),%xmm1
> 400535: dec %rax
> 400538: addsd %xmm0,%xmm1
> 40053c: movsd %xmm1,(%rsp)
> 400541: jne 400530 <main+0x30>
>
> And pfmon yields:
>
> 1,163,126,715 CPU_CLK_UNHALTED
> 500,005,961 RETIRED_INSTRUCTIONS
> 100,001,398 DISPATCHED_FPU_OPS_ADD
> 4 DISPATCHED_FPU_OPS_MULTIPLY
>
> As such, I am inclined to believe that the cvt instructions are the cause of
> this extra "noise". It may be coming from the way they are actually
> implemented.
>
> It seems difficult to compute FLOPS on Opteron. I do not quite understand the
> PIPE versions of those events.
>
> Any clue?
>
_______________________________________________
perfmon mailing list
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/