Phil,

On Tue, May 23, 2006 at 11:43:08AM +0200, Philip Mucci wrote:
> Hi Stephane,
> 
> It sure can...in a number of ways...But I believe the SSE/SSE2 counting
> isn't as accurate as one might like...It was the Athlon which AMD blew
> it on...no FP counter!
> 
> However, you have 3 choices.
>    PNE_OPT_FP_ADD_PIPE
>    PNE_OPT_FP_MULT_PIPE,
>    PNE_OPT_FP_MULT_AND_ADD_PIPE,
> 
> Event 0x100, 0x200 and 0x300

Well, there are things I don't understand here. Let's take
this simple program:

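#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>     /* strtoul() */
int main(int argc, char **argv)
{
        unsigned long i, n;
        float f=4;

        /* iteration count comes from the command line */
        n = strtoul(argv[1], NULL, 0);
        for(i=0; i < n; i++) {
                f+=1.9;
        }
        printf("f=%g\n", f);
        return 0;
}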
Compiled with: cc float.c -o float -O3 -mtune=opteron -mcpu=opteron

The loop generates the following code:
  400530:       cvtss2sd (%rsp),%xmm0
  400535:       dec    %rax
  400538:       addsd  %xmm1,%xmm0
  40053c:       cvtsd2ss %xmm0,%xmm2
  400540:       movss  %xmm2,(%rsp)
  400545:       jne    400530 <main+0x30>

With pfmon, I do 100,000,000 iterations:
$ pfmon --trigger-code-start=main --trigger-code-stop=main --us-c -u \
    -e cpu_clk_unhalted,retired_instructions,DISPATCHED_FPU_OPS_ADD,DISPATCHED_FPU_OPS_MULTIPLY \
    float 100000000
2,308,816,413 CPU_CLK_UNHALTED
  600,006,705 RETIRED_INSTRUCTIONS
  150,002,866 DISPATCHED_FPU_OPS_ADD
   50,000,979 DISPATCHED_FPU_OPS_MULTIPLY

I don't understand where those MULTIPLY ops come from: the loop contains no
multiplication. There are also 50,000,000 extra ADD ops, since one addsd per
iteration should give 100,000,000.

In contrast, if I use double (instead of float) and compile the same way, I
get the following code:
  400530:       movlpd (%rsp),%xmm1
  400535:       dec    %rax
  400538:       addsd  %xmm0,%xmm1
  40053c:       movsd  %xmm1,(%rsp)
  400541:       jne    400530 <main+0x30>

And pfmon yields:

1,163,126,715 CPU_CLK_UNHALTED
  500,005,961 RETIRED_INSTRUCTIONS
  100,001,398 DISPATCHED_FPU_OPS_ADD
            4 DISPATCHED_FPU_OPS_MULTIPLY

As such, I am inclined to believe that the cvt instructions are the cause of
this extra "noise". It may be coming from the way they are actually
implemented.

It seems difficult to compute FLOPS on Opteron from these events. I also do
not quite understand the PIPE versions of those events.

Any clue?

-- 
-Stephane
_______________________________________________
perfmon mailing list
[email protected]
http://www.hpl.hp.com/hosted/linux/mail-archives/perfmon/