I was trying to work out why the PAPI floating point event validation 
tests were failing on a Ryzen machine.

PAPI is using:
        
RETIRED_SSE_AVX_OPERATIONS:DP_ADD_SUB_FLOPS:DP_MULT_FLOPS:DP_MULT_ADD_FLOPS:DP_DIV_FLOPS
        0x53f003

which I think should cover most double-precision floating-point operations,
and it does record a lot of counts when run on Linpack.  However, our simple
validation test does a matrix-matrix multiply whose inner loop does

        mulsd  (%rax),%xmm0
        addsd  %xmm0,%xmm1

both of which I would think would count as SSE double-precision, but the 
event doesn't increment (the total count for the test ends up being 0).

Expected results *are* returned if we use

        RETIRED_MMX_FP_INSTRUCTIONS:SSE_INSTR:MMX_INSTR:X87_INSTR
        0x5307cb

instead.

Does anyone know why this might be happening?
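
For anyone comparing the two raw codes, here is a minimal sketch that splits
them into fields. It assumes the usual x86 PERF_CTL-style layout (event select
in bits 0-7, unit mask in bits 8-15, USR/OS/edge/INT/enable in bits 16, 17,
18, 20 and 22) and ignores the extended event-select bits, which are zero in
both codes. The helper name decode_raw_event is made up for illustration; it
is not a PAPI or libpfm4 function.

```python
def decode_raw_event(code):
    """Split a raw event code into fields (assumed PERF_CTL-style layout)."""
    return {
        "event_select": code & 0xFF,        # bits 0-7
        "unit_mask": (code >> 8) & 0xFF,    # bits 8-15
        "usr": bool(code & (1 << 16)),      # count in user mode
        "os": bool(code & (1 << 17)),       # count in kernel mode
        "edge": bool(code & (1 << 18)),     # edge detect
        "int": bool(code & (1 << 20)),      # interrupt on overflow
        "enable": bool(code & (1 << 22)),   # counter enable
    }


if __name__ == "__main__":
    events = (
        ("RETIRED_SSE_AVX_OPERATIONS", 0x53F003),
        ("RETIRED_MMX_FP_INSTRUCTIONS", 0x5307CB),
    )
    for name, code in events:
        fields = decode_raw_event(code)
        print("%s: event=0x%02x umask=0x%02x" % (
            name, fields["event_select"], fields["unit_mask"]))
```

Under that assumption the first code is event 0x03 with unit mask 0xf0 (the
upper four bits, one per DP_* unit mask listed above), and the second is
event 0xcb with unit mask 0x07. The control bits are identical, so the two
codes differ only in event select and unit mask.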

Also, somewhat related: the "Open Source Register Reference for 
AMD Family 17h" document mentions special "Merge" events for 
combining results when the increment is too big.  Does libpfm4 support 
this?  I assume not, since it doesn't appear that Linux supports it.

Thanks,

Vince



_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
