I was trying to work out why the PAPI floating point event validation
tests were failing on a Ryzen machine.
PAPI is using:
RETIRED_SSE_AVX_OPERATIONS:DP_ADD_SUB_FLOPS:DP_MULT_FLOPS:DP_MULT_ADD_FLOPS:DP_DIV_FLOPS
0x53f003
which I think should cover most double-precision floating point, and it
does record a lot of counts when run on Linpack. However, our simple
validation test does a matrix-matrix multiply that executes
mulsd (%rax),%xmm0
addsd %xmm0,%xmm1
both of which I would think would count as SSE double-precision, but the
event doesn't increment (the total count for the test ends up being 0).
Expected results *are* returned if we use
RETIRED_MMX_FP_INSTRUCTIONS:SSE_INSTR:MMX_INSTR:X87_INSTR
0x5307cb
instead.
Does anyone know why this might be happening?
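For reference, here is how I am reading the two raw codes, as a small
Python sketch. It assumes the usual AMD PERF_CTL layout (event select in
bits 7:0 plus 35:32, unit mask in bits 15:8, USR/OS/edge/INT/EN/INV in
bits 16/17/18/20/22/23, counter mask in bits 31:24). The function name
decode_raw_event is just made up for illustration; it is not part of
libpfm4 or PAPI.

```python
def decode_raw_event(raw):
    """Split a raw event code into fields, assuming the AMD PERF_CTL layout."""
    return {
        # Event select: low 8 bits, plus 4 extension bits at 35:32.
        "event": (raw & 0xFF) | (((raw >> 32) & 0xF) << 8),
        "umask": (raw >> 8) & 0xFF,       # unit mask, bits 15:8
        "usr": bool(raw & (1 << 16)),     # count in user mode
        "os": bool(raw & (1 << 17)),      # count in kernel mode
        "edge": bool(raw & (1 << 18)),    # edge detect
        "int": bool(raw & (1 << 20)),     # interrupt on overflow
        "en": bool(raw & (1 << 22)),      # counter enable
        "inv": bool(raw & (1 << 23)),     # invert counter mask
        "cmask": (raw >> 24) & 0xFF,      # counter mask, bits 31:24
    }


if __name__ == "__main__":
    for raw in (0x53F003, 0x5307CB):
        f = decode_raw_event(raw)
        flags = [k for k in ("usr", "os", "edge", "int", "en", "inv") if f[k]]
        print(f"{raw:#x}: event={f['event']:#x} umask={f['umask']:#x} "
              f"cmask={f['cmask']:#x} flags={','.join(flags)}")
```

On that reading, 0x53f003 is event 0x03 with unit mask 0xf0 and 0x5307cb
is event 0xcb with unit mask 0x07, both with the user, OS, interrupt and
enable bits set, so the encodings at least look consistent with the
event names above.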
Also, somewhat related: the "Open Source Register Reference for
AMD Family 17h" document mentions the special "Merge" events for
combining results when the increment is too big. Does libpfm4 support
this? I suppose not, since it doesn't appear that Linux supports it.
Thanks,
Vince
_______________________________________________
perfmon2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel