Hi,

On AMD Zen1 and Zen2, the event that counts FLOPS requires some extra support, as explained in the PPR, and that support has to be in the kernel. AMD posted the patches upstream and they are in the 5.6 kernel.
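As a quick sanity check, something like the following can tell you whether the running kernel is at least 5.6. This is only a rough sketch: `kernel_at_least` is a helper name I made up, and the version number is a heuristic, since a distribution kernel may carry the patches backported to an older version (or not, I have not checked any).

```python
import platform
import re


def kernel_at_least(release, minimum=(5, 6)):
    """Return True if a release string such as '5.6.0-rc1' is >= minimum.

    Hypothetical helper: only the leading 'major.minor' is compared, so a
    backported patch in an older distribution kernel is not detected.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    version = (int(match.group(1)), int(match.group(2)))
    return version >= minimum


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r` on Linux.
    release = platform.release()
    print(release, "is new enough" if kernel_at_least(release) else "predates 5.6")
```

If this reports an older kernel, the FLOPS event may still work when the vendor has backported the patch, so treat the result as a hint only.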
The FLOPS event is one that can have large increments per cycle, depending on how wide the SIMD units are. However, a single PMU counter can only take an increment of 15 or less per cycle, and FLOPS can exceed this limit. To overcome this limitation, AMD implemented merging of counters: two consecutive counters are "fused" together to make a counter capable of handling more than 15 increments/cycle.

As you can imagine given the way Linux perf_events works, this cannot be transparent. perf_events therefore needs to be patched to recognize that the FLOPS event is being used, to allocate two consecutive counters, and to program the MERGE event in the second counter so that the result is a valid FLOPS count. This is what the AMD patch does (5738891229a perf/x86/amd: Add support for Large Increment per Cycle event). This is all documented in the PPR under section 2.1.15.3.

Hope this helps.

On Thu, Apr 16, 2020 at 2:33 AM <martyn.fos...@gmail.com> wrote:
> Hi Vincent,
>
> Was this ever resolved? We appear to have the same issue....
>
> Many thanks, Martyn
>
> On Friday, 31 August 2018 20:34:46 UTC+1, vincent.weaver wrote:
>>
>> I was trying to work out why the PAPI floating point event validation
>> tests were failing on a Ryzen machine.
>>
>> PAPI is using:
>>
>> RETIRED_SSE_AVX_OPERATIONS:DP_ADD_SUB_FLOPS:DP_MULT_FLOPS:DP_MULT_ADD_FLOPS:DP_DIV_FLOPS
>>
>> 0x53f003
>>
>> which I think should cover most double-precision floating point, and it
>> does record a lot of counts when run on Linpack. However our simple
>> validation test does a matrix-matrix multiply that does
>>
>> mulsd (%rax),%xmm0
>> addsd %xmm0,%xmm1
>>
>> both of which I would think would count as SSE double-precision, but the
>> event doesn't increment (the total count for the test ends up being 0).
>>
>> Expected results *are* returned if we use
>>
>> RETIRED_MMX_FP_INSTRUCTIONS:SSE_INSTR:MMX_INSTR:X87_INSTR
>> 0x5307cb
>> instead
>>
>> Does anyone know why this might be happening?
>>
>> Also, somewhat related, in reading the "Open Source Register Reference
>> for AMD Family 17h" document it mentions the special "Merge" events for
>> combining results when the increment is too big. Does libpfm4 support
>> this? I suppose not as it doesn't appear that Linux supports this?
>>
>> Thanks,
>>
>> Vince
>>
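For anyone reading the two raw codes quoted above (0x53f003 and 0x5307cb), here is a small sketch that splits such a value into its fields. It assumes the usual AMD PERF_CTL layout as I understand it (event select in bits 7:0 plus 35:32, unit mask in bits 15:8, USR/OS/edge/INT/enable in bits 16, 17, 18, 20 and 22); please check the PPR for your family before relying on it. `decode_raw_event` is a name made up for this example and is not part of libpfm4 or PAPI.

```python
def decode_raw_event(config):
    """Split a raw PERF_CTL-style value into named fields.

    Hypothetical helper; the bit positions are an assumption based on the
    common AMD PERF_CTL layout and should be checked against the PPR.
    """
    low_select = config & 0xFF            # EventSelect[7:0]
    high_select = (config >> 32) & 0xF    # EventSelect[11:8]
    return {
        "event_select": (high_select << 8) | low_select,
        "unit_mask": (config >> 8) & 0xFF,
        "usr": bool((config >> 16) & 1),     # count in user mode
        "os": bool((config >> 17) & 1),      # count in kernel mode
        "edge": bool((config >> 18) & 1),    # edge detect
        "int": bool((config >> 20) & 1),     # interrupt on overflow
        "enable": bool((config >> 22) & 1),  # counter enable
    }


if __name__ == "__main__":
    for raw in (0x53F003, 0x5307CB):
        fields = decode_raw_event(raw)
        print(hex(raw), hex(fields["event_select"]), hex(fields["unit_mask"]))
```

Under that assumed layout, 0x53f003 is event select 0x03 with unit mask 0xf0, and 0x5307cb is event select 0xcb with unit mask 0x07, both with the user, OS, interrupt and enable bits set.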
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel