On 10/29/18 12:20 PM, Michael Petlan wrote:
> Hi Vince,
> 
> On Sun, 28 Oct 2018, Vince Weaver wrote:
>> On Thu, 25 Oct 2018, Michael Petlan wrote:
>>
>>> The ctests/flops test fails on HPE Moonshot machines:
>>
>> so what type of processor is it exactly in this machine?
> 
> Is there any way to detect what papi knows/uses apart from papi_avail?
> 
> [...]
> Vendor string and code   : ARM (7, 0x7)
> Model string and code    :  (1, 0x1)
> CPU revision             : 1.000000
> CPU Max MHz              : 12
> CPU Min MHz              : 12
> Total cores              : 8
> SMT threads per core     : 1
> Cores per socket         : 2
> Sockets                  : 4
> Cores per NUMA region    : 8
> NUMA regions             : 1
> Running in a VM          : no
> Number Hardware Counters : 4
> Max Multiplex Counters   : 384
> Fast counter read (rdpmc): no
> [...]
> 
> # lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              8
> On-line CPU(s) list: 0-7
> Thread(s) per core:  1
> Core(s) per socket:  2
> Socket(s):           4
> NUMA node(s):        1
> Vendor ID:           APM
> Model:               1
> Model name:          X-Gene
> Stepping:            0x0
> BogoMIPS:            100.00
> NUMA node0 CPU(s):   0-7
> Flags:               fp asimd evtstrm cpuid
> 
>> Also which event is being used by FP_INS /FP_OPS?
>>
>> I gather it might be Xgene, and in that case
>>
>> PRESET,PAPI_FP_INS,NOT_DERIVED,INST_SPEC_EXEC_VFP
> 
> Yes, it seems it is X-Gene, thus the above definition is probably the used 
> one.
> 
>>
>> some of your problems are coming from the "SPEC_EXEC" (speculatively 
>> executed) part.  Not it being 3 counts for each instruction, but the fact 
>> that the value seems to vary.
> 
> OK. Sounds logical...
> 
>>
>> Also if you look at the ARM ARM for a generic VFP_SPEC events it says
>> "The counter counts the last micro-operation of each data engine 
>> floating-point instruction."  So if somehow the fmadd instruction is 
>> broken up into 3 vfp micro-ops internally then you might get the 3-count.
>>
> Shouldn't it be FP_OPS instead of FP_INS then? We're counting ops not insns
> in fact.

The definition of what PAPI_FP_OPS and PAPI_FP_INS measure seems to vary a bit 
depending on the microarchitecture, as mentioned on 
https://icl.cs.utk.edu/projects/papi/wiki/PAPIC:High_Level .  On x86 there are 
several different pieces of hardware that could be used for FP calculations. 
The limited number of PMU registers available makes it difficult to monitor all 
of them without running out of PMU registers on some microarchitectures.



> 
>>> 3) Shouldn't the test adjust the estimation of expected
>>> result based on whether it tests FP_INS or FP_OPS, instead
>>> of based on whether __powerpc__ is defined?
>>
>> well, we should properly break out if a fmadd counts as one or two ops on 
>> an architecture, which is what the powerpc test was really trying to do.  
>>
>> as you are finding, floating point operation events vary a lot from vendor 
>> to vendor and even chip to chip and getting info about what is going on 
>> can be difficult.  AMD fam17h (ryzen) has mysterious issues with the FP 
>> event counts too and I'm still waiting to see if anyone from AMD can tell 
>> me why the events are behaving that way.
> 
> OK, so the conclusion is that the test is not reliable everywhere,
> since it does not adjust to HW, just makes basic assumptions based
> on whether it runs on ppc or not.

Yes, flops_float_init_matrix() is going to need to be more flexible in how 
it computes the expected number of flops, to account for variations in 
microarchitectures. It should use more than just the basic machine 
architecture to get an appropriate expected value.
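
As a rough sketch of what I mean (not the actual test code): the expected 
value could come from a small per-event model instead of an #ifdef on the 
architecture. Everything named here (struct flops_model, expected_count(), 
within_tolerance()) is hypothetical, and the factor of 3 is only the count 
reported earlier in this thread for X-Gene, not a documented value.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model: how many counts one fmadd adds to the measured event. */
struct flops_model {
    const char *name;       /* label used only for reporting */
    double counts_per_fma;  /* e.g. 1.0 (one insn), 2.0 (two ops), 3.0 (observed here) */
};

/* Expected count for an n x n matrix multiply, assuming the inner loop
 * executes n*n*n fused multiply-adds and nothing else is counted. */
static double expected_count(const struct flops_model *m, long n)
{
    assert(m != NULL && n >= 0);
    return (double)n * (double)n * (double)n * m->counts_per_fma;
}

/* Returns 1 if measured is within a relative tolerance of expected. */
static int within_tolerance(double measured, double expected, double tol)
{
    double diff = measured - expected;
    if (diff < 0.0)
        diff = -diff;
    if (expected == 0.0)
        return diff == 0.0;
    return diff / expected <= tol;
}

/* Prints a one-line verdict and returns 1 on pass, 0 on fail. */
static int report(const struct flops_model *m, long n, double measured, double tol)
{
    double expected = expected_count(m, n);
    int ok = within_tolerance(measured, expected, tol);
    printf("%s: expected %.0f, measured %.0f: %s\n",
           m->name, expected, measured, ok ? "PASS" : "FAIL");
    return ok;
}
```

The test would then pick the model from the event actually mapped to the 
preset rather than from the compiler's architecture macro. The tolerance 
matters too, since a speculative event is not going to give an exact count.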

-Will


_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
