Pierre Jolivet <[email protected]> writes:

>>>>   Expecting PETSc users to automatically add -march= is not realistic.  I 
>>>> will try to rig something up in configure where if the user does not 
>>>> provide march something reasonable is selected. 
>>> A softer (yet trivial to implement) option might also be to just alert the 
>>> user that these flags exist in the usual message about using default 
>>> optimization flags. Something like this would encourage users to do what 
>>> Jed is doing:
>>> 
>>>       ***** WARNING: Using default optimization C flags -g -O3
>>> You might consider manually setting optimal optimization flags for your 
>>> system with
>>> COPTFLAGS="optimization flags" see config/examples/arch-*-opt.py for 
>>> examples. 
>>> In particular, you may want to supply specific flags (e.g. -march=native) 
>>> to take advantage of higher-performance instructions.
>> 
>> I think this is a reasonable thing to do.
>
> This is a reasonable message to print on the screen, but I don’t think this 
> is a reasonable flag to impose by default.
> You are basically asking all package managers to add a new flag 
> (-march=generic) which was previously not needed.
>
> I’m crossing my fingers Jed has a clever way of "making portable binaries 
> that run-time detected when to use newer instructions where it matters", 
> because -march=native by default is just not practical when deploying 
> software.

immintrin.h provides

if (_may_i_use_cpu_feature(_FEATURE_FMA | _FEATURE_AVX2)) {
  fancy_version_that_needs_fma_and_avx2();
} else {
  fallback_version();
}

https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_may_i_use&expand=3677,3677

I believe this function is slightly expensive because it probably calls the 
CPUID instruction each time. BLIS has code to cache the result and query 
features with simple bitwise math.
 
https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.h
https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.c
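
As a rough sketch of that caching idea (this is not the BLIS code): probe the
CPU once, store the result as a bitmask, and answer later queries with a
bitwise AND. It assumes GCC or Clang on x86 and uses their
__builtin_cpu_supports() builtin rather than _may_i_use_cpu_feature(). The
FEAT_* bits and function names are made up for illustration, and the lazy
initialization is not thread-safe as written.

```c
#include <stdint.h>

/* Hypothetical feature bits for this sketch; not BLIS or PETSc names. */
enum {
  FEAT_AVX2 = 1u << 0,
  FEAT_FMA  = 1u << 1
};

/* Probe the CPU. On non-x86 targets or other compilers, report no features. */
static uint32_t probe_features(void)
{
  uint32_t f = 0;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2")) f |= FEAT_AVX2;
  if (__builtin_cpu_supports("fma"))  f |= FEAT_FMA;
#endif
  return f;
}

/* Return the feature mask, probing only on the first call.
   Not thread-safe: a real implementation would guard the initialization. */
static uint32_t cached_features(void)
{
  static int      initialized = 0;
  static uint32_t features    = 0;
  if (!initialized) {
    features    = probe_features();
    initialized = 1;
  }
  return features;
}

/* Nonzero if every bit in `required` is present in the cached mask. */
static int has_features(uint32_t required)
{
  return (cached_features() & required) == required;
}
```

With this in place, has_features(FEAT_FMA | FEAT_AVX2) costs one load and one
mask comparison after the first call.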

Of course this bit of dispatch should typically be done at object creation 
time, not every iteration.
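
A minimal sketch of what dispatch at creation time could look like, assuming
GCC or Clang on x86: the object stores a function pointer chosen once, so the
hot path makes no feature queries. The Kernel type, kernel_create(), and the
axpy kernels are hypothetical names for illustration, not PETSc API. The
target attribute lets a single function use AVX2/FMA intrinsics while the rest
of the file is compiled for the baseline architecture.

```c
#include <stddef.h>

typedef void (*axpy_fn)(size_t n, double a, const double *x, double *y);

/* Portable fallback: y += a * x */
static void axpy_fallback(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

/* Only this function is compiled with AVX2 and FMA enabled. */
static __attribute__((target("avx2,fma")))
void axpy_avx2_fma(size_t n, double a, const double *x, double *y)
{
  const __m256d va = _mm256_set1_pd(a);
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    __m256d vx = _mm256_loadu_pd(x + i);
    __m256d vy = _mm256_loadu_pd(y + i);
    _mm256_storeu_pd(y + i, _mm256_fmadd_pd(va, vx, vy));
  }
  for (; i < n; i++) y[i] += a * x[i]; /* remainder */
}
#endif

/* Hypothetical object that remembers which kernel to call. */
typedef struct {
  axpy_fn axpy;
} Kernel;

/* Query the CPU once here; callers then use k.axpy with no further checks. */
static Kernel kernel_create(void)
{
  Kernel k = { axpy_fallback };
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
    k.axpy = axpy_avx2_fma;
#endif
  return k;
}
```

A caller would create the Kernel once and then invoke k.axpy(n, a, x, y)
inside its loops. Note that the FMA path rounds once per element instead of
twice, so results can differ from the fallback in the last bit.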
