For our handcoded AVX functions this is fine; we can handle the dispatching 
ourselves. 

  But what about the tons of regular code in PETSc? Somehow we need to have 
the same function compiled twice and dispatched properly. Do we use what Hong 
suggested with fat binaries? So are fat binaries PLUS _may_i_use_cpu_feature 
together the way to portable, transportable libraries? 

  And we always do this for --with-debugging=0 builds, so that everyone, 
packages and users, gets portability but also the best performance possible.

  Barry


> On Feb 14, 2021, at 11:50 AM, Jed Brown <[email protected]> wrote:
> 
> immintrin.h provides
> 
> if (_may_i_use_cpu_feature(_FEATURE_FMA | _FEATURE_AVX2)) {
>  fancy_version_that_needs_fma_and_avx2();
> } else {
>  fallback_version();
> }
> 
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_may_i_use&expand=3677,3677
> 
> I believe this function is slightly expensive because it probably calls the 
> CPUID instruction each time. BLIS has code to cache the result and query 
> features with simple bitwise math.
> 
> https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.h
> https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.c
> 
> Of course this bit of dispatch should typically be done at object creation 
> time, not every iteration.
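
A minimal sketch of the idea discussed above: the same loop is compiled twice, and the choice between the two builds is made once, at object creation time, by caching a function pointer. Everything named here is hypothetical and for illustration only (VecSketch, VecSketchCreate, VecSketchAXPY, axpy_generic, axpy_avx2_fma); none of it is PETSc API. It also swaps in the GCC/Clang builtin __builtin_cpu_supports and the target function attribute in place of Intel's _may_i_use_cpu_feature, and assumes an x86 build with one of those compilers; on anything else it simply falls back to the generic loop.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC or Clang on x86. Elsewhere only the generic path is built. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
#else
#define SKETCH_X86_DISPATCH 0
#endif

typedef void (*AxpyKernel)(size_t n, double a, const double *x, double *y);

/* Baseline build of the loop; runs on any CPU. */
static void axpy_generic(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

#if SKETCH_X86_DISPATCH
/* Same source loop, but the compiler may emit AVX2/FMA for this function only. */
static __attribute__((target("avx2,fma")))
void axpy_avx2_fma(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}
#endif

/* Hypothetical object that remembers which kernel was selected. */
typedef struct {
  AxpyKernel  axpy;
  const char *kernel_name;
} VecSketch;

/* CPU features are queried once here, not on every call. */
static void VecSketchCreate(VecSketch *v)
{
  v->axpy        = axpy_generic;
  v->kernel_name = "generic";
#if SKETCH_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
    v->axpy        = axpy_avx2_fma;
    v->kernel_name = "avx2+fma";
  }
#endif
}

/* Hot path: a plain indirect call through the cached pointer. */
static void VecSketchAXPY(const VecSketch *v, size_t n, double a,
                          const double *x, double *y)
{
  assert(v->axpy != NULL);
  v->axpy(n, a, x, y);
}
```

A caller would run VecSketchCreate(&v) once and then call VecSketchAXPY(&v, n, a, x, y) as often as needed. The two kernels can differ in the last bits of the result, because an FMA rounds once where a separate multiply and add round twice.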
