> On Feb 14, 2021, at 1:25 PM, Zhang, Hong <[email protected]> wrote:
>
>
>
>> On Feb 14, 2021, at 12:04 PM, Barry Smith <[email protected]> wrote:
>>
>>
>> For our handcoded AVX functions this is fine, we can handle the dispatching
>> ourselves.
>
> Cool. _may_i_use_cpu_feature() would be very useful to determine the optimal
> AVX code path at runtime. Theoretically we just need to query for the needed
> features once and cache the results.
>
>>
>> But what about all the tons of regular code in PETSc, somehow we need to
>> have the same function compiled twice and dispatched properly. Do we use
>> what Hong suggested with fat binaries? So fat-binaries PLUS
>> _may_i_use_cpu_feature together are the way to portable transportable
>> libraries?
>>
>>
>> And we do this always --with-debugging=0 so everyone, packages and users get
>> portable but also the best performance possible.
>
> IMHO, only package managers should consider using -ax options. On our side,
> if we want to satisfy the needs of different parties (developers, users,
> package managers), it is better to be conservative than aggressive.
> -march=native brings a huge performance improvement
But this means most of our users are, year after year, throwing lots of
performance on the floor and don't even know it. I think we pander to
portability too much.
> but it has never been the default for many compilers, for good reason. Even
> -O3 does not enable the advanced vector instructions. I just did a quick
> check on petsc-02:
>
> hongzhang@petsc-02:/nfs/gce/projects/TSAdjoint$ icc -O3 -E -dM - < /dev/null | grep SSE
> #define __SSE__ 1
> #define __SSE_MATH__ 1
> #define __SSE2__ 1
> #define __SSE2_MATH__ 1
> hongzhang@petsc-02:/nfs/gce/projects/TSAdjoint$ icc -O3 -E -dM - < /dev/null | grep avx
> hongzhang@petsc-02:/nfs/gce/projects/TSAdjoint$
>
> What Jed usually does (--with-debugging=0 COPTFLAGS='-O2 -march=native') can
> be suggested to anyone who does not need to care about portability. If you do
> not want users to specify the magic options, perhaps we can provide a
> configure option like --with-portability. If it is set to false, we add
> aggressive flags automatically.
My feeling is that 90+% of users don't care about portability; they want to get
fast performance on the machine they are compiling on (or a collection of
machines they have around).
Can we build aggressively for their system (except for package managers and for
people who provide their own -march) and have PetscInitialize() produce a very
useful error message if they then run the code on a system where it will not
work? Are there any system calls to get that type of information?
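
To make the question concrete, here is a rough sketch of what such a startup
check might look like. It is only an illustration under assumptions: the
function name petsc_sketch_check_cpu is made up (it is not a PETSc routine), it
uses the GCC/Clang builtins __builtin_cpu_init() and __builtin_cpu_supports()
rather than a system call, and it only looks at the compiler-defined macros
__AVX2__, __FMA__, and __AVX512F__, which record what the library was built
for.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper, not part of PETSc. Compares the instruction sets this
   file was compiled for (via compiler-defined macros) with what the running
   CPU reports. Returns 0 if the build looks safe to run here, 1 otherwise,
   and writes a human-readable reason into msg. */
static int petsc_sketch_check_cpu(char *msg, size_t len)
{
  if (len) msg[0] = '\0';
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init(); /* harmless to call more than once */
#if defined(__AVX512F__)
  if (!__builtin_cpu_supports("avx512f")) {
    snprintf(msg, len, "Library was built with AVX-512F but this CPU lacks it; "
                       "reconfigure with more conservative COPTFLAGS");
    return 1;
  }
#endif
#if defined(__AVX2__)
  if (!__builtin_cpu_supports("avx2")) {
    snprintf(msg, len, "Library was built with AVX2 but this CPU lacks it; "
                       "reconfigure with more conservative COPTFLAGS");
    return 1;
  }
#endif
#if defined(__FMA__)
  if (!__builtin_cpu_supports("fma")) {
    snprintf(msg, len, "Library was built with FMA but this CPU lacks it; "
                       "reconfigure with more conservative COPTFLAGS");
    return 1;
  }
#endif
#endif
  /* On other compilers or architectures this sketch cannot tell, so it
     reports no problem. */
  return 0;
}
```

Something like this would have to run very early, and the file containing it
would itself need to be compiled with conservative flags, otherwise the check
could hit an illegal instruction before it gets a chance to print anything.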
Barry
>
> Hong
>
>>
>> Barry
>>
>>
>>> On Feb 14, 2021, at 11:50 AM, Jed Brown <[email protected]> wrote:
>>>
>>>>
>>>
>>> immintrin.h provides
>>>
>>> if (_may_i_use_cpu_feature(_FEATURE_FMA | _FEATURE_AVX2)) {
>>> fancy_version_that_needs_fma_and_avx2();
>>> } else {
>>> fallback_version();
>>> }
>>>
>>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_may_i_use&expand=3677,3677
>>>
>>> I believe this function is slightly expensive because it probably calls the
>>> CPUID instruction each time. BLIS has code to cache the result and query
>>> features with simple bitwise math.
>>>
>>> https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.h
>>> https://github.com/flame/blis/blob/master/frame/base/bli_cpuid.c
>>>
>>> Of course this bit of dispatch should typically be done at object creation
>>> time, not every iteration.
>>
>
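
For reference, the caching and object-creation-time dispatch described in the
quoted message above can be sketched roughly as follows. This is a minimal
sketch under assumptions, not how PETSc or BLIS does it: the names Kernel,
kernel_create, axpy_fancy, and axpy_fallback are hypothetical, the "fancy"
kernel is a plain loop standing in for a handcoded AVX2/FMA version, and the
feature query uses the GCC/Clang builtin __builtin_cpu_supports() in place of
Intel's _may_i_use_cpu_feature().

```c
#include <stddef.h>

typedef void (*axpy_fn)(size_t n, double a, const double *x, double *y);

/* Portable version: y <- a*x + y. */
static void axpy_fallback(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Stand-in for a handcoded AVX2/FMA kernel. A real one would use intrinsics;
   a plain loop is used here so the sketch compiles everywhere. */
static void axpy_fancy(size_t n, double a, const double *x, double *y)
{
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Query the CPU once and cache the answer (-1 means "not queried yet"). */
static int have_fma_avx2(void)
{
  static int cached = -1;
  if (cached < 0) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    cached = (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) ? 1 : 0;
#else
    cached = 0; /* unknown platform: take the safe path */
#endif
  }
  return cached;
}

/* Hypothetical object that picks its kernel once, at creation time. */
typedef struct {
  axpy_fn axpy;
} Kernel;

static void kernel_create(Kernel *k)
{
  k->axpy = have_fma_avx2() ? axpy_fancy : axpy_fallback;
}
```

After kernel_create(&k), every call goes through k.axpy(n, a, x, y) with no
further feature queries, so the cost of the check is paid once per process
rather than once per iteration.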