Max,

At configure time, Open MPI detects the *compiler* capabilities.
In your case, your compiler can emit AVX512 code.
(And FWIW, the tests are only compiled, never executed.)

Then at *runtime*, Open MPI detects the *CPU* capabilities.
In your case, it should not invoke the functions containing AVX512 code.
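To illustrate the distinction, here is a minimal sketch. It is not Open MPI's actual op/avx code and assumes GCC or Clang on x86-64. The compiler emits AVX512 code for one function through the `target` attribute, but that function is only called when `__builtin_cpu_supports()` reports AVX512F at runtime. The names add16, add16_avx512 and add16_scalar are hypothetical.

```c
#include <immintrin.h>

/* Compiled for AVX512F via the target attribute, independent of -march.
   Must only be called after a runtime check confirms AVX512F support. */
__attribute__((target("avx512f")))
static void add16_avx512(const float *a, const float *b, float *out)
{
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    _mm512_storeu_ps(out, _mm512_add_ps(va, vb));
}

/* Portable fallback for CPUs without AVX512F (e.g. Haswell). */
static void add16_scalar(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 16; i++)
        out[i] = a[i] + b[i];
}

/* Hypothetical dispatcher: the CPU, not the compiler, decides which path runs. */
void add16(const float *a, const float *b, float *out)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        add16_avx512(a, b, out);
    else
        add16_scalar(a, b, out);
}
```

On a Haswell CPU this program still contains AVX512 instructions, but they are never executed because the scalar path is selected.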

That being said, several changes were made to the op/avx component,
so if you are experiencing crashes, I invite you to try the
latest nightly snapshot of the v4.1.x branch.


Cheers,

Gilles

On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
<users@lists.open-mpi.org> wrote:
>
> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
> that supports AVX2 but not AVX512, resulted in
>
> checking for AVX512 support (no additional flags)... no
> checking for AVX512 support (with -march=skylake-avx512)... yes
>
> in "configure" output, and in config.log
>
> MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
> MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
>
> Consequently AVX512 intrinsic functions were erroneously
> deployed, resulting in OpenMPI failure.
>
> The relevant test code was in essence
>
> cat > conftest.c << EOF
> #include <immintrin.h>
>
> int main()
> {
>         __m512 vA, vB;
>
>         _mm512_add_ps(vA, vB);
>
>         return 0;
> }
> EOF
>
> The problem with this is that the result of the function
> is never used, so at optimization levels above -O0
> the compiler eliminates the call as dead code (dead-code
> elimination, DCE).
> To wit,
>
> gcc -O3 -march=skylake-avx512 -S conftest.c
>
> yields
>
>         .file   "conftest.c"
>         .text
>         .section        .text.startup,"ax",@progbits
>         .p2align 4
>         .globl  main
>         .type   main, @function
> main:
> .LFB5345:
>         .cfi_startproc
>         xorl    %eax, %eax
>         ret
>         .cfi_endproc
> .LFE5345:
>         .size   main, .-main
>         .ident  "GCC: (GNU) 10.2.0"
>         .section        .note.GNU-stack,"",@progbits
>
> Compare this with the result of
>
> gcc -O0 -march=skylake-avx512 -S conftest.c
>
> in which the AVX512 code is NOT eliminated:
>
>         .file   "conftest.c"
>         .text
>         .globl  main
>         .type   main, @function
> main:
> .LFB4092:
>         .cfi_startproc
>         pushq   %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset 6, -16
>         movq    %rsp, %rbp
>         .cfi_def_cfa_register 6
>         andq    $-64, %rsp
>         subq    $136, %rsp
>         vmovaps 72(%rsp), %zmm0
>         vmovaps %zmm0, -56(%rsp)
>         vmovaps 8(%rsp), %zmm0
>         vmovaps %zmm0, -120(%rsp)
>         movl    $0, %eax
>         leave
>         .cfi_def_cfa 7, 8
>         ret
>         .cfi_endproc
> .LFE4092:
>         .size   main, .-main
>         .ident  "GCC: (GNU) 10.2.0"
>         .section        .note.GNU-stack,"",@progbits
>
> Note the use of a 512-bit ZMM register; ZMM registers
> are used only by AVX512 instructions.  Hence at -O3 the
> test program contains no AVX512 instructions and cannot
> detect the host processor's lack of AVX512 support.
>
> An easy remedy would be to declare the operands as
> "volatile" and thereby force the compiler to invoke the
> function:
>
> cat > conftest.c << EOF
> #include <immintrin.h>
>
> int main()
> {
>         volatile __m512 vA, vB;
>
>         _mm512_add_ps(vA, vB);
>
>         return 0;
> }
> EOF
>
> Compiled at -O3, the resulting executable dumps core, as it
> should, when run on my Haswell processor and returns a nonzero
> exit status ($?), which would inform "configure" that the
> processor does not have AVX512 capability.
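A run-time probe could also report the result through its exit status without relying on a crash. Here is a minimal sketch, assuming an x86-64 target and the <cpuid.h> header shipped with GCC and Clang. The function names avx512f_usable and xcr0_has_avx512_state are hypothetical. It checks the CPUID AVX512F bit and uses XGETBV to confirm that the operating system has enabled the ZMM register state. No AVX512 instruction is executed.

```c
#include <cpuid.h>
#include <stdint.h>

/* XCR0 bits that must all be set for AVX512 state to be enabled by the OS:
   1 = SSE, 2 = AVX (upper YMM), 5 = opmask, 6 = ZMM_Hi256, 7 = Hi16_ZMM. */
#define XCR0_AVX512_MASK 0xE6u

/* Pure helper: does this XCR0 value enable the full AVX512 register state? */
int xcr0_has_avx512_state(uint64_t xcr0)
{
    return (xcr0 & XCR0_AVX512_MASK) == XCR0_AVX512_MASK;
}

/* Returns 1 if the CPU reports AVX512F and the OS saves ZMM state, else 0.
   Only CPUID and (when OSXSAVE is set) XGETBV are executed, so no SIGILL. */
int avx512f_usable(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1, ECX bit 27: OSXSAVE, i.e. XGETBV may be executed. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return 0;

    /* Leaf 7, subleaf 0, EBX bit 16: AVX512F. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) || !(ebx & (1u << 16)))
        return 0;

    /* XGETBV with ECX = 0 reads XCR0 into EDX:EAX. */
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return xcr0_has_avx512_state(((uint64_t)hi << 32) | lo);
}
```

A probe's main() could then simply `return avx512f_usable() ? 0 : 1;`, giving "configure" a clean exit status on both Haswell and AVX512-capable hosts.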
>
> Finally please note that this error could affect the
> detection of support for other instruction sets on other
> families of processors: compiler optimization must be
> inhibited for such tests to be reliable!
>
> Max
> ---
> Max R. Dechantsreiter
> President
> Performance Jones L.L.C.
> m...@performancejones.com
> Skype: PerformanceJones (UTC+01:00)
> +1 414 446-3100 (telephone/voicemail)
> http://www.linkedin.com/in/benchmarking
