Max,

at configure time, Open MPI detects the *compiler* capabilities.
In your case, your compiler can emit AVX512 code
(and FWIW, the configure tests are only compiled, never executed).

Then at *runtime*, Open MPI detects the *CPU* capabilities.
In your case, it should not invoke the functions containing AVX512 code.

That being said, several changes were made to the op/avx component,
so if you are experiencing crashes, I invite you to try
the latest nightly snapshot for the v4.1.x branch.

Cheers,

Gilles

On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
<users@lists.open-mpi.org> wrote:
>
> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
> that supports AVX2 but not AVX512, resulted in
>
>     checking for AVX512 support (no additional flags)... no
>     checking for AVX512 support (with -march=skylake-avx512)... yes
>
> in "configure" output, and in config.log
>
>     MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
>     MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
>
> Consequently AVX512 intrinsic functions were erroneously
> deployed, resulting in OpenMPI failure.
>
> The relevant test code was in essence
>
>     cat > conftest.c << EOF
>     #include <immintrin.h>
>
>     int main()
>     {
>         __m512 vA, vB;
>
>         _mm512_add_ps(vA, vB);
>
>         return 0;
>     }
>     EOF
>
> The problem with this is that the result of the function
> is never used, so at optimization levels higher than O0
> the compiler eliminates the function as "dead code" (DCE).
> To wit,
>
>     gcc -O3 -march=skylake-avx512 -S conftest.c
>
> yields
>
>         .file   "conftest.c"
>         .text
>         .section        .text.startup,"ax",@progbits
>         .p2align 4
>         .globl  main
>         .type   main, @function
> main:
> .LFB5345:
>         .cfi_startproc
>         xorl    %eax, %eax
>         ret
>         .cfi_endproc
> .LFE5345:
>         .size   main, .-main
>         .ident  "GCC: (GNU) 10.2.0"
>         .section        .note.GNU-stack,"",@progbits
>
> Compare this with the result of
>
>     gcc -O0 -march=skylake-avx512 -S conftest.c
>
> in which the function IS called:
>
>         .file   "conftest.c"
>         .text
>         .globl  main
>         .type   main, @function
> main:
> .LFB4092:
>         .cfi_startproc
>         pushq   %rbp
>         .cfi_def_cfa_offset 16
>         .cfi_offset 6, -16
>         movq    %rsp, %rbp
>         .cfi_def_cfa_register 6
>         andq    $-64, %rsp
>         subq    $136, %rsp
>         vmovaps 72(%rsp), %zmm0
>         vmovaps %zmm0, -56(%rsp)
>         vmovaps 8(%rsp), %zmm0
>         vmovaps %zmm0, -120(%rsp)
>         movl    $0, %eax
>         leave
>         .cfi_def_cfa 7, 8
>         ret
>         .cfi_endproc
> .LFE4092:
>         .size   main, .-main
>         .ident  "GCC: (GNU) 10.2.0"
>         .section        .note.GNU-stack,"",@progbits
>
> Note the use of a 512-bit ZMM register - ZMM registers
> are used only by AVX512 instructions. Hence at O3 the
> test program does not detect the lack of AVX512 support
> by the host processor.
>
> An easy remedy would be to declare the operands as
> "volatile" and thereby force the compiler to invoke the
> function:
>
>     cat > conftest.c << EOF
>     #include <immintrin.h>
>
>     int main()
>     {
>         volatile __m512 vA, vB;
>
>         _mm512_add_ps(vA, vB);
>
>         return 0;
>     }
>     EOF
>
> Compiled at O3, the resulting executable dumps core as it
> should when run on my Haswell processor, returning nonzero
> exit status ($?), which would inform "configure" that the
> processor does not have AVX512 capability.
>
> Finally please note that this error could affect the
> detection of support for other instruction sets on other
> families of processors: compiler optimization must be
> inhibited for such tests to be reliable!
>
> Max
> ---
> Max R. Dechantsreiter
> President
> Performance Jones L.L.C.
> m...@performancejones.com
> Skype: PerformanceJones (UTC+01:00)
> +1 414 446-3100 (telephone/voicemail)
> http://www.linkedin.com/in/benchmarking
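
For illustration only, here is a minimal sketch of the distinction drawn in
the reply above: the compiler being able to emit AVX512 code is one question,
and the CPU being able to run it is another, answered at runtime. It assumes
GCC or Clang on x86. The names add_floats, add_floats_scalar,
add_floats_avx512 and cpu_has_avx512f are hypothetical; this is not the
op/avx component's actual code.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_GNU 1
#include <immintrin.h>
#endif

/* Plain C fallback: runs on any CPU. */
static void add_floats_scalar(const float *a, const float *b,
                              float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

#ifdef SKETCH_X86_GNU
/* The target attribute lets the compiler emit AVX512F instructions for
 * this one function. Being compiled in says nothing about whether the
 * host CPU can execute it (this is the "compiler capability" part). */
__attribute__((target("avx512f")))
static void add_floats_avx512(const float *a, const float *b,
                              float *out, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {          /* 16 floats per ZMM register */
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
    for (; i < n; i++)                      /* leftover elements */
        out[i] = a[i] + b[i];
}
#endif

/* Runtime question: does the CPU we are running on support AVX512F?
 * Uses the GCC/Clang builtins; reports 0 on other compilers or targets. */
int cpu_has_avx512f(void)
{
#ifdef SKETCH_X86_GNU
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return 0;
#endif
}

/* Dispatcher: the AVX512 path is entered only after the runtime check. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
#ifdef SKETCH_X86_GNU
    if (cpu_has_avx512f()) {
        add_floats_avx512(a, b, out, n);
        return;
    }
#endif
    add_floats_scalar(a, b, out, n);
}
```

Calling add_floats(a, b, out, n) on a Haswell host should take the scalar
path even though the AVX512 function was compiled into the binary, since the
ZMM instructions are reached only when cpu_has_avx512f() returns nonzero.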