Configuring OpenMPI 4.1.0 with GCC 10.2.0 on Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor that supports AVX2 but not AVX512, resulted in

    checking for AVX512 support (no additional flags)... no
    checking for AVX512 support (with -march=skylake-avx512)... yes

in "configure" output, and in config.log

    MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
    MCA_BUILD_ompi_op_has_avx512_support_TRUE=''

Consequently AVX512 intrinsic functions were erroneously deployed,
resulting in OpenMPI failure.

The relevant test code was in essence

    cat > conftest.c << EOF
    #include <immintrin.h>

    int main()
    {
        __m512 vA, vB;

        _mm512_add_ps(vA, vB);

        return 0;
    }
    EOF

The problem with this is that the result of the function is never used,
so at any optimization level higher than O0 the compiler eliminates the
function as "dead code" (DCE). To wit,

    gcc -O3 -march=skylake-avx512 -S conftest.c

yields

            .file   "conftest.c"
            .text
            .section        .text.startup,"ax",@progbits
            .p2align 4
            .globl  main
            .type   main, @function
    main:
    .LFB5345:
            .cfi_startproc
            xorl    %eax, %eax
            ret
            .cfi_endproc
    .LFE5345:
            .size   main, .-main
            .ident  "GCC: (GNU) 10.2.0"
            .section        .note.GNU-stack,"",@progbits

Compare this with the result of

    gcc -O0 -march=skylake-avx512 -S conftest.c

in which the function IS called:

            .file   "conftest.c"
            .text
            .globl  main
            .type   main, @function
    main:
    .LFB4092:
            .cfi_startproc
            pushq   %rbp
            .cfi_def_cfa_offset 16
            .cfi_offset 6, -16
            movq    %rsp, %rbp
            .cfi_def_cfa_register 6
            andq    $-64, %rsp
            subq    $136, %rsp
            vmovaps 72(%rsp), %zmm0
            vmovaps %zmm0, -56(%rsp)
            vmovaps 8(%rsp), %zmm0
            vmovaps %zmm0, -120(%rsp)
            movl    $0, %eax
            leave
            .cfi_def_cfa 7, 8
            ret
            .cfi_endproc
    .LFE4092:
            .size   main, .-main
            .ident  "GCC: (GNU) 10.2.0"
            .section        .note.GNU-stack,"",@progbits

Note the use of a 512-bit ZMM register - ZMM registers are used only by
AVX512 instructions. At O3 no such instruction is emitted, so the test
program runs to completion on any host. Hence at O3 the test program does
not detect the lack of AVX512 support by the host processor.

An easy remedy would be to declare the operands as "volatile" and thereby
force the compiler to invoke the function:

    cat > conftest.c << EOF
    #include <immintrin.h>

    int main()
    {
        volatile __m512 vA, vB;

        _mm512_add_ps(vA, vB);

        return 0;
    }
    EOF

Compiled at O3, the resulting executable dumps core, as it should, when
run on my Haswell processor, returning nonzero exit status ($?), which
would inform "configure" that the processor does not have AVX512
capability.

Finally, please note that this error could affect the detection of
support for other instruction sets on other families of processors:
compiler optimization must be inhibited for such tests to be reliable!

Max
---
Max R. Dechantsreiter
President
Performance Jones L.L.C.
m...@performancejones.com
Skype: PerformanceJones (UTC+01:00)
+1 414 446-3100 (telephone/voicemail)
http://www.linkedin.com/in/benchmarking