We did have some issues with 4.1.0 and AVX, but we have fixed everything that 
we were aware of.

I'd be curious to know if you still have build failures in the latest 4.1.x 
nightly snapshot.

If you do, can you send the following:

- stdout/stderr from running configure
- config.log
- stdout/stderr from running "make V=1" (for brevity, you can "make", get to 
the failure, and then "make V=1" to get just the details of the compile 
failure, vs. the details of the entire make with oodles of lengthy successful 
compiles)


> On Feb 11, 2021, at 9:15 AM, Max R. Dechantsreiter 
> <m...@performancejones.com> wrote:
> 
> ...The error that prompted me to start this thread occurred
> during "make all" with 4.1.0:
> 
> .
> .
> .
> Making all in mca/op/avx
> gmake[2]: Entering directory `/home/maxd/XXXXXXXXXXXXXXXXXX/Build/openmpi-4.1.0_gcc-10.2.0/ompi/mca/op/avx'
>  CC       op_avx_component.lo
>  CC       liblocal_ops_avx_la-op_avx_functions.lo
>  CCLD     liblocal_ops_avx.la
>  CC       liblocal_ops_avx512_la-op_avx_functions.lo
> op_avx_functions.c: In function 'ompi_op_avx_2buff_bxor_uint64_t_avx512':
> op_avx_functions.c:208:21: warning: AVX512F vector return without AVX512F enabled changes the ABI [-Wpsabi]
>  208 |             __m512i vecA =  _mm512_loadu_si512((__m512i*)in);           \
>      |                     ^~~~
> op_avx_functions.c:263:5: note: in expansion of macro 'OP_AVX_AVX512_BIT_FUNC'
>  263 |     OP_AVX_AVX512_BIT_FUNC(name, type_size, type, op);                  \
>      |     ^~~~~~~~~~~~~~~~~~~~~~
> op_avx_functions.c:573:5: note: in expansion of macro 'OP_AVX_BIT_FUNC'
>  573 |     OP_AVX_BIT_FUNC(bxor, 64, uint64_t, xor)
>      |     ^~~~~~~~~~~~~~~
> In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
>                 from op_avx_functions.c:26:
> op_avx_functions.c: In function 'ompi_op_avx_2buff_max_int8_t_avx512':
> /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512fintrin.h:6429:1: error: inlining failed in call to 'always_inline' '_mm512_storeu_si512': target specific option mismatch
> 6429 | _mm512_storeu_si512 (void *__P, __m512i __A)
>      | ^~~~~~~~~~~~~~~~~~~
> op_avx_functions.c:73:13: note: called from here
>   73 |             _mm512_storeu_si512((__m512*)out, res);                             \
>      |             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
>  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);                   \
>      |     ^~~~~~~~~~~~~~~~~~
> op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
>  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
>      |     ^~~~~~~~~~~
> In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:65,
>                 from op_avx_functions.c:26:
> /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/avx512bwintrin.h:1984:1: error: inlining failed in call to 'always_inline' '_mm512_max_epi8': target specific option mismatch
> 1984 | _mm512_max_epi8 (__m512i __A, __m512i __B)
>      | ^~~~~~~~~~~~~~~
> op_avx_functions.c:72:27: note: called from here
>   72 |             __m512i res = _mm512_##op##_ep##type_sign##type_size(vecA, vecB);  \
>      |                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> op_avx_functions.c:124:5: note: in expansion of macro 'OP_AVX_AVX512_FUNC'
>  124 |     OP_AVX_AVX512_FUNC(name, type_sign, type_size, type, op);                   \
>      |     ^~~~~~~~~~~~~~~~~~
> op_avx_functions.c:454:5: note: in expansion of macro 'OP_AVX_FUNC'
>  454 |     OP_AVX_FUNC(max, i, 8,    int8_t, max)
>      |     ^~~~~~~~~~~
> In file included from /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/gcc-10.2.0/lib/gcc/x86_64-linux-gnu/10.2.0/include/immintrin.h:55,
>                 from op_avx_functions.c:26:
> .
> .
> .
> 
> End result: the build failed.
> 
> My build of v4.1.x-202102090356-380ac96 threw no errors.
> I will continue with an attempt to build GROMACS using
> that 4.1.x snapshot.
> 
> 
> On Thu, Feb 11, 2021 at 01:10:42PM +0000, Max R. Dechantsreiter wrote:
>> I ran into a problem with 4.1.0 several weeks ago,
>> and no longer recall precisely how; I am now rebuilding
>> both 4.1.0 and a recent 4.1.x, then will use them to
>> build GROMACS, probably the application I was attempting
>> back then.
>> 
>> But I do have this from my notes (for 4.1.0):
>> 
>> mpicc -fopenmp hybrid_hello.c
>> export OMP_NUM_THREADS=2
>> mpirun -np 2 ./a.out
>> # [server.clearlight.us:18349] mca_base_component_repository_open: unable to open mca_op_avx: /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so: undefined symbol: ompi_op_avx_functions_avx512 (ignored)
>> # [server.clearlight.us:18348] mca_base_component_repository_open: unable to open mca_op_avx: /home/maxd/XXXXXXXXXXXXXXXXXX/opt/gnu/openmpi-4.1.0_gcc-10.2.0/lib/openmpi/mca_op_avx.so: undefined symbol: ompi_op_avx_functions_avx512 (ignored)
>> # Hello from thread 0 out of 2 from process 0 out of 2 on server.clearlight.us
>> # Hello from thread 1 out of 2 from process 0 out of 2 on server.clearlight.us
>> # Hello from thread 0 out of 2 from process 1 out of 2 on server.clearlight.us
>> # Hello from thread 1 out of 2 from process 1 out of 2 on server.clearlight.us
>> 
>> (where I X-ed out confidential details).  Not an error,
>> but surely indicative of something amiss.
>> 
>> More to come!
>> 
>> 
>> On Thu, Feb 11, 2021 at 02:02:48AM +0000, Jeff Squyres (jsquyres) via users 
>> wrote:
>>> I think Max did try the latest 4.1 nightly build (from an off-list email), 
>>> and his problem still persisted.
>>> 
>>> Max: can you describe exactly how Open MPI failed?  All you said was:
>>> 
>>>>> Consequently AVX512 intrinsic functions were erroneously
>>>>> deployed, resulting in OpenMPI failure.
>>> 
>>> Can you provide more details?
>>> 
>>> 
>>>> On Feb 10, 2021, at 6:09 PM, Gilles Gouaillardet via users 
>>>> <users@lists.open-mpi.org> wrote:
>>>> 
>>>> Max,
>>>> 
>>>> at configure time, Open MPI detects the *compiler* capabilities.
>>>> In your case, your compiler can emit AVX512 code.
>>>> (and fwiw, the tests are only compiled and never executed)
>>>> 
>>>> Then at *runtime*, Open MPI detects the *CPU* capabilities.
>>>> In your case, it should not invoke the functions containing AVX512 code.
>>>> 
>>>> That being said, several changes were made to the op/avx component,
>>>> so if you are experiencing crashes, I invite you to try the
>>>> latest nightly snapshot for the v4.1.x branch.
>>>> 
>>>> 
>>>> Cheers,
>>>> 
>>>> Gilles
>>>> 
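Gilles's compile-time versus run-time split can be illustrated with a small C sketch, assuming GCC or Clang on x86-64. It is not Open MPI's op/avx code, and the names add16_avx512, add16_scalar and add16 are hypothetical. The target("avx512f") attribute lets the compiler emit AVX-512 for a single function even when the file is built without -mavx512f (calling AVX-512 intrinsics from a function that lacks that target is what produces the "target specific option mismatch" class of error seen in the build log above), while __builtin_cpu_supports decides at run time whether that function may be called.

```c
#include <immintrin.h>

/* Compiled for AVX-512F even if the rest of the file is not.
 * Only safe to call after a run-time check confirms CPU support. */
__attribute__((target("avx512f")))
static void add16_avx512(const float *a, const float *b, float *out)
{
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    _mm512_storeu_ps(out, _mm512_add_ps(va, vb));
}

/* Portable fallback for CPUs without AVX-512 (e.g. Haswell). */
static void add16_scalar(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 16; i++)
        out[i] = a[i] + b[i];
}

/* Dispatch on what the CPU reports, not on what the compiler can emit.
 * Returns 1 if the AVX-512 path was taken, 0 for the scalar path. */
static int add16(const float *a, const float *b, float *out)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        add16_avx512(a, b, out);
        return 1;
    }
    add16_scalar(a, b, out);
    return 0;
}
```

On the Haswell E5-2620 v3 described in this thread, add16() would be expected to take the scalar path, even though the compiler happily built the AVX-512 version.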
>>>> On Wed, Feb 10, 2021 at 10:43 PM Max R. Dechantsreiter via users
>>>> <users@lists.open-mpi.org> wrote:
>>>>> 
>>>>> Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
>>>>> Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
>>>>> that supports AVX2 but not AVX512, resulted in
>>>>> 
>>>>> checking for AVX512 support (no additional flags)... no
>>>>> checking for AVX512 support (with -march=skylake-avx512)... yes
>>>>> 
>>>>> in "configure" output, and in config.log
>>>>> 
>>>>> MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
>>>>> MCA_BUILD_ompi_op_has_avx512_support_TRUE=''
>>>>> 
>>>>> Consequently AVX512 intrinsic functions were erroneously
>>>>> deployed, resulting in OpenMPI failure.
>>>>> 
>>>>> The relevant test code was in essence
>>>>> 
>>>>> cat > conftest.c << EOF
>>>>> #include <immintrin.h>
>>>>> 
>>>>> int main()
>>>>> {
>>>>>       __m512 vA, vB;
>>>>> 
>>>>>       _mm512_add_ps(vA, vB);
>>>>> 
>>>>>       return 0;
>>>>> }
>>>>> EOF
>>>>> 
>>>>> The problem with this is that the result of the function
>>>>> is never used, so at optimization levels higher than O0
>>>>> the compiler eliminates the call as dead code (DCE).
>>>>> To wit,
>>>>> 
>>>>> gcc -O3 -march=skylake-avx512 -S conftest.c
>>>>> 
>>>>> yields
>>>>> 
>>>>>       .file   "conftest.c"
>>>>>       .text
>>>>>       .section        .text.startup,"ax",@progbits
>>>>>       .p2align 4
>>>>>       .globl  main
>>>>>       .type   main, @function
>>>>> main:
>>>>> .LFB5345:
>>>>>       .cfi_startproc
>>>>>       xorl    %eax, %eax
>>>>>       ret
>>>>>       .cfi_endproc
>>>>> .LFE5345:
>>>>>       .size   main, .-main
>>>>>       .ident  "GCC: (GNU) 10.2.0"
>>>>>       .section        .note.GNU-stack,"",@progbits
>>>>> 
>>>>> Compare this with the result of
>>>>> 
>>>>> gcc -O0 -march=skylake-avx512 -S conftest.c
>>>>> 
>>>>> in which the function IS called:
>>>>> 
>>>>>       .file   "conftest.c"
>>>>>       .text
>>>>>       .globl  main
>>>>>       .type   main, @function
>>>>> main:
>>>>> .LFB4092:
>>>>>       .cfi_startproc
>>>>>       pushq   %rbp
>>>>>       .cfi_def_cfa_offset 16
>>>>>       .cfi_offset 6, -16
>>>>>       movq    %rsp, %rbp
>>>>>       .cfi_def_cfa_register 6
>>>>>       andq    $-64, %rsp
>>>>>       subq    $136, %rsp
>>>>>       vmovaps 72(%rsp), %zmm0
>>>>>       vmovaps %zmm0, -56(%rsp)
>>>>>       vmovaps 8(%rsp), %zmm0
>>>>>       vmovaps %zmm0, -120(%rsp)
>>>>>       movl    $0, %eax
>>>>>       leave
>>>>>       .cfi_def_cfa 7, 8
>>>>>       ret
>>>>>       .cfi_endproc
>>>>> .LFE4092:
>>>>>       .size   main, .-main
>>>>>       .ident  "GCC: (GNU) 10.2.0"
>>>>>       .section        .note.GNU-stack,"",@progbits
>>>>> 
>>>>> Note the use of a 512-bit ZMM register - ZMM registers
>>>>> are used only by AVX512 instructions.  Hence at O3 the
>>>>> test program does not detect the lack of AVX512 support
>>>>> by the host processor.
>>>>> 
>>>>> An easy remedy would be to declare the operands as
>>>>> "volatile" and thereby force the compiler to invoke the
>>>>> function:
>>>>> 
>>>>> cat > conftest.c << EOF
>>>>> #include <immintrin.h>
>>>>> 
>>>>> int main()
>>>>> {
>>>>>       volatile __m512 vA, vB;
>>>>> 
>>>>>       _mm512_add_ps(vA, vB);
>>>>> 
>>>>>       return 0;
>>>>> }
>>>>> EOF
>>>>> 
>>>>> Compiled at O3, the resulting executable dumps core as it
>>>>> should when run on my Haswell processor, returning nonzero
>>>>> exit status ($?), which would inform "configure" that the
>>>>> processor does not have AVX512 capability.
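The run test proposed above can be sketched in C as follows. This is a hedged illustration, assuming Linux (POSIX fork/waitpid) and GCC or Clang on x86-64; it is not how Open MPI's configure works, since, as Gilles notes, those tests are only compiled. The hypothetical probe_avx512 reuses the volatile trick from the modified conftest.c, and the equally hypothetical cpu_runs_avx512 runs it in a child process, so the illegal-instruction fault (SIGILL) a Haswell CPU would raise is reported to the parent instead of killing the caller.

```c
#include <immintrin.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Volatile operands force the 512-bit loads/stores to be emitted even
 * at -O3; noinline keeps the AVX-512 code confined to this function. */
__attribute__((target("avx512f"), noinline))
static void probe_avx512(void)
{
    volatile __m512 vA = _mm512_set1_ps(1.0f);
    volatile __m512 vB = _mm512_set1_ps(2.0f);
    volatile __m512 vC = _mm512_add_ps(vA, vB);
    (void)vC;
}

/* Returns 1 if the probe ran cleanly, 0 if the child died (e.g. SIGILL
 * on a CPU without AVX-512), -1 if fork/waitpid failed. */
static int cpu_runs_avx512(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);   /* don't leave a core file */
        probe_avx512();
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

One design caveat: a run test only characterizes the machine running configure, which need not be the machine the library later runs on; that is one reason to prefer run-time dispatch inside the library itself.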
>>>>> 
>>>>> Finally please note that this error could affect the
>>>>> detection of support for other instruction sets on other
>>>>> families of processors: compiler optimization must be
>>>>> inhibited for such tests to be reliable!
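A check that does not depend on the optimization level at all is to query the CPU directly. The sketch below assumes GCC or Clang on x86-64 and uses their <cpuid.h> helpers; host_has_avx512f and read_xcr0 are hypothetical names. It checks the AVX512F feature bit (CPUID leaf 7, subleaf 0, EBX bit 16) and, because the operating system must also enable saving of the 512-bit register state, the OSXSAVE bit and XCR0. It is intended to mirror, roughly, what __builtin_cpu_supports("avx512f") checks in recent GCC and Clang.

```c
#include <cpuid.h>

/* XCR0 bits the OS must enable before AVX-512 code can run:
 * SSE (1), AVX (2), opmask (5), ZMM_Hi256 (6), Hi16_ZMM (7). */
#define XCR0_AVX512_MASK 0xe6u

static unsigned long long read_xcr0(void)
{
    unsigned int lo, hi;
    /* XGETBV with ECX = 0 reads XCR0; only valid when OSXSAVE is set. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((unsigned long long)hi << 32) | lo;
}

/* Returns 1 if both the CPU and the OS support AVX-512F, else 0. */
static int host_has_avx512f(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1: has the OS enabled XSAVE, so XGETBV is usable? */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & bit_OSXSAVE))
        return 0;
    if ((read_xcr0() & XCR0_AVX512_MASK) != XCR0_AVX512_MASK)
        return 0;

    /* Leaf 7, subleaf 0: EBX bit 16 is AVX512F. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx & bit_AVX512F) != 0;
}
```

On the E5-2620 v3 from this thread, host_has_avx512f() would be expected to return 0, since that CPU reports AVX2 but not AVX512F.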
>>>>> 
>>>>> Max
>>>>> ---
>>>>> Max R. Dechantsreiter
>>>>> President
>>>>> Performance Jones L.L.C.
>>>>> m...@performancejones.com
>>>>> Skype: PerformanceJones (UTC+01:00)
>>>>> +1 414 446-3100 (telephone/voicemail)
>>>>> http://www.linkedin.com/in/benchmarking
>>> 
>>> 
>>> -- 
>>> Jeff Squyres
>>> jsquy...@cisco.com
>>> 


-- 
Jeff Squyres
jsquy...@cisco.com
