Re: [OMPI users] OpenMPI 4.1.0 misidentifies x86 capabilities

2021-02-10 Thread Jeff Squyres (jsquyres) via users
I think Max did try the latest 4.1 nightly build (from an off-list email), and 
his problem still persisted.

Max: can you describe exactly how Open MPI failed?  All you said was:

>> Consequently AVX512 intrinsic functions were erroneously
>> deployed, resulting in OpenMPI failure.

Can you provide more details?




-- 
Jeff Squyres
jsquy...@cisco.com



[OMPI users] GROMACS with openmpi

2021-02-10 Thread Wenhao Yao via users
Hi, MPI developers and users,
   I want to run GROMACS using *gmx_mpi* rather than *gmx*. Could
you give me a hand with that?
Thanks a lot!



Cheers,


Re: [OMPI users] OpenMPI 4.1.0 misidentifies x86 capabilities

2021-02-10 Thread Gilles Gouaillardet via users
Max,

at configure time, Open MPI detects the *compiler* capabilities.
In your case, your compiler can emit AVX512 code.
(and fwiw, the tests are only compiled and never executed)

Then at *runtime*, Open MPI detects the *CPU* capabilities.
In your case, it should not invoke the functions containing AVX512 code.
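
To illustrate the split between build-time and run-time detection, here is a minimal sketch. It is not the actual op/avx code: the names add_floats, add_avx512, add_scalar and cpu_has_avx512f are hypothetical, and it assumes GCC or Clang on x86 (the target attribute and __builtin_cpu_supports are compiler extensions). The AVX512 function is always compiled in, but it is only called when the CPU reports AVX512F at run time.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Only this function is compiled with AVX-512F enabled (GCC/Clang
   target attribute), so the rest of the file needs no -march flag. */
__attribute__((target("avx512f")))
static void add_avx512(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {          /* 16 floats per ZMM register */
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(va, vb));
    }
    for (; i < n; i++)                      /* remaining elements */
        out[i] = a[i] + b[i];
}
#endif

/* Portable fallback that runs on any CPU. */
static void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Run-time check: returns 1 if the CPU reports AVX-512F, otherwise 0. */
int cpu_has_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return 0;
#endif
}

/* Hypothetical dispatcher: the code path is chosen at run time. */
void add_floats(const float *a, const float *b, float *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    if (cpu_has_avx512f()) {
        add_avx512(a, b, out, n);
        return;
    }
#endif
    add_scalar(a, b, out, n);
}
```

On a Haswell machine such as yours, add_floats() would take the scalar path even though the binary contains AVX512 instructions.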

That being said, several changes were made to the op/avx component,
so if you are experiencing crashes, I invite you to try the
latest nightly snapshot for the v4.1.x branch.


Cheers,

Gilles



Re: [OMPI users] Issue with MPI_Get_processor_name() in Cygwin

2021-02-10 Thread Martín Morales via users
Hello Joseph,

Yes, it was just that. However, for some reason it was working on Linux…
Thank you very much for your help.
Regards,

Martín

From: Joseph Schuchart via users
Sent: Tuesday, February 9, 2021 17:45
To: users@lists.open-mpi.org
Cc: Joseph Schuchart
Subject: Re: [OMPI users] Issue with MPI_Get_processor_name() in Cygwin

Martin,

The name argument to MPI_Get_processor_name is a character string of
length at least MPI_MAX_PROCESSOR_NAME, which in OMPI is 256. You are
providing a character string of length 200, so OMPI is free to write
past the end of your string and into some of your stack variables, hence
you are "losing" the values of rank and size. The issue should be gone
if you write `char hostName[MPI_MAX_PROCESSOR_NAME];`

Cheers
Joseph

On 2/9/21 9:14 PM, Martín Morales via users wrote:
> Hello,
>
> I have what could be memory corruption with
> /MPI_Get_processor_name()/ in Cygwin.
>
> I’m using OMPI 4.1.0; I also tried on Linux (same OMPI version), but
> there is no issue there.
>
> Below is an example of a trivial spawn operation. It has two programs:
> spawned and spawner.
>
> In the spawned program, if I move the /MPI_Get_processor_name()/ line
> below /MPI_Comm_size()/, I lose the values of /rank/ and /size/.
>
> In fact, I declared some other variables in the /int hostName_len, rank,
> size;/ line and I lost them too.
>
> Regards,
>
> Martín
>
> ---
>
> *Spawned:*
>
> #include "mpi.h"
> #include <stdio.h>
>
> int main(int argc, char ** argv){
>     int hostName_len, rank, size;
>     MPI_Comm parentcomm;
>     char hostName[200];
>
>     MPI_Init( NULL, NULL );
>     MPI_Comm_get_parent( &parentcomm );
>     MPI_Get_processor_name(hostName, &hostName_len);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     if (parentcomm != MPI_COMM_NULL) {
>         printf("I'm the spawned h: %s  r/s: %i/%i\n", hostName, rank, size);
>     }
>
>     MPI_Finalize();
>     return 0;
> }
>
> *Spawner:*
>
> #include "mpi.h"
> #include <stdio.h>
> #include <stdlib.h>
>
> int main(int argc, char ** argv){
>     int processesToRun;
>     MPI_Comm intercomm;
>
>     if(argc < 2 ){
>         printf("Processes number needed!\n");
>         return 0;
>     }
>
>     processesToRun = atoi(argv[1]);
>
>     MPI_Init( NULL, NULL );
>     printf("Spawning from parent:...\n");
>     MPI_Comm_spawn( "./spawned", MPI_ARGV_NULL, processesToRun,
>                     MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm,
>                     MPI_ERRCODES_IGNORE);
>
>     MPI_Finalize();
>     return 0;
> }
>



[OMPI users] OpenMPI 4.1.0 misidentifies x86 capabilities

2021-02-10 Thread Max R. Dechantsreiter via users
Configuring OpenMPI 4.1.0 with GCC 10.2.0 on
Intel(R) Xeon(R) CPU E5-2620 v3, a Haswell processor
that supports AVX2 but not AVX512, resulted in

checking for AVX512 support (no additional flags)... no
checking for AVX512 support (with -march=skylake-avx512)... yes

in "configure" output, and in config.log

MCA_BUILD_ompi_op_has_avx512_support_FALSE='#'
MCA_BUILD_ompi_op_has_avx512_support_TRUE=''

Consequently AVX512 intrinsic functions were erroneously
deployed, resulting in OpenMPI failure.

The relevant test code was in essence

cat > conftest.c << EOF
#include <immintrin.h>

int main()
{
    __m512 vA, vB;

    _mm512_add_ps(vA, vB);

    return 0;
}
EOF

The problem with this is that the result of the function
is never used, so at optimization levels higher than O0
the compiler eliminates the call as "dead code" (DCE).
To wit,

gcc -O3 -march=skylake-avx512 -S conftest.c

yields

        .file   "conftest.c"
        .text
        .section        .text.startup,"ax",@progbits
        .p2align 4
        .globl  main
        .type   main, @function
main:
.LFB5345:
        .cfi_startproc
        xorl    %eax, %eax
        ret
        .cfi_endproc
.LFE5345:
        .size   main, .-main
        .ident  "GCC: (GNU) 10.2.0"
        .section        .note.GNU-stack,"",@progbits

Compare this with the result of

gcc -O0 -march=skylake-avx512 -S conftest.c

in which AVX512 code IS emitted (the operands are copied
through a ZMM register):

        .file   "conftest.c"
        .text
        .globl  main
        .type   main, @function
main:
.LFB4092:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-64, %rsp
        subq    $136, %rsp
        vmovaps 72(%rsp), %zmm0
        vmovaps %zmm0, -56(%rsp)
        vmovaps 8(%rsp), %zmm0
        vmovaps %zmm0, -120(%rsp)
        movl    $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE4092:
        .size   main, .-main
        .ident  "GCC: (GNU) 10.2.0"
        .section        .note.GNU-stack,"",@progbits

Note the use of a 512-bit ZMM register - ZMM registers
are used only by AVX512 instructions.  Hence at O3 the
test program does not detect the lack of AVX512 support
by the host processor.

An easy remedy would be to declare the operands as
"volatile" and thereby force the compiler to invoke the
function:

cat > conftest.c << EOF
#include <immintrin.h>

int main()
{
    volatile __m512 vA, vB;

    _mm512_add_ps(vA, vB);

    return 0;
}
EOF

Compiled at O3, the resulting executable dumps core as it
should when run on my Haswell processor, returning nonzero
exit status ($?), which would inform "configure" that the
processor does not have AVX512 capability.
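
A slightly stronger variant of the same idea, as a sketch only: the operands are read from volatile objects and the sum is written to a volatile object, so the compiler cannot discard the 512-bit add as dead code. The names add_once_avx512 and avx512_probe are hypothetical. This assumes GCC or Clang on x86, and unlike the conftest above it guards the call with __builtin_cpu_supports so that it reports a missing capability instead of dumping core.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* The volatile operands must be loaded and the volatile result must be
   stored, so the 512-bit add cannot be removed as dead code. */
__attribute__((target("avx512f")))
static float add_once_avx512(float x, float y)
{
    volatile __m512 va = _mm512_set1_ps(x);
    volatile __m512 vb = _mm512_set1_ps(y);
    volatile __m512 sum;
    float lanes[16];

    sum = _mm512_add_ps(va, vb);
    _mm512_storeu_ps(lanes, sum);
    return lanes[0];
}
#endif

/* Hypothetical probe: returns 1 and stores x + y in *result if the
   AVX-512 code was executed, or returns 0 (leaving *result untouched)
   if the CPU does not report AVX-512F. */
int avx512_probe(float x, float y, float *result)
{
    assert(result != NULL);
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) {
        *result = add_once_avx512(x, y);
        return 1;
    }
#endif
    return 0;
}
```

Calling add_once_avx512() without the guard on a processor lacking AVX512 would be expected to raise an illegal-instruction signal, which is the behavior described above.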

Finally please note that this error could affect the
detection of support for other instruction sets on other
families of processors: compiler optimization must be
inhibited for such tests to be reliable!

Max
---
Max R. Dechantsreiter
President
Performance Jones L.L.C.
m...@performancejones.com
Skype: PerformanceJones (UTC+01:00)
+1 414 446-3100 (telephone/voicemail)
http://www.linkedin.com/in/benchmarking