Re: [OMPI users] Cygwin. Strange issue with MPI_Isend() and packed data

2022-09-13 Thread Protze, Joachim via users
Hi Martin,

Your code seems to have several issues in inform_my_completion: comm is used 
uninitialized in the my_pack macro.
If the intention is that the MPI_Isend is executed by the spawned processes, 
MPI_COMM_WORLD is probably the wrong communicator to use.
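For the spawned side, something along these lines might be what you want (just a 
sketch, assuming the workers were started with a single MPI_Comm_spawn() and 
report to rank 0 of the parent; XMUL_DONE, mpi_buffer etc. are from your code):

  MPI_Comm parent;
  MPI_Comm_get_parent( &parent );   /* intercommunicator back to the spawning process */
  if ( parent != MPI_COMM_NULL ) {
      /* use 'parent' (or a communicator derived from it) for the
         MPI_Pack_size()/MPI_Pack() calls as well as for the send itself;
         destination 0 then means rank 0 of the parent group */
      MPI_Isend( mpi_buffer, mpi_buffer_size, MPI_PACKED, 0, XMUL_DONE, parent, &request );
  }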

Best
Joachim

From: users  on behalf of Martín Morales via 
users 
Sent: Tuesday, September 13, 2022 11:07:06 PM
To: users@lists.open-mpi.org 
Cc: Martín Morales 
Subject: [OMPI users] Cygwin. Strange issue with MPI_Isend() and packed data

[OMPI users] Cygwin. Strange issue with MPI_Isend() and packed data

2022-09-13 Thread Martín Morales via users
Hello over there. 

We have a very strange issue when the program tries to send a non-blocking 
message with MPI_Isend() and packed data: if we run this send after some 
unnecessary code (see details below), it works; without it, it does not.

This program uses dynamic spawning to launch processes. Below are some extracts 
of the code with comments, environment specifications, and the output error.

Thanks in advance,

Martín


—



char * xmul_coord_transbuf = NULL , * transpt , * transend ;
char * mpi_buffer ;
int mpi_buffer_size ; 

void init_xmul_coord_buff ( int siz ) {
  unsigned long int i = ( ( ( unsigned long ) ( siz ) + 7 ) & ~ 7 ) ;
  if ( xmul_coord_transbuf == NULL ) {
  transpt = xmul_coord_transbuf = ( char * ) malloc ( 512 ) ;
  transend = transpt + 508 ; }
  mpi_buffer = transpt ;
  transpt += i ;
  if ( transpt >= transend ) transpt = xmul_coord_transbuf ; 
  mpi_buf_position = 0 ;
  mpi_buffer_size = siz ;
}

#define my_pack(x, mpi_type) { MPI_Pack_size( 1, mpi_type, comm, &my_pack_size ) ; \
  MPI_Pack( &x, 1, mpi_type, mpi_buffer, mpi_buffer_size, &mpi_buf_position, comm ) ; }

void inform_my_completion ( double val , Fint imstopped ) {
  int a , i = imstopped ;
  MPI_Comm comm;
  MPI_Status status;
  MPI_Request request;
  if ( !myslavenum ) return ;  // Note: myslavenum equals rank; there are 6 slaves in our test...
  init_xmul_coord_buff ( sizeof ( double ) + sizeof ( int ) ) ;
  my_pack ( val , MPI_DOUBLE ) ;
  my_pack ( i , MPI_INT ) ;

#ifdef FUNNY_CODE
  // compiling with -DFUNNY_CODE, it works; otherwise it crashes with message below ...
  if ( FALSE ) { fprintf ( stderr , "\r/SLAVE %i - report to COORD... %.0f\n" , myslavenum , val ) ; fflush ( stderr ) ; }
#endif

  // this is done only ONCE, no reception even attempted in our test code
  MPI_Isend( mpi_buffer , mpi_buffer_size , MPI_PACKED , 0 , XMUL_DONE , MPI_COMM_WORLD , &request ) ;
}


-
File compiled without optimization, linked with -O3

-
Windows Version:
Windows 10 Pro
Single machine, 4 CPUs (2 threads each)

-
Cygwin Version:

$ uname -r
3.3.4(0.341/5/3)

-
MPI version: 

mpirun (Open MPI) 4.1.2

All processes started with MPI_Comm_Spawn()

-
Crash message at runtime:

[DESKTOP-N9KKTKD:00286] *** Process received signal ***
[DESKTOP-N9KKTKD:00286] Signal: Segmentation fault (11)
[DESKTOP-N9KKTKD:00286] Signal code: Address not mapped (23)
[DESKTOP-N9KKTKD:00286] Failing at address: 0xc9
Unable to print stack trace!
[DESKTOP-N9KKTKD:00286] *** End of error message ***
--
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--
[DESKTOP-N9KKTKD:00282] *** Process received signal ***
[DESKTOP-N9KKTKD:00282] Signal: Segmentation fault (11)
[DESKTOP-N9KKTKD:00282] Signal code: Address not mapped (23)
[DESKTOP-N9KKTKD:00282] Failing at address: 0xcb
Unable to print stack trace!
[DESKTOP-N9KKTKD:00282] *** End of error message ***

-
Message when exiting the master:

[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv]
 mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv]
 mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv]
 mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv]
 mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv]
 mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv]
 mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
--
(null) noticed that process rank 5 with PID 0 on node DESKTOP-N9KKTKD exited on 
signal 11 (Segmentation fault).
--

Re: [OMPI users] Hardware topology influence

2022-09-13 Thread Jeff Squyres (jsquyres) via users
Let me add a little more color on what Gilles stated.

First, you should probably upgrade to the latest v4.1.x release: v4.1.4.  It 
has a bunch of bug fixes compared to v4.1.0.

Second, you should know that it is relatively uncommon to run HPC/MPI apps 
inside VMs because the virtualization infrastructure will -- by definition -- 
decrease your overall performance.  This is usually counter to the goal of 
writing/running HPC applications.  If you do run HPC/MPI applications in VMs, 
it is strongly recommended that you bind the cores in the VM to physical cores 
to attempt to minimize the performance loss.

By default, Open MPI maps MPI processes by core when deciding how many 
processes to place on each machine (and also deciding how to bind them).  For 
example, Open MPI looks at a machine and sees that it has N cores, and (by 
default) maps N MPI processes to that machine.  You can change Open MPI's 
defaults to map by hardware thread ("Hyperthread" in Intel parlance) instead 
of by core, but conventional wisdom is that math-heavy processes don't 
perform well with the limited resources of a single hardware thread, and 
benefit from the full resources of the core (this depends on your specific app, 
of course -- YMMV).  Intel's and AMD's hardware threads have gotten better over 
the years, but I think they still represent a division of resources in the 
core, and will likely still be performance-detrimental to at least some classes 
of HPC applications.  It's a surprisingly complicated topic.
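For example (illustrative; the executable name and process count are 
placeholders), you can tell mpirun to map and bind one process per hardware 
thread instead of per core:

  mpirun --map-by hwthread --bind-to hwthread -np 16 ./my_app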

In the v4.x series, note that you can use "mpirun --report-bindings ..." to see 
exactly where Open MPI thinks it has bound each process.  Note that this 
binding occurs before each MPI process starts; it's nothing that the 
application itself needs to do.
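For example (again, the app name and process count are placeholders):

  mpirun --report-bindings -np 4 ./my_app

prints one report line per MPI process describing the core(s) or hardware 
thread(s) it was bound to, before your application code starts running.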

--
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] Hardware topology influence

2022-09-13 Thread Gilles Gouaillardet via users
Lucas,

the number of MPI tasks started by mpirun is either
 - explicitly passed via the command line (e.g. mpirun -np 2306 ...)
 - equal to the number of available slots, and this value is either
 a) retrieved from the resource manager (such as a SLURM allocation)
 b) explicitly set in a machine file (e.g. mpirun -machinefile ...) or on the
 command line (e.g. mpirun --hosts host0:96,host1:96 ...)
 c) if none of the above is set, the number of detected cores on the system
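For example, a machine file could simply contain lines such as

 host0 slots=96
 host1 slots=96

which is roughly equivalent to the --hosts example above.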

Cheers,

Gilles



[OMPI users] Hardware topology influence

2022-09-13 Thread Lucas Chaloyard via users
Hello, 

I'm working as a research intern in a lab where we're studying virtualization. 
And I've been working with several benchmarks using OpenMPI 4.1.0 (ASKAP, GPAW 
and Incompact3d from the Phoronix Test Suite). 

To briefly explain my experiments, I'm running those benchmarks on several 
virtual machines using different topologies. 
During one experiment I've been comparing those two topologies : 
- Topology1 : 96 vCPUs divided into 96 sockets containing 1 thread each 
- Topology2 : 96 vCPUs divided into 48 sockets containing 2 threads each (using 
hyperthreading) 

For the ASKAP Benchmark : 
- While using Topology2, 2306 processes will be created by the application to 
do its work. 
- While using Topology1, 4612 processes will be created by the application to 
do its work. 
This is also happening when running GPAW and Incompact3d benchmarks. 

What I've been wondering (and looking for) is: does OpenMPI take the topology 
into account and reduce the number of processes created to do its work, in 
order to avoid using hyperthreading? 
Or is it something done by the application itself? 

I was looking at the source code, and I've been trying to find how and when 
the information about the MPI_COMM_WORLD communicator is filled in, to see if 
the 'num_procs' field depends on the topology, but I haven't had any luck so 
far. 
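(For reference, at runtime I can check the value the application actually sees 
with something like

  int world_size;
  MPI_Comm_size( MPI_COMM_WORLD, &world_size );

but I'd like to understand where that number comes from.)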

Respectfully, Chaloyard Lucas.