Hello there. We have a very strange issue when our program sends a non-blocking message of packed data with MPI_Isend(): if some unrelated, never-executed code (see details below) precedes the send, it works; without that code, the program crashes with a segmentation fault.
This program uses dynamic spawning to launch processes. Below are some extracts of the code with comments, the environment specifications, and the error output.

Thanks in advance,
Martín

-----------------------------
char *xmul_coord_transbuf = NULL, *transpt, *transend;
char *mpi_buffer;
int mpi_buffer_size;

void init_xmul_coord_buff(int siz)
{
    unsigned long int i = (((unsigned long)(siz) + 7) & ~7);

    if (xmul_coord_transbuf == NULL) {
        transpt = xmul_coord_transbuf = (char *)malloc(512);
        transend = transpt + 508;
    }
    mpi_buffer = transpt;
    transpt += i;
    if (transpt >= transend)
        transpt = xmul_coord_transbuf;
    mpi_buf_position = 0;
    mpi_buffer_size = siz;
}

#define my_pack(x, mpi_type) { \
    MPI_Pack_size(1, mpi_type, comm, &mpi_pack_size); \
    MPI_Pack(&x, 1, mpi_type, mpi_buffer, mpi_buffer_size, &mpi_buf_position, comm); \
}

void inform_my_completion(double val, Fint imstopped)
{
    int a, i = imstopped;
    MPI_Comm comm;
    MPI_Status status;
    MPI_Request request;

    if (!myslavenum)
        return;  // Note: myslavenum equals rank; there are 6 slaves in our test...
    init_xmul_coord_buff(sizeof(double) + sizeof(int));
    my_pack(val, MPI_DOUBLE);
    my_pack(i, MPI_INT);
#ifdef FUNNY_CODE
    // compiling with -DFUNNY_CODE, it works; otherwise it crashes with message below ...
    if (FALSE) {
        fprintf(stderr, "\r/////SLAVE %i - report to COORD... %.0f\n", myslavenum, val);
        fflush(stderr);
    }
#endif
    // this is done only ONCE, no reception even attempted in our test code
    MPI_Isend(mpi_buffer, mpi_buffer_size, MPI_PACKED, 0, XMUL_DONE, MPI_COMM_WORLD, &request);
}
-----------------------------
File compiled without optimization, linked with -O3
-----------------------------
Windows Version:
Windows 10 Pro
Single machine, 4 CPUs (2 threads each)
-----------------------------
Cygwin Version:
$ uname -r
3.3.4(0.341/5/3)
-----------------------------
MPI version:
mpirun (Open MPI) 4.1.2
All processes started with MPI_Comm_spawn()
-----------------------------
Crash message at runtime:
[DESKTOP-N9KKTKD:00286] *** Process received signal ***
[DESKTOP-N9KKTKD:00286] Signal: Segmentation fault (11)
[DESKTOP-N9KKTKD:00286] Signal code: Address not mapped (23)
[DESKTOP-N9KKTKD:00286] Failing at address: 0xc9
Unable to print stack trace!
[DESKTOP-N9KKTKD:00286] *** End of error message ***
--------------------------------------------------------------------------
Child job 2 terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
[DESKTOP-N9KKTKD:00282] *** Process received signal ***
[DESKTOP-N9KKTKD:00282] Signal: Segmentation fault (11)
[DESKTOP-N9KKTKD:00282] Signal code: Address not mapped (23)
[DESKTOP-N9KKTKD:00282] Failing at address: 0xcb
Unable to print stack trace!
[DESKTOP-N9KKTKD:00282] *** End of error message ***
-----------------------------
Message when exiting master:
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
[DESKTOP-N9KKTKD][[47566,1],0][/pub/devel/openmpi/v4.1/openmpi-4.1.2-1.x86_64/src/openmpi-4.1.2/opal/mca/btl/tcp/btl_tcp_frag.c:242:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Software caused connection abort (113)
--------------------------------------------------------------------------
(null) noticed that process rank 5 with PID 0 on node DESKTOP-N9KKTKD exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
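
-----------------------------
The slot bookkeeping in init_xmul_coord_buff() can be checked in isolation, without MPI. The sketch below is only a model of that arithmetic, using offsets instead of pointers. The names round_up8, slot_offset, POOL_SIZE and POOL_LIMIT are hypothetical and do not appear in the program. It assumes the 512-byte pool and the 508-byte wrap limit from the extract, and a 12-byte request (an 8-byte double plus a 4-byte int, which is typical but platform-dependent).

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE  512u  /* matches malloc(512) in the extract */
#define POOL_LIMIT 508u  /* matches transend = transpt + 508 */

/* Round n up to the next multiple of 8, as ((siz + 7) & ~7) does. */
static size_t round_up8(size_t n)
{
    return (n + 7u) & ~(size_t)7;
}

/* Model of one init_xmul_coord_buff() call: returns the offset of the
   slot handed out for siz bytes, then advances *next and wraps it to 0
   once it reaches POOL_LIMIT (same order of operations as the extract). */
static size_t slot_offset(size_t *next, size_t siz)
{
    size_t off = *next;

    /* The extract does not check this; the model makes it explicit. */
    assert(off + siz <= POOL_SIZE);

    *next += round_up8(siz);
    if (*next >= POOL_LIMIT)
        *next = 0;
    return off;
}
```

Under these assumptions a 12-byte request takes a 16-byte slot, the 32 slots stay inside the pool, and the 33rd call reuses offset 0. Since the send is done only once in the test, slot reuse should not be a factor in the crash, which suggests the buffer arithmetic itself is probably not what differs between the two builds.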