Hello Jeff,

"Jeff Squyres (jsquyres)" <jsquy...@cisco.com> writes:

> With THREAD_FUNNELED, it means that there can only be one thread in
> MPI at a time -- and it needs to be the same thread as the one that
> called MPI_INIT_THREAD.
>
> Is that the case in your app?

The master rank (i.e. 0) never creates threads, while the other ranks go
through the following to communicate with it, so I do check that only
the master thread is communicating:

,----
| tid = 0
| #ifdef _OPENMP
| tid = omp_get_thread_num()
| #endif
|
| do
|    if (tid == 0) then
|       call mpi_send(my_rank, 1, mpi_integer, master, ask_job, &
|            mpi_comm_world, mpierror)
|       call mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror)
|
|       if (stat(mpi_tag) == stop_signal) then
|          call mpi_recv(b_,1,mpi_integer,master,stop_signal, &
|               mpi_comm_world,stat,mpierror)
|       else
|          call mpi_recv(iyax,1,mpi_integer,master,give_job, &
|               mpi_comm_world,stat,mpierror)
|       end if
|    end if
|
|    !$omp barrier
|
|    [... actual work...]
`----

> Also, what is your app doing at src/pcorona_main.f90:627?

It is the mpi_probe call above.

In case it can clarify things, my app follows a master-worker paradigm,
where rank 0 hands out jobs, and all MPI ranks > 0 just do the
following:

,----
| !$OMP PARALLEL DEFAULT(NONE)
| do
|    ! (the code above)
|    if (tid == 0) then receive job number | stop signal
|
|    !$OMP DO schedule(dynamic)
|    loop_izax: do izax=sol_nz_min,sol_nz_max
|
|       [big computing loop body]
|
|    end do loop_izax
|    !$OMP END DO
|
|    if (tid == 0) then
|       call mpi_send(iyax,1,mpi_integer,master,results_tag, &
|            mpi_comm_world,mpierror)
|       call mpi_send(stokes_buf_y,nz*8,mpi_double_precision, &
|            master,results_tag,mpi_comm_world,mpierror)
|    end if
|
|    !omp barrier
|
| end do
| !$OMP END PARALLEL
`----

Following Gilles' suggestion, I also tried changing MPI_THREAD_FUNNELED
to MPI_THREAD_MULTIPLE just in case, but I get the same segmentation
fault at the same line (mind you, the segmentation fault doesn't happen
every time).

But again, no issues if running with --bind-to socket (and no apparent
issues at all in the other computer even with --bind-to none).

Many thanks for any suggestions,
-- 
Ángel de Vicente
 Tel.: +34 922 605 747
 Web.: http://research.iac.es/proyecto/polmag/