You can first double check you
MPI_Init_thread(..., MPI_THREAD_MULTIPLE, ...)
And the provided level is MPI_THREAD_MULTIPLE as you requested.

Cheers,

Gilles

On Fri, Apr 22, 2022, 21:45 Angel de Vicente via users <
users@lists.open-mpi.org> wrote:

> Hello,
>
> I'm running out of ideas, and wonder if someone here could have some
> tips on how to debug a segmentation fault I'm having with my
> application [due to the nature of the problem I'm wondering if the
> problem is with OpenMPI itself rather than my app, though at this point
> I'm not leaning strongly either way].
>
> The code is hybrid MPI+OpenMP and I compile it with gcc 10.3.0 and
> OpenMPI 4.1.3.
>
> Usually I was running the code with "mpirun -np X --bind-to none [...]"
> so that the threads created by OpenMP don't get bound to a single core
> and I actually get proper speedup out of OpenMP.
>
> Now, since I introduced some changes to the code this week (though I
> have read the changes carefully a number of times, and I don't see
> anything suspicious), I now get a segmentation fault sometimes, but only
> when I run with "--bind-to none" and only in my workstation. It is not
> always with the same running configuration, but I can see some pattern,
> and the problem shows up only if I run the version compiled with OpenMP
> support and most of the times only when the number of rank*threads goes
> above 4 or so. If I run it with "--bind-to socket" all looks good all
> the time.
>
> If I run it in another server, "--bind-to none" doesn't seem to be any
> issue (I submitted the jobs many many times and not a single
> segmentation fault), but in my workstation it fails almost every time if
> using MPI+OpenMP with a handful of threads and with "--bind-to none". In
> this other server I'm running gcc 9.3.0 and OpenMPI 4.1.3.
>
> For example, setting OMP_NUM_THREADS to 1, I run the code like the
> following, and get the segmentation fault as below:
>
> ,----
> | angelv@sieladon:~/.../Fe13_NL3/t~gauss+isat+istim$ mpirun -np 4
> --bind-to none  ../../../../../pcorona+openmp~gauss Fe13_NL3.params
> |  Reading control file: Fe13_NL3.params
> |   ... Control file parameters broadcasted
> |
> | [...]
> |
> |  Starting calculation loop on the line of sight
> |  Receiving results from:            2
> |  Receiving results from:            1
> |
> | Program received signal SIGSEGV: Segmentation fault - invalid memory
> reference.
> |
> | Backtrace for this error:
> |  Receiving results from:            3
> | #0  0x7fd747e7555f in ???
> | #1  0x7fd7488778e1 in ???
> | #2  0x7fd7488667a4 in ???
> | #3  0x7fd7486fe84c in ???
> | #4  0x7fd7489aa9ce in ???
> | #5  0x414959 in __pcorona_main_MOD_main_loop._omp_fn.0
> |         at src/pcorona_main.f90:627
> | #6  0x7fd74813ec75 in ???
> | #7  0x412bb0 in pcorona
> |         at src/pcorona.f90:49
> | #8  0x40361c in main
> |         at src/pcorona.f90:17
> |
> | [...]
> |
> |
> --------------------------------------------------------------------------
> | mpirun noticed that process rank 3 with PID 0 on node sieladon exited on
> signal 11 (Segmentation fault).
> | -------------------------------------------------------
> `----
>
> I cannot see inside the MPI library (I don't really know if that would
> be helpful) but line 627 in pcorona_main.f90 is:
>
> ,----
> |              call
> mpi_probe(master,mpi_any_tag,mpi_comm_world,stat,mpierror)
> `----
>
> Any ideas/suggestions what could be going on or how to try an get some
> more clues about the possible causes of this?
>
> Many thanks,
> --
> Ángel de Vicente
>
> Tel.: +34 922 605 747
> Web.: http://research.iac.es/proyecto/polmag/
>
> ---------------------------------------------------------------------------------------------
> AVISO LEGAL: Este mensaje puede contener información confidencial y/o
> privilegiada. Si usted no es el destinatario final del mismo o lo ha
> recibido por error, por favor notifíquelo al remitente inmediatamente.
> Cualquier uso no autorizadas del contenido de este mensaje está
> estrictamente prohibida. Más información en:
> https://www.iac.es/es/responsabilidad-legal
> DISCLAIMER: This message may contain confidential and / or privileged
> information. If you are not the final recipient or have received it in
> error, please notify the sender immediately. Any unauthorized use of the
> content of this message is strictly prohibited. More information:
> https://www.iac.es/en/disclaimer
>

Reply via email to