FWIW: I just ran a cycle of 10,000 spawns on my Mac without a problem using OMPI master, so I believe this has been resolved. I don’t know if/when the required updates might come into the various release branches.

Ralph
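[Editor's note: a fixed-count spawn check along the lines Ralph describes can be written as a bounded variant of Thomas's reproducer, which is quoted in full further down the thread. This is only a sketch: the 10,000-iteration loop and the progress messages are assumptions, not Ralph's actual test program.]

"""
#include <stdio.h>
#include <mpi.h>

#define NUM_SPAWNS 10000  /* assumed count, matching the "cycle of 10,000 spawns" */

int main(int argc, char *argv[]) {

    MPI_Init(NULL, NULL);

    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        // Parent: spawn a single child NUM_SPAWNS times, disconnecting after each spawn
        for (int i = 0; i < NUM_SPAWNS; i++) {
            MPI_Comm intercomm;
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                           MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
            MPI_Comm_disconnect(&intercomm);

            if ((i + 1) % 1000 == 0) {
                printf("Completed %d spawns\n", i + 1);
            }
        }
    } else {
        // Child: disconnect from the parent and exit immediately
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
    return 0;
}
"""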
> On Mar 16, 2019, at 1:13 PM, Thomas Pak <thomas....@maths.ox.ac.uk> wrote:
>
> Dear Jeff,
>
> I did find a way to circumvent this issue for my specific application by spawning less frequently. However, I wanted to at least bring attention to this issue for the OpenMPI community, as it can be reproduced with an alarmingly simple program.
>
> Perhaps the users mailing list is not the ideal place for this. Would you recommend that I report this issue on the developers mailing list or open a GitHub issue?
>
> Best wishes,
> Thomas Pak
>
> On Mar 16 2019, at 7:40 pm, Jeff Hammond <jeff.scie...@gmail.com> wrote:
> Is there perhaps a different way to solve your problem that doesn’t spawn so much as to hit this issue?
>
> I’m not denying there’s an issue here, but in a world of finite human effort and fallible software, sometimes it’s easiest to just avoid the bugs altogether.
>
> Jeff
>
> On Sat, Mar 16, 2019 at 12:11 PM Thomas Pak <thomas....@maths.ox.ac.uk> wrote:
> Dear all,
>
> Does anyone have any clue on what the problem could be here? This seems to be a persistent problem present in all currently supported OpenMPI releases and indicates that there is a fundamental flaw in how OpenMPI handles dynamic process creation.
>
> Best wishes,
> Thomas Pak
>
> From: "Thomas Pak" <thomas....@maths.ox.ac.uk>
> To: users@lists.open-mpi.org
> Sent: Friday, 7 December, 2018 17:51:29
> Subject: [OMPI users] MPI_Comm_spawn leads to pipe leak and other errors
>
> Dear all,
>
> My MPI application spawns a large number of MPI processes using MPI_Comm_spawn over its total lifetime. Unfortunately, I have experienced that this results in problems for all currently supported OpenMPI versions (2.1, 3.0, 3.1 and 4.0). I have written a short, self-contained program in C (included below) that spawns child processes using MPI_Comm_spawn in an infinite loop, where each child process exits after writing a message to stdout. This short program leads to the following issues:
>
> In versions 2.1.2 (Ubuntu package) and 2.1.5 (compiled from source), the program leads to a pipe leak where pipes keep accumulating over time until my MPI application crashes because the maximum number of pipes has been reached.
>
> In versions 3.0.3 and 3.1.3 (both compiled from source), there appears to be no pipe leak, but the program crashes with the following error message:
> PMIX_ERROR: UNREACHABLE in file ptl_tcp_component.c at line 1257
>
> In version 4.0.0 (compiled from source), I have not been able to test this issue very thoroughly because mpiexec ignores the --oversubscribe command-line flag (as detailed in this GitHub issue: https://github.com/open-mpi/ompi/issues/6130). This prohibits the oversubscription of processor cores, which means that spawning additional processes immediately results in an error because "not enough slots" are available. A fix for this was proposed recently (https://github.com/open-mpi/ompi/pull/6139), but since the v4.0.x developer branch is being actively developed right now, I decided not to go into it.
>
> I have found one e-mail thread on this mailing list about a similar problem (https://www.mail-archive.com/users@lists.open-mpi.org/msg10543.html). In this thread, Ralph Castain states that this is a known issue and suggests that it would be fixed in the then-upcoming v1.3.x release. However, version 1.3 is no longer supported and the issue has reappeared, so that thread did not resolve it.
>
> I have created a GitHub gist that contains the output from "ompi_info --all" for all the OpenMPI installations mentioned here, as well as the config.log files for the OpenMPI installations that I compiled from source: https://gist.github.com/ThomasPak/1003160e396bb88dff27e53c53121e0c.
>
> I have also attached the code for the short program that demonstrates these issues. For good measure, I have included it directly here as well:
>
> """
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[]) {
>
>     // Initialize MPI
>     MPI_Init(NULL, NULL);
>
>     // Get parent
>     MPI_Comm parent;
>     MPI_Comm_get_parent(&parent);
>
>     // If the process was not spawned
>     if (parent == MPI_COMM_NULL) {
>
>         puts("I was not spawned!");
>
>         // Spawn child process in loop
>         char *cmd = argv[0];
>         char **cmd_argv = MPI_ARGV_NULL;
>         int maxprocs = 1;
>         MPI_Info info = MPI_INFO_NULL;
>         int root = 0;
>         MPI_Comm comm = MPI_COMM_SELF;
>         MPI_Comm intercomm;
>         int *array_of_errcodes = MPI_ERRCODES_IGNORE;
>
>         for (;;) {
>             MPI_Comm_spawn(cmd, cmd_argv, maxprocs, info, root, comm,
>                            &intercomm, array_of_errcodes);
>
>             MPI_Comm_disconnect(&intercomm);
>         }
>
>     // If process was spawned
>     } else {
>
>         puts("I was spawned!");
>
>         MPI_Comm_disconnect(&parent);
>     }
>
>     // Finalize
>     MPI_Finalize();
>
> }
> """
>
> Thanks in advance and best wishes,
> Thomas Pak
>
> --
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/
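[Editor's note: the thread does not say how exactly Thomas reduced the spawn frequency. One common way to spawn less often while creating the same total number of children is to batch them, i.e. request maxprocs children per MPI_Comm_spawn call instead of one. The sketch below is a hypothetical variant of the reproducer along those lines; the batch size of 32 is an arbitrary assumption, and this is not the workaround actually used in the thread.]

"""
#include <stdio.h>
#include <mpi.h>

#define BATCH_SIZE 32  /* assumed batch size: children created per MPI_Comm_spawn call */

int main(int argc, char *argv[]) {

    MPI_Init(NULL, NULL);

    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        // Parent: each iteration spawns a whole batch of children with a single
        // MPI_Comm_spawn call, so far fewer spawn/disconnect cycles are needed
        // for the same total number of child processes.
        for (;;) {
            MPI_Comm intercomm;
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, BATCH_SIZE, MPI_INFO_NULL,
                           0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
            MPI_Comm_disconnect(&intercomm);
        }
    } else {
        // Child: all children of one batch share a single intercommunicator
        // with the parent; each reports and disconnects.
        puts("I was spawned!");
        MPI_Comm_disconnect(&parent);
    }

    MPI_Finalize();
}
"""

Whether batching actually avoids the pipe accumulation depends on whether the leaked descriptors scale with the number of spawn calls or with the number of child processes, which the thread does not settle; it is a mitigation to try rather than a fix.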
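[Editor's note: on the v4.0.0 point about --oversubscribe being ignored, a common way to sidestep a "not enough slots" error is to declare extra slots explicitly in a hostfile rather than relying on the oversubscribe flag. The sketch below uses standard Open MPI hostfile syntax, but the file name, the slot count of 64, and the binary name spawn_test are made up, and whether this avoids the specific 4.0.0 regression tracked in issue 6130 would need to be verified.]

"""
# myhostfile (hypothetical): declare more slots than physical cores so that
# spawned children have somewhere to land without the --oversubscribe flag
localhost slots=64

# then launch the reproducer against that hostfile:
#   mpiexec --hostfile myhostfile -np 1 ./spawn_test
"""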