Daniel,
If your MPI singleton will never call MPI_Comm_spawn(), then you can
use the isolated mode like this:

OMPI_MCA_ess_singleton_isolated=true ./program

You can also save some ports by blacklisting the btl/tcp component:

OMPI_MCA_ess_singleton_isolated=true OMPI_MCA_pml=ob1 \
    OMPI_MCA_btl=vader,self ./program
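If you want to double check that no TCP sockets are opened, here is a
quick sketch (assuming a Linux box with lsof installed; <pid> is your
program's process id):

# while ./program is running, list any TCP sockets it owns;
# no output means no TCP connections were opened
lsof -a -p <pid> -i TCP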
Cheers,
Gilles
On 4/18/2019 3:51 AM, Daniel Hemberger wrote:
Hi everyone,
I've been trying to track down the source of TCP connections when
running MPI singletons, with the goal of avoiding all TCP
communication to free up ports for other processes. I have a local apt
install of Open MPI 2.1.1 on Ubuntu 18.04 that does not establish any
TCP connections by default, whether run as "mpirun -np 1 ./program" or
as "./program". But it has non-TCP alternatives for both the BTL
(vader, self, etc.) and OOB (ud and usock) frameworks, so this result
did not surprise me.
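For concreteness, the kind of check I'm doing looks roughly like this
(a sketch, assuming a Linux host with iproute2's ss available):

# run the singleton and look for any TCP sockets it creates;
# no output is expected when no TCP is used
./program &
ss -tnp | grep "pid=$!"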
On a remote machine, I'm running the same test with an assortment of
Open MPI versions (1.6.4, 1.8.6, 4.0.0, and 4.0.1 on RHEL6, and 1.10.7
on RHEL7). In all but 1.8.6 and 1.10.7, a TCP connection is always
established, even if I disable the TCP BTL on the command line (e.g.
"mpirun --mca btl ^tcp"). I therefore assumed this was because `tcp`
was the only OOB component available in these installations. This TCP
connection is established both for "mpirun -np 1 ./program" and for
"./program".
The confusing part is that the 1.8.6 and 1.10.7 installations only
appear to establish a TCP connection when invoked with "mpirun -np 1
./program", but _not_ with "./program", even though their only OOB
component was also `tcp`. This result is not consistent with my
understanding, so now I am unsure when I should expect TCP
communication to occur.
Is there a known explanation for what I am seeing? Is there actually a
way to get singletons to forgo all TCP communication, even if TCP is
the only OOB available, or is there something else at play here? I'd
be happy to provide any config.log files or ompi_info output if it
would help.
For more context, the underlying issue I'm trying to resolve is that
we are (unfortunately) running many short instances of mpirun, and the
TCP connections are piling up in the TIME_WAIT state because we create
them faster than the OS cleans them up.
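To quantify the pile-up, I've been counting the lingering sockets like
this (a sketch, again assuming Linux with iproute2's ss):

# count sockets currently stuck in TIME_WAIT
ss -tan state time-wait | tail -n +2 | wc -l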
Any advice or pointers would be greatly appreciated!
Thanks,
-Dan
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users