Daniel,

If your MPI singleton will never call MPI_Comm_spawn(), then you can use the isolated mode like this:

OMPI_MCA_ess_singleton_isolated=true ./program


You can also save some ports by excluding the btl/tcp component (here by limiting the BTL list to vader and self):


OMPI_MCA_ess_singleton_isolated=true OMPI_MCA_pml=ob1 OMPI_MCA_btl=vader,self ./program
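
For reference, these OMPI_MCA_* variables are just the environment form of MCA parameters; assuming a bash-like shell, you can also export them once and then launch the singleton as usual:

export OMPI_MCA_ess_singleton_isolated=true
export OMPI_MCA_pml=ob1
export OMPI_MCA_btl=vader,self
./program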


Cheers,


Gilles

On 4/18/2019 3:51 AM, Daniel Hemberger wrote:
Hi everyone,

I've been trying to track down the source of TCP connections when running MPI singletons, with the goal of avoiding all TCP communication to free up ports for other processes. I have a local apt install of OpenMPI 2.1.1 on Ubuntu 18.04 which does not establish any TCP connections by default, either when run as "mpirun -np 1 ./program" or "./program". But it has non-TCP alternatives for both the BTL (vader, self, etc.) and OOB (ud and usock) frameworks, so I was not surprised by this result.
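
For reference, one way to check for such connections (assuming lsof is installed and ./program runs long enough to inspect):

./program &
lsof -a -i TCP -p $!    # list any TCP sockets owned by the singleton
wait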

On a remote machine, I'm running the same test with an assortment of OpenMPI versions (1.6.4, 1.8.6, 4.0.0, 4.0.1 on RHEL6 and 1.10.7 on RHEL7). In all but 1.8.6 and 1.10.7, there is always a TCP connection established, even if I disable the TCP BTL on the command line (e.g. "mpirun --mca btl ^tcp"). Therefore, I assumed this was because `tcp` was the only OOB interface available in these installations. This TCP connection is established both for "mpirun -np 1 ./program" and "./program".

The confusing part is that the 1.8.6 and 1.10.7 installations only appear to establish a TCP connection when invoked with "mpirun -np 1 ./program", but _not_ with "./program", even though their only OOB interface was also `tcp`. This result was not consistent with my understanding, so now I am confused about when I should expect TCP communication to occur.

Is there a known explanation for what I am seeing? Is there actually a way to get singletons to forgo all TCP communication, even if TCP is the only OOB available, or is there something else at play here? I'd be happy to provide any config.log files or ompi_info output if it would help.
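
For what it's worth, ompi_info can show which BTL and OOB components a given install actually provides, e.g.:

ompi_info | grep -E "MCA (btl|oob)"    # list the available btl and oob components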

For more context, the underlying issue I'm trying to resolve is that we are (unfortunately) running many short instances of mpirun, and the TCP connections are piling up in the TIME_WAIT state because we create them faster than they are cleaned up.
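
For reference, the pile-up is easy to see with ss (from iproute2), e.g.:

ss -tan state time-wait | wc -l    # roughly the number of sockets stuck in TIME_WAIT (plus a header line)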

Any advice or pointers would be greatly appreciated!

Thanks,
-Dan

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users