Hi everyone,

I've been trying to track down the source of TCP connections when running
MPI singletons, with the goal of avoiding all TCP communication to free up
ports for other processes. I have a local apt install of OpenMPI 2.1.1 on
Ubuntu 18.04 that does not establish any TCP connections by default,
whether run as "mpirun -np 1 ./program" or as "./program". But it has
non-TCP alternatives for both the BTL (vader, self, etc.) and OOB (ud and
usock) frameworks, so I was not surprised by this result.
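For reference, a /proc-based check along these lines can show whether a
process holds any TCP sockets (a sketch assuming Linux /proc; `check_tcp`
is just an illustrative name, not a real tool):

```shell
# Sketch: match a process's fd inodes against the inode column of
# /proc/net/tcp{,6} to find open TCP sockets without needing lsof.
check_tcp() {
  pid=$1
  for fd in /proc/"$pid"/fd/*; do
    link=$(readlink "$fd" 2>/dev/null) || continue
    case $link in
      socket:*)
        # links look like "socket:[12345]"; extract the inode number
        inode=$(printf '%s' "$link" | sed 's/socket:\[\(.*\)\]/\1/')
        # field 10 of /proc/net/tcp{,6} data lines is the socket inode
        awk -v i="$inode" '$10 == i { print "TCP socket, inode " i }' \
          /proc/net/tcp /proc/net/tcp6 2>/dev/null
        ;;
    esac
  done
}
check_tcp $$   # e.g. inspect the current shell
```

Pointing this at the program's PID while it sits in MPI_Init is the kind
of observation I'm describing above.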

On a remote machine, I'm running the same test with an assortment of
OpenMPI versions (1.6.4, 1.8.6, 4.0.0, 4.0.1 on RHEL6 and 1.10.7 on RHEL7).
In all but 1.8.6 and 1.10.7, there is always a TCP connection established,
even if I disable the TCP BTL on the command line (e.g. "mpirun --mca btl
^tcp"). My assumption was that this happens because `tcp` is the only OOB
component available in these installations. This TCP connection is
established both for "mpirun -np 1 ./program" and "./program".
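In case it matters, I've also tried excluding TCP via environment
variables, since plain "./program" has no mpirun command line to pass
--mca flags to (this is just the standard OMPI_MCA_ prefix convention,
as I understand it):

```shell
# Exclude the tcp component from both frameworks via the environment;
# should be equivalent to "--mca btl ^tcp --mca oob ^tcp" on mpirun.
export OMPI_MCA_btl='^tcp'
export OMPI_MCA_oob='^tcp'
# ./program   # then run the singleton as usual
```

Though presumably, if `tcp` really is the only OOB component installed,
excluding it just makes the runtime fail rather than fall back to
something else.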

The confusing part is that the 1.8.6 and 1.10.7 installations only appear
to establish a TCP connection when invoked with "mpirun -np 1 ./program",
but _not_ with "./program", even though their only OOB component was also
`tcp`. This result was not consistent with my understanding, so now I am
confused about when I should expect TCP communication to occur.

Is there a known explanation for what I am seeing? Is there actually a way
to get singletons to forgo all TCP communication, even if TCP is the only
OOB component available, or is there something else at play here? I'd be
happy to
provide any config.log files or ompi_info output if it would help.
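This is the sort of component inventory I can send; the grep just narrows
ompi_info's output to the two frameworks in question (guarded so the
snippet is a no-op on machines without OpenMPI on PATH):

```shell
# List which BTL and OOB components an installation was built with.
if command -v ompi_info >/dev/null 2>&1; then
  components=$(ompi_info --parsable | grep -E 'mca:(btl|oob):')
else
  components="ompi_info not found"
fi
printf '%s\n' "$components"
```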

For more context, the underlying issue I'm trying to resolve is that we are
(unfortunately) running many short instances of mpirun, and the TCP
connections are piling up in the TIME_WAIT state because we create them
faster than the kernel retires them.
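To quantify it, I've been counting TIME_WAIT sockets roughly like this
(TIME_WAIT is state 06 in /proc/net/tcp, if I'm reading the kernel
documentation correctly):

```shell
# Count sockets lingering in TIME_WAIT; field 4 ("st") of the data
# lines in /proc/net/tcp{,6} holds the state, and 06 is TIME_WAIT.
time_wait_count=$(awk '$4 == "06"' /proc/net/tcp /proc/net/tcp6 \
                    2>/dev/null | wc -l)
echo "$time_wait_count sockets in TIME_WAIT"
```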

Any advice or pointers would be greatly appreciated!

users mailing list
