I have seen similar issues on platforms with multiple IP addresses when some of those addresses are not reachable from every node. In general, specifying which interface OMPI can use (with --mca btl_tcp_if_include x.y.z.t/s, where x.y.z.t/s is the subnet in CIDR notation) solves the problem.
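As a rough sketch of what that looks like in practice: the subnet 10.0.0.0/24 and the program name ./my_app below are placeholders I made up for illustration, not values from the original report, so substitute the network that every node in the job can actually reach. The same MCA parameter can be given on the mpirun command line or through an OMPI_MCA_-prefixed environment variable.

```shell
# Placeholder values: replace with the subnet all nodes share and your program.
SUBNET="10.0.0.0/24"   # hypothetical subnet, not taken from the original report
APP="./my_app"         # hypothetical executable name

# Option 1: pass the MCA parameter on the mpirun command line.
# The command is only printed here so it can be checked before running it.
CMD="mpirun --mca btl_tcp_if_include ${SUBNET} -np 2 ${APP}"
echo "${CMD}"

# Option 2: set the same parameter through the environment, so that a plain
# "mpirun -np 2 ./my_app" launched from this shell picks it up.
export OMPI_MCA_btl_tcp_if_include="${SUBNET}"
```

Either form restricts the TCP BTL to interfaces on the given subnet; the environment variable is convenient inside a batch script, while the command-line form keeps the setting visible per run.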
George.

On Wed, Mar 16, 2022 at 5:11 PM Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote:

> I'm using OpenMpi 4.1.2 under Slurm 20.11.8. My 2 process job is
> successfully launched, but when the main process rank 0
> attempts to create an intercommunicator with process rank 1 on the other
> node:
>
>     MPI_Comm intercom;
>     MPI_Intercomm_create(MPI_COMM_SELF, 0, MPI_COMM_WORLD, 1, <tag>, &intercom);
>
> OpenMpi spins deep inside the MPI_Intercomm_create code, and the following
> error is reported:
>
> *WARNING: Open MPI accepted a TCP connection from what appears to be a*
> *another Open MPI process but cannot find a corresponding process*
> *entry for that peer.*
>
> *This attempted connection will be ignored; your MPI job may or may not*
> *continue properly.*
>
> The output resulting from using the mpirun arguments
> "--mca ras_base_verbose 5 --display-devel-map --mca rmaps_base_verbose 5"
> is attached.
>
> Any help would be appreciated.