Hi OMPI_Developers,

It seems that I am unable to establish MPI communication between two
independently started MPI programs using the simplest client/server call
sequence I can imagine (see the two attached files) when the client and server
processes are started on different machines. Note that I have no problems when
the client and server programs run on the same machine.

For example, if I do the following on the server machine (fn1):

[audet@fn1 mpi]$ mpicc -Wall simpleserver.c -o simpleserver
[audet@fn1 mpi]$ mpiexec -n 1 ./simpleserver
Server port = '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'

The server prints its port (created with MPI_Open_port()) and waits for a
connection by calling MPI_Comm_accept().

Now, on the client machine (linux15), if I compile the client and run it
with the above port address on the command line, I get:

[audet@linux15 mpi]$ mpicc -Wall simpleclient.c -o simpleclient
[audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
trying to connect...
------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
  Local host:    linux15
  Remote host:   linux15
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------
[linux15:24193] [[13075,0],0]-[[46606,0],0] mca_oob_tcp_peer_send_handler: invalid connection state (6) on socket 16

Then I have to stop the client program by pressing ^C (and also the server,
which doesn't seem affected).

What's wrong?

And I am almost sure there is no firewall running on linux15.

This is not the first MPI client/server application I have developed (with
both OpenMPI and mpich). These simple MPI client/server programs work well
with mpich (version 3.1.3).

This problem occurs with both OpenMPI 1.8.3 and 1.8.6.

Both linux15 and fn1 run Fedora Core 12 Linux (64-bit) and are connected by
Gigabit Ethernet (the normal network).

And again, if the client and server run on the same machine (either fn1 or
linux15), no such problem occurs.

Thanks in advance,

Martin Audet

/* simpleserver.c */
#include <stdio.h>
#include <stdlib.h>

#include <mpi.h>

int main(int argc, char **argv)
{
   int       comm_rank;
   char      port_name[MPI_MAX_PORT_NAME];
   MPI_Comm intercomm;
   int      ok_flag;

   MPI_Init(&argc, &argv);

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   ok_flag = (comm_rank != 0) || (argc == 1);
   MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);

   if (!ok_flag) {
      if (comm_rank == 0) {
         fprintf(stderr,"Usage: %s\n",argv[0]);
      }
      MPI_Abort(MPI_COMM_WORLD, 1);
   }

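   /* Only the root's port name is significant for MPI_Comm_accept(),
      so open (and later close) the port on rank 0 only. */
   if (comm_rank == 0) {
      MPI_Open_port(MPI_INFO_NULL, port_name);
      printf("Server port = '%s'\n", port_name);
      fflush(stdout); /* publish the port before blocking in accept */
   }
   MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);

   if (comm_rank == 0) {
      MPI_Close_port(port_name);
      printf("MPI_Comm_accept() successful...\n");
   }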

   MPI_Comm_disconnect(&intercomm);

   MPI_Finalize();

   return EXIT_SUCCESS;
}
/* simpleclient.c */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

#include <mpi.h>

int main(int argc, char **argv)
{
   int      comm_rank;
   int      ok_flag;
   MPI_Comm intercomm;

   MPI_Init(&argc, &argv);

   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   ok_flag = (comm_rank != 0)  || ((argc == 2)  &&  argv[1]  &&  (*argv[1] != '\0'));
   MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);

   if (!ok_flag) {
      if (comm_rank == 0) {
         fprintf(stderr,"Usage: %s mpi_port\n", argv[0]);
      }
      MPI_Abort(MPI_COMM_WORLD, 1);
   }

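   /* The default MPI_ERRORS_ARE_FATAL handler would abort the job on a
      failed connect, so the retry loop below could never run: ask for
      error codes to be returned instead. */
   MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

   if (comm_rank == 0) {
      printf("trying to connect...\n");
   }
   /* Only the root's port name is significant, so other ranks pass NULL. */
   while (MPI_Comm_connect((comm_rank == 0) ? argv[1] : NULL, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm) != MPI_SUCCESS) {
      if (comm_rank == 0) {
         printf("MPI_Comm_connect() failed, sleeping and retrying...\n");
      }
      sleep(1);
   }
   if (comm_rank == 0) {
      printf("MPI_Comm_connect() successful...\n");
   }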

   MPI_Comm_disconnect(&intercomm);

   MPI_Finalize();

   return EXIT_SUCCESS;
}
