Hi OMPI_Developers,
It seems that I am unable to establish an MPI communication between two
independently started MPI programs using the simplest client/server call
sequence I can imagine (see the two attached files) when the client and server
processes are started on different machines. Note that I have no problems when
the client and server program run on the same machine.
For example, if I do the following on the server machine (fn1):
[audet@fn1 mpi]$ mpicc -Wall simpleserver.c -o simpleserver
[audet@fn1 mpi]$ mpiexec -n 1 ./simpleserver
Server port = '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
The server prints its port (created with MPI_Open_port()) and waits for a
connection by calling MPI_Comm_accept().
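To make it easier to see which addresses the port string advertises, here is a minimal sketch that pulls the tcp:// endpoints out of it. It assumes the layout visible in the output above (IPv4 "tcp://host:port" entries), which is only my reading of the string and not a documented format; extract_tcp_endpoints and MAX_HOST_LEN are hypothetical names, not part of MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HOST_LEN 64   /* hypothetical limit chosen for this sketch */

/* Hypothetical helper (not part of MPI): copies up to max_out IPv4
 * "tcp://host:port" endpoints found in port_name into hosts[] and ports[]
 * and returns how many were found. The string layout is an assumption
 * inferred from the server output, not a documented format. */
int extract_tcp_endpoints(const char *port_name, char hosts[][MAX_HOST_LEN],
                          int ports[], int max_out)
{
    static const char prefix[] = "tcp://";
    const char *p = port_name;
    int count = 0;

    assert(port_name != NULL);

    while (count < max_out && (p = strstr(p, prefix)) != NULL) {
        const char *host = p + strlen(prefix);
        const char *colon = strchr(host, ':');
        size_t len;

        if (colon == NULL)
            break;                      /* no port separator: stop parsing */

        len = (size_t)(colon - host);
        if (len > 0 && len < MAX_HOST_LEN) {
            memcpy(hosts[count], host, len);
            hosts[count][len] = '\0';
            /* strtol stops at the first non-digit ('+', ':' or end). */
            ports[count] = (int)strtol(colon + 1, NULL, 10);
            count++;
        }
        p = colon + 1;                  /* continue after this endpoint */
    }
    return count;
}
```

Applied to the port string above, this finds two endpoints, 172.17.15.20:54458 and 172.17.15.20:58943, so both advertised addresses are on the server side (presumably fn1).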
Now on the client machine (running on linux15) if I compile the client and run
it with the above port address on the command line, I get:
[audet@linux15 mpi]$ mpicc -Wall simpleclient.c -o simpleclient
[audet@linux15 mpi]$ mpiexec -n 1 ./simpleclient '3054370816.0;tcp://172.17.15.20:54458+3054370817.0;tcp://172.17.15.20:58943:300'
trying to connect...
------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
Local host: linux15
Remote host: linux15
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------
[linux15:24193] [[13075,0],0]-[[46606,0],0] mca_oob_tcp_peer_send_handler: invalid connection state (6) on socket 16
I then have to stop the client program by pressing ^C (and also the server,
which doesn't seem affected).
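One detail that may help: the first number in the port string appears to name the same job as the peer in the error message. Below is a minimal sketch of that decoding. It assumes the ORTE convention, as I understand it, that a 32-bit jobid holds the job family in its upper 16 bits and the local job number in its lower 16 bits; decode_proc_name, format_proc_name and proc_name are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decoded form of a "jobid.vpid" pair such as "3054370816.0". */
typedef struct {
    unsigned job_family;    /* assumed: upper 16 bits of the jobid */
    unsigned local_jobid;   /* assumed: lower 16 bits of the jobid */
    unsigned long vpid;     /* process index within the job */
} proc_name;

/* Hypothetical helper: parses the leading "jobid.vpid" of text.
 * Returns 0 on success, -1 if the text does not start with that pattern. */
int decode_proc_name(const char *text, proc_name *out)
{
    char *end;
    const char *vpid_text;
    unsigned long jobid, vpid;

    assert(text != NULL && out != NULL);

    jobid = strtoul(text, &end, 10);
    if (end == text || *end != '.')
        return -1;

    vpid_text = end + 1;
    vpid = strtoul(vpid_text, &end, 10);
    if (end == vpid_text)
        return -1;

    out->job_family = (unsigned)((jobid >> 16) & 0xffffUL);
    out->local_jobid = (unsigned)(jobid & 0xffffUL);
    out->vpid = vpid;
    return 0;
}

/* Formats a name the way it appears in the error output: [[family,job],vpid]. */
void format_proc_name(const proc_name *name, char *buf, size_t size)
{
    snprintf(buf, size, "[[%u,%u],%lu]",
             name->job_family, name->local_jobid, name->vpid);
}
```

Decoding 3054370816.0 this way gives [[46606,0],0], which is the peer named in the error above. So the client does seem to be trying to reach the server's job, even though the message reports linux15 as the remote host.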
What's wrong? I am almost sure there is no firewall running on linux15.
It is not the first MPI client/server application I am developing (with both
OpenMPI and mpich).
These simple MPI client/server programs work well with mpich (version 3.1.3).
This problem happens with both OpenMPI 1.8.3 and 1.8.6.
linux15 and fn1 both run Fedora Core 12 Linux (64-bit) and are connected by
Gigabit Ethernet (the normal network).
And again, if the client and server run on the same machine (either fn1 or
linux15), no such problem occurs.
Thanks in advance,
Martin Audet
/* simpleserver.c: opens an MPI port, prints it, and accepts one client connection. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int comm_rank;
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm intercomm;
    int ok_flag;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

    /* The server takes no arguments; rank 0 checks and shares the verdict. */
    ok_flag = (comm_rank != 0) || (argc == 1);
    MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (!ok_flag) {
        if (comm_rank == 0) {
            fprintf(stderr, "Usage: %s\n", argv[0]);
        }
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Create the port and print it so it can be passed to the client. */
    MPI_Open_port(MPI_INFO_NULL, port_name);
    if (comm_rank == 0) {
        printf("Server port = '%s'\n", port_name);
    }

    /* Block until a client connects, then release the port. */
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &intercomm);
    MPI_Close_port(port_name);
    if (comm_rank == 0) {
        printf("MPI_Comm_accept() successful...\n");
    }

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return EXIT_SUCCESS;
}
/* simpleclient.c: connects to the MPI port given on the command line. */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int comm_rank;
    int ok_flag;
    MPI_Comm intercomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

    /* Rank 0 requires exactly one non-empty argument: the server's port name. */
    ok_flag = (comm_rank != 0) || ((argc == 2) && argv[1] && (*argv[1] != '\0'));
    MPI_Bcast(&ok_flag, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (!ok_flag) {
        if (comm_rank == 0) {
            fprintf(stderr, "Usage: %s mpi_port\n", argv[0]);
        }
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Errors are fatal by default; return them instead so that the retry
       loop below can actually observe a failed MPI_Comm_connect(). */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (comm_rank == 0) {
        printf("trying to connect...\n");
    }

    /* The port name is only significant at the root (rank 0). */
    while (MPI_Comm_connect((comm_rank == 0) ? argv[1] : NULL, MPI_INFO_NULL, 0,
                            MPI_COMM_WORLD, &intercomm) != MPI_SUCCESS) {
        if (comm_rank == 0) {
            printf("MPI_Comm_connect() failed, sleeping and retrying...\n");
        }
        sleep(1);
    }
    if (comm_rank == 0) {
        printf("MPI_Comm_connect() successful...\n");
    }

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return EXIT_SUCCESS;
}