Hello,

I'm trying to connect two independent MPI process groups with an intercommunicator, using ports, as described in sec. 10.4 of the MPI standard. One group runs a server, the other a client. The server opens a port, publishes the port's name, and waits for a connection. The client obtains the port's name and connects to it.

The problem is that the code works only if the server and the client each run in a one-process MPI group. If either MPI group has more than one process, the program hangs.
The following are two fragments of a minimal code example reproducing the problem on my machine. The server:
    if (rank == 0) {
        MPI_Open_port(MPI_INFO_NULL, port);
        int fifo = open(argv[1], O_WRONLY);
        write(fifo, port, MPI_MAX_PORT_NAME);
        close(fifo);
        printf("[server] listening on port '%s'\n", port);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, this, &that);
        printf("[server] connected\n");
        MPI_Close_port(port);
    }
    MPI_Barrier(this);

and the client:

    if (rank == 0) {
        int fifo = open(buffer, O_RDONLY);
        read(fifo, port, MPI_MAX_PORT_NAME);
        close(fifo);
        printf("[client] connecting to port '%s'\n", port);
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, this, &that);
        printf("[client] connected\n");
    }
    MPI_Barrier(this);

where 'this' is the local MPI_COMM_WORLD, and the port name is transmitted via a named pipe. (Complete code together with a makefile is attached for reference.)
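To separate the pipe handoff from the MPI calls, here is a rough sketch of the named-pipe step on its own, with no MPI involved. It is not taken from the attached code: PORT_NAME_MAX is a stand-in for MPI_MAX_PORT_NAME, and the function names (send_port_name, recv_port_name, rendezvous_demo) are made up for illustration. It only checks that a fixed-size port-name buffer written into a FIFO by one process arrives intact in another.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for MPI_MAX_PORT_NAME so the sketch builds without MPI (assumption). */
#define PORT_NAME_MAX 256

/* Writer side: send the whole zero-padded buffer. Returns 0 on success, -1 on error. */
int send_port_name(const char *fifo_path, const char *port)
{
    char buf[PORT_NAME_MAX] = {0};
    strncpy(buf, port, PORT_NAME_MAX - 1);      /* keep a terminating NUL */
    int fd = open(fifo_path, O_WRONLY);         /* blocks until a reader opens the FIFO */
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, sizeof buf);
    close(fd);
    return n == (ssize_t)sizeof buf ? 0 : -1;
}

/* Reader side: collect exactly PORT_NAME_MAX bytes, looping over short reads. */
int recv_port_name(const char *fifo_path, char *port)
{
    int fd = open(fifo_path, O_RDONLY);         /* blocks until a writer opens the FIFO */
    if (fd < 0)
        return -1;
    size_t got = 0;
    while (got < PORT_NAME_MAX) {
        ssize_t n = read(fd, port + got, PORT_NAME_MAX - got);
        if (n <= 0)
            break;                              /* error or writer closed early */
        got += (size_t)n;
    }
    close(fd);
    if (got != PORT_NAME_MAX)
        return -1;
    port[PORT_NAME_MAX - 1] = '\0';
    return 0;
}

/* Hypothetical round trip: a child process plays the server side (writer),
 * the parent plays the client side (reader). port_out must hold PORT_NAME_MAX bytes. */
int rendezvous_demo(const char *fifo_path, const char *port_in, char *port_out)
{
    unlink(fifo_path);                          /* ignore failure: the FIFO may not exist yet */
    if (mkfifo(fifo_path, 0600) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        unlink(fifo_path);
        return -1;
    }
    if (pid == 0)
        _exit(send_port_name(fifo_path, port_in) == 0 ? 0 : 1);
    int rc = recv_port_name(fifo_path, port_out);
    int status = 0;
    waitpid(pid, &status, 0);
    unlink(fifo_path);
    if (rc != 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return 0;
}
```

Calling rendezvous_demo() from a small main() with any FIFO path and test string should return 0 and copy the string into the output buffer. Unlike the fragments above, the sketch checks the return values of open, read and write, so a failed handoff is reported instead of passing an uninitialised port name on to the connect step.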
When the compiled programs are run with one MPI process each:

    mkfifo port
    mpirun -np 1 ./server port &
    mpirun -np 1 ./client port

the connection is established as expected. With more than one process on either side, however, the execution blocks at the connect-accept step (i.e., after the 'listening' and 'connecting' messages are printed, but before the 'connected' messages are). Using the attached code,

    make NS=2 run

or

    make NC=2 run

should reproduce the problem.

I'm using Open MPI on two different machines: 1.4 on a 2-core laptop, and 1.3.3 on a large supercomputer, and I see the same problem on both. Where am I going wrong?
One more, related question: once I manage to establish an intercommunicator for two multi-process MPI groups, can any process in one group send a message to any process in the other, directly, or does the communication have to go through the root nodes?
Regards, Wacek
Attachment: rendezvous.tgz (application/compressed-tar)