Hello,

I am trying to replicate a simple client/server MPI application using 
MPI_Comm_accept and MPI_Comm_connect. Before version 5.0.x, I used the 
ompi-server command to allow communication between the two processes, but I 
no longer see this command in the new 5.0.x release. Without running 
ompi-server, I can no longer publish the port on which the server accepts 
connections; a minimal example is below.

Moreover, even if I communicate the server port to the client in another way 
(such as writing it to a file), the two processes hang; in previous versions, I 
would get an error asking me to run ompi-server and make its address available 
in the environment.
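
To make the file-based exchange concrete, here is a rough sketch of how the port string could be passed out of band. It only illustrates the exchange described above and is not a fix for the hang. The helper names write_port_file and read_port_file are hypothetical and are not part of MPI or Open MPI. In the real programs the buffer would be char port_name[MPI_MAX_PORT_NAME], filled by MPI_Open_port on the server and passed to MPI_Comm_connect on the client.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an MPI call): the server would call this right
 * after MPI_Open_port() to store the port string on one line of a file.
 * Returns 0 on success, -1 on failure. */
int write_port_file(const char *path, const char *port_name) {
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    int ok = fputs(port_name, f) >= 0 && fputc('\n', f) != EOF;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper (not an MPI call): the client would call this before
 * MPI_Comm_connect() to read the port string back. cap is the size of the
 * destination buffer, e.g. MPI_MAX_PORT_NAME in the real client.
 * Returns 0 on success, -1 on failure. */
int read_port_file(const char *path, char *port_name, size_t cap) {
    assert(port_name != NULL && cap > 0);
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    char *line = fgets(port_name, (int)cap, f);
    fclose(f);
    if (line == NULL) return -1;
    /* Strip the trailing newline so the string matches what was written. */
    port_name[strcspn(port_name, "\r\n")] = '\0';
    return 0;
}
```

The client has to be started only after the server has written the file, since the sketch does no waiting or locking.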

server.c


#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Comm client;
    char port_name[MPI_MAX_PORT_NAME];
    int size;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Open a port and print its name. */
    MPI_Open_port(MPI_INFO_NULL, port_name);
    printf("Server available at %s\n", port_name);

    /* Publish the port under the service name "name". */
    MPI_Info_create(&info);
    MPI_Publish_name("name", info, port_name);

    printf("Wait for client connection\n");
    fflush(stdout);  /* flush output before blocking in MPI_Comm_accept */
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
    printf("Client connected\n");

    /* Unpublish with the same info object, then release all resources. */
    MPI_Unpublish_name("name", info, port_name);
    MPI_Info_free(&info);
    MPI_Comm_disconnect(&client);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}

client.c


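#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Comm server;
    char port_name[MPI_MAX_PORT_NAME];

    MPI_Init(&argc, &argv);

    /* Look up the port published under the service name "name". */
    printf("Looking for server\n");
    fflush(stdout);  /* flush output before a call that may abort */
    MPI_Lookup_name("name", MPI_INFO_NULL, port_name);
    printf("server found at %s\n", port_name);

    /* Connect to the server on the port that was found. */
    printf("Wait for server connection\n");
    fflush(stdout);  /* flush output before blocking in MPI_Comm_connect */
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
    printf("Server connected\n");

    MPI_Comm_disconnect(&server);
    MPI_Finalize();
    return 0;
}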

Error message caused by the lack of an ompi-server on which to publish the port name:

[parallels-Parallels-Virtual-Platform:61301] mca_base_component_repository_open: unable to open mca_reachable_netlink: libopen-pal.so.40: cannot open shared object file: No such file or directory (ignored)
[parallels-Parallels-Virtual-Platform:61301] mca_base_component_repository_open: unable to open mca_btl_openib: libopen-pal.so.40: cannot open shared object file: No such file or directory (ignored)
Looking for server
[parallels-Parallels-Virtual-Platform:00000] *** An error occurred in MPI_Lookup_name
[parallels-Parallels-Virtual-Platform:00000] *** reported by process [611254273,0]
[parallels-Parallels-Virtual-Platform:00000] *** on communicator MPI_COMM_SELF
[parallels-Parallels-Virtual-Platform:00000] *** MPI_ERR_NAME: invalid name argument
[parallels-Parallels-Virtual-Platform:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[parallels-Parallels-Virtual-Platform:00000] ***    and MPI will try to terminate your MPI job as well)



Thank you in advance for any pointers or documentation I could use. For 
additional context, I'd like to use 5.0.0rc3 since it's the most recent version 
with ULFM, and version 4.0.3 with ULFM is broken due to a host recognition issue 
with ompi-server (related GitHub issue: 
https://github.com/open-mpi/ompi/issues/9396).


Luca Repetti
