Just to add an example that may help in this "disconnect" discussion:

Attached is the code of a test that does the following (and it works perfectly with OpenMPI 1.0.1):

 1) master spawns slave1
 2) master spawns slave2
 3) master and slaves exchange messages over the intercommunicators
 4) slave1 disconnects from master and finalizes
 5) slave2 disconnects from master and finalizes
    (the processors used by slave1 and slave2 can now be re-used by newly spawned processes)
 6) master spawns slave3, and then slave4
 7) slave3 and slave4 have NO direct communicator, but they can create one through the Open-Port mechanism and the MPI_Comm_connect / MPI_Comm_accept functions. The port name is relayed through the master.
 8) slave3 and slave4 create this direct communicator and do some ping-pong over it
 9) slave3 and slave4 disconnect from each other on this direct communicator
10) slave3 and slave4 disconnect from master and finalize
11) master finalizes

Hope it helps

Best regards,
Jean Latour

Ralph Castain wrote:
We expect to have much better support for the entire comm_spawn process in the next incarnation of the RTE. I don't expect that to be included in a release, however, until 1.1 (Jeff may be able to give you an estimate for when that will happen).

Jeff et al may be able to give you access to an early non-release version sooner, if better comm_spawn support is a critical issue and you don't mind being patient with the inevitable bugs in such versions.

Ralph

Edgar Gabriel wrote:

Open MPI currently does not fully support a proper disconnection of parent and child processes. Thus, if a child dies/aborts, the parents will abort as well, despite calling MPI_Comm_disconnect. (The new RTE will have better support for these operations; Ralph/Jeff can probably give a better estimate of when this will be available.)

However, what should not happen is that, if the child calls MPI_Finalize (so not a violent death but a proper shutdown), the parent goes down at the same time. Let me check that as well...

Brignone, Sergio wrote:

Hi everybody,

I am trying to run a master/slave set. Because of the nature of the problem I need to start and stop (kill) some slaves. The problem is that as soon as one of the slaves dies, the master dies also.

This is what I am doing:

MASTER:
    MPI_Init(...)
    MPI_Comm_spawn(slave1,...,nslave1,...,intercomm1);
    MPI_Barrier(intercomm1);
    MPI_Comm_disconnect(&intercomm1);
    MPI_Comm_spawn(slave2,...,nslave2,...,intercomm2);
    MPI_Barrier(intercomm2);
    MPI_Comm_disconnect(&intercomm2);
    MPI_Finalize();

SLAVE:
    MPI_Init(...)
    MPI_Comm_get_parent(&intercomm);
    (does something)
    MPI_Barrier(intercomm);
    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();

The issue is that as soon as the first set of slaves calls MPI_Finalize, the master dies also (it dies right after MPI_Comm_disconnect(&intercomm1)).

What am I doing wrong?
Thanks
Sergio

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
spawn+connect.tar.gz
Description: Binary data
<<attachment: latour.vcf>>