Thanks Edgar, Ralph and Jean.

It seems to me that the problem I am having is related to the operating system, the MPI configuration, the compiler, or some combination of them (I am using Solaris).
For example, neither the F90 nor the C++ interfaces could be compiled (I had to configure MPI without them).

I converted Jean's example to F77 and tested it. It didn't work (of course, you can always claim that I didn't convert it right ...). In fact, it seems I got errors in the Fortran-to-C conversion of strings: the program fils1 exists, but notice the error in the output after my signature, where all the strings are concatenated into a single executable name. This suggests to me that the Fortran-to-C conversion is not correct.

So I am assuming that the problems are related to my particular environment. I will debug and see what the problem is.

Thanks for your help.

Sergio Brignone

bash-2.03$ perem
PR : rank = 0 size = 1
PR : I am running on PE 0
PR : I am before the spawning of fils1 on PE 1
--------------------------------------------------------------------------
Could not execute the executable "./fils1 ./fils2 ./fils3 ./fils4 ": No such file or directory

This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions.  Please ensure that the executable is able to be
found and executed.
--------------------------------------------------------------------------

-----Original Message-----
From: Jean Latour [mailto:lat...@fujitsu.fr]
Sent: Friday, March 03, 2006 1:50 AM
To: r...@lanl.gov; Open MPI Users
Subject: Re: [OMPI users] Spawn and Disconnect

Just to add an example that may help with this "disconnect" discussion:

Attached is the code of a test that does the following (and it works
perfectly with Open MPI 1.0.1):

 1) master spawns slave1
 2) master spawns slave2
 3) master and slaves exchange messages over the intercommunicators
 4) slave1 disconnects from master and finalizes
 5) slave2 disconnects from master and finalizes
    (the processors used by slave1 and slave2 can now be re-used by newly
    spawned processes)
 6) master spawns slave3, and then slave4
 7) slave3 and slave4 have NO direct communicator, but they can create one
    through the Open-Port mechanism and the MPI_Comm_connect /
    MPI_Comm_accept functions. The port number is relayed through the master.
 8) slave3 and slave4 create this direct communicator and do some pingpong
    over it
 9) slave3 and slave4 disconnect from each other on this direct communicator
10) slave3 and slave4 disconnect from master and finalize
11) master finalizes

Hope it helps
Best regards,
Jean Latour

Ralph Castain wrote:

> We expect to have much better support for the entire comm_spawn
> process in the next incarnation of the RTE. I don't expect that to be
> included in a release, however, until 1.1 (Jeff may be able to give
> you an estimate for when that will happen).
>
> Jeff et al may be able to give you access to an early non-release
> version sooner, if better comm_spawn support is a critical issue and
> you don't mind being patient with the inevitable bugs in such versions.
>
> Ralph
>
>
> Edgar Gabriel wrote:
>
>>Open MPI currently does not fully support a proper disconnection of
>>parent and child processes. Thus, if a child dies/aborts, the parents
>>will abort as well, despite calling MPI_Comm_disconnect.
>>(The new RTE will have better support for these operations; Ralph/Jeff can
>>probably give a better estimate of when this will be available.)
>>
>>However, what should not happen is that, if the child calls MPI_Finalize
>>(so not a violent death but a proper shutdown), the parent goes down at
>>the same time. Let me check that as well...
>>
>>Brignone, Sergio wrote:
>>
>>>Hi everybody,
>>>
>>>I am trying to run a master/slave set.
>>>Because of the nature of the problem I need to start and stop (kill)
>>>some slaves.
>>>The problem is that as soon as one of the slaves dies, the master dies also.
>>>
>>>This is what I am doing:
>>>
>>>MASTER:
>>>
>>>MPI_Init(...);
>>>MPI_Comm_spawn(slave1,...,nslave1,...,&intercomm1);
>>>MPI_Barrier(intercomm1);
>>>MPI_Comm_disconnect(&intercomm1);
>>>MPI_Comm_spawn(slave2,...,nslave2,...,&intercomm2);
>>>MPI_Barrier(intercomm2);
>>>MPI_Comm_disconnect(&intercomm2);
>>>MPI_Finalize();
>>>
>>>SLAVE:
>>>
>>>MPI_Init(...);
>>>MPI_Comm_get_parent(&intercomm);
>>>(does something)
>>>MPI_Barrier(intercomm);
>>>MPI_Comm_disconnect(&intercomm);
>>>MPI_Finalize();
>>>
>>>The issue is that as soon as the first set of slaves calls MPI_Finalize,
>>>the master dies also (it dies right after MPI_Comm_disconnect(&intercomm1)).
>>>
>>>What am I doing wrong?
>>>
>>>Thanks
>>>
>>>Sergio
>>>
>>>------------------------------------------------------------------------
>>>
>>>_______________________________________________
>>>users mailing list
>>>us...@open-mpi.org
>>>http://www.open-mpi.org/mailman/listinfo.cgi/users
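
A note on the Fortran-to-C string conversion mentioned at the top of this
message. The error output shows all four names arriving as one executable
name, "./fils1 ./fils2 ./fils3 ./fils4 ", which is what one would expect to
see if a blank-padded Fortran CHARACTER array were handed to C as a single
buffer. The following is only a minimal C sketch of how such a buffer could be
split into separate NUL-terminated strings. It assumes the array elements are
contiguous and all have the same length. The function names
(fortran_to_c_string, split_fortran_strings, free_c_strings) are hypothetical
and are not taken from Open MPI.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy one blank-padded Fortran string of length len into a newly
 * allocated NUL-terminated C string, dropping the trailing blanks.
 * Returns NULL if allocation fails; the caller frees the result. */
static char *fortran_to_c_string(const char *fstr, size_t len)
{
    /* Fortran pads with spaces, so back up to the last non-blank. */
    while (len > 0 && fstr[len - 1] == ' ') {
        len--;
    }
    char *cstr = malloc(len + 1);
    if (cstr == NULL) {
        return NULL;
    }
    memcpy(cstr, fstr, len);
    cstr[len] = '\0';
    return cstr;
}

/* Free a NULL-terminated array produced by split_fortran_strings. */
static void free_c_strings(char **strs)
{
    if (strs == NULL) {
        return;
    }
    for (size_t i = 0; strs[i] != NULL; i++) {
        free(strs[i]);
    }
    free(strs);
}

/* Split a contiguous Fortran CHARACTER array (count elements, each
 * elem_len characters, no NUL terminators) into a NULL-terminated
 * array of C strings. Returns NULL if allocation fails. */
static char **split_fortran_strings(const char *farray, size_t elem_len,
                                    size_t count)
{
    assert(farray != NULL && elem_len > 0);

    char **out = malloc((count + 1) * sizeof *out);
    if (out == NULL) {
        return NULL;
    }
    /* Pre-fill with NULL so cleanup is safe after a partial failure. */
    for (size_t i = 0; i <= count; i++) {
        out[i] = NULL;
    }
    for (size_t i = 0; i < count; i++) {
        out[i] = fortran_to_c_string(farray + i * elem_len, elem_len);
        if (out[i] == NULL) {
            free_c_strings(out);
            return NULL;
        }
    }
    return out;
}
```

For instance, if the four names had been declared as a CHARACTER*8 array (an
assumption on my part; it happens to match the 32-character string in the
error), calling split_fortran_strings(buffer, 8, 4) on that buffer would give
back "./fils1", "./fils2", "./fils3" and "./fils4" as four separate strings,
followed by a NULL entry.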