I do know, however, that there was a bug in the F90 interface of spawn-multiple (which has been fixed by now, as far as I can tell). Could you send me the F77 example that you have? The concatenation problem looks strange; I would like to have a look at it...

Thanks
Edgar

Brignone, Sergio wrote:

Thanks Edgar, Ralph and Jean.

It seems to me that the problem I am having is related to the operating
system or MPI configuration or compiler or all of them (I am using
Solaris).

For example, the F90 as well as the C++ interfaces could not be compiled
(I had to configure MPI without them).
I converted Jean's example to F77 and tested it. It didn't work (of
course, you can always claim that I didn't convert it correctly ...); in
fact, it seems I got errors in the Fortran-to-C conversion of strings
(the program fils1 exists, but notice the error: it concatenates all
the strings, which suggests to me that the Fortran-to-C conversion is not correct).
So I am assuming that the problems are related to my particular
environment.
I will debug and see what the problem is.

Thanks for your help.

Sergio Brignone



bash-2.03$ perem
 PR : rank =  0  size =  1
 PR : I am running on PE 0
 PR : I am before the spawning of fils1 on PE 1
--------------------------------------------------------------------------
Could not execute the executable "./fils1 ./fils2 ./fils3 ./fils4 ": No such file or directory

This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions.  Please ensure that the executable is able to be
found and executed.

--------------------------------------------------------------------------
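
The error above is consistent with a blank-padded Fortran CHARACTER array being handed to C as one long string. Below is a minimal sketch of the conversion a Fortran-to-C binding layer has to perform. It assumes, hypothetically, that the four commands were declared as CHARACTER*8 elements stored contiguously. The names fortran_strings_to_c and free_c_strings are illustrative only and are not Open MPI functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Release an array produced by fortran_strings_to_c (hypothetical helper). */
void free_c_strings(char **strings)
{
    if (strings == NULL)
        return;
    for (size_t i = 0; strings[i] != NULL; i++)
        free(strings[i]);
    free(strings);
}

/* Convert `count` Fortran CHARACTER*(len) elements, stored back to back in
 * one blank-padded buffer with no NUL terminators, into a NULL-terminated
 * array of NUL-terminated C strings. Trailing blanks are stripped.
 * Returns NULL if an allocation fails. Illustrative sketch only. */
char **fortran_strings_to_c(const char *packed, size_t count, size_t len)
{
    assert(packed != NULL);

    char **out = malloc((count + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        const char *elem = packed + i * len;  /* start of element i */
        size_t n = len;

        /* Fortran pads with blanks on the right; find the real length. */
        while (n > 0 && elem[n - 1] == ' ')
            n--;

        out[i] = malloc(n + 1);
        if (out[i] == NULL) {
            /* out[i] is NULL, so it terminates the partial array. */
            free_c_strings(out);
            return NULL;
        }
        memcpy(out[i], elem, n);
        out[i][n] = '\0';
    }
    out[count] = NULL;  /* terminator expected by argv-style consumers */
    return out;
}
```

Called with the 32-character buffer "./fils1 ./fils2 ./fils3 ./fils4 ", count 4 and len 8, this yields four separate command strings. Treating the same buffer as a single C string gives exactly the concatenated name shown in the error message.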



-----Original Message-----
From: Jean Latour [mailto:lat...@fujitsu.fr]
Sent: Friday, March 03, 2006 1:50 AM
To: r...@lanl.gov; Open MPI Users
Subject: Re: [OMPI users] Spawn and Disconnect

Just to add an example that may help with this "disconnect" discussion:
attached is the code of a test that does the following (and it works perfectly with Open MPI 1.0.1):

 1) master spawns slave1
 2) master spawns slave2
 3) master and slaves exchange messages over the intercommunicators
 4) slave1 disconnects from master and finalizes
 5) slave2 disconnects from master and finalizes
    (the processors used by slave1 and slave2 can now be reused by newly spawned processes)
 6) master spawns slave3, and then slave4
 7) slave3 and slave4 have NO direct communicator, but they can create one through the
    MPI_Open_port mechanism and the MPI_Comm_connect / MPI_Comm_accept functions.
    The port name is relayed through the master.
 8) slave3 and slave4 create this direct communicator and do some ping-pong over it
 9) slave3 and slave4 disconnect from each other on this direct communicator
10) slave3 and slave4 disconnect from master and finalize
11) master finalizes

Hope it helps
Best regards,
Jean Latour

Ralph Castain wrote:


We expect to have much better support for the entire comm_spawn process in the next incarnation of the RTE. I don't expect that to be included in a release, however, until 1.1 (Jeff may be able to give you an estimate for when that will happen).

Jeff et al. may be able to give you access to an early non-release version sooner, if better comm_spawn support is a critical issue and you don't mind being patient with the inevitable bugs in such versions.

Ralph


Edgar Gabriel wrote:


Open MPI currently does not fully support a proper disconnection of parent and child processes. Thus, if a child dies or aborts, the parent will abort as well, despite calling MPI_Comm_disconnect. (The new RTE will have better support for these operations; Ralph/Jeff can probably give a better estimate of when this will be available.)

However, what should not happen is that the parent goes down at the same time when the child calls MPI_Finalize (so not a violent death but a proper shutdown). Let me check that as well...

Brignone, Sergio wrote:




Hi everybody,

I am trying to run a master/slave set.
Because of the nature of the problem, I need to start and stop (kill) some slaves.
The problem is that as soon as one of the slaves dies, the master dies also.

This is what I am doing:

MASTER:

MPI_Init(...);
MPI_Comm_spawn(slave1,...,nslave1,...,&intercomm1,...);
MPI_Barrier(intercomm1);
MPI_Comm_disconnect(&intercomm1);
MPI_Comm_spawn(slave2,...,nslave2,...,&intercomm2,...);
MPI_Barrier(intercomm2);
MPI_Comm_disconnect(&intercomm2);
MPI_Finalize();

SLAVE:

MPI_Init(...);
MPI_Comm_get_parent(&intercomm);
(does something)
MPI_Barrier(intercomm);
MPI_Comm_disconnect(&intercomm);
MPI_Finalize();

The issue is that as soon as the first set of slaves calls MPI_Finalize,
the master dies also (it dies right after MPI_Comm_disconnect(&intercomm1)).

What am I doing wrong?

Thanks
Sergio
