Thanks Edgar, Ralph and Jean.

It seems to me that the problem I am having is related to the operating
system or MPI configuration or compiler or all of them (I am using
Solaris).

For example, the F90 as well as the C++ interfaces could not be compiled
(I had to configure MPI without them). 

I converted Jean's example to F77 and tested it. It didn't work (of
course, you can always claim that I didn't convert it right ...); in
fact it seems I got errors in the Fortran-to-C conversion of strings
(the program fils1 exists, but notice the error below: all the strings
are concatenated, which suggests that the F-to-C conversion is not
correct).
So I am assuming that the problems are related to my particular
environment. 
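
For illustration, here is a minimal C sketch of the kind of conversion
a Fortran binding typically has to perform on an array of command
names. Fortran passes a CHARACTER array as one contiguous buffer of
fixed-length, blank-padded elements with no NUL terminators, so a layer
that treats the buffer as a single string would produce a concatenated
name like the one in the error output. This is not Open MPI's actual
code; the function names fortran_strings_to_c and free_c_strings are
hypothetical, and the element length is assumed to be known to the
caller.

```c
#include <stdlib.h>
#include <string.h>

/* Free a NULL-terminated array of heap-allocated C strings. */
void free_c_strings(char **strs)
{
    if (strs == NULL)
        return;
    for (size_t i = 0; strs[i] != NULL; i++)
        free(strs[i]);
    free(strs);
}

/*
 * Convert `count` Fortran-style strings, each exactly `len` characters,
 * blank-padded and stored back to back in `buf`, into a NULL-terminated
 * array of NUL-terminated C strings. Returns NULL if allocation fails.
 */
char **fortran_strings_to_c(const char *buf, size_t count, size_t len)
{
    char **out = malloc((count + 1) * sizeof *out);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i <= count; i++)
        out[i] = NULL;                    /* keeps cleanup safe on failure */

    for (size_t i = 0; i < count; i++) {
        const char *src = buf + i * len;  /* start of the i-th element */
        size_t n = len;

        /* Trim the trailing blanks that Fortran pads with. */
        while (n > 0 && src[n - 1] == ' ')
            n--;

        out[i] = malloc(n + 1);
        if (out[i] == NULL) {
            free_c_strings(out);
            return NULL;
        }
        memcpy(out[i], src, n);
        out[i][n] = '\0';
    }
    return out;
}
```

Called with a buffer holding four 9-character elements ("./fils1"
through "./fils4", each padded with two blanks), count 4 and len 9, the
sketch yields four separate strings followed by a NULL entry, rather
than one long name.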

I will debug and see what the problem is.

Thanks for your help.

Sergio Brignone



bash-2.03$ perem
 PR : rank =  0  size =  1
 PR : I am running on PE 0
 PR : I am before the spawning of fils1 on PE 1 
--------------------------------------------------------------------------
Could not execute the executable "./fils1 ./fils2 ./fils3 ./fils4 ": No such file or directory

This could mean that your PATH or executable name is wrong, or that you do not
have the necessary permissions.  Please ensure that the executable is able to be
found and executed.

--------------------------------------------------------------------------



-----Original Message-----
From: Jean Latour [mailto:lat...@fujitsu.fr] 
Sent: Friday, March 03, 2006 1:50 AM
To: r...@lanl.gov; Open MPI Users
Subject: Re: [OMPI users] Spawn and Disconnect

Just to add an example that may help with this "disconnect" discussion:
Attached is the code of a test that does the following (and it works
perfectly with Open MPI 1.0.1):

 1) master spawns slave1
 2) master spawns slave2
 3) master and slaves exchange messages over the intercommunicators
 4) slave1 disconnects from master and finalizes
 5) slave2 disconnects from master and finalizes
    (the processors used by slave1 and slave2 can now be reused by
    newly spawned processes)
 6) master spawns slave3, and then slave4
 7) slave3 and slave4 have NO direct communicator, but they can create
    one through the open-port mechanism and the MPI_Comm_connect /
    MPI_Comm_accept functions. The port name is relayed through the
    master.
 8) slave3 and slave4 create this direct communicator and do some
    ping-pong over it
 9) slave3 and slave4 disconnect from each other on this direct
    communicator
10) slave3 and slave4 disconnect from master and finalize
11) master finalizes

Hope it helps
Best regards,
Jean Latour

Ralph Castain wrote:

> We expect to have much better support for the entire comm_spawn 
> process in the next incarnation of the RTE. I don't expect that to be 
> included in a release, however, until 1.1 (Jeff may be able to give 
> you an estimate for when that will happen).
>
> Jeff et al may be able to give you access to an early non-release 
> version sooner, if better comm_spawn support is a critical issue and 
> you don't mind being patient with the inevitable bugs in such
> versions.
>
> Ralph
>
>
> Edgar Gabriel wrote:
>
>>Open MPI currently does not fully support a proper disconnection of
>>parent and child processes. Thus, if a child dies/aborts, the parents
>>will abort as well, despite calling MPI_Comm_disconnect. (The new RTE
>>will have better support for these operations; Ralph/Jeff can probably
>>give a better estimate of when this will be available.)
>>
>>However, what should not happen is that, if the child calls
>>MPI_Finalize (so not a violent death but a proper shutdown), the
>>parent goes down at the same time. Let me check that as well...
>>
>>Brignone, Sergio wrote:
>>
>>>Hi everybody,
>>>
>>>I am trying to run a master/slave set.
>>>
>>>Because of the nature of the problem I need to start and stop (kill)
>>>some slaves.
>>>
>>>The problem is that as soon as one of the slaves dies, the master
>>>dies also.
>>>
>>>This is what I am doing:
>>>
>>>MASTER:
>>>
>>>MPI_Init(...);
>>>MPI_Comm_spawn(slave1,...,nslave1,...,intercomm1);
>>>MPI_Barrier(intercomm1);
>>>MPI_Comm_disconnect(&intercomm1);
>>>MPI_Comm_spawn(slave2,...,nslave2,...,intercomm2);
>>>MPI_Barrier(intercomm2);
>>>MPI_Comm_disconnect(&intercomm2);
>>>MPI_Finalize();
>>>
>>>SLAVE:
>>>
>>>MPI_Init(...);
>>>MPI_Comm_get_parent(&intercomm);
>>>(does something)
>>>MPI_Barrier(intercomm);
>>>MPI_Comm_disconnect(&intercomm);
>>>MPI_Finalize();
>>>
>>>The issue is that as soon as the first set of slaves calls
>>>MPI_Finalize, the master dies also (it dies right after
>>>MPI_Comm_disconnect(&intercomm1)).
>>>
>>>What am I doing wrong?
>>>
>>>Thanks
>>>
>>>Sergio
>>>
>>>------------------------------------------------------------------------
>>>
>>>_______________________________________________
>>>users mailing list
>>>us...@open-mpi.org
>>>http://www.open-mpi.org/mailman/listinfo.cgi/users