Re: [OMPI users] Spawn and Disconnect

Michael Kluskens Wed, 26 Apr 2006 15:33:33 -0400

Correction on this: this problem only occurs (with OpenMPI 1.2) when I don't use mpirun to launch my process.

I know it seems strange to most MPI users, but when using OpenMPI and only needing one process (because I spawn everything else I need), I had found it quicker just to launch the executable directly.

I have only confirmed that my test code works with OpenMPI 1.2 (if I have trouble I'll test 1.1). Below is the correct output for my test of spawning, disconnecting, and respawning:


>mpirun -np 1 parent2
parent:  0  of  1
parent: How many processes total?
2
parent: Calling MPI_Comm_spawn to start  1  subprocesses.
child starting
parent returned from Comm_Spawn call
parent: Calling MPI_BCAST with btest =  17 .  child =  3
child 0 of 1:  Parent 3
parent: Calling MPI_Comm_spawn to start  1  subprocesses.
child 0 of 1:  Receiving   17 from parent
child calling COMM_FREE
child calling FINALIZE
child exiting
Maximum user memory allocated: 0
child starting
parent: Calling MPI_BCAST with btest =  17 .  child =  3
child 0 of 1:  Parent 3
child 0 of 1:  Receiving   17 from parent
child calling COMM_FREE
child calling FINALIZE

Michael

On Apr 25, 2006, at 2:57 PM, Michael Kluskens wrote:

I'm running OpenMPI 1.1 (v9704), and when a spawned process exits the parent does not die (see previous discussions about 1.0.1/1.0.2); however, the next time the parent tries to spawn a process, MPI_Comm_spawn does not return.
My test output below:

 parent:  0  of  1
parent: How many processes total?
2
parent: Calling MPI_Comm_spawn to start  1  subprocesses.
child starting
parent returned from Comm_Spawn call
parent: Calling MPI_BCAST with btest =  17 .  child =  3
child 0 of 1:  Parent 3
parent: Calling MPI_Comm_spawn to start  1  subprocesses.
child 0 of 1:  Receiving   17 from parent
child calling COMM_FREE
child calling FINALIZE
child exiting
Notice that after the second spawn call there is no "parent returned from Comm_Spawn call" message; the parent just sits there, and obviously the second set of processes doesn't get launched.
A quick note on code fixes: my child process now calls MPI_COMM_FREE(parent, ierr) to free the communicator to the parent before exiting; in earlier versions of 1.1 this crashed the code. I'm guessing this is the right thing to do: the Complete Reference book has an example without it, and the Using MPI-2 book has a more detailed example that includes it. Either way, I get the same results.
Background from the previous discussion on this follows. It will cost me less to test new versions of Open MPI that handle this than to work around this issue in my project.
Michael

On Mar 2, 2006, at 1:55 PM, Ralph Castain wrote:
We expect to have much better support for the entire comm_spawn process in the next incarnation of the RTE. I don't expect that to be included in a release, however, until 1.1 (Jeff may be able to give you an estimate for when that will happen).
Jeff et al. may be able to give you access to an early non-release version sooner, if better comm_spawn support is a critical issue and you don't mind being patient with the inevitable bugs in such versions.
Ralph


Edgar Gabriel wrote:
Open MPI currently does not fully support a proper disconnection of parent and child processes. Thus, if a child dies/aborts, the parents will abort as well, despite calling MPI_Comm_disconnect. (The new RTE will have better support for these operations; Ralph/Jeff can probably give a better estimate of when this will be available.) However, what should not happen is that if the child calls MPI_Finalize (so not a violent death but a proper shutdown), the parent goes down at the same time. Let me check that as well...

Brignone, Sergio wrote:
Hi everybody, I am trying to run a master/slave set. Because of the nature of the problem I need to start and stop (kill) some slaves. The problem is that as soon as one of the slaves dies, the master dies also.
<child2.f90>
<parent2.f90>
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
