> -----Original Message-----
> From: Prakash Velayutham [mailto:prakash.velayut...@cchmc.org]
> Sent: Saturday, April 08, 2006 2:45 PM
> To: Jeff Squyres (jsquyres); us...@open-mpi.org
> Subject: Re: [OMPI users] Open MPI and Torque error
>
> >>> jsquy...@cisco.com 04/08/06 7:10 AM >>>
> I am also curious as to why this would not work -- I was not under the
> impression that tm_init() would fail from a non mother-superior node...?
>
> What others say is that it will fail this way inside a Open MPI job as
> Open MPI's RTE is taking the only TM connection available. But the

Note that, because of the one-TM-connection-per-MOM restriction (which was only recently alleviated with Garrick's patch), Open RTE does not hold a TM connection open. Open RTE's TM support opens a TM connection, does its thing, and then closes the connection.

> strange thing is that it works from Mother Superior without Garrick's
> patch (actually, regardless of the patch, the behaviour is the same, but
> I have not rigorously tested the patch in itself, so cannot comment
> about that), which I think should have failed according to the above
> contention.

Based on my explanation above, the behavior you have observed makes sense.

> FWIW: It has been our experience with both Torque and the various
> flavors of PBS that you can repeatedly call tm_init() and tm_finalize()
> within a single process, so I would be surprised if that was the issue.
> Indeed, I'd have to double check, but I'm pretty sure that our MPI
> processes do not call tm_init() (I believe that only mpirun does).
>
> But I am running my code using mpirun, so is this expected behaviour? I
> am attaching my simple code below:

Yes. What I am saying is that only Open MPI's mpirun invokes tm_init() -- the MPI processes do not invoke tm_init(). Hence, there is no possibility of TM connection contention from the MPI processes. Even if you launch an MPI process on the same node as mpirun, there are synchronization points that guarantee that MPI_INIT will not complete until the TM connections from mpirun have finished and been closed with tm_finalize().

This is why I, too, am curious as to why your tm_init() is failing. You might have to dive a bit deeper in the TM library to figure it out. :-\

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
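
To make the contention argument concrete, here is a toy model in C of the open / use / close pattern described above. It is only a sketch under stated assumptions: it does not link against Torque, and every name in it (toy_mom, toy_tm_init, toy_tm_finalize, the TOY_* codes) is a hypothetical stand-in, not part of the real TM API in tm.h. The only thing it models is a MOM with a single connection slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for this toy model (not the real TM codes). */
enum { TOY_OK = 0, TOY_BUSY = 1, TOY_NOT_OPEN = 2 };

/* A MOM with exactly one connection slot, modelling the
 * one-TM-connection-per-MOM restriction. */
struct toy_mom {
    bool slot_in_use;
};

/* Stand-in for tm_init(): succeeds only if the single slot is free. */
int toy_tm_init(struct toy_mom *mom)
{
    if (mom->slot_in_use)
        return TOY_BUSY;
    mom->slot_in_use = true;
    return TOY_OK;
}

/* Stand-in for tm_finalize(): releases the slot if it was held. */
int toy_tm_finalize(struct toy_mom *mom)
{
    if (!mom->slot_in_use)
        return TOY_NOT_OPEN;
    mom->slot_in_use = false;
    return TOY_OK;
}

/* A launcher that opens a connection, does its work, and closes it
 * again, which is the behaviour described for mpirun. */
int launcher_open_use_close(struct toy_mom *mom)
{
    int rc = toy_tm_init(mom);
    if (rc != TOY_OK)
        return rc;
    /* ... the launcher would spawn processes here ... */
    return toy_tm_finalize(mom);
}

/* Application calls toy_tm_init() after the launcher has finished:
 * the slot is free again, so the call is expected to succeed. */
int app_init_after_launcher(void)
{
    struct toy_mom mom = { false };
    int rc = launcher_open_use_close(&mom);
    if (rc != TOY_OK)
        return rc;
    return toy_tm_init(&mom);
}

/* Counter-case: a launcher that never closes its connection.
 * The application's attempt then hits the busy slot. */
int app_init_while_launcher_holds(void)
{
    struct toy_mom mom = { false };
    int rc = toy_tm_init(&mom);   /* launcher opens and keeps the slot */
    assert(rc == TOY_OK);
    return toy_tm_init(&mom);     /* application attempt */
}
```

In this model, app_init_after_launcher() returns TOY_OK and app_init_while_launcher_holds() returns TOY_BUSY. If mpirun really follows the first pattern, a failing tm_init() in the application has to come from somewhere other than mpirun holding the connection.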