Dick / all --

I just had a phone call with Ralph Castain who has had some additional off-list 
mails with Randolph.  Apparently, none of us understand the model that is being 
used here.  There are also apparently some confidentiality issues involved such 
that it might be difficult to publicly state enough information to allow the 
open community to understand, diagnose, and fix the issue.  So I'm not quite 
sure how to proceed here -- I'm afraid that I don't have the time or resources 
for private problem resolution in an unorthodox situation like this.

For example, I was under the impression that PVM was solely being used as a 
launcher.  This is apparently not the case -- the original code is a PVM job 
that has been modified to eventually call MPI_INIT.  I don't know how much more 
I can say on this open list.

Hence, I'm thoroughly confused as to the model that is being used at this point.
I don't think I can offer any further help unless a small [non-PVM] example is
provided to the community that can show the problem.

I also asked a bunch of questions in a prior post that would be helpful to have 
answered before going further.

Sorry!  :-(



On Aug 12, 2010, at 9:32 AM, Richard Treumann wrote:

> 
> You said  "separate MPI  applications doing 1 to N broadcasts over PVM".  
> You do not mean you are using pvm_bcast though - right? 
> 
> If these N MPI applications are so independent that you could run one at a 
> time or run them on N different clusters and still get the result you want 
> (not the time to solution) then I cannot imagine how there could be cross 
> talk.   
> 
> I have been assuming that when you describe this as an NxN problem, you mean 
> there is some desired interaction among the N MPI worlds.   
> 
> If I have misunderstood and the N MPI worlds started with N mpirun operations 
> under PVM are each semantically independent of the other (N-1) then I am 
> totally at a loss for an explanation. 
> 
>   
> Dick Treumann  -  MPI Team           
> IBM Systems & Technology Group
> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846         Fax (845) 433-8363
> 
> 
> users-boun...@open-mpi.org wrote on 08/11/2010 08:59:16 PM:
> 
> > Re: [OMPI users] MPI_Bcast issue 
> > 
> > Randolph Pullen 
> > 
> > to: 
> > 
> > Open MPI Users 
> > 
> > 08/11/2010 09:01 PM 
> > 
> > Sent by: 
> > 
> > users-boun...@open-mpi.org 
> > 
> > Please respond to Open MPI Users 
> > 
> > I (a single user) am running N separate MPI  applications doing 1 to
> > N broadcasts over PVM, each MPI application is started on each 
> > machine simultaneously by PVM - the reasons are back in the post history.
> > 
> > The problem is that they somehow collide - yes I know this should 
> > not happen, the question is why.
> > 
> > --- On Wed, 11/8/10, Richard Treumann <treum...@us.ibm.com> wrote: 
> > 
> > From: Richard Treumann <treum...@us.ibm.com>
> > Subject: Re: [OMPI users] MPI_Bcast issue
> > To: "Open MPI Users" <us...@open-mpi.org>
> > Received: Wednesday, 11 August, 2010, 11:34 PM
> 
> > 
> > Randolph 
> > 
> > I am confused about using multiple, concurrent mpirun operations.  
> > If there are M uses of mpirun and each starts N tasks (carried out 
> > under pvm or any other way) I would expect you to have M completely 
> > independent MPI jobs with N tasks (processes) each.  You could have 
> > some root in each of the M MPI jobs do an MPI_Bcast to the other 
> > (N-1) in that job but there is no way in MPI (without using 
> > accept/connect) to get tasks of job 0 to give data to tasks of jobs 
> > 1 through (M-1). 
> > 
> > With M uses of mpirun, you have M worlds that are forever isolated 
> > from the other M-1 worlds (again, unless you do accept/connect) 
> > 
> > In what sense are you treating this as a single MxN application?   
> > ( I use M & N to keep them distinct. I assume if M == N, we have your case) 
> > 
> > 
> > Dick Treumann  -  MPI Team           
> > IBM Systems & Technology Group
> > Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> > Tele (845) 433-7846         Fax (845) 433-8363 
> > 
> 
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users 
> > 
> > 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

