Re: [OMPI users] meaning of MPI_THREAD_*

2010-12-14 Thread Jeff Squyres
On Dec 6, 2010, at 9:26 AM, Hicham Mouline wrote: > Thanks, it is now clarified that a call to MPI_INIT has the same effect as a > call to MPI_INIT_THREAD with > a required = MPI_THREAD_SINGLE. Perhaps it should be added here: > http://www.open-mpi.org/doc/v1.4/man3/MPI_Init_thread.3.php > as w

Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-14 Thread Jeff Squyres
On Dec 10, 2010, at 11:00 AM, Prentice Bisbal wrote: >> Would it make sense to implement this as an MPI extension, and then >> perhaps propose something to the Forum for this purpose? > > I think that makes sense. As core and socket counts go up, I imagine the need > for this information will be

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Ralph Castain
It applies to both. In the rsh/ssh launcher, there is a limit on how many concurrent ssh/rsh sessions we have open at any one time. This is required due to OS limitations. As each daemon completes its launch, it "daemonizes" and closes the ssh/rsh session, thus enabling another daemon to be laun
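The throttling described above can be pictured with a small simulation. This is only a sketch of the general idea (a fixed number of launch "slots", each freed when a session closes), not Open MPI's actual launcher code; the function name `launch_all`, the host names, and the sleep that stands in for daemon start-up are all hypothetical.

```python
import threading
import time


def launch_all(hosts, max_concurrent, launch_time=0.01):
    """Simulate launching one daemon per host with a cap on open sessions.

    Returns (peak number of simultaneously open sessions, sorted launched hosts).
    """
    slots = threading.Semaphore(max_concurrent)
    lock = threading.Lock()
    state = {"open": 0, "peak": 0, "launched": []}

    def launch(host):
        try:
            with lock:
                state["open"] += 1
                state["peak"] = max(state["peak"], state["open"])
            time.sleep(launch_time)  # stand-in for remote daemon start-up
            with lock:
                state["open"] -= 1
                state["launched"].append(host)
        finally:
            slots.release()  # session closed: frees a slot for the next host

    threads = []
    for host in hosts:
        slots.acquire()  # blocks while max_concurrent sessions are already open
        t = threading.Thread(target=launch, args=(host,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return state["peak"], sorted(state["launched"])


if __name__ == "__main__":
    peak, launched = launch_all([f"node{i:03d}" for i in range(20)], max_concurrent=4)
    print(f"launched {len(launched)} daemons, never more than {peak} sessions open")
```

In this model, raising `max_concurrent` lets more launches overlap, at the cost of more sessions open at once.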

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Gilbert Grosdidier
Hello Ralph, I wonder: does the plm_rsh_num_concurrent parameter apply ONLY to rsh, or to either ssh or rsh, depending on plm_rsh_agent? Thanks, Best, G. On 14/12/2010 18:30, Ralph Castain wrote: That's a big cluster to be starting with rsh! :-) When you say it won't s

Re: [OMPI users] Use of -mca pml csum

2010-12-14 Thread George Bosilca
If the checksums on the two peers don't match, your MPI call will return with an error. This is in addition to Open MPI printing a warning message on the output (which can be silenced with the right MCA parameter). So, you're supposed to check the return values, and abort if something fishy is go

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread John Hearns
On 14 December 2010 17:32, Lydia Heck wrote: > > I have experimented a bit more and found that if I set > > OMPI_MCA_plm_rsh_num_concurrent=1024 > > a job with more than 2,500 processes will start and run. > > However when I searched the open-mpi web site for the variable I could > not find an

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Lydia Heck
I have experimented a bit more and found that if I set OMPI_MCA_plm_rsh_num_concurrent=1024 a job with more than 2,500 processes will start and run. However, when I searched the open-mpi web site for the variable I could not find any indication. Best wishes, Lydia Heck 15. jobs with
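The variable above follows Open MPI's convention that an MCA parameter can be set through an environment variable named `OMPI_MCA_` plus the parameter name, which is why the thread refers to both `plm_rsh_num_concurrent` and `OMPI_MCA_plm_rsh_num_concurrent`. A minimal sketch of building such an environment for a launch script follows; the helper name `mca_env` is hypothetical and nothing here is part of Open MPI itself.

```python
import os


def mca_env(params, base_env=None):
    """Return a copy of the environment with OMPI_MCA_<name> set for each parameter."""
    env = dict(os.environ if base_env is None else base_env)
    for name, value in params.items():
        env["OMPI_MCA_" + name] = str(value)  # environment values must be strings
    return env


if __name__ == "__main__":
    env = mca_env({"plm_rsh_num_concurrent": 1024})
    print(env["OMPI_MCA_plm_rsh_num_concurrent"])
```

The resulting dictionary could be handed to `subprocess.run(..., env=env)` when starting `mpirun`. The same parameter can also be given on the command line as `mpirun --mca plm_rsh_num_concurrent 1024`.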

Re: [OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Ralph Castain
That's a big cluster to be starting with rsh! :-) When you say it won't start, do you mean that it hangs? Or does it fail with some error message? How many nodes are involved (this is the important number, not the number of cores)? Also, what version are you using? On Dec 14, 2010, at 9:10 AM

Re: [OMPI users] One-sided datatype errors

2010-12-14 Thread James Dinan
Hi Rolf, Thanks for your help. I also noticed trouble with subarray data types. I attached the same test again, but with subarray rather than indexed types. It works correctly with MVAPICH on IB, but fails with OpenMPI 1.5 with the following message: $ mpiexec -n 2 ./a.out MPI RMA Strided

Re: [OMPI users] MPI_Bcast vs. per worker MPI_Send?

2010-12-14 Thread Eugene Loh
David Mathog wrote: For the receive I do not see how to use a collective. Each worker sends back a data structure, and the structures are of varying size. This is almost always the case in Bioinformatics, where what is usually coming back from each worker is a count M of the number of signi

Re: [OMPI users] MPI_Bcast vs. per worker MPI_Send?

2010-12-14 Thread David Mathog
So the 2/2 consensus is to use the collective. That is straightforward for the send part of this, since all workers are sent the same data. For the receive I do not see how to use a collective. Each worker sends back a data structure, and the structures are of varying size. This is almost al
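The excerpt is truncated and does not show which approach the thread settled on. For reference, MPI's standard collective for per-rank contributions of different sizes is MPI_Gatherv, usually preceded by an MPI_Gather of the per-worker counts. The sketch below simulates only the bookkeeping in plain Python, with no MPI library involved; `gatherv_layout` and `simulate_gatherv` are hypothetical names.

```python
def gatherv_layout(counts):
    """Return (displacements, total) for packing variable-sized blocks back to back."""
    displs = []
    offset = 0
    for count in counts:
        displs.append(offset)  # block i starts where block i-1 ended
        offset += count
    return displs, offset


def simulate_gatherv(blocks):
    """Mimic a two-step gather of variable-sized worker results at the root."""
    # Step 1: the root learns each worker's count M (with MPI, e.g. via MPI_Gather).
    counts = [len(block) for block in blocks]
    displs, total = gatherv_layout(counts)
    # Step 2: each block lands at its displacement in one receive buffer
    # (the role the counts and displacements arrays play in MPI_Gatherv).
    recvbuf = [None] * total
    for block, start in zip(blocks, displs):
        recvbuf[start:start + len(block)] = block
    return counts, displs, recvbuf


if __name__ == "__main__":
    worker_results = [[1, 2, 3], [], [4], [5, 6]]
    print(simulate_gatherv(worker_results))
```

A worker with no significant results simply contributes a count of zero, so it takes up no space in the receive buffer.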

Re: [OMPI users] curious behavior during wait for broadcast: 100% cpu

2010-12-14 Thread Eugene Loh
David Mathog wrote: Is there a tool in openmpi that will reveal how much "spin time" the processes are using? I don't know what sort of answer is helpful for you, but I'll describe one option. With Oracle Message Passing Toolkit (formerly Sun ClusterTools, anyhow, an OMPI distribution from

[OMPI users] jobs with more that 2, 500 processes will not even start

2010-12-14 Thread Lydia Heck
About 9 months ago we had a new installation with a system of 1800 cores and at the time we found that jobs with more than 1028 cores would not start. At the time a colleague found that setting OMPI_MCA_plm_rsh_num_concurrent=256 helped with the problem. We have now increased our processor co

Re: [OMPI users] One-sided datatype errors

2010-12-14 Thread Rolf vandeVaart
Hi James: I can reproduce the problem on a single node with Open MPI 1.5 and the trunk. I have submitted a ticket with the information. https://svn.open-mpi.org/trac/ompi/ticket/2656 Rolf On 12/13/10 18:44, James Dinan wrote: Hi, I'm getting strange behavior using datatypes in a one-sided

Re: [OMPI users] Spawning with the ompi-server option

2010-12-14 Thread Ralph Castain
Not sure I fully understand the question. If you provide the --ompi-server option to mpirun, that info will be passed along to all processes, including those launched via comm_spawn, so they can subsequently connect to the server. On Dec 14, 2010, at 6:50 AM, Suraj Prabhakaran wrote: > Hello

[OMPI users] Spawning with the ompi-server option

2010-12-14 Thread Suraj Prabhakaran
Hello, Is there any way to spawn processes with the ompi-server option? I need the child processes to open and publish ports, for which I require this option. Is there an alternative? Thanks, Suraj Prabhakaran

[OMPI users] Use of -mca pml csum

2010-12-14 Thread Gilbert Grosdidier
Hello, Since I'm very suspicious about the condition of the IB network on my cluster, I'm trying to use the csum pml feature of OMPI (1.4.3). But I have a question: what happens if the checksum differs between the two ends? Is there a warning printed, a flag set by the MPI_(I)recv or equ