Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
>> You sound like our vendors, "what is your app" > > ;-) I used to be one. > > Ideally OMPI should do the switch between MXM/RC/XRC internally in the > transport layer. Unfortunately, > we don't have such smart selection logic. Hopefully IB vendors will fix some > day. I actually looked i

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Shamis, Pavel
> > You sound like our vendors, "what is your app" ;-) I used to be one. Ideally OMPI should do the switch between MXM/RC/XRC internally in the transport layer. Unfortunately, we don't have such smart selection logic. Hopefully IB vendors will fix some day. > > Note most of our users run

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
>> Our jobs range from 4 cores to 1000 cores, looking at the FAQ page it states >> that MXM was used in the past only for >128 ranks, but is in 1.6 used for >> rank counts of any size. > > > This is a reasonable threshold if you use openib btl with RC (default). Since > XRC provides better scal

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Shamis, Pavel
> > Our jobs range from 4 cores to 1000 cores, looking at the FAQ page it states > that MXM was used in the past only for >128 ranks, but is in 1.6 used for > rank counts of any size. This is a reasonable threshold if you use openib btl with RC (default). Since XRC provides better scalability,

Re: [OMPI users] XRC vs SRQ vs PRQ

2013-01-22 Thread Brock Palen
On Jan 22, 2013, at 2:53 PM, Shamis, Pavel wrote: >> >> Switching to SRQ and some guess of queue values selected appears to let the >> code run. >> S,4096,128:S,12288,128:S,65536,12 >> >> Two questions, >> >> This is a ConnectX fabric, should I switch them to XRC queues? And should I >> use

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
No there would be no overlap. We run a large legacy condo, with several islands of Infiniband of different ages and types. Users run within their condo/ib island. So PSM users only run on PSM nodes they own, and there is no overlap. Our jobs range from 4 cores to 1000 cores, looking at the FAQ

[OMPI users] MPI_THREAD_FUNNELED and enable-mpi-thread-multiple

2013-01-22 Thread Roland Schulz
Hi, when 1.6.1 or 1.6.2 is compiled without enable-mpi-thread-multiple, MPI_Init_thread returns MPI_THREAD_SINGLE as the provided level. Is enable-mpi-thread-multiple required even for MPI_THREAD_FUNNELED/MPI_THREAD_SERIALIZED? This question has been asked before: http://www.open-mpi.org/community/lists
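Why the returned level matters: the MPI standard defines the thread-support levels as monotonic (SINGLE < FUNNELED < SERIALIZED < MULTIPLE), so an application that asks MPI_Init_thread for MPI_THREAD_FUNNELED should compare the `provided` value against what it requested. The sketch below models only that comparison; `ThreadLevel` and `check_thread_support` are hypothetical names, and the integer values are assumptions that merely preserve the standard's ordering (they are not taken from any MPI implementation).

```python
from enum import IntEnum


class ThreadLevel(IntEnum):
    """Illustrative stand-ins for the MPI thread-support constants.

    Assumption: only the ordering SINGLE < FUNNELED < SERIALIZED < MULTIPLE
    comes from the MPI standard; these integer values are made up.
    """
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def check_thread_support(required: ThreadLevel, provided: ThreadLevel) -> None:
    """Raise if the library granted less thread support than requested.

    Hypothetical helper: an application would run this check right after
    MPI_Init_thread hands back its `provided` level.
    """
    if provided < required:
        raise RuntimeError(
            f"requested {required.name}, "
            f"but the MPI library provided only {provided.name}"
        )
```

With the behavior reported above, `check_thread_support(ThreadLevel.FUNNELED, ThreadLevel.SINGLE)` raises, while any provided level at or above FUNNELED passes.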

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ada Mancuso
Thanks a lot, I will try it. On 22 Jan 2013 21:49, "Ralph Castain" wrote: > Ouch - no, you'd have to take it from the developer's trunk, either via > svn checkout or the nightly developer's snapshot > > On Jan 22, 2013, at 12:35 PM, Ada Mancuso wrote: > > My problem is that I have to

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ralph Castain
Ouch - no, you'd have to take it from the developer's trunk, either via svn checkout or the nightly developer's snapshot On Jan 22, 2013, at 12:35 PM, Ada Mancuso wrote: > My problem is that I have to use openmpi 1.7 rc5 because I'm using the Java > binding mpijava... Is it present in the late

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ada Mancuso
My problem is that I have to use openmpi 1.7 rc5 because I'm using the Java binding mpijava... Is it present in the latest snapshot you told me about? If so, where can I find it? Thanks a lot, Ada On 22 Jan 2013 21:03, "Ralph Castain" wrote: > It seems to be working fine for me with the lates

Re: [OMPI users] help me understand these error msgs

2013-01-22 Thread Ralph Castain
I see - then the problem is that at least one node is unable to communicate via TCP back to where mpirun is executing. Might be a firewall, or could be that we are selecting the wrong network if multiple NICs are around. I assume that you use additional nodes when running against the larger data
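A quick way to probe the first two explanations (a firewall, or the wrong NIC) from a compute node is a plain TCP connection attempt back to the host running mpirun, optionally using a specific local address as the source so each NIC's address can be tested separately. This is a minimal sketch, not an Open MPI tool: `can_reach` is a hypothetical helper, and the host, port, and source address are placeholders you would supply (for example, a port on which you have started a test listener on the mpirun host). It checks raw TCP reachability only, not Open MPI's own startup wire-up.

```python
import socket
from typing import Optional


def can_reach(host: str, port: int, source_ip: Optional[str] = None,
              timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    If source_ip is given, the client socket is bound to that local address
    (ephemeral port 0) before connecting, so the peer sees that address as
    the source, as it would for a process bound to that NIC.
    """
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((host, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable networks,
        # and source addresses that do not exist on this machine.
        return False
```

Running something like `can_reach("10.0.0.1", 5000, source_ip="10.0.0.17")` (hypothetical addresses) once per local interface address on a failing node would show whether only some networks can get back to the mpirun host.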

Re: [OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ralph Castain
It seems to be working fine for me with the latest 1.7 tarball (not rc5 - I didn't test that one). Could be there was a problem that has since been fixed. We are getting ready to release an updated rc, so you might want to try it (or use the latest nightly 1.7 snapshot). On Jan 22, 2013, at 9:

Re: [OMPI users] MXM vs OpenIB

2013-01-22 Thread Shamis, Pavel
> We just learned about MXM, and given that most of our cards are Mellanox ConnectX > cards (though not all: we have islands of pre-ConnectX and QLogic > cards supported in the same OpenMPI environment), > > Will MXM correctly fall through to PSM if on QLogic gear and fall through to > OpenIB if on p

Re: [OMPI users] XRC vs SRQ vs PRQ

2013-01-22 Thread Shamis, Pavel
> > Switching to SRQ and some guess of queue values selected appears to let the > code run. > S,4096,128:S,12288,128:S,65536,12 > > Two questions, > > This is a ConnectX fabric, should I switch them to XRC queues? And should I > use the same queue size/count? That a safe assumption? > X,4096

[OMPI users] OPENMPI_ORTE_LOG_ERROR

2013-01-22 Thread Ada Mancuso
Hi, I'm trying to run my MPI program using Open MPI 1.7 rc5 on 4 machines with the command: mpirun -np4 -hostfile file a.out but I get the following error messages: ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file ../../../../../ompi/orte

Re: [OMPI users] help me understand these error msgs

2013-01-22 Thread Jure Pečar
On Thu, 17 Jan 2013 11:54:13 -0800 Ralph Castain wrote: > Or is this happening on startup of the larger job, or during a call to > MPI_Comm_spawn? This happens on startup. Mpirun spawns the processes, and when they start talking to each other during the setup phase, I get this kind of error. Running t

Re: [OMPI users] Help: OpenMPI Compilation in Raspberry Pi

2013-01-22 Thread Jeff Squyres (jsquyres)
On Jan 19, 2013, at 1:05 PM, Lee Eric wrote: > However, I hit another issue about fortran as configure running. > > *** Fortran 90/95 compiler > checking for armv6-rpi-linux-gnueabi-gfortran... > armv6-rpi-linux-gnueabi-gfortran > checking whether we are using the GNU Fortran compiler... yes > c

Re: [OMPI users] Help: OpenMPI Compilation in Raspberry Pi

2013-01-22 Thread Jeff Squyres (jsquyres)
Note that the original author of the ARM support chimed in on this on the devel list: http://www.open-mpi.org/community/lists/devel/2013/01/11955.php On Jan 21, 2013, at 6:50 AM, George Bosilca wrote: > Great, I pushed everything upstream: > - trunk (r27882) > - prepared a patch for the

[OMPI users] MXM vs OpenIB

2013-01-22 Thread Brock Palen
We just learned about MXM, and given that most of our cards are Mellanox ConnectX cards (though not all: we have islands of pre-ConnectX and QLogic cards supported in the same OpenMPI environment), will MXM correctly fall through to PSM if on QLogic gear, and fall through to OpenIB if on previous to c

[OMPI users] XRC vs SRQ vs PRQ

2013-01-22 Thread Brock Palen
We hit a problem recently with memory errors when scaling a code to 1000 cores. Switching to SRQ and some guess of queue values selected appears to let the code run. S,4096,128:S,12288,128:S,65536,12 Two questions, This is a ConnectX fabric, should I switch them to XRC queues? And should I us
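The queue string above uses the colon-separated format of Open MPI's openib receive-queue setting (btl_openib_receive_queues): each entry starts with a type letter (P per-peer, S shared receive queue, X XRC), then a buffer size in bytes and a buffer count. The sketch below parses that format and gives a rough estimate of receive-buffer memory, to show why per-peer queues can blow up at 1000 cores while shared ones do not. Assumptions, labeled as such: `parse_receive_queues` and `estimated_receive_bytes` are hypothetical helpers; any trailing fields (watermarks and the like) are kept but not interpreted; P queues are assumed to be replicated once per connected peer while S and X queues are treated as shared; and real allocations include overheads this ignores.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReceiveQueue:
    kind: str          # "P" per-peer, "S" shared (SRQ), "X" XRC
    size: int          # bytes per receive buffer
    num_buffers: int   # buffers posted to the queue
    extra: List[int] = field(default_factory=list)  # optional trailing fields, not interpreted


def parse_receive_queues(spec: str) -> List[ReceiveQueue]:
    """Parse a spec such as 'S,4096,128:S,12288,128:S,65536,12'."""
    queues = []
    for entry in spec.split(":"):
        fields = entry.split(",")
        kind = fields[0]
        if kind not in ("P", "S", "X"):
            raise ValueError(f"unknown queue type {kind!r} in {entry!r}")
        if len(fields) < 3:
            raise ValueError(f"queue {entry!r} needs at least type,size,count")
        size, count, *extra = (int(f) for f in fields[1:])
        queues.append(ReceiveQueue(kind, size, count, list(extra)))
    return queues


def estimated_receive_bytes(queues: List[ReceiveQueue], num_peers: int) -> int:
    """Rough per-process receive-buffer estimate under the stated assumptions."""
    total = 0
    for q in queues:
        per_queue = q.size * q.num_buffers
        # Assumption: per-peer (P) queues are allocated once per connected peer;
        # shared (S) and XRC (X) queues are allocated once per process.
        total += per_queue * num_peers if q.kind == "P" else per_queue
    return total
```

For the all-S spec above this comes to 2,883,584 bytes (2.75 MiB) per process regardless of peer count. Under the same assumptions, turning only the first queue into `P,4096,128` with 999 peers raises the estimate to 526,123,008 bytes (about 502 MiB), which illustrates the scaling difference rather than measuring it.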

Re: [OMPI users] [EXTERNAL] Possible memory leak(s) in OpenMPI 1.6.3?

2013-01-22 Thread Victor Vysotskiy
Dear Brian, thank you very much for your assistance and for the bug fix. Regards, Victor.