Re: [OMPI users] problem starting an OMPI job in a mixed BE/LE cluster

2015-06-01 Thread Ralph Castain
I’m wondering if it is also possible that the error message is simply printing that ID incorrectly. Looking at the code, it appears that we do perform the network byte translation correctly when we set up the data for transmission between the processes. However, I don’t see that translation being

Re: [OMPI users] problem starting an OMPI job in a mixed BE/LE cluster

2015-06-01 Thread Jeff Squyres (jsquyres)
This is not a heterogeneous run-time issue -- it's the issue that Nathan cited: OMPI detected different receive queue setups on different machines. As the error message states, the openib BTL simply cannot handle different MPI processes specifying different receive queue specifications.
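A quick way to compare the defaults is to query them on each machine and diff the results; a sketch, assuming ompi_info from the same Open MPI installation is on every node's PATH:

    # Print the receive-queue specification the openib BTL will use
    ompi_info --param btl openib --level 9 | grep receive_queues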

Re: [OMPI users] problem starting an OMPI job in a mixed BE/LE cluster

2015-06-01 Thread Steve Wise
On 6/1/2015 9:53 AM, Ralph Castain wrote: Well, I checked and it looks to me like --hetero-apps is a stale option in the master at least - I don’t see where it gets used. Looking at the code, I would suspect that something didn’t get configured correctly - either the --enable-heterogeneous flag

Re: [OMPI users] problem starting an OMPI job in a mixed BE/LE cluster

2015-06-01 Thread Steve Wise
On 6/1/2015 2:45 PM, Nathan Hjelm wrote: It looks to me like the default queue pair settings are different on the different systems. You can try setting the mca_btl_openib_receive_queues variable by hand. If this is InfiniBand, I recommend not using any per-peer queue pairs and using something like:

Re: [OMPI users] problem starting an OMPI job in a mixed BE/LE cluster

2015-06-01 Thread Steve Wise
On 6/1/2015 9:40 AM, Ralph Castain wrote: > Just to check the obvious: I assume that the /usr/mpi directory is not network mounted, and both application and OMPI code are appropriately compiled on each side? Yes. > There is another mpirun flag --hetero-apps that you may need to provide. It has

Re: [OMPI users] new hwloc error

2015-06-01 Thread Noam Bernstein
> On Jun 1, 2015, at 5:09 PM, Ralph Castain wrote: This probably isn’t very helpful, but FWIW: we added an automatic “fingerprint” capability in the later OMPI versions just to detect things like this. If the fingerprint of a backend node doesn’t match the head node, we automatically

Re: [OMPI users] new hwloc error

2015-06-01 Thread Ralph Castain
This probably isn’t very helpful, but FWIW: we added an automatic “fingerprint” capability in the later OMPI versions just to detect things like this. If the fingerprint of a backend node doesn’t match the head node, we automatically assume hetero-nodes. It isn’t foolproof, but it would have pic
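For reference, a minimal sketch of how the flag discussed in this thread is passed (host names and binary are illustrative):

    # Ask every daemon to report its own hwloc topology
    # instead of assuming it matches the head node
    mpirun --hetero-nodes -np 4 --host node1,node2 ./a.out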

Re: [OMPI users] new hwloc error

2015-06-01 Thread Noam Bernstein
> On Apr 30, 2015, at 1:16 PM, Noam Bernstein wrote: >> On Apr 30, 2015, at 12:03 PM, Ralph Castain wrote: The planning is pretty simple: at startup, mpirun launches a daemon on each node. If --hetero-nodes is provided, each daemon returns the topology discovered by hwloc - ot

Re: [OMPI users] problem starting an OMPI job in a mixed BE/LE cluster

2015-06-01 Thread Nathan Hjelm
It looks to me like the default queue pair settings are different on the different systems. You can try setting the mca_btl_openib_receive_queues variable by hand. If this is InfiniBand, I recommend not using any per-peer queue pairs and using something like: S,2048,1024,1008,64:S,12288,1024,1008,64
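A sketch of how that setting might be passed on the command line; the value below is only the portion of the suggestion visible above (the full string is cut off), and note that with mpirun's --mca option the mca_ prefix is dropped:

    # Shared (S) receive queues only -- no per-peer queue pairs
    mpirun --mca btl_openib_receive_queues \
        "S,2048,1024,1008,64:S,12288,1024,1008,64" \
        -np 2 --host node1,node2 ./a.out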

Re: [OMPI users] Error building openmpi-v1.8.5-40-g7b9e672

2015-06-01 Thread Ralph Castain
How was this configured? We aren’t seeing this problem elsewhere. > On Jun 1, 2015, at 4:06 AM, Siegmar Gross wrote: Hi, today I tried to build openmpi-v1.8.5-40-g7b9e672 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and Sun

Re: [OMPI users] gcc: Error building openmpi-v1.10-dev-41-g57faa88

2015-06-01 Thread Nathan Hjelm
https://github.com/open-mpi/ompi-release/pull/299 On Mon, Jun 01, 2015 at 01:06:43PM +0200, Siegmar Gross wrote: > Hi, today I tried to build openmpi-v1.10-dev-41-g57faa88 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and Sun C 5.13 a

Re: [OMPI users] gcc: Error building openmpi-v1.10-dev-41-g57faa88

2015-06-01 Thread Nathan Hjelm
Hmm, a master-ism that made it into 1.10. Wasn't caught by Jenkins. Will fix now. -Nathan On Mon, Jun 01, 2015 at 01:06:43PM +0200, Siegmar Gross wrote: > Hi, today I tried to build openmpi-v1.10-dev-41-g57faa88 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1

Re: [OMPI users] Memory usage for MPI program

2015-06-01 Thread Nathan Hjelm
Just to be sure: how are you measuring the memory usage? If you are using /proc/meminfo, are you subtracting out the Cached memory usage? -Nathan On Mon, Jun 01, 2015 at 04:54:45AM -0400, Manoj Vaghela wrote: > Hi OpenMPI users, I have been using OpenMPI for quite a few years now. Recen
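For anyone wanting to do that subtraction, a sketch against the standard /proc/meminfo fields:

    # Memory actually in use, excluding buffers and page cache (kB)
    awk '/^(MemTotal|MemFree|Buffers|Cached):/ {m[$1]=$2}
         END {print m["MemTotal:"] - m["MemFree:"] - m["Buffers:"] - m["Cached:"] " kB in use"}' /proc/meminfo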

Re: [OMPI users] problem starting an OMPI job in a mixed BE/LE cluster

2015-06-01 Thread Ralph Castain
Well, I checked and it looks to me like --hetero-apps is a stale option in the master at least - I don’t see where it gets used. Looking at the code, I would suspect that something didn’t get configured correctly - either the --enable-heterogeneous flag didn’t get set on one side, or we incorrect

[OMPI users] Bug report: Message queues debugging not working

2015-06-01 Thread Alejandro
Dear OpenMPI users/developers, We are experiencing a problem when debugging the message queues. Summary: message queue debugging is broken on recent OpenMPI versions. Affected OpenMPI versions: 1.8.3, 1.8.4, and 1.8.5 (at least). The debug message queue library is not returning any pending message

Re: [OMPI users] Building OpenMPI on Raspberry Pi 2

2015-06-01 Thread Jeff Squyres (jsquyres)
On May 30, 2015, at 9:42 AM, Jeff Layton wrote: > The error happens during the configure step before compiling. Hmm -- I'm confused. You show output from "make" in your previous mails...? > However, I ran the make command as you indicated and I'm attaching the output to this email. Ok, th

Re: [OMPI users] problem starting an OMPI job in a mixed BE/LE cluster

2015-06-01 Thread Ralph Castain
Just to check the obvious: I assume that the /usr/mpi directory is not network mounted, and both application and OMPI code are appropriately compiled on each side? There is another mpirun flag --hetero-apps that you may need to provide. It has been so long since someone tried this that I’d have

[OMPI users] problem starting an OMPI job in a mixed BE/LE cluster

2015-06-01 Thread Steve Wise
Hello, I'm seeing an error trying to run a simple OMPI job on a two-node cluster where one node is PPC64 (big-endian) and the other is x86_64 (little-endian). OMPI 1.8.4 is configured with --enable-heterogeneous: ./configure --with-openib=/usr CC=gcc CXX=g++ F77=gfortran FC=gfortran --e
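The configure line above is cut off in the archive; given the sentence preceding it, the full invocation was presumably along these lines (a sketch reconstructed from the quoted flags plus the --enable-heterogeneous option the message names):

    ./configure --with-openib=/usr \
        CC=gcc CXX=g++ F77=gfortran FC=gfortran \
        --enable-heterogeneous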

[OMPI users] Error building openmpi-v1.8.5-40-g7b9e672

2015-06-01 Thread Siegmar Gross
Hi, today I tried to build openmpi-v1.8.5-40-g7b9e672 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and Sun C 5.13 and I got the same error on all three platforms with both compilers. ... make[2]: Entering directory `/export2/src/openmpi-1.8.

[OMPI users] cc: Error building openmpi-v1.10-dev-41-g57faa88

2015-06-01 Thread Siegmar Gross
Hi, today I tried to build openmpi-v1.10-dev-41-g57faa88 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and Sun C 5.13 and I got the following error on all platforms with cc. ... make[2]: Entering directory `/export2/src/openmpi-1.10.0/openmpi

[OMPI users] gcc: Error building openmpi-v1.10-dev-41-g57faa88

2015-06-01 Thread Siegmar Gross
Hi, today I tried to build openmpi-v1.10-dev-41-g57faa88 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with gcc-4.9.2 and Sun C 5.13 and I got the following error on all platforms with gcc. ... make[2]: Entering directory `/export2/src/openmpi-1.10.0/openmp

[OMPI users] Memory usage for MPI program

2015-06-01 Thread Manoj Vaghela
Hi OpenMPI users, I have been using OpenMPI for quite a few years now. Recently I ran into some memory-related issues which are quite bothering me. I have OpenMPI version 1.8.3 installed on different machines. All machines are SMPs running Linux x86_64. Machines one and one-1 are installed with