Re: [OMPI users] new hwloc error

2015-04-29 Thread Ralph Castain
Try adding --hetero-nodes to the command line and see if that helps resolve the problem. Of course, if all the machines are identical, then it won’t
> On Apr 29, 2015, at 1:43 PM, Brice Goglin wrote:
>
> On 29/04/2015 22:25, Noam Bernstein wrote:
>>> On Apr 29, 2015, at
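For context, --hetero-nodes tells mpirun not to assume that every node has the same topology as the node where it was launched. A minimal sketch of how it would be added (the application name and process count are placeholders):

    $ mpirun --hetero-nodes -np 16 --bind-to core ./my_app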

Re: [OMPI users] new hwloc error

2015-04-29 Thread Brice Goglin
On 29/04/2015 22:25, Noam Bernstein wrote:
>> On Apr 29, 2015, at 4:09 PM, Brice Goglin wrote:
>>
>> Nothing wrong in that XML. I don't see what could be happening besides a
>> node rebooting with hyper-threading enabled for random reasons.
>> Please run "lstopo foo.xml"
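For reference, "lstopo foo.xml" exports the node's topology to an XML file, which can then be inspected or re-rendered on another machine; a quick sketch:

    $ lstopo foo.xml            # export this node's topology to XML
    $ lstopo --input foo.xml    # later, render the saved topology instead of the live one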

Re: [OMPI users] new hwloc error

2015-04-29 Thread Noam Bernstein
> On Apr 29, 2015, at 4:09 PM, Brice Goglin wrote:
>
> Nothing wrong in that XML. I don't see what could be happening besides a
> node rebooting with hyper-threading enabled for random reasons.
> Please run "lstopo foo.xml" again on the node next time you get the OMPI
>
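A quick way to check whether a node came back up with hyper-threading enabled, assuming a Linux node (the exact lscpu output format may vary by version):

    $ lscpu | grep -i 'Thread(s) per core'   # "2" means hyper-threading is on
    $ grep -c ^processor /proc/cpuinfo       # total hardware threads the kernel sees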

Re: [OMPI users] new hwloc error

2015-04-29 Thread Brice Goglin
On 29/04/2015 18:55, Noam Bernstein wrote:
>> On Apr 29, 2015, at 12:47 PM, Brice Goglin wrote:
>>
>> Thanks. It's indeed normal that OMPI failed to bind to cpuset 0,16 since
>> 16 doesn't exist at all.
>> Can you run "lstopo foo.xml" on one node where it failed, and
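To check from the shell whether a PU such as 16 exists at all on a node, hwloc's own tools can be used; a sketch (the exact failure behavior on a missing PU should be verified against your hwloc version):

    $ hwloc-calc --number-of pu all   # count of hardware threads hwloc sees
    $ hwloc-bind pu:16 -- true        # expected to fail if PU 16 does not exist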

Re: [OMPI users] getting OpenMPI 1.8.4 w/ CUDA to look for absolute path to libcuda.so.1

2015-04-29 Thread Lev Givon
Received from Rolf vandeVaart on Wed, Apr 29, 2015 at 11:14:15AM EDT:
>
> >-Original Message-
> >From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lev Givon
> >Sent: Wednesday, April 29, 2015 10:54 AM
> >To: us...@open-mpi.org
> >Subject: [OMPI users] getting OpenMPI 1.8.4 w/

Re: [OMPI users] new hwloc error

2015-04-29 Thread Noam Bernstein
> On Apr 29, 2015, at 12:47 PM, Brice Goglin wrote:
>
> Thanks. It's indeed normal that OMPI failed to bind to cpuset 0,16 since
> 16 doesn't exist at all.
> Can you run "lstopo foo.xml" on one node where it failed, and send the
> foo.xml that got generated? Just want to

Re: [OMPI users] new hwloc error

2015-04-29 Thread Brice Goglin
On 29/04/2015 14:53, Noam Bernstein wrote:
> They’re dual 8-core processors, so the 16 cores are physical ones. lstopo
> output looks identical on nodes where this does happen, and nodes where it
> never does. My next step is to see if I can reproduce the behavior at will -
> I’m still
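One way to compare topologies across nodes more rigorously than eyeballing lstopo output is to diff the XML exports, filtering out per-node fields such as the HostName info line (the attribute name is assumed from hwloc's XML format; process substitution requires bash):

    $ lstopo good.xml    # on a node that never shows the error
    $ lstopo bad.xml     # on a node that does
    $ diff <(grep -v HostName good.xml) <(grep -v HostName bad.xml)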

Re: [OMPI users] getting OpenMPI 1.8.4 w/ CUDA to look for absolute path to libcuda.so.1

2015-04-29 Thread Rolf vandeVaart
Hi Lev:
Any chance you can try Open MPI 1.8.5rc3 and check whether you see the same behavior? That code has changed a bit from the 1.8.4 series and I am curious if you will still see the same issue.
http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.5rc3.tar.gz
Thanks, Rolf
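A sketch of building the rc tarball side by side with an existing install (the install prefix and CUDA path are placeholders to adapt):

    $ wget http://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.5rc3.tar.gz
    $ tar xzf openmpi-1.8.5rc3.tar.gz && cd openmpi-1.8.5rc3
    $ ./configure --prefix=$HOME/openmpi-1.8.5rc3 --with-cuda=/usr/local/cuda
    $ make -j4 install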

[OMPI users] getting OpenMPI 1.8.4 w/ CUDA to look for absolute path to libcuda.so.1

2015-04-29 Thread Lev Givon
I'm trying to build/package OpenMPI 1.8.4 with CUDA support enabled on Linux x86_64 so that the compiled software can be downloaded/installed as one of the dependencies of a project I'm working on with no further user configuration. I noticed that MPI programs built with the above will try to
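For anyone debugging the same packaging question: since libcuda is loaded at run time rather than linked in, one way to see exactly which path gets opened is to trace library opens while running a small MPI program (./mpi_app is a placeholder):

    $ strace -f -e trace=open,openat mpirun -np 1 ./mpi_app 2>&1 | grep 'libcuda\.so'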

Re: [OMPI users] new hwloc error

2015-04-29 Thread Noam Bernstein
> On Apr 28, 2015, at 4:54 PM, Brice Goglin wrote:
>
> Hello,
> Can you build hwloc and run lstopo on these nodes to check that everything
> looks similar?
> You have hyperthreading enabled on all nodes, and you're trying to bind
> processes to entire cores, right?
>
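Open MPI 1.8 can report the bindings it actually applies, which is useful for checking the core-binding setup described here; a minimal sketch (process count and executable are placeholders):

    $ mpirun -np 16 --bind-to core --report-bindings ./my_app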