[OMPI users] Binding width affects allgatherv performance?

2015-07-01 Thread Saliya Ekanayake
Hi, I am getting strange performance results for the allgatherv operation with the same number of procs and data but varying binding width. For example, here are two cases with about a 180x difference in performance. Each machine has 4 sockets, each with 6 cores, totaling 24 cores per node (topology

Re: [OMPI users] IB to some nodes but TCP for others

2015-07-01 Thread Tim Miller
Hi All, Sorry for the late reply on this. I've been digging through the OpenMPI FAQ. I've never explicitly set the subnet IDs for my IB subnets, so I suspect I'm using the factory defaults. Probably, if I change this, it will "just work". I'll see if the end user is still interested in testing

Re: [OMPI users] 1.8.6 w/ CUDA 7.0 & GDR Huge Memory Leak

2015-07-01 Thread Rolf vandeVaart
Hi Stefan (and Steven, who reported this earlier with a CUDA-aware program), I have managed to observe the leak when running LAMMPS as well. Note that this has nothing to do with CUDA-aware features. I am going to move this discussion to the Open MPI developer’s list to dig deeper into this

Re: [OMPI users] Running 1 proc per socket but no more

2015-07-01 Thread Saliya Ekanayake
I tried this, but I get an error: An invalid value was given for the number of processes per resource (ppr) to be mapped on each node: PPR: 12:node,span The specification must be a comma-separated list containing combinations of number, followed by a colon, followed by the resource

Re: [OMPI users] Running 1 proc per socket but no more

2015-07-01 Thread Saliya Ekanayake
Thank you Ralph Saliya On Wed, Jul 1, 2015 at 4:01 PM, Ralph Castain wrote: > Scenario 2: --map-by ppr:12:node,span --bind-to core > > will put 12 procs on each node, load balanced across the sockets, each > proc bound to 1 core > > HTH > Ralph > > > On Wed, Jul 1, 2015 at

Re: [OMPI users] Running 1 proc per socket but no more

2015-07-01 Thread Ralph Castain
Scenario 2: --map-by ppr:12:node,span --bind-to core will put 12 procs on each node, load balanced across the sockets, each proc bound to 1 core HTH Ralph On Wed, Jul 1, 2015 at 2:42 PM, Saliya Ekanayake wrote: > Hi, > > I am doing some benchmarks and would like to test
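The placement Ralph describes (12 procs per node, balanced across the sockets, one core each) can be pictured with a small toy model. This is only an illustrative sketch, not Open MPI's mapper: the function name `balanced_layout` and the round-robin assignment are assumptions made for the example, and the socket and core counts come from the node description in this thread. It says nothing about which MPI rank ends up on which core.

```python
# Toy model of "12 procs per node, load balanced across sockets, 1 core each".
# NOT Open MPI code: balanced_layout and its round-robin rule are hypothetical.

SOCKETS_PER_NODE = 4   # from the thread: 4 sockets per node
CORES_PER_SOCKET = 6   # from the thread: 6 cores per socket
PROCS_PER_NODE = 12    # from the option ppr:12:node


def balanced_layout(procs, sockets, cores_per_socket):
    """Return {socket: [core indices]} with procs spread evenly over sockets."""
    if procs > sockets * cores_per_socket:
        raise ValueError("more procs than cores on the node")
    layout = {s: [] for s in range(sockets)}
    for p in range(procs):
        s = p % sockets                    # next socket in round-robin order
        layout[s].append(len(layout[s]))   # next free core on that socket
    return layout


layout = balanced_layout(PROCS_PER_NODE, SOCKETS_PER_NODE, CORES_PER_SOCKET)
for socket, cores in layout.items():
    print(f"socket {socket}: {len(cores)} procs on cores {cores}")
```

With these numbers each of the 4 sockets receives 3 procs, so half of the 24 cores stay idle.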

[OMPI users] Running 1 proc per socket but no more

2015-07-01 Thread Saliya Ekanayake
Hi, I am doing some benchmarks and would like to test the following two scenarios. Each machine has 4 sockets, each with 6 cores (lstopo image attached). Scenario 1: Run 12 procs per node, each bound to 2 cores. I can do this with --map-by socket:PE=2. Scenario 2: Run 12 procs per node each bound

Re: [OMPI users] 1.8.6 w/ CUDA 7.0 & GDR Huge Memory Leak

2015-07-01 Thread Stefan Paquay
Hi all, Hopefully this mail gets posted in the right thread... I have noticed the (I guess same) leak using OpenMPI 1.8.6 with LAMMPS, a molecular dynamics program, without any use of CUDA. I am not that familiar with how the internal memory management of LAMMPS works, but it does not appear

Re: [OMPI users] Allgather Implementation Details

2015-07-01 Thread George Bosilca
Use --mca to pass the options directly through mpirun. George. On Wed, Jul 1, 2015 at 9:14 AM, Saliya Ekanayake wrote: > Thank you George. This is very informative. > > Is it possible to pass the option at runtime rather than setting it up in the > config file? > > Thank you
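As a rough sketch of George's suggestion, the snippet below assembles an mpirun command line that passes MCA parameters with `--mca <name> <value>` instead of a config file. The helper `mpirun_command`, the program path `./my_benchmark`, and the parameter name `example_param` are placeholders invented for illustration; the thread excerpt does not say which option was being discussed, so substitute the real parameter name and value.

```python
import shlex


def mpirun_command(program, nprocs, mca_params):
    """Build an mpirun argv list with one '--mca name value' triple per parameter."""
    argv = ["mpirun", "-np", str(nprocs)]
    for name, value in mca_params.items():
        argv += ["--mca", name, str(value)]
    argv.append(program)
    return argv


# Hypothetical values: replace example_param with the actual MCA parameter.
cmd = mpirun_command("./my_benchmark", 24, {"example_param": 1})
print(shlex.join(cmd))
```

Printing the command rather than running it keeps the sketch usable on a machine without Open MPI installed.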

Re: [OMPI users] Allgather Implementation Details

2015-07-01 Thread Saliya Ekanayake
Thank you George. This is very informative. Is it possible to pass the option at runtime rather than setting it up in the config file? Thank you Saliya On Tue, Jun 30, 2015 at 7:20 PM, George Bosilca wrote: > Saliya, > > On Tue, Jun 30, 2015 at 10:50 AM, Saliya Ekanayake