Re: [OMPI users] Best way to map MPI processes to sockets?

2012-11-07 Thread Jeff Squyres
What Brice is saying is that hwloc reports that your cores all have individual caches -- they're not shared. Have a look at a graphical hwloc output to see: lstopo mymachine.png On Nov 7, 2012, at 7:17 PM, Brice Goglin wrote: > What processor and kernel is this? (see /proc/cpuinfo, or ru

Re: [OMPI users] How is hwloc used by OpenMPI

2012-11-07 Thread Jeff Squyres
On Nov 7, 2012, at 7:19 PM, Brice Goglin wrote: > * hwloc does everything libnuma does, but it does a lot more (everything > that isn't related to NUMA) Here's my 1-line description: libnuma is old bustedness; hwloc is new hotness. :-) -- Jeff Squyres jsquy...@cisco.com For corporate lega

Re: [OMPI users] mpi_leave_pinned is dangerous

2012-11-07 Thread Jens Glaser
I am replying to my own post, since no one else replied. With the help of MVAPICH2 developer S. Potluri the problem was isolated and fixed. It was, as expected, due to the library not intercepting the cudaHostAlloc() and cudaFreeHost() calls to register pinned memory, as would be required for th
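For context, the calls in question allocate and release page-locked (pinned) host memory; below is a minimal sketch using the CUDA runtime API, not the actual code from this thread -- the buffer size and usage are illustrative only. An MPI library with a registration cache needs to intercept these calls (or otherwise learn about them) to keep its cache consistent.

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        void *buf = NULL;
        size_t size = 1 << 20;  /* 1 MiB, illustrative */

        /* Allocate page-locked (pinned) host memory. */
        cudaError_t err = cudaHostAlloc(&buf, size, cudaHostAllocDefault);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
            return 1;
        }

        /* ... use buf as an MPI send/receive buffer ... */

        cudaFreeHost(buf);  /* the release must also be visible to the MPI library */
        return 0;
    }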

Re: [OMPI users] How is hwloc used by OpenMPI

2012-11-07 Thread Brice Goglin
On 07/11/2012 21:26, Jeff Squyres wrote: > On Nov 7, 2012, at 1:33 PM, Blosch, Edwin L wrote: > >> I see hwloc is a subproject hosted under OpenMPI but, in reading the >> documentation, I was unable to figure out if hwloc is a module within >> OpenMPI, or if some of the code base is borrowed i

Re: [OMPI users] Best way to map MPI processes to sockets?

2012-11-07 Thread Brice Goglin
What processor and kernel is this? (see /proc/cpuinfo, or run "lstopo -v" and look for attributes on the Socket line) Your hwloc output looks like an Intel Xeon Westmere-EX (E7-48xx or E7-88xx). The likwid output is likely wrong (maybe confused by the fact that hardware threads are disabled). Br

Re: [OMPI users] Best way to map MPI processes to sockets?

2012-11-07 Thread Blosch, Edwin L
>>> In your desired ordering you have rank 0 on (socket,core) (0,0) and >>> rank 1 on (0,2). Is there an architectural reason for that? Meaning >>> are cores 0 and 1 hardware threads in the same core, or is there a >>> cache level (say L2 or L3) connecting cores 0 and 1 separate from >>> cores

Re: [OMPI users] How is hwloc used by OpenMPI

2012-11-07 Thread Jeff Squyres
On Nov 7, 2012, at 1:33 PM, Blosch, Edwin L wrote: > I see hwloc is a subproject hosted under OpenMPI but, in reading the > documentation, I was unable to figure out if hwloc is a module within > OpenMPI, or if some of the code base is borrowed into OpenMPI, or something > else. Is hwloc used

Re: [OMPI users] Best way to map MPI processes to sockets?

2012-11-07 Thread Josh Hursey
In your desired ordering you have rank 0 on (socket,core) (0,0) and rank 1 on (0,2). Is there an architectural reason for that? Meaning are cores 0 and 1 hardware threads in the same core, or is there a cache level (say L2 or L3) connecting cores 0 and 1 separate from cores 2 and 3? hwloc's lstopo

[OMPI users] How is hwloc used by OpenMPI

2012-11-07 Thread Blosch, Edwin L
I see hwloc is a subproject hosted under OpenMPI but, in reading the documentation, I was unable to figure out if hwloc is a module within OpenMPI, or if some of the code base is borrowed into OpenMPI, or something else. Is hwloc used by OpenMPI internally? Is it a layer above libnuma? Or is
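For background, hwloc is a standalone C library with its own API, independent of libnuma. A minimal sketch of querying the machine topology, assuming the hwloc headers are installed and the program is linked with -lhwloc:

    #include <stdio.h>
    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topology;

        /* Discover the hardware topology of this machine. */
        hwloc_topology_init(&topology);
        hwloc_topology_load(topology);

        /* Count cores and hardware threads (processing units). */
        int ncores = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
        int npus   = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU);
        printf("%d cores, %d hardware threads\n", ncores, npus);

        hwloc_topology_destroy(topology);
        return 0;
    }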

[OMPI users] Best way to map MPI processes to sockets?

2012-11-07 Thread Blosch, Edwin L
I am trying to map MPI processes to sockets in a somewhat compacted pattern and I am wondering about the best way to do it. Say there are 2 sockets (0 and 1), each processor has 4 cores (0,1,2,3), and I have 4 MPI processes, each of which will use 2 OpenMP threads. I've re-ordered my parallel wor
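One possible approach, offered here only as a hypothetical sketch (the thread's actual recommendation may differ): Open MPI can read a rankfile that pins each rank to a socket:core list. Assuming a single host named node01 and the rankfile syntax from the mpirun man page, the layout described above might be expressed as:

    rank 0=node01 slot=0:0-1
    rank 1=node01 slot=0:2-3
    rank 2=node01 slot=1:0-1
    rank 3=node01 slot=1:2-3

launched with something like mpirun -np 4 -rf myrankfile ./app, giving each rank two cores for its two OpenMP threads.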

[OMPI users] Question on shmem MCA parameter

2012-11-07 Thread Blosch, Edwin L
I am using this parameter "shmem_mmap_relocate_backing_file" and noticed that the relocation variable is identified as "shmem_mmap_opal_shmem_mmap_backing_file_base_dir" in its documentation, but then the next parameter that appears from ompi_info is spelled differently, namely "shmem_mmap_back

Re: [OMPI users] Communication round-trip time

2012-11-07 Thread George Bosilca
Try one of these: http://www.scl.ameslab.gov/netpipe/ http://mvapich.cse.ohio-state.edu/benchmarks/osu-micro-benchmarks-3.7.tar.gz george. On Nov 7, 2012, at 00:30 , huydanlin wrote: > Hi, >Have anyone know about MPI Program use to measure communication round-trip > time on Cluster (

Re: [OMPI users] Problems with btl openib and MPI_THREAD_MULTIPLE

2012-11-07 Thread Ralph Castain
Yes, we definitely should do so - will put it on the Trac system so it gets done. Thanks - and sorry it wasn't already there. On Wed, Nov 7, 2012 at 4:49 AM, Iliev, Hristo wrote: > Hello, Markus, > > The openib BTL component is not thread-safe. It disables itself when the > thread support leve

Re: [OMPI users] Problems with btl openib and MPI_THREAD_MULTIPLE

2012-11-07 Thread Iliev, Hristo
Hello, Markus, The openib BTL component is not thread-safe. It disables itself when the thread support level is MPI_THREAD_MULTIPLE. See this rant from one of my colleagues: http://www.open-mpi.org/community/lists/devel/2012/10/11584.php A message is shown but only if the library was compiled wi

[OMPI users] Problems with btl openib and MPI_THREAD_MULTIPLE

2012-11-07 Thread Markus Wittmann
Hello, I've compiled Open MPI 1.6.3 with --enable-mpi-thread-multiple -with-tm -with-openib --enable-opal-multi-threads. When I use, for example, the pingpong benchmark from the Intel MPI Benchmarks, which calls MPI_Init, the openib BTL is used and everything works fine. When instead the benchmark c
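For reference, a minimal sketch of requesting MPI_THREAD_MULTIPLE at startup and checking the thread level the library actually grants -- the provided level is what determines whether openib stays enabled:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int provided;

        /* Request full multi-threading support; the library may grant less. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

        if (provided < MPI_THREAD_MULTIPLE)
            printf("MPI_THREAD_MULTIPLE not available, got level %d\n", provided);

        MPI_Finalize();
        return 0;
    }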

Re: [OMPI users] OpenMPI 1.7rc5 fails to build with CUDA support when CUDA is in a non-standard location

2012-11-07 Thread Matthias Jurenz
Hello Adam, > I was able to build successfully by manually substituting the correct location into the Makefile in question. Another, more convenient workaround would be to add the following option to the Open MPI configure command: --with-contrib-vt-flags="--with-cuda-dir=$CUDA_HOME" The

[OMPI users] Communication round-trip time

2012-11-07 Thread huydanlin
Hi, does anyone know of an MPI program to measure the communication round-trip time on a cluster (like the ping command on a network)? That is, the program would run one process on each node of the cluster, then use MPI to measure the communication round-trip time among the nodes. Thanks
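Besides the benchmarks George suggests above, a minimal sketch of measuring the round-trip time between two ranks yourself with MPI_Wtime (message size and iteration count are arbitrary):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        char buf[1] = {0};
        const int iters = 1000;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                /* Ping: send to rank 1, wait for the echo. */
                MPI_Send(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                /* Pong: echo back to rank 0. */
                MPI_Recv(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("average round-trip time: %g seconds\n", (t1 - t0) / iters);

        MPI_Finalize();
        return 0;
    }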