Re: [OMPI users] shared memory performance

2015-07-22 Thread David Shrader
Hello Cristian, TAU is still under active development and the developers respond fairly quickly to emails. The latest version, 2.24.1, came out just two months ago. See https://www.cs.uoregon.edu/research/tau/home.php for more information. If you are running into issues getting the lates
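For reference, a typical TAU profiling workflow for an MPI application looks roughly like the following. This is a sketch: the install prefix, configure flags, and the `mg.C.8` binary name (taken from later in this thread) are illustrative, not prescribed by the emails above.

```shell
# Build TAU against the local MPI installation (prefix is illustrative)
./configure -mpi -prefix=/opt/tau
make install
export PATH=/opt/tau/x86_64/bin:$PATH

# Run the unmodified binary under tau_exec to collect per-rank profiles
mpirun -np 8 tau_exec ./mg.C.8

# Summarize the resulting profile.* files in text form
pprof
```

`tau_exec` avoids recompiling the application, which sidesteps the instrumentation problems with OpenMPI 1.8.5 mentioned later in the thread.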

Re: [OMPI users] shared memory performance

2015-07-22 Thread Gus Correa
Hi Christian, list. I haven't been following the shared memory details of OMPI lately, but my recollection from some time ago is that in the 1.8 series the default (and recommended) shared-memory transport btl switched from "sm" to "vader", which is the latest and greatest. In this case, I guess the
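To force one shared-memory BTL or the other on the mpirun command line, and to see which components are actually selected, one can do something like the following (a sketch; the binary name is taken from this thread):

```shell
# Force the newer vader shared-memory BTL (default in the 1.8 series)
mpirun -np 8 --mca btl self,vader,tcp ./mg.C.8

# Or fall back to the older sm BTL for comparison
mpirun -np 8 --mca btl self,sm,tcp ./mg.C.8

# Print verbose BTL selection output to confirm what is being used
mpirun -np 8 --mca btl_base_verbose 100 ./mg.C.8
```

Comparing the two runs directly would show whether the sm/vader choice explains any of the performance difference discussed in this thread.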

Re: [OMPI users] shared memory performance

2015-07-22 Thread Crisitan RUIZ
Thank you for your answer, Harald. Actually, I was already using TAU before, but it seems that it is no longer maintained and there are problems when instrumenting applications with version 1.8.5 of OpenMPI. I was using OpenMPI 1.6.5 before to test the execution of HPC application on

Re: [OMPI users] shared memory performance

2015-07-22 Thread Gilles Gouaillardet
Christian, one explanation could be that the benchmark is memory-bound, so running on more sockets means higher aggregate memory bandwidth, which means better performance. Another explanation is that on one node you are running one OpenMP thread per MPI task, while on 8 nodes you are running 8 OpenMP threads
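If the hybrid (MPI+OpenMP) variant of the benchmark is in use, pinning the OpenMP thread count per MPI task makes the single-node and 8-node runs comparable. A sketch, assuming the `mg.C.8` binary and a hostfile named `hosts.txt` (the filename is a placeholder):

```shell
# One OpenMP thread per MPI task on a single 8-core run
OMP_NUM_THREADS=1 mpirun -np 8 ./mg.C.8

# Spanning 8 nodes: -x exports the variable to the remote ranks
mpirun -np 8 --machinefile hosts.txt -x OMP_NUM_THREADS=1 ./mg.C.8
```

With the thread count fixed on both sides, any remaining speedup points at memory bandwidth or cache effects rather than extra threads.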

Re: [OMPI users] shared memory performance

2015-07-22 Thread Harald Servat
Cristian, you might observe super-linear speedup here because on 8 nodes you have 8 times the aggregate cache that you have on only 1 node. You can also validate that by checking for cache-miss activity using the tools that I mentioned in my other email. Best regards. On 22/07/15 09:42, Crisitan RUIZ wrote:
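One quick way to check cache-miss activity on Linux is `perf stat` (a sketch, not one of the specific tools referenced in the other email; the binary name is taken from this thread):

```shell
# Count cache references and misses for the whole MPI job on this node
perf stat -e cache-references,cache-misses \
    mpirun -np 8 --mca btl self,sm,tcp ./mg.C.8
```

A markedly lower miss ratio on the 8-node run than on the single-node run would support the "8x aggregate cache" explanation.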

Re: [OMPI users] shared memory performance

2015-07-22 Thread Crisitan RUIZ
Sorry, I've just discovered that I was using the wrong command to run on 8 machines. I had to get rid of the "-np 8". So I corrected the command and used: mpirun --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp --allow-run-as-root mg.C.8 and got these results 8 cores: Mop/s total
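The difference between the two invocations can be sketched as follows, assuming the machinefile lists the 8 machines with one slot each (the hostfile contents are an assumption, not quoted from the thread):

```shell
# machine_mpi_bug.txt, e.g. one hostname per line:
#   node1 slots=1
#   ...
#   node8 slots=1

# With an explicit "-np 8", ranks are packed onto the first available
# slots, so several ranks may land on the same machine:
mpirun -np 8 --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp ./mg.C.8

# Without "-np", one rank is started per slot in the machinefile,
# spreading the 8 ranks across the 8 machines as intended:
mpirun --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp ./mg.C.8
```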

Re: [OMPI users] shared memory performance

2015-07-22 Thread Harald Servat
Dear Cristian, as you probably know, class C is one of the larger problem classes of the NAS benchmarks. That likely means the application spends much more time on actual computation than on communication. This could explain why you see this little difference between the two

[OMPI users] shared memory performance

2015-07-22 Thread Crisitan RUIZ
Hello, I'm running OpenMPI 1.8.5 on a cluster with the following characteristics: each node is equipped with two Intel Xeon E5-2630v3 processors (8 cores each), 128 GB of RAM, and a 10 Gigabit Ethernet adapter. When I run the NAS benchmarks using 8 cores on the same machine, I'm gettin