Hello Cristian,
TAU is still under active development and the developers respond fairly
fast to emails. The latest version, 2.24.1, came out just two months
ago. Check out https://www.cs.uoregon.edu/research/tau/home.php for more
information.
If you are running into issues getting the latest
Hi Christian, list
I haven't been following the shared memory details of OMPI lately,
but my recollection from some time ago is that in the 1.8 series the
default (and recommended) shared memory transport btl switched from
"sm" to "vader", which is the latest and greatest.
In this case, I guess the
Thank you for your answer, Harald.
Actually, I was already using TAU before, but it seems that it is not
maintained any more, and there are problems when instrumenting
applications with version 1.8.5 of OpenMPI.
I was using OpenMPI 1.6.5 before to test the execution of HPC
applications on
Christian,
One explanation could be that the benchmark is memory bound, so running
on more sockets means higher memory bandwidth, which means better
performance. Another explanation is that on one node you are running one
OpenMP thread per MPI task, and on 8 nodes you are running 8 OpenMP threads
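The memory-bandwidth explanation can be sketched with a back-of-the-envelope model. All numbers below (total data traffic, per-node bandwidth) are illustrative assumptions, not measurements from this cluster:

```python
# Hypothetical roofline-style estimate: if the benchmark is memory
# bound, runtime is roughly bytes_moved / aggregate_bandwidth.
bytes_moved = 400e9   # assumed total data traffic of the benchmark, bytes
bw_per_node = 50e9    # assumed memory bandwidth per node, bytes/s

def memory_bound_time(nodes):
    # Spreading the same traffic over more nodes (and thus more
    # sockets) multiplies the available bandwidth, so for a purely
    # memory-bound code the runtime drops proportionally.
    return bytes_moved / (nodes * bw_per_node)

t1 = memory_bound_time(1)
t8 = memory_bound_time(8)
print(t1 / t8)  # speedup from bandwidth alone: 8.0
```

Under this model the bandwidth effect alone accounts for a linear speedup; anything beyond that needs another explanation, such as the cache effect discussed below in the thread.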
Cristian,
you might observe super-linear speedup here because on 8 nodes you have
8 times the cache you have on a single node. You can also validate that
by checking cache-miss activity using the tools that I mentioned in my
other email.
Best regards.
On 22/07/15 09:42, Cristian RUIZ wrote:
Sorry, I've just discovered that I was using the wrong command to run on
8 machines. I have to get rid of the "-np 8"
So, I corrected the command and I used:
mpirun --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp
--allow-run-as-root mg.C.8
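Given the note earlier in the thread about "sm" being superseded by "vader" in the 1.8 series, the same run could also be tried with the newer shared-memory transport. This is a sketch of the command, changing only the BTL list from the invocation above; it obviously needs the cluster and machine file to run:

```shell
# Same invocation, but with the "vader" shared-memory BTL
# (the recommended transport in the Open MPI 1.8 series) in
# place of the older "sm":
mpirun --machinefile machine_mpi_bug.txt --mca btl self,vader,tcp \
       --allow-run-as-root mg.C.8
```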
And got these results
8 cores:
Mop/s total
Dear Cristian,
as you probably know, class C is one of the larger classes of the NAS
benchmarks. That likely means the application spends much more time on
the actual computation than on communication. This could explain why
you see so little difference between the two
Hello,
I'm running OpenMPI 1.8.5 on a cluster with the following characteristics:
Each node is equipped with two Intel Xeon E5-2630v3 processors (with 8
cores each), 128 GB of RAM and a 10 Gigabit Ethernet adapter.
When I run the NAS benchmarks using 8 cores on the same machine, I'm
getting