Re: [OMPI users] shared memory performance

2015-07-22 Thread Crisitan RUIZ

Thank you for your answer, Harald.

Actually I was already using TAU, but it seems that it is no longer 
maintained and there are problems when instrumenting applications with 
OpenMPI 1.8.5.


I was using OpenMPI 1.6.5 before, to test the execution of HPC 
applications on Linux containers. I tested the performance of the NAS 
benchmarks in three different configurations:


- 8 Linux containers on the same machine configured with 2 cores
- 8 physical machines
- 1 physical machine

So, as I already described, each machine has 2 processors (8 cores 
each). I instrumented and ran all the NAS benchmarks in these three 
configurations, and the results are attached to this email.
In the table, "native" corresponds to using 8 physical machines and 
"SM" corresponds to 1 physical machine using the sm module; times are 
given in milliseconds.


What surprises me in the results is that, even in the worst case, the 
containers show communication times equal to plain MPI processes, even 
though the containers communicate through virtual network interfaces. 
This behavior is probably due to process binding and locality. I wanted 
to redo the test using OpenMPI 1.8.5, but unfortunately I couldn't 
successfully instrument the applications. I was looking for another MPI 
profiler but couldn't find one: HPCToolkit looks like it is no longer 
maintained, and Vampir no longer maintains the tool that instruments 
the application. I will probably give Paraver a try.
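The binding/locality hypothesis can be checked without any profiler: Open MPI can report where each rank is bound at launch time. A minimal sketch, reusing the benchmark binary from this thread (the exact flag set assumes Open MPI 1.8.x):

```shell
# Ask mpirun to print each rank's CPU binding at launch; the bindings
# report goes to stderr before the application starts.
mpirun -np 8 --report-bindings --mca btl self,sm,tcp \
    --allow-run-as-root mg.C.8
```

If the report shows several ranks bound to the same core, or ranks left unbound, that alone can explain large run-to-run differences in communication time.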




Best regards,

Cristian Ruiz



On 07/22/2015 09:44 AM, Harald Servat wrote:


Cristian,

  you might observe super-speedup here because on 8 nodes you have 8 
times the cache that you have on only 1 node. You can also validate that 
by checking for cache-miss activity using the tools that I mentioned in 
my other email.
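One way to check the cache hypothesis on the single-node run is Linux perf, which counts hardware events for a command and its children. A sketch, assuming perf and the benchmark are available on the node (this is an editor's suggestion, not a tool named in the thread):

```shell
# Count cache references and misses for the whole single-node MPI run;
# perf stat aggregates over mpirun and all ranks it forks on this host.
perf stat -e cache-references,cache-misses \
    mpirun -np 8 --mca btl self,sm,tcp --allow-run-as-root mg.C.8
```

A markedly higher miss ratio here than in a per-node measurement of the 8-machine run would support the "8 times the cache" explanation.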


Best regards.

On 22/07/15 09:42, Crisitan RUIZ wrote:

Sorry, I've just discovered that I was using the wrong command to run on
8 machines. I have to get rid of the "-np 8"

So, I corrected the command and I used:

mpirun --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp
--allow-run-as-root mg.C.8
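One plausible reading of why dropping "-np 8" mattered (an assumption, not confirmed in the thread): with "-np 8" and a machinefile whose hosts advertise many slots, mpirun fills the first host's slots before moving to the next, so all 8 ranks can land on one node; without "-np 8", one rank is launched per listed slot. A machinefile that makes one rank per node explicit would look like this (hostnames are placeholders):

```
node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1
node05 slots=1
node06 slots=1
node07 slots=1
node08 slots=1
```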

And got these results:

8 cores:

Mop/s total = 19368.43


8 machines:

Mop/s total = 96094.35


Why is the multi-node performance almost 5 times better than
multi-core? Is this normal behavior?
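For reference, the ratio implied by the two Mop/s figures above is closer to five than four; a quick check:

```shell
# Ratio of multi-node to single-node throughput from the reported Mop/s.
awk 'BEGIN { printf "%.2f\n", 96094.35 / 19368.43 }'   # prints 4.96
```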

On 07/22/2015 09:19 AM, Crisitan RUIZ wrote:


 Hello,

I'm running OpenMPI 1.8.5 on a cluster with the following
characteristics:

Each node is equipped with two Intel Xeon E5-2630v3 processors (with 8
cores each), 128 GB of RAM and a 10 Gigabit Ethernet adapter.

When I run the NAS benchmarks using 8 cores on the same machine, I get
almost the same performance as using 8 machines running one MPI
process per machine.

I used the following commands:

for running multi-node:

mpirun -np 8 --machinefile machine_file.txt --mca btl self,sm,tcp
--allow-run-as-root mg.C.8

for running with 8 cores:

mpirun -np 8 --mca btl self,sm,tcp --allow-run-as-root mg.C.8


I got the following results:

8 cores:

 Mop/s total = 19368.43

8 machines:

Mop/s total = 19326.60


The results are similar for other benchmarks. Is this behavior normal?
I was expecting better performance using 8 cores, given that they
communicate directly through memory.

Thank you,

Cristian
___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/07/27295.php


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/07/27297.php



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/07/27298.php




Re: [OMPI users] shared memory performance

2015-07-22 Thread Crisitan RUIZ
Sorry, I've just discovered that I was using the wrong command to run on 
8 machines. I have to get rid of the "-np 8"


So, I corrected the command and I used:

mpirun --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp 
--allow-run-as-root mg.C.8


And got these results:

8 cores:

Mop/s total = 19368.43


8 machines:

Mop/s total = 96094.35


Why is the multi-node performance almost 5 times better than 
multi-core? Is this normal behavior?


On 07/22/2015 09:19 AM, Crisitan RUIZ wrote:


 Hello,

I'm running OpenMPI 1.8.5 on a cluster with the following 
characteristics:


Each node is equipped with two Intel Xeon E5-2630v3 processors (with 8 
cores each), 128 GB of RAM and a 10 Gigabit Ethernet adapter.


When I run the NAS benchmarks using 8 cores on the same machine, I get 
almost the same performance as using 8 machines running one MPI 
process per machine.


I used the following commands:

for running multi-node:

mpirun -np 8 --machinefile machine_file.txt --mca btl self,sm,tcp 
--allow-run-as-root mg.C.8


for running with 8 cores:

mpirun -np 8 --mca btl self,sm,tcp --allow-run-as-root mg.C.8


I got the following results:

8 cores:

 Mop/s total = 19368.43

8 machines:

Mop/s total = 19326.60


The results are similar for other benchmarks. Is this behavior normal? 
I was expecting better performance using 8 cores, given that they 
communicate directly through memory.


Thank you,

Cristian
___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/07/27295.php




[OMPI users] shared memory performance

2015-07-22 Thread Crisitan RUIZ


 Hello,

I'm running OpenMPI 1.8.5 on a cluster with the following characteristics:

Each node is equipped with two Intel Xeon E5-2630v3 processors (with 8 
cores each), 128 GB of RAM and a 10 Gigabit Ethernet adapter.


When I run the NAS benchmarks using 8 cores on the same machine, I get 
almost the same performance as using 8 machines running one MPI 
process per machine.


I used the following commands:

for running multi-node:

mpirun -np 8 --machinefile machine_file.txt --mca btl self,sm,tcp 
--allow-run-as-root mg.C.8


for running with 8 cores:

mpirun -np 8 --mca btl self,sm,tcp --allow-run-as-root mg.C.8


I got the following results:

8 cores:

 Mop/s total = 19368.43

8 machines:

Mop/s total = 19326.60


The results are similar for other benchmarks. Is this behavior normal? I 
was expecting better performance using 8 cores, given that they 
communicate directly through memory.


Thank you,

Cristian