Re: [OMPI users] shared memory performance

2015-07-24 Thread Gilles Gouaillardet
Cristian, one more thing... two containers on the same host cannot communicate with the sm btl. You might want to mpirun with --mca btl tcp,self on one physical machine without containers, in order to assess the performance degradation due to the tcp btl alone, without any containerization effect.
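Gilles' suggestion can be sketched as two runs on the same physical machine, one with the shared-memory btl and one forcing TCP loopback. The binary name is taken from later in this thread; paths and process counts are placeholders:

```shell
# Baseline on one physical machine using the shared-memory btl (OpenMPI 1.8.x):
mpirun -np 8 --mca btl self,sm ./mg.C.8

# Same machine, but forcing TCP over loopback instead of shared memory.
# The difference between the two runs isolates the tcp-btl cost from
# any containerization effect:
mpirun -np 8 --mca btl self,tcp ./mg.C.8
```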

Re: [OMPI users] shared memory performance

2015-07-24 Thread Harald Servat
Dear Cristian, according to your configuration:
a) 8 Linux containers on the same machine, each configured with 2 cores
b) 8 physical machines
c) 1 physical machine
a) and c) have exactly the same physical computational resources, despite the fact that a) is being virtualized and the

Re: [OMPI users] shared memory performance

2015-07-22 Thread David Shrader
Hello Cristian, TAU is still under active development and the developers respond fairly fast to emails. The latest version, 2.24.1, came out just two months ago. Check out https://www.cs.uoregon.edu/research/tau/home.php for more information. If you are running into issues getting the

Re: [OMPI users] shared memory performance

2015-07-22 Thread Gus Correa
Hi Christian, list I haven't been following the shared memory details of OMPI lately, but my recollection from some time ago is that in the 1.8 series the default (and recommended) shared memory transport btl switched from "sm" to "vader", which is the latest and greatest. In this case, I guess the
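Gus's point about the sm-to-vader switch can be checked directly. A minimal sketch, assuming an OpenMPI 1.8.x install and the benchmark binary named elsewhere in the thread:

```shell
# List which btl components this OpenMPI build provides
# (vader should appear alongside sm, tcp, and self in 1.8.x):
ompi_info | grep -i "btl"

# Explicitly select vader, the default shared-memory transport in 1.8:
mpirun -np 8 --mca btl self,vader ./mg.C.8

# Or force the older sm btl for an apples-to-apples comparison:
mpirun -np 8 --mca btl self,sm ./mg.C.8
```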

Re: [OMPI users] shared memory performance

2015-07-22 Thread Crisitan RUIZ
Thank you for your answer, Harald. Actually, I was already using TAU before, but it seems that it is not maintained any more and there are problems when instrumenting applications with version 1.8.5 of OpenMPI. I was using OpenMPI 1.6.5 before to test the execution of HPC application on

Re: [OMPI users] shared memory performance

2015-07-22 Thread Gilles Gouaillardet
Christian, one explanation could be that the benchmark is memory bound, so running on more sockets means higher memory bandwidth, which means better performance. Another explanation is that on one node you are running one OpenMP thread per MPI task, while on 8 nodes you are running 8 OpenMP
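Gilles' second hypothesis (different OpenMP thread counts per MPI task) can be ruled out by pinning the thread count explicitly. A sketch, assuming a hybrid MPI+OpenMP binary and a hypothetical hostfile name:

```shell
# Single-node run: force exactly one OpenMP thread per MPI task,
# bound to a core, so the comparison with multi-node runs is fair:
OMP_NUM_THREADS=1 mpirun -np 8 --bind-to core ./mg.C.8

# 8-node run with the same explicit thread count per task
# (hosts.txt is a placeholder machinefile):
OMP_NUM_THREADS=1 mpirun -np 8 --machinefile hosts.txt ./mg.C.8
```

If the single-node run was silently using more threads per task than intended, fixing OMP_NUM_THREADS in both cases makes the comparison meaningful.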

Re: [OMPI users] shared memory performance

2015-07-22 Thread Harald Servat
Cristian, you might observe super-speedup here because on 8 nodes you have 8 times the cache you have on only 1 node. You can also validate that by checking cache miss activity using the tools that I mentioned in my other email. Best regards. On 22/07/15 09:42, Crisitan RUIZ wrote:

Re: [OMPI users] shared memory performance

2015-07-22 Thread Crisitan RUIZ
Sorry, I've just discovered that I was using the wrong command to run on 8 machines: I have to get rid of the "-np 8". So, I corrected the command and used: mpirun --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp --allow-run-as-root mg.C.8 And got these results 8 cores: Mop/s

Re: [OMPI users] shared memory performance

2015-07-22 Thread Harald Servat
Dear Cristian, as you probably know C class is one of the large classes for the NAS benchmarks. That is likely to mean that the application is taking much more time to do the actual computation rather than communication. This could explain why you see this little difference between the two

[OMPI users] shared memory performance

2015-07-22 Thread Crisitan RUIZ
Hello, I'm running OpenMPI 1.8.5 on a cluster with the following characteristics: Each node is equipped with two Intel Xeon E5-2630v3 processors (with 8 cores each), 128 GB of RAM and a 10 Gigabit Ethernet adapter. When I run the NAS benchmarks using 8 cores in the same machine, I'm

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-30 Thread Tim Prince
On 3/30/2011 10:08 AM, Eugene Loh wrote: Michele Marena wrote: I've launched my app with mpiP both when two processes are on different node and when two processes are on the same node. The process 0 is the manager (gathers the results only), processes 1 and 2 are workers (compute). This is

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-30 Thread Eugene Loh
Michele Marena wrote: I've launched my app with mpiP both when two processes are on different node and when two processes are on the same node. The process 0 is the manager (gathers the results only), processes 1 and 2 are workers (compute). This is the case processes 1 and 2

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-30 Thread Michele Marena
Hi Jeff, I thank you for your help, I've launched my app with mpiP both when two processes are on different node and when two processes are on the same node. The process 0 is the manager (gathers the results only), processes 1 and 2 are workers (compute). This is the case processes 1 and 2 are

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-30 Thread Jeff Squyres
How many messages are you sending, and how large are they? I.e., if your message passing is tiny, then the network transport may not be the bottleneck here. On Mar 28, 2011, at 9:41 AM, Michele Marena wrote: > I run ompi_info --param btl sm and this is the output > > MCA

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
I run ompi_info --param btl sm and this is the output:
MCA btl: parameter "btl_base_debug" (current value: "0")
         If btl_base_debug is 1 standard debug is output, if > 1 verbose debug is output
MCA btl: parameter "btl" (current value: )

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Ralph Castain
The fact that this exactly matches the time you measured with shared memory is suspicious. My guess is that you aren't actually using shared memory at all. Does your "ompi_info" output show shared memory as being available? Jeff or others may be able to give you some params that would let you
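Ralph's check — whether shared memory is actually available and in use — boils down to a couple of ompi_info queries. A sketch, assuming the OpenMPI install from this thread:

```shell
# Confirm the sm btl component was built into this installation;
# it should be listed among the MCA btl components:
ompi_info | grep "MCA btl"

# Inspect the sm btl parameters (this is the query whose output
# Michele pasted earlier in the thread):
ompi_info --param btl sm
```

If "sm" is missing from the component list, Open MPI silently falls back to tcp even between processes on the same node, which would explain the matching timings.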

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
What happens with 2 processes on the same node with tcp? With --mca btl self,tcp my app runs in 23s. 2011/3/28 Jeff Squyres (jsquyres) > Ah, I didn't catch before that there were more variables than just tcp vs. > shmem. > > What happens with 2 processes on the same node

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Tim Prince
On 3/28/2011 3:29 AM, Michele Marena wrote: Each node has two processors (no dual-core). which seems to imply that the 2 processors share memory space and a single memory bus, and the question is not about what I originally guessed. -- Tim Prince

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Jeff Squyres (jsquyres)
Ah, I didn't catch before that there were more variables than just tcp vs. shmem. What happens with 2 processes on the same node with tcp? Eg, when both procs are on the same node, are you thrashing caches or memory? Sent from my phone. No type good. On Mar 28, 2011, at 6:27 AM, "Michele

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
Each node has two processors (no dual-core). 2011/3/28 Michele Marena > However, I thank you Tim, Ralph and Jeff. > My sequential application runs in 24s (wall clock time). > My parallel application runs in 13s with two processes on different nodes. > With shared

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
However, I thank you Tim, Ralph and Jeff. My sequential application runs in 24s (wall clock time). My parallel application runs in 13s with two processes on different nodes. With shared memory, when two processes are on the same node, my app runs in 23s. I don't understand why. 2011/3/28 Jeff

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Jeff Squyres
If your program runs faster across 3 processes, 2 of which are local to each other, with --mca btl tcp,self compared to --mca btl tcp,sm,self, then something is very, very strange. Tim cites all kinds of things that can cause slowdowns, but it's still very, very odd that simply enabling using

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Ralph Castain
On Mar 27, 2011, at 7:37 AM, Tim Prince wrote: > On 3/27/2011 2:26 AM, Michele Marena wrote: >> Hi, >> My application performs well without shared memory utilization, but with >> shared memory I get worse performance than without it. >> Am I making a mistake? Am I overlooking something?

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Tim Prince
On 3/27/2011 2:26 AM, Michele Marena wrote: Hi, My application performs well without shared memory utilization, but with shared memory I get worse performance than without it. Am I making a mistake? Am I overlooking something? I know OpenMPI uses the /tmp directory to allocate shared memory

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Michele Marena
This is my machinefile:
node-1-16 slots=2
node-1-17 slots=2
node-1-18 slots=2
node-1-19 slots=2
node-1-20 slots=2
node-1-21 slots=2
node-1-22 slots=2
node-1-23 slots=2
Each cluster node has 2 processors. I launch my application with 3 processes, one on node-1-16 (manager) and two on
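With that machinefile, the described manager/worker layout corresponds to a launch like the following sketch (the application name is a placeholder; by default ranks fill slots in order, so rank 0 lands on node-1-16 and ranks 1-2 on the next available slots):

```shell
# Rank 0 (manager) on node-1-16; ranks 1 and 2 (workers) on the
# remaining slot of node-1-16 and the first slot of node-1-17.
# To compare transports, swap self,sm,tcp for self,tcp:
mpirun -np 3 --machinefile machinefile --mca btl self,sm,tcp ./my_app
```

Note that with slots=2 per node, ranks 0 and 1 share node-1-16, so whether the manager and a worker (rather than the two workers) share the node depends on the slot-filling order.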

[OMPI users] Shared Memory Performance Problem.

2011-03-27 Thread Michele Marena
Hi, My application performs well without shared memory utilization, but with shared memory I get worse performance than without it. Am I making a mistake? Am I overlooking something? I know OpenMPI uses the /tmp directory to allocate shared memory and it is in the local filesystem. I thank