Re: [OMPI users] Sandy Bridge performance question

2013-06-10 Thread Iliev, Hristo
s (jsquyres)
> Sent: Friday, June 07, 2013 2:54 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Sandy Bridge performance question
>
> On Jun 7, 2013, at 5:28 AM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com>
> wrote:
>
> > Regarding VTune, we have a code

Re: [OMPI users] Sandy Bridge performance question

2013-06-07 Thread Jeff Squyres (jsquyres)
On Jun 7, 2013, at 5:28 AM, "Blosch, Edwin L" wrote:
> Regarding VTune, we have a code that doesn't scale well so that's a good tip.
> I have access to VTune, I've used it. But I only remember looking at
> OpenMP, I didn't know it could handle MPI runs. That would be

Re: [OMPI users] Sandy Bridge performance question

2013-06-07 Thread Blosch, Edwin L
of Jeff Squyres (jsquyres) [jsquy...@cisco.com]
Sent: Friday, June 07, 2013 6:00 AM
To: Open MPI Users
Subject: EXTERNAL: Re: [OMPI users] Sandy Bridge performance question

+1 Depending on how much you care, you might also want to look at some performance analysis tools to look and see what

Re: [OMPI users] Sandy Bridge performance question

2013-06-07 Thread Jeff Squyres (jsquyres)
+1 Depending on how much you care, you might also want to look at some performance analysis tools to look and see what is happening under the covers. The Intel VTune suite is the gold standard -- it shows all the counters and statistics from the CPUs themselves (be aware that there's a bit of

Re: [OMPI users] Sandy Bridge performance question

2013-06-06 Thread Ralph Castain
It depends on the application you are using. Some are "balanced" - i.e., they run faster if the number of processes is a power of two. You'll see that n8 is faster than n7, so this is likely the situation.

On Jun 6, 2013, at 4:10 PM, "Blosch, Edwin L" wrote:
> I am

[OMPI users] Sandy Bridge performance question

2013-06-06 Thread Blosch, Edwin L
I am running single-node Sandy Bridge cases with OpenMPI and looking at scaling. I'm using -bind-to-core without any other options (default is -bycore I believe). These numbers indicate number of cores first, then the second digit is the run number (except for n=1, all runs repeated 3 times).
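
For readers who want to turn repeated runs like these into scaling numbers, here is a minimal sketch. It assumes the timings have already been collected into a dictionary keyed by core count; the function name `scaling_table` and the sample values are hypothetical placeholders and are not taken from this thread. It averages the repeats for each core count and reports speedup and parallel efficiency relative to the 1-core run.

```python
from statistics import mean


def scaling_table(timings):
    """Return (cores, mean_time, speedup, efficiency) rows.

    `timings` maps a core count to a list of wall-clock times in seconds,
    one per repeated run. The 1-core mean is used as the baseline.
    """
    baseline = mean(timings[1])
    rows = []
    for cores in sorted(timings):
        avg = mean(timings[cores])
        speedup = baseline / avg
        rows.append((cores, avg, speedup, speedup / cores))
    return rows


if __name__ == "__main__":
    # Placeholder times in seconds, invented for illustration only.
    example = {
        1: [100.0],
        2: [52.0, 51.0, 53.0],
        4: [28.0, 27.5, 28.5],
    }
    for cores, avg, speedup, eff in scaling_table(example):
        print(f"n={cores:2d}  time={avg:7.2f}s  speedup={speedup:5.2f}  efficiency={eff:5.2f}")
```

An efficiency well below 1.0 at higher core counts on a single node is often a sign of a shared resource such as memory bandwidth, though a profiler is needed to confirm that.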