Re: [OMPI users] knem/openmpi performance?

2013-07-15 Thread Elken, Tom
> I was hoping that someone might have some examples of real application
> behaviour rather than micro benchmarks. It can be crazy hard to get that
> information from users.
[Tom] 
I don't have direct performance information on knem, but Intel's (formerly 
QLogic's) PSM layer, as delivered in our software stack, the Intel True Scale 
Fabric Suite (IFS), includes a kcopy kernel module that assists shared-memory 
MPI bandwidth in a way similar to knem.

We ran the SPEC MPI2007 benchmarks quite a while ago, and kcopy showed about a 2% 
advantage on average across the 13 applications that make up the suite. Some codes 
did not benefit, but none got slower. This was run over 16 nodes at 8 cores per 
node, so not very fat nodes.

More interestingly, on one of our software revisions a few years ago, a bug crept 
in that disabled kcopy. A customer filed an issue reporting that one of their apps 
had slowed down by 30%. Fixing that bug restored the previous performance. The 
application was proprietary, so I don't even know what it did in general terms. It 
was run over multiple nodes, so this was not a single-node performance comparison.

More recently, some customers with large-memory nodes and more than 40 cores per 
node found that kcopy was important to the performance of their most important 
application, a finite element code (I don't have a percentage figure).

kcopy works with Open MPI over PSM, so using knem instead of kcopy is not likely 
to speed up that configuration much. (The exception is if you get your PSM from 
OFED or a Linux distro: those builds do not include kcopy, as we weren't able to 
get kcopy accepted upstream.) Recent PSM (from OFED 3.5, say) can be built to use 
knem for kernel-assisted copies. kcopy also works with the other MPIs that 
support PSM.

Hope these anecdotes are relevant to Open MPI users considering knem.

-Tom Elken  

> 
> Unusually for us, we're putting in a second cluster with the same
> architecture, CPUs, memory and OS as the last one. I might be able to use
> this as a bigger stick to get some better feedback. If so, I'll pass it
> on.
> 
> > Darius Buntinas, Brice Goglin, et al. wrote an excellent paper about
> > exactly this set of issues; see http://runtime.bordeaux.inria.fr/knem/.
> ...
> 
> I'll definitely take a look - thanks again.
> 
> All the best,
> 
> Mark
> --
> -
> Mark Dixon   Email: m.c.di...@leeds.ac.uk
> HPC/Grid Systems Support Tel (int): 35429
> Information Systems Services Tel (ext): +44(0)113 343 5429
> University of Leeds, LS2 9JT, UK
> -



Re: [OMPI users] knem/openmpi performance?

2013-07-15 Thread Jeff Squyres (jsquyres)
On Jul 15, 2013, at 3:43 AM, Paul Kapinos  wrote:

> Jeff, I would turn the question the other way around:
> 
> - are there any penalties when using KNEM?

Good question.  I can't think of any, but then again, I haven't tried this at 
really large scale.

If you guys try it, we'd like to hear your experiences.
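
For anyone who wants to try, here is a minimal sketch of the kind of large-message 
intra-node ping-pong that could be timed once with knem enabled and once without 
(via whatever switch or separate build your Open MPI installation provides). The 
message size, iteration count, and file name below are arbitrary illustrative 
choices, not values from this thread:

/*
 * pingpong.c -- illustrative large-message ping-pong between two ranks
 * on the same node.  Build with e.g. "mpicc -O2 pingpong.c -o pingpong"
 * and run with "mpirun -np 2 ./pingpong" on a single node.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (4 * 1024 * 1024)  /* large message, where kernel-assisted copies matter most */
#define ITERS     100

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks on one node\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    char *buf = malloc(MSG_BYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < ITERS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }

    double t1 = MPI_Wtime();

    if (rank == 0) {
        /* two messages of MSG_BYTES each flow per iteration */
        double mbytes = 2.0 * ITERS * MSG_BYTES / (1024.0 * 1024.0);
        printf("approx. ping-pong bandwidth: %.1f MB/s\n", mbytes / (t1 - t0));
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Running this with both ranks on the same node, with and without knem, and sweeping 
the message size would show whether knem helps, hurts, or makes no difference for 
the sizes your applications actually send.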

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] knem/openmpi performance?

2013-07-15 Thread Paul Kapinos

On 07/12/13 12:55, Jeff Squyres (jsquyres) wrote:

> FWIW: a long time ago (read: many Open MPI / knem versions ago), I did a few
> benchmarks with knem vs. no-knem Open MPI installations. IIRC, I used the
> typical suspects like NetPIPE, the NPBs, etc. There was a modest performance
> improvement (I don't remember the numbers offhand); it was a smaller
> improvement than I had hoped for -- particularly in point-to-point message
> passing latency (e.g., via NetPIPE).


Jeff, I would turn the question the other way around:

- are there any penalties when using KNEM?

We have a couple of Really Big Nodes (128 cores) with rather modest memory 
bandwidth (each is built by coupling 4 standalone nodes with 4 sockets apiece). 
Since a kernel-assisted single copy moves each intra-node message through memory 
once instead of twice, roughly halving the memory traffic of shared-memory MPI 
on these nodes sounds like a Very Good Thing.


But we also have 1500+ nodes with just 2 sockets and 24 GB of memory, and we do 
not want to disturb production on those nodes (and running different MPI builds 
on different node types is clumsy).


Best

Paul


--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


