> I was hoping that someone might have some examples of real application
> behaviour rather than micro benchmarks. It can be crazy hard to get that
> information from users.
[Tom]
I don't have direct performance information on knem, but Intel's (formerly
QLogic's) PSM layer, as delivered in our software stack (Intel True Scale
Fabric Suite, known as IFS), includes a kcopy module that assists shared-memory
MPI bandwidth in a way similar to knem.
We ran the SPEC MPI2007 benchmarks quite a while ago, and kcopy showed about a
2% average advantage across the 13 applications that make up the suite. Some
codes did not benefit, but none regressed. This was run on 16 nodes at 8 cores
per node, so not very fat nodes.
More interestingly, a few years ago a bug crept into one of our software
revisions that disabled kcopy. A customer filed an issue reporting that one of
their applications had slowed down by 30%; fixing the bug restored the previous
performance. The application was proprietary, so I don't even know what it did
in general. It was run across multiple nodes, so this was not a single-node
performance comparison.
More recently, some customers with large-memory nodes of more than 40 cores per
node found that kcopy was important to the performance of their most important
application, a finite element code (I don't have a percentage figure).
kcopy works with Open MPI over PSM, so using knem instead of kcopy is not
likely to speed up that configuration much. (If you get your PSM from OFED or
a Linux distro, though, it won't include kcopy; we weren't able to get kcopy
accepted upstream.) Recent PSM (from OFED 3.5, say) can be built to use knem
for kernel-assisted copies. kcopy also works with the other MPIs that support
PSM.
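For Open MPI users wanting to check whether knem is actually available and in
use, a rough sketch follows. The MCA parameter names are my best recollection
and vary by Open MPI version, and `./my_mpi_app` is a placeholder for your own
binary — verify the parameters your build supports with `ompi_info`:

```shell
# Check that the knem kernel module is loaded and its device exists
lsmod | grep knem
ls -l /dev/knem

# Older Open MPI (sm BTL era): enable knem for shared-memory copies
mpirun --mca btl_sm_use_knem 1 -np 8 ./my_mpi_app

# Later Open MPI (vader BTL): select knem as the single-copy mechanism
mpirun --mca btl_vader_single_copy_mechanism knem -np 8 ./my_mpi_app

# List the knem-related parameters your build actually recognizes
ompi_info --param btl all | grep -i knem
```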
Hope these anecdotes are relevant to Open MPI users considering knem.
-Tom Elken
>
> Unusually for us, we're putting in a second cluster with the same
> architecture, CPUs, memory and OS as the last one. I might be able to use
> this as a bigger stick to get some better feedback. If so, I'll pass it
> on.
>
> > Darius Buntinas, Brice Goglin, et al. wrote an excellent paper about
> > exactly this set of issues; see http://runtime.bordeaux.inria.fr/knem/.
> ...
>
> I'll definitely take a look - thanks again.
>
> All the best,
>
> Mark
> --
> -
> Mark Dixon Email: m.c.di...@leeds.ac.uk
> HPC/Grid Systems Support Tel (int): 35429
> Information Systems Services Tel (ext): +44(0)113 343 5429
> University of Leeds, LS2 9JT, UK
> -