I've been looking at a new version of an application (cp2k, for what
it's worth) which is calling mpi_alloc_mem/mpi_free_mem, and I don't
think the previous version I looked at did so.  I found that on an
IB-based system it's spending about half its time in those allocation
routines (according to its own profiling) -- a tad surprising.
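
For the record, the calls in question are just the standard MPI memory
routines, along these lines (a minimal C sketch of my own -- cp2k
itself calls them from Fortran):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        double *buf;
        MPI_Init(&argc, &argv);

        /* Memory that MPI may register with the interconnect; this
           is where the time appears to go when openib is loaded. */
        MPI_Alloc_mem(1000000 * sizeof(double), MPI_INFO_NULL, &buf);

        /* ... communicate using buf ... */

        MPI_Free_mem(buf);
        MPI_Finalize();
        return 0;
    }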

It turns out that's due to some pathological interaction with openib --
just having the openib btl loaded is enough.  The slowdown shows up on
a single-node run iff I don't suppress the openib btl, and goes away on
multi-node PSM runs iff I do suppress it (this is a mixed
Mellanox/Infinipath system).
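
Concretely, the runs differ only in whether openib is excluded; the
command lines are illustrative rather than verbatim, and the explicit
pml/mtl selection is just my way of making sure PSM gets picked:

    # single node: only OK with openib excluded
    mpirun --mca btl ^openib -np 16 cp2k.popt ...

    # multi-node over PSM: likewise needs openib excluded
    mpirun --mca pml cm --mca mtl psm --mca btl ^openib -np 32 cp2k.popt ...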

Can anyone say why, and whether there's a workaround?  (I can't easily
diagnose what it's up to, as ptrace is turned off on the system
concerned, and I can't find anything relevant in the list archives.)

I had the idea of trying libfabric instead for multi-node jobs, and
that doesn't show the pathological behaviour iff openib is suppressed.
However, it requires ompi 1.10, not the 1.8 I was trying to use.
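
For reference, I select the libfabric path roughly like this (assuming
the ofi mtl that 1.10 ships; again with openib excluded):

    mpirun --mca pml cm --mca mtl ofi --mca btl ^openib -np 32 cp2k.popt ...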