Gentlemen,
I have been looking at a data corruption problem with the MX btl and mtl
on the 1.3 branch when the MX registration cache is enabled. The related
ticket is #1525, opened by Tim.
In 1.3, mallopt() is used to prevent memory from ever being trimmed,
replacing the malloc override previously done with ptmalloc2. MX
provides its own malloc hooks, but they cannot work when the library is
dlopen()ed, so MX has to rely on OMPI to keep the registration cache
safe. Apparently, mallopt() is only called during initialization of the
mpool component, but the MX btl and mtl do not use the mpool. There is a
mallopt memory module in opal, but it assumes that the mpool is used.
What is the best way to fix this issue?
* move the mallopt calls out of the mpool init.
* use a fake mpool in the MX btl and mtl.
* duplicate the mallopt calls directly in the MX btl and mtl.
I got lost looking at the mpool code, so I may be completely wrong here.
Patrick