Gentlemen,

I have been looking at a data corruption issue with either the MX btl or mtl on the 1.3 branch when the MX registration cache is enabled. The related ticket is #1525, opened by Tim.

In 1.3, mallopt() is used to prevent memory from ever being trimmed, replacing the malloc override that ptmalloc2 used to provide. MX provides its own malloc hooks, but they cannot work when the library is dlopen()ed, so MX has to rely on OMPI to make the registration cache safe. Apparently, mallopt() is only called during the initialization of the mpool component. However, neither the MX btl nor the MX mtl uses the mpool. There is a mallopt memory module in opal, but it assumes that the mpool is used.
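
For reference, here is a minimal sketch of the kind of mallopt() calls I am talking about, assuming glibc. The helper name disable_memory_release() is made up for illustration and is not an existing OMPI function; the exact options set by the 1.3 code may differ.

```c
#include <malloc.h>

/* Hypothetical helper (not part of OMPI): ask glibc malloc never to
 * hand memory back to the kernel, so that virtual-to-physical mappings
 * held by a registration cache cannot silently go stale. */
static int disable_memory_release(void)
{
    int ok = 1;

    /* Never shrink the top of the heap on free(). */
    ok &= mallopt(M_TRIM_THRESHOLD, -1);

    /* Never serve large allocations with mmap(), so free() never
     * calls munmap() behind our back. */
    ok &= mallopt(M_MMAP_MAX, 0);

    /* mallopt() returns 1 on success and 0 on error. */
    return ok ? 0 : -1;
}
```

Whatever the final location, such a call would have to run before any buffer is registered with MX.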

What is the best way to fix this issue?
* move the mallopt calls out of the mpool init.
* use a fake mpool in the MX btl and mtl.
* duplicate the mallopt calls directly in the MX btl and mtl.

I got lost looking at the mpool code, so I may be completely wrong here.

Patrick
