On Sep 28, 2005, at 3:46 PM, Borenstein, Bernard S wrote:
I posted an issue with the NASA Overflow 1.8 code and have traced it further to a program failure in the malloc areas of the code (data in these areas gets corrupted). Overflow is mostly Fortran, but since it is an old program, it uses some C routines to do dynamic memory allocation. I'm still tracking down the problem, but could you enlighten me as to how Open MPI does the malloc_hooks and intercepts memory allocation calls when running on a Linux Myrinet cluster? Is there any easy way to debug what is happening? I'm using brute force to track it down.
Right now, the malloc_hooks aren't doing much of anything by default. We're seeing some issues with Myrinet that look like they are in the GM transport layer itself. You might want to hold off for a couple of days until we get that straightened out, then try a new build of Open MPI. But right now, I'd stay away from Myrinet/GM on Open MPI.
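
For what it's worth, the interception mechanism in question is the standard glibc malloc hook interface. Here is a minimal illustrative sketch of how a library can chain onto __malloc_hook; it is not the actual Open MPI code, and the my_malloc_hook/install_hooks names are made up for the example:

    #include <stdio.h>
    #include <malloc.h>

    /* Saved copy of whatever hook was installed before ours. */
    static void *(*old_malloc_hook)(size_t, const void *);

    static void *my_malloc_hook(size_t size, const void *caller)
    {
        void *result;

        /* Temporarily restore the saved hook so the real malloc runs. */
        __malloc_hook = old_malloc_hook;
        result = malloc(size);

        /* Bookkeeping would go here (e.g. registering the region
           with the interconnect, or just logging the call). */
        fprintf(stderr, "malloc(%lu) -> %p (caller %p)\n",
                (unsigned long) size, result, (void *) caller);

        /* Save whatever hook is now installed and put ours back. */
        old_malloc_hook = __malloc_hook;
        __malloc_hook = my_malloc_hook;
        return result;
    }

    static void install_hooks(void)
    {
        old_malloc_hook = __malloc_hook;
        __malloc_hook = my_malloc_hook;
    }

    /* glibc calls this before the first allocation. */
    void (*__malloc_initialize_hook)(void) = install_hooks;

If hooks like these are chained incorrectly (for example, if the saved hook isn't restored around the inner malloc call), allocations can recurse or get managed twice, which is one way data in malloc'd regions ends up corrupted; but as noted above, the hooks aren't doing much by default right now, so the GM layer is the more likely suspect.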
Brian