Folks,

I'm trying to track down an instance of Open MPI writing to a freed block of
memory.
This occurs with the most recent release (1.6.3) as well as 1.6, on a 64-bit
Intel architecture, Fedora 14.
It occurs with a very simple reduction (an allreduce with the minimum
operation) over a single int value.

Has anyone had any recent problems like this? It may be showing up as an
intermittent error
(i.e. there's no problem as long as the freed block hasn't been
re-allocated, which depends upon malloc).
You may not know about it unless you've been debugging malloc with valgrind,
dmalloc, or the like.

I'm wondering if the Open MPI developers use power tools such as valgrind /
dmalloc / etc.
on the releases to try to catch these things via exhaustive testing.
That said, I understand that memory problems in C are such that a mistake
made anywhere can propagate,
so I haven't ruled out problems in our own code.
Also, I'm wondering if anyone has suggestions on how to track this down further.

I'm using Allinea DDT and its built-in dmalloc, which catches the error. It
appears in
the second memcpy in opal_convertor_pack(), but I don't have more details than
that at the moment.
All I know so far is that one of the blocks involved in that memcpy has been
freed.
So far, I haven't seen anything in earlier parts of the code which might
have triggered memory corruption,
although both Open MPI and Intel IPP do things with uninitialized values before
this (according to Valgrind).

Steve H.
