It looks like we are touching a QP that was already released. Before closing the
QP, we make sure to complete all outstanding messages on the endpoint. Once all
QPs (and other resources) are closed, we signal the async thread to remove this
HCA from the monitoring list. To me it looks like somehow we clos
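
To make the ordering above concrete, here is a toy model of the sequence (drain, close QPs, then signal the async thread). It is only a sketch: every name in it is hypothetical and none of it is Open MPI or libibverbs code. It assumes the async thread checks a per-HCA monitoring flag before it looks at a QP, and it runs the steps in one thread so the outcome is deterministic.

```python
# Toy model only: all names are hypothetical, not Open MPI or libibverbs API.

class Hca:
    def __init__(self, qp_ids):
        self.live_qps = set(qp_ids)               # QPs not yet destroyed
        self.monitored = True                     # still on the async thread's list
        self.pending = {qp: 2 for qp in qp_ids}   # outstanding messages per QP


def drain(hca):
    """Complete all outstanding messages on every endpoint."""
    for qp in hca.pending:
        hca.pending[qp] = 0


def destroy_qps(hca):
    """Close every QP; only allowed once nothing is outstanding."""
    assert all(n == 0 for n in hca.pending.values())
    hca.live_qps.clear()


def stop_monitoring(hca):
    """Main thread signals the async thread to drop this HCA."""
    hca.monitored = False


def async_event(hca, qp):
    """What the modeled async thread does with an event for `qp`."""
    if not hca.monitored:
        return "ignored"
    if qp not in hca.live_qps:
        return "stale"    # event refers to a QP that was already released
    return "handled"


hca = Hca([0, 1])
results = [async_event(hca, 0)]        # before shutdown starts
drain(hca)
destroy_qps(hca)
results.append(async_event(hca, 0))    # QPs closed, HCA still monitored
stop_monitoring(hca)
results.append(async_event(hca, 0))    # after the signal
print(results)
```

In this model, the only step where an event can refer to a released QP is the one between closing the QPs and signaling the async thread. Whether the real code has that window is exactly what is in question here.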
I'd guess the same thing as George - a race condition in the shutdown of the
async thread...? I haven't looked at that code in a long time, so I don't
remember how it tried to defend against the race condition.
Sent from my PDA. No type good.
On Jan 3, 2011, at 2:31 PM, "Eugene Loh" wrote:
> Geo
George Bosilca wrote:
Eugene,
This error indicates that somehow we're accessing the QP while the QP is in
"down" state. As the asynchronous thread is the one that sees this error, I
wonder whether it is looking up information about a QP that has already been
destroyed by the main thread (as this on
Eugene,
This error indicates that somehow we're accessing the QP while the QP is in
"down" state. As the asynchronous thread is the one that sees this error, I
wonder whether it is looking up information about a QP that has already been
destroyed by the main thread (as this only occurs in MPI_Finalize
I was running a bunch of np=4 test programs over two nodes.
Occasionally, *one* of the codes would see an IBV_EVENT_QP_ACCESS_ERR
during MPI_Finalize(). I traced the code and ran another program that
mimicked the particular MPI calls made by that program. This other
program, too, would occas