Re: [OMPI devel] IBV_EVENT_QP_ACCESS_ERR

2011-01-03 Thread Shamis, Pavel
It looks like we are touching a QP that was already released. Before closing the QP we make sure to complete all outstanding messages on the endpoint. Once all QPs (and other resources) are closed, we signal the async thread to remove this HCA from its monitoring list. To me it looks like somehow we clos
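
The shutdown ordering described in this message (complete outstanding messages, close the QP, then take the HCA off the async thread's monitoring list) can be illustrated with a small pthreads model. This is only a sketch under stated assumptions, not Open MPI or libibverbs code: `fake_device`, `async_monitor`, and `run_shutdown_sketch` are hypothetical names, and performing all three steps under a single mutex is an assumed defense against the suspected race, not something the thread confirms the openib BTL does.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for an HCA with one queue pair.
 * None of these names come from Open MPI or libibverbs. */
struct fake_device {
    pthread_mutex_t lock;
    bool monitored;       /* async thread still watches this device */
    bool qp_alive;        /* stands in for a QP that has not been destroyed */
    int  outstanding;     /* messages not yet completed on the endpoint */
    int  stale_accesses;  /* times the monitor saw a dead QP while monitoring */
};

/* Models the async event thread: it keeps inspecting the device
 * until the device is removed from its monitoring list. */
static void *async_monitor(void *arg)
{
    struct fake_device *dev = arg;
    for (;;) {
        pthread_mutex_lock(&dev->lock);
        if (!dev->monitored) {
            pthread_mutex_unlock(&dev->lock);
            break;
        }
        if (!dev->qp_alive) {
            dev->stale_accesses++;  /* would correspond to touching a released QP */
        }
        pthread_mutex_unlock(&dev->lock);
        sched_yield();
    }
    return NULL;
}

/* Returns the number of stale accesses observed, or -1 on thread creation failure. */
int run_shutdown_sketch(void)
{
    struct fake_device dev = {
        .monitored = true, .qp_alive = true,
        .outstanding = 3, .stale_accesses = 0
    };
    pthread_t tid;
    int stale;

    pthread_mutex_init(&dev.lock, NULL);
    if (pthread_create(&tid, NULL, async_monitor, &dev) != 0) {
        pthread_mutex_destroy(&dev.lock);
        return -1;
    }

    /* Assumed defense: do all three steps inside one critical section so
     * the monitor can never observe "still monitored, QP already gone". */
    pthread_mutex_lock(&dev.lock);
    while (dev.outstanding > 0) {
        dev.outstanding--;      /* 1. complete outstanding messages */
    }
    dev.qp_alive = false;       /* 2. close the QP */
    dev.monitored = false;      /* 3. remove the device from monitoring */
    pthread_mutex_unlock(&dev.lock);

    pthread_join(tid, NULL);
    stale = dev.stale_accesses;
    pthread_mutex_destroy(&dev.lock);
    return stale;
}
```

In this model, releasing the lock between steps 2 and 3 would reopen a window in which the monitor thread can see a destroyed QP, which is the kind of race being guessed at in this thread.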

Re: [OMPI devel] IBV_EVENT_QP_ACCESS_ERR

2011-01-03 Thread Jeff Squyres (jsquyres)
I'd guess the same thing as George - a race condition in the shutdown of the async thread...? I haven't looked at that code in a long, long time, so I don't remember how it tried to defend against the race condition. Sent from my PDA. No type good. On Jan 3, 2011, at 2:31 PM, "Eugene Loh" wrote: > Geo

Re: [OMPI devel] IBV_EVENT_QP_ACCESS_ERR

2011-01-03 Thread Eugene Loh
George Bosilca wrote: Eugene, This error indicates that somehow we're accessing the QP while the QP is in "down" state. As the asynchronous thread is the one that sees this error, I wonder if it doesn't look for some information about a QP that has been destroyed by the main thread (as this on

Re: [OMPI devel] IBV_EVENT_QP_ACCESS_ERR

2010-12-31 Thread George Bosilca
Eugene, This error indicates that somehow we're accessing the QP while the QP is in "down" state. As the asynchronous thread is the one that sees this error, I wonder if it doesn't look for some information about a QP that has been destroyed by the main thread (as this only occurs in MPI_Finalize

[OMPI devel] IBV_EVENT_QP_ACCESS_ERR

2010-12-30 Thread Eugene Loh
I was running a bunch of np=4 test programs over two nodes. Occasionally, *one* of the codes would see an IBV_EVENT_QP_ACCESS_ERR during MPI_Finalize(). I traced the code and ran another program that mimicked the particular MPI calls made by that program. This other program, too, would occas