Re: [OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
Ralph, The changeset avoids SIGSEGV by calling mpi_abort before bad things happen. The attached patch seems to fix the problem (and makes the changeset kind of useless). Once again, the patch has been only lightly tested and might break other parts of coll/ml. Cheers, Gilles Ralph Castain

Re: [OMPI devel] race condition in coll/ml

2014-09-01 Thread Ralph Castain
Usually we have trouble with coll/ml because the process locality isn't being reported sufficiently for its needs. Given the recent change in data exchange, I suspect that is the root cause here - I have a note to Nathan asking for clarification of the coll/ml locality requirement. Did this

[OMPI devel] race condition in coll/ml

2014-09-01 Thread Gilles Gouaillardet
Folks, MTT recently failed a bunch of times with the trunk. A good suspect is the collective/ibarrier test from the IBM test suite. Most of the time, CHECK_AND_RECYCLE will fail /* IS_COLL_SYNCMEM(coll_op) is true */. With this test case, we just get a glorious SIGSEGV since OBJ_RELEASE is called