Re: [OMPI devel] Hanging tests

2016-09-07 Thread Nathan Hjelm
Posted a possible fix to the intercomm hang. See https://github.com/open-mpi/ompi/pull/2061 -Nathan

Re: [OMPI devel] Hanging tests

2016-09-07 Thread Nathan Hjelm
Looking at the code now. This code was more or less directly translated from the blocking version. I wouldn’t be surprised if there is an error that I didn’t catch with MTT on my laptop. That said, there is an old comment about not using bcast to avoid a possible deadlock. Since the collective
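
For context, the translation Nathan describes has roughly the shape sketched below: a blocking allgatherv over the inter-communicator replaced by its nonblocking counterpart plus a completion step. This is only an illustrative sketch with made-up names; it is not the actual Open MPI coll code.

    #include <mpi.h>

    /* Illustrative sketch only: the blocking form and its nonblocking
     * translation. A mistake in how the request is scheduled or progressed
     * would show up as a hang in the merge path rather than as an error. */
    static void gather_remote_ranks(int myrank, int *remote_ranks,
                                    const int *counts, const int *displs,
                                    MPI_Comm intercomm)
    {
        /* blocking original:
         * MPI_Allgatherv(&myrank, 1, MPI_INT, remote_ranks,
         *                counts, displs, MPI_INT, intercomm);
         */

        /* nonblocking translation: same arguments plus a request that
         * has to be progressed to completion */
        MPI_Request req;
        MPI_Iallgatherv(&myrank, 1, MPI_INT, remote_ranks,
                        counts, displs, MPI_INT, intercomm, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }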

Re: [OMPI devel] Hanging tests

2016-09-07 Thread Gilles Gouaillardet
Thanks guys, so I was finally able to reproduce the bug on my (oversubscribed) VM with tcp. MPI_Intercomm_merge (indirectly) incorrectly invokes iallgatherv:
  1, main (MPI_Issend_rtoa_c.c:196)
  1, MPITEST_get_communicator (libmpitest.c:3544)
  1, PMPI_Intercomm_merge (pintercomm_merge.c:131)
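
For anyone who wants to try this outside the Intel suite, a minimal sketch that exercises the MPI_Intercomm_merge path is below. It is illustrative only (not MPI_Issend_rtoa_c itself) and may or may not trigger the hang; run it on at least 2 tasks.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Comm local, inter, merged;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* split MPI_COMM_WORLD into a "low" half and a "high" half */
        int color = (rank < size / 2) ? 0 : 1;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local);

        /* the remote leader is rank 0 of the other half in MPI_COMM_WORLD */
        int remote_leader = (color == 0) ? size / 2 : 0;
        MPI_Intercomm_create(local, 0, MPI_COMM_WORLD, remote_leader,
                             42 /* tag */, &inter);

        /* this is the call that reaches the collective in the trace above */
        MPI_Intercomm_merge(inter, color /* high */, &merged);

        if (rank == 0) printf("merge completed\n");

        MPI_Comm_free(&merged);
        MPI_Comm_free(&inter);
        MPI_Comm_free(&local);
        MPI_Finalize();
        return 0;
    }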

Re: [OMPI devel] Hanging tests

2016-09-06 Thread George Bosilca
I can make MPI_Issend_rtoa deadlock with vader and sm. George.
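
To force the shared-memory path George mentions, the BTL selection can be pinned on the mpirun command line; the task count and test binary below are just examples:

    # force the vader (shared-memory) BTL, plus self for loopback
    mpirun --mca btl vader,self -np 16 ./MPI_Issend_rtoa_c

    # or the older sm BTL
    mpirun --mca btl sm,self -np 16 ./MPI_Issend_rtoa_c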

Re: [OMPI devel] Hanging tests

2016-09-06 Thread r...@open-mpi.org
FWIW: those tests hang for me with TCP (I don’t have openib on my cluster). I’ll check it with your change as well.

Re: [OMPI devel] Hanging tests

2016-09-06 Thread Gilles Gouaillardet
Ralph, this looks like another hang :-( I ran MPI_Issend_rtoa_c on 32 tasks (2 nodes, 2 sockets per node, 8 cores per socket) with InfiniBand, and I always observe the same hang at the same place. Surprisingly, I do not get any hang if I blacklist the openib btl. The patch below can be
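
Blacklisting the openib btl for a run, as described above, is done with a caret in the MCA btl list; the task count and binary name below are just examples:

    # run without the openib BTL (everything else stays eligible)
    mpirun --mca btl ^openib -np 32 ./MPI_Issend_rtoa_c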

Re: [OMPI devel] Hanging tests

2016-09-05 Thread Gilles Gouaillardet
OK, will double check tomorrow that this was the very same hang I fixed earlier. Cheers, Gilles

Re: [OMPI devel] Hanging tests

2016-09-05 Thread r...@open-mpi.org
I was just looking at the overnight MTT report, and these were present going back a long way in both branches. They are in the Intel test suite. If you have already addressed them, then thanks!

Re: [OMPI devel] Hanging tests

2016-09-05 Thread Gilles Gouaillardet
Ralph, I fixed a hang earlier today in master, and the PR for v2.x is at https://github.com/open-mpi/ompi-release/pull/1368. Can you please make sure you are running the latest master? Which test suite do these tests come from? I will have a look tomorrow if the hang is still there. Cheers, Gilles

[OMPI devel] Hanging tests

2016-09-05 Thread r...@open-mpi.org
Hey folks, all of the tests that involve either ISsend_ator, SSend_ator, ISsend_rtoa, or SSend_rtoa are hanging on master and v2.x. Does anyone know what these tests do, and why we never seem to pass them? Do we care? Ralph