Posted a possible fix to the intercomm hang. See
https://github.com/open-mpi/ompi/pull/2061
-Nathan
> On Sep 7, 2016, at 6:53 AM, Nathan Hjelm wrote:
Looking at the code now. This code was more or less directly translated from
the blocking version. I wouldn’t be surprised if there is an error that I
didn’t catch with MTT on my laptop.
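(For context, the blocking-to-nonblocking translation at the MPI level looks roughly like the generic sketch below. This only illustrates MPI_Allgatherv versus MPI_Iallgatherv; it is not the actual ompi coll code, and the buffers and counts are made up for the example.)

#include <mpi.h>
#include <stdlib.h>

/* Generic illustration only, not the ompi internals: an allgatherv in
 * blocking form, then in the nonblocking form it gets translated into. */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int sendval = rank;
    int *recvbuf = malloc(size * sizeof(int));
    int *counts  = malloc(size * sizeof(int));
    int *displs  = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++) { counts[i] = 1; displs[i] = i; }

    /* blocking version */
    MPI_Allgatherv(&sendval, 1, MPI_INT,
                   recvbuf, counts, displs, MPI_INT, MPI_COMM_WORLD);

    /* nonblocking translation: nothing is guaranteed to have happened
     * until the wait completes, which is where ordering assumptions
     * carried over from the blocking code tend to break */
    MPI_Request req;
    MPI_Iallgatherv(&sendval, 1, MPI_INT,
                    recvbuf, counts, displs, MPI_INT, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    free(recvbuf); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}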
That said, there is an old comment about not using bcast to avoid a possible
deadlock. Since the collective
Thanks guys,
So I was finally able to reproduce the bug on my (oversubscribed) VM with tcp.
MPI_Intercomm_merge (indirectly) invokes iallgatherv incorrectly:
1, main (MPI_Issend_rtoa_c.c:196)
1, MPITEST_get_communicator (libmpitest.c:3544)
1, PMPI_Intercomm_merge (pintercomm_merge.c:131)
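For anyone who wants a standalone reproducer outside the Intel suite, here is a minimal sketch that exercises the same MPI_Intercomm_merge path (my approximation of what the test sets up, not the actual MPITEST code):

#include <mpi.h>

/* Hypothetical minimal reproducer (not the actual Intel/MPITEST code):
 * split MPI_COMM_WORLD in two halves, build an intercommunicator, then
 * merge it. The merge is where the (indirect) iallgatherv is invoked.
 * Run with at least 2 ranks. */
int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int color = (rank < size / 2) ? 0 : 1;
    MPI_Comm local, inter, merged;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local);

    /* the remote leader is rank 0 of the other half in MPI_COMM_WORLD */
    int remote_leader = (color == 0) ? size / 2 : 0;
    MPI_Intercomm_create(local, 0, MPI_COMM_WORLD, remote_leader, 0, &inter);
    MPI_Intercomm_merge(inter, /* high = */ color, &merged);

    MPI_Comm_free(&merged);
    MPI_Comm_free(&inter);
    MPI_Comm_free(&local);
    MPI_Finalize();
    return 0;
}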
I can make MPI_Issend_rtoa deadlock with vader and sm.
George.
On Tue, Sep 6, 2016 at 12:06 PM, r...@open-mpi.org wrote:
FWIW: those tests hang for me with TCP (I don’t have openib on my cluster).
I’ll check it with your change as well.
> On Sep 6, 2016, at 1:29 AM, Gilles Gouaillardet wrote:
Ralph,

This looks like another hang :-(

I ran MPI_Issend_rtoa_c on 32 tasks (2 nodes, 2 sockets per node, 8 cores per socket) with infiniband, and I always observe the same hang at the same place.
Surprisingly, I do not get any hang if I blacklist the openib btl.
the patch below can be

OK, I will double check tomorrow whether this was the very same hang I fixed earlier.

Cheers,
Gilles
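(For reference, my guess at how the openib btl was blacklisted for those runs, not the actual command line used, is the btl MCA parameter, e.g.:

mpirun -np 32 --mca btl ^openib ./MPI_Issend_rtoa_c

The ^ prefix excludes the listed component from btl selection.)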
On Monday, September 5, 2016, r...@open-mpi.org wrote:
I was just looking at the overnight MTT report, and these were present going
back a long ways in both branches. They are in the Intel test suite.
If you have already addressed them, then thanks!
> On Sep 5, 2016, at 7:48 AM, Gilles Gouaillardet wrote:
Ralph,

I fixed a hang earlier today in master, and the PR for v2.x is at
https://github.com/open-mpi/ompi-release/pull/1368

Can you please make sure you are running the latest master?
Which test suite do these tests come from?
I will have a look tomorrow if the hang is still there.

Cheers,
Gilles
Hey folks
All of the tests that involve either ISsend_ator, SSend_ator, ISsend_rtoa, or
SSend_rtoa are hanging on master and v2.x. Does anyone know what these tests
do, and why we never seem to pass them?
Do we care?
Ralph