Re: [OMPI users] Silent hangs with MPI_Ssend and MPI_Irecv

2020-07-25 Thread Lewis,Sean via users
Oops, I knew I forgot something!
I am using Open MPI 3.1.1.
I have tried loading an Open MPI 4.0.3 module but receive a repeating error at 
runtime:

[tcn560.bullx:16698] pml_ucx.c:175  Error: Failed to receive UCX worker 
address: Not found (-13)
[tcn560.bullx:16698] [[42671,6],29] ORTE_ERROR_LOG: Error in file dpm/dpm.c at 
line 493

This makes me think I am in fact on an IB cluster. I will configure my own 
local Open MPI 4.0.4 with the configure options you suggest and test it out; 
a sketch of what I have in mind is below. I will also set up a run with the 
MCA parameters suggested by Gilles. The cluster I work on will be down for 
the next two days, so it will be a bit before I can provide any updates! 
Thank you for your patience and your help, much appreciated.
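
Here is the rough sketch I have in mind (the install prefix is just a 
placeholder, --with-ucx may need an explicit path on this cluster, and 
ibstat/ucx_info may not be installed on every node):

# confirm the node actually exposes InfiniBand devices to UCX
ibstat
ucx_info -d | grep -i -e transport -e device

# build a local Open MPI 4.0.4 against UCX instead of openib
./configure --prefix=$HOME/opt/openmpi-4.0.4 --with-ucx --without-verbs
make -j 8 && make install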

-Sean


On 7/25/20, 4:33 AM, "users on behalf of Joseph Schuchart via users" wrote:


Hi Sean,

Thanks for the report! I have a few questions/suggestions:

1) What version of Open MPI are you using?
2) What is your network? It sounds like you are on an IB cluster using
btl/openib (which is essentially discontinued). Can you try the Open MPI
4.0.4 release with UCX instead of openib (configure with --without-verbs
and --with-ucx)?
3) If that does not help, can you boil your code down to a minimum
working example? That would make it easier for people to try to
reproduce what happens.

Cheers
Joseph


Re: [OMPI users] Silent hangs with MPI_Ssend and MPI_Irecv

2020-07-25 Thread Gilles Gouaillardet via users
Sean,

you might also want to confirm that openib is (part of) the issue by 
running your app over TCP only:

mpirun --mca pml ob1 --mca btl tcp,self ...

Cheers,

Gilles



Re: [OMPI users] Silent hangs with MPI_Ssend and MPI_Irecv

2020-07-25 Thread Joseph Schuchart via users

Hi Sean,

Thanks for the report! I have a few questions/suggestions:

1) What version of Open MPI are you using?
2) What is your network? It sounds like you are on an IB cluster using 
btl/openib (which is essentially discontinued). Can you try the Open MPI 
4.0.4 release with UCX instead of openib (configure with --without-verbs 
and --with-ucx)?
3) If that does not help, can you boil your code down to a minimum 
working example? That would make it easier for people to try to 
reproduce what happens.


Cheers
Joseph

On 7/24/20 11:34 PM, Lewis,Sean via users wrote:

Hi all,

I am encountering a silent hang involving MPI_Ssend and MPI_Irecv. The 
subroutine in question is called by each processor and is structured 
similarly to the pseudo code below. The subroutine is called successfully 
several thousand times before the silent hang manifests, and it never 
resolves. The hang occurs in nearly (but not exactly) the same spot for 
bit-wise identical tests. During the hang, all MPI ranks are at the Line 18 
barrier except for two: one is waiting at Line 17 for its Irecv requests to 
complete, and the other is at one of the Ssend calls at Line 9 or 14. This 
suggests that an MPI_Irecv never completes and a processor is blocked 
indefinitely in the Ssend, unable to complete the transfer.


I've found a similar discussion of this kind of behavior on the Open MPI 
mailing list:
https://www.mail-archive.com/users@lists.open-mpi.org/msg19227.html
which was ultimately resolved by setting the MCA parameter btl_openib_flags 
to 304 or 305 (the default is 310):
https://www.mail-archive.com/users@lists.open-mpi.org/msg19277.html
I have seen some promising behavior by doing the same. As that thread 
suggests, this implies a problem with the InfiniBand RDMA protocols for 
large messages.
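
For reference, applying that workaround at run time looks something like 
the following (the process count and executable name here are placeholders; 
304/305 are simply the values suggested in that thread):

mpirun --mca btl_openib_flags 305 -np 30 ./my_app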


I wanted to breathe life back into this conversation because the silent 
hang issue is particularly debilitating and confusing to me. Increasing or 
decreasing the number of processors does not seem to alleviate the issue, 
and using MPI_Send results in the same behavior; perhaps a message has 
exceeded a memory limit? I am running a test now that reports the 
individual message sizes, but a switch I previously implemented to check 
for buffer size discrepancies is never triggered. In the meantime, has 
anyone run into similar issues or have thoughts as to remedies for this 
behavior?


1:  call MPI_BARRIER(…)
2:  do i = 1,nprocs
3:    if (commatrix_recv(i) .gt. 0) then  ! Identify which procs to receive from via predefined matrix
4:      call MPI_Irecv(…)
5:    endif
6:  enddo
7:  do j = mype+1,nproc
8:    if (commatrix_send(j) .gt. 0) then  ! Identify which procs to send to via predefined matrix
9:      call MPI_Ssend(…)
10:   endif
11: enddo
12: do j = 1,mype
13:   if (commatrix_send(j) .gt. 0) then  ! Identify which procs to send to via predefined matrix
14:     call MPI_Ssend(…)
15:   endif
16: enddo
17: call MPI_Waitall(…)  ! Wait for all Irecv to complete
18: call MPI_Barrier(…)
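
Since a minimum working example was requested: below is a minimal, 
self-contained sketch of this pattern. It is an illustrative reconstruction 
rather than the actual application code; it assumes fixed-size 
double-precision buffers and all-to-all connectivity in place of the 
commatrix_send/commatrix_recv tests, and the program and variable names are 
made up for illustration.

program ssend_irecv_pattern
  use mpi
  implicit none
  ! message size in elements; increase to a few million to exercise the
  ! large-message RDMA protocols discussed above
  integer, parameter :: n = 1024
  integer :: ierr, mype, nprocs, i, j, nreq
  integer, allocatable :: requests(:)
  double precision, allocatable :: sendbuf(:), recvbuf(:,:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, mype, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  allocate(sendbuf(n), recvbuf(n, nprocs), requests(nprocs))
  sendbuf = dble(mype)
  nreq = 0

  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  ! post a receive from every other rank (stands in for commatrix_recv > 0)
  do i = 0, nprocs - 1
     if (i /= mype) then
        nreq = nreq + 1
        call MPI_Irecv(recvbuf(1, i + 1), n, MPI_DOUBLE_PRECISION, i, 0, &
                       MPI_COMM_WORLD, requests(nreq), ierr)
     end if
  end do

  ! synchronous sends: first to higher ranks, then to lower ranks
  do j = mype + 1, nprocs - 1
     call MPI_Ssend(sendbuf, n, MPI_DOUBLE_PRECISION, j, 0, MPI_COMM_WORLD, ierr)
  end do
  do j = 0, mype - 1
     call MPI_Ssend(sendbuf, n, MPI_DOUBLE_PRECISION, j, 0, MPI_COMM_WORLD, ierr)
  end do

  call MPI_Waitall(nreq, requests, MPI_STATUSES_IGNORE, ierr)
  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  if (mype == 0) print *, 'exchange completed'
  call MPI_Finalize(ierr)
end program ssend_irecv_pattern

Running something like "mpirun -np 30 ./ssend_irecv_pattern" repeatedly, 
with n increased until the per-message size is large, should show whether 
the hang reproduces outside the full application.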

Cluster information:

30 processors

Managed by Slurm

OS: Red Hat v. 7.7

Thank you for any help/advice you can provide,

Sean

*Sean C. Lewis*

Doctoral Candidate

Department of Physics

Drexel University