[OMPI users] Silent hangs with MPI_Ssend and MPI_Irecv

2020-07-24 Thread Lewis,Sean via users
Hi all,

I am encountering a silent hang involving MPI_Ssend and MPI_Irecv. The 
subroutine in question is called by each processor and is structured similarly 
to the pseudocode below. The subroutine is called successfully several thousand 
times before the hang appears, and the hang never resolves. For bit-wise 
identical tests, the hang occurs in nearly (but not exactly) the same spot. 
During the hang, all MPI ranks except two are at the Barrier on Line 18. One of 
the two is at Line 17, waiting for its Irecv to complete, and the other is in 
one of the Ssend calls at Line 9 or 14. This suggests that an MPI_Irecv never 
completes and that a processor is blocked indefinitely in the Ssend, unable to 
complete the transfer.

I’ve found a similar discussion of this kind of behavior on the Open MPI mailing 
list: https://www.mail-archive.com/users@lists.open-mpi.org/msg19227.html 
It was ultimately resolved by setting the MCA parameter btl_openib_flags to 304 
or 305 (default 310): 
https://www.mail-archive.com/users@lists.open-mpi.org/msg19277.html. I have 
seen some promising behavior by doing the same. As that poster suggests, this 
implies a problem with the RDMA protocols over InfiniBand for large messages.
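
To see what moving btl_openib_flags from 310 to 304 or 305 actually toggles, the three values can be compared bit by bit. The sketch below is plain arithmetic in Python. The names attached to the three lowest bits (SEND = 1, PUT = 2, GET = 4) are an assumption based on my reading of Open MPI's BTL flag definitions, and the helper names are made up for illustration; it is worth confirming the meaning of each bit for your installation, for example with `ompi_info --param btl openib --level 9`.

```python
# Assumed names for the three lowest BTL flag bits (verify for your Open MPI version).
ASSUMED_BITS = {0x1: "SEND", 0x2: "PUT", 0x4: "GET"}

def low_bits(flags):
    """Return the assumed names of the low three bits that are set in flags."""
    return [name for bit, name in sorted(ASSUMED_BITS.items()) if flags & bit]

def changed_bits(old, new):
    """Return the bit values that differ between two flag settings."""
    diff = old ^ new
    return [1 << i for i in range(diff.bit_length()) if diff & (1 << i)]

for value in (310, 304, 305):
    # Print the decimal value, its 9-bit binary form, and the assumed low-bit names.
    print(value, format(value, "09b"), low_bits(value))

print("310 -> 304 changes bits", changed_bits(310, 304))
print("310 -> 305 changes bits", changed_bits(310, 305))
```

If that reading of the bits is right, both 304 and 305 switch off the put and get paths and leave the higher bits untouched, which would be consistent with the earlier thread pointing at the RDMA protocols.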

I wanted to breathe life back into this conversation, as the silent hang issue 
is particularly debilitating and confusing to me. Increasing or decreasing the 
number of processors used does not seem to alleviate the issue, and using 
MPI_Send results in the same behavior. Perhaps a message has exceeded a memory 
limit? I am running a test now that reports the individual message sizes; a 
check I previously implemented for buffer size discrepancies is not triggered. 
In the meantime, has anyone run into similar issues, or does anyone have 
thoughts on remedies for this behavior?

1:  call MPI_Barrier(…)
2:  do i = 1,nprocs
3:    if(commatrix_recv(i) .gt. 0) then ! Identify which procs to receive from via predefined matrix
4:      call MPI_Irecv(…)
5:    endif
6:  enddo
7:  do j = mype+1,nprocs
8:    if(commatrix_send(j) .gt. 0) then ! Identify which procs to send to via predefined matrix
9:      call MPI_Ssend(…)
10:   endif
11: enddo
12: do j = 1,mype
13:   if(commatrix_send(j) .gt. 0) then ! Identify which procs to send to via predefined matrix
14:     call MPI_Ssend(…)
15:   endif
16: enddo
17: call MPI_Waitall(…) ! Wait for all Irecv to complete
18: call MPI_Barrier(…)
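
One thing worth ruling out with a pattern like this is a disagreement between the send and receive matrices: a synchronous send whose partner never posts the matching receive blocks forever. Below is a minimal sketch of such a check in Python, assuming the per-rank commatrix_send and commatrix_recv vectors have been gathered into full nprocs x nprocs tables. The gathering step, the 0-based rank numbering, and the function name find_unmatched are assumptions for illustration and are not part of the original code.

```python
def find_unmatched(send, recv):
    """Return (sender, receiver, sent, expected) for every pair that disagrees.

    send[s][r] is what rank s intends to send to rank r;
    recv[r][s] is what rank r expects to receive from rank s.
    """
    nprocs = len(send)
    mismatches = []
    for s in range(nprocs):
        for r in range(nprocs):
            if send[s][r] != recv[r][s]:
                mismatches.append((s, r, send[s][r], recv[r][s]))
    return mismatches

# Toy example with 3 ranks: rank 0 wants to send 5 items to rank 2,
# but rank 2 expects nothing from rank 0.
send = [[0, 4, 5], [0, 0, 0], [1, 0, 0]]
recv = [[0, 0, 1], [4, 0, 0], [0, 0, 0]]
print(find_unmatched(send, recv))
```

An empty result would only show that the two matrices are consistent with each other; it says nothing about what the transport layer does with the messages.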

Cluster information:
30 processors
Managed by slurm
OS: Red Hat v. 7.7

Thank you for any help or advice you can provide,
Sean

Sean C. Lewis
Doctoral Candidate
Department of Physics
Drexel University


Re: [OMPI users] Moving an installation

2020-07-24 Thread Reuti via users
Hi,

On 24.07.2020 at 18:55, Lana Deere via users wrote:

> I have open-mpi 4.0.4 installed on my desktop and my small test programs are 
> working.
> 
> I would like to migrate the open-mpi to a cluster and run a larger program 
> there.  When moved, the open-mpi installation is in a different pathname than 
> it was on my desktop and it doesn't seem to work any longer.  I can make the 
> libraries visible via LD_LIBRARY_PATH but this seems insufficient.  Is there 
> an environment variable which can be used to tell the open-mpi where it is 
> installed?

You can set OPAL_PREFIX for this:

https://www.open-mpi.org/faq/?category=building#installdirs

-- Reuti
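
As a rough illustration of the OPAL_PREFIX suggestion, the sketch below builds an environment for a relocated tree in Python. Only OPAL_PREFIX comes from the reply above. The install path /opt/openmpi-4.0.4, the bin and lib subdirectory names, and the helper names are assumptions for illustration (some systems use lib64), and PATH and LD_LIBRARY_PATH are adjusted alongside it as is customary for a relocated install.

```python
import os

def prepend(path, existing):
    """Put path in front of an existing search-path string, if there is one."""
    return path if not existing else path + os.pathsep + existing

def relocated_env(prefix, base_env=None):
    """Return a copy of the environment that points at a relocated Open MPI tree."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPAL_PREFIX"] = prefix  # tells Open MPI where it now lives
    env["PATH"] = prepend(os.path.join(prefix, "bin"), env.get("PATH"))
    env["LD_LIBRARY_PATH"] = prepend(os.path.join(prefix, "lib"),
                                     env.get("LD_LIBRARY_PATH"))
    return env

# Hypothetical new location on the cluster.
env = relocated_env("/opt/openmpi-4.0.4", {"PATH": "/usr/bin"})
print(env["OPAL_PREFIX"])
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.run` when launching `mpirun`. This only addresses the changed path; it does not make a desktop build binary compatible with the cluster.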


Re: [OMPI users] Moving an installation

2020-07-24 Thread Steven Varga via users
Hi,
Currently I am approaching a similar problem/workflow with spack and AWS
S3 shared storage. Mounting the storage from a laptop gives you the same
layout as on each node of my AWS EC2 cluster.

As others mentioned before, you still have to recompile your work to take
advantage of the Xeon-class CPUs.

This should not be a problem, as SLURM can distribute your application;
however, on a proper cluster you already have a parallel filesystem running,
so all nodes can run the MPI application.

Steve

On Fri., Jul. 24, 2020, 13:00 Lana Deere via users, <
users@lists.open-mpi.org> wrote:

> I have open-mpi 4.0.4 installed on my desktop and my small test programs
> are working.
>
> I would like to migrate the open-mpi to a cluster and run a larger program
> there.  When moved, the open-mpi installation is in a different pathname
> than it was on my desktop and it doesn't seem to work any longer.  I can
> make the libraries visible via LD_LIBRARY_PATH but this seems
> insufficient.  Is there an environment variable which can be used to tell
> the open-mpi where it is installed?
>
> Is it mandatory to actually compile the release in the ultimate
> destination on each system where it will be used?
>
> Thanks.
>
> .. Lana (lana.de...@gmail.com)
>
>
>


Re: [OMPI users] Moving an installation

2020-07-24 Thread Gus Correa via users
+1
In my experience, moving software, especially something as complex as
(Open) MPI, is much more troublesome and time-consuming (and often just
useless frustration) than recompiling it.
Hardware, OS, kernel, libraries, etc., are unlikely to be compatible.
Gus Correa

On Fri, Jul 24, 2020 at 1:03 PM Ralph Castain via users <
users@lists.open-mpi.org> wrote:

> While possible, it is highly unlikely that your desktop version is going
> to be binary compatible with your cluster...
>
> On Jul 24, 2020, at 9:55 AM, Lana Deere via users <
> users@lists.open-mpi.org> wrote:
>
> I have open-mpi 4.0.4 installed on my desktop and my small test programs
> are working.
>
> I would like to migrate the open-mpi to a cluster and run a larger program
> there.  When moved, the open-mpi installation is in a different pathname
> than it was on my desktop and it doesn't seem to work any longer.  I can
> make the libraries visible via LD_LIBRARY_PATH but this seems
> insufficient.  Is there an environment variable which can be used to tell
> the open-mpi where it is installed?
>
> Is it mandatory to actually compile the release in the ultimate
> destination on each system where it will be used?
>
> Thanks.
>
> .. Lana (lana.de...@gmail.com)
>
>
>
>


Re: [OMPI users] Moving an installation

2020-07-24 Thread Benson Muite via users

On 7/24/20 7:55 PM, Lana Deere via users wrote:
> I have open-mpi 4.0.4 installed on my desktop and my small test programs
> are working.
>
> I would like to migrate the open-mpi to a cluster and run a larger
> program there.  When moved, the open-mpi installation is in a different
> pathname than it was on my desktop and it doesn't seem to work any
> longer.  I can make the libraries visible via LD_LIBRARY_PATH but this
> seems insufficient.  Is there an environment variable which can be used
> to tell the open-mpi where it is installed?
>
> Is it mandatory to actually compile the release in the ultimate
> destination on each system where it will be used?
>
> Thanks.
>
> .. Lana (lana.de...@gmail.com)


You may want to install Open MPI on the cluster directly if it is not 
already installed, or if you need a specific version. Most versions 
should work unless your code uses something very new or something that 
has been deprecated. A build made on the cluster will use the appropriate 
libraries. You will then likely want to compile your program again.


Re: [OMPI users] Moving an installation

2020-07-24 Thread Ralph Castain via users
While possible, it is highly unlikely that your desktop version is going to be 
binary compatible with your cluster...

On Jul 24, 2020, at 9:55 AM, Lana Deere via users <users@lists.open-mpi.org> wrote:

> I have open-mpi 4.0.4 installed on my desktop and my small test programs are
> working.
>
> I would like to migrate the open-mpi to a cluster and run a larger program
> there.  When moved, the open-mpi installation is in a different pathname than
> it was on my desktop and it doesn't seem to work any longer.  I can make the
> libraries visible via LD_LIBRARY_PATH but this seems insufficient.  Is there an
> environment variable which can be used to tell the open-mpi where it is
> installed?
>
> Is it mandatory to actually compile the release in the ultimate destination on
> each system where it will be used?
>
> Thanks.
>
> .. Lana (lana.de...@gmail.com)





[OMPI users] Moving an installation

2020-07-24 Thread Lana Deere via users
I have open-mpi 4.0.4 installed on my desktop and my small test programs
are working.

I would like to migrate the open-mpi to a cluster and run a larger program
there.  When moved, the open-mpi installation is in a different pathname
than it was on my desktop and it doesn't seem to work any longer.  I can
make the libraries visible via LD_LIBRARY_PATH but this seems
insufficient.  Is there an environment variable which can be used to tell
the open-mpi where it is installed?

Is it mandatory to actually compile the release in the ultimate destination
on each system where it will be used?

Thanks.

.. Lana (lana.de...@gmail.com)


Re: [OMPI users] MPI test suite

2020-07-24 Thread Zhang, Junchao via users
Hi, Chris,
  The website you gave is almost empty, and svn checkout 
https://scm.projects.hlrs.de/anonscm/svn/mpitestsuite/ does not work.
  Our code uses MPI point-to-point, collectives, communicators, and attributes; 
basically MPI-2.1 stuff.

  Thanks
--Junchao Zhang



On Jul 24, 2020, at 2:34 AM, Christoph Niethammer <nietham...@hlrs.de> wrote:

> Hello,
>
> What do you want to test in detail?
>
> If you are interested in testing combinations of datatypes and communicators,
> the mpi_test_suite [1] may be of interest to you.
>
> Best
> Christoph Niethammer
>
> [1] https://projects.hlrs.de/projects/mpitestsuite/
>
> ----- Original Message -----
> From: "Open MPI Users" <users@lists.open-mpi.org>
> To: "Open MPI Users" <users@lists.open-mpi.org>
> Cc: "Zhang, Junchao" <jczh...@mcs.anl.gov>
> Sent: Thursday, 23 July, 2020 22:25:18
> Subject: Re: [OMPI users] MPI test suite
>
> I know OSU micro-benchmarks.  But it is not an extensive test suite.
>
> Thanks
> --Junchao Zhang
>
>> On Jul 23, 2020, at 2:00 PM, Marco Atzeri via users <users@lists.open-mpi.org> wrote:
>>
>> On 23.07.2020 20:28, Zhang, Junchao via users wrote:
>>> Hello,
>>>  Does OMPI have a test suite that can let me validate MPI implementations from
>>> other vendors?
>>>  Thanks
>>> --Junchao Zhang
>>
>> Have you considered the OSU Micro-Benchmarks ?
>>
>> http://mvapich.cse.ohio-state.edu/benchmarks/


Re: [OMPI users] MPI test suite

2020-07-24 Thread Christoph Niethammer via users
Hi,

MTT is a testing infrastructure that automates building MPI libraries and 
tests, running the tests, and collecting the results, but it does not come with 
MPI test suites itself.

Best
Christoph

----- Original Message -----
From: "Open MPI Users" 
To: "Open MPI Users" 
Cc: "Joseph Schuchart" 
Sent: Friday, 24 July, 2020 09:00:34
Subject: Re: [OMPI users] MPI test suite

You may want to look into MTT: https://github.com/open-mpi/mtt

Cheers
Joseph

On 7/23/20 8:28 PM, Zhang, Junchao via users wrote:
> Hello,
>    Does OMPI have a test suite that can let me validate MPI 
> implementations from other vendors?
> 
>    Thanks
> --Junchao Zhang
> 
> 
>


Re: [OMPI users] MPI test suite

2020-07-24 Thread Christoph Niethammer via users
Hello,

What do you want to test in detail?

If you are interested in testing combinations of datatypes and communicators, 
the mpi_test_suite [1] may be of interest to you.

Best
Christoph Niethammer

[1] https://projects.hlrs.de/projects/mpitestsuite/



----- Original Message -----
From: "Open MPI Users" 
To: "Open MPI Users" 
Cc: "Zhang, Junchao" 
Sent: Thursday, 23 July, 2020 22:25:18
Subject: Re: [OMPI users] MPI test suite

I know OSU micro-benchmarks.  But it is not an extensive test suite.

Thanks
--Junchao Zhang



> On Jul 23, 2020, at 2:00 PM, Marco Atzeri via users wrote:
> 
> On 23.07.2020 20:28, Zhang, Junchao via users wrote:
>> Hello,
>>   Does OMPI have a test suite that can let me validate MPI implementations 
>> from other vendors?
>>   Thanks
>> --Junchao Zhang
> 
> Have you considered the OSU Micro-Benchmarks ?
> 
> http://mvapich.cse.ohio-state.edu/benchmarks/


Re: [OMPI users] MPI test suite

2020-07-24 Thread Joseph Schuchart via users

You may want to look into MTT: https://github.com/open-mpi/mtt

Cheers
Joseph

On 7/23/20 8:28 PM, Zhang, Junchao via users wrote:

> Hello,
>    Does OMPI have a test suite that can let me validate MPI
> implementations from other vendors?
>
>    Thanks
> --Junchao Zhang