Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-11 Thread Cabral, Matias A
BTW, just to be explicit about using the psm2 OFI provider:

/tmp> mpirun -np 2 -mca mtl ofi -mca pml cm -mca mtl_ofi_provider_include psm2 ./a
Hello World from proccess 0 out of 2
This is process 0 reporting::
Hello World from proccess 1 out of 2
Process 1 received number 10 from process 0
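
By the way, if the fi_info utility that ships with libfabric is installed on the node, it is a quick way to double check that the psm2 provider is actually visible to libfabric before pointing the OFI MTL at it (the exact output will depend on your libfabric build):

/tmp> fi_info -p psm2

Running fi_info with no arguments lists every provider/endpoint combination libfabric can offer on the node.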

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Cabral, Matias A
Sent: Friday, January 11, 2019 3:22 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

Hi Eduardo,

The OFI MTL gained some new features during 2018 that went into v4.0.0 but have not been backported to older OMPI versions.

What version of libfabric are you using, and where are you installing it from? I will try to reproduce your error. I'm running some quick tests and I see it working:



/tmp >ompi_info
 Package: Open MPI macab...@sperf-41.sc.intel.com Distribution
Open MPI: 4.0.0rc5
  Open MPI repo revision: v4.0.0
   Open MPI release date: Unreleased developer copy
Open RTE: 4.0.0rc5
  Open RTE repo revision: v4.0.0
   Open RTE release date: Unreleased developer copy
OPAL: 4.0.0rc5
  OPAL repo revision: v4.0.0
   OPAL release date: Unreleased developer copy
 MPI API: 3.1.0
Ident string: 4.0.0rc5
  Prefix: /nfs/sc/disks/fabric_work/macabral/tmp/ompi-4.0.0
Configured architecture: x86_64-unknown-linux-gnu
  Configure host: sperf-41.sc.intel.com
   Configured by: macabral
   Configured on: Fri Jan 11 17:42:06 EST 2019
  Configure host: sperf-41.sc.intel.com
  Configure command line: '--with-ofi' '--with-verbs=no'
  '--prefix=/tmp/ompi-4.0.0'

/tmp> rpm -qi libfabric
Name: libfabric
Version : 1.6.0
Release : 80
Architecture: x86_64
Install Date: Wed 19 Dec 2018 05:45:41 PM EST
Group   : System Environment/Libraries
Size: 10131964
License : GPLv2 or BSD
Signature   : (none)
Source RPM  : libfabric-1.6.0-80.src.rpm
Build Date  : Wed 22 Aug 2018 11:08:29 PM EDT
Build Host  : ph-bld-node-27.ph.intel.com
Relocations : (not relocatable)
URL : http://www.github.com/ofiwg/libfabric
Summary : User-space RDMA Fabric Interfaces
Description :
libfabric provides a user-space API to access high-performance fabric
services, such as RDMA.

/tmp> mpirun -np 2 -mca mtl ofi -mca pml cm ./a
Hello World from proccess 0 out of 2
This is process 0 reporting::
Hello World from proccess 1 out of 2
Process 1 received number 10 from process 0
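
For reference, my ./a is just the classic minimal send/recv test. Here is a sketch of it (reconstructed to match the output above, including its "proccess" spelling, so take it as an approximation rather than the exact source):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, number;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello World from proccess %d out of %d\n", rank, size);

    if (rank == 0) {
        printf("This is process 0 reporting::\n");
        number = 10;
        /* this MPI_Send is the call that trips the reported error */
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process 1 received number %d from process 0\n", number);
    }

    MPI_Finalize();
    return 0;
}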


From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of ROTHE Eduardo - externe
Sent: Thursday, January 10, 2019 10:02 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send


Hi Gilles, thank you so much once again!

I had success using the psm2 MTL directly. Indeed, I do not need to specify the cm PML (I guess this might be because the cm PML gets automatically selected when I enforce the psm2 MTL?). So both of the following commands execute successfully with Open MPI 4.0.0:

  > mpirun --mca pml cm --mca mtl psm2 -np 2 ./a.out
  > mpirun --mca mtl psm2 -np 2 ./a.out

The error persists using libfabric. The following command returns the MPI_Send 
error:

  > mpirun --mca pml cm --mca mtl ofi -np 2 ./a.out

It seems the problem sits between libfabric and Open MPI 4.0.0 (remember, I don't see the same behaviour with Open MPI 3.1.3). So I guess if I want to use libfabric I will have to dig a bit deeper into the interface between this library and Open MPI 4.0.0. Any lines of thought on how to start here would be (very!) appreciated.
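
My first idea is to re-run with libfabric's own debug logging turned on and see what the ofi MTL negotiates; as far as I understand the libfabric docs, the FI_LOG_LEVEL environment variable is the knob for that ("debug" being one of the accepted levels):

  > mpirun -x FI_LOG_LEVEL=debug --mca pml cm --mca mtl ofi -np 2 ./a.out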

If you have any diagram that would help me understand the framework/module architecture and why some modules are automatically selected (like in the case above), I would be very pleased (even more!? :).
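
For what it's worth, I did find that raising a framework's base verbosity makes mpirun print its selection reasoning, and that ompi_info can dump the related MCA parameters; I assume these are the right knobs:

  > mpirun --mca pml_base_verbose 100 --mca mtl_base_verbose 100 -np 2 ./a.out
  > ompi_info --param pml all --level 9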

Regards,
Eduardo




From: users <users-boun...@lists.open-mpi.org> on behalf of gilles.gouaillar...@gmail.com
Sent: Thursday, January 10, 2019 13:51
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

Eduardo,

You have two options to use Omni-Path:

- "directly" via the psm2 mtl
mpirun -mca pml cm -mca mtl psm2 ...

- "indirectly" via libfabric
mpirun -mca pml cm -mca mtl ofi ...

I do invite you to try both. By explicitly requesting the mtl you will avoid 
potential conflicts.

libfabric is used in production by Cisco and AWS (both major contributors to both Open MPI and libfabric), so this is clearly not something to stay away from. That being said, bugs always happen, and they could be related to Open MPI, libfabric, and/or Omni-Path (and FWIW, Intel is a major contributor to libfabric too).

Cheers,

Gilles

On Thursday, January 10, 2019, ROTHE Eduardo - externe <eduardo-externe.ro...@edf.fr> wrote:

Hi Gilles, thank you so much for your support!

For now I'm just testing the software, so it's running on a single node.

Your suggestion was very precise. In fact, choosing the ob1 component leads to a successful execution! The tcp component had no effect.

mpirun --mca pml ob1 -mca btl tcp,self -np 2 ./a.out > Success
mpirun --mca pml ob1 -np 2 ./a.out > Success

But... our cluster is equipped with Intel Omni-Path interconnects and we are aiming to use psm2 through the ofi component in order to take full advantage
