Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-16 Thread Cabral, Matias A
Hey Eduardo,

Using up-to-date libraries is advisable, especially given that 1.4.0 is a 
couple of years old. 1.6.2 is the latest on the 1.6.x line; 1.7.0 was released 
last week, but I have not played with it yet.
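
In case it helps: assuming the fi_info utility that ships with libfabric is 
installed on your nodes, you can check which libfabric version is actually 
being picked up, and whether the psm2 provider is visible to it, with 
something like:

fi_info --version
fi_info -p psm2

(Exact output will of course depend on your install.)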

Thanks
_MAC

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of ROTHE 
Eduardo - externe
Sent: Wednesday, January 16, 2019 9:29 AM
To: Open MPI Users 
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send


Hi Matias,



thanks so much for your support!



Actually, running this simple example with --mca mtl_ofi_tag_mode ofi_tag_1 
turns out to be a good choice! I mean, the following execution does not return 
the MPI_Send error any more:



mpirun -np 2 --mca mtl_ofi_tag_mode ofi_tag_1 ./a



Are you suggesting that upgrading libfabric to 1.6.0 might save the day?



Regards,

Eduardo




From: users <users-boun...@lists.open-mpi.org> on behalf of matias.a.cab...@intel.com <matias.a.cab...@intel.com>
Sent: Wednesday, January 16, 2019 00:54
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

Hi Eduardo,


> When you say that "The OFI MTL got some new features during 2018 that went 
> into v4.0.0 but are not backported to older OMPI versions." this agrees with 
> the behaviour that I witness: using Open MPI 3.1.3 I don't get this error. 
> Could this be related?

Yes. I suspect this may be related to the inclusion of support for 
FI_REMOTE_CQ_DATA, which was added to extend the scalability of the OFI MTL. 
This went into 4.x but is not in 3.1.x. In addition, there is a bug in the PSM2 
OFI provider: it reports support for FI_REMOTE_CQ_DATA even though it does not 
actually support the API that OMPI uses (this was fixed in libfabric 1.6.0). A 
quick way to test this would be adding '-mca mtl_ofi_tag_mode ofi_tag_1' to 
your command line, which forces OMPI not to use FI_REMOTE_CQ_DATA.
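
As a side note, the accepted values for that parameter can be inspected with 
ompi_info (assuming a 4.0.0 build with the OFI MTL enabled), along the lines of:

ompi_info --param mtl ofi --level 9 | grep tag_mode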

Thanks,

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of ROTHE 
Eduardo - externe
Sent: Tuesday, January 15, 2019 2:31 AM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send


Hi Matias,



Thank you so much for your feedback!



It's really embarrassing, but running



mpirun -np 2 -mca mtl ofi -mca pml cm -mca mtl_ofi_provider_include psm2 ./a



still doesn't get the job done. I'm still getting the same MPI_Send error:



Hello World from proccess 1 out of 2
Hello World from proccess 0 out of 2
[gafront4:18272] *** An error occurred in MPI_Send
[gafront4:18272] *** reported by process [2565799937,0]
[gafront4:18272] *** on communicator MPI_COMM_WORLD
[gafront4:18272] *** MPI_ERR_OTHER: known error not in list
[gafront4:18272] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[gafront4:18272] ***    and potentially your MPI job)
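
(Note: the source of ./a was not posted in this thread, so the following is 
only a guess at an equivalent two-rank send/receive test, reconstructed from 
its output, for anyone who wants to reproduce this. Build with mpicc, e.g. 
mpicc -o a a.c, and run with the mpirun lines above.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, number;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello World from process %d out of %d\n", rank, size);

    if (rank == 0) {
        /* this is the MPI_Send that fails with MPI_ERR_OTHER */
        number = 10;
        MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process %d received number %d from process 0\n",
               rank, number);
    }

    MPI_Finalize();
    return 0;
}
)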



I'm using libfabric-1.4.0 from Debian Stretch, with a minor modification to 
use PSM2. It can be found here:



https://github.com/scibian/libfabric/commits/scibian/opa10.7/stretch



When you say that "The OFI MTL got some new features during 2018 that went into 
v4.0.0 but are not backported to older OMPI versions." this agrees with the 
behaviour that I witness: using Open MPI 3.1.3 I don't get this error. Could 
this be related?



Regards,

Eduardo




From: users <users-boun...@lists.open-mpi.org> on behalf of matias.a.cab...@intel.com <matias.a.cab...@intel.com>
Sent: Saturday, January 12, 2019 00:32
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

BTW, just to be explicit about using the psm2 OFI provider:

/tmp> mpirun -np 2 -mca mtl ofi -mca pml cm -mca mtl_ofi_provider_include psm2 ./a
Hello World from proccess 0 out of 2
This is process 0 reporting::
Hello World from proccess 1 out of 2
Process 1 received number 10 from process 0

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Cabral, 
Matias A
Sent: Friday, January 11, 2019 3:22 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

Hi Eduardo,

The OFI MTL got some new features during 2018 that went into v4.0.0 but are not 
backported to older OMPI versions.

What version of libfabric are you using and where are you installing it from?  
I will try to reproduce your error. I'm running some quick tests and I see it 
working:



/tmp> ompi_info
                 Package: Open MPI macab...@sperf-41.sc.intel.com Distribution
                Open MPI: 4.0.0rc5
  Open MPI repo revision: v4.0.0
   Open MPI release date: Unreleased developer copy
                Open RTE: 4.0.0rc5
  Open RTE repo revision: v4.0.0
   Open RTE release date: Unreleased developer copy
                    OPAL:

Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

2019-01-16 Thread ROTHE Eduardo - externe
Hi Matias,


thanks so much for your support!


Actually, running this simple example with --mca mtl_ofi_tag_mode ofi_tag_1 
turns out to be a good choice! I mean, the following execution does not return 
the MPI_Send error any more:


mpirun -np 2 --mca mtl_ofi_tag_mode ofi_tag_1 ./a


Are you suggesting that upgrading libfabric to 1.6.0 might save the day?


Regards,

Eduardo



From: users on behalf of matias.a.cab...@intel.com
Sent: Wednesday, January 16, 2019 00:54
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

Hi Eduardo,


> When you say that "The OFI MTL got some new features during 2018 that went 
> into v4.0.0 but are not backported to older OMPI versions." this agrees with 
> the behaviour that I witness: using Open MPI 3.1.3 I don't get this error. 
> Could this be related?

Yes. I suspect this may be related to the inclusion of support for 
FI_REMOTE_CQ_DATA, which was added to extend the scalability of the OFI MTL. 
This went into 4.x but is not in 3.1.x. In addition, there is a bug in the PSM2 
OFI provider: it reports support for FI_REMOTE_CQ_DATA even though it does not 
actually support the API that OMPI uses (this was fixed in libfabric 1.6.0). A 
quick way to test this would be adding '-mca mtl_ofi_tag_mode ofi_tag_1' to 
your command line, which forces OMPI not to use FI_REMOTE_CQ_DATA.

Thanks,

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of ROTHE 
Eduardo - externe
Sent: Tuesday, January 15, 2019 2:31 AM
To: Open MPI Users 
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send


Hi Matias,



Thank you so much for your feedback!



It's really embarrassing, but running



mpirun -np 2 -mca mtl ofi -mca pml cm -mca mtl_ofi_provider_include psm2 ./a



still doesn't get the job done. I'm still getting the same MPI_Send error:



Hello World from proccess 1 out of 2
Hello World from proccess 0 out of 2
[gafront4:18272] *** An error occurred in MPI_Send
[gafront4:18272] *** reported by process [2565799937,0]
[gafront4:18272] *** on communicator MPI_COMM_WORLD
[gafront4:18272] *** MPI_ERR_OTHER: known error not in list
[gafront4:18272] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[gafront4:18272] ***    and potentially your MPI job)



I'm using libfabric-1.4.0 from Debian Stretch, with a minor modification to 
use PSM2. It can be found here:



https://github.com/scibian/libfabric/commits/scibian/opa10.7/stretch



When you say that "The OFI MTL got some new features during 2018 that went into 
v4.0.0 but are not backported to older OMPI versions." this agrees with the 
behaviour that I witness: using Open MPI 3.1.3 I don't get this error. Could 
this be related?



Regards,

Eduardo




From: users <users-boun...@lists.open-mpi.org> on behalf of matias.a.cab...@intel.com <matias.a.cab...@intel.com>
Sent: Saturday, January 12, 2019 00:32
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

BTW, just to be explicit about using the psm2 OFI provider:

/tmp> mpirun -np 2 -mca mtl ofi -mca pml cm -mca mtl_ofi_provider_include psm2 ./a
Hello World from proccess 0 out of 2
This is process 0 reporting::
Hello World from proccess 1 out of 2
Process 1 received number 10 from process 0

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Cabral, 
Matias A
Sent: Friday, January 11, 2019 3:22 PM
To: Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] Open MPI 4.0.0 - error with MPI_Send

Hi Eduardo,

The OFI MTL got some new features during 2018 that went into v4.0.0 but are not 
backported to older OMPI versions.

What version of libfabric are you using and where are you installing it from?  
I will try to reproduce your error. I’m running some quick tests and I see it 
working:



/tmp> ompi_info
                 Package: Open MPI macab...@sperf-41.sc.intel.com Distribution
                Open MPI: 4.0.0rc5
  Open MPI repo revision: v4.0.0
   Open MPI release date: Unreleased developer copy
                Open RTE: 4.0.0rc5
  Open RTE repo revision: v4.0.0
   Open RTE release date: Unreleased developer copy
                    OPAL: 4.0.0rc5
      OPAL repo revision: v4.0.0
       OPAL release date: Unreleased developer copy
                 MPI API: 3.1.0
            Ident string: 4.0.0rc5
                  Prefix: /nfs/sc/disks/fabric_work/macabral/tmp/ompi-4.0.0
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: sperf-41.sc.intel.com
           Configured by: macabral
           Configured on: Fri Jan 11 17:42:06 EST 2019
          Configure host: sperf-41.sc.intel.com
  Configure command line: '--with-ofi' '--with-verbs=no' '--prefix=/tmp/ompi-4.0.0'
….
/tmp> rpm -qi
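
(If the libfabric upgrade route is taken: Open MPI can be pointed at a specific 
libfabric install at configure time via --with-ofi=<dir>, as in the configure 
command line above; the path below is only a placeholder for wherever the newer 
libfabric ends up installed:

./configure --with-ofi=/opt/libfabric-1.6.2 --with-verbs=no --prefix=/tmp/ompi-4.0.0
)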