Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Aw, sheesh.  Thanks.  Somehow I missed that despite being on the page - lack of 
focus,  I guess.

Best,

Charlie

> On Jun 14, 2018, at 4:38 PM, Pavel Shamis  wrote:
> 
> You just have to switch PML to UCX.
> You have some example of the command line here: 
> https://github.com/openucx/ucx/wiki/OpenMPI-and-OpenSHMEM-installation-with-UCX
> 
> Best,
> P.
> 
> 
> On Thu, Jun 14, 2018 at 3:25 PM Charles A Taylor wrote:
> Hmmm.  ompi_info only shows the ucx pml.  I don’t see any “transports”.   
> Will they show up somewhere or are they documented.   Right now it looks like 
> the only UCX related thing I can do with openmpi 3.1.0 is
> 
> export OMPI_MCA_pml=ucx
> mpiexec ….
> 
> From ompi_info…
> 
> $ ompi_info --param all all  | more | grep ucx
>  MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v3.1.0)
>  MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v3.1.0)
> 
> I’m assuming there is more to it than that.
> 
> Regards,
> 
> Charlie
> 
> 
> > On Jun 14, 2018, at 1:18 PM, Jeff Squyres (jsquyres) via users <users@lists.open-mpi.org> wrote:
> > 
> > Charles --
> > 
> > It may have gotten lost in the middle of this thread, but the 
> > vendor-recommended way of running on InfiniBand these days is with UCX.  
> > I.e., install OpenUCX and use one of the UCX transports in Open MPI.  
> > Unless you have special requirements, you should likely give this a try and 
> > see if it works for you.
> > 
> > The libfabric / verbs combo *may* work, but I don't know how robust the 
> > verbs libfabric support was in the v1.5 release series.
> > 
> > 
> >> On Jun 14, 2018, at 10:01 AM, Charles A Taylor wrote:
> >> 
> >> Hi Matias,
> >> 
> >> Thanks for the response.  
> >> 
> >> As of a couple of hours ago we are running: 
> >> 
> >>   libfabric-devel-1.5.3-1.el7.x86_64
> >>   libfabric-1.5.3-1.el7.x86_64
> >> 
> >> As for the provider, I saw that one but just listed “verbs”.  I’ll go with 
> >> the “verbs;ofi_rxm” going forward.
> >> 
> >> Regards,
> >> 
> >> Charlie
> >> 
> >> 
> >>> On Jun 14, 2018, at 12:49 PM, Cabral, Matias A wrote:
> >>> 
> >>> Hi Charles,
> >>> 
> >>> What version of libfabric do you have installed? To run OMPI using the 
> >>> verbs provider you need to pair it with the ofi_rxm provider. fi_info 
> >>> should list it like:
> >>> …
> >>> provider: verbs;ofi_rxm
> >>> …
> >>> 
> >>> So in your command line you have to specify:
> >>> mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include 
> >>> “verbs;ofi_rxm”  ….
> >>> 
> >>> (don’t skip the quotes)
> >>> 
> >> 
> > 
> > 
> > -- 
> > Jeff Squyres
> > jsquy...@cisco.com 
> > 
> 

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Pavel Shamis
You just have to switch PML to UCX.
You have some example of the command line here:
https://github.com/openucx/ucx/wiki/OpenMPI-and-OpenSHMEM-installation-with-UCX
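A minimal invocation along those lines (just a sketch; it assumes Open MPI was
built with UCX support, and ./my_app is a placeholder for your binary) would be:

  mpirun --mca pml ucx --mca osc ucx -np 4 ./my_app

The same selection can be made through the environment-variable form
(OMPI_MCA_pml=ucx) shown further down the thread.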
Best,
P.


On Thu, Jun 14, 2018 at 3:25 PM Charles A Taylor  wrote:

> Hmmm.  ompi_info only shows the ucx pml.  I don’t see any “transports”.
>  Will they show up somewhere or are they documented.   Right now it looks
> like the only UCX related thing I can do with openmpi 3.1.0 is
>
> export OMPI_MCA_pml=ucx
> mpiexec ….
>
> From ompi_info…
>
> $ ompi_info --param all all  | more | grep ucx
>  MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v3.1.0)
>  MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v3.1.0)
>
> I’m assuming there is more to it than that.
>
> Regards,
>
> Charlie
>
>
> > On Jun 14, 2018, at 1:18 PM, Jeff Squyres (jsquyres) via users <
> users@lists.open-mpi.org> wrote:
> >
> > Charles --
> >
> > It may have gotten lost in the middle of this thread, but the
> vendor-recommended way of running on InfiniBand these days is with UCX.
> I.e., install OpenUCX and use one of the UCX transports in Open MPI.
> Unless you have special requirements, you should likely give this a try and
> see if it works for you.
> >
> > The libfabric / verbs combo *may* work, but I don't know how robust the
> verbs libfabric support was in the v1.5 release series.
> >
> >
> >> On Jun 14, 2018, at 10:01 AM, Charles A Taylor  wrote:
> >>
> >> Hi Matias,
> >>
> >> Thanks for the response.
> >>
> >> As of a couple of hours ago we are running:
> >>
> >>   libfabric-devel-1.5.3-1.el7.x86_64
> >>   libfabric-1.5.3-1.el7.x86_64
> >>
> >> As for the provider, I saw that one but just listed “verbs”.  I’ll go
> with the “verbs;ofi_rxm” going forward.
> >>
> >> Regards,
> >>
> >> Charlie
> >>
> >>
> >>> On Jun 14, 2018, at 12:49 PM, Cabral, Matias A <
> matias.a.cab...@intel.com> wrote:
> >>>
> >>> Hi Charles,
> >>>
> >>> What version of libfabric do you have installed? To run OMPI using the
> verbs provider you need to pair it with the ofi_rxm provider. fi_info
> should list it like:
> >>> …
> >>> provider: verbs;ofi_rxm
> >>> …
> >>>
> >>> So in your command line you have to specify:
> >>> mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include
> “verbs;ofi_rxm”  ….
> >>>
> >>> (don’t skip the quotes)
> >>>
> >>
> >
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> >
>

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Hmmm.  ompi_info only shows the ucx pml.  I don’t see any “transports”.   Will 
they show up somewhere, or are they documented?   Right now it looks like the 
only UCX-related thing I can do with Open MPI 3.1.0 is

export OMPI_MCA_pml=ucx
mpiexec ….

From ompi_info…

$ ompi_info --param all all  | more | grep ucx
 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v3.1.0)
 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v3.1.0)

I’m assuming there is more to it than that.
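For what it’s worth, the transport-level detail appears to live in UCX itself
rather than in ompi_info.  A sketch of how to inspect it, assuming the OpenUCX
ucx_info utility is installed:

  ucx_info -v     # UCX version and build configuration
  ucx_info -d     # available devices and transports (rc, ud, sm, self, ...)

A particular transport set can then be requested at run time with an
environment variable such as UCX_TLS=rc,sm,self.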

Regards,

Charlie


> On Jun 14, 2018, at 1:18 PM, Jeff Squyres (jsquyres) via users 
>  wrote:
> 
> Charles --
> 
> It may have gotten lost in the middle of this thread, but the 
> vendor-recommended way of running on InfiniBand these days is with UCX.  
> I.e., install OpenUCX and use one of the UCX transports in Open MPI.  Unless 
> you have special requirements, you should likely give this a try and see if 
> it works for you.
> 
> The libfabric / verbs combo *may* work, but I don't know how robust the verbs 
> libfabric support was in the v1.5 release series.
> 
> 
>> On Jun 14, 2018, at 10:01 AM, Charles A Taylor  wrote:
>> 
>> Hi Matias,
>> 
>> Thanks for the response.  
>> 
>> As of a couple of hours ago we are running: 
>> 
>>   libfabric-devel-1.5.3-1.el7.x86_64
>>   libfabric-1.5.3-1.el7.x86_64
>> 
>> As for the provider, I saw that one but just listed “verbs”.  I’ll go with 
>> the “verbs;ofi_rxm” going forward.
>> 
>> Regards,
>> 
>> Charlie
>> 
>> 
>>> On Jun 14, 2018, at 12:49 PM, Cabral, Matias A  
>>> wrote:
>>> 
>>> Hi Charles,
>>> 
>>> What version of libfabric do you have installed? To run OMPI using the 
>>> verbs provider you need to pair it with the ofi_rxm provider. fi_info 
>>> should list it like:
>>> …
>>> provider: verbs;ofi_rxm
>>> …
>>> 
>>> So in your command line you have to specify:
>>> mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include 
>>> “verbs;ofi_rxm”  ….
>>> 
>>> (don’t skip the quotes)
>>> 
>> 
> 
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> 


Re: [OMPI users] A couple of general questions

2018-06-14 Thread Cabral, Matias A
Hey Jeff, 

I will help with the OFI part. 

Thanks,
_MAC

-Original Message-
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Jeff Squyres 
(jsquyres) via users
Sent: Thursday, June 14, 2018 12:50 PM
To: Open MPI User's List 
Cc: Jeff Squyres (jsquyres) 
Subject: Re: [OMPI users] A couple of general questions

Yeah, keeping the documentation / FAQ up to date is... difficult.  :-(

We could definitely use some help with that.

Does anyone have some cycles to help update our FAQ, perchance?


> On Jun 14, 2018, at 11:08 AM, Charles A Taylor  wrote:
> 
> Thank you, Jeff.
> 
> The ofi MTL with the verbs provider seems to be working well at the moment.  
> I’ll need to let it run a day or so before I know whether we can avoid the 
> deadlocks experienced with the straight openib BTL.
> 
> I’ve also built-in UCX support so I’ll be trying that next.  
> 
> Again, thanks for the response.
> 
> Oh, before I forget and I hope this doesn’t sound snarky, but how does the 
> community find out that things like UCX and libfabric exist as well as how to 
> use them when the FAQs on open-mpi.org don’t have much information beyond the 
> now ancient 1.8 series?   Afterall, this is hardly your typical “mpiexec” 
> command line…
> 
>  mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include 
> “verbs;ofi_rxm ...”  ,
> 
> if you get my drift.  Even google doesn’t seem to know all that much about 
> these things.  I’m feeling more than a little ignorant these days.  :)
> 
> Thanks to all for the responses.  It has been a huge help.
> 
> Charlie
> 
>> On Jun 14, 2018, at 1:18 PM, Jeff Squyres (jsquyres) via users 
>>  wrote:
>> 
>> Charles --
>> 
>> It may have gotten lost in the middle of this thread, but the 
>> vendor-recommended way of running on InfiniBand these days is with UCX.  
>> I.e., install OpenUCX and use one of the UCX transports in Open MPI.  Unless 
>> you have special requirements, you should likely give this a try and see if 
>> it works for you.
>> 
>> The libfabric / verbs combo *may* work, but I don't know how robust the 
>> verbs libfabric support was in the v1.5 release series.
> 


-- 
Jeff Squyres
jsquy...@cisco.com


Re: [OMPI users] A couple of general questions

2018-06-14 Thread Jeff Squyres (jsquyres) via users
Yeah, keeping the documentation / FAQ up to date is... difficult.  :-(

We could definitely use some help with that.

Does anyone have some cycles to help update our FAQ, perchance?


> On Jun 14, 2018, at 11:08 AM, Charles A Taylor  wrote:
> 
> Thank you, Jeff.
> 
> The ofi MTL with the verbs provider seems to be working well at the moment.  
> I’ll need to let it run a day or so before I know whether we can avoid the 
> deadlocks experienced with the straight openib BTL.
> 
> I’ve also built-in UCX support so I’ll be trying that next.  
> 
> Again, thanks for the response.
> 
> Oh, before I forget and I hope this doesn’t sound snarky, but how does the 
> community find out that things like UCX and libfabric exist as well as how to 
> use them when the FAQs on open-mpi.org don’t have much information beyond the 
> now ancient 1.8 series?   Afterall, this is hardly your typical “mpiexec” 
> command line…
> 
>  mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include 
> “verbs;ofi_rxm ...”  ,
> 
> if you get my drift.  Even google doesn’t seem to know all that much about 
> these things.  I’m feeling more than a little ignorant these days.  :)
> 
> Thanks to all for the responses.  It has been a huge help.
> 
> Charlie
> 
>> On Jun 14, 2018, at 1:18 PM, Jeff Squyres (jsquyres) via users 
>>  wrote:
>> 
>> Charles --
>> 
>> It may have gotten lost in the middle of this thread, but the 
>> vendor-recommended way of running on InfiniBand these days is with UCX.  
>> I.e., install OpenUCX and use one of the UCX transports in Open MPI.  Unless 
>> you have special requirements, you should likely give this a try and see if 
>> it works for you.
>> 
>> The libfabric / verbs combo *may* work, but I don't know how robust the 
>> verbs libfabric support was in the v1.5 release series.
> 


-- 
Jeff Squyres
jsquy...@cisco.com


Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Thank you, Jeff.

The ofi MTL with the verbs provider seems to be working well at the moment.  
I’ll need to let it run a day or so before I know whether we can avoid the 
deadlocks experienced with the straight openib BTL.

I’ve also built-in UCX support so I’ll be trying that next.  

Again, thanks for the response.

Oh, before I forget, and I hope this doesn’t sound snarky, but how does the 
community find out that things like UCX and libfabric exist, as well as how to 
use them, when the FAQs on open-mpi.org don’t have much information beyond the 
now ancient 1.8 series?   After all, this is hardly your typical “mpiexec” 
command line…

 mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include 
“verbs;ofi_rxm ...”  ,

if you get my drift.  Even google doesn’t seem to know all that much about 
these things.  I’m feeling more than a little ignorant these days.  :)

Thanks to all for the responses.  It has been a huge help.

Charlie

> On Jun 14, 2018, at 1:18 PM, Jeff Squyres (jsquyres) via users 
>  wrote:
> 
> Charles --
> 
> It may have gotten lost in the middle of this thread, but the 
> vendor-recommended way of running on InfiniBand these days is with UCX.  
> I.e., install OpenUCX and use one of the UCX transports in Open MPI.  Unless 
> you have special requirements, you should likely give this a try and see if 
> it works for you.
> 
> The libfabric / verbs combo *may* work, but I don't know how robust the verbs 
> libfabric support was in the v1.5 release series.


Re: [OMPI users] A couple of general questions

2018-06-14 Thread Jeff Squyres (jsquyres) via users
Charles --

It may have gotten lost in the middle of this thread, but the 
vendor-recommended way of running on InfiniBand these days is with UCX.  I.e., 
install OpenUCX and use one of the UCX transports in Open MPI.  Unless you have 
special requirements, you should likely give this a try and see if it works for 
you.
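
A rough end-to-end sketch of that approach (the install prefix, UCX location,
and process count below are illustrative, not required values):

  ./configure --with-ucx=/usr --prefix=/opt/openmpi-3.1.0
  make -j8 install
  mpirun --mca pml ucx -np 4 ./my_app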

The libfabric / verbs combo *may* work, but I don't know how robust the verbs 
libfabric support was in the v1.5 release series.


> On Jun 14, 2018, at 10:01 AM, Charles A Taylor  wrote:
> 
> Hi Matias,
> 
> Thanks for the response.  
> 
> As of a couple of hours ago we are running: 
> 
>libfabric-devel-1.5.3-1.el7.x86_64
>libfabric-1.5.3-1.el7.x86_64
> 
> As for the provider, I saw that one but just listed “verbs”.  I’ll go with 
> the “verbs;ofi_rxm” going forward.
> 
> Regards,
> 
> Charlie
> 
> 
>> On Jun 14, 2018, at 12:49 PM, Cabral, Matias A  
>> wrote:
>> 
>> Hi Charles,
>>  
>> What version of libfabric do you have installed? To run OMPI using the verbs 
>> provider you need to pair it with the ofi_rxm provider. fi_info should list 
>> it like:
>> …
>> provider: verbs;ofi_rxm
>> …
>>  
>> So in your command line you have to specify:
>> mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include 
>> “verbs;ofi_rxm”  ….
>>  
>> (don’t skip the quotes)
>>  
> 


-- 
Jeff Squyres
jsquy...@cisco.com


Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Hi Matias,

Thanks for the response.  

As of a couple of hours ago we are running: 

   libfabric-devel-1.5.3-1.el7.x86_64
   libfabric-1.5.3-1.el7.x86_64

As for the provider, I saw that one but had just listed “verbs”.  I’ll go with 
the “verbs;ofi_rxm” provider going forward.

Regards,

Charlie


> On Jun 14, 2018, at 12:49 PM, Cabral, Matias A  
> wrote:
> 
> Hi Charles,
>  
> What version of libfabric do you have installed? To run OMPI using the verbs 
> provider you need to pair it with the ofi_rxm provider. fi_info should list 
> it like:
> …
> provider: verbs;ofi_rxm
> …
>  
> So in your command line you have to specify:
> mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include “verbs;ofi_rxm” 
>  ….
>  
> (don’t skip the quotes)
>  


Re: [OMPI users] A couple of general questions

2018-06-14 Thread Cabral, Matias A
Hi Charles,

What version of libfabric do you have installed? To run OMPI using the verbs 
provider you need to pair it with the ofi_rxm provider. fi_info should list it 
like:
…
provider: verbs;ofi_rxm
…

So in your command line you have to specify:
mpirun -mca pml cm -mca mtl ofi -mca mtl_ofi_provider_include “verbs;ofi_rxm”  
….

(don’t skip the quotes)
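
A quick way to confirm the pairing is actually available before launching (a
sketch, assuming libfabric’s fi_info utility is in your PATH):

  fi_info -p verbs | grep provider

which should print at least one “provider: verbs;ofi_rxm” line when the RxM
layering is present.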


> Unfortunately, the openmpi-org website FAQ’s covering OpenFabrics support 
> don’t mention anything beyond OpenMPI 1.8.
Good feedback, I’ll look to see how this could be improved.

Thanks,

_MAC

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Charles A 
Taylor
Sent: Thursday, June 14, 2018 6:09 AM
To: Open MPI Users 
Subject: Re: [OMPI users] A couple of general questions

FYI…

GIZMO: prov/verbs/src/ep_rdm/verbs_tagged_ep_rdm.c:443: 
fi_ibv_rdm_tagged_release_remote_sbuff: Assertion `0' failed.

GIZMO:10405 terminated with signal 6 at PC=2add5835c1f7 SP=7fff8071b008.  
Backtrace:
/usr/lib64/libc.so.6(gsignal+0x37)[0x2add5835c1f7]
/usr/lib64/libc.so.6(abort+0x148)[0x2add5835d8e8]
/usr/lib64/libc.so.6(+0x2e266)[0x2add58355266]
/usr/lib64/libc.so.6(+0x2e312)[0x2add58355312]
/lib64/libfabric.so.1(+0x4df43)[0x2add5b87df43]
/lib64/libfabric.so.1(+0x43af2)[0x2add5b873af2]
/lib64/libfabric.so.1(+0x43ea9)[0x2add5b873ea9]



On Jun 14, 2018, at 7:48 AM, Howard Pritchard <hpprit...@gmail.com> wrote:

Hello Charles

You are heading in the right direction.

First you might want to run the libfabric fi_info command to see what 
capabilities you picked up from the libfabric RPMs.

Next you may well not actually be using the OFI  mtl.

Could you run your app with

export OMPI_MCA_mtl_base_verbose=100

and post the output?

It would also help if you described the system you are using :  OS interconnect 
cpu type etc.

Howard

Charles A Taylor <chas...@ufl.edu> wrote on Thu, Jun 14, 2018, at 06:36:
Because of the issues we are having with OpenMPI and the openib BTL (questions 
previously asked), I’ve been looking into what other transports are available.  
I was particularly interested in OFI/libfabric support but cannot find any 
information on it more recent than a reference to the usNIC BTL from 2015 (Jeff 
Squyres, Cisco).  Unfortunately, the openmpi-org website FAQ’s covering 
OpenFabrics support don’t mention anything beyond OpenMPI 1.8.  Given that 3.1 
is the current stable version, that seems odd.

That being the case, I thought I’d ask here. After laying down the 
libfabric-devel RPM and building (3.1.0) with —with-libfabric=/usr, I end up 
with an “ofi” MTL but nothing else.   I can run with OMPI_MCA_mtl=ofi and 
OMPI_MCA_btl=“self,vader,openib” but it eventually crashes in libopen-pal.so.   
(mpi_waitall() higher up the stack).

GIZMO:9185 terminated with signal 11 at PC=2b4d4b68a91d SP=7ffcfbde9ff0.  
Backtrace:
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(+0x9391d)[0x2b4d4b68a91d]
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(opal_progress+0x24)[0x2b4d4b632754]
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(ompi_request_default_wait_all+0x11f)[0x2b4d47be2a6f]
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(PMPI_Waitall+0xbd)[0x2b4d47c2ce4d]

Questions: Am I using the OFI MTL as intended?   Should there be an “ofi” BTL?  
 Does anyone use this?

Thanks,

Charlie Taylor
UF Research Computing

PS - If you could use some help updating the FAQs, I’d be willing to put in 
some time.  I’d probably learn a lot.

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
FYI…

GIZMO: prov/verbs/src/ep_rdm/verbs_tagged_ep_rdm.c:443: 
fi_ibv_rdm_tagged_release_remote_sbuff: Assertion `0' failed.

GIZMO:10405 terminated with signal 6 at PC=2add5835c1f7 SP=7fff8071b008.  
Backtrace:
/usr/lib64/libc.so.6(gsignal+0x37)[0x2add5835c1f7]
/usr/lib64/libc.so.6(abort+0x148)[0x2add5835d8e8]
/usr/lib64/libc.so.6(+0x2e266)[0x2add58355266]
/usr/lib64/libc.so.6(+0x2e312)[0x2add58355312]
/lib64/libfabric.so.1(+0x4df43)[0x2add5b87df43]
/lib64/libfabric.so.1(+0x43af2)[0x2add5b873af2]
/lib64/libfabric.so.1(+0x43ea9)[0x2add5b873ea9]


> On Jun 14, 2018, at 7:48 AM, Howard Pritchard  wrote:
> 
> Hello Charles
> 
> You are heading in the right direction.
> 
> First you might want to run the libfabric fi_info command to see what 
> capabilities you picked up from the libfabric RPMs.
> 
> Next you may well not actually be using the OFI  mtl.
> 
> Could you run your app with
> 
> export OMPI_MCA_mtl_base_verbose=100
> 
> and post the output?
> 
> It would also help if you described the system you are using :  OS 
> interconnect cpu type etc. 
> 
> Howard
> 
> Charles A Taylor <chas...@ufl.edu> wrote on Thu, Jun 14, 2018, at 06:36:
> Because of the issues we are having with OpenMPI and the openib BTL 
> (questions previously asked), I’ve been looking into what other transports 
> are available.  I was particularly interested in OFI/libfabric support but 
> cannot find any information on it more recent than a reference to the usNIC 
> BTL from 2015 (Jeff Squyres, Cisco).  Unfortunately, the openmpi-org website 
> FAQ’s covering OpenFabrics support don’t mention anything beyond OpenMPI 1.8. 
>  Given that 3.1 is the current stable version, that seems odd.
> 
> That being the case, I thought I’d ask here. After laying down the 
> libfabric-devel RPM and building (3.1.0) with —with-libfabric=/usr, I end up 
> with an “ofi” MTL but nothing else.   I can run with OMPI_MCA_mtl=ofi and 
> OMPI_MCA_btl=“self,vader,openib” but it eventually crashes in libopen-pal.so. 
>   (mpi_waitall() higher up the stack).
> 
> GIZMO:9185 terminated with signal 11 at PC=2b4d4b68a91d SP=7ffcfbde9ff0.  
> Backtrace:
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(+0x9391d)[0x2b4d4b68a91d]
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(opal_progress+0x24)[0x2b4d4b632754]
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(ompi_request_default_wait_all+0x11f)[0x2b4d47be2a6f]
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(PMPI_Waitall+0xbd)[0x2b4d47c2ce4d]
> 
> Questions: Am I using the OFI MTL as intended?   Should there be an “ofi” 
> BTL?   Does anyone use this?
> 
> Thanks,
> 
> Charlie Taylor
> UF Research Computing
> 
> PS - If you could use some help updating the FAQs, I’d be willing to put in 
> some time.  I’d probably learn a lot.


Re: [OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
I see what you mean.  Below is the output (filtered for a single host). Our 
setup is very generic.

Dell SOS6320 hosts (haswell)
Mellanox connectx-3 HCAs (mlx4 drivers - native RHEL, not mofed).
FDR/EDR switches (stand-alone opensm)
RHEL7.4
slurm 16.05.11
pmix (pmix-1.1.5-1.el7.x86_64)
openmpi (3.0.0, 3.1.0)

Apps include the well known, LAMMPS, VASP, GROMACS, amber,  raxml, espresso, 
namd2, (i.e. the usual list of research university apps).
gadget/gizmo/arepo are really the only ones giving us trouble but I know they 
run fine under both openmpi and impi/mpich/mvapich at other sites.  I’m trying 
to figure out why we can’t seem to run it reliably but I’d also like to get 
up-to-date with our transport API’s.  Seems we’ve fallen behind and are just 
doing the things we’ve always done (openib BTL).

I’ll try running with modified “provider_include” list and see what happens.  
The fi_info output shows the verbs, udp, and sockets providers.
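
For the record, the environment-variable form of that override (a sketch; the
application name is a placeholder) looks like:

  export OMPI_MCA_pml=cm
  export OMPI_MCA_mtl=ofi
  export OMPI_MCA_mtl_ofi_provider_include="verbs;ofi_rxm"
  mpiexec ./my_app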

Thanks,

Charlie


[chasman@login4 mufasa]$ grep 'c29a-s2.ufhpc' mz0.e 
[c29a-s2.ufhpc:01463] mca: base: components_register: registering framework mtl 
components
[c29a-s2.ufhpc:01463] mca: base: components_register: found loaded component ofi
[c29a-s2.ufhpc:01463] mca: base: components_register: component ofi register 
function successful
[c29a-s2.ufhpc:01463] mca: base: components_open: opening mtl components
[c29a-s2.ufhpc:01463] mca: base: components_open: found loaded component ofi
[c29a-s2.ufhpc:01463] mca: base: components_open: component ofi open function 
successful
[c29a-s2.ufhpc:01464] mca: base: components_register: registering framework mtl 
components
[c29a-s2.ufhpc:01464] mca: base: components_register: found loaded component ofi
[c29a-s2.ufhpc:01464] mca: base: components_register: component ofi register 
function successful
[c29a-s2.ufhpc:01464] mca: base: components_open: opening mtl components
[c29a-s2.ufhpc:01464] mca: base: components_open: found loaded component ofi
[c29a-s2.ufhpc:01464] mca: base: components_open: component ofi open function 
successful
[c29a-s2.ufhpc:01465] mca: base: components_register: registering framework mtl 
components
[c29a-s2.ufhpc:01465] mca: base: components_register: found loaded component ofi
[c29a-s2.ufhpc:01465] mca: base: components_register: component ofi register 
function successful
[c29a-s2.ufhpc:01465] mca: base: components_open: opening mtl components
[c29a-s2.ufhpc:01465] mca: base: components_open: found loaded component ofi
[c29a-s2.ufhpc:01465] mca: base: components_open: component ofi open function 
successful
[c29a-s2.ufhpc:01466] mca: base: components_register: registering framework mtl 
components
[c29a-s2.ufhpc:01466] mca: base: components_register: found loaded component ofi
[c29a-s2.ufhpc:01466] mca: base: components_register: component ofi register 
function successful
[c29a-s2.ufhpc:01466] mca: base: components_open: opening mtl components
[c29a-s2.ufhpc:01466] mca: base: components_open: found loaded component ofi
[c29a-s2.ufhpc:01466] mca: base: components_open: component ofi open function 
successful
[c29a-s2.ufhpc:01463] mca:base:select: Auto-selecting mtl components
[c29a-s2.ufhpc:01463] mca:base:select:(  mtl) Querying component [ofi]
[c29a-s2.ufhpc:01463] mca:base:select:(  mtl) Query of component [ofi] set 
priority to 25
[c29a-s2.ufhpc:01463] mca:base:select:(  mtl) Selected component [ofi]
[c29a-s2.ufhpc:01463] select: initializing mtl component ofi
[c29a-s2.ufhpc:01464] mca:base:select: Auto-selecting mtl components
[c29a-s2.ufhpc:01464] mca:base:select:(  mtl) Querying component [ofi]
[c29a-s2.ufhpc:01464] mca:base:select:(  mtl) Query of component [ofi] set 
priority to 25
[c29a-s2.ufhpc:01464] mca:base:select:(  mtl) Selected component [ofi]
[c29a-s2.ufhpc:01464] select: initializing mtl component ofi
[c29a-s2.ufhpc:01465] mca:base:select: Auto-selecting mtl components
[c29a-s2.ufhpc:01465] mca:base:select:(  mtl) Querying component [ofi]
[c29a-s2.ufhpc:01465] mca:base:select:(  mtl) Query of component [ofi] set 
priority to 25
[c29a-s2.ufhpc:01465] mca:base:select:(  mtl) Selected component [ofi]
[c29a-s2.ufhpc:01465] select: initializing mtl component ofi
[c29a-s2.ufhpc:01466] mca:base:select: Auto-selecting mtl components
[c29a-s2.ufhpc:01466] mca:base:select:(  mtl) Querying component [ofi]
[c29a-s2.ufhpc:01466] mca:base:select:(  mtl) Query of component [ofi] set 
priority to 25
[c29a-s2.ufhpc:01466] mca:base:select:(  mtl) Selected component [ofi]
[c29a-s2.ufhpc:01466] select: initializing mtl component ofi
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:269: mtl:ofi:provider_include = 
"psm,psm2,gni"
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:272: mtl:ofi:provider_exclude = 
"(null)"
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:280: mtl:ofi: "verbs" not in include 
list
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in 
include list
[c29a-s2.ufhpc:01464] mtl_ofi_component.c:280: mtl:ofi: "sockets" not in 
include list
[c29a-s2.ufhpc:01464] 

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Howard Pritchard
Hello Charles

You are heading in the right direction.

First you might want to run the libfabric fi_info command to see what
capabilities you picked up from the libfabric RPMs.

Next you may well not actually be using the OFI  mtl.

Could you run your app with

export OMPI_MCA_mtl_base_verbose=100

and post the output?
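
For example (a sketch; the binary name and process count are placeholders):

  export OMPI_MCA_mtl_base_verbose=100
  mpirun -np 4 ./my_app 2>&1 | tee mtl_verbose.log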

It would also help if you described the system you are using :  OS
interconnect cpu type etc.

Howard

Charles A Taylor wrote on Thu, Jun 14, 2018, at 06:36:

> Because of the issues we are having with OpenMPI and the openib BTL
> (questions previously asked), I’ve been looking into what other transports
> are available.  I was particularly interested in OFI/libfabric support but
> cannot find any information on it more recent than a reference to the usNIC
> BTL from 2015 (Jeff Squyres, Cisco).  Unfortunately, the openmpi-org
> website FAQ’s covering OpenFabrics support don’t mention anything beyond
> OpenMPI 1.8.  Given that 3.1 is the current stable version, that seems odd.
>
> That being the case, I thought I’d ask here. After laying down the
> libfabric-devel RPM and building (3.1.0) with —with-libfabric=/usr, I end
> up with an “ofi” MTL but nothing else.   I can run with OMPI_MCA_mtl=ofi
> and OMPI_MCA_btl=“self,vader,openib” but it eventually crashes in
> libopen-pal.so.   (mpi_waitall() higher up the stack).
>
> GIZMO:9185 terminated with signal 11 at PC=2b4d4b68a91d SP=7ffcfbde9ff0.
> Backtrace:
>
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(+0x9391d)[0x2b4d4b68a91d]
>
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(opal_progress+0x24)[0x2b4d4b632754]
>
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(ompi_request_default_wait_all+0x11f)[0x2b4d47be2a6f]
>
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(PMPI_Waitall+0xbd)[0x2b4d47c2ce4d]
>
> Questions: Am I using the OFI MTL as intended?   Should there be an “ofi”
> BTL?   Does anyone use this?
>
> Thanks,
>
> Charlie Taylor
> UF Research Computing
>
> PS - If you could use some help updating the FAQs, I’d be willing to put
> in some time.  I’d probably learn a lot.

Re: [OMPI users] A couple of general questions

2018-06-14 Thread Gilles Gouaillardet
Charles,


If you are using infiniband hardware, the recommended way is to use UCX.


Cheers,

Gilles

On Thursday, June 14, 2018, Charles A Taylor  wrote:

> Because of the issues we are having with OpenMPI and the openib BTL
> (questions previously asked), I’ve been looking into what other transports
> are available.  I was particularly interested in OFI/libfabric support but
> cannot find any information on it more recent than a reference to the usNIC
> BTL from 2015 (Jeff Squyres, Cisco).  Unfortunately, the openmpi-org
> website FAQ’s covering OpenFabrics support don’t mention anything beyond
> OpenMPI 1.8.  Given that 3.1 is the current stable version, that seems odd.
>
> That being the case, I thought I’d ask here. After laying down the
> libfabric-devel RPM and building (3.1.0) with —with-libfabric=/usr, I end
> up with an “ofi” MTL but nothing else.   I can run with OMPI_MCA_mtl=ofi
> and OMPI_MCA_btl=“self,vader,openib” but it eventually crashes in
> libopen-pal.so.   (mpi_waitall() higher up the stack).
>
> GIZMO:9185 terminated with signal 11 at PC=2b4d4b68a91d SP=7ffcfbde9ff0.
> Backtrace:
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(+0x9391d)[0x2b4d4b68a91d]
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(opal_progress+0x24)[0x2b4d4b632754]
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(ompi_request_default_wait_all+0x11f)[0x2b4d47be2a6f]
> /apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(PMPI_Waitall+0xbd)[0x2b4d47c2ce4d]
>
> Questions: Am I using the OFI MTL as intended?   Should there be an “ofi”
> BTL?   Does anyone use this?
>
> Thanks,
>
> Charlie Taylor
> UF Research Computing
>
> PS - If you could use some help updating the FAQs, I’d be willing to put
> in some time.  I’d probably learn a lot.

[OMPI users] A couple of general questions

2018-06-14 Thread Charles A Taylor
Because of the issues we are having with OpenMPI and the openib BTL (questions 
previously asked), I’ve been looking into what other transports are available.  
I was particularly interested in OFI/libfabric support but cannot find any 
information on it more recent than a reference to the usNIC BTL from 2015 (Jeff 
Squyres, Cisco).  Unfortunately, the open-mpi.org website FAQs covering 
OpenFabrics support don’t mention anything beyond Open MPI 1.8.  Given that 3.1 
is the current stable version, that seems odd.

That being the case, I thought I’d ask here. After laying down the 
libfabric-devel RPM and building (3.1.0) with --with-libfabric=/usr, I end up 
with an “ofi” MTL but nothing else.   I can run with OMPI_MCA_mtl=ofi and 
OMPI_MCA_btl=“self,vader,openib” but it eventually crashes in libopen-pal.so.   
(mpi_waitall() higher up the stack).
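
For reference, the build and run combination described above amounts to roughly
the following (a sketch; paths, the binary name, and the process count are
placeholders):

  ./configure --with-libfabric=/usr ...
  export OMPI_MCA_mtl=ofi
  export OMPI_MCA_btl="self,vader,openib"
  mpiexec -np 32 ./my_app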

GIZMO:9185 terminated with signal 11 at PC=2b4d4b68a91d SP=7ffcfbde9ff0.  
Backtrace:
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(+0x9391d)[0x2b4d4b68a91d]
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libopen-pal.so.40(opal_progress+0x24)[0x2b4d4b632754]
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(ompi_request_default_wait_all+0x11f)[0x2b4d47be2a6f]
/apps/mpi/intel/2018.1.163/openmpi/3.1.0/lib64/libmpi.so.40(PMPI_Waitall+0xbd)[0x2b4d47c2ce4d]

Questions: Am I using the OFI MTL as intended?   Should there be an “ofi” BTL?   Does anyone use this?

Thanks,

Charlie Taylor
UF Research Computing

PS - If you could use some help updating the FAQs, I’d be willing to put in 
some time.  I’d probably learn a lot.