Re: [OMPI users] How do I build 3.1.0 (or later) with mellanox's libraries

2018-09-19 Thread Barrett, Brian via users
Yeah, there’s no good answer here from an “automatically do the right thing” 
point of view.  The reachable:netlink component (which is used for the TCP BTL) 
only works with libnl-3 because libnl-1 is a real pain to deal with if you’re 
trying to parse route behaviors.  It will do the right thing if you’re using 
OpenIB (the other place the libnl-1/libnl-3 thing comes into play) because 
OpenIB runs its configure test before reachable:netlink, but UCX’s tests run 
way later (for reasons that aren’t fixable).

Mellanox should really update everything to use libnl-3 so that there’s at least 
hope of getting the right answer (not just in Open MPI, but in general; libnl-1 
is old and not awesome).  In the meantime, I *think* you can work around this 
problem in one of two ways.  The first, which I know will work, is to remove the 
libnl-3 devel package.  That’s probably not optimal for obvious reasons.  The 
second is to specify --enable-mca-no-build=reachable-netlink at configure time, 
which will disable the component that prefers libnl-3, and then UCX should be happy.
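
For anyone trying the second workaround, a configure invocation might look 
roughly like the sketch below.  Only --with-ucx and 
--enable-mca-no-build=reachable-netlink come from this thread; the helper name 
and both paths are hypothetical placeholders.  The helper only prints the 
argument list, so it can be inspected before it is handed to configure.

```shell
#!/bin/sh
# Sketch: assemble configure arguments for the "disable reachable:netlink"
# workaround. The function name and both directory arguments are
# hypothetical placeholders; adjust them for your site.
ompi_configure_args() {
    prefix=$1     # install prefix (assumption)
    ucx_dir=$2    # where UCX is installed (assumption)
    printf '%s\n' \
        "--prefix=${prefix}" \
        "--with-ucx=${ucx_dir}" \
        "--enable-mca-no-build=reachable-netlink"
}

# Example usage, from the top of the Open MPI source tree
# (paths must not contain spaces for this unquoted expansion to work):
#   ./configure $(ompi_configure_args /opt/openmpi-3.1.2 /opt/ucx)
```

Whether UCX is then detected still depends on the libnl version the rest of 
the stack was built against, so check the configure output afterwards.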

Hope this helps,

Brian

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] How do I build 3.1.0 (or later) with mellanox's libraries

2018-09-19 Thread Jeff Squyres (jsquyres) via users
Alan --

Sorry for the delay.

I agree with Gilles: Brian's commit had to do with "reachable" plugins in Open 
MPI -- they do not appear to be the problem here.

From the config.log you sent, it looks like configure aborted because you 
requested UCX support (via --with-ucx) but configure wasn't able to find it.  
And it looks like it didn't find it because of libnl v1 vs. v3 issues, as you 
stated.

I think we're going to have to refer you to Mellanox support on this one.  The 
libnl situation is kind of a nightmare: your entire stack must be compiled for 
either libnl v1 *or* v3.  If you have both libnl v1 *and* v3 appear in a 
process together, the process will crash before main() even executes.  :-(  
This is precisely why we have these warnings in Open MPI's configure.
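
One way to see which libnl generation each piece of the stack pulls in is to 
scan ldd output for the libnl sonames.  The sketch below is only a rough 
diagnostic: the function name is made up, and it assumes the usual sonames 
(libnl.so.1 for v1; libnl-3.so.* and libnl-<name>-3.so.* for v3).

```shell
#!/bin/sh
# Sketch: report which libnl generations appear in ldd-style output.
# Reads ldd output on stdin and prints "libnl-1", "libnl-3", both, or nothing.
# Assumed soname patterns: libnl.so.1 (v1); libnl-3.so / libnl-<name>-3.so (v3).
libnl_generations() {
    awk '
        /libnl\.so\.1/          { v1 = 1 }
        /libnl(-[a-z]+)?-3\.so/ { v3 = 1 }
        END {
            if (v1) print "libnl-1"
            if (v3) print "libnl-3"
        }
    '
}

# Example usage (the library path is a hypothetical placeholder):
#   ldd /path/to/libmxm.so | libnl_generations
```

Because ldd also lists indirect dependencies, a library that prints both 
lines would bring v1 and v3 into the same process, which is the situation 
described above.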




-- 
Jeff Squyres
jsquy...@cisco.com



Re: [OMPI users] How do I build 3.1.0 (or later) with mellanox's libraries

2018-09-14 Thread Alan Wild
As requested, I've attached the config.log.  I also included the output from
configure itself.

-Alan


openmpi-3.1.2.config.tar.xz
Description: Binary data

Re: [OMPI users] How do I build 3.1.0 (or later) with mellanox's libraries

2018-09-14 Thread Gilles Gouaillardet
Alan,

Can you please compress and post your config.log?

My understanding of the mentioned commit is that it does not build the
reachable/netlink component if libnl version 1 is used (by third-party libs
such as mxm).
I do not believe it should abort configure.

Cheers,

Gilles


[OMPI users] How do I build 3.1.0 (or later) with mellanox's libraries

2018-09-14 Thread Alan Wild
I apologize if this has been discussed before, but I've been unable to find
any discussion on the topic.

I recently went to build 3.1.2 on our cluster, only to have the build
fail completely during configure due to issues with libnl versions.

Specifically, I had requested support for Mellanox's libraries (mxm,
hcoll, sharp, etc.), which was fine for me in 3.0.0 and 3.0.1.  However, it
appears all of those libraries are built with libnl version 1, but the
netlink component now requires libnl version 3 and aborts the build
if it finds anything else in LIBS that is using version 1.

I don't believe Mellanox is providing releases of these libraries linked
against libnl version 3 (I'd love to find out I'm wrong on that), at least not
for CentOS 6.9.

According to GitHub, it appears bwbarret's commit a543e7f (from one year
ago today), which was merged into 3.1.0, is responsible.  However, I'm having
a hard time believing that Open MPI would want to break support for these
libraries, or that there isn't some other kind of workaround.

I'm on a short timeline to deliver this build of Open MPI to my users, but I
know they won't accept a build that doesn't support Mellanox's libraries.

I'm hoping there's an easy fix (short of trying to reverse the commit in
my build) that I'm overlooking.

Thanks,

-Alan