Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-13 Thread David Shrader
Interestingly enough, I have found that using --disable-dlopen causes the seg fault whether or not --enable-mca-no-build=coll-ml is used. That is, the following configure line generates a build of Open MPI that will *not* seg fault when running a simple hello world program: ./configure
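A minimal sketch of the contrast being described, assuming an hcoll-enabled build; the exact configure line is truncated above, so these flags are illustrative (the non-crashing line presumably just omits --disable-dlopen):

    # segfaults at runtime, with or without the coll/ml exclusion:
    ./configure --with-hcoll --disable-dlopen --enable-mca-no-build=coll-ml ...
    # builds a version that does not segfault (dlopen left enabled):
    ./configure --with-hcoll --enable-mca-no-build=coll-ml ...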

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-13 Thread Nathan Hjelm
David, to modify that option, edit the toss-common file. It is in the same location as the platform file. We have a number of components we disable by default. Just add coll-ml to the end of the list. -Nathan On Thu, Aug 13, 2015 at 05:19:35PM +, Jeff Squyres (jsquyres) wrote: > Ah, if
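For reference, platform/common files set configure options as shell variable assignments. A hedged sketch of the edit Nathan describes; the surrounding list is invented for illustration, and only the trailing coll-ml addition is the point:

    # hypothetical excerpt from the toss-common file
    enable_mca_no_build=carto,crs,filem,coll-ml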

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-13 Thread Jeff Squyres (jsquyres)
Ah, if you're using --disable-dlopen, then you won't find individual plugin DSOs. Instead, you can configure this way: ./configure --enable-mca-no-build=coll-ml ... This will disable the build of the coll/ml component altogether. > On Aug 13, 2015, at 11:23 AM, David Shrader

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-13 Thread Nathan Hjelm
David, our platform files disable dlopen. That is why you are not seeing any component files. coll/ml is built into libmpi.so. -Nathan On Thu, Aug 13, 2015 at 09:23:09AM -0600, David Shrader wrote: > Hey Jeff, > > I'm actually not able to find coll_ml related files at that location. All I >
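A quick way to confirm this, assuming ompi_info is in PATH and $prefix stands for your install prefix; a sketch, not taken from the thread:

    # coll components known to the build (built-in or DSO):
    ompi_info | grep ' coll'
    # coll/ml symbols compiled into the shared library:
    nm -D $prefix/lib/libmpi.so | grep coll_ml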

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-13 Thread David Shrader
Hey Jeff, I'm actually not able to find coll_ml related files at that location. All I see are the following files: [dshrader@zo-fe1 openmpi]$ ls /usr/projects/hpcsoft/toss2/zorrillo/openmpi/1.8.8-gcc-4.4/lib/openmpi/ libompi_dbg_msgq.a libompi_dbg_msgq.la libompi_dbg_msgq.so In this

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-13 Thread David Shrader
I don't have that option on the configure command line, but my platform file is using "enable_dlopen=no." I imagine that gets the same result. Thank you for the pointer! Thanks, David On 08/12/2015 05:04 PM, Deva wrote: do you have "--disable-dlopen" in your configure option? This might

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-13 Thread Jeff Squyres (jsquyres)
Note that this will require you to have fairly recent GNU Autotools installed. Another workaround for avoiding the coll/ml module would be to install Open MPI as normal, and then rm the following files after installation: rm $prefix/lib/openmpi/mca_coll_ml* This will physically remove the

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-13 Thread Gilles Gouaillardet
David, I guess you do not want to use the ml coll module at all in Open MPI 1.8.8. You can simply do:

    touch ompi/mca/coll/ml/.ompi_ignore
    ./autogen.pl
    ./configure ...
    make && make install

so the ml component is not even built. Cheers, Gilles On 8/13/2015 7:30 AM, David Shrader wrote: I

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Deva
do you have "--disable-dlopen" in your configure option? This might force coll_ml to be loaded first even with -mca coll ^ml. The next HPCX release is expected by the end of August. -Devendar On Wed, Aug 12, 2015 at 3:30 PM, David Shrader wrote: > I remember seeing those, but forgot

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread David Shrader
I remember seeing those, but forgot about them. I am curious, though, why using '-mca coll ^ml' wouldn't work for me. We'll watch for the next HPCX release. Is there an ETA on when that release may happen? Thank you for the help! David On 08/12/2015 04:04 PM, Deva wrote: David, This is

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Deva
David, this is because hcoll's symbols conflict with the ml coll module inside OMPI. HCOLL is derived from the ml module. This issue is fixed in the hcoll library, and the fix will be available in the next HPCX release. Some earlier discussion on this issue:
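To make the conflict concrete: libhcoll and Open MPI's built-in coll/ml can export identically named symbols, so whichever library the dynamic linker resolves first wins for both callers. A sketch for spotting such overlaps, with illustrative library paths (adjust to your system):

    nm -D /usr/lib64/libhcoll.so | awk '$2=="T"{print $3}' | sort > hcoll.syms
    nm -D $prefix/lib/libmpi.so  | awk '$2=="T"{print $3}' | sort > ompi.syms
    comm -12 hcoll.syms ompi.syms   # symbols defined in both libraries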

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread David Shrader
The admin who rolled the hcoll RPM that we're using (and got it into system space) said that she got it from hpcx-v1.3.336-gcc-OFED-1.5.4.1-redhat6.6-x86_64.tar. Thanks, David On 08/12/2015 10:51 AM, Deva wrote: From where did you grab this HCOLL lib? MOFED or HPCX? What version? On Wed,

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Deva
From where did you grab this HCOLL lib? MOFED or HPCX? What version? On Wed, Aug 12, 2015 at 9:47 AM, David Shrader wrote: > Hey Devendar, > > It looks like I still get the error: > > [dshrader@zo-fe1 tests]$ mpirun -n 2 -mca coll ^ml ./a.out > App launch reported: 1 (out

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread David Shrader
Hey Devendar, It looks like I still get the error: [dshrader@zo-fe1 tests]$ mpirun -n 2 -mca coll ^ml ./a.out App launch reported: 1 (out of 1) daemons - 2 (out of 2) procs [1439397957.351764] [zo-fe1:14678:0] shm.c:65 MXM WARN Could not open the KNEM device file at

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Deva
Hi David, this issue is from the hcoll library. It could be because of a symbol conflict with the ml module. This was fixed recently in HCOLL. Can you try with "-mca coll ^ml" and see if this workaround works in your setup? -Devendar On Wed, Aug 12, 2015 at 9:30 AM, David Shrader

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread David Shrader
Hello Gilles, Thank you very much for the patch! It is much more complete than mine. Using that patch and re-running autogen.pl, I am able to build 1.8.8 with './configure --with-hcoll' without errors. I do have issues when it comes to running 1.8.8 with hcoll built in, however. In my quick

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-12 Thread Gilles Gouaillardet
Thanks David, I made a PR for the v1.8 branch at https://github.com/open-mpi/ompi-release/pull/492; the patch is attached (it required some back-porting). Cheers, Gilles On 8/12/2015 4:01 AM, David Shrader wrote: I have cloned Gilles' topic/hcoll_config branch and, after running autogen.pl,

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-11 Thread David Shrader
I have cloned Gilles' topic/hcoll_config branch and, after running autogen.pl, have found that './configure --with-hcoll' does indeed work now. I used Gilles' branch as I wasn't sure how best to get the pull request changes into my own clone of master. It looks like the proper checks are

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-11 Thread Jeff Squyres (jsquyres)
On Aug 11, 2015, at 1:39 AM, Åke Sandgren wrote: > > Please fix the hcoll test (and code) to be correct. > > Any configure test that adds /usr/lib and/or /usr/include to any compile > flags is broken. +1 Gilles filed https://github.com/open-mpi/ompi/pull/796; I

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-11 Thread Åke Sandgren
On 08/11/2015 10:22 AM, Gilles Gouaillardet wrote: I do not know the context, so I should not jump to any conclusion ... If xxx.h is in $HCOLL_HOME/include/hcoll in hcoll version Y, but in $HCOLL_HOME/include/hcoll/api in hcoll version Z, then the relative path to $HCOLL_HOME/include cannot be

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-11 Thread Gilles Gouaillardet
I do not know the context, so I should not jump to any conclusion ... If xxx.h is in $HCOLL_HOME/include/hcoll in hcoll version Y, but in $HCOLL_HOME/include/hcoll/api in hcoll version Z, then the relative path to $HCOLL_HOME/include cannot be hard coded. Anyway, let's assume it is OK to hard
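A sketch of the kind of guard this implies for the configure logic, in plain shell (illustrative; not the actual Open MPI m4):

    # only add search flags when hcoll lives outside the default system prefix
    if test -n "$with_hcoll" && test "$with_hcoll" != "/usr"; then
        CPPFLAGS="$CPPFLAGS -I$with_hcoll/include"
        LDFLAGS="$LDFLAGS -L$with_hcoll/lib"
    fi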

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-11 Thread Åke Sandgren
Please fix the hcoll test (and code) to be correct. Any configure test that adds /usr/lib and/or /usr/include to any compile flags is broken. And if hcoll include files are under $HCOLL_HOME/include/hcoll (and hcoll/api) then the include directives in the source should be #include <hcoll/xxx.h> and
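Under that convention the sources carry the hcoll/ prefix themselves, so configure only needs -I$HCOLL_HOME/include (and nothing at all when hcoll is in system space); a sketch using the thread's own xxx.h placeholder for the header names:

    #include <hcoll/xxx.h>
    #include <hcoll/api/xxx.h>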

Re: [OMPI users] Open MPI 1.8.8 and hcoll in system space

2015-08-10 Thread Gilles Gouaillardet
David, the configure help is misleading about hcoll:

    --with-hcoll(=DIR)  Build hcoll (Mellanox Hierarchical Collectives)
                        support, searching for libraries in DIR

The =DIR is not really optional ... configure will not complain if you configure with --with-hcoll
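Hedged examples of the two behaviors (the hcoll path shown is the conventional Mellanox install location and may differ per system):

    # searches the given prefix for hcoll headers and libraries:
    ./configure --with-hcoll=/opt/mellanox/hcoll ...
    # accepted without complaint, but no hcoll search path is added:
    ./configure --with-hcoll ...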