Interestingly enough, I have found that using --disable-dlopen causes
the seg fault whether or not --enable-mca-no-build=coll-ml is used. That
is, the following configure line generates a build of Open MPI that will
*not* seg fault when running a simple hello world program:
./configure
David, to change that option, edit the toss-common file. It is in the
same location as the platform file. We have a number of components that
we disable by default. Just add coll-ml to the end of the list.
-Nathan
On Thu, Aug 13, 2015 at 05:19:35PM +, Jeff Squyres (jsquyres) wrote:
Ah, if you're using --disable-dlopen, then you won't find individual plugin DSOs.
Instead, you can configure this way:
./configure --enable-mca-no-build=coll-ml ...
This will disable the build of the coll/ml component altogether.
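One way to check whether coll/ml ended up in a given build is to look for it in the component list printed by ompi_info. The sketch below is only an illustration: has_coll_component is a hypothetical helper, and the pattern assumes that ompi_info lists components on lines of the form "MCA coll: ml (...)", which may differ between releases.

```shell
#!/bin/sh
# has_coll_component NAME (hypothetical helper, not part of Open MPI):
# reads ompi_info-style output on stdin and succeeds if a coll component
# called NAME is listed. Assumes lines shaped like "MCA coll: ml (...)".
has_coll_component() {
    grep -Eq "MCA[[:space:]]+coll:[[:space:]]+$1([[:space:]]|\$)"
}

# Only query the installation if ompi_info is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_coll_component ml; then
        echo "coll/ml is present in this build"
    else
        echo "coll/ml is not present in this build"
    fi
fi
```

Because the helper reads from stdin, it can also be run against a saved copy of the ompi_info output from another machine.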
> On Aug 13, 2015, at 11:23 AM, David Shrader
David, our platform files disable dlopen. That is why you are not seeing
any component files. coll/ml is built into libmpi.so.
-Nathan
On Thu, Aug 13, 2015 at 09:23:09AM -0600, David Shrader wrote:
Hey Jeff,
I'm actually not able to find coll_ml related files at that location.
All I see are the following files:
[dshrader@zo-fe1 openmpi]$ ls
/usr/projects/hpcsoft/toss2/zorrillo/openmpi/1.8.8-gcc-4.4/lib/openmpi/
libompi_dbg_msgq.a libompi_dbg_msgq.la libompi_dbg_msgq.so
In this
I don't have that option on the configure command line, but my platform
file is using "enable_dlopen=no". I imagine that has the same effect.
Thank you for the pointer!
Thanks,
David
On 08/12/2015 05:04 PM, Deva wrote:
Do you have "--disable-dlopen" in your configure options? This might
Note that this will require you to have fairly recent GNU Autotools installed.
Another workaround for avoiding the coll ml module would be to install Open MPI
as normal, and then rm the following files after installation:
rm $prefix/lib/openmpi/mca_coll_ml*
This will physically remove the
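A slightly more cautious variant of the rm approach is to move the plugin files aside so they can be restored later. This is a minimal sketch, not something proposed in the thread: set_aside_coll_ml and the openmpi-disabled backup directory are hypothetical names. It also reports the case raised elsewhere in this thread, where dlopen is disabled and there are no per-component files to move.

```shell
#!/bin/sh
# set_aside_coll_ml PREFIX (hypothetical helper): move any mca_coll_ml*
# plugin files out of PREFIX/lib/openmpi into a sibling backup directory
# instead of deleting them. The backup directory name is an assumption.
set_aside_coll_ml() {
    plugin_dir="$1/lib/openmpi"
    backup_dir="$1/lib/openmpi-disabled"
    moved=0
    for f in "$plugin_dir"/mca_coll_ml*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        mkdir -p "$backup_dir" && mv "$f" "$backup_dir"/ && moved=$((moved + 1))
    done
    if [ "$moved" -eq 0 ]; then
        # Typical when components are built into libmpi rather than as DSOs.
        echo "no mca_coll_ml* files under $plugin_dir" >&2
        return 1
    fi
    echo "moved $moved file(s) to $backup_dir"
}
```

It would be called as set_aside_coll_ml "$prefix", with $prefix being the installation prefix used above.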
David,
I guess you do not want to use the ml coll module at all in Open MPI 1.8.8.
You can simply do:
touch ompi/mca/coll/ml/.ompi_ignore
./autogen.pl
./configure ...
make && make install
so that the ml component is not even built.
Cheers,
Gilles
On 8/13/2015 7:30 AM, David Shrader wrote:
I
Do you have "--disable-dlopen" in your configure options? This might force
coll_ml to be loaded first even with -mca coll ^ml.
The next HPCX release is expected by the end of August.
-Devendar
On Wed, Aug 12, 2015 at 3:30 PM, David Shrader wrote:
I remember seeing those, but forgot about them. I am curious, though,
why using '-mca coll ^ml' wouldn't work for me.
We'll watch for the next HPCX release. Is there an ETA on when that
release may happen? Thank you for the help!
David
On 08/12/2015 04:04 PM, Deva wrote:
David,
This is because hcoll symbols conflict with the ml coll module inside OMPI.
HCOLL is derived from the ml module. This issue is fixed in the hcoll library,
and the fix will be available in the next HPCX release.
Some earlier discussion on this issue:
The admin that rolled the hcoll rpm that we're using (and got it in
system space) said that she got it from
hpcx-v1.3.336-gcc-OFED-1.5.4.1-redhat6.6-x86_64.tar.
Thanks,
David
On 08/12/2015 10:51 AM, Deva wrote:
Where did you get this HCOLL library: MOFED or HPCX? Which version?
On Wed, Aug 12, 2015 at 9:47 AM, David Shrader wrote:
Hey Devendar,
It looks like I still get the error:
[dshrader@zo-fe1 tests]$ mpirun -n 2 -mca coll ^ml ./a.out
App launch reported: 1 (out of 1) daemons - 2 (out of 2) procs
[1439397957.351764] [zo-fe1:14678:0] shm.c:65 MXM WARN Could
not open the KNEM device file at
Hi David,
This issue comes from the hcoll library. It could be caused by a symbol
conflict with the ml module, which was fixed recently in HCOLL. Can you try
with "-mca coll ^ml" and see if this workaround works in your setup?
-Devendar
On Wed, Aug 12, 2015 at 9:30 AM, David Shrader
Hello Gilles,
Thank you very much for the patch! It is much more complete than mine.
Using that patch and re-running autogen.pl, I am able to build 1.8.8
with './configure --with-hcoll' without errors.
I do have issues when it comes to running 1.8.8 with hcoll built in,
however. In my quick
Thanks David,
I made a PR for the v1.8 branch at
https://github.com/open-mpi/ompi-release/pull/492
The patch is attached (it required some back-porting).
Cheers,
Gilles
On 8/12/2015 4:01 AM, David Shrader wrote:
I have cloned Gilles' topic/hcoll_config branch and, after running
autogen.pl, have found that './configure --with-hcoll' does indeed work
now. I used Gilles' branch as I wasn't sure how best to get the pull
request changes into my own clone of master. It looks like the proper
checks are
On Aug 11, 2015, at 1:39 AM, Åke Sandgren wrote:
>
> Please fix the hcoll test (and code) to be correct.
>
> Any configure test that adds /usr/lib and/or /usr/include to any compile
> flags is broken.
+1
Gilles filed https://github.com/open-mpi/ompi/pull/796; I
On 08/11/2015 10:22 AM, Gilles Gouaillardet wrote:
I do not know the context, so I should not jump to any conclusion ...
If xxx.h is in $HCOLL_HOME/include/hcoll in hcoll version Y, but in
$HCOLL_HOME/include/hcoll/api in hcoll version Z, then the relative path
to $HCOLL_HOME/include cannot be hard-coded.
anyway, let's assume it is ok to hard
Please fix the hcoll test (and code) to be correct.
Any configure test that adds /usr/lib and/or /usr/include to any compile
flags is broken.
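As an illustration of that rule, a configure-style script can refuse to emit flags for directories the toolchain already searches. This is a rough sketch under stated assumptions, not the actual Open MPI hcoll test: include_flag_for and lib_flag_for are hypothetical helpers, and treating /usr/lib64 as a default directory is an assumption beyond the two paths named above.

```shell
#!/bin/sh
# Hypothetical helpers: print a -I or -L flag for a directory only when it
# is not already a default search location for the compiler or linker.
include_flag_for() {
    dir="${1%/}"                          # drop one trailing slash
    case "$dir" in
        ""|/usr/include) ;;               # default location: emit nothing
        *) printf '%s\n' "-I$dir" ;;
    esac
}

lib_flag_for() {
    dir="${1%/}"
    case "$dir" in
        ""|/usr/lib|/usr/lib64) ;;        # /usr/lib64 is an assumption
        *) printf '%s\n' "-L$dir" ;;
    esac
}
```

A caller might then write CPPFLAGS="$CPPFLAGS $(include_flag_for "$hcoll_dir/include")", where hcoll_dir is a placeholder for whatever directory was given to --with-hcoll.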
And if hcoll include files are under $HCOLL_HOME/include/hcoll (and
hcoll/api) then the include directives in the source should be
#include
and
David,
The configure help is misleading about hcoll ...
  --with-hcoll(=DIR)      Build hcoll (Mellanox Hierarchical Collectives)
                          support, searching for libraries in DIR
The =DIR is not really optional ...
configure will not complain if you configure with --with-hcoll